Welcome to PCAtag 2.1
To test the role of candidate genes in association studies
comprehensively, the selection of informative SNPs is paramount.
It is important to select tagging SNPs (tSNPs) that represent a
large portion (>90%) of the genetic variation of a gene.
Here we describe a new software tool, PCAtag,
that performs tSNP selection using the principal component analysis (PCA) approach
described in Horne and Camp (2004). The
advantage of PCA for tSNP selection is that LD groups do
not need to be contiguous and can overlap. This flexible
framework does not impose over-simplified assumptions on the
genetic architecture, and likely fits reality much better.
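To illustrate why grouping SNPs by their factor loadings can yield LD groups that are non-contiguous and overlapping, here is a minimal sketch. It assumes a rotated factor-loading matrix is already available. The SNP names, the loading values and the 0.4 threshold are hypothetical choices made for this illustration; they are not PCAtag defaults or PCAtag output.

```python
# Hypothetical rotated loadings: one entry per SNP (listed in chromosomal
# order), one value per factor. All numbers are invented for illustration.
loadings = {
    "snp1": [0.91, 0.05],
    "snp2": [0.12, 0.88],
    "snp3": [0.85, 0.10],
    "snp4": [0.55, 0.62],
    "snp5": [0.08, 0.93],
}


def ld_groups(loadings, threshold=0.4):
    """Put each SNP in every factor whose absolute loading meets the threshold.

    The threshold is an assumption of this sketch, not a documented setting.
    """
    n_factors = len(next(iter(loadings.values())))
    groups = [[] for _ in range(n_factors)]
    for snp, row in loadings.items():
        for k, value in enumerate(row):
            if abs(value) >= threshold:
                groups[k].append(snp)
    return groups


groups = ld_groups(loadings)
for k, members in enumerate(groups, start=1):
    print(f"LD group {k}: {members}")
```

In this toy input, the first group skips over snp2 (so it is not contiguous) and snp4 lands in both groups (so the groups overlap), which a block-based partition of the gene could not represent.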
Algorithms used by PCAtag
- A Bayesian method for reconstructing haplotypes is
used by interfacing with external haplotype-reconstruction software
(Stephens et al. 2006).
- Principal Component Analysis (PCA)
using a varimax rotation is
performed by interfacing with
the FactoMineR add-on package available in
R. R is a language and environment for statistical computing (R Foundation for Statistical Computing, Vienna, Austria; ISBN 3-900051-07-0).
- The procedure for determining LD groups and selecting tSNPs
extends the two-step PCA method outlined in Horne
and Camp (2004) into a multi-step PCA.
- The majority of tagging methods take genotype data as input and phase the data as part of the process of selecting tagging SNPs. For example, many methods are based on pairwise allelic r². This r² is a measure of allelic correlation, i.e. the co-occurrence of alleles on a haplotype. Its calculation amounts to phasing the pairwise genotype data into haplotype data before calculating the allelic correlations, which are then used to identify tSNPs.
- Our genotype option omits the phasing stage entirely and instead uses the correlations between the genotype calls themselves, i.e. the PCA is performed directly on the genotype calls.
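The difference between the two correlation measures can be sketched on a toy example. The code below assumes biallelic SNPs, alleles coded 0/1 on each haplotype, and genotype calls coded as the count (0, 1 or 2) of the allele coded 1. The function names and the data are hypothetical, and the coding is an assumption of this sketch rather than a statement about PCAtag's internals. Phase is taken as known here; in practice it would have to be estimated from the genotypes.

```python
def allelic_r2(haplotypes):
    """Allelic r^2 between two SNPs from phased haplotypes.

    `haplotypes` is a list of (a, b) pairs holding the 0/1 allele at
    SNP A and SNP B on one chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def genotype_r2(g1, g2):
    """Squared Pearson correlation between two vectors of 0/1/2 calls."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((x - m1) * (y - m2) for x, y in zip(g1, g2))
    v1 = sum((x - m1) ** 2 for x in g1)
    v2 = sum((y - m2) ** 2 for y in g2)
    return cov * cov / (v1 * v2)


# Four hypothetical individuals, each with two phased haplotypes.
individuals = [
    [(0, 0), (0, 0)],
    [(1, 1), (0, 0)],
    [(1, 1), (1, 1)],
    [(1, 0), (0, 0)],
]

# Phased view: pool all chromosomes.
haplotypes = [h for person in individuals for h in person]

# Unphased view: per-person allele counts, with no haplotype information.
geno_a = [h1[0] + h2[0] for h1, h2 in individuals]
geno_b = [h1[1] + h2[1] for h1, h2 in individuals]

print("allelic r^2 (needs phase):", allelic_r2(haplotypes))
print("genotype r^2 (no phase):  ", genotype_r2(geno_a, geno_b))
```

The two values generally differ, but the genotype-based correlation is computed from the observed calls alone, with no phasing step.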
Allele frequencies, haplotype
frequencies and LD structure may differ between cases and
controls.
- If phenotype data (or any
subset criteria) is entered, tagging will be performed
in the cases and controls separately, as well as
in the combined sample.
- Knowledge of
such differences at the tSNP selection stage
will allow for more powerful subsequent association analyses.
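A rough sketch of the subsetting idea follows. It assumes each sample carries a phenotype flag (1 for case, 0 for control) and a list of 0/1/2 genotype calls; the record layout, the function names and the data are hypothetical. Including a pooled "all" subset is also an assumption of this sketch. Allele frequency stands in here for the full tagging step, simply to show that the subsets can differ.

```python
def split_by_phenotype(samples):
    """Return the sample subsets on which tagging would be run separately."""
    subsets = {"cases": [], "controls": []}
    for sample in samples:
        key = "cases" if sample["phenotype"] == 1 else "controls"
        subsets[key].append(sample)
    # Assumption of this sketch: the pooled sample is analysed as well.
    subsets["all"] = list(samples)
    return subsets


def allele_frequency(samples, snp_index):
    """Frequency of the counted allele at one SNP from 0/1/2 genotype calls."""
    total = sum(sample["genotypes"][snp_index] for sample in samples)
    return total / (2 * len(samples))


# Hypothetical data: two cases, two controls, two SNPs.
samples = [
    {"phenotype": 1, "genotypes": [2, 0]},
    {"phenotype": 1, "genotypes": [1, 1]},
    {"phenotype": 0, "genotypes": [0, 1]},
    {"phenotype": 0, "genotypes": [0, 2]},
]

subsets = split_by_phenotype(samples)
for name, members in subsets.items():
    freqs = [allele_frequency(members, i) for i in range(2)]
    print(name, freqs)
```

Any per-subset analysis (correlation matrix, PCA, tSNP selection) could be substituted for `allele_frequency` in the final loop.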
Last update May 25, 2010