.rgen Parameter File Description

This is an XML file that uses a DTD file, ge-rgen.dtd, to describe all of the data for the analysis. All element names in this file start with "ge:". The parameter file has a root element, rgen, with a number of sub-elements and attributes. It can be described in two parts: the first part sets up the analysis parameters, and the second part defines the inheritance models to be analyzed.

Analysis Parameters (First Part)

The following table describes all of the required attributes and their values for root element rgen. All attribute values should be enclosed in " ".

Attribute | Att Value | Description
rseed | number | Random number generator seed value. Specify rseed="random" to have the program randomly generate a seed value.
nsims | number | Number of simulations.
top | classname | The program for generating simulated alleles or haplotypes for all of the top founders. Currently available: AlleleFreqTopSim, HapFreqTopSim, HapMCTopSeparate, HapMCTopTogether, GeneCounterTopSim, IndivWtTopSim, and Conditional Gene Drop.
drop | classname | The program for generating alleles or haplotypes based on the parent's simulated genetic information. Currently available: DropSim, HapMCDropSeparate, HapMCDropTogether, and IndivWtDropSim.
report | classname | Report options. The default is the standard report (rgen_filename.report) with full tables and detailed output. Specify report="summary" for an ASCII space-delimited file (rgen_filename.summary) of results, including the seed value, the specified statistics, the corresponding p-values, and 95% confidence intervals for odds ratios for each data file, followed by meta statistics if requested. Specify report="both" to generate both the standard and summary reports.

The following table describes the sub-element locus and its attributes and values.

Attribute | Att Value | Description
id | number | The locus id number in the data file.
marker | name | Allows the user to attach a marker name to the locus id.
dist | number | A recombination fraction or a distance between a marker and the preceding marker. If the dist value is ≤0.5, it is assumed to be a recombination fraction. If the dist value is >0.5, it is assumed to be the distance in cM between the marker and the preceding marker.
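
The rule for reading a dist value can be sketched in a few lines of Python. This is only an illustration of the ≤0.5 threshold described above, not PedGenie's own code; the function name interpret_dist and the returned labels are hypothetical.

```python
from typing import Tuple


def interpret_dist(dist: float) -> Tuple[float, str]:
    """Classify a dist value using the threshold described for the locus element.

    Values up to and including 0.5 are read as recombination fractions;
    larger values are read as distances in cM.
    """
    if dist <= 0.5:
        return dist, "recombination fraction"
    return dist, "cM"


if __name__ == "__main__":
    for value in (0.01, 0.5, 12.0):
        print(value, "->", interpret_dist(value)[1])
```

Note that, under this rule, a distance shorter than 0.5 cM cannot be entered directly in cM, because it would be read as a recombination fraction.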

The following table describes the sub-element datafile and its attributes and values.

Attribute | Att Value | Description
studyname | name | Allows the user to attach a study name to the genotype data file.
genotypedata | name | The directory path and genotype data file name for analysis. Specify each genotype data file with a separate datafile statement.
haplotype | name | The directory path and frequency data file name. This file allows the user to specify allele or haplotype frequencies. All frequencies should sum to 1.0.
linkageparameter | name | The directory path and linkage parameter file name, for the GeneCounterTopSim option only.
quantitative | name | The directory path and quantitative data file name, for the Quantitative statistic only.
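
Because the frequencies in the frequency data file should sum to 1.0, a quick check before running an analysis can catch typing errors. The sketch below assumes the frequencies have already been read into a list of numbers; the layout of the frequency file is not covered here, and both the function name and the tolerance are hypothetical choices.

```python
import math
from typing import Iterable


def frequencies_sum_to_one(freqs: Iterable[float], tol: float = 1e-6) -> bool:
    """Return True if the frequencies add up to 1.0 within a small tolerance.

    The tolerance absorbs rounding in hand-entered values; 1e-6 is an
    arbitrary illustrative choice, not a documented PedGenie setting.
    """
    return math.isclose(math.fsum(freqs), 1.0, rel_tol=0.0, abs_tol=tol)


if __name__ == "__main__":
    print(frequencies_sum_to_one([0.6, 0.3, 0.1]))
    print(frequencies_sum_to_one([0.5, 0.4]))
```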

The following table describes the sub-element param and its attributes and values.

Attribute | Att Value | Variable | Description
name | ccstat# | classname | Statistical programs. You can run multiple statistics on the same set of data. Each statistic should have a different ccstat#.
name | metastat# | classname | Meta statistics for multiple study data files. Each meta statistic should have a different metastat#.
name | covar# | number | The selected covariate id number in the quantitative data file.
name | dumper | classname | The dumper class for dumping simulated data. The TDTDumper class is used with the QTDT interface. The GenoDataDumper class dumps simulated genotype data; its output file has the same format as the Genie input genotype data file. The IndivDumper class outputs weights for genotyped individuals from the data file.
name | top-sample | all/founder | Method for calculating the allele frequencies assigned to the pedigree founders for simulation. Two options: all calculates allele frequencies from all genotyped members in the pedigree data file; founder calculates allele frequencies from genotyped founders only. We recommend the all option if there are a large number of pedigrees and the number of genotyped founders in the resource is limited.

List of available statistical programs and their class names

Statistic | Class Name
Chi Squared | ChiSquared
Chi Squared Trend | ChiSquaredTrend
Odds Ratio (no Confidence Intervals) | OddsRatios
Odds Ratio with Confidence Intervals | OddsRatiosWithCI
CMH Chi Squared (meta) | CMHChiSquared
CMH Chi Squared Trend (meta) | CMHChiSqTrend
Meta Odds Ratio (no Confidence Intervals) | MetaOddsRatios
Meta Odds Ratio with Confidence Intervals | MetaOddsRatiosWithCI
Trio TDT | TrioTDT
Sib TDT | SibTDT
Combined TDT | CombTDT
Quantitative (difference in means test and ANOVA) | Quantitative
Hardy Weinberg Equilibrium | HWE
Q Test Odds Ratio Statistic | QTestOR
 

Subset Analyses (Second Part)

The second part of the .rgen parameter file defines the subset analyses and the models to be analyzed. Users may test markers separately (i.e., a single locus at a time approach, where each marker is assumed to be in linkage equilibrium with other markers) or jointly in a composite genotype or haplotype analysis.
The element cctable has a sub-element col, or column definition. Within col, the user can optionally assign a weight through the attribute wt, whose value is a number. col has a sub-element g, or allele group, and g has a sub-element a, or allele definition. Each a defines the genetic pattern to be tested in PedGenie at a single locus and corresponds to a locus defined in the sub-element locus. All of the a's are grouped together into a single g, and the g's are grouped together into a single col, which is optionally weighted with wt. If a col contains more than one group g, an "or" regular expression is applied across the groups when testing that column.
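
The nesting of col, g, and a can be pictured with a small Python model. This is a sketch of the logic only, not the XML syntax and not PedGenie's implementation. The class and function names are hypothetical. Two assumptions are made: alleles are compared position by position with "." accepting any allele, and all a patterns inside one g must match for the group to match. The "or" across groups comes from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def pattern_matches(pattern: str, genotype: str) -> bool:
    """Compare an allele pattern such as '1/.' with a genotype such as '1/2'.

    Alleles are compared position by position; '.' accepts any allele.
    """
    want = pattern.strip("()").split("/")
    have = genotype.strip("()").split("/")
    return len(want) == len(have) and all(w in (".", h) for w, h in zip(want, have))


@dataclass
class Group:
    """Models one g: one allele pattern (a) per locus id."""
    patterns: Dict[int, str]

    def matches(self, genotypes: Dict[int, str]) -> bool:
        # Assumption: every a in the group must match its own locus.
        return all(
            pattern_matches(pattern, genotypes[locus])
            for locus, pattern in self.patterns.items()
        )


@dataclass
class Column:
    """Models one col: a list of groups plus an optional weight (wt)."""
    groups: List[Group]
    wt: Optional[float] = None

    def matches(self, genotypes: Dict[int, str]) -> bool:
        # Several groups in one column are combined with "or".
        return any(group.matches(genotypes) for group in self.groups)


if __name__ == "__main__":
    carriers = Column(
        groups=[Group({1: "1/2"}), Group({1: "2/1"}), Group({1: "2/2"})],
        wt=1,
    )
    print(carriers.matches({1: "2/1"}))  # one of the three groups matches
    print(carriers.matches({1: "1/1"}))  # no group matches
```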
The following table describes the element cctable and its optional attributes and values.

Attribute | Att Value | Description
loci | number(s) | Allows the user to specify the locus, loci, or range of loci for a subset analysis, based on the locus id number. Default is all loci. To specify a range of loci, enter the beginning locus id, followed by a "-" and the ending locus id.
stats | number(s) | Allows the user to define which statistics to run for a particular subset analysis. The stats number is selected from the list of ccstat#'s. Default is all ccstat.
metas | number(s) | Allows the user to define which meta statistics to run for a particular subset analysis. The meta number is selected from the list of metastat#'s. Default is all metastat.
model | text | Allows the user to define a model for a subset analysis. The model name will be printed in the report for that analysis.
type | text | Allows the user to specify the type of analysis, Genotype or Allele, for this subset of data. If the user specifies type="Allele", a single allele code should be entered as the variable for the sub-element a, and each a corresponds to a locus. Default is type="Genotype".
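
How a loci value expands into locus ids can be sketched as follows. The range form "begin-end" and the all-loci default come from the table above. How several loci are separated in one value is not stated here, so the sketch assumes whitespace or commas, and it assumes the range includes both end points. The function name parse_loci is hypothetical.

```python
import re
from typing import List, Optional, Sequence


def parse_loci(spec: Optional[str], all_loci: Sequence[int]) -> List[int]:
    """Expand a loci value into a list of locus id numbers.

    A missing or empty value selects all loci (the documented default).
    Tokens are split on whitespace or commas (an assumption), and a token
    of the form 'begin-end' is expanded to an inclusive range.
    """
    if spec is None or not spec.strip():
        return list(all_loci)
    selected: List[int] = []
    for token in re.split(r"[,\s]+", spec.strip()):
        if not token:
            continue
        if "-" in token:
            begin, end = (int(part) for part in token.split("-", 1))
            selected.extend(range(begin, end + 1))
        else:
            selected.append(int(token))
    return selected


if __name__ == "__main__":
    print(parse_loci("2-4", [1, 2, 3, 4, 5]))
    print(parse_loci(None, [1, 2, 3, 4, 5]))
```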
 

Single locus at a time analysis approach

Various modes of inheritance may be modeled by weighting genotypes in a particular fashion. For a biallelic marker, a dominant (0,1,1), a recessive (0,0,1), and an additive (0,1,2) mode of inheritance may be analyzed by weighting the genotype data as follows:

Model | Wt = 0 | Wt = 1 | Wt = 2
Dominant | (1/1) | (1/2), (2/1), or (2/2) |
Recessive | (1/1), (1/2), or (2/1) | (2/2) |
Additive | (1/1) | (1/2) or (2/1) | (2/2)
The weights may be set to any integer value. For programming purposes, (1/.) indicates a genotype with allele 1 and any other allele. Thus, for this biallelic model, the code (1/.) will pull the (1/1) and (1/2) genotype data. Care must be taken to ensure that this file has no errors. See SingleLocus.rgen for the format of this file.
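
The three weighting schemes for a biallelic marker, and the effect of the (1/.) code, can be written out as a short Python sketch. It only restates the weights given above; the names MODELS, genotype_weight, and pattern_matches are hypothetical, and the position-by-position matching is an assumption inferred from (1/.) pulling (1/1) and (1/2).

```python
# Weight assigned to each genotype under each mode of inheritance.
MODELS = {
    "dominant": {"1/1": 0, "1/2": 1, "2/1": 1, "2/2": 1},
    "recessive": {"1/1": 0, "1/2": 0, "2/1": 0, "2/2": 1},
    "additive": {"1/1": 0, "1/2": 1, "2/1": 1, "2/2": 2},
}


def genotype_weight(model: str, genotype: str) -> int:
    """Look up the weight of a genotype such as '1/2' under a named model."""
    return MODELS[model][genotype]


def pattern_matches(pattern: str, genotype: str) -> bool:
    """Match position by position; '.' stands for any allele (assumption)."""
    want, have = pattern.split("/"), genotype.split("/")
    return len(want) == len(have) and all(w in (".", h) for w, h in zip(want, have))


if __name__ == "__main__":
    genotypes = ["1/1", "1/2", "2/1", "2/2"]
    for model in MODELS:
        print(model, [genotype_weight(model, g) for g in genotypes])
    # Genotypes pulled by the code (1/.) at a biallelic marker.
    print([g for g in genotypes if pattern_matches("1/.", g)])
```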
 
Multiallelic Markers: The XML code for this file is flexible enough to allow any combination or grouping of markers. For multiallelic markers, weights for a particular genotype are again used to indicate which group is the reference group and which is the comparison group. For example, given a multiallelic locus (alleles 1, 2, and 3), a single allele (allele 3) may be compared against all other alleles under a dominant mode of inheritance as follows:

Model | Wt = 0 | Wt = 1
Dominant | (1/1), (1/2), (2/1), or (2/2) | (3/.), (./3)
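
As a check on this grouping, the sketch below lists every ordered genotype at a three-allele locus and splits the list using the two comparison patterns (3/.) and (./3). The genotypes left over are exactly the four in the Wt = 0 column. The names are hypothetical, and the position-by-position matching with "." as a wildcard is an assumption about how the patterns are read.

```python
from itertools import product

ALLELES = ("1", "2", "3")
COMPARISON_PATTERNS = ("3/.", "./3")  # the Wt = 1 column


def pattern_matches(pattern: str, genotype: str) -> bool:
    """Match position by position; '.' stands for any allele (assumption)."""
    want, have = pattern.split("/"), genotype.split("/")
    return len(want) == len(have) and all(w in (".", h) for w, h in zip(want, have))


def dominant_weight(genotype: str) -> int:
    """Weight 1 if the genotype carries allele 3 in either position, else 0."""
    return 1 if any(pattern_matches(p, genotype) for p in COMPARISON_PATTERNS) else 0


# All ordered genotypes: 1/1, 1/2, 1/3, 2/1, ...
GENOTYPES = ["/".join(pair) for pair in product(ALLELES, repeat=2)]
REFERENCE = [g for g in GENOTYPES if dominant_weight(g) == 0]
COMPARISON = [g for g in GENOTYPES if dominant_weight(g) == 1]

if __name__ == "__main__":
    print("Wt = 0:", REFERENCE)
    print("Wt = 1:", COMPARISON)
```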
 

Composite Genotype and Haplotype Analyses

Analysis of composite genotype and haplotype data is similar to the single locus at a time approach, with a few exceptions. For both composite genotype and haplotype tests, haplotypes are dropped from the founders rather than alleles. The method HapFreqTopSim is entered as the Mendelian gene drop method (see top above). The haplotype frequencies are entered into PedGenie as a separate file: PedGenie looks for a file in the same directory as the pedigree file, with the same name as the pedigree file but with the extension .hap instead of .dat. Good estimates of haplotype frequencies are therefore recommended for both the composite genotype and haplotype analyses.
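
The .hap naming rule can be illustrated with Python's pathlib. This is a sketch for checking that the expected file exists before a run, not part of PedGenie; the function name and the example path are hypothetical.

```python
from pathlib import Path


def expected_hap_path(pedigree_file: str) -> Path:
    """Return the path where the haplotype frequency file is expected.

    Same directory and base name as the pedigree file, with the
    extension changed from .dat to .hap.
    """
    return Path(pedigree_file).with_suffix(".hap")


if __name__ == "__main__":
    hap = expected_hap_path("data/study1.dat")  # hypothetical pedigree file
    print(hap, "exists" if hap.exists() else "is missing")
```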
For composite genotype and haplotype analyses, linkage disequilibrium between markers should be taken into account. Under the sub-element locus, dist values indicating LD (i.e., <0.5) should be listed.
 
Composite Genotype: Composite genotype tests allow a user to enter multiple inheritance models for multiple loci. For example, one can test a model that requires a dominant inheritance at one SNP locus (i.e., 1/2, 2/1 or 2/2 vs. 1/1) and a recessive mode of inheritance at another locus (i.e., 2/2 vs. 1/2, 2/1 or 1/1). Weights are again used to indicate the groupings. See the PedGenieCompGenotype.rgen for examples of composite genotype tests. The various statistical tests that can be performed by PedGenie may be selected as desired to analyze the results. The advantage of using a composite genotype test is that phase information for the observed data is not required as individual genotypes are being compared rather than haplotypes. However, haplotypes are dropped from pedigree founders and LD between the markers is taken into account for the simulated data. Thus haplotype information is utilized for creation of the empirical null distribution, but statistical comparisons are made using unphased genotype data.
 
Haplotype Tests: For haplotype tests, phase information is required for the observed data, and haplotypes are dropped from the pedigree founders to create the empirical null distribution. Thus, it is essential that phased genotype data or haplotypes can be assigned to pedigree members with high probability in the observed data. Again, LD between markers is taken into account by setting dist ≤0.5. For testing purposes, a single haplotype may be compared to all other haplotypes or to the most common haplotype. See PedGenieHaplotype.rgen for examples of haplotype tests.
