Software


Notes

My programs can all be downloaded in a single jar file called genepi.jar. If you download and add this jar file to your CLASSPATH, you should be able to run all the programs.

Note that I use the LINKAGE data format for most input files and for some output files.

New additions (27/10/08) are the programs for performing shared genomic segment analysis, or IBD mapping. These include the ability to evaluate statistical significance using models that account for linkage disequilibrium. These models are estimated from control data using IntervalLD. Through various model restrictions, this program now works in time and storage linear in the number of markers. I've used it to estimate LD models from sets of 60 unrelated individuals genotyped under the HapMap project for over 200000 loci.

There are also several programs for general data handling that I've added recently.

Previous additions were McLink and McLinkLD (25/08/06). McLink implements an MCMC scheme for linkage analysis with an option to assume a specified model for linkage disequilibrium between the markers. McLinkLD samples from the joint distribution of LD models and linkage lod scores given the observed pedigree and genotypes. These programs are provided for users to experiment with, and results from them should be interpreted with caution. It is not clear under what conditions, if any, the MCMC methods implemented by these programs result in good mixing properties and reliable results.

The top level programs, which can all be run by typing something like

    % java ClassName input1 input2 ...

are described fully in their class descriptions in the "Unnamed" package of the Javadocs web pages.

All these programs are written in Java version 1.5, so you need an appropriate Java virtual machine to run them.

Note that several of the programs are computationally demanding and may take considerable time to run. If they throw an error indicating that there was insufficient memory, increase the initial and maximum heap size with the -Xms and -Xmx options to java.
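
To confirm that the -Xms and -Xmx settings have taken effect, a small class like the following can report the heap limits the virtual machine is actually using. This is only a sketch: the class name HeapCheck is hypothetical and is not part of genepi.jar.

```java
public class HeapCheck {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the upper limit the heap may grow to (controlled by -Xmx).
        System.out.println("Maximum heap (MiB): " + toMiB(rt.maxMemory()));
        // totalMemory() is the heap currently reserved (it starts near the -Xms value).
        System.out.println("Current heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory() is the unused part of the currently reserved heap.
        System.out.println("Free in current heap (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

It can be run with whatever sizes you intend to use for the real programs, for example

    % java -Xms512m -Xmx2g HeapCheck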


Useful links





Programs for shared genomic segment analysis
  • SGS

    finds regions of heterozygous sharing in sets of genotyped individuals. For mapping using identity by descent methods in pedigrees and founder populations.

  • HGS

    finds regions at which sets of individuals are homozygous. Potentially useful for identifying deletions.

  • SimSGS

    a program for simulating data to match that observed using SGS in order to assess statistical significance. Allows modelling of linkage disequilibrium using graphical models estimated by IntervalLD.

  • SimHGS

    a program for simulating data to match that observed using HGS in order to assess statistical significance. Allows modelling of linkage disequilibrium using graphical models estimated by IntervalLD.

  • MakeProbands

    a program that specifies which individuals in a pedigree to consider as probands. It is sharing between these individuals that is considered by SGS and HGS.

  • IntervalLD

    a variant of the HapGraph program that restricts the models allowed to a specific subset of graphical models with conditional independence graphs that are interval graphs. This program scales linearly with the number of loci and can be used with over 100000 loci. The output can be used by SimSGS, SimHGS, GeneDrop, and GeneDrops.

  • GeneDrop

    a program to simulate a single instance of genetic data to match that seen in a pedigree. This uses multi locus gene drop under linkage equilibrium, or under linkage disequilibrium using models estimated by IntervalLD.

  • GeneDrops

    a program to simulate multiple instances of genetic data to match that seen in a pedigree. This uses multi locus gene drop under linkage equilibrium, or under linkage disequilibrium using models estimated by IntervalLD.
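
The gene drop idea used by GeneDrop and GeneDrops can be illustrated with a minimal single locus sketch. This is not the code in genepi.jar. The class name GeneDropSketch, the triplet layout, and the allele frequencies are all assumptions made for illustration, and the sketch ignores linkage between loci and linkage disequilibrium entirely. Founders draw two alleles from the population frequencies, and each child receives one randomly chosen allele from each parent.

```java
import java.util.Random;

public class GeneDropSketch {
    /** Samples a 1-based allele number according to the given allele frequencies. */
    static int sampleAllele(double[] freqs, Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int a = 0; a < freqs.length; a++) {
            cumulative += freqs[a];
            if (u < cumulative) {
                return a + 1;
            }
        }
        return freqs.length; // guards against rounding when the frequencies sum to just under 1
    }

    /**
     * Drops genes through a pedigree at one locus.
     * Each triplet is {individual, father, mother}; founders have 0 for both parents.
     * Assumes individuals are numbered 1..n and parents are listed before their children.
     * Returns genotypes[id] = {paternal allele, maternal allele}; index 0 is unused.
     */
    static int[][] drop(int[][] triplets, double[] freqs, Random rng) {
        int[][] genotypes = new int[triplets.length + 1][2];
        for (int[] t : triplets) {
            int id = t[0], father = t[1], mother = t[2];
            if (father == 0 && mother == 0) {
                // Founder: both alleles come from the population frequencies.
                genotypes[id][0] = sampleAllele(freqs, rng);
                genotypes[id][1] = sampleAllele(freqs, rng);
            } else {
                // Non-founder: Mendelian transmission of one allele from each parent.
                genotypes[id][0] = genotypes[father][rng.nextInt(2)];
                genotypes[id][1] = genotypes[mother][rng.nextInt(2)];
            }
        }
        return genotypes;
    }

    public static void main(String[] args) {
        // Hypothetical nuclear family: two founders and two children.
        int[][] triplets = { {1, 0, 0}, {2, 0, 0}, {3, 1, 2}, {4, 1, 2} };
        double[] freqs = { 0.7, 0.3 };
        int[][] g = drop(triplets, freqs, new Random());
        for (int id = 1; id < g.length; id++) {
            System.out.println(id + " " + g[id][0] + " " + g[id][1]);
        }
    }
}
```

Repeating the call to drop gives the multiple simulated instances that a significance assessment would need.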


General pedigree analysis utilities
  • CheckFormat

    a program that checks the format of LINKAGE parameter and pedigree input files.

  • CheckParameters

    a program that checks the format of LINKAGE parameter files. Basically the first half of CheckFormat.

  • cMorgansToTheta

    a program that converts interlocus genetic distances from centiMorgans to recombination fractions.

  • CheckPedigree

    a program that checks the format of LINKAGE pedigree files. Basically the second half of CheckFormat.

  • CheckTriplets

    a program that checks a list of individual, father, mother triplets for consistency with the usual pedigree rules.

  • CheckErrors (previously called GMCheck)

    a program that uses graphical modelling or Bayesian network methods to calculate the posterior probability of genotype or phenotype errors in pedigrees.

  • ObligateErrors

    like CheckErrors but only reports obligate errors, not likely ones. Needs less space to run than CheckErrors.

  • DownCodeAlleles

    a program that removes alleles unobserved in genotype data from the specified model for the locus.

  • GeneCountAlleles

    a program that implements gene counting, or the EM algorithm, to obtain maximum likelihood estimates for allele frequencies from genotypes of related individuals.

  • SelectLoci

    a program for selecting subsets of loci from LINKAGE input files.

  • GetPolymorphisms

    a program for selecting subsets of loci from LINKAGE input files that removes loci for which only 1 allele is seen in the data.

  • HetCutOff

    a program that selects subsets of loci for which the heterozygosity score is higher than a specified threshold.

  • Heterozygosities

    a program that computes and reports the heterozygosity scores for the loci in a LINKAGE parameter file.

  • SelectKindreds

    a program for selecting subsets of kindreds from LINKAGE input files. You can probably do the same thing with a grep command.

  • TrimPed

    a program to remove individuals from a pedigree if they have insufficient observed data.
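
As an illustration of the kind of conversion cMorgansToTheta performs, the sketch below turns a distance in centiMorgans into a recombination fraction. The description above does not say which map function the program uses, so both Haldane's and Kosambi's functions are shown here purely as assumptions. The class and method names are hypothetical.

```java
public class MapFunctionSketch {
    /** Haldane's map function: theta = (1 - exp(-2d)) / 2, with d in Morgans. */
    static double haldaneTheta(double centiMorgans) {
        double morgans = centiMorgans / 100.0;
        return 0.5 * (1.0 - Math.exp(-2.0 * morgans));
    }

    /** Kosambi's map function: theta = tanh(2d) / 2, with d in Morgans. */
    static double kosambiTheta(double centiMorgans) {
        double morgans = centiMorgans / 100.0;
        return 0.5 * Math.tanh(2.0 * morgans);
    }

    public static void main(String[] args) {
        double[] distances = { 1.0, 10.0, 50.0 };
        for (double cM : distances) {
            System.out.println(cM + " cM: Haldane " + haldaneTheta(cM)
                    + ", Kosambi " + kosambiTheta(cM));
        }
    }
}
```

Under both functions the recombination fraction is close to the distance in Morgans for small distances and approaches 0.5 as the distance grows.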


Linkage analysis programs
  • OnePoints

    a program for calculating the one point lod score for a locus. That is, just the likelihood of the data at the locus given the specified locus parameters.

  • TwoPointLods

    a program for calculating simple two point lod scores on a grid of values for the recombination parameter.

  • MaxTwoPointLods

    a program for finding the maximum lod score. Note that the search includes values of the recombination fraction between 0.5 and 1.

  • McLink

    a program for calculating multi locus linkage statistics in extended pedigrees using Markov chain Monte Carlo integration. There is an option to run assuming linkage disequilibrium between the markers, which can be specified as a model output from HapGraph. As this is a Markov chain Monte Carlo implementation with unknown mixing properties, it may not give reliable results in all cases. This program is provided primarily for those who want to experiment with MCMC pedigree analysis.

  • McLinkLD

    a program that combines McLink and HapGraph. This iteratively updates inheritance states in a pedigree and the graphical model for linkage disequilibrium, giving, in effect, linkage statistics model averaged over estimated linkage disequilibrium models. This is very computationally intensive. If you can estimate a linkage disequilibrium model using HapGraph and input it to McLink, that is probably a more tractable solution. As this is a Markov chain Monte Carlo implementation with unknown mixing properties, it may not give reliable results in all cases. This program is provided primarily for those who want to experiment with MCMC pedigree analysis.
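
To make the idea of a two point lod score on a grid concrete, here is a minimal sketch for the simplest textbook case: phase-known data in which r of n informative meioses are recombinant, so that lod(theta) = log10 of theta^r (1 - theta)^(n - r) / 0.5^n. This is not how TwoPointLods works on general pedigrees, where a full pedigree likelihood is needed. The class name and the counts are hypothetical.

```java
public class TwoPointLodSketch {
    /** Returns count * log10(p), treating 0 * log10(0) as 0. */
    static double term(int count, double p) {
        return count == 0 ? 0.0 : count * Math.log10(p);
    }

    /** Lod score at a given theta for phase-known meioses with the given recombinant count. */
    static double lod(int recombinants, int meioses, double theta) {
        int nonRecombinants = meioses - recombinants;
        return term(recombinants, theta)
                + term(nonRecombinants, 1.0 - theta)
                - meioses * Math.log10(0.5);
    }

    public static void main(String[] args) {
        int meioses = 10;      // hypothetical number of informative meioses
        int recombinants = 1;  // hypothetical number of recombinants among them
        // Grid of theta values from 0 to 0.5 in steps of 0.05.
        for (int i = 0; i <= 10; i++) {
            double theta = i * 0.05;
            System.out.println("theta = " + theta + "  lod = "
                    + lod(recombinants, meioses, theta));
        }
    }
}
```

In this simple case the lod score is largest at theta = r / n and is zero at theta = 0.5.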


Haplotyping programs
  • HapGraph

    a program for fitting a graphical model for linkage disequilibrium to haplotype data, and a general graphical model fitting program. HapGraph now estimates graphical models from genotype data. It also estimates haplotype frequencies and reconstructs phase.

  • GCHap

    a program for calculating maximum likelihood estimates of haplotype frequencies from a sample of genotyped individuals. This uses a staged gene counting, or EM, method starting with a small number of loci and adding one at each stage.

  • ApproxGCHap

    a program for calculating rough maximum likelihood estimates of haplotype frequencies from a sample of genotyped individuals. It is the same as GCHap except that to save time and space, haplotypes with low frequency are eliminated at each stage.

  • LinkageToPhase

    a program to convert LINKAGE formatted data into files suitable for inputting into the PHASE programs.

  • LinkageToFastPhase

    a program to convert LINKAGE formatted data into files suitable for inputting into the FASTPHASE programs.
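
The gene counting, or EM, method mentioned for GCHap can be sketched for the smallest non-trivial case: two loci with two alleles each, where the only phase-ambiguous genotype is the double heterozygote. This is not the staged algorithm that GCHap implements, and the class name, data layout, and starting values are assumptions made for illustration. The sketch also assumes a non-empty sample.

```java
public class HaplotypeEmSketch {
    /**
     * genotypeCounts[a][b] is the number of individuals with a copies of allele 2 at
     * locus A and b copies of allele 2 at locus B (a and b in 0..2).
     * Returns estimated frequencies of the four haplotypes, indexed 2*i + j, where i and j
     * are 1 if the haplotype carries allele 2 at locus A and locus B respectively.
     */
    static double[] estimate(int[][] genotypeCounts, int iterations) {
        double[] fixed = new double[4]; // haplotype counts from phase-unambiguous genotypes
        int total = 0;
        for (int a = 0; a <= 2; a++) {
            for (int b = 0; b <= 2; b++) {
                int n = genotypeCounts[a][b];
                total += n;
                if (a == 1 && b == 1) {
                    continue; // double heterozygotes are split in the E step below
                }
                // With at most one heterozygous locus the two haplotypes are determined.
                int a1 = (a + 1) / 2, a2 = a / 2;
                int b1 = (b + 1) / 2, b2 = b / 2;
                fixed[2 * a1 + b1] += n;
                fixed[2 * a2 + b2] += n;
            }
        }
        int doubleHets = genotypeCounts[1][1];
        double[] p = { 0.25, 0.25, 0.25, 0.25 };
        for (int it = 0; it < iterations; it++) {
            // E step: probability that a double heterozygote carries haplotypes 11 and 22
            // (indices 0 and 3) rather than 12 and 21 (indices 1 and 2).
            double cis = p[0] * p[3];
            double trans = p[1] * p[2];
            double w = (cis + trans > 0.0) ? cis / (cis + trans) : 0.5;
            double[] counts = fixed.clone();
            counts[0] += doubleHets * w;
            counts[3] += doubleHets * w;
            counts[1] += doubleHets * (1.0 - w);
            counts[2] += doubleHets * (1.0 - w);
            // M step: gene counting, each individual contributes two haplotypes.
            for (int h = 0; h < 4; h++) {
                p[h] = counts[h] / (2.0 * total);
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical sample of 20 individuals.
        int[][] counts = { { 5, 0, 0 }, { 0, 10, 0 }, { 0, 0, 5 } };
        double[] p = estimate(counts, 50);
        for (int h = 0; h < 4; h++) {
            System.out.println("haplotype " + h + ": " + p[h]);
        }
    }
}
```

With more loci the number of possible haplotypes grows quickly, which is the motivation for adding loci in stages and, in ApproxGCHap, for discarding low frequency haplotypes at each stage.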


Viewing programs
  • ViewGraph

    a general program for viewing and editing graphs.

  • ViewPed

    a program for viewing pedigrees when the input is in the form of a standard triplet file.

  • ViewLinkPed

    a program for viewing pedigrees when the input is in the form of a LINKAGE pedigree file.