
treemix
This repository has moved to BitBucket and is no longer being maintained here
TreeMix is a method for inferring the patterns of population splits and mixtures in the history of a set of populations. In the underlying model, the modern-day populations in a species are related to a common ancestor via a graph of ancestral populations. We use the allele frequencies in the modern populations to infer the structure of this graph.
The details of the TreeMix model are presented in:
Pickrell JK and Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. (http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002967)
Some extensions are presented in:
Pickrell JK, Patterson N, Barbieri C, Berthold F, Gerlach L, Güldemann T, Kure B, Mpoloka SW, Nakagawa H, Naumann C, Lipson M, Loh PR, Lachance J, Mountain J, Bustamante CD, Berger B, Tishkoff SA, Henn BM, Stoneking M, Reich D, Pakendorf B. The genetic prehistory of southern Africa. (http://www.ncbi.nlm.nih.gov/pubmed/23072811)
We describe an application of this model to the search for natural selection in humans and dogs at Genomes Unzipped (http://www.genomesunzipped.org/2012/03/identifying-targets-of-natural-selection-in-human-and-dog-evolution.php).
What's new:
6/5/13:
TreeMix 1.12 released.
- Fixes a bug that caused the reported relative likelihoods to be incomparable between trees and graphs. Many thanks to Mait Metspalu and Mike DeGiorgio for working through this.
- Also adds a -seed option for setting the random seed from the command line
11/20/12:
The TreeMix paper has been published in PLoS Genetics:
Pickrell JK and Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. (http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002967)
10/22/12:
Release of version 1.11.
- Fixes a bug that sometimes caused crashes when using microsatellite data
10/1/12:
Release of version 1.1.
- Allows input of microsatellite data. For a description of the microsatellite model, see here (pdf)
- Allows incorporation of known migration events
- Small other bug fixes
7/25/12:
Preprint: "The genetic prehistory of southern Africa" (http://arxiv.org/abs/1207.5552) is available on arXiv. The new features in TreeMix described in this preprint will be available in the next release (estimated Sept. 2012).
5/24/12:
Release of version 1.04.
- Forces migration edges to have weight less than 0.5
- Includes the three- and four-population tests for treeness from Reich et al. 2009 (the programs are called threepop and fourpop, respectively)
Both threepop and fourpop take standard TreeMix input. For example, run:
>threepop -i input.gz -k 500
This prints f3 statistics for all populations to stdout and calculates standard errors in blocks of 500 SNPs. For example, running this on the test input files produces output like:
Estimating f_3 in 59 blocks of size 500
total_nsnp 29999 nsnp 29999
Dai;Han,Sardinian 0.00112445 0.000276542 4.06609
Han;Sardinian,Dai 0.000536062 0.000211323 2.53669
Sardinian;Han,Dai 0.0289054 0.000867602 33.3165
The line Sardinian;Han,Dai 0.0289054 0.000867602 33.3165
tells you that f3(Sardinian;Han,Dai) is ~0.03, with a standard error of 0.0009, which corresponds to a z-score of 33. For information on how to interpret these tests, see Reich et al. (2009).
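As a quick sanity check on the columns, the z-score can be recomputed as the f3 estimate divided by its standard error. The Python sketch below parses the three example lines shown above; the function name `parse_f3_line` and the dictionary keys are illustrative choices, not part of TreeMix, and the column layout (label, f3, standard error, z-score) is assumed from the example output.

```python
def parse_f3_line(line):
    """Split one threepop result line of the form 'A;B,C f3 stderr z'."""
    label, f3, se, z = line.split()
    target, sources = label.split(";")
    source1, source2 = sources.split(",")
    return {
        "target": target,
        "sources": (source1, source2),
        "f3": float(f3),
        "se": float(se),
        "z": float(z),
    }


# The three result lines from the example output above.
lines = [
    "Dai;Han,Sardinian 0.00112445 0.000276542 4.06609",
    "Han;Sardinian,Dai 0.000536062 0.000211323 2.53669",
    "Sardinian;Han,Dai 0.0289054 0.000867602 33.3165",
]

records = [parse_f3_line(line) for line in lines]

for rec in records:
    # z-score = estimate / standard error; compare with the reported value.
    recomputed = rec["f3"] / rec["se"]
    print(rec["target"], rec["sources"], round(recomputed, 3), rec["z"])
```

The recomputed values agree with the reported z-scores up to rounding of the printed digits.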
3/12/12:
Added a small script to convert stratified allele frequencies output from plink into TreeMix format. This will be incorporated into the next release, but for the moment must be downloaded separately. To run this, let's say you have data in plink format (e.g., data.bed, data.bim, data.fam) and a plink cluster file matching each individual to a population (data.clust).
Now you run:
>plink --bfile data --freq --missing --within data.clust
>gzip plink.frq.strat
>plink2treemix.py plink.frq.strat.gz treemix.frq.gz
The file treemix.frq.gz can now be used as input for TreeMix.
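To illustrate what this conversion involves, here is a minimal Python sketch. It is not the plink2treemix.py script itself. It assumes that plink's stratified frequency table has the columns CHR, SNP, CLST, A1, A2, MAF, MAC and NCHROBS, and that TreeMix input consists of a header line of population names followed by one line per SNP giving comma-separated allele counts for each population. The function name `strat_to_treemix` and the example rows are made up for illustration.

```python
def strat_to_treemix(lines):
    """Turn rows of a stratified frequency table into TreeMix-style lines.

    Assumed input columns: CHR SNP CLST A1 A2 MAF MAC NCHROBS.
    Assumed output: a header of population names, then one line per SNP
    with 'count1,count2' for each population.
    """
    rows = iter(lines)
    header = next(rows).split()
    col = {name: i for i, name in enumerate(header)}

    pops = []      # populations in order of first appearance
    counts = {}    # SNP id -> {population: (allele 1 count, allele 2 count)}
    for line in rows:
        fields = line.split()
        if not fields:
            continue
        snp = fields[col["SNP"]]
        pop = fields[col["CLST"]]
        mac = int(fields[col["MAC"]])
        nobs = int(fields[col["NCHROBS"]])
        if pop not in pops:
            pops.append(pop)
        counts.setdefault(snp, {})[pop] = (mac, nobs - mac)

    out = [" ".join(pops)]
    for by_pop in counts.values():
        # A population with no row for this SNP is written as 0,0 (missing).
        out.append(" ".join("%d,%d" % by_pop.get(pop, (0, 0)) for pop in pops))
    return out


# Made-up rows for two SNPs in two populations.
example = [
    "CHR SNP CLST A1 A2 MAF MAC NCHROBS",
    "1 rs1 Han A G 0.25 2 8",
    "1 rs1 Dai A G 0.5 3 6",
    "1 rs2 Han C T 0.0 0 8",
    "1 rs2 Dai C T 0.1667 1 6",
]

treemix_lines = strat_to_treemix(example)
print("\n".join(treemix_lines))
```

For real data the distributed script should be used. This sketch keeps everything in memory and does no validation of the input.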
Version 1.0.3:
- small bug fixes
Version 1.0.2:
- removed an unnecessary header that sometimes caused compilation problems
- small bug fixes
Version 1.0.1:
Project Information
The project was created on Sep 14, 2011.
- License: GNU GPL v3
- svn-based source control