popabc

PopABC is a program for inferring historical demographic parameters.

Introduction

PopABC is a software package for estimating historical demographic parameters of closely related species or populations (e.g. population size, migration rate, mutation rate, recombination rate, splitting events) within an Isolation-with-Migration model. The software performs coalescent simulations in the framework of approximate Bayesian computation (ABC; Beaumont et al., 2002). PopABC can also be used to perform Bayesian model choice to discriminate between different demographic scenarios. The program can be used for research or for teaching.

MODELS

Statistical model

Although there are a number of different flavours of ABC, PopABC follows the standard rejection/regression approach, which has been frequently tested against full-likelihood alternatives. The base algorithm (Pritchard et al., 1999) can be summarized as:

1. Sample parameters, P, from the priors: P_i ~ p(P);
2. Simulate data, D, given P_i: D_i ~ p(D|P_i);
3. Summarize D_i with a set of chosen summary statistics to obtain S_i; repeat from step 1 until N sample points from the joint distribution p(S,P) have been created;
4. Accept the points whose S_i is within a distance dist from s’, the real data summarized by the same set of summary statistics: |S_i – s’| < dist.
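The rejection steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PopABC's implementation: the simulator is a toy Gaussian stand-in for the coalescent simulation, the distance is Euclidean, and the names abc_rejection, sample_prior, simulate and summarize are hypothetical. The regression adjustment that follows rejection is not shown.

```python
import random

def abc_rejection(observed_stats, sample_prior, simulate, summarize,
                  n_sims, tolerance, rng):
    """Keep the prior draws whose simulated summaries lie within
    `tolerance` (Euclidean distance) of the observed summaries."""
    accepted = []
    for _ in range(n_sims):
        params = sample_prior(rng)      # step 1: P_i ~ p(P)
        data = simulate(params, rng)    # step 2: D_i ~ p(D | P_i)
        stats = summarize(data)         # step 3: S_i
        distance = sum((s - o) ** 2
                       for s, o in zip(stats, observed_stats)) ** 0.5
        if distance < tolerance:        # step 4: |S_i - s'| < dist
            accepted.append(params)
    return accepted

# Toy stand-ins (assumptions, not a coalescent model):
def sample_prior(rng):
    return rng.uniform(0.0, 10.0)       # flat prior on one parameter

def simulate(mu, rng):
    return [rng.gauss(mu, 1.0) for _ in range(50)]

def summarize(data):
    return (sum(data) / len(data),)     # a single summary statistic

rng = random.Random(1)
observed = (4.0,)                       # s', the summarized "real" data
accepted = abc_rejection(observed, sample_prior, simulate, summarize,
                         n_sims=5000, tolerance=0.2, rng=rng)
posterior_mean = sum(accepted) / len(accepted)
```

The accepted values approximate a sample from the posterior distribution of the parameter. A smaller tolerance gives a closer approximation at the cost of fewer accepted points.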

Population models

PopABC is based on the Isolation-with-Migration model (Nielsen and Wakeley, 2001), and encompasses population vicariance without migration, and also an equilibrium migration model with some choice of migration matrix. In principle any number of populations, which may comprise related species, can be analysed, but in practice this is limited by the number of summary statistics that need to be calculated.

Evolutionary models

Two different mutation models can be used according to the DNA data type considered: the Infinite Sites model (Kimura, 1969) for DNA sequences, and the Stepwise Mutation Model (Kimura and Ohta, 1978) for microsatellites. The microsatellite loci can be assumed to be either segregating independently or linked. A recombination rate can be assumed between the linked microsatellites. The DNA sequence loci are assumed to be segregating independently, but recombination events within a locus can be modelled. Mitochondrial DNA, nuclear DNA, and Y- or X-linked data can be used separately or jointly.
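The difference between the two mutation models can be illustrated with a minimal sketch. It assumes the strict, symmetric form of the Stepwise Mutation Model (each mutation adds or removes exactly one repeat unit with equal probability, with no lower bound enforced) and represents an Infinite Sites haplotype as the set of sites that have mutated. The function names and data representation are hypothetical and do not reflect PopABC's internals.

```python
import random

def smm_mutate(repeat_count, n_mutations, rng):
    """Stepwise Mutation Model: each mutation changes the
    microsatellite repeat count by +1 or -1."""
    for _ in range(n_mutations):
        repeat_count += rng.choice((-1, 1))
    return repeat_count

def ism_mutate(mutated_sites, n_mutations, next_site_id):
    """Infinite Sites model: every mutation falls on a site that
    has never mutated before, so it always gets a fresh site id."""
    new_sites = frozenset(range(next_site_id, next_site_id + n_mutations))
    return mutated_sites | new_sites, next_site_id + n_mutations

rng = random.Random(7)
allele = smm_mutate(20, 5, rng)                     # microsatellite allele
haplotype, next_id = ism_mutate(frozenset(), 3, 0)  # DNA sequence lineage
haplotype, next_id = ism_mutate(haplotype, 2, next_id)
```

Under the stepwise model two alleles of the same length need not be identical by descent, whereas under infinite sites every mutation leaves a distinct segregating site.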

Parameters

The program allows for the estimation of demographic parameters (effective population size, time of splitting events between sister populations, migration rates, and topology of population trees) and genetic parameters (mutation and recombination rates).

Model choice

It is possible to estimate the Bayesian posterior probability of competing models (i.e. models with or without migration, models with or without recombination, and models with different population-tree topologies). Note that Bayesian model choice can be highly dependent on the priors chosen for the parameters in the respective models.
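One simple way to approximate model posterior probabilities in a rejection framework is to count how often each model appears among the accepted simulations. The sketch below shows that estimator only as an illustration: it assumes the models were simulated in proportion to their prior probabilities, the function name and model labels are hypothetical, and the accepted counts are made up for the example. Whether PopABC uses this estimator or a regression-based refinement is not stated here.

```python
from collections import Counter

def model_posteriors(accepted_labels, model_names):
    """Estimate each model's posterior probability as its share of
    the accepted simulations (models simulated according to the prior)."""
    counts = Counter(accepted_labels)
    total = sum(counts[name] for name in model_names)
    if total == 0:
        raise ValueError("no accepted simulations for the listed models")
    return {name: counts[name] / total for name in model_names}

# Hypothetical outcome of a rejection step: the model label of each
# accepted simulation.
accepted_labels = ["migration"] * 30 + ["no_migration"] * 10
posteriors = model_posteriors(accepted_labels, ["migration", "no_migration"])
```

Because the accepted proportions depend on how much prior mass each model places near the observed data, wide or poorly chosen parameter priors can shift these probabilities substantially.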

Summary Statistics

We chose summary statistics that have had extensive use in population genetic inference. For microsatellite data: the heterozygosity; variance and kurtosis of allele length; the number of different alleles; an index of gene diversity; and an FST-based estimator of the number of migrants. For DNA sequences: the mean pairwise difference; the number of segregating sites; the number and frequency of private segregating sites; the number of different haplotypes; an index of haplotype diversity; the mean and standard deviation of the mutation frequency spectrum; and an FST-based estimator of the number of migrants. Summary statistics are computed for each population, and then for all pairs of populations combined.
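A few of these statistics are easy to compute directly, as in the sketch below. The formulas are the common textbook definitions (unbiased expected heterozygosity, sample variance of allele length), which is an assumption: the exact estimators used by PopABC are not given here. The function names and the small example samples are hypothetical.

```python
from collections import Counter
from itertools import combinations

def segregating_sites(seqs):
    """Number of aligned positions at which the sequences differ."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def mean_pairwise_difference(seqs):
    """Average number of differing positions over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs)

def expected_heterozygosity(alleles):
    """Unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(alleles)
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))

def allele_length_variance(alleles):
    """Sample variance of microsatellite allele lengths."""
    n = len(alleles)
    mean = sum(alleles) / n
    return sum((a - mean) ** 2 for a in alleles) / (n - 1)

seqs = ["ACGT", "ACGA", "TCGA"]   # aligned DNA sequences, one population
alleles = [10, 10, 12, 14]        # repeat counts at one microsatellite locus

n_seg = segregating_sites(seqs)            # 2 (first and last columns vary)
pi = mean_pairwise_difference(seqs)
n_alleles = len(set(alleles))              # 3 distinct alleles
het = expected_heterozygosity(alleles)
var_len = allele_length_variance(alleles)
```

To obtain the pairwise statistics described above, the same functions would be applied to the pooled sample of two populations.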

ASSUMPTIONS

When the Isolation-with-Migration model is assumed, the following conditions should hold (Nielsen and Wakeley, 2001): there should not be other populations that are more closely related to the sampled populations than they are to each other; and there should not be unsampled populations exchanging genes with the studied populations or their ancestors. Other assumptions come from the coalescent method employed: the variation within the DNA data has to be neutral; free recombination between DNA sequence loci is assumed; and mutations follow the mutation model considered.

REFERENCES

Beaumont, M.A., et al. (2002) Approximate Bayesian computation in population genetics, Genetics, 162, 2025–2035.

Kimura, M. (1969) The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations, Genetics, 61, 893–903.

Kimura, M. and Ohta, T. (1978) Stepwise mutation model and distribution of allelic frequencies in a finite population, Proc. Natl. Acad. Sci. U. S. A., 75, 2868–2872.

Nielsen, R. and Wakeley, J. (2001) Distinguishing migration from isolation: a Markov chain Monte Carlo approach, Genetics, 158, 885–896.

Pritchard, J.K., et al. (1999) Population growth of human Y chromosomes: a study of Y chromosome microsatellites, Mol. Biol. Evol., 16, 1791–1798.

Project Information

License: GNU GPL v3
git-based source control

Labels:
Populationgenetics Phylogeography Isolationwithmigration ApproximateBayesianComputation ABC