Inferring past demographic parameters from current genetic polymorphism is a fundamental problem in population genetics. Standard techniques rely on reconstructing the gene genealogy, a cumbersome process that can be applied only to small numbers of sequences. We present a method that instead compares the total number of haplotypes (distinct sequences) with the model prediction. By chopping the DNA sequence into pieces, we condense the immense information hidden in sequence space into a single function: the number of haplotypes versus subsequence size. The details of this curve are robust to statistical fluctuations and are seen to reflect the process parameters. The procedure allows a clear visualization of the quality of the fit and, crucially, its numerical complexity grows only linearly with the number of sequences. We test the procedure against both simulated data and empirical mtDNA data from China, and it provides excellent fits in both cases.
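As a rough illustration of the haplotype-versus-subsequence-size curve described above, the following minimal Python sketch counts distinct subsequences in a set of aligned sequences. It is not the authors' implementation: the function name `haplotype_curve`, the use of non-overlapping contiguous windows, and the averaging over windows are all assumptions made for illustration, since the abstract does not specify how the sequence is chopped. The comparison with a model prediction is not shown.

```python
from typing import Dict, List, Sequence


def haplotype_curve(sequences: Sequence[str], sizes: Sequence[int]) -> Dict[int, float]:
    """Mean number of distinct subsequences (haplotypes) per window, for each window size.

    Assumption (not from the source): windows are contiguous and non-overlapping,
    and the count is averaged over all full windows of a given size.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")

    curve: Dict[int, float] = {}
    for size in sizes:
        if not 1 <= size <= length:
            raise ValueError(f"window size {size} is outside 1..{length}")
        counts: List[int] = []
        # Step through the alignment in non-overlapping windows of this size.
        for start in range(0, length - size + 1, size):
            # A set of slices gives the distinct haplotypes in this window.
            distinct = {s[start:start + size] for s in sequences}
            counts.append(len(distinct))
        curve[size] = sum(counts) / len(counts)
    return curve


# Toy alignment, invented purely for illustration.
toy = ["AAAA", "AAAT", "AATT", "AAAT"]
curve = haplotype_curve(toy, [1, 2, 4])
print(curve)  # {1: 1.5, 2: 2.0, 4: 3.0}
```

In this sketch each window size costs one hashing pass over every sequence, so the work per size grows linearly with the number of sequences, in line with the scaling stated in the abstract.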
Bibliographical note
Funding Information:
Acknowledgement: This work was supported by the EU 6th framework CO3 pathfinder. NMS and YM acknowledge many useful discussions with John Wakeley on scalable approaches to population genetics.
- Galton-Watson theory
- Haplotype statistics
- Population genetics