TY - JOUR

T1 - Universal features of surname distribution in a subsample of a growing population

AU - Maruvka, Yosi E.

AU - Shnerb, Nadav M.

AU - Kessler, David A.

PY - 2010/1/21

Y1 - 2010/1/21

N2 - We examine the problem of family size statistics (the number of individuals carrying the same surname, or the same DNA sequence) in a given size subsample of an exponentially growing population. We approach the problem from two directions. In the first, we construct the family size distribution for the subsample from the stable distribution for the full population. This latter distribution is calculated for an arbitrary growth process in the limit of slow growth, and is seen to depend only on the average and variance of the number of children per individual, as well as the mutation rate. The distribution for the subsample is shifted left with respect to the original distribution, tending to eliminate the part of the original distribution reflecting the small families, and thus increasing the mean family size. From the subsample distribution, various bulk quantities such as the average family size and the percentage of singleton families are calculated. In the second approach, we study the past time development of these bulk quantities, deriving the statistics of the genealogical tree of the subsample. This approach reproduces that of the first when the current statistics of the subsample is considered. Surname statistics for the US in 1790 and 2000 and for Norway in 2008 are analyzed in the light of the theory and show satisfactory agreement, when the time-dependence of the growth rate is taken into account for the two contemporary data sets.

KW - Coalescent

KW - Distribution

KW - Family size

KW - Growing population

UR - http://www.scopus.com/inward/record.url?scp=70450231621&partnerID=8YFLogxK

U2 - 10.1016/j.jtbi.2009.09.022

DO - 10.1016/j.jtbi.2009.09.022

M3 - Article

C2 - 19769992

AN - SCOPUS:70450231621

SN - 0022-5193

VL - 262

SP - 245

EP - 256

JO - Journal of Theoretical Biology

JF - Journal of Theoretical Biology

IS - 2

ER -