Gene Expression

How big does the N need to be?

Estimating the number of unseen variants in the human genome:

…Consistent with previous descriptions, our results show that the African population is the most diverse in terms of the number of variants expected to exist, the Asian populations the least diverse, with the European population in-between. In addition, our results show a clear distinction between the Chinese and the Japanese populations, with the Japanese population being the less diverse. To find all common variants (frequency at least 1%) the number of individuals that need to be sequenced is small (∼350) and does not differ much among the different populations; our data show that, subject to sequence accuracy, the 1000 Genomes Project is likely to find most of these common variants and a high proportion of the rarer ones (frequency between 0.1 and 1%). The data reveal a rule of diminishing returns: a small number of individuals (∼150) is sufficient to identify 80% of variants with a frequency of at least 0.1%, while a much larger number (> 3,000 individuals) is necessary to find all of those variants. Finally, our results also show a much higher diversity in environmental response genes compared with the average genome, especially in African populations.

The details of this matter for genetic architecture, especially for complex traits such as height & IQ.
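To get a feel for why the returns diminish, here is a rough sketch of the simplest possible version of the problem. It is not the method used in the paper: it assumes a single variant at a fixed population frequency and independent sampling of chromosomes from diploid individuals. The function names and the `ploidy` parameter are made up for illustration. Because the paper works with a whole spectrum of variant frequencies, the numbers from this toy model should not be expected to match the ones quoted above.

```python
import math


def detection_probability(freq, n_individuals, ploidy=2):
    """Chance that a variant at population frequency `freq` shows up
    at least once among `n_individuals` sampled individuals.

    Assumption (illustrative only): each of the ploidy * n chromosomes
    carries the variant independently with probability `freq`.
    """
    chromosomes = ploidy * n_individuals
    return 1.0 - (1.0 - freq) ** chromosomes


def individuals_needed(freq, target_prob, ploidy=2):
    """Smallest sample size whose detection probability reaches
    `target_prob` under the same independence assumption."""
    chromosomes = math.log(1.0 - target_prob) / math.log(1.0 - freq)
    return math.ceil(chromosomes / ploidy)


if __name__ == "__main__":
    # Each doubling of the sample buys less additional detection power.
    for n in (150, 300, 600, 1200, 2400):
        print(n, round(detection_probability(0.001, n), 3))
```

Under these assumptions the miss probability shrinks geometrically with sample size, so the first few hundred individuals do most of the work and pushing the detection probability toward 1 takes disproportionately more samples.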


  1. #1 dreikin
    March 31, 2009

    Isn’t the highlighted portion (diminishing returns of sample size) basic statistics? Seems odd to phrase it like the diminishing returns is the notable part instead of the remarkably low number needed for that percentage.

  2. #2 statsquatch
    April 1, 2009

    That there are diminishing returns for increasing sample size is well known by statisticians, but some biologists need reminding. The results depend on the variants being distributed beta-binomial, and the paper is an application of a 30-year-old result by Efron. Still, pretty impressive.

  3. #3 Trin Tragula
    April 3, 2009

    Reliabilities of identifying positive selection by the branch-site and the site-prediction methods
    Masafumi Nozawa, Yoshiyuki Suzuki and Masatoshi Nei
    Published online before print April 1, 2009
    doi: 10.1073/pnas.0901855106
