How big does the N need to be?

Estimating the number of unseen variants in the human genome:

...Consistent with previous descriptions, our results show that the African population is the most diverse in terms of the number of variants expected to exist, the Asian populations the least diverse, with the European population in-between. In addition, our results show a clear distinction between the Chinese and the Japanese populations, with the Japanese population being the less diverse. To find all common variants (frequency at least 1%) the number of individuals that need to be sequenced is small (~350) and does not differ much among the different populations; our data show that, subject to sequence accuracy, the 1000 Genomes Project is likely to find most of these common variants and a high proportion of the rarer ones (frequency between 0.1 and 1%). The data reveal a rule of diminishing returns: a small number of individuals (~150) is sufficient to identify 80% of variants with a frequency of at least 0.1%, while a much larger number (> 3,000 individuals) is necessary to find all of those variants. Finally, our results also show a much higher diversity in environmental response genes compared with the average genome, especially in African populations.

The details of this matter for genetic architecture, especially for complex traits such as height & IQ.
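To get a feel for why the returns diminish, here is a back-of-the-envelope sketch. It is not the paper's method: it assumes a plain binomial sampling model (diploid individuals, independently sampled chromosomes, perfect sequencing accuracy), and the function names `detection_probability` and `individuals_needed` are made up for illustration. Under those assumptions, the chance of seeing a variant of frequency f at least once among n individuals is 1 - (1 - f)^(2n).

```python
import math


def detection_probability(freq, n_individuals, ploidy=2):
    """Chance of seeing an allele of frequency `freq` at least once.

    Assumption (not from the paper): each of the ploidy * n_individuals
    sampled chromosomes carries the allele independently with
    probability `freq`.
    """
    chromosomes = ploidy * n_individuals
    return 1.0 - (1.0 - freq) ** chromosomes


def individuals_needed(freq, target_prob, ploidy=2):
    """Smallest number of individuals giving at least `target_prob`
    chance of detection under the same simple binomial model."""
    chromosomes = math.log(1.0 - target_prob) / math.log(1.0 - freq)
    return math.ceil(chromosomes / ploidy)


if __name__ == "__main__":
    # Each tenfold drop in frequency costs roughly tenfold more samples.
    for freq in (0.01, 0.001):
        for n in (150, 350, 3000):
            p = detection_probability(freq, n)
            print(f"freq={freq:<6} n={n:<5} P(detect)={p:.4f}")
```

In this toy model, 350 individuals give about a 99.9% chance of catching any single variant at 1% frequency, which lines up with the abstract's figure for common variants. A variant sitting at exactly 0.1% is much harder to catch with 150 individuals; the abstract's 80% figure covers the whole class of variants with frequency of at least 0.1%, which this sketch does not model.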


Isn't the highlighted portion (diminishing returns of sample size) basic statistics? It seems odd to present the diminishing returns as the notable part rather than the remarkably low number needed for that percentage.

That there are diminishing returns for increasing sample size is well known to statisticians, but some biologists need reminding. The results depend on the variants being distributed beta-binomial, and the paper is an application of a 30-year-old result by Efron. Still, pretty impressive.

By statsquatch (not verified) on 01 Apr 2009 #permalink