How big does the N need to be?

By razib on March 31, 2009.

Estimating the number of unseen variants in the human genome:

...Consistent with previous descriptions, our results show that the African population is the most diverse in terms of the number of variants expected to exist, the Asian populations the least diverse, with the European population in-between. In addition, our results show a clear distinction between the Chinese and the Japanese populations, with the Japanese population being the less diverse. To find all common variants (frequency at least 1%) the number of individuals that need to be sequenced is small (~350) and does not differ much among the different populations; our data show that, subject to sequence accuracy, the 1000 Genomes Project is likely to find most of these common variants and a high proportion of the rarer ones (frequency between 0.1 and 1%). The data reveal a rule of diminishing returns: a small number of individuals (~150) is sufficient to identify 80% of variants with a frequency of at least 0.1%, while a much larger number (> 3,000 individuals) is necessary to find all of those variants. Finally, our results also show a much higher diversity in environmental response genes compared with the average genome, especially in African populations.

The details of this matter for genetic architecture, especially for complex traits such as height & IQ.
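The diminishing returns described in the quoted abstract can be illustrated with a much simpler model than the one the paper uses. The sketch below assumes each sequenced chromosome is an independent draw, so a variant at population frequency f is seen at least once among n diploid individuals with probability 1 - (1 - f)^(2n). The function names `detection_probability` and `individuals_needed` are made up for this illustration, and the numbers it produces apply to a single variant at exactly the given frequency, so they are not expected to reproduce the paper's estimates.

```python
import math


def detection_probability(freq: float, n_individuals: int, ploidy: int = 2) -> float:
    """Chance that a variant at population frequency `freq` shows up at least
    once among `n_individuals`, assuming independent sampling of chromosomes."""
    chromosomes = ploidy * n_individuals
    return 1.0 - (1.0 - freq) ** chromosomes


def individuals_needed(freq: float, target: float, ploidy: int = 2) -> int:
    """Smallest sample size whose detection probability reaches `target`
    under the same independent-sampling assumption."""
    chromosomes = math.log(1.0 - target) / math.log(1.0 - freq)
    return math.ceil(chromosomes / ploidy)


if __name__ == "__main__":
    # Sample sizes mentioned in the abstract, tried at two variant frequencies.
    for freq in (0.01, 0.001):
        for n in (150, 350, 3000):
            p = detection_probability(freq, n)
            print(f"freq={freq:<6} n={n:<5} P(seen at least once)={p:.4f}")
```

Under this toy model a 1% variant is all but certain to appear among 350 people, while a variant at exactly 0.1% needs a sample in the thousands to reach the same certainty. Each doubling of the sample buys a smaller gain in detection probability, which is the diminishing-returns pattern the abstract highlights.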


Isn't the highlighted portion (diminishing returns of sample size) basic statistics? Seems odd to phrase it like the diminishing returns is the notable part instead of the remarkably low number needed for that percentage.

That there are diminishing returns for increasing sample size is well known by statisticians, but some biologists need reminding. The results depend on the variants being distributed beta-binomial, and the paper is an application of a 30-year-old result by Efron. Still, pretty impressive.

Reliabilities of identifying positive selection by the branch-site and the site-prediction methods. Masafumi Nozawa, Yoshiyuki Suzuki and Masatoshi Nei. Published online before print April 1, 2009. doi: 10.1073/pnas.0901855106
