Purcell et al. (2009). Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. DOI: 10.1038/nature08185
Neil Walker has been doing a spectacular job of serving up useful information in the comments recently, so I asked him to write the first ever guest post on Genetic Future – something that (as I will be announcing shortly) I intend to do fairly regularly over the next couple of months.
The topic is a paper that has created a rather perplexed buzz recently in the complex disease genetics community: the genome-wide association study (GWAS) for schizophrenia published in Nature last week. This paper takes a novel and (at first glance) rather alarming approach to exploring the genetic basis of this complex disease, so I asked Neil to provide some insight into what he thought about the approach used in this paper and what it means for complex disease genetics.
Without further comment, I present Neil’s post:
Common polygenic variation contributes to risk of schizophrenia and bipolar disorder
If you’ve not read this recent Nature paper from the International Schizophrenia Consortium, a quick summary of the findings is available from The Mental Health Social Worker, as Schizophrenia and Bipolar Disorder Share Genetic Roots.
Perhaps it went something like this?
[Scene: Anonymous (poorly furnished, chaotic and paper-filled) academic offices round the world. A teleconference at some ungodly hour to accommodate the Swedes, British, Irish, Portuguese and Americans. Groups gathered around speaker phones, waiting for copies of slides to come in by email.]
Chair: Right, welcome to the umpteenth meeting of the International Schizophrenia Consortium Management Committee[1], and if we’re all here, let’s get straight to the standing agenda item: do we have anything to show for our $5 million GWAS[2], Shaun?[3]
Shaun Purcell: Well, we got 2 hits at genome-wide significance[4]: one in some new gene I’ve never heard of on chromosome 22, MYO18B, and a whole bunch in the MHC[5].
Chair: But we did at least replicate the previously identified regions?
SP: Not entirely.[6]
Chair: What about CNVs? Imputation?
SP: Nothing new.[7]
Chair: OK. Well, the preliminary data suggested as much. You’ll all recall that you asked me to get in touch with the Molecular Genetics of Schizophrenia and SGENE consortia[8] to see if they’d play ball in a meta-analysis …
Chair: … and while we couldn’t agree to share the raw data, we got hold of summary data for SNPs showing a trend (p < 10^-3). So Shaun, with 8,000 cases and 19,000 controls, there must be something, or we can kiss goodbye to the genetics of schizophrenia …
SP: Nothing. The MHC SNP barely replicated. But let’s not panic: I’ve got a cracking new idea …
And that idea is this paper.
After a page of preliminaries (as crudely caricatured above), the paper kicks off:
Our second approach was to evaluate whether common variants have an important role en masse, directly testing the classic theory of polygenic inheritance, previously hypothesized to apply to schizophrenia. Although our GWAS analysis did not identify a large number of strongly associated loci, there could still be potentially thousands of very small individual effects that collectively account for a substantial proportion of variation in risk. We summarized variation across nominally associated loci into quantitative scores, and related the scores to disease state in independent samples.
The basic thesis is this: take the good-quality, independent SNPs in a GWAS and split them in two on the basis of their association results in one half of your GWAS sample. If the more strongly associated SNPs in that half are also, collectively, the more strongly associated SNPs in the other half of the sample, then that means something.
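The split-half logic can be sketched as a toy simulation. This is not the consortium’s pipeline: the published analysis used PLINK, weighted score alleles by their log odds ratios and tried several P-value thresholds, whereas everything here (allele frequency 0.5, 100 causal SNPs of equal small effect, unweighted allele counts, keeping the top half of SNPs by absolute frequency difference, and all function names) is a hypothetical simplification.

```python
import random


def simulate_cohort(n_people, n_snps, n_causal, beta, seed):
    """Hypothetical toy cohort: genotypes are 0/1/2 copies of allele 'A'
    (frequency 0.5); each of the first n_causal SNPs adds `beta` to liability
    per copy, and the top half of liability are labelled cases (1)."""
    rng = random.Random(seed)
    genotypes, liability = [], []
    for _ in range(n_people):
        # Two independent coin flips per SNP give a 0/1/2 allele count.
        g = [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n_snps)]
        genotypes.append(g)
        liability.append(beta * sum(g[:n_causal]) + rng.gauss(0.0, 1.0))
    cutoff = sorted(liability)[n_people // 2]
    status = [1 if x >= cutoff else 0 for x in liability]
    return genotypes, status


def freq_diff(genotypes, status, snp):
    """Case-minus-control frequency of allele 'A' at one SNP."""
    case = [g[snp] for g, s in zip(genotypes, status) if s == 1]
    ctrl = [g[snp] for g, s in zip(genotypes, status) if s == 0]
    return sum(case) / (2 * len(case)) - sum(ctrl) / (2 * len(ctrl))


def split_half_score(genotypes, status, seed):
    """Choose the more-associated half of SNPs in a random discovery half,
    then return the case-minus-control difference in mean score in the
    other (target) half."""
    rng = random.Random(seed)
    order = list(range(len(genotypes)))
    rng.shuffle(order)
    disc, target = order[: len(order) // 2], order[len(order) // 2 :]
    disc_g = [genotypes[i] for i in disc]
    disc_s = [status[i] for i in disc]
    n_snps = len(genotypes[0])
    diffs = [freq_diff(disc_g, disc_s, j) for j in range(n_snps)]
    ranked = sorted(range(n_snps), key=lambda j: abs(diffs[j]), reverse=True)
    selected = ranked[: n_snps // 2]

    def score(g):
        # Count copies of whichever allele was commoner in discovery cases.
        return sum(g[j] if diffs[j] > 0 else 2 - g[j] for j in selected)

    cases = [score(genotypes[i]) for i in target if status[i] == 1]
    controls = [score(genotypes[i]) for i in target if status[i] == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


if __name__ == "__main__":
    g, s = simulate_cohort(2000, 200, n_causal=100, beta=0.15, seed=1)
    print("polygenic signal:", round(split_half_score(g, s, seed=2), 2))
    g0, s0 = simulate_cohort(2000, 200, n_causal=0, beta=0.15, seed=1)
    print("no signal:", round(split_half_score(g0, s0, seed=2), 2))
```

Because the discovery and target halves are disjoint, null SNPs that ranked highly by chance in discovery contribute only noise in the target. A consistent case-control difference therefore needs either real signal or a consistent artifact, which is exactly the worry raised in point 3 below.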
The rest of the paper, and the exemplary 46 pages of Supplementary Information (which read a little like an extended advert for Shaun Purcell’s PLINK), are an attempt to convince first the authors, then the Management Committee, then the reviewers, and now us, that this is not just a nasty fudge – making a virtue of necessity – but a genuine finding.
Other bloggers/journalists were (much) quicker off the mark:
- Whalefall’s A Schizophrenia Gene Debacle
From a journalistic perspective, there are two possible stories here. First, the straight story: schizophrenia is extraordinarily complicated, and genetics can’t now explain it in any useful way. And two, the contextual angle: for years, the public has expected, and scientists have sometimes promised, that genetics would illuminate this disease – and it failed, just as it has for nearly every disease.
- Nicholas Wade’s Hoopla, and Disappointment, in Schizophrenia Research
The journal Nature held a big press conference in London Wednesday, at the World Conference of Science Journalists, to unveil three large studies of the genetics of schizophrenia. Press releases from five American and European institutions celebrated the findings, one using epithets like “landmark,” “major step forward,” and “real scientific breakthrough.” It was the kind of hoopla you’d expect for an actual scientific advance.
It seems to me the reports represent more of a historic defeat, a Pearl Harbor of schizophrenia research.
The defeat points solely to the daunting nature of the adversary, not to any failing on the part of the researchers, who were using the most advanced tools available. Still, who is helped by dressing up a severely disappointing setback as a “major step forward”?
The principal news from the three studies is that schizophrenia is caused by a very large number of errant genes, not a manageable and meaningful handful.
In the last few years gene hunters in one common disease after another have turned up a few causative variant genes, after vast effort, but the variants generally account for a small percentage of the overall burden of illness. With most common diseases, it turns out, the disease is caused not by ten very common variant genes but by 10,000 relatively rare ones.
(This last being David B. Goldstein’s view)
which is in turn quoted by:
- Fists Full of Science’s Missed opportunity – not debacle – in bogus schizophrenia genes coverage
Given that researchers had been looking for meatier schizophrenia genes for years and years without finding anything substantial, this was to be expected, especially if you’re the kind of person who questions whether there are very many genes “for” anything.
It then drifts off into being another story about poor science coverage – we’ve had a lot of those lately – although it shows signs of life in the what’s-it-all-for stakes, which in turn shows up in:
- Wiring The Brain’s Hot new in the genetics of schizophrenia
which basically suggests that geneticists are looking in the wrong place, and that it’s all about rare variants.
Why add to this weight of verbiage?
Well, first, Daniel and I just didn’t believe the result at all, and it has taken detailed reading to get beyond that.
Second, the paper ends, bullishly:
We identified fewer unambiguously associated variants than studies of some non-psychiatric diseases of comparable size. Nonetheless, for other diseases replicated variants typically account for only a modest fraction of risk. The nature of this “missing heritability” is a general problem now faced by complex disease geneticists. For schizophrenia, our data point to a genetic architecture that includes many common variants of small effect. The extent to which similar models characterize genetic variation within and across other complex diseases remains to be investigated.
which deserves some response, even if it is only “I don’t think we’ll bother!”
So, a few points:
1. “polygenic” doesn’t mean anything interesting. Its usage here comes from the pre-history of molecular biology (1967), when researchers began to fail to find single gene causes of heritable diseases and wanted a new name for their disappointment. It is used in contrast with “monogenic” – but no-one believes other complex, multifactorial diseases are driven by single, highly penetrant genes, and not many people (Goldstein apart) think that complex disorders are driven by a series of rare highly penetrant variants. We’ll know soon enough – see prediction 1.
2. News reports that wrote this up as 30,000 SNPs are associated with schizophrenia are wide of the mark, and specifically rejected by the paper:
We use the term score, instead of risk, as we cannot differentiate the minority of true risk alleles from unassociated variants.
In other words, this is just the top half versus the bottom half of associations.
3. The most obvious reason why two halves of a GWAS would look alike in their associations is that the case and control samples, or the case and control subjects, differ in some way other than by disease status. In particular, DNA quality and/or preparation could lead to the genotypes being scored differently (a particularly pernicious problem is non-random missingness), or the case and control subjects could be poorly matched by geography or ethnicity.
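To see why non-random missingness is so pernicious, here is a back-of-envelope calculation under a hypothetical scenario: degraded case DNA fails to call 10% of heterozygotes, while every other genotype in cases and controls fails at 1%. There is no true association, yet the called allele frequency in cases is shifted, and shifted the same way in any split of the data, so the same SNPs will look “more associated” in both halves. Plugging expected allele counts into a standard allelic chi-square shows the artifact’s contribution growing in proportion to sample size. The function names and missingness rates are illustrative assumptions.

```python
def called_allele_freq(p, miss_hom, miss_het):
    """Frequency of allele 'A' among *called* genotypes, when the true
    frequency is p (Hardy-Weinberg proportions) and calls fail with
    probability miss_hom for homozygotes and miss_het for heterozygotes."""
    aa, ab, bb = p * p, 2 * p * (1 - p), (1 - p) ** 2
    called_aa = aa * (1 - miss_hom)
    called_ab = ab * (1 - miss_het)
    called_bb = bb * (1 - miss_hom)
    return (2 * called_aa + called_ab) / (2 * (called_aa + called_ab + called_bb))


def allelic_chi2(freq_case, n_case_alleles, freq_ctrl, n_ctrl_alleles):
    """Standard 1-df chi-square for a 2x2 table of allele counts."""
    a, b = freq_case * n_case_alleles, (1 - freq_case) * n_case_alleles
    c, d = freq_ctrl * n_ctrl_alleles, (1 - freq_ctrl) * n_ctrl_alleles
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # Same true frequency (0.3) in both groups; only heterozygote call failure differs.
    case = called_allele_freq(0.3, miss_hom=0.01, miss_het=0.10)
    ctrl = called_allele_freq(0.3, miss_hom=0.01, miss_het=0.01)
    print(f"called freq: cases {case:.4f}, controls {ctrl:.4f}")
    for n_alleles in (2_000, 20_000, 200_000):
        # Expected-count statistic: scales linearly with sample size.
        print(n_alleles, round(allelic_chi2(case, n_alleles, ctrl, n_alleles), 2))
```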
With regard to quality control, the Supplementary Information (section S1) goes into elaborate detail about what was done to weed out suspect samples and SNPs. The attempt to merge data across three generations of Affymetrix chips has made the SNP QC too complicated to be entirely convincing, but when it came to generating the master set of SNPs used in the big “score” analysis, extra cutoffs were applied: genotyping call rate ≥ 99% (which meant the SNPs needed to be on all chip types), MAF ≥ 2% and low LD (r^2 ≤ 0.25 within a 200-SNP sliding window). All of which is fine.
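The SNP filters just described are easy to state in code. This sketch applies a call-rate cutoff, a minor-allele-frequency cutoff and a greedy LD prune in file order. In PLINK the equivalent steps would normally be `--geno`, `--maf` and `--indep-pairwise` (window, step, r^2 threshold), whose pruning algorithm differs in detail from this one. The function names, the use of genotype-count correlation as r^2, and `None` for a missing call are assumptions for illustration.

```python
def call_rate(genos):
    """Fraction of samples with a genotype call (None marks a missing call)."""
    return sum(g is not None for g in genos) / len(genos)


def minor_allele_freq(genos):
    """Minor allele frequency from 0/1/2 allele counts among called samples."""
    called = [g for g in genos if g is not None]
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def r_squared(x, y):
    """Squared Pearson correlation of allele counts over samples called at both SNPs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def qc_and_prune(snps, min_call=0.99, min_maf=0.02, max_r2=0.25, window=200):
    """Return indices of SNPs passing QC, greedily dropping any SNP whose r^2
    with an already-kept SNP fewer than `window` positions earlier exceeds max_r2."""
    passed = [
        j for j, g in enumerate(snps)
        if call_rate(g) >= min_call and minor_allele_freq(g) >= min_maf
    ]
    kept = []
    for j in passed:
        if all(r_squared(snps[j], snps[k]) <= max_r2 for k in kept if j - k < window):
            kept.append(j)
    return kept


if __name__ == "__main__":
    base = [0, 1, 2, 1] * 25
    snps = [
        base,                  # good SNP
        list(base),            # perfect LD with SNP 0: pruned
        [0] * 100,             # monomorphic: fails MAF
        [None] * 5 + base[5:], # 95% call rate: fails call-rate filter
        [0, 2, 0, 2] * 25,     # uncorrelated with SNP 0: kept
    ]
    print(qc_and_prune(snps))
```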
Samples were removed if they were either too similar to, or too different from, other samples, which should take care of both the overrepresentation of extended family pedigrees and subjects who do not match ethnically.
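A minimal sketch of that two-sided sample filter, assuming a crude identity-by-state (IBS) similarity and made-up thresholds: one member of any near-identical pair is dropped, as is any sample that is unusually dissimilar to the rest. Real pipelines would more likely use allele-frequency-aware relatedness estimates (for example PI_HAT from PLINK’s `--genome`) and MDS or principal components for ancestry; the function names and thresholds here are hypothetical.

```python
from itertools import combinations


def ibs_similarity(a, b):
    """Mean identity-by-state (0..1) over SNPs called (not None) in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0  # no overlapping calls: treat as maximally dissimilar
    return sum(2 - abs(x - y) for x, y in shared) / (2 * len(shared))


def flag_samples(genos, too_similar=0.95, min_mean_ibs=0.6):
    """Return IDs to drop: the second member of any near-identical pair, and any
    sample whose average similarity to everyone else falls below min_mean_ibs.
    Thresholds are illustrative placeholders, not the paper's values."""
    ids = list(genos)
    sims = {(s, t): ibs_similarity(genos[s], genos[t]) for s, t in combinations(ids, 2)}
    flagged = set()
    for (s, t), sim in sims.items():
        if sim > too_similar:
            flagged.add(t)  # duplicate or close relative: keep one of the pair
    for s in ids:
        mean_sim = sum(v for pair, v in sims.items() if s in pair) / (len(ids) - 1)
        if mean_sim < min_mean_ibs:
            flagged.add(s)  # genetic outlier relative to the rest of the collection
    return flagged


if __name__ == "__main__":
    samples = {
        "s1": [0, 1, 2, 0, 1, 2],
        "s2": [0, 1, 2, 0, 1, 2],  # duplicate of s1
        "s3": [0, 1, 1, 0, 2, 2],
        "s4": [2, 1, 0, 2, 1, 0],  # unlike everyone else
    }
    print(sorted(flag_samples(samples)))  # ['s2', 's4']
```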
Some population stratification is expected, and an analysis was performed to control for the multiple sites used, leaving a no-bigger-than-expected “genomic inflation factor” (λ) of 1.09. The WTCCC collections reported values in the range 1.03–1.11, with the comment that “overall effect of population structure on our association results seems to be small, once recent migrants from outside Europe are excluded.” At the time, this was considered somewhat surprising.
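For readers unfamiliar with the genomic inflation factor: λ is conventionally the median of the observed 1-df association chi-square statistics divided by the median expected under the null (about 0.455), and “genomic control” divides every statistic by λ when λ exceeds 1. The sketch below, using only Python’s standard library, shows the calculation; the helper names are illustrative, and the simulated null p-values stand in for real GWAS output.

```python
import random
from statistics import NormalDist, median

# Median of a 1-df chi-square: the squared 75th percentile of N(0, 1), about 0.455.
MEDIAN_CHI2_1DF = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided p-value (0 < p <= 1) to the equivalent 1-df chi-square."""
    z = NormalDist().inv_cdf(p / 2)  # lower-tail quantile; squaring removes the sign
    return z * z


def genomic_inflation(chi2_stats):
    """lambda_GC: observed median statistic divided by its null expectation."""
    return median(chi2_stats) / MEDIAN_CHI2_1DF


def genomic_control(chi2_stats, lam):
    """Divide every statistic by lambda, but never inflate them (lambda < 1 is ignored)."""
    return [x / max(lam, 1.0) for x in chi2_stats]


if __name__ == "__main__":
    rng = random.Random(0)
    # 1 - random() lies in (0, 1], so every simulated p-value is valid.
    null_stats = [p_to_chi2(1.0 - rng.random()) for _ in range(20_000)]
    print("null lambda:", round(genomic_inflation(null_stats), 3))
    inflated = [1.09 * x for x in null_stats]  # uniform 9% inflation of every statistic
    print("inflated lambda:", round(genomic_inflation(inflated), 3))
```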
4. Without the replication in other disease collections – and with no evidence whatsoever – I’d still be suspicious that Shaun and co had come up with some new way of generating an artifactual result. It is plain others didn’t believe it either, as there are 7 pages of Supplementary Information (S13, pp. 24–30) on “Addressing population stratification and other possible confounders”.
OK, so the result is real. What does it mean?
At heart, this is a stats paper. There is an excellent discussion, based on data simulation experiments, of what magnitude of effects, and at what allele frequencies, would be needed to produce such data, dismissing both the few-common-variants-of-moderate-effect and the all-rare-variants alternatives.
However, before the upbeat ending, the authors note:
A highly polygenic model suggests that genetically influenced individual differences across domains of brain development and function may form a diathesis for major psychiatric illness, perhaps as multiple growth and metabolic pathways influence human height. Our results may also reflect heterogeneity, such that some patients have aetiologically distinct diseases. The shared genetic liability between schizophrenia and bipolar disorder, previously suggested by clinical and genetic epidemiology, opens up the possibility of genetically based refinements in diagnosis. However, the scores derived here have little value for individual risk prediction, meaning that application to clinical genetic testing for schizophrenia would be unwarranted. In the future, measures of polygenic burden, along with known risk loci and non-genetic factors such as season of birth, life stress, obstetrical complications, viral infections and epigenetics, could open new avenues for studying gene-gene and gene-environment interactions.
So, with no real hope of picking out gene pathways when your best association signal is spread across half the SNPs tested, and (therefore) with no predictions, this is not the paper Nature thought it was selling us, and not the one some people thought they were buying.
But it is still a tour de force and deserves to be read and discussed.
[1] Yes, with weekly calls not uncommon, these would be turned into a bunch of acronyms.
[2] Crudely estimated from the cost of GWAS chips for 7,000 subjects, plus staff and overheads.
[3] I’m assuming Shaun Purcell (more widely known for developing PLINK, the whole-genome analysis toolset) presented the analysis; only those involved in the large Data Analysis group will know who did what.
[4] e.g. 5 × 10^-7 in the WTCCC.
[5] A wildly complicated region implicated in almost all autoimmune diseases, and with a bizarre haplotype naming scheme. Phylogenetic analysis suggests the earliest haplotype split was at least 40 million years ago.
[6] The 22q11.2 deletion region and ZNF804A replicated; the rest did not (see Supplementary Information, section 5).
[7] There is some disagreement about how to impute data, mostly around whether the 60 CEU founders in the HapMap are enough for a brute-force attempt. The 1000 Genomes Project should see this off.
[8] Most disease groups seem to have several competing international consortia rather than a single one, but these can nonetheless be persuaded to bury the hatchet when mutual interest requires it.
Neil Walker is “Head of Data Services” – a job title invented to avoid any pay and grading mishap, at a time when “data manager” was seen as synonymous with “data entry clerk” – at the Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, and had a major role in the project management and QC of the first WTCCC experiment. His stats are, frankly, weak but he is very good at not believing results until he has been shown the data.