Daniel G. Hert, Christopher P. Fredlake, Annelise E. Barron (2008). Advantages and limitations of next-generation sequencing technologies: A comparison of electrophoresis and non-electrophoresis methods. Electrophoresis, 29(23), 4618-4626. DOI: 10.1002/elps.200800456
The dideoxy termination method of DNA sequencing (often called Sanger sequencing after the technique’s inventor, Fred Sanger) has been the workhorse of pretty much every molecular biology lab for the last 30 years. However, over the last few years the method has been increasingly supplanted by so-called next-generation sequencing technologies, which allow incredibly rapid generation of large amounts of sequence data. Sanger sequencing is still widely used for small-scale experiments and for “finishing” regions that can’t be easily sequenced by next-gen platforms (e.g. highly repetitive DNA), but most people see next-gen as the future of genomics.
However, perhaps rumours of the death of Sanger sequencing have been somewhat exaggerated. In a recent review article in Electrophoresis and an interview for In Sequence (subscription required), Stanford's Annelise Barron argues that Sanger sequencing will persist, albeit in a revamped and scaled-up format.
The issue is read length.
All sequencing platforms generate sequence data in the form of many independent reads, which then must be assembled together to form a complete sequence. For Sanger sequencing these reads are routinely 800-1000 base pairs long; next-gen methods produce much larger quantities of sequence, but in the form of much smaller reads (the two best-performing platforms generate 35-75 base pair reads, while a third, lower-throughput platform can manage 400 base pairs).
Read length is absolutely crucial when it comes to assembling accurate sequence, especially for genomes as complex and repetitive as the human genome. If a repetitive region is longer than a platform's read length, it cannot be unambiguously assembled – so human genomes sequenced with current next-gen platforms actually consist of hundreds of thousands of accurately sequenced fragments interspersed with gaps. That's good enough for most purposes, but it's by no means a complete genome sequence.
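The repeat problem is easy to see with a toy example. The sketch below (the sequences and read lengths are invented for illustration, and reads are idealised as error-free substrings) shows two genomes that differ only in the length of a repeat: when reads are shorter than the repeat, both genomes yield exactly the same set of reads, so no assembler could tell them apart; once reads span the repeat, the ambiguity disappears.

```python
def reads(genome, k):
    """Return the set of all length-k substrings: idealised, error-free reads."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}

# Two hypothetical genomes differing only in the length of a G-repeat
g1 = "ATGGGGGCA"   # repeat of five Gs
g2 = "ATGGGGGGCA"  # repeat of six Gs

# Short reads (3 bp, shorter than the repeat): identical read sets,
# so the repeat length cannot be recovered from the reads alone.
assert reads(g1, 3) == reads(g2, 3)

# Longer reads (8 bp, spanning the repeat): the read sets now differ,
# so the two genomes can be distinguished.
assert reads(g1, 8) != reads(g2, 8)
```

Real assemblers face the same ambiguity at genome scale, which is why repeats longer than the read length end up as gaps.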
Barron argues that scaled-up platforms employing Sanger-based sequencing – allowing up to 50,000 reads to be generated at once, rather than the 96 reads permitted by current systems – could actually be cost-competitive with next-gen sequencing for some applications, and also provide the benefit of longer reads. The two applications Barron describes in detail in her article are sequencing the human leukocyte antigen (HLA) region, and large-scale genotyping of microsatellite markers (highly repetitive and variable regions of the genome).
A scaled-up Sanger-based approach would certainly be useful for sequencing projects targeting a small region in a large number of individuals (rather than sequencing whole genomes in a smaller number of individuals). In the In Sequence interview Barron explains:
You don’t necessarily always want to sequence an entire genome. You sort of have to spend, at this time, $7,000 if you are working with 454, and you get the whole genome [at very low coverage]. What if you want 10 exons, and you want to spend 4 cents each? That’s the kind of thing a doctor might want. I think that the advantage of the electrophoresis technologies is [that] they are scalable in that way; you can do it on a per-channel basis. And that is much more suited to looking at limited gene regions for individual patients.
That makes sense to me (especially as someone currently trying to use next-gen platforms to do the same thing, which turns out to be fairly painful). There’s also a lot to be said for supplementing short-read platforms with Sanger sequencing for de novo genome assembly, to paper over some of the gaps in repetitive regions.
However, I’m not convinced that scaled-up Sanger sequencing will necessarily be competitive with emerging next-next-gen platforms. Companies like Pacific Biosciences, Oxford Nanopore and Visigen are currently developing technologies that promise long reads generated from single DNA molecules; any one of these platforms may be able to generate the high-throughput long-read results required to make Sanger sequencing completely obsolete for large-scale projects. Whether these companies will actually be able to fulfil their promises, of course, remains to be seen…