Genetic Future

Sanger sequencing is not dead?

Daniel G. Hert, Christopher P. Fredlake, Annelise E. Barron (2008). Advantages and limitations of next-generation sequencing technologies: A comparison of electrophoresis and non-electrophoresis methods. Electrophoresis, 29(23), 4618-4626. DOI: 10.1002/elps.200800456

The dideoxy termination method of DNA sequencing (often called Sanger sequencing after the technique’s inventor, Fred Sanger) has been the workhorse of pretty much every molecular biology lab for the last 30 years. However, over the last few years the method has been increasingly supplanted by so-called next-generation sequencing technologies, which allow incredibly rapid generation of large amounts of sequence data. Sanger sequencing is still widely used for small-scale experiments and for “finishing” regions that can’t be easily sequenced by next-gen platforms (e.g. highly repetitive DNA), but most people see next-gen as the future of genomics.

However, perhaps rumours of the death of Sanger sequencing have been somewhat exaggerated. In a recent review article in Electrophoresis and an interview for In Sequence (subscription required), Stanford’s Annelise Barron argues that Sanger sequencing will persist, albeit in a revamped and scaled-up format.

The issue is read length.

All sequencing platforms generate sequence data in the form of many independent reads, which must then be assembled together to form a complete sequence. For Sanger sequencing these reads are routinely 800-1000 base pairs long; next-gen methods produce much larger quantities of sequence, but in the form of much shorter reads (the two best-performing platforms generate 35-75 base pair reads, while a third, lower-throughput platform can manage 400 base pairs).

Read length is absolutely crucial when it comes to assembling accurate sequence, especially for genomes as complex and repetitive as the human genome. If a repetitive region is much longer than a platform’s read length, it can’t be accurately assembled – so human genomes sequenced with current next-gen platforms actually consist of hundreds of thousands of accurately sequenced fragments interspersed with gaps. That’s good enough for most purposes, but it’s by no means a complete genome sequence.
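As a toy illustration of this point (my own sketch, not from the paper): two hypothetical genomes that differ only in the length of a repeat yield exactly the same set of short reads, so no assembler can tell them apart – but reads long enough to span the repeat distinguish them immediately.

```python
def reads(genome, read_len):
    """Idealised, error-free shotgun reads: every substring of length read_len."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}

# Two made-up genomes differing only in the length of a homopolymer repeat
g1 = "GCT" + "A" * 10 + "TGC"
g2 = "GCT" + "A" * 15 + "TGC"

# Reads shorter than the repeat cannot distinguish the two genomes...
print(reads(g1, 4) == reads(g2, 4))    # identical read sets

# ...but reads long enough to span the repeat can.
print(reads(g1, 16) == reads(g2, 16))  # different read sets
```

Real assemblers face messier problems (sequencing errors, uneven coverage), but the underlying ambiguity is the same: repeats longer than the read length are unresolvable from the reads alone.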

Barron argues that scaled-up platforms employing Sanger-based sequencing – allowing up to 50,000 reads to be generated at once, rather than the 96 reads permitted by current systems – could actually be cost-competitive with next-gen sequencing for some applications, and also provide the benefit of longer reads. The two applications Barron describes in detail in her article are sequencing the human leukocyte antigen (HLA) region, and large-scale genotyping of microsatellite markers (highly repetitive and variable regions of the genome).

An up-scaled Sanger-based approach would certainly be useful for sequencing projects targeting a small region in a large number of individuals (rather than sequencing whole genomes in a smaller number of individuals). In the In Sequence interview Barron explains:

You don’t necessarily always want to sequence an entire genome. You sort of have to spend, at this time, $7,000 if you are working with 454, and you get the whole genome [at very low coverage]. What if you want 10 exons, and you want to spend 4 cents each? That’s the kind of thing a doctor might want. I think that the advantage of the electrophoresis technologies is [that] they are scalable in that way; you can do it on a per-channel basis. And that is much more suited to looking at limited gene regions for individual patients.

That makes sense to me (especially as someone currently trying to use next-gen platforms to do the same thing, which turns out to be fairly painful). There’s also a lot to be said for supplementing short-read platforms with Sanger sequencing for de novo genome assembly, to paper over some of the gaps in repetitive regions.
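To put the figures from that quote in perspective, here is a back-of-the-envelope calculation using only the numbers Barron cites ($7,000 for a low-coverage whole genome on 454, 4 cents per exon by Sanger – illustrative figures, not a real pricing model):

```python
# Illustrative figures from the In Sequence interview
whole_genome_cost = 7000.00  # flat cost of a low-coverage 454 genome
cost_per_exon = 0.04         # per-channel Sanger cost for one exon

# Sequencing just the 10 exons a doctor might want
targeted_cost = 10 * cost_per_exon
print(f"10 exons by Sanger: ${targeted_cost:.2f}")

# Per-channel Sanger stays cheaper until the target grows to roughly this size:
break_even = whole_genome_cost / cost_per_exon
print(f"break-even at roughly {break_even:,.0f} exons")
```

The per-channel scalability is the whole point: the targeted cost grows linearly with the number of regions, while the whole-genome cost is a fixed lump sum regardless of how little of the output you actually need.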

However, I’m not convinced that scaled-up Sanger sequencing will necessarily be competitive with emerging next-next-gen platforms. Companies like Pacific Biosciences, Oxford Nanopore and Visigen are currently developing technologies that promise long reads generated from single DNA molecules; any one of these platforms may be able to generate the high-throughput long-read results required to make Sanger sequencing completely obsolete for large-scale projects. Whether these companies will actually be able to fulfil their promises, of course, remains to be seen…


Comments

  1. #1 Sandra Porter
    January 7, 2009

    Nice description, Daniel.

    The other issue besides read length is the number of samples that can be processed. Right now, each Solexa or SOLiD run generates lots of data but only for a small number of samples. This won’t work well (yet) for doing large numbers of clinical assays.

    Read length is a problem that Roche is working on, trying to get longer read lengths for the 454.

  2. #2 Daniel MacArthur
    January 7, 2009

    Hey Sandra,

    Good point about sample number, and one I didn’t emphasise enough in my post (I said that a Sanger-based approach “would certainly be useful for sequencing projects targeting a small region in a large number of individuals”, but didn’t point out that this is one major area where current platforms fall down). I’m currently working on a Solexa-based project using bar-coding to analyse 96 samples per lane (for a ~25 kb region) and it’s been a real pain – I’m sure the glitches will be worked out soon, but in the meantime Sanger sequencing is probably the optimal approach there.

    All three of the platforms are definitely bumping up their read lengths – e.g. we’re currently routinely getting 50 bp on Solexa and moving towards 75. Obviously Roche has the edge in that respect, but I get the feeling that they’re really starting to lose their place in the market – their longer read length just doesn’t make up for their cost per base being so much higher than Solexa/SOLiD. It will be interesting to see how they go in 2009.

  3. #3 RKirk
    January 7, 2009

    Interesting article. Looking forward to your promised post on trends in 2009. There’s an interesting DTC genetic testing conference coming up in June – are you likely to attend and cover it?
    http://www.consumergeneticsshow.com/Speakers.php

    With appreciation,

  4. #4 Andro Hsu
    January 7, 2009

    Hey Daniel, happy holidays and welcome back from your posting hiatus.

    Can we start a convention where next-next-gen is referred to as “3rd generation”? So Sanger would be 1st generation, Solexa/454 2nd generation.

  5. #5 Chris
    January 7, 2009

    At least for now, Sanger isn’t going anywhere and will continue to survive in the kind of niches you describe. Another is large-scale assembly – we need Sanger backbone reads long enough to get past repetitive elements like Alus.

    Can we start a convention where next-next-gen is referred to as “3rd generation”? So Sanger would be 1st generation, Solexa/454 2nd generation.

    Yeah, most journals are rejecting the “next-gen” terminology and asking that you describe the current platforms as “massively parallel sequencing”. As you point out, next-gen now means PacBio, et al doing single molecule stuff.

  6. #6 Daniel MacArthur
    January 7, 2009

    RKirk,

    The prediction post is on its way. As for the conference – I’d love to go, if I can find someone to pay for me. :-)

    Andro,

    Good to be back (although unfortunately the hiatus had as much to do with work commitments as holidays).

    I kind of like the futuristic sound of “next-next-gen”, but I can see that it sets a trend that will rapidly become ridiculous (I don’t want to be laboriously typing out “next-next-next-next-next-gen” in 2023!) – so 3rd gen it is.
