Digital Biology Friday: What sequences do you believe?

During the past few Fridays (or at least here and here), we've been looking at a paper published from China that reports some β-lactamase sequences supposedly from Streptococcus pneumoniae. The amazing thing about these particular sequences is that β-lactamase has never been seen in S. pneumoniae before, making this a rather significant (and possibly scary) discovery.

If it's correct.

The way this sequence was identified as β-lactamase was through a blastn search at the NCBI. And in fact, it was correct to conclude that this sequence is β-lactamase. There are only three bases that differ between this β-lactamase and the one from a common E. coli cloning vector.


This picture shows the two sequences aligned to each other with dots representing identical bases. I colored the different bases yellow to make the differences easier to spot.
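
If you'd like to make this kind of picture yourself, here is a minimal Python sketch of the same idea. It assumes the two sequences have already been aligned to the same length, and the function name and the short sequences are made up for illustration; they are not the real β-lactamase sequences.

```python
def dot_alignment(reference, query):
    """Show query against reference, with '.' wherever the bases match.

    Both sequences must already be aligned to the same length.
    Returns the dotted string and the 1-based positions that differ.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be pre-aligned to the same length")

    dotted = []
    mismatches = []
    for position, (ref_base, query_base) in enumerate(zip(reference, query), start=1):
        if ref_base.upper() == query_base.upper():
            dotted.append(".")
        else:
            # Keep the differing base visible so it is easy to spot.
            dotted.append(query_base)
            mismatches.append(position)
    return "".join(dotted), mismatches


# Toy sequences invented for this example (not taken from GenBank).
vector_seq = "ATGAGTATTCAACATTTCCG"
isolate_seq = "ATGAGTATTCGACATTTCCG"

dotted, mismatches = dot_alignment(vector_seq, isolate_seq)
print(vector_seq)
print(dotted)
print("differences at positions:", mismatches)
```

Counting the entries in the returned list gives the number of differing bases, which is the figure that matters when deciding how close a sequence is to a cloning vector.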

The problem is that only three bases differ between this sequence and one from a common cloning vector (what is a vector?). And, as others have pointed out, that same vector is also used to produce Taq polymerase, an enzyme used in the procedure for identifying the sequence. PCR is very sensitive (what is PCR?), and not just for detecting the DNA that you want to see. It's quite good at detecting contaminants as well.

You can see in the blast results below that quite a few sequences matched cloning vectors pretty well.


In fact, I wonder if those top two E. coli sequences from "clinical isolates," in Russia and France, were real results or just more PCR contamination. I'm suspicious of that Acinetobacter sequence from China, too.

And I'm suspicious of the Klebsiella sequence from St. Louis, Mo. That one was part of a Klebsiella genome project and annotated by a nice friendly computer that probably doesn't have one skeptical program on its hard drive.

I think it's a problem that as we sequence more stuff, we end up with more sequences in the database whose identification isn't confirmed by any other kind of supporting data.

One of the hard things about science is that you can't just do the experiments that will give you the results you want to see. You also have to think of all kinds of other possible explanations for your results - and test them.


So, now I've explained why I don't believe some of the results. Well, actually I do believe that the sequences are properly identified, I just don't believe that they really originated from the bacterial species on record.

I'll give you all one more chance to think of some other experiments that could be done to test whether the sequences really came from the bacteria that are listed in GenBank.

And sequencing more DNA is NOT the answer.

I was going to say just grow the bacteria on plates containing ampicillin, but I see the strain is known to be resistant to penicillin so would probably grow. Therefore, do a western blot: boil the cells in sample buffer, run them on an SDS-PAGE gel, transfer to a membrane and probe with this antibody. As a positive control do the same with cells containing the pBR322 plasmid with the bla gene. A negative control would be untransformed cells that you know to be sensitive to ampicillin.

I suppose one could argue that the bla gene could be subject to positional silencing within the genome and therefore not be expressing beta-lactamase, or expressing it at such low levels that you might not pick it up on a western blot (of course, then you could hardly say that the presence of the bla gene is what makes these bacteria resistant to penicillin, but anyway). In which case, I agree with the Southern blot idea.

One other caveat is that since the bacteria are resistant to penicillin, they could be making some sort of beta-lactamase type thing which is recognized by the antibody in the western blot. Then you would definitely need the Southern blot.

You bring up an interesting point, Mrs.Whatsit.

If we wrinkle time and look back at the abstract, the authors never say that the bacteria were resistant to beta-lactam antibiotics. They only say that they detected the gene.

Hey, and what about my proposal?
It's easy to grow the bacteria, extract the DNA, make a few RE digests (there are at least 18 common REs with single sites within AF427133, and each of those cuts makes two fragments to be identified), migrate, transfer and hybridize with the PCR product.
For each digest, two bands should be identified by the probe (I don't know which primers were used, so I can't say which REs would be the best choices for leaving enough probe on each side of the cut to hold up under stringent washes).
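
The search for enzymes that cut only once can be sketched in a few lines of Python. This is only an illustration under some assumptions: the enzyme table holds a handful of well-known palindromic recognition sites rather than the 18 mentioned above, the function names are made up, the toy sequence is invented (it is not AF427133), and the reported position is where the recognition site starts, not the exact cut position.

```python
# A few well-known palindromic recognition sites (5' to 3').
# Because they are palindromes, scanning one strand is enough.
ENZYME_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "ScaI": "AGTACT",
    "PvuI": "CGATCG",
}


def site_positions(sequence, site):
    """Return the 0-based start of every occurrence of site, overlaps included."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def single_cutters(sequence, enzyme_sites=ENZYME_SITES):
    """Return {enzyme: site start} for enzymes whose site occurs exactly once."""
    result = {}
    for name, site in enzyme_sites.items():
        positions = site_positions(sequence, site)
        if len(positions) == 1:
            result[name] = positions[0]
    return result


# Toy sequence invented for this example: one EcoRI site, two BamHI sites, one PstI site.
toy_seq = "TTGAATTCAAGGATCCTTGGATCCAACTGCAGTT"
print(single_cutters(toy_seq))
```

An enzyme that cuts once inside the region covered by the probe splits it across two genomic fragments, which is why two bands per digest would be expected.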

Now, on the other hand, if there were contamination from the vector used for Taq polymerase production, shouldn't the negative controls also be positive? I hope they ran "negative controls" for each PCR set, without any DNA input except the primers, at least to monitor for airborne contaminants.

I want to give more people time to comment, so I'll write more about this next week.