Sequencing the campus at the Johns Hopkins University

A few years ago, the General Biology students at the Johns Hopkins University began to interrogate the unseen world. During this semester-long project, they study the ecosystems of the Homewood campus and engage in novel research by exploring the microbial ecosystems in different sections of the campus. Biology lab students gather environmental samples from different campus ecosystems, isolate DNA, amplify 16S ribosomal DNA by PCR, and check their PCR results by gel electrophoresis.

DNA samples are next sent to the university's Genetic Resources Core Facility, where scientific staff in the DNA Analysis Facility prepare the DNA templates for sequencing and load the completed reactions onto an Applied Biosystems 3730 Genetic Analyzer.

The past few years have seen some changes in this process. Data used to be retrieved by logging into an FTP site that allowed anyone to access data from any investigator.

In recent years, this JHU core facility obtained a Geospiza Finch Server. Now, instead of using an FTP site, staff upload experimental data, in the form of electropherogram files (aka chromatogram or trace files), into a secure system (the Finch Server) for analysis and delivery. Students can log in, but they can only access student data. During the past two years, almost 500 JHU students have logged into the Finch Server to retrieve and view their data.

In the next part of the project, students use BLAST (and our BLAST for beginners tutorial) to query GenBank at the NCBI and determine which bacterial species were isolated.

All about Phred
One issue that comes up, though, is the quality of the data. Data quality can be a problem when students do PCR for the first time in a lab class. The image below shows a screenshot from a Finch Server illustrating the distribution of high-quality and lower-quality data from this year's set of 87 chromatograms.

[Image: Finch Server screenshot, histogram of high-quality bases per chromatogram]

You can see from the histogram that about 20% of the chromatograms have fewer than 50 high-quality bases. We're defining high-quality bases as base calls with a Phred score greater than 20. Phred, KB, and TraceTuner are programs that estimate the probability of an incorrect base call and report it as a quality score. A Phred score of 20 corresponds to a 1% chance of a base-calling error, and every additional 10 points lowers that chance tenfold.
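To make the arithmetic concrete, here is a minimal Python sketch of the relationship between a Phred score Q and the error probability P = 10^(-Q/10), along with a count of high-quality bases in one read. The function names and the example scores are made up for illustration; this is not how the Finch Server computes its histogram.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def count_high_quality(scores, threshold=20):
    """Count base calls whose Phred score is strictly greater than the threshold."""
    return sum(1 for q in scores if q > threshold)


# Hypothetical per-base quality scores for a very short read.
example_scores = [8, 12, 25, 31, 40, 20, 45, 9]

if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error_probability(q):.4f}")
    print("High-quality bases:", count_high_quality(example_scores))
```

Note that a base with a score of exactly 20 is not counted here, since the definition above is "greater than 20."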

What this histogram fails to show, though, is how the high-quality bases are distributed in a DNA sequence and where they're located. The Finch Suite has programs that will trim poor-quality regions of a sequence, but sometimes it's still nice to see what your data look like.
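For readers curious about what quality trimming can look like, below is a rough Python sketch of one common idea: score each base as (cutoff minus error probability) and keep the contiguous stretch with the highest total. This is an assumption-laden illustration, not the Finch Suite's trimming method; the function name `trim_by_quality`, the 0.01 cutoff (equivalent to Q20), and the example read are all hypothetical.

```python
def trim_by_quality(scores, cutoff=0.01):
    """Return (start, end) of the best high-quality segment; end is exclusive.

    Each base contributes cutoff - P(error). Bases better than the cutoff add
    to the running total, worse bases subtract from it, and the segment with
    the largest total is kept. Returns (0, 0) if no base beats the cutoff.
    """
    best_sum = 0.0
    best_start = best_end = 0
    running = 0.0
    start = 0
    for i, q in enumerate(scores):
        running += cutoff - 10 ** (-q / 10)
        if running <= 0:
            # The segment so far is not worth keeping; restart after this base.
            running = 0.0
            start = i + 1
        elif running > best_sum:
            best_sum = running
            best_start, best_end = start, i + 1
    return best_start, best_end


if __name__ == "__main__":
    # Hypothetical read and per-base Phred scores.
    sequence = "GGACGTTAGC"
    scores = [5, 8, 30, 35, 40, 10, 38, 42, 6, 4]
    start, end = trim_by_quality(scores)
    print(start, end, sequence[start:end])
```

With these made-up scores, the sketch keeps positions 2 through 4 ("ACG"): the single Q10 base in the middle costs more than the two good bases after it can earn back. That kind of trade-off is why it helps to look at where the good bases sit, and not only how many there are.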

I want my FinchTV
In the next step, students look at their chromatogram data in yet another Geospiza program, available for free, called FinchTV. As you can see below, they select the high-quality region of the sequence, and they can query different databases at the NCBI just by choosing BLAST sequence.

[Image: FinchTV screenshot, chromatogram with the high-quality region selected]

(Warning, potential bias alert: I do work for Geospiza, but I still think this is cool!)

Through this process, students learn, first-hand, about the diversity of microbial life on the campus all around them and the genetic code that's used to store information in DNA. They also learn about DNA sequence analysis and bioinformatics. Since many of these students plan to attend medical school, this lab serves a critical need in acquainting future doctors with molecular diagnostics.

Both the Finch Server and the core lab staff in the DNA Analysis Facility were important for the success of the project. "We never could have done this without the advice and help we got from the people in core lab," said Dr. Rebecca Pearlman, course instructor. "We were able to get our data, talk about quality, and complete a BLAST search in a single class period."

Pearlman adds, "Our students relish the opportunity to do genuine research. They get really excited when they learn they're using the same techniques for bacterial identification that are used by the public health departments."

Of course, I'm writing about this partly because I get to help out, too. We're making custom BLAST-formatted databanks from each session of the course, so we can do quick comparisons between different data sets, among other things. Over time, these data will allow students at JHU to study changes in bacterial composition from year to year.

And who knows? There are loads of bacteria in every little bit of dirt. What could be cooler than discovering a new species in your first quarter of college biology?


Copyright Geospiza, Inc.
