bioinformatics

Goodbye desktop, we're off to see the web. Both my students and I have been challenged this semester by the diversity of computer platforms, software versions, and unexpected bugs. Naturally, I turned to the world and my readers for help and suggestions. Some readers have suggested we could solve everything by using Linux. Others have convincingly demonstrated that Open Office is a reasonable alternative. But, now there's something new and cool on the web. Okay, it's still in the beta stages, and apparently it can only be used by a limited number of people at a time, but it's certainly…
Is it real or is it April Fools? The March 21st issue of Science has an interesting news article by Elizabeth Pennisi and a letter to the editor about a proposal to wikify GenBank. Currently, the NCBI holds the original authors responsible for editing or correcting entries and this does cause problems when those authors fail to return to the scene and fix what they've submitted. Some researchers are suggesting that third parties be allowed to fix some of those mistakes or at least add comments to records, to warn the unwary. There are some good arguments on both sides and it's certainly…
I made this video (below the fold) to illustrate the steps involved in making a phylogenetic tree. The basic steps are to: build a data set, align the sequences, and make a tree. In the class that I'm teaching, we're making these trees in order to compare sequences from our metagenomics experiment with the multiple copies of 16S ribosomal RNA (rRNA) genes that we can find in single bacterial genomes. Bacteria contain between 2 and 13 copies of 16S rRNA genes and we're interested in knowing how much they differ from each other. Later, we'll compare the 16S ribosomal RNA genes from multiple species of…
Have you ever wondered how to view and annotate molecular structures? At least digital versions? It's surprisingly easy and lots of fun. Here's a movie I made that demonstrates how you can use Cn3D, a free structure-viewing program from the NCBI. Luckily, Cn3D behaves almost the same way on both Windows and Mac OS X. Introduction to Cn3D from Sandra Porter on Vimeo.
One of my colleagues has a two part series on FinchTalk (starting today) that discusses uncertainty in measurement and what that uncertainty means for the present and Next Generation DNA sequencing technologies. I've been running into this uncertainty myself lately. I have always known that DNA sequencing errors occur. This is why people build tools for measuring the error rate and why quality measurements are so useful for determining which data to use and which data to believe. But, some of the downstream consequences didn't really hit home for me until a recent project. This project…
I think many of us (me, the students, the OO advocates, a thoughtful group of commenters, some instructors) learned some things that we didn't anticipate the other day and got some interesting glimpses into the ways that other people view and interact with their computers. Some of the people who participated in the challenge found out that it was harder than they expected. Lessons learned: Okay, what did we learn? 1. The community is the best thing about Open Source. The Open Office advocates enjoy a challenge and are truly quite helpful. That was something that adventure…
It's a Solexa data directory. I've held off on blogging about Next Generation Sequencing here, but now that one of my colleagues has started blogging about it, it seems like a good time to write a little about FinchTalk, our company blog. We've decided that we can serve an educational role for people who are interested in Next Generation DNA Sequencing. Certainly, FinchTalk is our company blog and it is a place where you can expect to read about our products. But, we've noticed that quite often, the sexy technologies and fancy graphs get the press and the practical aspects - how do…
Okay OpenOffice fans, show me what you can do. Earlier this week, I wrote about my challenges with a bug in Microsoft Excel that only appears on Windows computers. Since I use a Mac, I didn't know about the bug when I wrote the assignment and I only found out about it after all but one of my students turned in assignment results with nonsensical pie graphs. So, I asked what other instructors do with software that behaves differently on different computing platforms. I never did hear from any other instructors, but I did hear from lots of Linux fans. And, lots of other people kindly…
I read about this in Bio-IT World and had to go check it out. It's called the Genome Projector and it has to be the coolest genome browser I've ever seen. They have 320 bacterial genomes to play with. Naturally, I chose our friend E. coli. The little red pins in the picture below mark the positions of ribosomal RNA genes. (It's not perfect; at least one of these genes is a ribosomal RNA methyltransferase and not a 16S ribosomal RNA.) I'm not entirely happy about finding it now, after I've already written and posted all the assignments for my class, but still, I'll post a link for my…
Over at evolgen, ScienceBlogling RPM discusses a paper that describes a new barcoding technique for plants. It struck me while reading his post that barcoding has two very different meanings, even though both techniques are used in genomics--and often, at the same time. One meaning of barcoding, and the one discussed by RPM, is the use of a gene to assign different groups of organisms a taxonomic DNA label (or barcode...). In other words, we're replacing Latin binomials, like Escherichia coli or Homo sapiens, with a DNA sequence from a single gene (or a set of closely related sequences).…
The other day, I wrote that I wanted to make things easier for my students by using the kinds of software that they were likely to have on their computers and the kinds that they are likely to see in the business and biotech world when they graduate from college. More than one person told me that I should have my students install an entirely different operating system and download OpenOffice to do something that looks a whole lot harder in Open Office than it is in Microsoft Excel. I guess they missed the part where I said that I wanted to make the students' work a little easier. Before I go…
The NASA Earth Observing System is an incredible resource for both science and education. One of the amazing things about it is that all the different kinds and quantities of data are assembled together into pictures that even grade school kids can immediately comprehend. How do they do it? Each of the EOS satellites delivers a terabyte or more of data per day from many different instruments. How do they take satellite imagery, rainfall statistics, temperature information, and other kinds of data and assemble these data into meaningful pictures? The answer is HDF (hierarchical data format…
Three (or more) operating systems times three (or more) versions of software with bugs unique to one or more systems (that I don't have) means too many systems for me to manage teaching. Thank the FSM they're not using Linux, too. (Let me see, that would be Ubuntu Linux, RedHat Linux, Debian Linux, Yellow Dog Linux, Vine, Turbo, Slackware, etc. It quickly gets to be too exponential.) Nope, sorry, three versions of Microsoft Office on three different operating systems are bad enough. This semester, I'm teaching an on-line course for the first time ever. The subject isn't new to me. I've taught…
Here's a fun puzzler for you to figure out. The blast graph is here: The table with scores is here, click the table to see a bigger image: And here is the puzzling part: Why is the total score so high? If you want to repeat this for yourself, go here. You can use this sequence as a query (it's the same one that I used). >301.ab1 CTAGCTCTTGGGTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCCGATGGAG GGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGTGGGGGA CCTTCGGGCCTCACACCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGG CTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGA…
http://plazi.org/ Donat Agosti's group has launched Plazi, a set of tools that translates flat paper taxonomy into dynamic web content. This technology is significant: it means the content of old literature can be extracted automatically into databases. Taxonomic names are tracked and linked to external information, and collecting locations are linked to maps. This will be a valuable time-saver for taxonomic research. As an example, my doctoral thesis was a fairly traditional piece of work: a book length taxonomic revision, all done in flat text on a word processor. Plazi has turned it…
Do different kinds of biomes (forest vs. creek) support different kinds of bacteria? Or do we find the same amounts of each genus wherever we look? Those are the questions that we'll answer in this last video. We're going to use pivot tables and count all the genera that live in each biome. Then, we'll make pie graphs so that we can have a visual picture of which bacteria live in each environment. The parts of this series are: I. Downloading the data from iFinch and preparing it for analysis. (this is the video below) (We split the data from one column into three). II. Cleaning up the data…
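For readers who would rather script the count than build it in a spreadsheet, here is a minimal Python sketch of the same pivot-table idea: tally each genus within each biome, then report its share of that biome (the numbers a pie graph would show). The records below are invented for illustration and are not the class data, and the function name `count_genera_by_biome` is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (biome, genus) records, invented for illustration only.
# These are NOT the students' results.
records = [
    ("forest", "Bacillus"),
    ("forest", "Pseudomonas"),
    ("forest", "Bacillus"),
    ("creek", "Pseudomonas"),
    ("creek", "Aeromonas"),
    ("creek", "Pseudomonas"),
]


def count_genera_by_biome(rows):
    """Return {biome: Counter({genus: count})}, much like a pivot table."""
    table = defaultdict(Counter)
    for biome, genus in rows:
        table[biome][genus] += 1
    return dict(table)


pivot = count_genera_by_biome(records)

# Print one row per (biome, genus) with the count and its share of the biome.
for biome in sorted(pivot):
    total = sum(pivot[biome].values())
    for genus, n in sorted(pivot[biome].items()):
        print(f"{biome}\t{genus}\t{n}\t{n / total:.0%}")
```

The per-biome percentages are what each pie slice would represent; drawing the graph itself is left to whatever plotting tool you prefer.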
This is the third video in our series on analyzing the DNA sequences that came from bacteria on the JHU campus. In this video, we use a pivot table to count all the different types of bacteria that students found in 2004 and we make a pie graph to visualize the different numbers of each genus. The parts of this series are: I. Downloading the data from iFinch and preparing it for analysis. (this is the video below) (We split the data from one column into three). II. Cleaning up the data III. Counting all the bacteria IV. Counting the bacteria by biome Part III. Pivot tables from Sandra Porter on…
What do you do after you've used DNA sequencing to identify the bacteria, viruses, or other organisms in the environment? What's the next step? This four part video series covers those next steps. In this part, we learn that a surprisingly large portion of bioinformatics, or any type of informatics, is concerned with fixing data entry errors and spelling mistakes. The parts of this series are: I. Downloading the data from iFinch and preparing it for analysis. (this is the video below) (We split the data from one column into three). II. Cleaning up the data III. Counting all the bacteria…
For the past few years, I've been collaborating with a friend, Dr. Rebecca Pearlman, who teaches introductory biology at the Johns Hopkins University. Her students isolate bacteria from different environments on campus, use PCR to amplify the 16S ribosomal RNA genes, send the samples to the JHU core lab for sequencing, and use blastn to identify what they found. Every year, I collect the data from her students' experiments. Then, in the bioinformatics classes I teach, we work with the chromatograms and other data to see what we can find. This is the first part of a four part video series…
I love the way you show me secret things. All I do is type: Select * from name_of_a_table And you share everything with me. Without you, my vision is obscured, and all I see is the display on the page. In fact, this was the push that finally made me decide to learn SQL. In our bacterial metagenomics experiment, I realized that my students could use FinchTV to enter their blast results into our iFinch database. That was cool, but with the web interface, we could only view one result at a time. On the other hand, if we use the right SQL query in the iFinch query window, we can see…
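To make the "see everything at once" idea concrete, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The iFinch schema isn't described here, so the table name `blast_results`, its columns, and the rows are all hypothetical stand-ins, not the real database.

```python
import sqlite3

# In-memory database; the table, columns, and rows are hypothetical
# stand-ins and do not reflect the actual iFinch schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blast_results (sample TEXT, biome TEXT, genus TEXT)"
)
conn.executemany(
    "INSERT INTO blast_results VALUES (?, ?, ?)",
    [
        ("s1", "forest", "Bacillus"),
        ("s2", "creek", "Pseudomonas"),
        ("s3", "forest", "Bacillus"),
    ],
)

# "Select * from name_of_a_table": every row and column in one view.
all_rows = conn.execute("SELECT * FROM blast_results").fetchall()
for row in all_rows:
    print(row)

# A slightly more targeted query: how many results per genus?
genus_counts = conn.execute(
    "SELECT genus, COUNT(*) FROM blast_results GROUP BY genus ORDER BY genus"
).fetchall()
print(genus_counts)

conn.close()
```

The first query is the one from the post; the `GROUP BY` query is an added illustration of how one statement can summarize all the results instead of viewing them one at a time.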