Bioinformatics for biotech students: my favorite computer programs

The bioinformatics classes that I teach use web services and web sites as much as possible, but I still find that it's helpful to have programs on our classroom computers. Here is a list of my favorite desktop programs for those of you who might want to add some bioinformatics activities to your biology courses.

Why not use the Web?
Before going on, I should probably explain why we use desktop programs when so many things are available on the web. We do use the web whenever we can. Web services are nice because you can shift the computation burden to someone else's computer. (I think this is perfectly fair in most cases, since we've paid for many of these with our tax dollars.) However, sometimes the desktop versions are much nicer than the versions that you can run through web services. JalView (www.jalview.org) is much nicer as a desktop program than the version we can run through the EMBL web server. Some programs, like FinchTV, only run on desktop computers. And then there is the problem of speed and predictability. If you're at work and the network is unbearably slow, you might find something else to do. If you're teaching a class with limited time, it can be excruciatingly painful to have all of your students wait for an unpredictable amount of time for their results.

My favorite desktop programs
1. A web browser. I like Firefox because of its ability to search for text and highlight all instances of that text in a noticeable color. Plus, there are links at the NCBI web site that don't work in Safari.

2. Adobe Acrobat Reader. We can't read PDF documents without it, and I have to use PDF documents because other file types behave in too many different ways with different web browsers.

3. FinchTV - We use FinchTV to look at trace files and learn about data quality (or lack thereof). Examples are here and here. Plus it's free and runs on most kinds of computers (Windows, Mac, Linux).

4. JalView. We use JalView for viewing multiple alignments of DNA or protein sequences. The alignments themselves are created in either Clustal or Muscle. You can either open the aligned sequences in JalView or load the sequences and then, if you're on the Internet, select the option to generate the alignment.

The only downside of using JalView is that the documentation isn't complete (in my opinion). For example, if we use the option to generate alignments with Clustal, we don't get any information about the parameters that were used for this step. We don't know if the right matrix was used, etc. So - that part is typical, but it isn't good.

You can see some examples of JalView in action here and here.

5. Cn3D. My favorite structure-viewing program. You can do so much more with Cn3D than you can with web plug-ins.

6. Microsoft Word and Excel. Believe it or not, I used Word to help assemble a DNA sequence several years ago. For people who are never going to need or want to program, Word is nice and easy. You can search for strings of text, count characters, work with text, etc. And lots of bioinformatics activities involve those kinds of things, just on a larger scale.

Plus, students can take notes in Word and e-mail the notes to themselves at the end of class.

We use Excel to sort, count, calculate, and graph. Again, it's simple and familiar, and for an introductory course, or people who are never going to need to program, it works just fine.
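For readers who are curious what these same search, count, and sort steps look like in code, here is a minimal sketch in Python. It is only an illustration and not something we use in class: the sequence is made up, and the function names `count_bases` and `find_all` are hypothetical helpers invented for this example. The search string GAATTC is the EcoRI recognition site.

```python
from collections import Counter

# Made-up example sequence, for illustration only.
sequence = "ATGGAATTCGCTAGCTAGGAATTCTTAA"


def count_bases(seq):
    """Count how many times each character appears (like a character count in Word)."""
    return Counter(seq.upper())


def find_all(seq, pattern):
    """Return the 1-based start position of every match (like Find in Word)."""
    seq, pattern = seq.upper(), pattern.upper()
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start + 1)  # report positions counting from 1
        start = seq.find(pattern, start + 1)  # step by one so overlapping matches are found
    return positions


counts = count_bases(sequence)
sites = find_all(sequence, "GAATTC")

# Sort the counts from most to least common, as one might in Excel.
for base, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(base, n)

print("Length:", len(sequence))
print("GAATTC found at positions:", sites)
```

With this made-up sequence, the search reports matches starting at positions 4 and 19. The point is only that the operations are the same ones students already do by hand in Word and Excel.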

7. NJplot. This is a really nice application for viewing and working with phylogenetic trees.

8. Sometimes we also use the Phylip package of programs from Joe Felsenstein at the University of Washington.

Notice - I'm not listing programs like BLAST since we often run that at the NCBI.


What are your favorite desktop programs for teaching bioinformatics?

I am not doing real bioinformatics but rather simple sequence analysis and assembly for gene targeting vectors.
After working with GCG for more than 10 years, it was uninstalled from our central server two years ago. So I had to shift to another program and ended up with Vector NTI. We had to pay quite some money for our licenses a few years ago, but you can get a free license from Invitrogen now which allows you to run the program on three different computers. I finally got used to all this clicking through different menus and windows. Indeed, putting sequences together is much easier than with GCG. What I really appreciate are the vector maps, if one just uses them to get an overview.
Still, sometimes I really miss GCG with all its limitations. E.g., when I have to find some cloning or Southern strategies for long targeting constructs, I still prefer the old GCG mapplot. For this purpose I currently use watcut (http://watcut.uwaterloo.ca/watcut/watcut/template.php), which gives mapplot-like results. Please note that it does not allow ambiguities.

I doubt that Vector NTI is the most useful program to teach bioinformatics (I hope they have improved the handbook since VNTI 7). However, since many biotech companies use VNTI, I guess one should at least have an idea about its basic function to qualify for jobs there.

You should try out MEGA (note, I know the developers, so I may be a bit biased). It has Clustal built in to the program (and they're really proud that their implementation of Clustal does not crash). You can do all your alignments in MEGA, then easily export them into the analysis part of the program and build trees, calculate distance matrices, etc. It's the best desktop molecular evolution software for teaching (and it's pretty good for dealing with small data sets in actual research). It's also set up to BLAST sequences against NCBI from within the program. It does not run on Macs, though.

For popgen analysis, DnaSP is really good. If you ever have polymorphism data, it can do a lot of useful molecular popgen statistics.

I'd like to find a simple, desktop dotplot program that is freeware. I often use the dotplot option in the DNAstar software package, but that software is VERY expensive and crashes a lot.

One bad thing about VNTI: sometimes it messes up exon numbering when you import sequences that contain several genes directly from GenBank.

Rosie - thanks! I especially like using Word to teach about restriction mapping.

Sparc - What is "real" bioinformatics? I would say that you're doing it or at least using it. Are you sure that Accelrys isn't selling some of the old GCG programs? I thought GCG items were some of their most popular products. I'll have to try mapplot; that sounds like fun. I haven't really used Vector NTI, but I suppose down the road, I might.

RPM - I will check out MEGA since I do get tired of Clustal crashing. In fact, that's been one of the reasons that I like JalView - I have no idea what parameters Clustal is using, but the server that it uses seems to work pretty well.

I DO like DnaSP. I used it for a consulting project that I worked on and it was nice. The only downside was having to use Virtual PC so that I could run it on my Mac.

For dotplots - some people use a freeware program called "Dotter." I haven't tried it myself, so I don't know how well it works.

For drawing dotplots in research projects, we use a program that my husband wrote, called "DrawMap." But DrawMap only runs on UNIX, and it's not supported by anyone, so while I would use it for myself, I wouldn't use it for teaching.

Are you sure that Accelrys isn't selling some of the old GCG programs?

I don't know. Our institute shut down GCG for several reasons:

1. it is quite expensive

2. they set up their own BLAST server and mirrored the EMBL database with some custom features, which is now updated daily. GCG had its own data format, and it took quite some time to transform the EMBL updates into this format

3. At that time (I guess it was 2001) GCG programs were limited to 30 kb sequences, which of course is insufficient if you do genome sequencing as some departments in our institute do

4. EMBOSS freeware became available, which offers similar features. You may test EMBOSS programs here:
http://bioweb.pasteur.fr/intro-uk.html
and a list of the programs here:
http://bioweb.pasteur.fr/seqanal/EMBOSS/

This is a great post. I'm a biotech engineering student and our program doesn't have any bioinformatics classes, so I've been trying to get into this on my own. It's still sort of a blur for me...
I'll keep on reading and hope to grasp it a bit better soon enough...

Like Rosie and yourself, I used to use Word for simple sequence analysis and manipulation, but I long ago discovered the joy of BBEdit. It can open genome-sized sequences that Word barfed on, and has powerful grep-based search/replace tools. I would hate to go back to Word, except for fancy formatting uses (BBEdit is a strict text editor). BBEdit only runs on Macs, but several WinTel-using colleagues swear by UltraEdit for similar reasons.

BBEdit: http://www.barebones.com/products/bbedit/
UltraEdit: http://www.ultraedit.com/index.php?name=UE_MoreFeatures

By Guy Plunkett III on 24 Oct 2007

I want to make a distinction between the programs that I use for my work, where I'm working with a computer 8 or more hours a day, and the programs that I think are best suited for a class where students rarely work with computers and if they do, they're using different tools and techniques.

For my work, my favorite text editor is AlphaX, which I found to be much nicer than BBEdit. But I will use vi on occasion, too. It just depends what I'm doing.

One of the principles of good teaching is to make connections and build on things that your students already know. Since they already know how to use Word (at least a little), I start from there.

I don't want to overwhelm them with having to learn several different programs all at once, especially since I'm trying to teach them how to analyze the data, not how to work with text editors.