A long standing debate in my field is whether or not biologists, who work with computers, need to learn how to program. I usually say "no." Let the programmers program, the biologists interpret the results, and let everyone can benefit from each other's expertise.
Well, I've changed my mind in one respect. Most biologists need to work with some kind of database these days and I've discovered that it's really helpful to know something about SQL. Even a tiny bit of SQL, like "SELECT * from table" goes a long, long way.
This revelation didn't happen overnight and when I decided a few months ago that I wanted to learn SQL, I wasted a bit of time trying to find the best place for learning. I began by picking up a copy of "SQL for dummies" and starting to read. And ask questions.
It wasn't long before my husband took the book out of my hand, glanced through it, and told me to throw it away.
He suggested I consult Google instead and check out some SQL tutorial sites. Today, I'm going to share what I found.
In many respects, I'm a kinesthetic or "hands-on" learner. Reading is okay, but if I go to bed with an O'Reilly programming book, all I get from it is a good night's sleep. The information just doesn't stick unless I try to use it. Even some of the biological phenomena that fascinate me, like transposons that jump between molecules of DNA, were really, really bothersome, until I figured out how to make paper models and mimic the process myself.
Naturally, my patience and interest quickly vanish when I hit web sites that read like a UNIX man page.
Yawn.
They are helpful, once I understand what I'm reading. But they didn't help me get started. Still, I do use them as a reference.
- The introduction to SQL from W3schools.com is fine if you want the Dragnet version of SQL ("just give us the facts ma'am, just the facts.")
- You can fill in the details with SQL guides.com. They even use brightly colored text! I always fall for things like that.
But just reading about SQL was not enough for me. I needed a safe place to practice. And, I found it at the SQLzoo. They have nice tutorials and sandboxes where you can do the essential thing and practice writing your own SQL statements.
Plus, for those of us who must be multilingual when it comes to flavors of database, SQLzoo covers all kinds of databases from postgreSQL to Oracle. If you want to have some with SQL, I highly recommend the SQLzoo.
- Log in to post comments
I love messing around with programming, and it has served me pretty well with my research. Of course, basic SQL is not programming, it is just using a database. Those of us who started out in Unix or Dos, and with dBase, used SQL as the only way to access our data (spreadsheets were not powerful enough at the time, and of course still have limitations).
The resources you suggest are good. I also suggest, if you want a book, you try this:
http://www.oreilly.com/catalog/sqlnut/
O'Reilly books are generally very good.
The bookshelf next to my bed has about 20 books on programming, including a few PHP/SQL titles (the two are often used together because the main use of SQL seems to be in building dynamic web pages, if you go just by what books are available), a few titles on BASH programming, a couple each for Python and Perl, and my favorite, Awk and Sed. I love awk and sed.
Oh, let me add this: Regarding the first on line resource you cite for SQL ... if you go to the top of that site, you will find that the same site covers HTML and other important areas, also very nicely. That is my main HTML reference site.
Well, okay, querying a database isn't really programming. But as far as most biologists are concerned, it's close enough. :-)
The worm community used AceDB for 10+ years and a good fraction of them learned its query language, AQL. So it can be done. So put an SQL database on their desktops full of data they need and sure biologists *can* learn. But only a small fraction of them will like it or take to it.
For databases I think it is better to give biologists a big spreadsheet of data. Most datasets fit easily in a spreadsheet. More intuitive, easier to use and all around more useful.
So where are the biologists who are using Excel well? Someone near and dear to me is very good in Excel, has no trouble finding work whenever she feels like rebuilding other people's cranky and ill-designed spreadsheets til they work, and work better and faster.
I'd love to point her toward people who use Excel for scientific work to do more interesting things than chase dollars around.
Jim: I agree. Most biologists are not going to try SQL unless they have to. I'm an example of that. I think a combination of SQL and Excel works the best.
Hank: I think almost all biologists use Excel or something like it. Of course, that doesn't mean they use it very well or that they'll hire anyone to do it better. If biologists are going to hire people to work on spread sheets for them, they would expect the people they hire to have wet lab skills, too, and to be good lab techs.
Sandra, you've brought up a point that is near and dear to my heart. I have a background in programming, desktop publishing, wet lab, bioinformatics (metagenomic sequencing analysis)and of course, Excel, Access, and PowerPoint. All handy knowledge for start-ups. I've been working for others for a long time and decided to start my own Virtual Assistance business and work out of my home. I really enjoy the variety of work that I'm doing, but would love to add more bioinformatics to the mix. Finding start-ups that need my help, have the money to pay for my services, and the ability to overlook my lack of academic qualifications is tricky :-) Any suggestions?
Thanks Melodee,
I know a few people around who have small, consulting businesses like yours. Our company started with a similar premise before we decided to focus more on building products.
I'm probably not telling you anything new, but I suspect the best way to find potential customers is to network like crazy. Go to as many conferences as you can. Get involved with your local biotech association so you can meet people. Advertise at your local University. And make use of tools like LinkedIn.
Greg Laden: SQL *is* programming. Even simple SQl commands are programs. As you probably know, SQL stands for Structured Query Language. So a better way to think about it is a simple SQL statement is a simple program. The confusion might be that SQL is a functional language, and not a imperative programming language. If you are only familiar with imperative languages or their cousin object oriented languages (C, Java, C++, Basic, etc), you might not recognize a SQL statement as a program. SQL is also a domain specific language, and not Turing complete, so anything cannot be expressed in SQL. But still...it is programming. Just a very specific type of programming.
SQL, like all computer languages, is designed for the convenience of the machine, not the user. Far better, in principle, to use a good graphical interface, allowing you to select fields and operations simply by clicking. Squirrel SQL is one possibility. I haven't tried it, but it's in the right direction.
Ropty: I agree with you, but I don't think the presence of the word "language" is a strong argument for calling SQL a language. If you continue to follow that logic, then presentation languages like HTML and XML would be programming languages too, and they're not.
John: It is better to have a good graphical interface. But, if a graphical interface is the only tool for making queries, it can be too limiting. I have this problem with the NCBI databases all the time.
If a graphical interface is your only tool, it might be hard to allow scientists to ask novel questions that haven't been anticipated by the software architects.
I want the ability to ask new questions. SQL gives me that ability.
I hate to break this to you, but IIRC since the 1999 revision of the standard, SQL supports recursion.
That makes it Turing complete, which makes it a programming language.
So, every-time you've written a query you've been programming.
Not that there's anything wrong with that. =)
Um, no. The machine has no concept of convenience, and would be just as happy (or not) to get its input as 1s and 0s or hex. Computer languages are written for the convenience of humans. Those humans are just programmers, not end users.
IIRC, SQL in particular was designed to be more human-friendly than other computer languages... and by and large, it is. At least compared to C, or Perl regular expressions, IMO.
I'm always trying to push SQL on fellow students. So many times I see people hand editing data in text files or Excel spreadsheats that would be so simple to manipulate with a SELECT statement and a JOIN.
That and find/replace with regular expressions are the two things scientists should need to know to be considered computer literate.
As a programmer, I've run into this limitation with every graphical programming interface I've ever used. A good GUI is indispensable when you're trying to make the easy things as easy as possible, but writing directly to a text representation of the program (or query, or whatever) gives you the freedom to build things that no GUI designer has anticipated, and that power always ends up being worth the slightly higher entry barrier.
Hank: I think almost all biologists use Excel or something like it. Of course, that doesn't mean they use it very well or that they'll hire anyone to do it better.
Almost all biologists use Excel, but I've never heard of any of them building the kind of complicated Excel machinery that you get in business, and to which I understood Hank to be referring.
I'd go a step further, and recommend learning a scripting language like ruby, python or perl. All of these languages easily support database queries, and can be very handy for mangling data into different formats.
Once you've picked up a scripting language, you start to grasp the power of automation at your hands. There's literally no form of data you can't manipulate via available libraries.
Ropty:
(regarding "SQL is not prgramming")
You are correct. I can't believe I said that. I don't know why I said that. I'm the kind of guy who could be convinced that HTML is "programming" because it is code. I look back at that post and say "What a troll this guy is, who wrote that?" then I look and it is me, and I stare in disbelief.
In fact, you could probably write pretty much any program (functionally) in SQL as long as you had a few outside procedures you could call. You might be using databased to store some data structures that would in another language, etc. and it would be clunky, but you could do it.
You wouldn't want to do that, but you could.
Imagine this: Get a medium-complex single function computer program project, and set up a team to complete the project in something appropriate (like C or whatever), and another team in SQL and another in, say, Awk, oh, and a team working in BASH. How cool would that be.
(this probably happens in dorm rooms at MIT every other saturday night)
SQL is a very handy tool. While it takes some time to design a schema, create tables, import data, etc, once you have it in place get data out is fairly straightforward. And while Excel has its uses, I really haven't used it in 4 years.
A good spreadsheet, like gnumeric (or Excel if you don't mind giving up a lot of computation and mathematical flexibility and don't care about the graphs) is one thing ... two (kinda three) dimensional, visual, and allows for in place modeling. A database is lousy at all those things but is really really good at data (big surprise!).
People use spreadsheets as databases all the time, and it can work, but it can also go very badly. For the last (still onging) big data project I've worked on, we went with the spreadsheet model. I am about to convert everything back to data for a final statistical analysis. The spreadsheed did not fail us, I'm just worried about standards and documentation and such.
@engtech - I definitely agree. In fact, I don't know HOW anyone can do any work without being able to program (at least trivially). Knowing even the smallest bit of Python (or R) is a life-saver.
I shudder every time I watch people in my department laboriously cutting and pasting and manipulating data in something like Excel or SPSS (both EVIL pieces of crap).
--Simon