Teaching in the digital world, part 1a. Excel, Open Office, or Google Docs?

In part I, I wrote about my first semester of teaching on-line and talked about our challenges with technology. Blackboard had a database corruption event during finals week and I had all kinds of struggles with the Windows version of Microsoft Excel. Mike wrote and asked if I thought students should be working more with non-Microsoft software and what I thought the challenges would be in doing so.

I can answer with a totally unqualified "it depends."

First, I think knowing how to use a spreadsheet program is an advantage in many different kinds of fields and even in real life, outside of school. I've been using Microsoft Excel for 19 years for many different kinds of things and it's very useful. Lately, I've been using Google Docs, too, since it lets me share my spreadsheets more easily with other people.

But that's me. If it were totally up to me, I'd say that all the students should just use what I'm using. That would certainly make my life easier. But I also think that would be incredibly selfish and would defeat the purpose of why I teach. I don't teach classes to benefit myself; I teach classes to help the students.

If students had wanted to use Open Office or some alternative, that would have been fine with me. My goal was for them to properly analyze the data.

What about the students?

For students, the choice of a program should depend on these items:

  1. What program will students be expected to know when they graduate?
  2. What program has the features that are needed?
  3. What program are the other instructors using?

Let's continue.

1. What program will students be expected to know when they graduate?

If I taught in a software engineering program or programming area, I would use Open Office without hesitation. I don't know if these kinds of fields ask students to do much in the way of data analysis, but I do know that lots of programmers like open source kinds of things and that's a kind of standard in the programming field.

If I'm teaching in a biotechnology program or bioinformatics, I would use Excel.

I have heard several people from the biotechnology industry describe the skills that they want from future employees. Every single person has said "they should know how to use Excel." When I ran a biotechnology program, our industrial advisory board members all said the same things and in fact, fluency with Excel and excellent lab notebooks were probably the two most important skills that helped our graduates get good jobs.

If students want jobs, they need to be able to write on their résumé that they know how to use Excel. Who am I to deny them that opportunity when everyone else demands it?

Why is Microsoft Excel the industry-standard program in biotechnology?
Because Excel can be validated. People know how to validate it and how to work with it.

If a company makes products related to human health, and is successful, it will be governed at some point by a complex set of regulations, and its software by 21 CFR Part 11. It is much easier for biotech and pharmaceutical companies to work with software that can be validated.

2. What program has the features that are needed?
Unfortunately, while I can use Google Docs widgets to make pivot tables, it's not easy, and I can't parse data into different columns or clean it up very easily. So, while I like it, it's not the complete answer just yet.

As for Open Office, I asked the Open Office community for help with this. The OO community demonstrated, quite convincingly, that Open Office can do the things that I need it to do. I do plan to check it out this summer, though I have some reservations (see #1).

Microsoft Excel can do the things that I need, but there is a very problematic bug with the Windows version and people have warned me about using newer versions of Office for the Mac.

I may end up with a combination of Excel and Google Docs but the jury is still out.

3. What program are the other instructors using?

This last point is important from a pedagogical standpoint.

I first started teaching students how to use software about 15 years ago. I didn't intend to do this. I taught a course in Recombinant DNA technology and I had every expectation that the students would learn about computing in their laboratory computer class.

But I thought the same thing about math. In both cases, I was dead wrong.

I fought it at first, but I knew that would be a disservice to the students. Consequently, I taught lots of math and computing skills throughout the time I was a biotech instructor and even afterward, when I've taught extension courses.

Through this experience, I learned that it is better for students to learn one kind of program (word-processing, spreadsheet, drawing, web-browser, slide-making, molecular viewing) really well. After they have learned one program really well, they will have the confidence and knowledge to transfer those skills to other programs.

Most college courses assume that students already know how to use the programs. They treat using software like college courses used to treat typing when I was an undergraduate. They expected that everyone knew how to do it before starting college, but it wasn't actually a high school requirement because it was considered a vocational subject.

Everyone types, don't they?

Most college classes don't have enough assignments with data analysis to give students a chance to learn any one program really well. Therefore, since all the college computing labs are usually equipped with Excel, and since students can buy it at a highly discounted rate, and since most instructors know how to use it, there is a strong case for using Excel.

If students learn one program well, they can transfer that knowledge to another after they graduate.

I guess I'm an open source atheist. If the open source programs are the standard, and work the best, that's what I'll use. If Microsoft programs are the standard, and work the best, that's what I'll use.

In the end, I care more about students than I do about ideology.



I struggle with the same issues. I use Linux for all of my own work, but my campus is an MS shop. I end up using Excel in my classes, even though OpenOffice.org more than meets my needs (I'm an ecologist). I agree with you that it seems a good thing to not only teach students to use a spreadsheet of some kind, but also to teach them to use the very one they're likely to see in their jobs. At least, that was my opinion up until Office 2007.

Office 2007 is radically different than previous versions, and I'm spending a lot of time re-learning how to do things I already know how to do in Office 2003. At this point it's easier for somebody who is good with Office 2003 to use OpenOffice.org than it is to use Office 2007. So, even students who are trained to use a particular version of Excel are going to have to be able to re-learn their skills as new versions are released. If Microsoft doesn't mind forcing students to re-learn how to use a spreadsheet, I don't see why I should worry about it. Ideally, I'd like my students to recognize that any decent spreadsheet will have a standard deviation function (for example), and be able to figure out the function on the particular package they're using. Personally, I'm seriously considering using OpenOffice.org in the future.

Better to understand what is a standard deviation and when to use it. Pulling the right choice from the spreadsheet's formula well could be done by a chimpanzee (no offense meant to the chimps) no matter who can see the spreadsheet's source code.

I suggest the reason this choice (Excel vs the world) is difficult is precisely because it's not important to your mission.

Learn to cope with diversity and encourage students to do the same.

By Matt Platte (not verified) on 01 Jun 2008 #permalink

"Because Excel can be validated. People know how to validate it and how to work with it."

No, not really. Because it's closed source, it's not possible for others to validate that it's doing what it's supposed to, except by redoing all calculations elsewhere (and hoping there are no mistakes). Further, it's already been invalidated: there are several known statistical problems that have persisted in the program for years, and there are still strange errors that come up with it, e.g., search for Excel and 65535 (although that one has been fixed).

Thanks, Sandra.

Where I work, Excel is the only application available for spreadsheets and the corporation has not upgraded to 2007 for several reasons. Of course backwards compatibility is a huge thing, a new learning curve is another, and expense is a third.

Before I came to work where I am, the question of how strong my Excel skills are came up in every interview so I can vouch for your position.

I understand what Matt is saying, that understanding how to decide which statistical tools to use to analyze results and data is more important than the software tools, but practical limitations and the ability to share data do require additional considerations.

Further, Sandra, your point on how students don't come to college prepared to use the tools required in college indicates to me that the universities are spending so much time on remedial education (including the maths and even language arts) that I wonder at how they manage to achieve their missions.


Validating software for an industrial process is not the same thing that you're describing.

When you validate software in manufacturing, for an FDA-regulated process, you are assuring that it does what the standard operating procedure says it will do.

The advantage for companies that use Excel is that there are several books and guidelines for validating processes that use Excel. As far as I know, the equivalent documentation doesn't exist for other kinds of programs.


Even with Excel, students need to know which standard deviation formula to use: the SD of a sample or the SD of a population.
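
To make the sample-versus-population distinction concrete, here is a minimal sketch in Python rather than in a spreadsheet. It is not from the original discussion, and the data values are made up for illustration. Python's standard `statistics` module offers both forms, much as Excel offers STDEV (sample) and STDEVP (population).

```python
import statistics

# Hypothetical measurements, invented for this illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Sample SD: divides the sum of squared deviations by (n - 1).
sample_sd = statistics.stdev(data)

# Population SD: divides the sum of squared deviations by n.
population_sd = statistics.pstdev(data)

print("sample SD:    ", sample_sd)
print("population SD:", population_sd)
```

For the same numbers, the sample SD always comes out larger than the population SD, so picking the wrong function quietly changes the answer no matter which program is doing the arithmetic.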

I would vote for diversity in programs, but my teaching experience says that's not as easy as it sounds.

Should cost/availability be considered as well? Microsoft Office comes with a hefty price, even for student versions (~$150 IIRC). When compared to OpenOffice (which is free), demanding Excel for students seems to be less favorable.

I don't think that price is unreasonable. At $150, Office is cheaper than a single science textbook. Unlike a science textbook, students can list Excel on their resumes, use the same program for multiple courses in multiple years, and find it in all of their school computer labs.

I've never demanded that students use Excel, only that they be able to complete the assignments. So far, I haven't had any students ask if they can use an alternative program.

you may be treading on thin ice in your use of "validate", there. you seem to mean people have written books describing how to perform certain given business processes with (one hopes, specified versions of) Excel so as to hopefully get the correct answer out of the software; presumably, these books guide their readers around the known bugs in the given versions of Excel in some standardized manner.

now this may be common usage of "validate" in some circles, but at least to me, it feels iffy. who guarantees that the given procedures will actually produce the correct answers, and how do they know that? Excel is closed-source; it has to be black-box analyzed for any process you wish to perform with it, preferably over the entirety of a given range of data, and the analysis repeated for each given version and patch level of Excel.

maybe it's because i'm a programmer, and used to seeing the phrase "formal verification" (of software) as meaning something very much stricter, and my brain making a short-circuit between the words. one certainly cannot formally verify Excel; even Microsoft likely could not do that.

Excel's bugs and problems may be well known, and to some extent possible to work around, but they're most certainly there. i've run into some of them myself. just for one, importing from and exporting into portable file formats (in my case, plaintext CSV files) can mess with data that happens to be large strings of numbers, such as credit card numbers; they should be treated as strings, but Excel tends to treat them as integers instead, occasionally losing precision because of its internal representation of numbers. a little while ago i heard of geneticists massaging gene sequence data in Excel, and having some of their strings randomly converted to date format because Excel thought they resembled the English-language representation of months and dates.
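
A rough Python sketch of both problems, for anyone who wants to check their own files before opening them in a spreadsheet. Everything here is an assumption made for illustration: the CSV content, the column names, and especially the `DATE_LIKE` pattern, which is only a guess at what a spreadsheet might read as a month and day. The round trip through a double shows one way long digit strings lose precision; it does not reproduce Excel's exact behavior.

```python
import csv
import io
import re

# Hypothetical CSV content: a long digit string and a gene symbol per row.
RAW = "id,gene\n9999999999999999,SEPT2\n4111111111111111,BRCA1\n"

# Assumed pattern for symbols that resemble a month name followed by a day.
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)


def read_as_text(raw):
    """Parse CSV and keep every field as a string, never as a number."""
    return list(csv.DictReader(io.StringIO(raw)))


def survives_float(digits):
    """True if the digit string is unchanged after a round trip through a double."""
    return str(int(float(digits))) == digits


def looks_like_date(symbol):
    """True if the symbol matches the assumed month-plus-day pattern."""
    return DATE_LIKE.match(symbol) is not None


for row in read_as_text(RAW):
    print(row["id"], survives_float(row["id"]), row["gene"], looks_like_date(row["gene"]))
```

Reading every field as text, as `read_as_text` does, sidesteps both problems; the two checks only flag values that would be at risk if a program guessed at their type.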

By Nomen Nescio (not verified) on 02 Jun 2008 #permalink

"Validate" is one of those words that has different meanings in different settings.

I don't have time for a detailed answer right now, and you don't have to take my word for this.

A Google search with the phrase "validating Excel in pharmaceutical manufacturing" gives the documents and courses that I'm referring to.

Perhaps (probably!) I got it backwards. General purpose business tools are simply inadequate, and we all agree about that, but it's the *biology* that's off-topic.

Now would be a good time for development of DigitalBIO, a (validated) tool that would accept raw data from field/lab work and produce a glittering variety of charts, graphs and posters -- completely annotated with procedures and formulae. DigitalBIO finally makes good on the promise that the phrase 'electronic book' implies. DigitalBIO is simultaneously a simple field notebook, a worksheet for calculations and statistics, a plain old dead-tree document, and a presentation document with embedded video, animations and live interaction.

Using DigitalBIO, even the laziest student could see exactly how counting arthrobacter in the creek leads to a wacky, one-column-goes-negative bar chart.

DigitalBIO leverages a blend of Open Source and proprietary technologies and tools to produce a cross-platform (Mac, Linux and, if necessary, Microsoft) product.

DigitalBIO comes in several basic configurations and can be customised to fit your own research area. Got special presentation needs, unique data preparation procedures? No problem, as DigitalBIO is fully customizable either by you, your students, the community, your institution's CS department or a variety of developers: No proprietary lock-ins! Best of all, it's built with today's technology, not crufty, fossilized 20th Century code.

Why stop with DigitalBIO? Roll out modified versions for DigitalASTRO, DigitalPALEO, DigitalPSYCHO -- you get the drift.


... well yeah, you're right: it doesn't actually exist, as such. But you are in a good position to make it happen. Looky here:


By Matt Platte (not verified) on 02 Jun 2008 #permalink

I agree with WBK above, in that I too have found Open Office to be much more Excel-like than Excel 2007. Unfortunately, it is true that people ask for Excel, period, and if you want your students to get jobs, you ought to teach them Excel if you can afford it.

Now, if you want to really stir people up, Sandra, you could blog about what programming language biologists should learn!

By Martin Gollery (not verified) on 02 Jun 2008 #permalink

non-programming specialists learning programming languages? if you've a head for it, look into Lisp or a Lisp-derivative like Scheme; if not, stick with Python. get no more exotic than OCaml or Haskell, and for FSM's sake avoid the C derivatives (C, C++, C#, Java and anything lower level than Java). domain-specific languages like Matlab or one of the packages statisticians use instead of Matlab to do nothing but statistics with are fine, too.

basically, any time a programming language forces you to spend any significant amount of thought on how the machine will be running your program and representing your data instead of thinking about what your problem is and how to algorithmically describe its solving, you're in territory better suited to professional programmers, and even they would do well to avoid going there if they can find a tool (compiler) that'll go there for them.

By Nomen Nescio (not verified) on 02 Jun 2008 #permalink

The future of databases and open platforms for research and teaching will involve applications like Caboodle that allow students, teachers, and scientists to create their own data collection forms and reports on the web. It doesn't require expert knowledge like Microsoft Excel and Access and can be shared with ease on the internet. Imagine having the ability to launch a multi-site research project in a matter of minutes with zero startup cost and no maintenance. Caboodle addresses a lot of the issues mentioned in this post.

Nice subtle plug there Mr. Jeffrey Austin White. My question for you is ... why would anyone pay for housing their information at Caboodle, when places like OpenWetWare already exist and are no charge?

Though, to be honest ... I'm not comfortable with the idea of having someone else manage my data.

Mr. Tom Joe, I guess you never went to the site. It is free for personal use! My question for you is: does OpenWetWare comply with FDA 21 CFR Part 11, required for federally funded research projects? The problem with open source applications and file sharing is that they do not comply with federal regulations and HIPAA requirements. That is a show stopper for anybody doing NSF or NIH research.

I haven't tried Caboodle yet and will try to do so before the summer is over.

But, as far as regulations go, FDA 21 CFR Part 11 only applies to a certain class of work that is regulated by GMPs (Good Manufacturing Practices). GMPs are only likely to apply if you're in a human-health related area of biotech.

I did go to the site. It's free up to a point. If you want your data backed up, which would make perfect sense, it's going to cost you. If you want more than 100M of space, it's going to cost you. If you want to collaborate, it's going to cost you. IOW, launching a multi-site research project in a matter of minutes with zero startup cost and no maintenance appears to be free for the first couple of minutes. As soon as you try to share anything, you got to pay. So, if you're only going to use it for personal use, use a notebook IMNSHO, at least then you never have to worry about database downtime and lost data due to a server crash. YMMV, I suppose.

Ok, TomJoe, I jumped in late but went to both sites (OpenWetWare and Caboodle) and they don't seem to offer the same thing. OpenWetWare seems to be a Wiki where people can post information similar to Wikipedia but the other (Caboodle) seems to be a data collection system. Not really the same thing. But if anyone has a data collection and management site that provides free backup and unlimited space, please post it.

By tbiresearch (not verified) on 03 Jun 2008 #permalink

I have to disagree with this:

for FSM's sake avoid the C derivatives (C, C++, C#, Java and anything lower level than Java)

I got through grad school Bioinformatics only by learning Perl and racking my brain to re-learn Java. It's worth it to learn programming languages because someday, in many sorts of biotech industry, you will be running a reactor or helping someone else run a reactor. It will not be working properly, or perhaps you will be called upon to help with the validation. If you know nothing of the many programming languages or how software is written, you will be utterly lost at sea. And the support engineers, they go home at 9pm whether the reactor is working properly or not, but you will still be on your boss' s**tlist if the reactor is down the next morning.

Just last week, I sent a DNA construct over to an automated sequencing lab. They told me my construct was not there, that I had sent them vector. I triple-checked, the construct was right there on the many gels I ran. We dickered back and forth, I sent them new clones, new minipreps, new versions of the construct. Somehow, I had managed to find the ONE sequence on the face of the earth for which their particular software could not design a primer. It happens. The difference between the tech who told me I had screwed up the DNA prep and the lab supervisor was, the lab supervisor knew the programming well enough to figure out how the software was going wrong in the primer design.

I agree with tbiresearch that the two applications (OpenWetWare and Caboodle) are very different. I have been using Caboodle now for 3 months both for personal and research purposes and I haven't been able to find any other software package that gives me the flexibility to create my own data collection scenarios with minimal effort. As for the cost involved, considering what it would cost to hire someone to create this kind of solution, I think the $55 per month that it's costing me per user to collect this data is well worth it. My research project is only going to last for another 3 months. After doing the calculations, it is much more cost effective to use Caboodle. If there are any free applications out there with the same features, I wasn't able to find them. If anyone else can, please post.

By Jason Tepper (not verified) on 04 Jun 2008 #permalink