In the same basic vein as last week’s How to Read a Scientific Paper, here’s a kind of online draft of the class I’m going to give Friday on the appropriate ways to present scientific data. “Present” here meaning the more general “display in some form, be it a talk, a poster, a paper, or just a graph taped into a lab notebook,” not specifically standing up and doing a PowerPoint talk (which I’ve posted about before).
So, you’ve made some measurements of a natural phenomenon. Congratulations, you’ve done Science! Now, you need to tell the world all about it, in a compact form that allows the viewer to make a good assessment of your results. Here are some rough notes on the best ways to go about this, starting with:
STEP ZERO: Know what point you’re trying to make. If you’re trying to interpret brand-new data, in the privacy of your own lab, office, or coffee shop, you can just slap together any quick-and-dirty sort of graph that you like, so you can see what you’re dealing with. When you’re preparing to present data to somebody else, though, you need to have a specific purpose in mind. Are you just comparing two numbers? Looking at how some property changes over time? Trying to characterize a distribution of numbers? Different goals will be best served by different types of presentations, and it’s important to have a clear idea of what you want to accomplish, so you can choose the right sort of graph for the job.
STEP ONE: Know your options. There are a whole host of different options when making a graph, as even a casual glance at Excel will show you. Some of these are versatile and powerful; others are useful for such a ridiculously narrow range of purposes that I’ve never seen one used effectively. And, of course, if you look into data visualization, you’ll find a whole community of people who are really hard-core about this stuff, crafting wholly original graphics specifically designed for each new data set they work with.
If you’re at a point where you have need of my input, though, there are really only a handful of options that you need to be aware of. As you get a better feel for your subject, you can start to explore others, but these will get you started. The “starter set” of data presentation methods, with their appropriate applications, is:
Option 0: Data Table On the one hand, you can’t go wrong with just giving the reader the numbers and having done with it. There’s almost no way to mislead people with a table full of raw data, and they can always do their own analysis of the data and make whatever kind of graph they like best. The downside is that this is pretty much a punt, an admission that you really don’t have any good idea how to visualize your numbers.
Option 1: Scatter Plot This is the most basic form of plot you can make: you just plot one quantity versus another, as a bunch of dots on a square field. As basic as it is, though, it’s the best application for a lot of data in physics. If you’re looking at the position of a moving object, or the rate at which something heats up, a scatter plot is the way to go. It gives you a very clean presentation of the data, and an easy way to analyze the behavior as you make quantitative changes in some parameter.
A special case of the scatter plot is the logarithmic plot, which is the appropriate choice for data spanning a range of multiple orders of magnitude, such as book sales. This is a little trickier to interpret, but still has the same ability to give you a clear idea of quantitative trends.
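To see why a logarithmic axis helps, here is a minimal sketch in Python. The numbers are made up for illustration (they are not real sales figures), and the helper names `decades_spanned` and `axis_fractions` are hypothetical. It works out where each point would land along a linear axis and along a logarithmic one.

```python
import math

# Made-up values spanning several orders of magnitude (illustrative only).
values = [3, 40, 500, 6_000, 70_000, 800_000]

def decades_spanned(data):
    """Number of orders of magnitude between the smallest and largest value."""
    positive = [v for v in data if v > 0]  # a log axis cannot show zero or negatives
    return math.log10(max(positive) / min(positive))

def axis_fractions(data, log=False):
    """Where each point falls along an axis, as a fraction from 0.0 to 1.0."""
    scaled = [math.log10(v) for v in data] if log else list(data)
    low, high = min(scaled), max(scaled)
    return [(s - low) / (high - low) for s in scaled]

linear = axis_fractions(values)
logarithmic = axis_fractions(values, log=True)

print(f"decades spanned: {decades_spanned(values):.1f}")
for v, lin, lg in zip(values, linear, logarithmic):
    print(f"{v:>8}  linear: {lin:.4f}  log: {lg:.4f}")
```

On the linear axis, everything but the largest couple of points gets squashed into the first few percent of the plot; on the logarithmic axis, the same points are spread out roughly evenly.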
The students in my class are currently making their own measurements of time-related quantities, and among the obvious choices there, the long-term measurement of a timer’s performance is best demonstrated by a scatter plot:
[Figure: scatter plot of the long-term timer measurements]
This clearly shows that there’s a linear drift for each timer, and a simple linear fit to the data lets you determine the rate of that drift.
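For the curious, a simple linear fit of this sort can be sketched in a few lines of Python. The readings below are invented for illustration (they are not the class data), and `linear_fit` is a hypothetical helper implementing an ordinary least-squares fit; any spreadsheet or plotting package will do the same job.

```python
# Made-up readings for one timer (illustrative only, not measured data):
# elapsed time in hours, and how far the timer has drifted from a reference, in seconds.
elapsed_hours = [0, 12, 24, 36, 48, 60, 72]
offset_seconds = [0.0, 0.6, 1.1, 1.9, 2.4, 3.1, 3.5]

def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = linear_fit(elapsed_hours, offset_seconds)
print(f"drift rate: {slope:.4f} s/hour ({slope * 24:.2f} s/day)")
print(f"offset at t = 0: {intercept:.3f} s")
```

The slope is the drift rate you’re after, in whatever units the two axes use (here, seconds of drift per hour of running).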
Option 2: Bar Graph This is the other most common form of graph, the one you’re most likely to see in the newspaper or on TV. Here, you represent the magnitude of some quantity by the length of a horizontal or vertical bar.
A bar graph is the appropriate choice when you want to compare a small number of qualitatively different scenarios, and so it’s very common in social-science sorts of applications, comparing the earnings of people with different levels of education, for example. There’s no simple quantitative relationship between the different levels that would let you make a scatter plot (I suppose you could do “years of schooling” as one axis, but that’s kind of contrived…). If you’re comparing two or more quantities for each of your qualitatively different conditions, bar graphs give you a very quick visual way to identify the relative sizes.
Bar graphs are also one of the easiest forms to make annoyingly deceptive, so they need to be used with care.
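One common way this happens (my example here, chosen for illustration) is starting the value axis somewhere other than zero, so that small differences between the values look like large differences between the bars. A minimal sketch, using made-up numbers and a hypothetical helper `apparent_ratio`:

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b when the axis starts at baseline."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)

# Made-up numbers for illustration: two quantities that differ by about 4%.
group_a, group_b = 52.0, 50.0

honest = apparent_ratio(group_a, group_b)                    # axis starts at zero
truncated = apparent_ratio(group_a, group_b, baseline=48.0)  # axis starts at 48

print(f"true ratio of values:        {group_a / group_b:.2f}")
print(f"bar-length ratio from zero:  {honest:.2f}")
print(f"bar-length ratio from 48:    {truncated:.2f}")
```

With the axis starting at 48, one bar is drawn twice as long as the other, even though the underlying values differ by only a few percent.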
Option 3: Histogram At first glance, a histogram appears to be just a special case of a bar graph, but it’s different enough to rate its own category. When you make a histogram, you’re not representing the size of a single parameter, but characterizing a distribution of measurements. For a histogram, the lengths of the bars represent the number of measurements falling into a particular range of values.
I’ve posted a bunch of histograms here over the years, for everything from the distribution of baby feeding times to commute times. A histogram gives you a good sense of not only the size of an effect, but the spread of the measured values. It’s the best way to tell whether you’re dealing with a nice, normal “bell curve” type distribution or something more complicated.
For the class of the moment, the measurement best represented by a histogram is the test of a cheap sand timer:
[Figure: histogram of emptying times for the two ends of the sand timer]
This graph lets you see right away that the two ends of the timer have different characteristic emptying times, and that there’s virtually no overlap between them.
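The binning behind a histogram like that is easy to sketch in Python. The emptying times below are invented for illustration (they are not the class measurements), and `histogram` is a hypothetical helper; the point is only that each bar counts the measurements falling in one range of values.

```python
def histogram(data, start, bin_width, n_bins):
    """Count the measurements in each of n_bins equal-width bins.

    Bin i covers [start + i * bin_width, start + (i + 1) * bin_width).
    Values outside the binned range are ignored.
    """
    counts = [0] * n_bins
    for value in data:
        index = int((value - start) // bin_width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Made-up emptying times in seconds for the two ends of a sand timer.
side_a = [172, 173, 174, 173, 175, 174, 173]
side_b = [181, 182, 183, 182, 184, 183, 182]

start, bin_width, n_bins = 170, 2, 8
counts_a = histogram(side_a, start, bin_width, n_bins)
counts_b = histogram(side_b, start, bin_width, n_bins)

# Crude text rendering: one row per bin, one mark per measurement.
for i, (a, b) in enumerate(zip(counts_a, counts_b)):
    low = start + i * bin_width
    print(f"{low}-{low + bin_width} s  A: {'#' * a:<6} B: {'o' * b}")
```

Using the same bins for both ends is what makes the lack of overlap visible: no bin contains measurements from both.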
Option 4: Color Map Sometimes, you need to characterize the behavior of some measured quantity as you change not one but two other parameters. In such cases, you can make a visual representation of the system by mapping your measured value onto the color (or greyscale density) of points on a two-dimensional grid, where the grid coordinates represent the values of the two variable parameters. These are tricky to interpret, and a friend at work still gives me grief about the color plots of SteelyKid’s feeding schedule from back in the day, but it’s a category of plot that’s reasonably common these days, since computers have gotten powerful enough to make these more or less effortlessly.
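The mapping itself is simple enough to sketch in Python. This is a toy version under stated assumptions: the grid values are made up, the function name `to_greyscale` is hypothetical, and text characters of increasing density stand in for real greyscale pixels.

```python
SHADES = " .:-=+*#%@"  # ten density levels, from lightest to darkest

def to_greyscale(grid):
    """Map each value in a 2-D grid to a character whose density tracks the value."""
    flat = [v for row in grid for v in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0  # avoid dividing by zero for a constant grid
    rows = []
    for row in grid:
        chars = []
        for v in row:
            # Scale to 0..9 and round to the nearest density level.
            level = int((v - low) / span * (len(SHADES) - 1) + 0.5)
            chars.append(SHADES[level])
        rows.append("".join(chars))
    return rows

# Made-up "measured" quantity on a small grid of two parameters (illustrative only).
xs = range(12)
ys = range(6)
grid = [[x * y for x in xs] for y in ys]

rendered = to_greyscale(grid)
for line in rendered:
    print(line)
```

A real color map does the same thing with a continuous color scale instead of ten characters, which is exactly why it needs a labelled scale bar to be interpretable.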
There are tons of variants on these (you can turn a color map into a surface plot, make a scatter plot with two different vertical axes, or stack the bars in a bar graph), but these are the most basic methods for presenting data to someone else who might be interested in it.
STEP TWO: Remember your audience No matter who you’re preparing the graph for, even other people in the same lab, they won’t be as familiar with the data as you are. Keep that in mind, and work to make your graph as self-explanatory as possible:
—Keep it simple Yes, you can use modern scientific software to make a scatter plot with fifteen different quantities, two separate vertical axes, with three inset plots and a 3-D surface map on the side. But nobody will ever be able to make sense of that unless they already understand the data as well as you do. As much as possible, you want to keep your graph simple: if you’re comparing things, pare it down to only the 2-3 most representative of whatever it is you’re trying to show.
—Make it clear If you’re plotting multiple quantities, make sure that they’re visually distinct. Don’t make a graph with points that are distinguished only by color (some people won’t be able to see that), but make sure that the shapes of the markers are easily distinguishable. Make sure that different datasets in a scatter plot aren’t on top of each other (unless that’s the point you’re trying to make), that the bars on your bar graph are wide enough to show up clearly, that your histogram doesn’t have an excessive number of bins, and so on.
—Label everything No matter what sort of plot you’re making, make sure that it has clear, comprehensible descriptive labels for everything that matters. Your axes should be readily identifiable, with appropriate units (or lack thereof) provided. If you’re plotting a calculated quantity that isn’t represented by absolutely standard notation, label the relevant axis in words. There’s nothing worse than coming across a scatter plot with axes labelled only with those squiggly Greek letters that nobody can keep straight (lowercase zeta? lowercase xi? who can tell?) and having to go searching through the body of a paper to find the definition. If you’re using multiple symbols, there should either be a clear legend in the plot itself, or a clear statement of what each represents in the figure caption.
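On the question of how many histogram bins is too many, one widely used starting point is Sturges’ rule of thumb, which suggests about ceil(log2(n)) + 1 bins for n measurements. That rule is my addition here, not a hard requirement, and the function name `sturges_bins` is hypothetical. A minimal sketch in Python:

```python
import math

def sturges_bins(n_measurements):
    """Sturges' rule of thumb: ceil(log2(n)) + 1 bins for n measurements."""
    if n_measurements < 1:
        raise ValueError("need at least one measurement")
    return math.ceil(math.log2(n_measurements)) + 1

for n in (10, 50, 200, 1000):
    print(f"{n:>5} measurements -> about {sturges_bins(n)} bins")
```

Treat the result as a first guess: if the histogram looks like a comb with lots of empty or single-count bins, use fewer; if an obvious feature is being smeared out, use more.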
A good visual presentation of data can make a complicated result come clear in an instant. A bad visual presentation of data will remain baffling no matter how many times you read its description. There aren’t any hard and fast rules here that can never be broken, but if you take this advice as a starting point, you’ll be fairly safe.
(I’m probably forgetting a few items that will come to me five minutes after this post goes live. Just in case I don’t think of them, though, feel free to point them out in the comments.)