Previously, I talked about science fairs. One of the problems is that students don’t really have a good understanding of data analysis. To me, statistical analysis is just something you do with data; its results aren’t absolutely true. So it doesn’t really matter whether students use sophisticated tests on their data. The important point is that they use some type of test to compare data.
I just made up some arbitrary data analysis rules. Maybe if students and judges accept something like this, it could really improve science fair projects and judging.
To explain my analysis, I decided to have my own little science fair project. I wanted to look at reaction times for my left and right hand.
All hail the mighty hypothesis! Long live the hypothesis. OK, I don’t have a hypothesis. I am not even going to guess an outcome because that doesn’t really matter. A hypothesis would matter if I were testing some model; without one, how would I know whether the model was right or wrong? In this case, I am just playing around – you know, like a real scientist.
To test reaction time, I had someone else (my wife) drop a ruler between my fingers. I started with my fingers at the 0 cm mark and caught the ruler as soon as I could. The distance from the start to the catch point is a measure of the reaction time. I will not go into the calculation of the actual time. (I am pretending this is middle school, after all.)
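For anyone who does want the actual time, here is a minimal sketch. It assumes the ruler falls freely from rest with no air drag and g = 9.8 m/s², so the catch distance is d = ½gt² and the reaction time is t = √(2d/g). The function name `reaction_time_s` is just an illustrative choice.

```python
import math

G = 9.8  # assumed acceleration due to gravity, in m/s^2


def reaction_time_s(distance_cm):
    """Time (s) for a ruler dropped from rest to fall distance_cm, assuming free fall."""
    distance_m = distance_cm / 100.0  # convert cm to m
    # Solve d = (1/2) g t^2 for t.
    return math.sqrt(2.0 * distance_m / G)


# Example: convert a couple of catch distances (in cm) to times.
for d in (10, 20):
    print(f"{d} cm -> {reaction_time_s(d):.3f} s")
```

Because t grows like √d, doubling the catch distance increases the reaction time by only a factor of about 1.4.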
After doing 5 drops that were caught with my right hand, I did 5 with my left. Yes, more would be better – but again, I am trying to be realistic here. Just imagine me doing this the night before the science fair.
Below is a plot of the distances at which I caught the ruler.
Yes, I know the plot title should have said distance instead of time. The averages for the left and right hands are (this is actual data; fake data will come later):
- Average distance for right hand: 13.54 cm
- Average distance for left hand: 18.9 cm
First-order analysis (this is what you usually see at science fairs): the right hand has a faster reaction time because it caught the ruler in a shorter distance.
Second-order analysis (this is the one I am suggesting): here I will use the overlapping box analysis. Let me draw a box around each set of data.
These boxes are an attempt to describe how the data is spread. The right-hand distances ran from 9.4 cm to 19 cm (a spread of 9.6 cm). The left-hand distances ran from 13 cm to 28 cm (a spread of 15 cm). This is not the best way to describe the spread of the data. For instance, suppose most of the distances were around 10 cm, but a couple were much farther away at 20 cm. This would give a spread of 10 cm. Now suppose the distances were spread out evenly from 10 to 20 cm; this would also give a spread of 10 cm. So the box gives an estimate of the range of the data, but not of how the data is distributed within that range.
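To make the range-versus-spread point concrete, here is a small sketch with two hypothetical data sets (made up, not measurements) that match the two cases just described. Both have the same 10 cm range, the width of the box, but a different standard deviation, a spread measure that comes up later in this post. I use `statistics.pstdev`, which divides by N.

```python
import statistics


def data_range(values):
    """Width of the 'box': largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical distances in cm for the two cases described above.
clustered = [10, 10, 10, 10, 10, 10, 20, 20]  # mostly near 10 cm, a couple at 20 cm
even = [10, 12.5, 15, 17.5, 20]               # spread evenly from 10 to 20 cm

for name, data in [("clustered", clustered), ("even", even)]:
    print(name, "range:", data_range(data), "cm,",
          "std dev:", round(statistics.pstdev(data), 2), "cm")
```

The box cannot tell these two apart, but the standard deviation can.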
What do I do with the boxes? Well, in my method, I want to find out how much of the data overlaps. Let me draw a third box around the region where the two boxes overlap.
In this case, 3 data points from the right hand overlap with the left-hand points, and there just happen to be 3 left-hand points that overlap with the right-hand data. With 6 of the 10 points in the overlap, I am going to say that there is no significant difference between these two sets of data.
Data Analysis Box Rule
If no more than 1/5 (20%) of the data from the two sets overlap, then the two data sets have a good chance of being significantly different.
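The box rule is simple enough to write as a short function. This is a sketch under one assumption of mine: "overlap" means the fraction of all points (both sets combined) that fall inside the region where the two boxes overlap. With 5 points per hand, 3 right-hand plus 3 left-hand points in the overlap gives 6/10 = 60%. The function names and the data are hypothetical.

```python
def overlap_fraction(a, b):
    """Fraction of all points (both sets combined) inside the region where the two boxes overlap."""
    lo = max(min(a), min(b))  # bottom of the overlap box
    hi = min(max(a), max(b))  # top of the overlap box
    if lo > hi:
        return 0.0  # the boxes do not overlap at all
    points = list(a) + list(b)
    inside = sum(1 for x in points if lo <= x <= hi)
    return inside / len(points)


def box_rule(a, b, threshold=0.2):
    """True if no more than `threshold` of the data overlaps (a 'good chance' of a difference)."""
    return overlap_fraction(a, b) <= threshold


# Hypothetical catch distances in cm (not real measurements).
right_1 = [9, 12, 14, 16, 18]
left_1 = [13, 15, 17, 20, 24]
right_2 = [9, 10, 11, 12, 14]
left_2 = [13, 16, 18, 19, 21]

print(overlap_fraction(right_1, left_1), box_rule(right_1, left_1))  # 0.6 False
print(overlap_fraction(right_2, left_2), box_rule(right_2, left_2))  # 0.2 True
```

The first pair has 3 + 3 points in the overlap, so the rule says "no significant difference". The second pair has only 1 + 1, right at the 20% limit.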
Yes, this is an overly simplistic method of analyzing the data – but remember that it is for middle school. Here is an example of a data set that the “box rule” would call significantly different.
Here one right-hand point overlaps with the left-hand data and one left-hand point overlaps with the right-hand data, so only 2 of the 10 points (20%) are in the overlap. These data sets could be significantly different. Yes, I know this is not the best way to do it. There are lots of problems with this method, but it is a start in the right direction.
Non-Science Major College-Level Analysis
Maybe this is too much for a middle schooler (and it is still not the best method), but how would a college student analyze this data? I would suggest first finding the uncertainty, as represented by the standard error. The standard error is a little more sophisticated than the “boxes” I used above: it is built from the spread of the data and tells you how uncertain the average is. The standard error is:

$$\text{SE} = \frac{s}{\sqrt{N}}$$
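Where s is the standard deviation and N is the number of data points. The standard deviation is essentially the typical difference between each data point and the average; more precisely, it is the square root of (roughly) the average of the squared differences:

$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$

Here $x_i$ are the individual measurements and $\bar{x}$ is their average.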
Wikipedia lists the standard deviation with an N-1 term. There can be some debate over whether this should be N or N-1. Really, you should have enough data that it doesn’t matter. However, I will use N for my calculations. Let me go ahead and explicitly calculate the standard deviation and standard error for my last set of right-hand data above.
First, notice the units. I didn’t carry the units all the way through because of my laziness, but they should be there. The standard deviation has the same units as the quantity (distance in this case). Second, if you find the standard deviation by other means (say, with your calculator), you may get a different value. This is because the calculator might be using N-1 instead of N.
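Here is a short sketch of the N versus N-1 difference. The `std_dev` helper is a hypothetical name that writes the formula out by hand; Python’s standard `statistics` module also provides both versions, `pstdev` (divides by N) and `stdev` (divides by N-1). The numbers are made up, not ruler data.

```python
import math
import statistics


def std_dev(values, use_n_minus_1=False):
    """Standard deviation from the formula; divides by N unless use_n_minus_1 is True."""
    n = len(values)
    mean = sum(values) / n
    squared = sum((x - mean) ** 2 for x in values)  # sum of squared differences
    return math.sqrt(squared / (n - 1 if use_n_minus_1 else n))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical numbers
print(std_dev(data))                      # divides by N
print(std_dev(data, use_n_minus_1=True))  # divides by N-1
# The standard library versions should agree with the two lines above.
print(statistics.pstdev(data), statistics.stdev(data))
```

The N-1 version is always a bit larger, by a factor of √(N/(N-1)), and the gap shrinks as N grows. That is why having more data makes the debate matter less.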
If you have more than 5 numbers, you are going to have to do something other than finding this by hand. I suggest using a spreadsheet. In both OpenOffice and MS Excel, the standard deviation is “=STDEV(cell-range)”. If you don’t know what that means, don’t worry. (STDEV uses N-1; “=STDEVP(cell-range)” uses N, which matches my calculation.) Here is an online standard deviation calculator.
Now, to calculate the standard error, just divide s by the square root of 5 (the number of data points).
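As a sketch, the standard error step is one line once s is known. This assumes s divides by N, as in this post, so it uses `statistics.pstdev`. The five distances are hypothetical, not the data from the plots.

```python
import math
import statistics


def standard_error(values):
    """Standard error s / sqrt(N), where s divides by N (statistics.pstdev)."""
    s = statistics.pstdev(values)
    return s / math.sqrt(len(values))


# Five hypothetical distances in cm.
distances = [10, 12.5, 15, 17.5, 20]
print(round(standard_error(distances), 2), "cm")
```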
With this, I can report the distance for the right hand as:
This says that the distance at which the right hand catches the ruler is most likely between 10.5 cm and 11.7 cm. Most likely. I wrote it a second time, rounded, to make it look better. I can do the same for the left-hand data:
Notice that the data for the left hand is much more spread out and thus has a larger uncertainty. So how do I tell whether these two measurements could be the same value or different? I will use the basic idea that if the uncertainty ranges for the two measurements overlap, they could be the same. If the uncertainty ranges do not overlap, they are most likely different. In this case, the smallest left-hand distance (from the uncertainty) is 18 cm, and the largest right-hand distance is 11.7 cm. These two ranges do not overlap, so it is likely that the two hands are different.
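Here is one way to code this comparison. It is a sketch that assumes "uncertainty" means one standard error on either side of the average, with s computed using N as above. The function names and data are hypothetical, made up for illustration.

```python
import math
import statistics


def mean_with_error(values):
    """Return (average, standard error), with s dividing by N."""
    mean = statistics.fmean(values)
    se = statistics.pstdev(values) / math.sqrt(len(values))
    return mean, se


def uncertainties_overlap(a, b):
    """True if the ranges average ± standard error for the two data sets overlap."""
    mean_a, se_a = mean_with_error(a)
    mean_b, se_b = mean_with_error(b)
    # Two ranges overlap when each one starts before the other one ends.
    return (mean_a - se_a) <= (mean_b + se_b) and (mean_b - se_b) <= (mean_a + se_a)


# Hypothetical catch distances in cm.
right = [10, 11, 11, 12, 11]
left = [15, 19, 22, 25, 18]

for name, data in [("right", right), ("left", left)]:
    mean, se = mean_with_error(data)
    print(f"{name}: {mean:.1f} ± {se:.1f} cm")
print("could be the same:", uncertainties_overlap(right, left))
```

With these made-up numbers, the right-hand range tops out near 11.3 cm and the left-hand range starts near 18.3 cm. The ranges do not overlap, which is the same situation as the example above.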