Dot Physics

Previously, I talked about science fairs. One of the problems is that students don’t really have a good understanding of data analysis. For me, statistical analysis is just something you do with data; its results aren’t absolute truth. So it doesn’t really matter whether students use sophisticated tests on their data. The important point is that they use some type of test to compare data.

I just made up some arbitrary data analysis rules. Maybe if students and judges accept something like this, it could really improve science fair projects and judging.

To explain my analysis, I decided to have my own little science fair project. I wanted to look at reaction times for my left and right hand.

Hypothesis

All hail the mighty hypothesis! Long live the hypothesis. Ok, I don’t have a hypothesis. I am not even going to guess an outcome because that doesn’t really matter. A hypothesis would matter if I were testing some model. How would I know if the model was right or wrong without it? In this case, I am just playing around – you know, like a real scientist.

Methods

To test reaction time, I had someone else (my wife) drop a ruler in between my fingers. I started with my fingers at the 0 cm mark and caught it as soon as I could. The recorded distance from start to catch point is a measure of the reaction time. I will not go into the calculation of the actual time. (I am pretending like this is middle school after all).
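For readers who do want the time, here is a minimal sketch. It assumes the ruler falls freely from rest, so the distance d and time t are related by d = ½·g·t², which gives t = √(2d/g). The function name, the value of g, and the 10 cm example distance are illustrative choices, not values from this experiment.

```python
import math

G = 9.8  # gravitational acceleration in m/s^2 (approximate value, an assumption)

def reaction_time(distance_cm: float) -> float:
    """Time in seconds for a ruler to fall distance_cm, starting from rest."""
    distance_m = distance_cm / 100.0      # convert cm to m
    return math.sqrt(2.0 * distance_m / G)

# Illustrative distance only, not a measurement from this post.
print(f"{reaction_time(10.0):.3f} s")
```

Because the time goes as the square root of the distance, a catch that is twice as far corresponds to a reaction time only about 1.4 times longer.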

After doing 5 drops that were caught with my right hand, I did 5 with my left. Yes, more would be better – but again, I am trying to be realistic here. Just imagine me doing this the night before the science fair.

Data

Below is a plot of the distances at which I caught the ruler.

i-cfef4125d2f288b7a714b9df954e8a8a-catchinggraph1.jpg

Yes, I know I should have had a title that said distance instead of time. The averages for the left and right hands are (this is actual data; fake data will come later):

  • Average Distance for right hand: 13.54 cm
  • Average Distance for left hand: 18.9 cm

Analysis

First order analysis (this is what you usually see at science fairs) – the right hand has a faster reaction time because it caught the ruler in a shorter distance.

Second order analysis (this is the one I am suggesting). Here I will use the overlapping box analysis. Let me draw a box around both sets of data.

i-7bd90f5dbe7a7846b98e349605ee546a-boxanalysis-1.jpg

These boxes are an attempt to describe how the data is spread. The right hand had distances from 9.4 to 19 cm (a spread of 9.6 cm). The left hand had distances from 13 to 28 cm (a spread of 15 cm). This is not the best way to describe the spread of the data. For instance, suppose I had most of the distances around 10 cm, but a couple much farther away at 20 cm. This would give a spread of 10 cm. Now suppose I had distances equally spread out from 10 to 20 cm; this would also give a spread of 10 cm. So the box gives an estimate of the range of the data, but not how that data is spread out.
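The point about range versus distribution can be sketched with two small data sets. The numbers below are invented to match the two scenarios just described (they are not measurements), and the `spread` helper is a hypothetical name for "largest minus smallest".

```python
import statistics

def spread(data):
    """Size of the 'box': largest value minus smallest value."""
    return max(data) - min(data)

# Hypothetical distances in cm, made up for illustration.
clustered = [10.0, 10.3, 10.5, 19.8, 20.0]  # most near 10 cm, a couple near 20 cm
even = [10.0, 12.5, 15.0, 17.5, 20.0]       # equally spaced from 10 to 20 cm

for name, data in (("clustered", clustered), ("even", even)):
    print(name, "spread:", spread(data), "median:", statistics.median(data))
```

Both sets give the same box, yet the middle value of each is quite different, which is exactly the information the box throws away.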

What do I do with the boxes? Well, in my method, I want to find out how much of the data is overlapping. Let me draw a third box.

i-7522f7a60036e65d2acd01acb0234371-overlapp1.jpg

In this case, there are 3 data points from the right hand that overlap with the left-hand points. Also, there just happen to be 3 points in the left-hand data that overlap with the right-hand data. I am going to say that there is no significant difference between these two sets of data.

Data Analysis Box Rule

If no more than 1/5 (20%) of the data from the two sets overlap, then the two data sets have a good chance of being significantly different.
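One way to turn the box rule into something checkable is sketched below. It assumes the rule means "count the points from both sets that fall inside the overlapping part of the two boxes, then divide by the total number of points"; that reading, the function names, and the individual data values are assumptions. Only the two ranges (9.4 to 19 cm and 13 to 28 cm) come from the measurements above.

```python
def overlap_fraction(a, b):
    """Fraction of all points (both sets) that lie inside the overlap of the two boxes."""
    low = max(min(a), min(b))    # bottom edge of the overlap box
    high = min(max(a), max(b))   # top edge of the overlap box
    if low > high:               # the boxes do not overlap at all
        return 0.0
    points = list(a) + list(b)
    inside = sum(1 for x in points if low <= x <= high)
    return inside / len(points)

def box_rule_different(a, b, threshold=0.2):
    """True when no more than `threshold` of the data overlaps."""
    return overlap_fraction(a, b) <= threshold

# Hypothetical values in cm; only the end points match the ranges quoted above.
right = [9.4, 11.0, 13.5, 15.0, 19.0]
left = [13.0, 15.0, 17.0, 24.0, 28.0]

print(overlap_fraction(right, left))    # 6 of the 10 points are in the overlap
print(box_rule_different(right, left))  # more than 20% overlap, so not "different"
```

Counting points by hand on a graph gives the same answer; the code is just a way to check the counting.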

Yes, this is an overly simplistic method of analyzing the data – but remember that it is for middle school. Here is an example of a data set that would be significantly different with the “box rule”.

i-de5f05bdfd7fdf1b813a1ca777569c95-sigdiffbox.jpg

Here one data point from right overlaps with the left data and one from the left overlaps with the right data. This data could be significantly different. Yes, I know this is not the best way to do it. There are lots of problems with this method, but it is a start in the right direction.

Non-Science Major College-Level Analysis

Maybe this is too much for a middle schooler (and it is still not the best method), but how would a college student analyze this data? I would suggest finding the uncertainty (as represented by the standard error) first. The standard error is a measure of the uncertainty in the average, based on how spread out the data is, and it is a little more sophisticated than the “boxes” I used above. The standard error is:

i-efd6f8a2825addfbd5292255e4bab3cf-standarderror.png

Where s is the standard deviation. The standard deviation is essentially a typical difference between each data point and the average (technically, the square root of the average squared difference).

i-6ae60ec6f970fdb142f9ad32d6af501b-853c-79575bd-7e-5a-9fdbc-480844b-76337.png

Here Wikipedia lists the standard deviation with an N-1 term. There can be some debate over whether this should be N or N-1. Really, you should have enough data that it doesn’t matter. However, I will use N for my calculations. Let me go ahead and explicitly calculate the standard deviation and standard error for my last set of right-hand data above.

i-44126b7935032a13259cc62235d8e6e3-stdev-12.jpg

First, notice the units. I didn’t carry the units all the way through because of my laziness, but they should be there. The standard deviation has the same units as the quantity (distance in this case). Second, if you find the standard deviation by other means (say with your calculator) it may give you a different value. This is because it might be using the N-1 instead of N.
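The N versus N-1 difference is easy to see with a short sketch. The five distances below are hypothetical (the values used in this post appear only in the image), and the code relies on Python's standard `statistics` module, where `pstdev` divides by N and `stdev` divides by N-1.

```python
import math
import statistics

# Hypothetical distances in cm, made up for illustration.
distances = [10.0, 11.0, 12.0, 13.0, 14.0]

# By hand, using N: square root of the average squared difference from the mean.
mean = sum(distances) / len(distances)
s_by_hand = math.sqrt(sum((x - mean) ** 2 for x in distances) / len(distances))

s_n = statistics.pstdev(distances)          # divides by N
s_n_minus_1 = statistics.stdev(distances)   # divides by N - 1

print(f"by hand (N): {s_by_hand:.3f} cm")
print(f"pstdev (N):  {s_n:.3f} cm")
print(f"stdev (N-1): {s_n_minus_1:.3f} cm")
```

With only 5 points the two versions differ by roughly 12%; with many more points the gap shrinks, which is why it should not matter when there is enough data.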

If you have more than 5 numbers, you are going to have to do something other than finding this by hand. I suggest using a spreadsheet. For both OpenOffice and MS Excel, the standard deviation is “=STDEV(cell-range)”. Note that STDEV uses the N-1 form; “=STDEVP(cell-range)” uses N, which matches my calculation above. If you don’t know what that means, don’t worry. Here is an online standard deviation calculator.

Now to calculate the standard error, just take s divided by the square root of 5 (the number of data points).

i-41686e5b04af5ffb0210f5688251f774-standareeror-calc.jpg

With this, I can report the distance for the right hand as:

i-d8d4c2e329bc2faead7f8579ecb4a5e2-d-witerror.jpg

This says that the distance at which the right hand catches the ruler is most likely between 10.5 cm and 11.7 cm. Most likely. I wrote it a second time, rounded, to make it look better. I can do this also for the left hand data:

i-755d731a2cdd7c5b9327f36bb87729f6-dleft.jpg

Notice that the data for the left hand is much more spread out and thus has a larger uncertainty. So, how do I tell if these two measurements could be the same value or different? I will use the basic idea that if the uncertainty ranges for the two measurements overlap, they could be the same. If the uncertainty ranges do not overlap, they are most likely different. For this case, the smallest distance for the left hand is 18 cm (from the uncertainty). The largest distance for the right hand is 11.7 cm. These two do not overlap, so it is likely that they are different.
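The whole comparison (average, standard error, then the overlap check) can be sketched in a few lines. This assumes the N form of the standard deviation, as used above, and treats each measurement as the range from average minus standard error to average plus standard error. The function names and both data sets are hypothetical, not the values from this post.

```python
import math
import statistics

def mean_and_standard_error(data):
    """Return (average, standard error), with s computed using N."""
    s = statistics.pstdev(data)
    return statistics.mean(data), s / math.sqrt(len(data))

def uncertainties_overlap(a, b):
    """True when the ranges average ± standard error of the two sets overlap."""
    mean_a, se_a = mean_and_standard_error(a)
    mean_b, se_b = mean_and_standard_error(b)
    # Two ranges overlap when each one starts before the other one ends.
    return (mean_a - se_a) <= (mean_b + se_b) and (mean_b - se_b) <= (mean_a + se_a)

# Hypothetical distances in cm, made up for illustration.
right = [10.0, 11.0, 12.0, 13.0, 14.0]
left = [15.0, 18.0, 20.0, 22.0, 25.0]

for name, data in (("right", right), ("left", left)):
    avg, se = mean_and_standard_error(data)
    print(f"{name}: {avg:.1f} ± {se:.1f} cm")

print("could be the same value:", uncertainties_overlap(right, left))
```

This is the same "do the ranges overlap" idea as the box rule, just with a range that shrinks as more data is collected.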

Comments

  1. #1 Rob
    February 12, 2009

    You’re almost to something that starts to resemble what scientist would do. Do you plan a subsequent article with the level of confidence for rejection of the null hypothesis? Some who follow your blog might find it interesting.

    Or not.

  2. #2 Erin
    February 16, 2009

    Those equations are way too complicated for a middle schooler. im in the 8th grade and i had no clue what you were talking about.

  3. #3 Rhett
    February 17, 2009

    Erin,

    The first part (the box rule) doesn’t really use any equations. You could do it that way. I added the second method just for comparison. Let me know if you need any help.

  4. #4 Miroslav Provod
    February 18, 2009

    Golan Heights

    Golan Heights that spread over 1250 square kilometers are an important spring region that supplies 4 states with water. The great presence of static electricity in this region can be deduced from the high number of megalithic structures. A local circular structure of diameter of 159 meters is composed of five concentric circles that are laid out by freely laying stones that all weigh 37 000 tons in total. The heaviest single stones that were used weight about 20 tons. Around this structure there is further 8 500 dolmens and menhirs, the heaviest weighing 50 tons and they are up to 7 meters in height.
    Golan Heights may be thought of as a natural laboratory, which can provide us with historical information, in view of research. The static electricity that is almost unknown to present science was crucial for all megalithic structures that were built around the Earth. The new knowledge about properties of static electricity that I describe at http://www.miroslavprovod.com http://www.miroslavprovod.com/ provides more in depth information via continuing research about the mysteries of construction of megalithic structures with combination of different types of rocks, for example Stonehenge, Machu Picchu and many others.
    The megalithic structures had all the same function. They accumulated static electricity in their matter, which they gained from various sources. At Golan Heights the sources are mainly underground springs, which provide the megaliths with the static electricity charge.
    The “electronics of human body” takes the static electricity from cellular membranes in order to maintain functionality of all organs. The static electricity is continuously supplied by mitochondria. It can be proved by various experiments that the charge of human body can be filled by other means as well – by staying close to greater source of static electricity, which spontaneously gives the energy to cellular membranes. The transfer of energy is rather slow. In case of transfer between a rock and a human body it takes tens of minutes. This shows that the megalithic structures and later also sacral structures were built mainly for health purposes.
    Golan Heights are a great hint, which needs greater research to be done. By this I mean gathering of statistical data about all the megaliths in this region, mainly from which types of rocks the individual dolmens were built and their chemical composition. The great ratio of dolmens to the population hints that the reason for building them wasn’t only filling the bodily energy. It may be deduced from the combination of rocks at many of the structures that it mattered through which part of the rock the static electricity went. It’s probable that it was influenced by chemical properties of the rock, which it transferred to the cellular membranes of human body. If the combination of different types of rocks is proved by the dolmens it would logically point out that the reason for building them was curing of various bodily anomalies.
    Big groups of megaliths aren’t only at Golan Heights but at various places around the Earth. Further statistical data from other regions would make more believable knowledge not only about history but also about static electricity. From the economic point of view the “re-discovered” energy may bring great motivation to people in many fields.

    January 2009-01-06
    Miroslav Provod

  5. #5 Robbie
    February 22, 2009

    i dont get the data

  6. #6 Azazael
    February 27, 2009

    Hmm, very cognitive post.
    Is this theme good unough for the Digg?

  7. #7 Morgan
    August 11, 2010

    When I found your blog I was searching for instructions on how to complete a data analysis science fair project. My sister and I are comparing thinking preferences in twins. Everyone in our school takes a thinking preference survey, so we thought we would compare sets of identical twins. I don’t know how to form a hypothesis and procedure for my forms and paper though. If we are comparing data from two sources, we don’t really have a hypothesis. Do you have any advice?

  8. #8 Rhett Allain
    August 11, 2010

    @Morgan,

    Interesting. First, do you want a hypothesis for the official science fair project, or just for the experiment? Also, you can make a comparison – but you will need to know something about the test. What exactly about the test do you want to compare? Their scores?

    Really, your hypothesis (in official science fair terms – for the science fair) would be that the twins answer the test in a similar manner.

  9. #9 Morgan
    August 12, 2010

    For the forms to fill in we have to write out a hypothesis and procedure. Since this is a data analysis project, we are technically not “testing” anything. So, I guess what I am asking is: should the hypothesis be more like a conclusion in this case? and how do you write out a procedure if your not doing an actual test?
    We are comparing their results ( e.g. 12% analytical, 64% conceptual, etc.)

    sorry for the confusion in the first comment. Thanks for your help though!
