Grading Methods Don't Matter

By drorzel on January 28, 2009.

Over at Dot Physics, Rhett is pondering grading curves:

Should you grade on a curve or not? If you are a student, the answer is clear: go by whatever the instructor does. Otherwise, you have a choice. I don't like to tell other instructors or faculty what to do because I respect their freedom. For my classes, there is no curve. Why? Well, the question really is: "why grade on a curve?" I don't know the exact reason for particular instructors, but I can come up with some possible reasons.

My first few years teaching, I worried about this quite a bit. I talked to different faculty in the department about what they did, and to a few people in other departments. In my third year or so, though, I realized that it just didn't matter.

The catalyst for this realization was a student who wasn't happy with his grade. He'd been a decent student in the class, and I thought the grade I gave him was perfectly respectable, but he was hoping to go to med school, and thought he should've been a step higher.

I was bothered by this, because he had been a good student, so I went back and recalculated his grade using every one of the methods other people had told me about, from explicit numerical "curves" to setting the mean for the course at a target letter grade, and going one letter up or down for each standard deviation away from that mean. And every one of them came back the same way.
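For readers who haven't seen the mean-and-standard-deviation method, here is a minimal sketch of one way it could work. The function name `sd_curve`, the choice to round to the nearest whole standard deviation, the whole-letter scale, and the sample scores are all assumptions for illustration, not the actual spreadsheet used for the class.

```python
from statistics import mean, pstdev

LETTERS = ["F", "D", "C", "B", "A"]  # whole letters only, for simplicity

def sd_curve(scores, target="B"):
    """Map the class mean to `target`, then move one letter up or down
    for each whole standard deviation away from the mean (assumed rounding)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    base = LETTERS.index(target)
    grades = []
    for s in scores:
        # With no spread at all, everyone simply gets the target grade.
        steps = 0 if sigma == 0 else round((s - mu) / sigma)
        # Clamp so nobody falls off either end of the letter scale.
        idx = min(max(base + steps, 0), len(LETTERS) - 1)
        grades.append(LETTERS[idx])
    return grades

# Made-up scores, for illustration only.
scores = [95, 85, 85, 75]
print(sd_curve(scores))
```

With these made-up numbers the mean is 85, so the two students at 85 land on the target grade and the others land one letter to either side.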

Since then, I've been much more casual about the way I assign letter grades. The relative weights of the different assignments are clearly set out in the syllabus, and I try to make the grading standards as clear as I can, but when the time comes to convert the numbers in the spreadsheet into letters for the Registrar, I just go with numerical values: 95% is an A, 85% is a B, 75% is a C, with pluses and minuses spaced roughly equally between those. I may shift the line between B+ and A- (or any other adjacent pair of grades) if I think a particular student deserves a higher grade than strict numerical cut-offs would indicate, but that's a tiny effect.

And as long as I've been doing my job well during the term, the grades fall out more or less where you would expect. There is always a student or two whose score is around 95%, to get an A, and the weakest students come in around the C/D boundary. I don't think I've had anybody finish the course and earn an outright F-- I've occasionally had to fail students because they failed to turn in labs (we have a policy that students must have a passing lab grade to pass the course, and that all lab reports must be handed in to get a passing lab grade), and I've had students who were headed for an F drop the class, but I don't think anyone has been so dismal they didn't deserve even a D.

I think that's really the key. If I'm doing my job, and properly matching the content of the tests to the content of the class, everything works out fine without needing to curve anything.

When students ask, I do tend to say "The person with the highest total grade in the class will get an A," which works remarkably well to reassure students who are hoping for a curve. I don't tell them that that's because there's almost always somebody in the class who gets A grades on all the individual assignments, and not because I curve the grades to get there.

The only rational explanation for curving that I've ever heard came from a professor at the university where my best friend went to school. He summed it up this way: "If I [the professor] give a test, and no one in the class gets more than, say, 80% of it correct, then clearly either my test was too hard or I didn't do a good enough job of teaching the material. Either way, it's not my students' fault, so I will curve the grades accordingly."

Presumably, this professor uses a similar reasoning for assigning overall course grades.

Of course, I've also had professors who deliberately write tests that are way too hard (e.g., median score is 60%, and the highest score is 75% or something). I think that's just mean.

I recently had a class that was graded on a curve. The mean value for the final was about a 57 (which I was quite close to), and I still managed to get an A in the class due to the curve. While I appreciated the help, I do have to agree with Dr. Kate's story. There was a disconnect between what was being taught and what was being tested, and rather than attempt to correct this throughout the semester, the teacher decided to teach how they wanted and simply move the curve around. While I enjoyed the class for other reasons, I do think that a teacher should try to be more introspective and look for how they could have taught the material so they would not need to curve, if that is possible. In other words, I don't think a teacher should go into the class expecting to grade it on a curve.

Dr. Kate, I have had one-- exactly one-- professor give me a rationale for the impossible test syndrome. His claim was that if he gives a test out where someone gets 100%, he still doesn't know how good that student is, or how much he knows. All he knows is that the student knows everything on the exam. Hence, he tried to write exams that no one could get a perfect on, in order to properly gauge what the maximum level for the class really was.

To this day, I still have mixed thoughts on that.

On the one hand, yes, that is completely and perfectly rational.

On the other hand, the vast majority of the time, it just strikes me as mean and lazy, as well. I've had a lot of professors give the Impossible Exam, and with that one exception there has never been any explanation or apology, and more than once there's been the attitude of, "What the hell is wrong with you people?"

(I do actually believe that one exception, though, because in all other ways he was a class act. He warned us ahead of time so we wouldn't completely panic, he curved heavily, etc. Hell, he once e-mailed me after a final exam to tell me he thought one of my essay answers ("How do you solve this open research question for the field?") might be a fruitful line of research for a PhD. Alas, it was in a field I consider mind-numbingly boring....)

Short answer: the culture of your school, and of your department, provides context and advice.

By the principle of Academic Freedom, you have the autonomy to grade as you choose -- up to the point where the administration pushes back at you.

"Hey, you gave EVERYONE in your class an A" [as I did, once, and was made to suffer for it], or "You failed too many students; parents are complaining; get your ass in gear for the flood of formal grade appeals coming at you."

Of course, there are already laws to prevent you from grading in an overtly discriminatory way (say, passing all the girls and flunking all the boys, or passing all the Latinos and flunking all the Asians), or selling grades for money or sexual favors [I love your sexy curves; I grade ON the curve, if you get my drift].

I've said this on another thread, but can't resist:

"Why did the teacher drive his car off the road and into a tree?"

"Because he was grading on the curve."

Heh. I took a test once (freshman honors physics) where a 47 was the high score. That took "challenging the students" to a new level. I learned later that the prof was teaching for the first time, but it also had attracted students who didn't understand the concept that a calc III prerequisite meant you needed to still know how to apply calculus.

I do pretty much what you do, Chad, choosing problems so that a passing student should get around 70%, particularly if they get full marks for the homework.

I have heard faculty complain about a curve problem all the way to the comprehensive exam. If you ask really hard problems (see my first example), you don't learn much about what the students actually know. The final grade consists entirely of partial credit. It is much easier to have confidence when failing someone when you can say "they couldn't even do X".

I could go either way on the question of curving grades, as long as there is a reasonable justification given. However, I once had an experience with curved grading as an undergrad, which was horrible.

I was in introductory linear algebra, and the professor broke the grading down so that homework was worth 25% of our grade (I think. It's been a while). Now, I think I'm pretty smart, but I've never been a model student, and while I did fine in the class for the most part, I only got about 75% of the homework done. This was online homework (using the WebWork system, if you are familiar with that), which means that a lot of people had 100% of the work done.

I wasn't too happy about missing so much work, but 75% of the homework was still a significant amount of effort that I assumed I'd get credit for, and based on my other grades I was expecting a B or B+. However, the professor applied a curve to this data, without any real thought about why, or what it would actually mean to curve this data. Because so much of the class got 100% of the homework grade, my 75% curved to 0.

That's right, I got 0% credit for the homework. Not because I didn't do any work, but because everybody else did more.

Of course there are mathematical reasons why this makes no sense, and they are all the more surprising because this guy was a mathematician. But the real problem was that he applied a mathematical tool without considering whether it was useful or correct, and whether the outcome made any sense. I acknowledge that I should have done more homework, but is 75% of the work really the same as none at all?
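The comment doesn't say which curve the professor actually used. One curve that would produce exactly this outcome is a linear rescaling that pins the lowest score in the class to 0 and the highest to 100; the sketch below assumes that method, and the function name `minmax_curve` and the homework numbers are made up for illustration.

```python
def minmax_curve(scores):
    """Rescale linearly so the lowest score becomes 0 and the highest 100.

    This is an assumed curving method, not necessarily the one the
    professor in the story applied.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # No spread to stretch; leave the scores alone.
        return list(scores)
    return [100.0 * (s - lo) / (hi - lo) for s in scores]

# Made-up homework percentages: most of the class at 100, one student at 75.
homework = [100, 100, 100, 100, 75]
print(minmax_curve(homework))
```

When nearly everyone sits at the top of the range, whoever happens to be lowest is mapped to zero no matter how much work they did, which is why this kind of rescaling is a poor fit for completion-style homework.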

To scientists who use measurement devices all the time, it ought to be pretty obvious how difficult tests should be. The object of a test is to measure a student's knowledge of the material. If you were buying a pressure gauge and had an infinite supply of pressure ranges to choose from, you would never pick one where you knew you would only operate in the upper quarter of the tool's range. If you had a gauge like that, you'd find yourself musing that you would prefer one that measured only the range you worked in, but had better resolution.

Tests ought to be the same way. Ideally, the very best students should score quite high on the test, but without getting a perfect score, while the worst students ought to do quite poorly. Unfortunately, we've all been bred to think that anything below 80% on a test must mean you are significantly sub-standard.

More on Chad's topic, I worked for a professor who pointed out that students worry a lot about the distribution of value between tests, assignments and quizzes. He looked at all his students' scores and found that the A students nearly always get As on their homework, As on their tests and As on their quizzes. Likewise the C students did roughly equally poorly on all three. It didn't really matter how he weighted the different categories, except for a very few students who might make minor leaps from a B to a B+.
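The arithmetic behind that observation is that a weighted average can never leave the range between a student's lowest and highest category score, so a student who scores about the same everywhere gets about the same total under any weighting. A small sketch, with invented scores and weighting schemes (none of these numbers come from the professor's data):

```python
def weighted_total(scores, weights):
    """Weighted average of category scores; weights need not sum to 1."""
    total_weight = sum(weights.values())
    return sum(scores[k] * weights[k] for k in weights) / total_weight

# A hypothetical student who performs consistently across categories.
consistent = {"homework": 92, "tests": 90, "quizzes": 91}

# Three invented weighting schemes a syllabus might use.
schemes = [
    {"homework": 0.25, "tests": 0.50, "quizzes": 0.25},
    {"homework": 0.10, "tests": 0.80, "quizzes": 0.10},
    {"homework": 0.40, "tests": 0.30, "quizzes": 0.30},
]

totals = [weighted_total(consistent, w) for w in schemes]
print(totals)
```

Every total stays between 90 and 92 for this student. The weighting only starts to matter for someone whose categories differ widely, say 100 on homework and 70 on tests.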

I taught High School science several years ago, and had to take a formal testing methodology class to get my certificate. It was a useful enough class that I would recommend at least sitting through one if you are near an education program. The first thing you have to try to figure out is "what am I testing for." If you want to see how good the best students are, then you make something no one will get 100% on. If you do this you *have* to know that you lose distinction on the low end of the scale.

"If I'm doing my job, and properly matching the content of the tests to the content of the class"

If only every instructor would do this, there would be fewer problems in the grades.

All of my physics classes had impossible exams.

Usually, the passing grade was around 15 points - out of a hundred possible. The best students had points in the 20s and I made it two or three points above failing, which depressed me greatly, but still wasn't so bad, considering that the results were curved, so that 50% of the students failed.

It was a very frustrating experience. It also didn't help my self-esteem at all..

This has been something that's been bothering me for a while - why is having the top score in the class be 80-90% or so "mean" or "lazy" or bad at all? So long as you set the letter grades appropriately (i.e. 100-80 A, 60-80 B, etc.), what difference does it make?

Especially in physics, there is practically no limit on how hard an exam can be or how many topics can be covered - i.e. it's not like you're ever going to learn everything there is to know about statistical mechanics. So why artificially limit the difficulty/amount of material to the point where a significant number of people would expect to get over 95%?

The other side of this problem is when you make the material so difficult that nobody can get more than, let's say, 50% on an exam - and I think almost anyone would recognize that as a problem.

Like Clark points out above, you would always try to adjust the range on your measurement device to the variation of what you're measuring.

There are a couple reasons not to adopt Clark's method, despite the fact that it is logically true.

First, it can be incredibly frustrating. If the average student gets half the problems incorrect, it's just demoralizing. It doesn't correspond well to "average" performance in real life.

Second, it doesn't sufficiently punish students for slacking off. If 50 is a passing grade, then I can skip half the assignments and still pass.

In classes that actually teach how to teach, it turns out that the curve and how easy or difficult the exam is are, like you say, not very important. What IS important in how you grade papers and things is that you blind them somehow; ideally, you'd double-blind them and only do the un-blinding at the end, performed by a TA who wasn't involved in the class.

I used to have the students put only student IDs on the papers, wrote out a specific grading rubric with the exam, and stuck to the rubric ruthlessly. Didn't un-blind the results till final grades needed to be handed in. My boss used to argue and whine endlessly about how This One was a Good Student who could not possibly have earned a B- and That One was lazy and didn't talk enough and could not possibly have earned an A. And if you tried to suggest, gently, softly, that perhaps he had misunderstood a shy student or been wowed by a chatty one, oh, hellfire and brimstone, you were Questioning His Scientific Objectivity, and woe be unto you.

Best professor I had in grad school was similarly ruthless about grading, kept everyone on tenterhooks until finals, and required that all homework and papers be vetted by Turnitin.com. She gave out lots of Fs for cheating, and you would not believe the number of students in the sub-Ivy private schools who cheat and have the nerve to complain when they get caught.

Another reason I consider the Impossible Test to be bad practice is this:

The kinds of situations where it is appropriate to ask for solutions to open research questions (or other similar Impossible Test questions) on a two hour exam are often not the sorts of situations where two hour exams are really really relevant any more.

You do that sort of thing (fairly) in mid to upper level graduate courses, where your students will at least recognize the nature of the question and have a sufficient breadth of knowledge to sketch out a solution, and note the good things and bad things about it. You don't do that, in my opinion, to undergrads because it is pointless.

But, by the same token, why the hell are we giving out standard two hour exams to mid or upper level graduate students? Give them a paper, or a project, or something meaningful to sink their teeth into.

Teachers are not babysitters. That is at the foundation of any good methodology of grading, with or without The Curve.

I Am Not A Babysitter
As a teacher, I face many stereotypes about my job. But I wouldn't trade my career for any other.
By Heather Robinson | NEWSWEEK
Published Oct 11, 2008
From the magazine issue dated Oct 20, 2008

"It is said that teaching is the profession that creates all other professions. That's a beautiful compliment for a job that often does not receive the respect one would predict given all of the platitudes bestowed upon teachers. 'God bless you!' 'What a noble profession' and 'I couldn't do it, but thank goodness there are people like you out there' are a few that I've received.... What many people don't understand is that we teachers are working with wiggling, chatting children with varied needs. They are not robots who perform exactly as we direct without exception. Teachers are not on autopilot -- we make thousands of decisions each day while working hard to produce a quality product that provides each student with what she needs and deserves. Teaching isn't simply perching at a lectern and pontificating to hungry minds; it's being an educator, a mentor, a parent, a nurse, a social worker, a friend, a diplomat and an expert on the curriculum. In short, we are professionals...."

When I was in high school, I took AP Chemistry. Because the class was designed to prepare us for taking the AP exam in the spring, most of our in-class test questions came from former AP exams. To get a 5 on the AP exam, you only need to get about 65% of the questions right. (Why this is, I do not know...) Therefore, my teacher concluded that to get an A on a test derived entirely from AP exam questions, you should only need to get 65% of the possible points. I'm not sure how exactly her grading rubric mimicked the one used by AP graders, but I assume they were similar. She also had a policy that scoring 80% or better on a test was an A+, which would be factored in as she calculated final grades. It all made perfect sense to me.

Now that I'm a graduate student, my professors have different policies for grading, but most of them use some sort of a curve. For my biochemistry class, the exam average was set at a B+, and your letter grade was determined by how many standard deviations above or below the mean you fell. The average hovered around 75%-80% in most cases, but there were always one or two people in the 90%+ range. This is probably because people of vastly different skill levels were all taking the same class -- from cell biology PhD students who had majored in biochemistry to neuroscience PhD students who had majored in psychology. I think the grades worked out to be quite fair in the end, although after the first exam people panicked until they saw the grade distribution.

My syllabi say (something very akin to) "if the average grade for the class is lower than C+, I will curve to raise the average to a B-. Curving will never lower a grade." Haven't yet had to curve, although I can see that I _will_ have to at some point, or alter test difficulty downwards (which I'm reluctant to do; I _have_ found that student effort increases if they see tests as very hard *but* fair). I've had folks get 100%; rare, but has happened. Average tends to be around C+ and end-grade average right around B- after extra credit bits get added in, which is what I aim for.
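A syllabus rule like that can be written down in a few lines. The sketch below assumes a flat additive shift and assumes numeric stand-ins of 77 for a C+ and 80 for a B-; the syllabus as quoted gives neither the numbers nor the shifting method, and the name `raise_only_curve` is made up here.

```python
from statistics import mean

def raise_only_curve(scores, trigger=77.0, target=80.0):
    """If the class average is below `trigger` (assumed C+), add the same
    number of points to every score so the average reaches `target`
    (assumed B-). Otherwise return the scores unchanged.

    The shift is never negative, so no grade is ever lowered.
    """
    avg = mean(scores)
    if avg >= trigger:
        return list(scores)
    shift = target - avg
    # Cap at 100; if the cap bites, the new average lands slightly under target.
    return [min(100.0, s + shift) for s in scores]

# Made-up class scores, for illustration only.
print(raise_only_curve([60, 70, 80]))   # average 70, below the trigger
print(raise_only_curve([80, 85, 90]))   # average 85, left alone
```

Because everyone receives the same shift, the ordering of students and the gaps between them are untouched; only the location of the distribution moves.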

The problem in this area was biggest for me when I was teaching at Mary Baldwin, a small college in VA, for a semester while Jenny was finishing up her thesis. I was teaching Psych 101 and (separately) stats, about 30 students each. By a couple of weeks in, it was very clear that my grade distribution in both classes was going to be [roughly] 3xA (foreign students, who often worried that they had only scored 100% and was there anything else they could do?), 4xB (the VWIL cadets, who were massively disciplined but also massively overloaded), maybe a C or three, a D or three, and 15-20 Fs. So I went to talk to the Dean, who commented that this was expected, but that the college had a policy of admitting anyone who could come up with the $40k tuition and relying on faculty to break it to them that they should not be in college. Yuck. I have no clue to this day how I could have come up with exams for those classes; the top students in stats were asking about multivariate regression while the bottom students gave answers such as (this is a direct quote) "I don't know how to calculate this but I believe that I have a feeling that the answer should be 7." The actual problem was extremely basic and had an answer of 0.35; but the student then argued that her answer should get credit because her feelings were as valid as mine.

Sorry. Ranty. Came close to souring me on teaching completely, though :(.

Student perception, that an exam is fair, has been found in pedagogical research to be very important.

I quote myself in a review that I wrote for one of my School of Education graduate courses.

============================================
For: Prof. Nick Doom, EDSE 401
Charter College of Education, Cal State L.A.
From: Prof. Jonathan Vos Post
Due: Monday 19 May 2008
Done: and emailed Sunday 18 May 2008
============================================
"Find a professional journal (not Time or Newsweek) on Education (Education Week, for example). Pick any article on any district, state, USA; on critique or reform. Give title of journal, article title, authors, publication date; summarize in 5 sentences, and critique."

============================================
Article #1 for Summary & Critique

http://arxiv.org/pdf/0803.4235
Title: Exam fairness
Authors: Mathieu Bouville
Comments: 5 pages
Subjects: General Physics (physics.gen-ph); Physics Education
(physics.ed-ph)
29 March 2008

Summary in 5 sentences:

(1) It is widely agreed that exams must be fair, yet what this exactly means is not made clear.

(2) One may mean fairness of treatment, yet this merely propagates the fairness or unfairness of pre-existing rules.

(3) Fairness of opportunity, on the other hand, necessarily leads to identical grades for everyone, which clearly makes it inapplicable.

(4) Neither view is helpful to make decisions on competing claims: fairness of treatment ignores the problem and fairness of opportunity holds all claims to be equally valid.

(5) To escape this deadlock one needs an external criterion such as how good engineers students will be, to replace the fairness viewed as student-student comparison.

Critique:

(a) This paper does not definitively define either the problem or the solution but, rather, gives a useful survey of the opinions expressed in the 18 papers referenced and cited.

(b) Some quotations are vivid, such as:

"What students hate more than anything else are examinations that they perceive as unfair."
[R. M. Felder, "Designing Tests to Maximize Understanding", J. Prof. Iss. Eng. Ed. Pr., 128, pp.1-3, 2002]

"For the student, the most important question to be answered in order to be content with the outcomes of the assessment is 'what is a fair assessment.'"
[H. Vos, "How to assess for improvement in learning", Eur. J. Eng. Educ., 25, pp. 227-233, 2005]

(c) The argument makes it plausible -- without being compelling -- that "... fairness is a necessary consequence of the validity of the exams, rather than a separate criterion."

============================================

I think it depends on what you believe the purpose of grades is. If they are only for class ranking, then it doesn't matter whether you grade on a curve or not. If they are not for class ranking, what are they for?

To me, what people need to know about a student's performance in a class is this: what topics has the student mastered? What topics has he/she not yet mastered? And possibly, What topics is the student not likely to master, even if he works on them for the rest of his life?

Grades are nearly meaningless, except for the purpose of comparing students to each other, so that we can talk about the best students, the worst students and the mediocre students.

Doesn't matter what system you use - point score on a curve, simple pass-fail cutoff, whatever - as long as the system is clearly described from the outset. A lot of the perception of unfairness comes from people more or less unconsciously studying for a different kind of grading than they will actually undergo.

Me, I like to argue for pass/fail, at least at higher levels. As an undergraduate we had "fail", "pass", and "pass with excellence", with pass at 50% and excellence at 75%. Of course, the higher grade normally didn't mean anything as a pass is a pass, and your final degree is not scored at all. The higher grade could at times make a difference for a specific high-level course or two if you were applying for a PhD-student position involving that area - but then, if you aren't good enough to earn that higher grade you'd not have any interest in graduate school anyhow.

This simple cutoff works just fine.

"But, by the same token, why the hell are we giving out standard two hour exams to mid or upper level graduate students? Give them a paper, or a project, or something meaningful to sink their teeth into." - John Novak.

I asked that question of a few lecturers whilst doing a Masters by coursework, and the response was either "University regulations require it" or "It's too difficult or unfair to assess an individual's performance on a project which could have had collaboration".

I suspect the latter is the reason for the former.

Overall it's pretty disappointing, and when the only real way of trying to rule out collaboration is quizzing the student on their project / paper, you may as well have given them an exam.

prk.

I don't believe that exams and papers/projects need to be exclusive. In graduate courses especially, there should always be assignments that require loads of research and analysis time. That's how one develops a true appreciation for the amount (and lack) of knowledge in an area.

But there should also be some form of timed examination because in real life you are sometimes required to deliver information and make calculations on the spot or at least under very short time constraints. Yes, you can study and prepare for presentations and consultations, but when the time comes you're often going to be asked questions on the spot. And your client or your audience is going to evaluate your competence, on some level, by your ability to do so.

What should be considered is the relative weight of exams versus homework versus projects. At least in grad school, exams should not make up the bulk of your grade. Unfortunately, usually the opposite is true.

Doesn't all of this depend on both the type of class and the type of test? If I teach intermediate algebra, aka, 'things you should have picked up in high school, had you been paying attention', I curve very heavily. This is due not only to the quality of the student, usually a first-semester freshman, but because the test is in a multiple-choice scantron format. A student gets either 0 or 5 points, all or nothing. So I don't have a way to distinguish the 1/5 point kids from the 4/5 point kids. Conversely, if I teach a calc class, or a diffeq class, things are a bit different, both in the quality of the student and the ability to assign partial credit.

Iow, this is a roundabout way of saying that I have to grade not only on what students get right, but what they get wrong, and why. A good test should not only measure performance, but should be some sort of diagnostic. And this in turn should be reflected in the grading at the end of the semester.

ScentOfViolets writes:

A good test should not only measure performance, but should be some sort of diagnostic.

But once you summarize the test scores/homework scores/lab scores/scores on research papers, etc. as a grade, the diagnostic aspect is thrown out. There is just not enough information in a letter grade (or numeric grade, for that matter). If the course covers, say, 10 topics, and a student masters all but one of those topics, he may very well get an A for the course. But if future courses depend on the student knowing that one topic, then she may be in trouble. On the other side of the scale, if a student masters only 3 of the 10 topics, then he will fail the course, meaning that he's no better off than if he mastered none of them.

As I said, what I would like to see implemented in schools is that every student has an associated list of microtopics that she has completely mastered (in the sense that I have completely mastered the multiplication of numbers using decimal notation; it's not just that I get it right 95% of the time, I essentially never get it wrong, except for typos). There may be ambiguity about whether a student understands a large topic well enough to get an A or a B, but it seems that for a small topic, it is unambiguous to demonstrate that a student has mastered the topic.

Good point at #12. My exams have a cover sheet, and the first thing I do is turn them all to some particular problem, shuffling the "deck" as I go. Each problem is graded as a group, with the same treatment for all.

I usually explain this the day after I give the first exam, pointing out that there is no reason to ask "have you graded MY exam yet" because (a) I have no way of knowing and (b) I won't be finished with the last problem on their exam until, at the earliest, I have graded last-minus-one on all of the other exams.

As for the measurement range, I think my exams are pretty reliable between 50 and 100. I'd rather have an exam operate up there than between 0 and 50 (out of 100). The reason is simple: If my measurement uncertainty is the same on both tests, my relative error decreases if I don't have to curve. As I said above, I'd also rather give a passing grade to a student who got some problems completely correct rather than one who never got more than half credit on any exam problem.
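To put rough numbers on the relative-error point: if the grading uncertainty is the same few points either way, it is a much smaller fraction of a score in the 50-100 range than of one in the 0-50 range. The figures below (a 3-point uncertainty, scores of 75 and 25) are assumed purely for illustration.

```python
def relative_error(score, uncertainty):
    """Fractional uncertainty: absolute uncertainty divided by the score."""
    return uncertainty / score

uncertainty = 3.0  # assumed grading uncertainty, in points out of 100

high_range = relative_error(75, uncertainty)  # exam scored in the 50-100 range
low_range = relative_error(25, uncertainty)   # exam scored in the 0-50 range

print(f"{high_range:.0%} vs {low_range:.0%}")
```

The same 3 points is 4% of a 75 but 12% of a 25, so stretching the low scores upward with a curve carries the larger fractional uncertainty along with them.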

I forgot to mention one thing: It bothers me a lot that my tests don't challenge the best students. I have thought many times about putting a 25 minute problem worth 5 points on each test, but too many of the weaker ones ignore a giant bold faced statement like "don't do this until you have 100% of the other problems correct".

Does anyone have any experience with giving a 95 point exam and then handing out a 5 point "A student" problem when they turn it in before the end of the test?
