We are now just 12 hours from the release of the National Research Council Data-Based Assessment of Graduate Programs.
The tension is just overwhelming...
An interesting thing about the 2010 NRC rankings is the methodology, a final version of which seems to have been settled upon.
As you know, Bob, the primary purpose of the new methodology is to make sure Princeton wins and Harvard is suitably humbled, er, to provide a robust and objective ranking of US graduate programs, for the ages, one which is not a subjective, grossly lagging metric.
The complaints about the methodology have already started to bubble out, and there will be many more.
That the process was flawed in detail is undeniable: some of the numbers just can't be right, they don't pass a sanity check, and some smell of simple transcription errors. But there are well over 100,000 pieces of data on about 5,000 programs at a couple of hundred universities.
The numbers are annoyingly robust in aggregate, at a glance.
So, what have they wrought?
There are a couple of key things to know:
1) There are two overall rankings: the R-ranking and the S-ranking.
These are reported separately, not as a (weighted) sum.
a) the R-ranking is, roughly speaking, the old-style "reputation ranking", whereby senior faculty are asked to rate other departments
b) the S-ranking is a synthetic "objective ranking".
It is, roughly speaking, generated by a Monte Carlo simulation of what rank people would give departments, based on the same respondents' stated ranking of the relative importance of objective metrics of program performance and the reported quantitative values of those metrics (see the sketch after this list).
2) the NRC reports both rankings and the confidence interval for each ranking.
Specifically, the 90% confidence interval. (I still think the original 50% interval would be more informative; better still would have been to report both.)
This, in some cases, gives you a confidence interval you can drive a truck through, and in other cases gives you nice, tight, well-defined rankings.
They also report the correlation coefficient between the stated R-rankings and the S-rankings... it is, as far as I can tell, mostly positive.
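To make the procedure concrete, here is a minimal sketch of the kind of resampling exercise the S-ranking and its 90% interval imply, as I read the methodology. This is not the NRC's actual code; the array names, the resampling scheme, and the trial count are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def s_style_ranking(metrics, weight_samples, n_trials=5000):
    """Illustrative Monte Carlo ranking.

    metrics:        (n_programs, n_metrics) standardized metric values
    weight_samples: (n_respondents, n_metrics) surveyed importance weights
    Returns the 5th, 50th and 95th percentile rank of each program,
    i.e. a median rank and a 90% interval.
    """
    ranks = np.empty((n_trials, metrics.shape[0]), dtype=int)
    for t in range(n_trials):
        # draw one plausible weight vector from the survey responses
        w = weight_samples[rng.integers(len(weight_samples))]
        scores = metrics @ w
        # rank 1 = best, i.e. highest weighted score
        ranks[t] = scores.argsort()[::-1].argsort() + 1
    lo, mid, hi = np.percentile(ranks, [5, 50, 95], axis=0)
    return lo, mid, hi
```

A wide spread between the 5th and 95th percentile ranks is exactly the truck-sized interval mentioned above; a narrow spread is a tight, well-defined ranking.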
To the extent one can trust these things, the R-ranking is a lagging indicator and is directly comparable with the 1995 rankings.
The S-rankings are more current, though centred on performance data from around 2005, and they also provide a lot of data on the weights of the individual metrics and how they correlate with the mean rankings.
Programs can, and will, generate their own statistics on their performance. The data is all there.
A lot of such numbers will fly about in the next few days, starting with the simple ones, like the centred rankings from the confidence intervals, and unweighted means.
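As a toy illustration (the program names and interval endpoints below are invented), the centred ranking is just the midpoint of the published 5th and 95th percentile ranks:

```python
# Hypothetical 90% rank intervals: (5th percentile rank, 95th percentile rank)
intervals = {
    "Program A": (2, 9),
    "Program B": (5, 21),
    "Program C": (1, 4),
}
centred = {name: (lo + hi) / 2 for name, (lo, hi) in intervals.items()}
for name, mid in sorted(centred.items(), key=lambda kv: kv[1]):
    print(f"{name}: centred rank {mid:.1f}")
```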
Data mining the metrics will take some time.
There are about 20 metrics: publications, citations, funding, student funding, composition of faculty, composition of students, time to PhD, graduation rate, etc.
The metrics are clumped into three categories: academics, student issues, and diversity.
Each metric has a weight and a correlation coefficient.
Each field has different weightings for the different metrics, but generally only a few metrics contribute significantly to the rankings - most are statistically insignificant.
If you like to think in principal component analysis terms then, near as I can tell at a glance, the rankings are typically driven by the three most significant components, possibly depending on the field being ranked...
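If you want to check that impression yourself once the spreadsheets are out, a quick-and-dirty way is to run the metrics table through an SVD and see how much variance the leading components carry. The sketch below assumes a (programs x metrics) array and is only meant to illustrate the check, not to reproduce the NRC's analysis.

```python
import numpy as np

def explained_variance(metrics):
    """Fraction of total variance carried by each principal component
    of a (n_programs, n_metrics) table of metric values."""
    X = metrics - metrics.mean(axis=0)
    X = X / X.std(axis=0)                      # standardize each metric
    _, s, _ = np.linalg.svd(X, full_matrices=False)
    var = s**2
    return var / var.sum()

# If explained_variance(X)[:3].sum() comes out around 0.8 or more,
# three components dominate - consistent with only a few metrics mattering.
```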
There will be some gloating, some defensive critiques, howls of complaint and quiet pats on the back.
Prospective graduate students, and postdocs and faculty, will rethink their priorities, programs will reconsider their strategies, and administrators will ponder weighty decisions.
There will be surprises, good and bad, especially in the obscure metrics that will take time to be mined, even with intense crowdsourcing among thousands of faculty.
Times are very hard in academia; there are persistent rumours of serious program cuts - not trims, amputations - and the NRC report will be used to judge programs.
Some programs can argue that things have changed since the data was collected, or promise near-future changes; other programs will be mercilessly and messily cut.
Decisions will have to be made on how to reward success, or build on strength, and on how to strengthen weaknesses, expand into new fields, and abandon old ones.
These rankings are important, for all the flaws in the process, and all our awareness of the subjectivity of some of the quantitative metrics.
In the end, there are single numbers and hard rankings, and that is what resonates with our psyches:
That Is Better Than This!
Interesting times.
I'd take your tension building efforts more seriously if I didn't know you've already seen the results....
Ok, so Agatha Christie I ain't...
But, if you've been holding your breath for 4 years, a certain sense of anticipatory anti-tension is to be allowed for.
Hey, maybe that is what Dark Energy is!
It is the collective negative tension built up since 1995 by administrators in US academia - it is tearing the universe apart!
You're mistaken, I believe, about the R-Ranking. It is an attempt to assign weights to the "objective" measures to get them to mimic the results of a secret survey of some untold number of alleged experts in the field. NRC specifically denies that it is a reputational ranking, and I think they're right. I will discuss this later today (after the embargo). It may be that the results in the sciences will be more worthwhile because of the incorporation of the citation measure (not used in the humanities), but if all the humanities results are as odd as the philosophy results, then the NRC wasted a lot of money for nothing.
well, there was a survey and it was not very secret
it asked, I believe, both for rankings, in the reputation style, and for weights to the quantitative metrics
the original intent, as I understand it, was to publish a single ranking that was a weighted sum of the R and S rankings, but this was squelched
the R-ranking is not a pure reputation ranking; they did something to complete the sample - they weighted the R rankings to make them internally consistent - something like: if you ranked Uni of X high but failed to mention Uni of Y, which is objectively near identical to X, then the ranking of Y was weighted to the X ranking
I'm sure they'll explain at the presser
the correlation between R and S rankings should show to what extent the reputational rankings match the revealed preferences and be a good measure of ranking lag
The survey for which the weights in the R-Ranking were calculated was secret in the sense that (1) they are not publishing the results of the survey, and (2) they are not, or have not at this point, revealed who completed these surveys or what information they were provided in being asked to complete the survey.
Er, the surveys are all on the NRC website.
You can see exactly what information was asked, and who the pool of people questioned was.
Individuals are of course not identified, that'd never pass IRB nor would anyone answer.