NRC Rankings: minor errors

With about 100,000 metrics collected on 5,000 or so programs, there are bound to be errors.

In particular, a lot of the metrics are of the form:
out of N people, how many, k, do (or do not) have the property being measured?

These counts are then reported as percentages.
The percentages must therefore be of the form (1 - k/N)*100,
where k and N are integers.

Yet they are clearly not.

There are several explanations for this, all of which are likely correct:

First of all, there are what look like clear transcription errors: reversed, duplicated, or omitted digits. Somebody entered these numbers by hand on some webform or some such. Mistakes were made by humans, either temps hired to do this quickly or staff for whom this was not their primary job.

Secondly, there is rounding: if 16/17 faculty have grants, do you report 94%, 94.1% or 94.12%?
The first would be correct, the last would be right - because it'd give you a higher rank than the poor deluded fools who ignored the non-significant digits!
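For concreteness, here is the arithmetic behind the three choices (plain Python, nothing NRC-specific):

```python
# The three rounding options for 16 out of 17 faculty.
frac = 16 / 17                # 0.94117647...
print(f"{frac * 100:.0f}%")   # 94%
print(f"{frac * 100:.1f}%")   # 94.1%
print(f"{frac * 100:.2f}%")   # 94.12%
```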

This could have been error trapped: it is a trivial task both to round all percentages to two significant figures (reporting them to four significant figures is a cruel joke designed to shorten the lifespans of all physical scientists reading the report) and to check that they are of the form (1 - k/N)*100.
If not, flag them and check!
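A minimal sketch of that trap, assuming you know the reported percentage and the headcount N; the function name and rounding tolerance are mine, since the NRC published no such code:

```python
def consistent(pct: float, N: int, decimals: int = 2) -> bool:
    """True if pct could be round((1 - k/N) * 100, decimals) for some integer k."""
    return any(round((1 - k / N) * 100, decimals) == round(pct, decimals)
               for k in range(N + 1))

# Flag anything that cannot arise from an integer count out of N:
for pct, N in [(94.12, 17), (94.3, 17), (0.0, 17)]:
    if not consistent(pct, N):
        print(f"flag for checking: {pct}% of {N}")   # flags 94.3% of 17
```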

Thirdly, the numbers are mostly self-reported: if the NRC lists your "delusional index" as 0%, then it is because you did not report any content for that metric
(yes, I'm looking at you, UW CompSci - though don't be surprised if your grad students are contemplating going to industry rather than academia...).

Fourthly, yes, there are errors.
Almost all of them don't matter.
In almost all fields, almost all the metrics have negligible statistical significance, so it doesn't matter if they have random errors.
Now if you reported 0% success at getting grants, have a word with the NRC about an erratum...
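To put a number on "doesn't matter", here is a toy illustration (invented data, not the NRC's): when a composite score spreads its weight over many metrics, a sizeable error in one metric barely moves the composite, and so rarely moves the rank.

```python
import random

# Toy composite ranking: 50 programs, 20 equally (and therefore weakly)
# weighted metrics. All numbers invented.
random.seed(1)
n_programs, n_metrics = 50, 20
weights = [1 / n_metrics] * n_metrics
scores = [[random.random() for _ in range(n_metrics)]
          for _ in range(n_programs)]

def rank_of(program: int) -> int:
    """1-based rank of a program by weighted composite score."""
    composite = [sum(w * m for w, m in zip(weights, row)) for row in scores]
    order = sorted(range(n_programs), key=lambda i: -composite[i])
    return order.index(program) + 1

before = rank_of(0)
scores[0][3] += 0.10   # inject a sizeable error into a single metric
after = rank_of(0)
print(before, after)   # the rank typically moves by only a place or two
```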


To point 4: if you run an opaque process and then DO NOT allow corrections, then it's my opinion that you are not running a reasonably fair process. The NRC has an obligation to be as accurate as possible.

It was not an opaque process: the universities have staff who liaised with the NRC, provided feedback, and had opportunities for error correction. UW's VP for Research (or equivalent) would have been briefed on the methodology and given the opportunity for feedback (we know the methodology was changed precisely because of university VPR objections), and they should have checked the input data.

The process was fair, but not error free. But the UW comp sci error was of their own making as far as I can tell.
If "who do we count as faculty" is ambiguous then ask the NRC back when, not now.

So the process was designed with a single verification of data? Seems crazy to design a process in this manner. Again, I'm not saying the fault isn't with UW, but you still haven't addressed why they have refused to correct clearly erroneous data at this point. Give me a good reason and I'll stop commenting, promise :)

I don't want you to stop commenting.
This is fun!

As far as I know, there were multiple verifications of the data; I know our VPR spent a lot of time verifying departmental information, etc., and the cycle was iterated.

But the universities are not passive partners in this; they need to be proactive.
I'm sorry, but it looks like UW comp sci tried to game the metrics by adding a lot of adjunct faculty and didn't think about how the denominator changed in the other metrics when they did that.

If you're not sure, ask.

Correcting is non-trivial - remember the weights are generated iteratively.
If you now decide not to count those faculty, then they have to re-Monte Carlo that whole sector and generate new weights and correlations and ranks.
The NRC is not Google; they are $- and manpower-limited.
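A toy sketch of why a correction ripples (my own illustration of the general idea, not the NRC's actual procedure): the faculty headcount sits in the denominator of many metrics at once, so changing it changes all of them, and everything derived from them downstream.

```python
# Hypothetical raw counts for one program; all numbers invented.
with_grants, with_awards = 24, 9

def per_capita_metrics(n_faculty: int) -> dict:
    """Every percentage shares the same denominator."""
    return {"pct_grants": with_grants / n_faculty * 100,
            "pct_awards": with_awards / n_faculty * 100}

print(per_capita_metrics(30))   # {'pct_grants': 80.0, 'pct_awards': 30.0}
print(per_capita_metrics(45))   # adjuncts added: every metric drops at once

# And since the published weights and ranks were derived by iterating
# over ALL programs' metrics, each corrected headcount forces a rerun
# of that Monte Carlo, not a one-cell fix.
```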

"I'm sorry, but it looks like UW comp sci tried to game the metrics by adding a lot of adjunct faculty and didn't think about how the denominator changed in the other metrics when they did that."

Well, I can tell you that is not true... at least from what I know and have heard at the department or dean-of-engineering level :) The list supplied for graduate students included Microsoft employees who have sat on exactly one Ph.D. committee and are not, by pretty much any standard I can think of, faculty.

Oh and "though don't be surprised if your grad students are contemplating going to industry rather than academia" is a rather low blow. You're not a computer scientist (well neither am I) but UW CSE, for example, supplies a ton of faculty members across the nation (see, for example UW South, aka UC San Diego.)

Well, maybe after they charge their hundred bucks for the data to libraries, the NRC will have enough money to rerun their Monte Carlo simulations (I'm sure our CS department would gladly give them the cycles, heh).

The "don't be surprised" comment was not meant to be a low blow, but rather an oblique reference to the fact that one of the largest, and most lucrative, employers of CSE types is just down the road from you.
If you were ate UW Comp Sci anytime in the 90s or early 00s you had to have M$ in mind at some level as an option.

"academic plans" is self-reported, not an NRC generated number.

"if 16/17 faculty have grants, do you report 94%, 94.1% or 94.12%?"

None of the above; you report 16/17. The actual raw data. Why is this so hard?
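In code, that point is just: keep the raw (k, N) pair and derive any display percentage on demand, so nothing is lost to rounding. A minimal sketch with invented names:

```python
from fractions import Fraction

k, N = 16, 17
grant_rate = Fraction(k, N)   # exact: 16/17, no rounding decision needed
print(grant_rate, f"= {float(grant_rate) * 100:.2f}%")   # 16/17 = 94.12%
```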