Robert Chung on David Kane

Boosted from comments. Robert Chung writes:

David Kane wrote:

Anyway, it seems clear to me now that you are bluffing

Me, bluffing about knowing how to calculate a CMR? Ouch, that hurts.

David, what a fascinating example of hubris. You do not know how to do something, so you conclude that no one else can either. However, that something “seems clear to you” has, once again, led you down the wrong path — though for you this seems about par for the course.

As you ought to have known long ago, we are clearly not “in the same boat.” The reason you ought to have known this long ago is that you have had in your possession the proof of what I have been saying — but with the blinders you’re wearing you couldn’t see it. 20 months ago, I showed a graph with the cluster CMRs; more remarkably, 14 months ago and then again one month ago, both times in response to your own requests, I pointed you to my code, in which can be found the “magic formula” for calculating the pre- and post-invasion CMRs. Perhaps you missed it since the calculations were cryptically and misleadingly labeled “pre-invasion CMR” and “post-invasion CMR”? I leave getting the overall CMR from the cluster CMRs to you as an exercise.


I, and others, have warned you that you have been confounding the estimates of CMR and the estimates of the CIs around those estimates. You keep saying that my estimates of the CMRs and excess mortality depend on bootstrapping. They do not. The proof is in the code you ignore. You keep saying that Roberts’ estimates of excess mortality depend on normality. They do not. Despite your exegesis of the rest of the article, the proof is at the bottom of the left hand column on page 3, where the CMR calculation is given. Look at it, and please (please!) recognize that it does not depend on normality.
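For anyone following along, the calculation at the bottom of that column is nothing exotic: total deaths over total exposure. A minimal R sketch with made-up cluster totals (illustrative only, not the actual L1 numbers):

deaths <- c(2, 4, 1, 3, 5)                 # deaths per cluster (hypothetical)
months <- c(4800, 5100, 4650, 4900, 5200)  # person-months per cluster (hypothetical)

# Crude mortality rate: total deaths over total person-months,
# rescaled to deaths per 1000 people per year.
cmr <- sum(deaths) / sum(months) * 12 * 1000
cmr  # about 7.3 -- and no normality assumption anywhere in sight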

So this is what it comes down to: the estimates of excess mortality don’t depend on normality, but your argument does, and there is no evidence that Roberts and Garfield made that assumption. You have attributed that assumption to them even though there is no evidence for it and, in fact, there is evidence against it. Your argument is a phantom argument. There is nothing there. This is what Tim Lambert meant when he said that all you’ve shown is that assuming normality for the CI including Falluja is wrong.

David, there are legitimate criticisms of the Roberts and Burnham articles. Yours isn’t one of them. Your paper is trash, and you’re hurting yourself. Do the right thing. Write Malkin and Fumento and tell them you didn’t know what you were talking about. Tell them you apologize for the exploded heads. You can even tell them you’re working on yet another crazy argument. You don’t have to tell them that you accused a demography professor of not knowing how to calculate a CMR.

Comments

  1. #1 Torbjörn Larsson, OM
    September 3, 2007

    As a further example of why exhaustive detailing is practically impossible, you can look at lab experiments. You could possibly detail every nut and bolt in a setup, and still have others fail to replicate the experiment when building an equipment clone. (Which, as noted, is an uninformative method of replication.)

    Why? Because the material used may have been special by being defective: chemicals tainted, cell isolates impure, et cetera. What you can do is save samples of non-standard ingredients, so that if someone can’t repeat your experiment you can go back and check what went wrong. And if it is a new effect that was presented, you can help other labs by disclosing the “nuts and bolts” description.

  2. #2 David Kane
    September 3, 2007

    SG writes:

    Spagat’s thesis is “Main Street Bias”. The only way to test “Main Street Bias” is by getting the address of the sample records. Do you understand how it is impossible for Roberts and Burnham to give this information away?

    I have communicated with Spagat on this. Although it is true that he and his team would like enough detail to test MSB, they would be eager to just start with the data that has been released to people like me. In other words, Roberts et al refuse to give Spagat even the data that they give to other people. Again, I believe that this is unprecedented. At least, no one in the Deltoid community has cited a similar example.

    Also, Spagat and others are not seeking addresses since the Lancet authors report that all addresses were destroyed by the survey teams even before they left each neighborhood. There is no address data to look at.

  3. #3 SG
    September 3, 2007

    So David, you have finally admitted something – Spagat can’t use the data to check their thesis for confidentiality reasons. But you still don’t get the whole story do you? You say

    Also, Spagat and others are not seeking addresses since the Lancet authors report that all addresses were destroyed by the survey teams even before they left each neighborhood.

    Do I need to put this in capital letters and shove it up your arse? It is possible to identify individuals without using their address. All you need is a distinctive house in a small area, and the locals can identify it. A distinctive house can be the only house in the town with a household member who was killed by a bomb, or it can be the only house with a lot of immigrant family members. All that information is contained in the data. It doesn’t matter that it is also in L2 – if L1 has even one cluster where this identification is possible, you don’t get the data.

    As for not giving Spagat the same data as you – if only Roberts and Burnham have the right to give Spagat that data, why have you published it at CRAN? And if they don’t reserve that right for themselves, why do you complain?

    (And why do you refuse to answer so many of my questions?)

    (Also, I am pretty confident that researchers I have met will not share data with people they don’t trust. I generally don’t hear about it because the usual run of mendacious, nasty researchers don’t have as much balls as you or Spagat, and don’t ask for the data in the first place).

  4. #4 JB
    September 3, 2007

    Sortition said: “It is also the case that in theory the statement of a theorem is all you need in order to know whether it is true or not, since the reader (being competent enough) should be able with some effort to repeat (and in that way independently verify) the steps taken in the original proof.”

    It’s simply ridiculous to claim that this is the same as “It is very possible (as I explained above) to provide in software documentation (ie, higher level form) everything that is needed to reproduce results — ie, without providing the actual source code.”

    I explained above why: software documentation gives you all the necessary steps to produce the results.

    Sortition continued:
    “Are you implying that the reason published mathematical proofs do not include “every last step” is to protect the competitive advantage of the author or to challenge the reader to retrace the steps of the author?”

    That was not at all what I was claiming. All I was saying was that even mathematical proofs don’t give every last step. My explanation gave the very reason for that. If the person reading the proof is familiar with the subject, they don’t need every last step (as a novice would).

  5. #5 Sortition
    September 4, 2007

    Those commenters opposing publishing code suggest that as long as the published work is reproducible by competent readers, it is not incumbent upon the authors to make it _easy_ for the readers to do so.

    My position is that it is the duty of authors to make reproduction of their work and consequent research by their readers as easy as is reasonably possible. Any code used by the authors would therefore naturally be part of a well-written paper.

    I claim that accepting the proposed policy of allowing authors to make reproduction difficult, as long as it is at all possible, would have absurd implications. That is the purpose of the example of theorems without proofs – reproduction is possible, but very difficult.

    Even if we reject this example as extreme or “ridiculous”, as does JB, the implication remains: a policy that allows the author to make reproduction difficult opens a Pandora’s box. How difficult is too difficult? What if the author does not provide a proof, but instead provides some hints as to how the proof goes – is that acceptable? Or, to take a completely different example, what if an author provides parts of his paper in a disemvowelled version only?

    No – it is the role of the scientist to be clear, open and instructive. Withholding information (code, or anything else that is of relevance to the research) without very good reason is unscientific and should not be acceptable.

  6. #6 Torbjörn Larsson, OM
    September 4, 2007

    Withholding information (code, or anything else that is of relevance to the research) without very good reason is unscientific and should not be acceptable.

    Quite frankly, that doesn’t sound like any paper I know of, where a lot of effort has gone into structuring and paring down unnecessary details to make the chain of evidence strong and easy to grasp.

    Detail can and should be provided on request, unless those special reasons (confidentiality, disclosure, competition) preclude it.

    The science goes into the ability to repeat and retest, not into stamp collecting detail.

    That said, there are open access efforts to enable publishing raw data and derivative graphs, which are the kind of detail you would like to have access to as basis for new research.

  7. #7 Sortition
    September 4, 2007

    I took it as clear that some things, like code, are not part of the body of the paper but are available as supplementary material.

  8. #8 Torbjörn Larsson, OM
    September 4, 2007

    Oh, I see. Well, supplements typically contain long original derivations, for example if a code uses a new algorithm that needs describing for future reference.

    I never add the drawings of equipment to experimental papers, and likewise I wouldn’t want to see the source code of R or similar packages.

    I’m reminded of a discussion on The Panda’s Thumb, where the result of a genomic analysis looked so odd to a specialist on a protein assembly that he teased out that the researchers had used, erroneously in that case, default settings for the statistical analysis.

    Nobody claimed the researchers had withheld usual detail (the settings used) or done anything inappropriate, and they corrected their analysis. Just as for Robert above, the experts were used to the analysis and didn’t expect the paper to be written for non-experts, so they didn’t complain about the missing description of settings for common software.

    In any case, there are a lot of problems involved in claiming general rules for paper presentation. It differs between areas. That is why I stressed the results (repeatability) over formalism (details). I’m not sure any one strategy is “correct”, but I’m pretty sure of what scientists (at least in my area) do in practice.

  9. #9 SG
    September 4, 2007

    (Everyone else, this concerns comment 88 and is irrelevant, so don’t read it – for David’s eyes only)

    David, I just had a look at the problem you describe in comment 88 (on your separate blog) and your method of means becomes more and more biased as the bias in the missing values increases. Here is some R code you can try to show it to yourself:

    1) suppose 10 clusters all with a true death rate of 1/10, and suppose that in all clusters 110 individuals were sampled and 11 deaths observed. In 8 clusters there are 10 missing observations, all with no deaths; in 2 clusters there are 50 missing observations, all with no deaths.

    construction:
    > vvec2 <- cbind(rep(11,10),c(rep(100,8),rep(60,2)))
    > vvec2
    [,1] [,2]
    [1,] 11 100
    [2,] 11 100
    [3,] 11 100
    [4,] 11 100
    [5,] 11 100
    [6,] 11 100
    [7,] 11 100
    [8,] 11 100
    [9,] 11 60
    [10,] 11 60
    test David’s mean:
    > mean(vvec2[,1]/vvec2[,2])
    [1] 0.1246667
    calculate the correct way:
    > sum(vvec2[,1])/sum(vvec2[,2])
    [1] 0.1195652
    so now your method gives 12.5% and the correct method gives 12%, against a true rate of 10%.

    Then repeat with 5 clusters having 50 missing values, and the same true death rate (code omitted, too boring to copy and paste), result:

    David’s method:
    [1] 0.1466667

    Proper method:
    [1] 0.1375

    so now the estimates are 15% and 14% respectively – the bias has increased for both methods, and more for the incorrect one.

    Note how both methods are biased against the null, but the mean method is more biased.
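
    (If you want the omitted construction, this reconstruction – only the second column changes – reproduces those numbers exactly:)

    > vvec3 <- cbind(rep(11,10),c(rep(100,5),rep(60,5)))
    > mean(vvec3[,1]/vvec3[,2])
    [1] 0.1466667
    > sum(vvec3[,1])/sum(vvec3[,2])
    [1] 0.1375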

  10. #10 Nash
    September 4, 2007

    #102 Kane:

    In other words, Roberts et al refuse to give Spagat even the data that they give to other people.

    Confidentiality aside, the problem is that Spagat has a clear agenda to rubbish the Lancet study and zero expertise in the field:

    • He had been using IBC data in support of a power law hypothesis about the scaling of violent deaths, which carried on the highly tendentious work he’d done on Colombia.
    • Michael Spagat is an apologist for the Colombian government link
    • He receives plentiful funding from the arms industry link
    • They arbitrarily chose and inflated the parameters they included in their MSB theory in order to make the bias look statistically significant and slander the study, and didn’t highlight alternative possibilities. A clear indication of their agenda

    Considering this, why would you release anything to someone who has no interest in truth, science, or error checking, no experience in said discipline, and whose agenda is clearly to rubbish your study to great fanfares despite the evidence? Truthfully I was disappointed they gave in and distributed anything at all to you vultures.

    Again, I believe that this is unprecedented

    No, what is ‘unprecedented’ is the lengths that individuals and groups such as Spagat and yourself, with little or no experience in the field, have gone to to rubbish this study for entirely political reasons. If this engenders ‘unprecedented’ reactions then we should be largely unsurprised. In any case it is such a disingenuous non sequitur that your implication here, again of malfeasance and fraud on the part of the authors, is out of line and entirely speculative. It’s mud slinging, although admittedly, considering how thoroughly your attempt to debunk the study has been destroyed here, that is all you have left.

  11. #11 sod
    September 4, 2007

    i think it’s rather difficult to discuss “more openness” in science with David around.

    just take a look at what he did with data made available to him:

    However, the data shows that only 29 of 47 clusters featured exactly 40 interviews. The following table shows the number of clusters for each total number of houses interviewed:

    Houses interviewed: 33 36 38 39 40 41
    Number of clusters:  1  2  5  8 29  2

    http://www.bioinformatics.csiro.au/CRAN/doc/vignettes/lancet.iraqmortality/mortality.pdf

    his interpretation given online later looks like this:

    Such a procedure ignores the fact that non-response varies across clusters.

    Consider a simple example in which you have two clusters with 50 attempted interviews in each using a one year look-back period. In cluster A, you interview all 50 households. There are 10 people in each house and a total of 20 deaths. The CMR in cluster A is then 4% (20 deaths divided by 500 person-years). But, in cluster B, only 10 households agree to be interviewed. The other 40 refuse. There are also 10 people in each of the 10 households. There is one death, giving a CMR of 1% for cluster B.

    http://lancetiraq.blogspot.com/2007/09/missing-data.html

    his example uses 80% missing responses in 50% of the clusters.

    the Lancet paper had less than 5% missing responses (on average) in less than 40% of the clusters.

    in short, this example has nothing to do with the lancet reality and simply is a distortion of facts.
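
    to see just how small the effect is with lancet-like non-response, try this sketch (illustrative numbers: 10 deaths per 100 sampled in each of 10 clusters, 5% non-response in 4 of them):

    > wvec <- cbind(rep(10,10), c(rep(100,6), rep(95,4)))
    > mean(wvec[,1]/wvec[,2])
    [1] 0.1021053
    > sum(wvec[,1])/sum(wvec[,2])
    [1] 0.1020408

    the two methods differ in the fourth decimal place.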

    and David wrote this misleading example in a post that he considers his “mea culpa” for having been shown to lack basic understanding!

    while i certainly support more openness in the science community, individuals like David Kane make me seriously doubt the effect of it.

  12. #12 David Kane
    September 4, 2007

    sod,

    My example was purposely extreme to illustrate the underlying point. It was intended for laymen, like, say, Donald Johnson, who want some intuition for why different methods for calculating CMR might lead to different answers.

    SG,

    You are obviously a serious fellow and I have wanted to answer your questions. If I have missed some, my apologies. You write:

    Do I need to put this in capital letters and shove it up your arse? It is possible to identify individuals without using their address.

    Of course! My only claim, and I am pretty sure that the Lancet authors would agree, is that they have taken care to ensure that this is impossible for the data which they released for L2 and that similar care could be taken for the data for L1. None of the authors have disputed this.

    Now, it may be impossible to provide data at enough detail to both satisfy Spagat and maintain confidentiality. But that is not the fight we are having today. The fight today is: Why not give Spagat et al the same data that they give to me (and a dozen others)?

    All you need is a distinctive house in a small area, and the locals can identify it. A distinctive house can be the only house in the town with a household member who was killed by a bomb, or it can be the only house with a lot of immigrant family members. All that information is contained in the data.

    Have you read my description of the data? There is no information about specific towns, except for Falluja and Baghdad, for precisely the reasons you give.

    As for not giving Spagat the same data as you – if only Roberts and Burnham have the right to give Spagat that data, why have you published it at CRAN? And if they don’t reserve that right for themselves, why do you complain?

    I have only published the data for L1 on CRAN, data that Tim placed in the public domain when he posted it (without objection from the L1 authors, I assume) on Deltoid. I did not post the data for L2 on CRAN (or anywhere else) because the agreement I signed prevents me from doing so. I did provide tools for working with the data for those who have access to it.

    (And why do you refuse to answer so many of my questions?)

    Let me know if I have missed any.

    (Also, I am pretty confident that researchers I have met will not share data with people they don’t trust. I generally don’t hear about it because the usual run of mendacious, nasty researchers don’t have as much balls as you or Spagat, and don’t ask for the data in the first place).

    Why do you restrict this to just me and Spagat? There are at least 4 co-authors of Spagat (Neil F. Johnson, Sean Gourley, Jukka-Pekka Onnela and Gesine Reinert) who would like access to the data. Are they also “mendacious, nasty researchers?” Just asking!

  13. #13 Robert
    September 4, 2007

    In post #88 above, [David Kane](http://www.google.com/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fwww.apa.org%2Fjournals%2Ffeatures%2Fpsp7761121.pdf&ei=eGLdRtf_JYzYgQORkJWGAw&usg=AFQjCNElD1Z–OcLdhxALGOewDGvcsHiiw&sig2=BWtBhDldkVKKXN3bCMn7Sw), trapped, desperate, and gnawing his leg off, wrote:

    I have written up a brief discussion [...] which explains why the [CMR] formula used in L1 is not as obviously correct as Robert Chung, SG and others like to pretend.

    Dude, you’re arguing that the crude mortality rate shouldn’t be the crude mortality rate. When you’re done with that, can you try arguing that 2+2=4 isn’t as obviously correct as I like to pretend? I think that’d be kinda entertaining, too.

  14. #14 JB
    September 4, 2007

    “Those commenters opposing publishing code suggest that as long as the published work is reproducible by competent readers, it is not incumbent upon the authors to make it easy for the readers to do so.”

    No, that’s not at all what I have been saying. This is not about making things “difficult” for those who would repeat the results.

    That I don’t give you my actual source code does not mean repeating the results will necessarily be “difficult” if all the important steps have been laid out in front of you (in documentation).

    Sometimes (often?) it is actually more difficult to figure out what someone else has done than to simply solve the problem for oneself.

    Often, it is fairly easy to quickly “hack” together a computer program that solves the same problem as a very elegant program that took considerable time to write — one that can be extended (or perhaps already is) to do many other things besides the problem at hand (Wolfram’s Mathematica comes to mind as an example of the latter).

    The first program was easy, the second relatively difficult. The fact that I do not provide the source code for the second (Mathematica) has little to do with how difficult the first task is, particularly when the really difficult part — the algorithm and its application to the case at hand (including critical implementation steps) — has already been documented and provided by me.

  15. #15 Eli Rabett
    September 4, 2007

    Sorry Sortition, authors do not have a duty to hold anyone’s hand; if nothing else, no one has the time. Look at the effort that a clueless David Kane has extracted here, there, and everywhere. Those who know would, in the normal course of things, have simply ignored him after his initial effort (it was excised with extreme prejudice from the web site where it was originally published). However, Kane has used politically motivated allies to keep the waters boiling and cost everyone a tremendous amount of effort. See McKitrick.

  16. #16 Eli Rabett
    September 4, 2007

    JB, from sad experience we all know that code is not self-documenting and STEM graduate students do not believe in documenting their code. Mostly you ARE better off with the description of the algorithm in the paper. Again, this is a fight between what should be and what is.

  17. #17 David Kane
    September 4, 2007

    I am glad to welcome Robert Chung back to the discussion. SG, in comment 73 above, claimed that the confidence intervals for the CMR estimates in L1 could not be calculated without “data to the level of the household, which we don’t have.” Indeed. But Robert has argued, above, that he can replicate the CMR confidence intervals even though all we have access to is the cluster-level summaries. Who is right? I think SG.
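
    For concreteness, the obvious candidate from cluster-level summaries alone would be a cluster bootstrap, something along these lines (a sketch only, with hypothetical summaries standing in for the real data):

    deaths <- c(2, 4, 1, 3, 5)                 # hypothetical cluster summaries
    months <- c(4800, 5100, 4650, 4900, 5200)
    boot.cmr <- replicate(10000, {
      i <- sample(length(deaths), replace = TRUE)  # resample whole clusters
      sum(deaths[i]) / sum(months[i]) * 12 * 1000
    })
    quantile(boot.cmr, c(0.025, 0.975))        # percentile 95% CI

    My claim is that no calculation of this sort reproduces the intervals printed in L1.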

  18. #18 richard
    September 4, 2007

    David

    As you have a blog on Lancet/Iraq, why don’t you simply open the posts there for comment? Provided you are willing to do so with little or no moderation, that would seem to be an equally valid location for such discussions.

  19. #19 dhogaza
    September 4, 2007

    But Robert has argued, above, that he can replicate the CMR confidence intervals even though all we have access to is the cluster-level summaries. Who is right? I think SG.

    Will you accept Robert’s bet this time before he cleans your clock in public?

  20. #20 Kevin Donoghue
    September 4, 2007

    So a thread that begins with David Kane saying to Robert Chung: “Anyway, it seems clear to me now that you are bluffing” has now, after 117 comments, arrived at the point where the same David Kane, now with lashings more egg on his face, is saying exactly the same thing about another issue. David, it’s reasonably clear what’s going on here. You’re trying to get Robert to do your homework for you. He isn’t going to. He just might if you were to place a bet, but clearly you are too shrewd to do that.

    As a way out of this impasse, I suggest you tell us why you hold the view you do. Have you done any exercises which lead you to believe that the CIs can’t be constructed from the available data? If so, what have you tried? Have you tried to construct a proof of insufficiency? More bluntly, have you bothered your arse doing any work at all or are you simply relying on your undoubted talent for squeezing information out of people? That talent has served you well for getting your hands on data but it has exposed the fact that you don’t really have much idea what to do with it once you have it.

  21. #21 David Kane
    September 4, 2007

    richard asks:

    As you have a blog on Lancet/Iraq, why don’t you simply open the posts there for comment?

    I closed comments on that entry on purpose as it seems rude to shift the conversation somewhere other than Deltoid. For those interested in my posts on related topics, here, here and here are the three most recent.

    As to those suggesting a bet, I agree! I have written to Tim to see if he would be willing to host/judge such a contest. Basic rules would require that someone replicate the estimates and confidence intervals for CMRs, relative risks and excess deaths printed in L1 using the available data. SG and Robert have already done some of the work here, but I continue to believe that the confidence intervals can’t be replicated. I hope that Tim will host/judge the contest.

  22. #22 diogo
    September 4, 2007

    Let me second Kevin Donoghue on his suggestion:

    > *I suggest you tell us why you [David Kane] hold the view you do.*

    David Kane suspected the L1 and L2 results (he called fraud on the authors at least once).

    He is not a biostatistician or epidemiologist, and as became clear in this thread, his suspicions cannot have arisen because his technical expertise allowed him to spot a critical problem with both studies. So far, he hasn’t spotted one.

    His suspicions might have arisen from the fact that the authors are reluctant to release all their data together with their source code to other interested researchers.

    However, he has been informed by practicing researchers that this in fact is not uncommon with public health data, and the authors’ behavior does not deviate from standard practice in the field (whether this is a desirable state of affairs is another issue, but it is clear that the authors cannot be faulted for things being this way).

    Is this a fair summary?

    So, what other reasons do you have to doubt L1 and L2, David?

    We know they can’t be technical, and they can’t be any suspicious behavior from the researchers, so what are they?

  23. #23 Sortition
    September 4, 2007

    Eli:

    > Sorry Sortition, authors do not have a duty to hold anyone’s hand, if nothing else, no one has the time.

    I don’t see publishing code as holding someone’s hand. It is a reasonably easy way to help your audience understand your work, verify it, and use it as a basis for further research. It is not intended to satisfy clueless hacks like Kane.

    I don’t see why “no one has the time” to publish code. You might as well say that no one has the time to write papers. Yes, it takes time, but it also serves a useful purpose. The effort and time spent on publishing the code should not be out of proportion to the effort of doing the research or writing the paper.

    JB:

    You wrote (#114):

    >> Those commenters opposing publishing code suggest that as long as the published work is reproducible by competent readers, it is not incumbent upon the authors to make it easy for the readers to do so.

    > No, that’s not at all what I have been saying.

    Maybe I didn’t understand you, but I thought this is exactly what you said in #31:

    >> So the idea is that you give proper documentation so that your competitors could reproduce your work, but you don’t give the code so that it is not too easy for them to do it?

    > That is precisely it.

  24. #24 Exploded head
    September 4, 2007

    “as it seems rude to shift the conversation somewhere else than Deltoid.”

    Like to Malkin?

  25. #25 Eli Rabett
    September 4, 2007

    Exploded head has a great idea.

  26. #26 ephscientist
    September 4, 2007

    re: #122, this has never been about science, but ideology. Kane is a gung-ho, pro-war, right-wing ideologue. He called “fraud” before any sort of serious thought. He had his post taken down on the Harvard stats web site. And now everything else is simply an effort to provide the smallest bit of rationale for that obviously wrong, and wrong-headed, assertion. His disingenuous invocations of “how science is done” have nothing to do with science. He will never give up, he will never listen to reason, he will, in his own mind, always be right. It is about confirmation of steadfast bias, not falsification of hypothesis.

  27. #27 Ian Gould
    September 4, 2007

    “In his spare time [David Kane] helps people believe that violence in Iraq could not possibly be as bad as the Lancet study demonstrated.”

    Had David been an adult in 1967, he would probably be publishing papers claiming that napalm burns were only slightly painful.

    Had David been an adult in 1939, he would probably be defending Germany’s response to Poland’s communist-inspired attack on them.

  28. #28 dhogaza
    September 5, 2007

    Had David been an adult in 1939, he would probably be defending Germany’s response to Poland’s communist-inspired attack on them.

    And in 1946, just imagine what might’ve happened if he and David Irving had been drinking buddies …

  29. #29 SG
    September 5, 2007

    David at 117: of course you think I’m right, it suits you. I bow to Robert’s superior experience, intellect and humour on this one.

    I’m off to the Japan Statistical Association conference in Kobe, so likely won’t have anything to add to this thread of doom until Monday.

  30. #30 sod
    September 5, 2007

    sod, My example was purposely extreme to illustrate the underlying point. It was intended for laymen, like, say, Donald Johnson, who want some intuition for why different methods for calculating CMR might lead to different answers.

    while i enjoy your “layman” arguments, it is NOT what you did in your article.

    instead you write:
    Although I am a layman when it comes to demography, it seems obvious that any statistician would question whether just adding up all the deaths and dividing by person-months is the best way to estimate the crude mortality rate for Iraq. Such a procedure ignores the fact that non-response varies across clusters.

    you are making a very strong claim about the case of Iraq.

    nowhere do you explain, that the situation of “lack of response” in Iraq is on a completely different order of magnitude.

    Ignoring non-response causes you to weigh clusters with higher-response rates more heavily even though, a priori, there is no particularly good reason to do so.

    a priori, there is NOT even a good reason why you want to discuss the non-issue of non-response rates in the lancet iraq study. but you decided to write a piece about this anyway, one including a completely misleading example!

    even your disclaimer does not tell the full truth!
    Indeed, the differences in the two approaches for the Lancet data are even smaller.

    again: the effect will be smaller by MAGNITUDES!

    here is the link again:
    http://lancetiraq.blogspot.com/2007/09/missing-data.html

  31. #31 Palo
    September 5, 2007

    I think Kevin is correct. All David Kane is doing is getting others to do the job for him. He throws in a claim and waits for the discussion to see what he can get out of it to use in his politically motivated ‘articles’. And if he gets intellectually spanked, he sends a ‘thank you for the discussion’ note to look like a nice and honest guy. Look, he started by saying he could ‘bet’ Robert couldn’t replicate the data, only to later say that he wasn’t willing to ‘bet’ because he ‘knew’ someone could replicate the data. David Kane is simply a dishonest guy with a political agenda.

  32. #32 Rich Puchalsky
    September 6, 2007

    “You know when you said you think of yourself more as a statistician than a political scientist? I’m guessing a lot of statisticians are asking “why us?” and a lot of political scientists are high-fivin’.”

    Oh, so funny. I have to remind myself to look at their CV the next time I ask a statistician to check some work rather than just assume that anyone who calls themselves a statistician must know what they are doing.

    Actually, this leads to a useful question: is there a simple way for a complete non-expert such as myself to estimate a statistician’s reliability?

  33. #33 pough
    September 6, 2007

    Actually, this leads to a useful question: is there a simple way for a complete non-expert such as myself to estimate a statistician’s reliability?

    You mean like calculating the odds?

  34. #34 Sortition
    September 6, 2007

    > Actually, this leads to a useful question: is there a simple way for a complete non-expert such as myself to estimate a statistician’s reliability?

    The answer, I believe, is “no”.

    This is one particular case of a general problem that is rarely discussed: many aspects of reality are not obvious. There are many things about which there is no way to form an informed opinion without putting effort into finding out the facts.

    You either have to believe the accepted wisdom or put in the time and effort to do some independent research.

  35. #35 Thursby
    September 6, 2007

    Oddly enough, this is the topic of David’s dissertation, “Disagreement”. The answer, shockingly enough, is to increase the confidence interval and simulate parametrically. Breathtaking…I might have used a flat prior and layered hyperparameters, but that’s just me. Of course, you could also use the outcome of experimental bets (hmmm) to figure out that uncertainty…as in Sarin and Wakker “Revealed Likelihood and Knightian Uncertainty”, which oddly enough David does not cite.

    Of course I jest. The only way to know is to do the work, or to rely upon (noisy) external signals, like tenure at an Ivy League school, publication record, and a PhD from a reputable school in the actual field of record. This is why when I see an infomercial on TV about male enhancement or natural cures I don’t believe the PhD “doctor of homeopathy” who’s telling me it works…even though he has a blog.

  36. #36 Harald Korneliussen
    September 8, 2007

    “Actually, this leads to a useful question: is there a simple way for a complete non-expert such as myself to estimate a statistician’s reliability?”

    Yes, there are ways, but they would be frowned upon by some statisticians. Look at their politics, look at who they write for, look at what other statisticians are saying. Look at what they have been wrong and right about in the past.

    In short, google them and form an opinion from what you see.

    I’m not saying it’s always a good way, but it is a way. I say it’s better than assuming all specialists are equally reliable.

    JB: In fact I am after public and distributable code (like, for instance, the code for R), but I’m also interested in seeing code published along with documentation. Yes, as Rabett says, serious implementation errors will be discovered on attempts at replication, but that’s the hard and expensive way of doing it. If the documentation is good, checking an implementation’s compliance with it should not require very much domain knowledge.

    I was aware that some huge, important programs like climate models are written in Fortran, in part for performance reasons. What shocked me was that many people apparently still use Fortran numerical libraries directly in cases where packages like R or Mathematica would be appropriate.

    Errors will creep in that way. I suspect you underestimate how easy it is to make implementation errors, and how long they can go undetected.

    Why don’t we ask someone with long experience in detecting them whether it’s easier to find implementation errors by inspection or by reimplementation and comparison? Like, say, a computer science lecturer?

  37. #37 Jon H
    September 8, 2007

    “Why don’t we ask someone with long experience in detecting them whether it’s easier to find implementation errors by inspection or by reimplementation and comparison? Like, say, a computer science lecturer?”

    What’s important isn’t the implementation; all you need is the algorithm, detailed to a greater or lesser degree.

    As a simple example, you don’t need someone’s sort code to figure out if the output is correct. You don’t even need to use the same sort algorithm. You just need the input data and the parameters (ascending, descending, etc). If the other guy uses a bubble sort, and you use a quicksort, the only difference should be how long the sort takes, which isn’t the issue. The output should be the same. Same procedure (‘ascending sort’), same inputs (‘z’,’b’,’x’,’d’), same result (‘b’,’d’,’x’,’z’).

    Likewise, if someone says they did an FFT on some data and produced result X, you don’t need to have their FFT code, you need a way to run an FFT with the same inputs. It should produce the same answer.

    If someone says they did an NPV calculation, you don’t need to see their code, you just look at the inputs and output and run it through the implementation of your choice.

    What you want is a high level description: “Given this data, we ran an ascending sort, then an FFT, then an NPV and arrived at these results.” Given that kind of description (with a little more detail about function parameters) you should be able to check their work. The source code would be superfluous.
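
    In R, for instance, a reader could check a claimed sort or FFT result without ever seeing the original code (a sketch; any implementation with the same conventions will do):

    x <- c('z', 'b', 'x', 'd')
    # different algorithms, same output
    identical(sort(x, method = "shell"), sort(x, method = "radix"))  # TRUE

    y <- rnorm(8)
    n <- length(y)
    # naive O(n^2) DFT, same convention as R's fft()
    dft <- sapply(0:(n-1), function(k) sum(y * exp(-2i * pi * k * (0:(n-1)) / n)))
    max(Mod(dft - fft(y)))  # agreement to rounding error, ~1e-15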

    The *actual code* used by a given researcher might well include a lot of stuff that is irrelevant to the specific problem, but is used in their work in general. And it might have dependencies on various custom utility libraries built at their institution, or highly optimized commercial libraries, which may have complicated publishing rights. The code used isn’t necessarily a standalone file of source code that depends only on standard libraries.

    In order to publish the code as you wish, they may have to either publish the whole kit and caboodle (not what you’d want) or extract only the specific code relevant to the particular issue from wherever that code had lived before, replace proprietary or rights-constrained implementations with public-domain implementations where necessary, boil it down to the bare minimum, and then publish. This can be quite a large undertaking.

  38. #38 Ian Hopkinson
    September 8, 2007

    Harald – I think I’d trust a Fortran numerical library that’s been around since the mid-70s and probably used continuously since then more than I’d trust an R or Mathematica routine, and, if my own experience is anything to rely on, I’d trust a physicist to implement what they intend in Fortran more reliably than they would in Mathematica.

    I’ve worked as a physical scientist for 15 years, most of that time as an academic. Quite a lot of code gets shared or made public by scientists, but it’s never been a requirement for publication and to my mind there’s never been a demand for it amongst academics. If you’re doing research in a numerate field, then the chances are you’re going to re-implement stuff as a learning exercise. This has the benefit of flagging typos in papers and revealing any “skeletons in the closet”.

  39. #39 Jon H
    September 8, 2007

    Sortition wrote: ” Any code used by the authors would therefore naturally be part of a well-written paper.”

    That code will become obsolete rather quickly. Some of it will come from obscure languages, or specialized proprietary products that never quite took off.

    A paper accompanied by source code to drive the twin i860 CPUs and 56001 DSP processor on a specialized Ariel expansion card for the 1990s NeXT Cube isn’t going to be very useful to anyone these days.

    A paper accompanied by mathematical formulae and descriptions of the algorithms used would be more useful, because they could be reimplemented on modern hardware and software, since today’s hardware probably doesn’t need that kind of specialized coprocessor for adequate performance.

  40. #40 Jon H
    September 8, 2007

    Sortition writes: “It is simply a matter of making it a requirement for publication. People find the time to handle all the other requirements of publication – I see no reason why this would be any different.”

    So, say, neuroscience researchers will need to submit their monkey to the journal, so other researchers can use it?

  41. #41 Jon H
    September 8, 2007

    Robert wrote: “You’re drubbed, whupped, and schooled. You deserve all of it. You need to read this.”

    Heh. As soon as I clicked on that link I knew what would be on the other end even before it loaded.

  42. #42 z
    September 8, 2007

    In other news, the US is now mounting nuclear cruise missiles on B52s headed for Barksdale AFB, which happens to be a staging point for the Middle East.
    http://www.timesonline.co.uk/tol/news/world/us_and_americas/article2396127.ece
    The “liberal media” hasn’t put that together yet, being (slightly) fixated on the spectre of armed nukes being flown across the US.

  43. #43 z
    September 8, 2007

    Hmm. The underlines in the url get converted to italics. Let’s try again with <http://www.timesonline.co.uk/tol/news/world/us_and_americas/article2396127.ece>

  44. #44 sod
    September 8, 2007

    on the topic of including code, there’s big news.

    Hansen has released the code to reproduce his results.

    http://data.giss.nasa.gov/gistemp/sources/

    it looks (as predicted!) as if the code is rather ugly.

    The subdirectories contain software which you will need to compile, and in some cases install at particular locations on your computer. These include FORTRAN programs and C extensions to Python programs. Some Python programs make use of Berkeley DB files.

    and he has asked for some weeks to “clear it up”.

    there is a huge celebration over at climateaudit.

    i’m slightly worried that every error found in the code, resulting in a -0.02 correction in a year, will lead to headlines:
    global warming a computer error!

    but we shall see.

  45. #45 Rich Puchalsky
    September 8, 2007

    Sortition: “This is one particular case of a general problem that is rarely discussed: many aspects of reality are not obvious. There are many things about which there is no way to form an informed opinion without putting effort into finding out the facts.
    You either have to believe the accepted wisdom or put in the time and effort to do some independent research.”

    Completely unhelpful reply, of course. The “accepted wisdom” is that anyone who calls themselves a statistician and who appears to have the appropriate degrees from well-known schools can be trusted to do simple statistical work. If I wanted to do the work myself, I wouldn’t be trying to hire someone to do it. And of course it’s impossible to do all such work myself, unless I want to be my own doctor, lawyer, etc.

    Thursby is right about the noisy external signals of professional competence, but they are indeed noisy. I was hoping that people who were actually statisticians might know some kind of rule-of-thumb way of evaluating it that might help novices.

    I’m left with Harald K’s:
    “Yes, there are, but they would be frowned at by some statisticians. Look at their politics, look who they write for, look what other statisticians are saying. Look what they have been wrong and right about in the past.”

    I think that the simplest rule of thumb is that anyone who has ever written for the right wing, in general, is incompetent. I might miss some competent people this way, but the downside cost of getting an incompetent one is very high.

  46. #46 Sortition
    September 8, 2007

    Rich Puchalsky:

    > Sortition: “This is one particular case of a general problem that is rarely discussed: many aspects of reality are not obvious. There are many things about which there is no way to form an informed opinion without putting effort into finding out the facts. You either have to believe the accepted wisdom or put in the time and effort to do some independent research.”

    > Completely unhelpful reply, of course.

    Sorry. Facts can be inconvenient at times, but that’s not a reason to wish them away.

    > And of course it’s impossible to do all such work myself, unless I want to be my own doctor, lawyer, etc.

    You can’t check everything yourself, so you have to pick and choose. Doing the picking may not be that easy either, of course. Again, sorry, but that’s how it is.

  47. #47 Rich Puchalsky
    September 8, 2007

    “Again, sorry, but that’s how it is.”

    Well, I know that this is a side-path, but that generally just does not seem true. Most knowledge is statistical. You’re treating “an informed opinion” as if there are only two kinds of opinions, informed and uninformed. But there are all kinds of heuristics that can help people make better decisions of this kind. For instance, let’s say that someone wants to pick out a medical doctor. If they have very little time, I’d say that they should see whether the doctors they can choose from are board-certified (in the U.S., anyway) and in what. If they have more time, they can ask different experts to recommend particular people in their area who are good at particular subfields of medicine. None of these steps amounts to “doing some independent research” really; each of them will lead to a substantially better decision on average. All of these kinds of steps benefit from the advice of someone who actually knows the field.

  48. #48 Eli Rabett
    September 9, 2007

    Harald, LINPACK was OK and there are good successors, including those for massively parallel processors. The issue with R and Mathematica is that they are slow.

  49. #49 Brenda von Ahsen
    September 9, 2007

    As a lay person I want to thank everyone for the fascinating discussion. There is just one thing though. You guys do know that David Kane is a troll, don’t you? There is no argument that you could ever produce that will change his mind or alter his position. All you need to know about what is really going on is in the very first sentence of his “penance” at #88:

    “I have been having fun on Deltoid recently”

    You don’t need to bring in a statistician to understand what is actually going on in this thread, you need a psychotherapist.

  50. #50 Sortition
    September 9, 2007

    > None of these steps amounts to “doing some independent research” really; each of them will lead to a substantially better decision on average.

    To me those things _are_ independent research. Every time you put in your time and effort to find out information and evaluate it, it is independent research. I agree completely with your description: there is a ladder of time and effort you can climb, each time investing more time and effort, getting a better understanding of the issues, gradually relying less and less on secondary and tertiary sources and more and more on primary sources. Climbing each rung on this ladder makes your decision more informed and increases the chance that you make a correct decision.

    > All of these kinds of steps benefit from the advice of someone who actually knows the field.

    I agree, but didn’t we start out by trying to find out who can be considered “someone who actually knows the field”? That’s part of your research.

  51. #51 rea
    September 9, 2007

    That’s a bit like Custer asking his aide, “are you SURE those are Sioux warriors slaughtering my troops?”

    Bad example, considering that a lot of those “Sioux” warriors were actually Cheyenne . . .

  52. #52 Jon H
    September 9, 2007

    rea wrote: “Bad example, considering that a lot of those “Sioux” warriors were actually Cheyenne . . ”

    The key point is that the tribe is beside the point – Custer’s attention should be on the ongoing slaughter of his troops, not fiddly details about who’s doing it.

    Likewise, it doesn’t really matter if the ‘Robert’ person sinking an argument is Robert Chung or another Robert. If the assault on the argument stands on its own, who’s making it is beside the point.

  53. #53 Stagyar zil Doggo
    September 15, 2007

    For people who haven’t found their way there yet, the discussion in this thread is nowhere near done and continues here.

  54. #54 Harald Korneliussen
    September 17, 2007

    Brenda, when you wrote “I want to thank everyone for the fascinating discussion” I assumed you were David Kane for a moment :-)
