Gene Expression

Over the past few days I’ve blogged a bit about the HIV susceptibility allele story: Evolution, a reason for the African HIV epidemic?; Overplaying “AIDS genes”; and HIV susceptibility, a “black” thing, not a Duffy thing? But there’s an important post at Genetic Future, Duffy-HIV association: an odd choice of ancestry markers:

In the Duffy study the authors attempt to perform this type of correction using a set of just 11 markers they describe as “differentially distributed between European and African populations”. p-ter notes that several of these markers are not particularly ancestry-informative, and indeed on closer inspection it’s clear why this is: these genes weren’t originally selected on the basis of ancestry informativeness, but rather because they are associated with HIV biology. Every single one of the 11 markers has some association with HIV: three of them have previously been associated with HIV infection, progression, or response to treatment (CCR5 delta32, APOBEC3G H186R, GNB3 C825T); most of the remaining markers are in genes that are known binding targets or modulators of HIV (CCR5, CXCR4, PD1, TRIM5, IL-2, IL-4).

If that’s true – and it’s difficult to see any other rationale for using these HIV markers rather than a set of validated AIMs – this is poor form for at least two reasons. Firstly, it’s unlikely that using such a weak set of ancestry-informative markers provides an effective correction for a marker with as strong a correlation with ancestry as Duffy (as p-ter notes, all of the supposed ancestry markers are far weaker predictors of ancestry than the Duffy variant). Secondly, testing several different variants for an association with HIV and then only reporting the one that achieved significance creates the perfect conditions for a false positive due to multiple comparisons – it’s entirely possible that the Duffy association would not have survived correction for multiple testing. It’s difficult to assess this fully because the manuscript doesn’t seem to report a single P value (!), although I note that the lower edge of the 95% confidence interval of the odds ratio in Figure 2C is perilously close to 1 following their ancestry “correction”.
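To put a rough number on the multiple comparisons worry, here is a minimal sketch. It assumes 12 tests (the 11 markers plus Duffy), a 0.05 significance threshold, and independence between tests; none of these figures come from the paper, which doesn't report how many associations were tested or any P values, so treat it as an illustration rather than a reanalysis.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests independent
    tests when no real association exists."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test P value cutoff that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


# Assumption: 11 HIV-related markers plus the Duffy variant = 12 tests.
n_tests = 12
fwer = familywise_error_rate(n_tests)
threshold = bonferroni_threshold(n_tests)

print(f"Chance of at least one spurious hit: {fwer:.1%}")
print(f"Bonferroni-corrected per-test threshold: {threshold:.4f}")
```

Under those assumptions the chance of at least one spurious "significant" result is roughly 46%, and a single association would need a P value below about 0.004 to survive a Bonferroni correction. Markers in real data are rarely independent, so the true figure could differ, but it shows why reporting only the one variant that reached significance is a problem.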

Read the whole thing…but something is starting to smell fishy. Hey Dave Appell, blogs rock and peer review sucks! (sometimes)