Sample swaps at 23andMe: a cautionary tale

Personal genomics company 23andMe has revealed that a lab mix-up resulted in as many as 96 customers receiving the wrong data. If you have a 23andMe account you can see the formal announcement of the problem here, and I've pasted the full text at the end of this post.

It appears that a single 96-well plate of customer DNA was affected by the mix-up. This resulted in incorrect results being sent to customers, with alarming consequences in some cases; one mother posted on the 23andMe community about her distress upon discovering that her son's results were incompatible with the rest of the family:

Still upset I checked family inheritance and noticed my daughter shared with me, and then I checked my son's. He was not a match for any of us. I checked his haplogroups and they were different from ours. I started screaming. A month before my son was born two local hospitals had baby switches. I panicked and I checked over and over. My kids were sitting at the computer because we all wanted to see the results. My son laughed but he looked upset. I called my sister in tears.

Although 23andMe's announcement of the problem is commendably open, and it appears that the problem has now been fully resolved for the customers involved, there are numerous complaints in the announcement's comments thread about the length of time it took for customers to get feedback on their puzzling data. In addition, there's concern about the company's failure to perform even basic error-checking (e.g. confirming that the sex determined from genetic data was consistent with that specified by customers), and its vagueness regarding the corrective measures it plans to take to prevent this from happening in the future.
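To illustrate how cheap a check like this would be: here's a minimal Python sketch (the data format, field names and thresholds are my own invention, not anything 23andMe actually uses) that infers sex from X-chromosome heterozygosity - males carry a single X and should be essentially homozygous there - and flags any conflict with the customer's self-reported sex:

```python
def infer_sex_from_x(genotypes):
    """Infer genetic sex from X-chromosome genotype calls.

    Males (XY) have one X, so their X-chromosome calls should be almost
    entirely homozygous; females (XX) show substantial heterozygosity.
    Thresholds here are illustrative, not calibrated.
    """
    calls = [g for g in genotypes if g != "--"]  # drop no-calls
    if not calls:
        return "unknown"
    het_rate = sum(1 for g in calls if g[0] != g[1]) / len(calls)
    if het_rate < 0.02:
        return "male"
    if het_rate > 0.10:
        return "female"
    return "ambiguous"

def sanity_check(sample):
    """Flag a sample whose reported sex conflicts with its genotypes."""
    inferred = infer_sex_from_x(sample["x_genotypes"])
    if inferred in ("male", "female") and inferred != sample["reported_sex"]:
        return f"MISMATCH: reported {sample['reported_sex']}, inferred {inferred}"
    return "OK"

sample = {"reported_sex": "male",
          "x_genotypes": ["AA", "CC", "GG", "TT", "AA", "--", "CC", "GG"]}
print(sanity_check(sample))  # OK
```

A check this simple wouldn't catch every swap (a sample exchanged between two men, say), but it would catch roughly half of all mix-ups, which is far better than none.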
This isn't the first time that personal genomics companies have mishandled customer data. In August last year, New Scientist's Peter Aldhous revealed sporadic problems with his mitochondrial DNA profile in deCODEme (which later turned out to be the product of a software glitch), and deCODEme also fumbled the process of translating 23andMe data last December.
Mistakes happen - as any lab tech can testify, these sorts of sample swaps occur with frightening regularity even in clinical diagnostics labs. However, if the industry is to survive the massive scrutiny currently directed at it following the Pathway/Walgreens debacle, it must avoid any appearance of amateurishness; this type of mistake adds even more fuel to the regulatory fire burning under the industry. 23andMe needs to move fast to put serious safeguards in place so this never happens again. In addition to increasing lab quality assurance (at 23andMe's testing facility, LabCorp), 23andMe should introduce basic sanity-checking procedures to look for obvious data-swap problems.
Customers also have a role to play here in checking their own data for obvious mistakes - and it is noteworthy that in this case the sample problems seem to have been detected by customers rather than the company. This is a good time to repeat my advice from last year:
However, this incident serves as a canary in the personal genomics coal-mine - a warning of the challenges that lie ahead for companies in ensuring that massive, complex genetic data-sets are presented accurately to consumers.

It's also a useful reminder to personal genomics consumers to not take their results for granted. The process between spitting into a cup and viewing your genetic results online involves multiple steps where things can go wrong, ranging from errors in sample tracking (the most pernicious and difficult to correct), through genotyping problems (usually much easier to spot), to errors in data analysis and display.

In general the odds of a given genetic data-point being wrong are very low, but they're sufficiently far above zero to warrant caution in making too much out of any single result - mind you, given the extremely small effect sizes of most of the variants currently assayed by personal genomics companies, that's good advice anyway. Certainly it would be a good idea for customers to seek independent validation of any result if they intend to use it to guide serious health or lifestyle decisions.

But the most important piece of advice for personal genomics customers is to engage with your data. Aldhous only detected these anomalies because he was exploring his own genetic data in multiple ways, cross-checking it against both other data and his own (informed) expectations, and was persistent enough to follow up on the strange results he found.

That's a good example for other personal genomics customers to follow: rather than being a passive recipient of genetic forecasts, dig into your data and see if it makes sense, and keep asking questions until it does. In addition to making it more likely that you'll pick up any errors in your results, you'll also develop a much deeper understanding both of the nature of genetics and of your own genome.

Some errors you won't be able to spot, but make sure you're informed enough to spot at least the most glaring problems.

Here's the full announcement from 23andMe:

We recently determined that a number of new 23andMe customer samples were incorrectly processed by our contracted lab. We want to clarify what happened with the sample errors, how it happened and what we're doing to prevent it from happening again. Providing each and every one of our customers with accurate data is 23andMe's number one priority, and we fully realize the gravity of this incident.

Up to 96 customers may have received and viewed data that was not their own. Upon learning of the mix-ups, we immediately identified all customers potentially affected, notified them of the problem and removed the data from their accounts. The lab is now concurrently conducting an investigation and re-processing the samples of the affected customers and their accurate results will be posted early next week. We expect the investigation will be complete over the next several days and we will provide further details when we have them.

We are currently putting additional procedures in place that will add an extra layer of safeguards to help assure that similar incidents do not occur in the future. We are deliberating on a process that would include removing manual steps at the lab, completely automating the sample analyses, and implementing further checks of the data before it gets loaded into customer accounts. Please be assured that our testing laboratory's processes comply with strict professional, regulatory, and corporate quality assurance standards for ensuring that all laboratory test results are accurate. The laboratory will adopt corrective action as warranted based on the findings of the investigation.

The science behind 23andMe's personal genetics service remains proven and sound. We recognize that this is a very serious issue and your trust is of the utmost importance. We hope that this helps clarify what has happened and what we are doing to prevent these problems in the future. Please contact me at khomenko@23andme.com if you have any further questions. We appreciate your comments and feedback.


Oh dear. As you say, it's a mistake that can, and probably has, happened to every lab. I wonder what happened: a 96-well plate inverted? Or was it the data transfer where the mix-up occurred? Like you, though, I am amazed that such a gross error was not picked up in their QC - basic procedures should have detected it, including blind controls (do they run them?) and the gender determination you mention. These are the sort of protocols that CLIA is supposed to control for.

Anyway - it's a biggish error but I hope we don't see any gloating or self-satisfied comments from competitors, would be a bad move, because one day it will happen to them as well.

Please be assured that our testing laboratory's processes comply with strict professional, regulatory, and corporate quality assurance standards for ensuring that all laboratory test results are accurate.

I don't find that very reassuring, somehow ...

Nature News has a timely piece this week:

Alla Katsnelson (2010)
Biologists tackle cells' identity crisis:
DNA fingerprinting scheme aims to make sure researchers are working on the right cells.
Nature 465, 537 | doi:10.1038/465537a

http://www.nature.com/news/2010/100602/full/465537a.html?s=news

I am one of those who received the mixed-up results from 23 & Me, and also one of the ones who caught it early because I stay on top of the DNA lines I research.

First of all, we all KNOW lab mix-ups happen. I have had that happen in a doctor's office where the doctor insisted the medical results were correct. I went to the lab and got a retest which proved they were wrong.

Secondly, the company reacted within a 24-hour period. Yes, I was initially shocked and upset when I received the results. I also knew they had to be wrong. I contacted others on the ISOGG (Int'l Society of Genetic Genealogists) mailing list to find out who else got bad results. I also monitored the 23 & Me Community Forum and found others. Many of us worked together to contact those we knew at 23 & Me to find out what the problem was. Later we were sent messages explaining there was a problem and the bad results were pulled. Some of us received phone calls from 23 & Me to discuss it as well, and were asked if we had any questions.

I worked many years for big corporations and know that that type of response within 24 hours was great. It is sad that this happened during all the furor over regulating this market. Sadly, it was California's government regulation that forced 23 & Me to outsource their lab work. I definitely hope nothing like this happens again, but I also want it recognized that this company took responsibility early to resolve it. I lived it and am satisfied with the response!

By Nora Probasco (not verified) on 07 Jun 2010 #permalink

Good comments Nora, and you're quite right, the worst timing possible.

Still, better than cutting off the wrong leg or removing the healthy kidney...I don't think that 23andme need to consult BP PR damage limitation just yet (although probably they are the last people to call on for help)

One of my family member's test was also affected by this mix-up. Since I am the genetic genealogist in the family, I quickly realized that there was a mistake in the reporting and contacted 23andme. For me, personally, the response from 23andme was immediate and satisfying. In this case, the community worked together to quickly ascertain that there was a problem and to alert those who may have been affected. This was an important step toward self-regulation. Now we must quickly take the next step toward ongoing, qualified and organized self-regulation. This way, in my opinion, the genetic genealogy community will demonstrate that there is not a need for regulation from outside agencies.

By CeCe Moore (not verified) on 07 Jun 2010 #permalink

I can't defend sloppy work like this, and I can't imagine an acceptable excuse for a corporate culture that supposedly prizes informatics and superior operational efficiency like 23andMe purports to do. But, to put it in perspective, I remember that the United States Veterans Hospital system had recently disclosed that veteran patient records had been scrambled. The story was told to me that the scramble was discovered when a female patient complained about her prescription for erectile dysfunction drugs.

I doubt that the US VA mixed its records due to insufficient federal regulation and oversight. I would like to mention this before anybody suggests that the solution to incompetence is to declare incompetence to be illegal.

Disclosure: I have a consulting arrangement with 23andMe, acting as a liaison to the genetic genealogy community. However, I have no insider knowledge about the mechanics of database management, and the observations here are mine, not official policy of 23andMe.

The batch in question was uploaded Tuesday night. I happened to be up in the middle of the night, and I had received several queries about odd results. I immediately notified 23andMe, doubtless along with others who wrote to 23andMe directly. I received a response before 7:00am Wednesday that they would look into the matter. Over the course of a single day, they identified the scope of the problem (just one of several 96-well batches uploaded that night) and notified everyone who was affected. I find that to be an admirably prompt response, given the complexity of the analysis.

I'm not altogether certain that a "sanity check" such as gender would be practical. The lab has only a bar code, which somehow got out of sync with the samples in this particular batch. 23andMe later connects the bar code to a claim code, which assures that the testing laboratory has no way of identifying an individual. There may be no personal information attached to the claim code, since the person who orders the test may not have created a website account with the claim code, or may not even be the person submitting the sample. The best sanity checks (I'm proud to say) came from the genetic genealogy community, which not only "engages with the data" but has a very collaborative approach to studying it.

Ann Turner, M.D.
co-author (with Megan Smolenyak) of "Trace Your Roots with DNA"

By Ann Turner (not verified) on 07 Jun 2010 #permalink

Sounds like they did a very commendable job recognizing the error and communicating information. Simple mistake, and it's absurd to think mistakes never happen. (It's not like they were repeatedly fined for safety violations long before the rig blew up, or anything like that - we should save our outrage for the corps who deserve it.)

I was thrilled when 23andme offered the test for such a HUGE discount on DNA day awhile ago, but I did wonder if they had the infrastructure to keep up with the one day demand spike. I've not yet submitted my test, figuring I'd give them a bit of time to work through the early returners.

I think now is the perfect time to send back my kit, I'm sure they are going to be extra careful from here on out.

@Ann

Thanks for your inside information. It does seem like 23andme reacted quickly. I understand about the problems with using gender testing as a control but you absolutely cannot rely on the "best sanity checks (I'm proud to say) came from the genetic genealogy community". It's not a sanity check, it's a discovery of wrong results already distributed. Running control DNA samples, several per 96-well plate, would have picked up the error if it had been a total scrambling of the data, or the results being out of sync. I've been through these sorts of errors several times over the years, but they were caught before any damage was done. Running loads of controls (rather than just 1 or 2 per complete run) is expensive - but worth it in the long run.

@Ann:

I'm not altogether certain that a "sanity check" such as gender would be practical. The lab has only a bar code, which somehow got out of sync with the samples in this particular batch. 23andMe later connects the bar code to a claim code, which assures that the testing laboratory has no way of identifying an individual. There may be no personal information attached to the claim code, since the person who orders the test may not have created a website account with the claim code, or may not even be the person submitting the sample.

And I thought consent was required!

Anyhow, to assume there are no errors, and not check what you can check, for whatever reasons, is just complacent ...

@ Keith

Running control DNA samples, several per 96-well plate, would have picked up the error if it had been a total scrambling of the data, or the results being out of sync. I've been through these sorts of errors several times over the years, but they were caught before any damage was done. Running loads of controls (rather than just 1 or 2 per complete run) is expensive - but worth it in the long run.

... and I don't think loads of controls are necessarily the answer.

Now, I have submitted sample manifests to both Affymetrix and Illumina for academic GWAS, and led subsequent sample reconciliation efforts.

We were expected to supply 3 types of information as part of the sample manifest in addition to an anonymous sample identifier:

1. the subject's sex, at the vendors' request and expressly for sanity checking each group of samples, as well as singly;

2. DNA extraction parameters (concentration, method) in case these relate to poor performance, for which we will be billed;

3. whether the sample in well x is an intentional duplicate of a sample in well y.

The latter is probably the most open to negotiation, but "none" has not been an option, as far as the vendors are concerned.

With regard to minimising the number of non-billable samples, one could use one or two reference samples per 96-well DNA plate as positive controls, and TE buffer as a negative control, but this would only tell you about day-to-day processing efficiency, and genotype repeat rates in near-perfect samples.

By preference, and where we have the DNA stocks, we have included a different pair of study samples in each plate, in different well positions (so we can detect plate rotations), and in different rows and columns (as the processing is not usually plate-based, but in groups of rows or columns, and this helps detect processing issues).

Yes, I agree this is much harder to automate than sticking positive and negative controls at each corner of a plate, but you do find out rather more. For instance, that even the very best labs make mistakes.
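To illustrate Neil's point, here's a rough Python sketch (well positions and genotypes are invented for illustration) of why a duplicate pair placed in rotationally asymmetric wells catches a 180-degree plate flip. A flip sends well (row r, column c) of an 8x12 plate to (7 - r, 11 - c), so corner controls in A1 and H12 simply swap with each other and can pass unnoticed, while an asymmetric duplicate pair stops matching:

```python
# Detect a 180-degree plate rotation using a pair of duplicate samples
# placed in rotationally asymmetric wells (hypothetical layout and data).

def rotate_180(well, rows=8, cols=12):
    """Return the well a sample ends up in if the plate is rotated 180 degrees."""
    r = ord(well[0]) - ord("A")      # row letter -> 0-based index
    c = int(well[1:]) - 1            # column number -> 0-based index
    return chr(ord("A") + rows - 1 - r) + str(cols - c)

def duplicates_concordant(results, dup_pairs):
    """True if every intended duplicate pair yields identical genotypes."""
    return all(results[a] == results[b] for a, b in dup_pairs)

# Duplicates of one study sample loaded into B3 and E7 (B3 rotates to G10
# and E7 to D6, so neither maps onto its partner after a flip):
genotypes = {"B3": "AG", "E7": "AG", "G10": "CC", "D6": "TT"}
flipped = {rotate_180(w): g for w, g in genotypes.items()}

print(duplicates_concordant(genotypes, [("B3", "E7")]))  # True
print(duplicates_concordant(flipped, [("B3", "E7")]))    # False: flip detected
```

Note that `rotate_180("A1")` is `"H12"`, which is exactly why identical controls placed at opposite corners are blind to this particular failure mode.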

Errors like this at a CLIA-certified laboratory make me wonder about the prevalence of similar errors in the medical arena, with potentially even worse consequences. Errors in DNA microarray results, like those provided by 23andMe, can be relatively easy to identify, compared with, for example, blood test results or drug test results. (For example, earlier this year, I received anomalous blood test results from Quest Diagnostics, and I suspected the possibility of laboratory error, such as sample swap; however, I have no easy way of proving or disproving the occurrence of such an error.) Doctors can base life-and-death decisions on the results of a blood test and a false positive drug test result can have obvious negative consequences, so I think this could be a serious issue even outside the area of genetics testing.

Thank you Neil for more clarity and detail - I agree with the controls you propose. By "loads of controls" if I had added detail I would have said what we do: standard DNA samples and negatives plus several "customer" sample controls which as far as the lab is concerned (either internal or outsourced) are genuine samples, so these are added randomly to the plate (2 or 3 per plate).

I suppose for DTC it's hard to ask for too much info, even sex, when promising absolute confidentiality. For academic GWAS it's OK: there is a relationship with one customer for thousands of samples, rather than thousands of relationships, one per sample.

There's no reason not to have an "opt-in" questionnaire when people register on the website (after mailing off their sample) for the purposes of sample tracking: sex, eye colour and hair colour alone would be a start. If even 50% of customers filled in these details that would be enough to spot a dodgy plate with high confidence, even allowing wiggle room for incorrect answers given by some customers. Add the plate-specific control samples proposed above by Neil and you have a very good system for preventing plate mix-ups.

In fact, it's entirely possible that enough customers already provide these details prior to sample-handling in 23andMe's optional phenotype surveys for this type of analysis to be done.
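A plate-level version of this check might look like the following sketch. The field names ("reported", "predicted") and the 20% threshold are assumptions for illustration; in practice the threshold would be calibrated against the background rate of wrong survey answers and imperfect genotype-based predictions:

```python
# Flag a plate whose reported-vs-predicted trait mismatch rate is far
# above background (hypothetical data layout and threshold).

def plate_mismatch_rate(samples):
    """Fraction of answered samples whose reported trait disagrees with
    the genotype-based prediction; None if nobody answered the survey."""
    answered = [s for s in samples if s.get("reported") and s.get("predicted")]
    if not answered:
        return None
    mismatches = sum(1 for s in answered if s["reported"] != s["predicted"])
    return mismatches / len(answered)

def flag_plate(samples, threshold=0.20):
    """Flag a plate whose mismatch rate suggests a sample mix-up."""
    rate = plate_mismatch_rate(samples)
    return rate is not None and rate > threshold

# ~40% survey response; a correctly loaded plate vs. one where the data are
# scrambled (about half of reported sexes then disagree by chance):
good = [{"reported": "F", "predicted": "F"}] * 40 + [{"predicted": "M"}] * 56
bad = ([{"reported": "F", "predicted": "M"}] * 20
       + [{"reported": "F", "predicted": "F"}] * 20
       + [{"predicted": "M"}] * 56)

print(flag_plate(good), flag_plate(bad))  # False True
```

Even a partial survey response gives plenty of statistical power here, because a plate-level mix-up pushes the mismatch rate toward 50% while honest answers keep it near zero.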

Another perspective: we're suggesting human biochemical medical laboratory procedure as third party laypeople in a blog comment thread.

By Andrew Yates (not verified) on 08 Jun 2010 #permalink

Thanks Keith.

I suppose for DTC it's hard to ask for too much info, even sex, when promising absolute confidentiality.

Yes, and I can also see that it would be problematic for a DTC company to say:

if you give us some not-very-personal details we are more confident we'll give you back the correct data

Less cynically, the best lab test we have for sample continuity is concordance testing against results achieved independently from the same sample, but handled differently.

This could be done (where blood is taken) using a bloodspot/Guthrie card and genotyping a few very common SNPs - as although it takes quite a few SNPs to make a sample unique, it doesn't take many to show up a problem, as the lab has to make a (hopefully very rare) sample handling error and the samples still have to match afterwards.
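The point that it doesn't take many SNPs to expose a swap is easy to make quantitative. Under Hardy-Weinberg assumptions with independent (unlinked) SNPs, a back-of-envelope Python calculation shows how fast the random-match probability collapses:

```python
# How quickly does a small panel of common SNPs distinguish two unrelated
# samples? Assumes Hardy-Weinberg genotype frequencies and unlinked SNPs.

def match_prob(maf):
    """Probability two unrelated people share a genotype at one SNP."""
    p, q = maf, 1.0 - maf
    # genotype frequencies are p^2, 2pq, q^2; the chance both individuals
    # carry the same genotype is the sum of squares of those frequencies
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2

def random_match(mafs):
    """Probability an unrelated sample matches at every SNP in the panel."""
    prob = 1.0
    for maf in mafs:
        prob *= match_prob(maf)
    return prob

# Twenty SNPs, all at 50% minor allele frequency:
print(random_match([0.5] * 20))  # about 3e-9
```

So a concordance panel of even twenty common SNPs, genotyped from an independently handled split of the sample, would expose a swap with near certainty; the hard part, as noted, is obtaining that second sample.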

Not sure if something similar can be done splitting a saliva sample?

Or perhaps the data should come with a health warning? To take Daniel's examples above:

you've taken our test, so you're probably the sort of person who thinks genotypes are predictive. So, from your data, we PREDICT you are probably a brown-eyed male with curly hair - if this is not you, please contact your customer representative ...

PS: Andrew - if you want a reference for this, search for "we excluded 16 samples".

I had an electrician working on my house when he discovered through genetic testing that his two year old daughter wasn't his. He lost his mind and gave all my electrical devices 220 instead of 110 volts. It fried all my clocks and I knew something was wrong when my refrigerator sounded like it was a race car. I got a new electrician and left the old one alone even though he had cost me a lot of money. A funny story now, to me, but you can imagine the human tragedy of parents and children feeling betrayed and disconnected from each other.

By dave chamberlin (not verified) on 08 Jun 2010 #permalink

Another perspective: we're suggesting human biochemical medical laboratory procedure as third party laypeople in a blog comment thread.

Neil Walker is no layperson when it comes to large-scale sample handling.

Did the search, for other readers:

http://www.nature.com/nature/journal/v447/n7145/full/nature05911.html#a…

"We excluded 16 samples with discrepancies between WTCCC information and external identifying information (such as genotypes from another experiment, blood type or incorrect disease status)."

As for my comment, I don't mean to say that these suggestions are not good or not relevant. I thought that the idea of keeping samples and results orthogonal was a credible solution. I don't know how or why that didn't work in practice because I don't know what happened. The public story is that a plate was flipped. So I think the problem was with lab equipment that didn't differentiate between flipped plates, not that keeping data and samples orthogonal was proven fundamentally flawed. That may not be the fault of 23andMe at all, although they would still be liable as providers of the test results. I don't know.

By Andrew Yates (not verified) on 08 Jun 2010 #permalink

I find it disturbing that there were no controls in the plate. ALL the wells were customer samples? That makes no sense to me. I've never worked anywhere that didn't include at least one negative control (TE or similar) and one known/positive control well in order to catch these problems and to provide data QC. And that's including always labelling or barcoding plates on the same side every time to prevent mis-loading plates.

Well, for better or for worse, I was personally uncomfortable with having identifying information (gender, etc.) precede my obtaining results from 23andMe. This is for all of the reasons that Daniel mentions in the above post, where he references back to his 2009 post. As I informed the med students in my Personalized Medicine class, my greatest fear upon being genotyped was spending $$$ and ending up with the genotype of some "unknown 23andMe custodian" (through a mix-up, as just occurred). In fact, I was so suspicious of the overall process of genotyping that I felt it mandatory to be genotyped by at least TWO different DTC providers; 3 would have been preferable, so that at least two sets of data should agree and simultaneously comply with my personal traits (gender, blood type, etc.). Fortunately, Navigenics had a deal last year at 80% off list price for joining the Scripps Genomic Initiative (Eric Topol's). Only upon considerable scrutiny of the data from both 23andMe and Navigenics was I convinced that I was actually looking at my own genetic data and not that of some "unknown DTC custodian". That's particularly relevant, given that Navigenics provides APOE results. At any rate, given the number of possible steps over which an error can arise, it amazes me that the process works as well as it does. I think 23andMe did a pretty good job of resolving the problem, and I think that paying attention to the wise advice on posts like this (and I believe they do) provides free support for the DTC genetic companies.

Bottom line: I'm UNCOMFORTABLE providing pre-identifying information; the less the better. I think the system worked itself out. I realize most people can't afford a single DTC genetic test, let alone two. Just my opinion.

"basic procedures should have detected it, including blind controls (do they run them?"

Quite possibly blind controls were carried out yet showed nothing wrong.
Say we take 10 samples out of the 96 for a sanity check (that's over 10%), but for whatever reason those 10 were the only 10 not affected by some error (say the random sampler caused all 10 to be taken from one specific area of the sample tray, while the error was confined to another area of the tray; it could happen).

The only way to avoid this is to have all samples retested by another facility using different procedures altogether, compare the results, and have everything tested again at a 3rd independent facility with different procedures from either in case any differences are found.
Even then there's a minuscule chance the results turn out the same in all tests even though they are in error, but it would reduce that chance as much as possible.
The cost of doing so would, however, be horrendous, effectively tripling the processing cost of each sample (as well as requiring larger sample volumes, of course, as each sample must be split into at least two more parts than at present).

I do not and cannot know the testing procedures in place, but I can speculate that there was some sanity checking and that in this case, for whatever reason, it failed.
Either the sample set was abnormally homogeneous for all the criteria used in the sanity check (maybe an entire run for a sorority of blue-eyed black women who are natural blondes, where the sanity check tests for gender, eye colour and hair colour, just to give an extreme example), or the sanity check was performed on a subset of the total batch and nothing wrong was found in that subset (which might indicate a problem in the sampling procedures at best).

By J.T. Wenting (not verified) on 10 Jun 2010 #permalink

The problem was a flip of the whole plate, so even one known control in a known position anywhere on the plate would have revealed it in this case.

Obviously it's true that no amount of safeguards will protect against all possible calamities, but it's possible to minimise the risks of massive failure and common sources of error using some straightforward (and reasonably cheap) measures.
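On the earlier worry that a spot-check might, by bad luck, sample only unaffected wells: the probability is straightforward to compute (hypergeometric sampling without replacement), and for a whole-plate flip affecting nearly every well it is vanishingly small. The specific well counts below are illustrative, not taken from the actual incident:

```python
from math import comb

def miss_probability(total, affected, checked):
    """Probability that a random spot-check of `checked` wells draws none
    of the `affected` wells (hypergeometric, sampling without replacement)."""
    clean = total - affected
    if checked > clean:
        return 0.0  # more wells checked than clean wells exist: can't miss
    return comb(clean, checked) / comb(total, checked)

# A whole-plate flip garbles nearly every well. If, say, 86 of 96 wells
# carry wrong data, a 10-well spot check misses the problem only by drawing
# exactly the 10 clean wells:
print(miss_probability(96, 86, 10))  # about 9e-14

# Whereas a fault confined to a single 12-well row is missed fairly often:
print(miss_probability(96, 12, 10))  # roughly 0.24
```

So random spot-checking is a fine safeguard against gross plate-level failures, but localised errors (a mis-pipetted row, a partial swap) need the positional controls discussed above.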

@23

my guess - and it is only a guess based on what we have been offered as a starting layout by the larger vendors - is that:

[a] a lab technician has been tasked with putting prepared samples into a 96-well plate, leaving specified wells empty for controls;

[b] this job has been done at a bench, by hand;

[c] the control samples are added in a separate step prior to automated processing on a robot.

Now, plates are usually manufactured to be hard to insert into a robot incorrectly - some have a corner nicked off, some have indentations on one side only (to assist a plate grabber) and most have barcodes on at most one long and one short side only so they can't be scanned in the wrong orientation - but, as they are only about 4" by 3", it is still possible to get it wrong on the bench.

One failure mode that is possible whilst using controls (and I'd assume their use is a given in a CLIA-approved lab) is to place them with rotational symmetry - typically the first and last wells are used - and to have the controls added in a separate process to the test samples.

All idle speculation of course.

@23

The only way to avoid this is to have all samples retested by another facility using different procedures altogether, compare the results, and have everything tested again at a 3rd independent facility with different procedures from either in case any differences are found.

Under the scenario I suggest above, a retest of the same plate would give the same result?

To be sure, if these results were being used to arrange a kidney transplant on the basis of tissue typing, or other major medical intervention - and Illumina, at least, are moving into this market - I would expect a complete retest of an independent sample to confirm that no sample mixups had occurred.

I don't think the same standards are intended to apply here, and costs reflect this.

I think I have one that tops that... I had a DNA/paternity test REVERSED after 5 years by the Medical College of Virginia. In '03 the test came back "Negative", but when they "re-ran" the same sample 5 years later, the test came back "POSITIVE". MCV did, however, apologize if this caused any "inconvenience"...

By Bob Downer (not verified) on 10 Jun 2010 #permalink

" but, as they are only about 4" by 3", it is still possible to get it wrong on the bench."

Which suggests a similar system to prevent incorrect orientation be installed on the bench.
Engineering solution to a human problem :)

Especially if the placement is mechanically impossible if incorrectly oriented, this shouldn't be too hard to implement.

By J.T. Wenting (not verified) on 10 Jun 2010 #permalink

There was a case a few years ago in the UK where the owner of a paternity testing company took the money, threw away the samples and made up the results - http://bit.ly/bnv0xc

I like Dan Vorhaus' argument about this being a reason to favour DTC... at least he makes very good points showing why it cannot be used as an argument against DTC - http://bit.ly/cU8A2Q

As one with a chemistry background and some experience in analytical research, I know that relying on only one lab and one test result is risky.

When I had my Y-DNA tested some years ago, I used both Family Tree DNA and Relative Genetics.

Markers tested by both companies were found to have the same values, and the reliability of the methodology was confirmed.

After reading this article, if I do additional testing, I will continue to do duplicate samples at two different laboratories.

When I found out that I was surprisingly related to the McGuire clan and not to listed Fox DNA profiles, I could believe that linkage.

If it is worth doing, it's worth the extra cost to establish the validity of the data.

"Sadly, it was California's government regulation that forces 23 & Me to outsource their lab work."

I'm not quite sure what this comment means. The lab that does their wet work is CLIA certified, and if it is in CA that means the people doing the pipetting were licensed clinical lab scientists. They contract their work out (I am supposing so that it is cheaper than setting up a CLIA facility de novo) -- wouldn't they have performed periodic vendor audits so that they could keep an eye out for performance improvement opportunities? The CA licensure situation is a pain, yes, but I don't think THAT has anything to do with them outsourcing.

If you are going to be testing human samples with anything to do with turning around results that are used for diagnostic or prognostic purposes, CLIA regulations pretty clearly state that this has to be done at a CLIA-certified facility by licensed scientists -- THIS is why 23&me got the cease & desist letter from the state a couple of years ago. CLIA regs also set a framework for QA/QC procedures, and this sort of sample-integrity incident will be closely reviewed the next time the lab is inspected by CLIA inspectors. THIS is why 23 & me needs to have someone on their team who is from the clinical lab arena -- not just a bunch of data miners.

By Chromosome.crawl (not verified) on 11 Jun 2010 #permalink