APS 2008: Can we learn from errors? What if we're running a nuclear power plant?

By dmunger on May 23, 2008.

Just a few quick notes about Michael Frese's talk, "Learning from Errors by Individuals and Organizations."

Frese gives a rule: "You make about 3-4 errors per hour no matter what you're doing."

If errors are so ubiquitous, maybe it makes more sense to train people to deal with errors, rather than to try to flush out every possible error. Frese and others have studied this phenomenon in the lab. They found that error management actually led to improved performance on computer training tasks: if you are trained to expect errors and deal with them, you do better on the task. There are limits to this approach; in general, the more complex the task, the more important it is to focus on error management rather than just avoiding errors.

They also found that feedback is important: If you have clear feedback, it's better to learn from errors than from proscriptive training. If clear feedback isn't provided, then learning from errors isn't as effective.

Frese is an organizational psychologist, and for him the key is results. Small businesses do better -- in real, financial terms -- when their owners say they adapt well to mistakes. The killer stat from his talk: 20 percent of variability in corporate profitability is determined by error management culture. If a company focuses on managing errors rather than simply avoiding them, it's significantly more likely to be profitable than a company that focuses only on avoiding errors.

An interesting question from the audience: What if you're running a nuclear power plant? If an error is catastrophic, then how can you ever learn from it? Frese responded that he's actually consulted with power companies running nuclear plants. He said that errors inevitably occur, even in these situations. A company can take two approaches to errors when they occur -- sweep them under the rug while repairing the damage, or attempt to learn from them and work on approaches to better handle such errors in the future. For him, the latter approach is still preferable, even when errors can literally mean the difference between life and death.

Power plants usually (always, I hope) have multiple levels of safety features, so that a catastrophic failure requires a whole series of errors. So error-management plans should be just as useful.

I think Rosie has it right. Most catastrophic failures (e.g., Three Mile Island, Chernobyl) are the result of a series of small errors or failures. If the initial error or failure can be quickly identified and appropriate corrective action taken, then the chance of a disaster is reduced.

As a software developer, error handling is obviously part of my job description. I find myself in a constant battle with management regarding the importance of error detection. Too many of the systems we develop make no attempt to detect errors, especially data errors. Fortunately I don't work in the nuclear industry.

I worked in a nuclear plant for years. We constantly ran casualty drills to simulate emergencies and see how people react. Then, errors would purposely be worked into the scenario. For example, if certain switches or valves are supposed to be operated in an emergency, then halfway through the scenario we would be told that the wrong valve was operated because the operator freaked out. So part of the emergency has gotten worse. Now what do you do? We trained to account for errors and respond.

This turned into a long post. Feel free to skip to the last paragraph.

Another software developer here. While without a doubt it is important to code for all input sequences so as to be able to render the unwanted ones impotent, these are not the only error sources involved. If the proposition is true, which I think it is, we also make continuous errors while writing the code, while learning the application domain, while fitting the program to the domain, etc. I would also discriminate between errors of rote learning and creative errors. Nuclear plant operation, piloting, and so on are examples of the former, in that one trains for the job and does not (nor can) learn on the job how to deal with things that go wrong. I focus on creative errors.

There appears to be what I call a 'cognitive loop' involved in the act of (code) creation: idea -> design -> implement -> test -> idea. That is, testing that reveals errors in a creation feeds back into refining the idea and therefore the subsequent steps in the repeat loop. But the process is limited by available cognitive resources and the time it takes to complete one circuit of the loop: if the loop needs too many cognitive resources, or takes longer than short-term memory can span, the ability to notice and learn from mistakes suffers significantly.

Traditional waterfall models of software development have shown a serious problem in that their cognitive loop is corporate not personal, and its duration is the product cycle, so the process is incredibly sensitive to correctness in requirements and specification. And unless the project is one of automating existing well-known processes (banking, chemical plant, etc) it is fundamentally impossible to actually know in advance what the requirements or specifications really are. Plus, on a personal note, my whole reason for being in this field is the joy of exploration, of doing what by definition has not been done before, of continuous learning OTJ.

Which brings me back to the small stuff: the everyday experience of a programmer. The question is how much code to write before syntax checking, test harness runs, and operational tests. These decisions are modulated by the cost of doing each part. And this cost is implicit in, amongst other things, the tool set available. The traditional model is edit - compile - test - edit. Which can be quite a long loop - long enough to get a cup of coffee during the compile step (maybe). I have found that (for me) an exploratory model is much more productive - continuous course correction, continuously running and demo-able code, small increments. The biggest factor has been developing a programming environment that eliminated the compile step - suddenly the cognitive loop is closed at the edit stage, and course-corrected motion towards the final product becomes more like swimming than run-wait-fear jerky progress. And the environment also proved to be accessible to non-programmers; our marketing VP said "I like VNOS because it makes me feel smart", and he wrote himself a weekly alarm applet that played an mp3 at happy hour on Fridays.

Summing up: the shorter the cognitive loop the better the learning, the surer the corrections; a lesson is learned only if the mistake is visible in three ways: what is wrong with the result, why it went wrong, and how to fix it and advance. Don't choke the student.

An entire field of human factors research is devoted to this problem.
Anyone who flies would like to know that planes are safer than cars, and safer than ever before. The reason we have the oldest ever fleet of jumbos the world has ever seen, and so few crashes?
1. All humans err; accept it, and build in layers of defenses. In aviation (and nuclear power) this "Swiss cheese" model is used. I.e., anyone can err, be it the guy checking the bolts in the engine, or the pilot who has flown for 10 hours, but for every conceivable KNOWN error there is a check, or built-in buffer of safety. For an accident (or a less serious incident) to occur, the holes in the slices all have to line up.
2. Learn from mistakes and errors. Not just how to fix, or accommodate, or mitigate, these are cover-ups, BUT learn how to recognize, and respond to errors which will occur. This is a safety culture. This has enabled airlines to get to a point where 80-90% of all incidents are traced to human error, and most with some intent to cut corners... But that was another assignment.
Simplistically put...
The Three Mile Island incident less so, but Chernobyl certainly was a poorly attempted safety culture that resulted in attributing blame to individuals and not finding solutions; so when mistakes were made, and many were, they were ignored or covered up. This came in conjunction with some design flaws, which have since been superseded by this process of constant learning... (for example, failed pumps can no longer result in a lack of coolant, but rather in too much, and electricity failures within a reactor result in (reaction-halting) carbon control rods being dropped IN, not stuck OUT... and so on) ...
3. Always focus on safety. This means looking for things to improve, and looking for errors or failures, and pretty soon you get real good at spotting anomalies, and at finding ways to engineer, or train to avoid them.

There are others not relevant here, but any system is enhanced with an attitude of open eyes and acceptance.

Sorry if I make no sense. It is late here, and it was a long day finishing an assignment...
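
A rough numerical sketch of the "Swiss cheese" idea from the comment above: if an error has to slip through every layer of defense before it becomes an accident, the chance of that happening is the product of each layer's miss rate. This assumes the layers fail independently, which real plants and airlines cannot take for granted (one root cause can punch holes in several slices at once). The function name and the 10 percent miss rates are made up for illustration; none of these numbers come from the talk or the comments.

```python
from math import prod


def accident_probability(miss_rates):
    """Chance that a single error slips through every defensive layer.

    miss_rates: one probability per layer that the layer fails to catch
    the error. Assumes the layers fail independently of each other.
    """
    if not miss_rates:
        raise ValueError("need at least one layer")
    for p in miss_rates:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each miss rate must be between 0 and 1")
    # The holes in every slice must line up, so multiply the miss rates.
    return prod(miss_rates)


# Hypothetical numbers: each layer misses 10% of the errors that reach it.
one_layer = accident_probability([0.1])
three_layers = accident_probability([0.1, 0.1, 0.1])
print(f"one layer:    {one_layer:.4f}")
print(f"three layers: {three_layers:.4f}")
```

Under these assumptions, two extra layers cut the chance of a slip-through by a factor of about a hundred, which is one way to read the earlier comments that a catastrophe needs a whole series of errors.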
