
A cautionary tale about nuclear change management

Published: 2008-8-06 17:39 | Author: cw | Source: cw

TAG: ITIL itil Management management change information architecture nuclear power patch SCADA


When I got my daily Washington Post email this morning, I couldn't wait to read the story titled "Fifth coyote attack proves predators still lurking in Estero." 

Oops, that is the Florida update!  Now where is it... Oh yes, here we go.  "Cyber Incident Blamed for Nuclear Power Plant Shutdown."

Here's the gist of the story:

The incident occurred on March 7 at Unit 2 of the Hatch nuclear power plant near Baxley, Georgia. The trouble started after an engineer from Southern Company, which manages the technology operations for the plant, installed a software update on a computer operating on the plant's business network.

The computer in question was used to monitor chemical and diagnostic data from one of the facility's primary control systems, and the software update was designed to synchronize data on both systems. According to a report filed with the Nuclear Regulatory Commission, when the updated computer rebooted, it reset the data on the control system, causing safety systems to errantly interpret the lack of data as a drop in water reservoirs that cool the plant's radioactive nuclear fuel rods. As a result, automated safety systems at the plant triggered a shutdown.

The story goes on to cite several other incidents, including one where a malfunctioning computer indirectly caused the deaths of three young people.

Unplanned nuclear plant shutdowns used to be a fairly common event, but not anymore, Joe Weiss, managing partner at Cupertino, Calif.-based Applied Control Solutions, said.  In fact, he said, another shutdown of a U.S. nuclear plant was also precipitated by a cyber event. In August 2006, Unit 3 of the Browns Ferry nuclear plant went into a shutdown after two water recirculation pumps failed. An investigation found that the controllers for the pumps locked up due to a flood of computer data traffic on the plant's internal control system network.

Weiss said many people in charge of SCADA systems have sought to downplay the threat that hackers pose to these complex networks. But he cautioned that internal, accidental cyber incidents at control system networks can be just as deadly as a carefully planned attack from the outside.

In June 1999, a steel gas pipeline ruptured near Bellingham, Wash., killing two children and an 18-year-old, and injuring eight others. A subsequent investigation found that a computer failure just prior to the accident locked out the central control room operating the pipeline, preventing technicians from relieving pressure in the pipeline.

OK, we have two glaring problems here.  The first, obviously, is poor change management.  In my day job, we don't deploy software patches until we have tested them thoroughly and have determined that they pose no threat to the operations of the agency.  That is called good policy.  Once, at my previous CIO job, a contractor applied an enhancement to the agency's email system during business hours, without testing and without securing permission from the Change Management team.  He had been warned previously that we had strict policies regarding change management.

Exchange shut down hard.  Over 15,000 users were without communications.  The phones lit up like a Christmas tree.  And so did my face.

I fired him on the spot and had him escorted from the building in full view of everyone.  I then called the contractor's employer and told them that if they continued to supply me with contractors who would openly flout the agency's IT policies, the entire company would be fired and billed for any lost time incurred by an agency of 26,000 people and 15,000 total computer users.

Change management is not just an ITIL requirement and sound policy.  It is there to safeguard systems and, by proxy, the people who use them.  The above-mentioned 1999 incident that killed three young people may have been unavoidable.  The other two incidents, however, were eminently avoidable.

The other problem that we see here is one of poorly designed information architecture.  You would think that an organization that handles nuclear fuel and has the lives of potentially millions of residents in its hands would have the best-designed information architecture going, right?

You would be wrong, Grasshopper. 

SCADA systems first showed up on the radar during Y2K preps, when no one really knew if SCADA systems would go HAL on everybody and malfunction, shutting down freshwater, wastewater and other utility systems.

Turns out, SCADA systems stayed on the reservation.  But their vulnerabilities were duly noted, especially in the aftermath of 9/11.  The fact that some utilities -- including nuclear utilities -- are stupid enough to attach the servers that control and manage SCADA systems to the same Internet that runs porn and Nigerian scams and MySpace is ludicrous.

It is also dangerous.

Somewhere in the institutional memory of these utilities lurk people who could have warned against linking SCADA control systems to the Internet.  That those servers were Internet-attached speaks to poor architecture, poor change management and poor information security.

The cautionary tale here is that everyone can have their own meltdown.  Improve your change management policies and open up the team's membership to those who manage other forms of technology, not just IT, if their technology touches IT. 

You also should know and fully understand the ramifications of what is bolted onto your IT systems.  And you should scream like Hell if and when you find out that something is attached to your network that shouldn't be there.

Finally, another ITIL issue:  Know what apps your servers are running.  Change management practices include knowing every piece of software that is running on a server.  That way, if you take a server down for maintenance, you can alert all users of the applications running on that server well in advance of the shutdown.  Proper change management would have told the utility that the server in question was monitoring the aforementioned chemical and diagnostic data.
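To make the "know what your servers are running" point concrete, here is a minimal sketch of a server inventory that answers one question before a maintenance window: who needs to hear about it?  Everything in it is an assumption made for illustration: the `Server` class, the `maintenance_notice` function, and the server, application and contact names are hypothetical and are not taken from the incident report or from any ITIL tool.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """One inventory entry: a server and the applications it hosts."""
    name: str
    # Maps application name -> contacts (teams or people) who depend on it.
    applications: dict[str, list[str]] = field(default_factory=dict)


def maintenance_notice(server: Server) -> dict[str, list[str]]:
    """Group the server's applications by the contacts who depend on them.

    The result maps each contact to the applications they would lose
    while the server is down.
    """
    affected: dict[str, list[str]] = {}
    for app, contacts in server.applications.items():
        for contact in contacts:
            affected.setdefault(contact, []).append(app)
    return affected


# Hypothetical inventory entry for a monitoring server.
monitor = Server(
    name="chem-monitor-01",
    applications={
        "chemistry-data-sync": ["control-room", "plant-engineering"],
        "diagnostics-dashboard": ["plant-engineering"],
    },
)

notice = maintenance_notice(monitor)
for contact, apps in sorted(notice.items()):
    print(f"Notify {contact}: {monitor.name} maintenance affects {', '.join(apps)}")
```

The point of the sketch is the lookup direction.  If the inventory is keyed by server, a planned reboot turns straight into a list of people to warn, which is exactly the step that was missing when the monitoring computer was patched.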

 
