4.11

DCS: Management of Abnormal Conditions

B. A. FITZPATRICK (2005)

Professional Organization:

Abnormal Situation Management (ASM) Consortium® (www.asmconsortium.com)

Consulting Services:

Brad Adams Walker Architecture (www.bawarch.com)
Human Centered Solutions (www.applyhcs.com)
Mustang Engineering (www.mustangeng.com)
TTS Performance Systems (www.myplantstraining.com)
User Centered Design Services (www.mycontrolroom.com)

Alarm Management Software and Its Suppliers:

Asset Integrity Management:

SILAlarm (www.assetintegrity.co.uk)

Control Arts:

Process Alarm Toolkit and Alarm History Analysis (www.controlartsinc.com)

Gensym:

G2-Expert Systems (www.gensym.com)

Honeywell:

Alarm and Events Analysis, Alarm Configuration Manager (ACM), and Alarm Scout (www.honeywell.com)

ICS Online:

IMAC (www.ics-ltd.co.uk)

Matrikon:

Process Guard (www.matrikon.com)

Nexus Engineering:

Real-Time Operations Excellence (rtOpx™) (www.nexusengineering.com)

PAS:

Plant State Suite (www.pas.com)

ProSys:

Special Alarm Management (SAM) (www.prosysinc.com)

TIPS:

Logmate/AMS (www.tipsweb.com)

Yokogawa:

Advanced Alarm Administrator (AAA Suite), Event Analysis Package (Expalog), and Operation Efficiency Improvement Package (Exapilot) (www.yokogawa.com/us)


INTRODUCTION

From an industrial process control standpoint, an abnormal condition develops when the process deviates significantly from the "normal" and acceptable operating range. In pragmatic terms, the "abnormal" that one must manage is a deviation that will lead to financial losses. The challenge of abnormal condition management is to provide the operating team with tools that will enable them to avoid or minimize the impact of abnormal conditions. This is not an insignificant undertaking, but it is one that stabilizes operations and has dramatic financial rewards. This section provides a conceptual overview of abnormal condition management and highlights key considerations in a formal abnormal condition management program, focusing on design elements for control room layout, operator training systems, alarm systems, and the graphical operator's interface to the control system.

ABNORMAL CONDITION MANAGEMENT

To manage anything, one must first understand its basic nature. From an industrial process control standpoint, an abnormal condition can be defined as the condition that develops when the process deviates significantly from the "normal," acceptable mode of operation. In pragmatic terms, the "abnormal" that one must manage is a deviation that will lead to unsafe conditions or losses.

FIG. 4.11a
Types of losses that can be caused by abnormal conditions (equipment and environmental damage, poor product quality, poor raw-material and energy efficiency, poor employee retention, missed shipments, schedule changes, personnel injuries, injuries to members of the public, increased insurance premiums, damaged corporate image, and production downtime).

Figure 4.11a shows a sampling of losses that can be caused by abnormal conditions, including losses in efficiency, quality, or production; equipment or environmental damage; personnel injuries; and other softer issues. The preventable losses to the U.S. economy are estimated at several billion dollars annually. Recognition of this led to the formation of the Abnormal Situation Management (ASM) Consortium™, a group managed by Honeywell and composed of several manufacturing companies, universities, and other organizations performing research and development focused on technologies to aid in abnormal situation management. The consortium has published several guidelines and documents for use by member companies. Additional information can be found at www.asmconsortium.com.

Types of Control

In most industrial processes, the operator no longer initiates the majority of the steps of the operation. The days of local- and panel-mounted single-loop controllers whose performance was orchestrated by human operators are largely gone. Distributed control systems (DCSs) are generally financially attractive to implement and have become quite common. Control schemes have become quite robust and generally require little operator intervention. The trend toward staff downsizing has resulted in ever fewer operators managing an increasing number of loops.

In the days of the panel-mounted single-loop controller, the entire span of control for an operator was generally mounted within his or her sight. Alarms and annunciators were mounted on the control panel. The action required to intervene was generally visually apparent and in close proximity. With the advent of the DCS and optimizing control systems, the interactions between control loops became less apparent in the new operating graphics.


Today, a single line of text in the alarm summary can be all the information that is provided to the operator. The degree to which the operator understands the process provides the context that leads to decisions about intervention. In the case of a significant upset, it is common for the operator to be overwhelmed or "flooded" with hundreds or even thousands of alarms signaling a variety of abnormal conditions. This sudden flood of information is hard to handle. The newer types of controllers also tend to diminish the likelihood that the operator will understand all the details of the control algorithm and its interactions. As control systems become more complicated and more robust, an unfortunate result is that the abnormal conditions that do occur are increasingly difficult for the operator to recognize and respond to in a manual mode.

Figure 4.11b shows a pyramid of a typical control system. Moving up the pyramid, profitability and on-stream reliability increase, but the complexity of the control strategy rises, and therefore the role and the understanding of the operator can diminish. When the control strategy is very complex, no matter how well the operators are trained, they cannot fully understand all the possible interactions and consequences. One of the challenges of abnormal condition management is to provide tools to the operations team to bridge this gap in understanding.

Need for Operator Intervention

With all of the capabilities of current technology, one might question why operator intervention is needed at all. The reason is Murphy's Law: anything that can happen, will. For this reason, human operators are required to reason, to adapt their knowledge of the process, and to develop a response plan for any situation. While adaptive controls and optimization technologies are emerging and are useful in obtaining ever-higher levels of automation, it is still probable that human operators will be needed when systems fail.

An analysis of the root causes of abnormal conditions and incidents suggests that in some 40% of the cases, operators have played a role in causing the accidents. Figure 4.11c recaps the general findings of varied studies performed to determine the root causes of incidents that have prompted significant losses.¹

PSYCHOLOGICAL BASIS FOR INTERVENTION

To work out how to best enable the operating team in the management of abnormal conditions, it is useful to understand the basic steps of cognitive processing. There are numerous models for the psychology of operations, notably that of Swain and Guttman, who suggest a three-step model of Orienting, Evaluating, and Acting. These steps are key elements in the Human Intervention Framework developed by the Abnormal Situation Management Consortium. Figure 4.11d proposes a simple sequential model of Detect, Sort/Select, Plan/Act, and Monitor.

FIG. 4.11b
An approximate pyramid of the elements of an advanced control system. From the base upward: field control (field indication, field/local controllers); basic control (DCS indication, basic loop control, loop tuning); enhanced control (feedforward control, logic blocks, control scheme monitoring); multi-variable control (DMC/RMPCT or other models, neural networks); knowledge-based control (expert or other advisory systems); and real-time optimization and manufacturing execution systems (RTO/MES). Detailed operator process knowledge diminishes toward the apex.

FIG. 4.11c
Root causes of industrial upsets and accidents: people 40%, equipment 40%, process 20%.

FIG. 4.11d
Sequential model for operator intervention in case of an upset: Detect → Sort/Select → Plan/Act → Monitor.

Detect Phase


Tools for the Detect Phase of operator intervention should be focused on ensuring that the operator is aware of conditions in his or her span of control. There are several elements that are critical to successfully managing the operator’s awareness of process conditions, including the design of the control room environment, the alarm system, and the graphical user interface (GUI). It is very important to remember that operators have human physical and mental limits that can be exceeded in abnormal conditions. Thus, it is important to factor these limits into the design of critical systems. Also, advanced detection tools exist that can aid the operator by helping to detect abnormal conditions early and give the operations team additional time to avert an upset or loss.



Sort/Select and Monitor Phases


Tools for the Sort/Select and Monitor phases of operator intervention should be focused on ensuring that the operator will draw the correct conclusions regarding the state of the process. Key elements include operator training systems, advanced GUI design, and advanced alarm system handling. Fundamentally, any clues regarding the root causes of upsets and any prevailing interactions should be presented and managed in these systems. One other important consideration is additional operator information needs: systems should be assessed to ensure that any information relevant to diagnosing abnormal conditions is presented to the operator if feasible.

In the stress of the moment, it is likely that the operator will not process all available information but will act when a theory of the state of the process makes sense. Operators generally look for patterns in the behavior of the process. Once a pattern is recognized, it becomes "reality" until an accumulation of data proves otherwise. In fact, as soon as a pattern is selected, the operator's focus shifts from understanding the upset to monitoring intervention activities. The focus does not shift back to a search for general system understanding until it becomes clear that the intervention is not improving the situation or that several new alarms do not match the selected pattern. Thus, it becomes critical to provide support in identifying the root cause of a given condition.

Plan/Act Phase

Tools for the Plan/Act phase of operator intervention should focus on ensuring that whatever action is taken is the correct one. Key elements here include:

• Graphics showing the entire span of control
• Contextual information displayed on key graphics (including advice on response)
• Appropriate update rates that show the impact of operator moves

Planning the Intervention

Once the problem has been identified, the key issue becomes one of planning and executing the appropriate intervention. Any known responses should be designed into the automatic control system. If the response cannot be automated, then the operator should be instructed to make the required changes. Any advice that can be concisely presented may speed up the intervention process. If there is no pattern that can be diagnosed automatically, then the interface to the control system should be designed so that the process state and critical interactions are visually apparent.

It is important to remember that abnormal conditions will generally be complex and will result from a series of consequences rather than one single event. Belke³ performed a review of chemical industry accidents looking for common causes and found that major disasters are often preceded by a series of smaller accidents, near-misses, and other known precursors. Plants are routinely run "blind" with failed instruments and, at times, "without a safety net" because shutdown devices are bypassed. The decision to continue operating under these conditions is a critical one, because a new event or failure combined with the existing failures is often simply too much for the operators to overcome in the time available to intervene.

Several other factors can impact the ability of people to perform effectively in a plant environment. These include personal factors such as knowledge, skills, motivation, and personality, and group factors such as the working environment, organizational structure, and work hours. Psychological and work process factors, such as the availability of procedures, communications within the organization, work methods, control display relationships, and task criticality, also have a profound effect on an operator's performance.¹

Response Time

The time available for operators to respond is a primary consideration. Industrial processes are dynamic, and the pace of change during an abnormal condition, particularly once it has been detected, can be quite rapid. Varied studies have repeatedly suggested that the less time there is available to respond, the more likely it is that the correct response will not be found.²

TABLE 4.11e
Probability of Failure to Respond²

Time Available (Minutes)    Probability of Failure
1                           ~1
10                          0.5
20                          0.1
30                          0.01
60                          0.001
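For rough planning purposes, the tabulated points can be interpolated to estimate the failure probability for an intermediate response time. The following is a minimal sketch (an illustration added here, not part of the original table or any standard method); it simply interpolates the logarithm of the tabulated probabilities:

```python
import math

# Tabulated values from Table 4.11e: (time available in minutes, probability of failure)
TABLE_4_11E = [(1, 1.0), (10, 0.5), (20, 0.1), (30, 0.01), (60, 0.001)]

def failure_probability(minutes_available: float) -> float:
    """Estimate the probability that the operator fails to respond in time.

    Log-linear interpolation between the tabulated points; values outside
    the table are clamped to the nearest endpoint.
    """
    pts = TABLE_4_11E
    if minutes_available <= pts[0][0]:
        return pts[0][1]
    if minutes_available >= pts[-1][0]:
        return pts[-1][1]
    for (t0, p0), (t1, p1) in zip(pts, pts[1:]):
        if t0 <= minutes_available <= t1:
            frac = (minutes_available - t0) / (t1 - t0)
            return 10 ** (math.log10(p0) + frac * (math.log10(p1) - math.log10(p0)))
    raise ValueError("unreachable")

print(failure_probability(15))  # roughly 0.22
```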

Types of Operations

Finally, it is also important to note that the type and quantity of operator interventions needed are functions of the type of operation. Figure 4.11f shows a proposed model for the types, or modes, of operation. The first type of operation is Optimal Operations, where the process is operating at peak efficiency and quality. Optimal Operations is considered to be a subset of Normal Operations. Normal Operations can have some efficiency and quality losses, recognizing a range of operation that is considered acceptable. As the magnitude of losses increases, the type of operation becomes Upset Operations. This mode of operation might incur some minor equipment damage. As the problem escalates, the mode of operation can shift into Emergency Operations.

FIG. 4.11f
Model for defining the types of prevailing states of operation, from the base upward: Optimal operations (peak efficiency and quality); Normal operations (some efficiency losses in raw materials and energy, some off-spec or lower-quality product); Upset operations (large efficiency and quality issues, minor equipment damage); Emergency operations (significant efficiency and quality issues, significant equipment damage, injuries, environmental damage, significant loss of production); Unit shutdown (the emergency shutdown system, ESD, acts); and Catastrophic loss (the ESD fails). Operator intervention escalates as the operation moves up the scale.

Emergency Operations has more significant losses in efficiency, quality, and equipment damage. Also possible are injuries, environmental damage, and loss of production. The model also includes an escalation into Unit Shutdown and, with a failure of the emergency shutdown system, an escalation into a Catastrophic Loss.

As the type of operation changes, the operator also has different primary active goals. While safety is generally the primary goal in all types of operations, it is not the most active goal during, for example, Optimal Operations. During Normal Operations, operators can, in theory and in fact, undertake a variety of tasks that are not directly related to "operating" the plant. It is common for operators to undertake personal training or to support projects during these relatively quiet times. However, as an abnormal condition develops, the operator becomes fully engaged in operating.

The pace and magnitude of intervention generally accelerate. Table 4.11g summarizes the most common active goals and types of operator intervention during the different types of operation. As the time available to respond drops and the likelihood of successful intervention diminishes, the need for operator intervention increases, as does the magnitude of the likely losses if the intervention is not successful.

TABLE 4.11g
Overview of Operator Intervention and Goals

Type of Operation     Operation's Active Goals                                 Major Type of Intervention
Optimal               Exceed targets for production, efficiency, and quality   Process monitoring (minor SP and OP changes)
Normal                Move to Optimal Operations                               Process optimization (small SP and OP changes)
Upset                 Return to Normal Operations                              Process stability (large SP and OP changes)
Emergency             Ensure safety, stabilize unit (if possible)              Process safety (major SP and OP changes)
Shutdown              Secure unit, prepare for restart (if possible)           Process safety (major SP and OP changes); interact with ESD system
Catastrophic Loss     Secure unit for safety, minimize losses (if possible)    Emergency containment, field isolation

SP: set point; OP: output.

MANAGING ABNORMAL CONDITIONS

There are certainly as many approaches to managing abnormal conditions as there are causes for the upsets themselves. Perhaps the most prominent and effective lines of defense are in the control schemes and in the design of the emergency shutdown systems.


Care should be, and is, taken in the design of these systems. However, the following ancillary systems can also be useful in the management of abnormal conditions, and care should be taken in their design:

• Control rooms
• Operator training programs
• Alarm systems
• Graphical user interfaces (GUIs)
• Advanced condition detection and advice systems

Control Room Design

Early control rooms were not much more than glorified umbrellas, designed to do little more than provide shelter for equipment and operators. The early equipment was reasonably impervious to the elements, so the designs were not complicated and operators got little extra consideration. As the control equipment became more advanced and environmental requirements were added, control rooms generally became actual buildings requiring architectural design support. Over time, safety considerations became more prominent, so it is relatively common today to have blast-proof buildings with advanced environmental controls. In fact, a specialty within the field of architecture eventually emerged, focused specifically on control center design. Additionally, the reliance on video display terminals for process control system interactions with operators makes the ergonomics of workstation and console design critical.

A detailed discussion of the design and upgrading of control centers is provided in Sections 4.1 and 4.2 of this chapter. It is important to remember, however, the needs of the operating team during abnormal conditions and to factor those needs into the final control center design.

The control room is the communications center for operations. Varied types of people need to interact with the workstation operators, but these groups of people also need to interact among themselves, as is illustrated in Figure 4.11h. Thus, it is important to provide meeting facilities and entryways in such a way that they do not disturb the workstation operators. In abnormal conditions, the traditional control center design, with its direct entry into the console area and its limited space, often hinders operator performance.

FIG. 4.11h
The required communication interactions in a control center (the console operator at the center, interacting with the outside operator, operations team leader, maintenance support, I&E support, process engineer, project engineer, control engineer, and upper-level management).

The following key design specifications have been proven effective in designing for the management of abnormal conditions:⁴

1. Control center arrangements (focused on different types of rooms)
2. Control room layout (including usable space, furniture, maintenance access, storage, entrances and exits)
3. Workstation layout and dimensions (including communication systems)
4. Displays and controls design


5. Environmental design (including air quality, lighting, noise, static, electromagnetic fields, and hearing clarity)
6. Operational and managerial requirements (practices and organizational policies)

Summarizing, care should be taken in the design and retrofit of existing control rooms to address the communication and ergonomic needs of the operating team. Efficient designs can dramatically improve operator response to abnormal conditions.

Operator Training

Formal operator training programs, for the most part, evolved from programs designed to meet the first governmental training regulations. The early programs were largely on-the-job training (OJT) systems modeled after the traditional apprenticeship system. Programs today commonly have a mix of OJT, classroom lectures, and computer-based training (CBT) systems (some of which are programmed to adapt to the student's pace and success). Increasingly, training simulators have also been developed to simulate the operation of varied industrial processes, with varied degrees of process model fidelity. The simulators are generally either a generic training module at moderate cost or a high-fidelity model of the actual process unit at generally very high cost. High-fidelity simulators allow operators to operate the plant offline in simulation mode, but with the same look and feel as the actual plant. The simulators can be used for operator training, as well as to verify process engineering and process control designs.

Progressively more companies that are focused on achieving manufacturing or operational excellence have acknowledged the need for structured training in preparing for emergency operations. This is because there is no time to review emergency operating procedures or to call for engineering support during an emergency.

By the time an emergency arises, the operators must already have a series of possible response plans internalized, or "unless special care is given to developing and maintaining the operator's abnormal operations skills, such as through simulator-based training, he will have difficulty fulfilling his role effectively once an abnormal event occurs."⁵

New regulations have also spurred an emerging approach to training systems based on operator competency, where the specific skills and knowledge for a given job are documented and the operator is assessed against these requirements. Many companies have augmented this approach with a tiered training curriculum, as seen in Figure 4.11i. The number and specifics of the tiers vary from company to company, depending on the work process design at each location. The example in Figure 4.11i shows that all site field operators complete the first three tiers, while all console operators complete Tier 4 and, for the unit process they operate, would complete two additional tiers that are unique to their specific jobs.

From an abnormal condition management perspective, it is useful to consider possible abnormal conditions at each of the tiers, so that the operator knows how to respond to major site risks and sitewide upsets, as well as to the specific risks within the process unit and his or her specific job tasks. One effective way to document much of this is through periodic process hazard analysis reviews. Fundamentally, it is important to train operators in the response to abnormal conditions or upsets, whether through OJT discussions, classroom training, CBT reviews of past events, or the use of process simulators. This training, periodically refreshed, is a first line of knowledge that can be critical in avoiding or managing abnormal conditions.

FIG. 4.11i
Illustration of a tiered approach to operator training. Tiers, from the base upward: regulatory training (site emergency plan, etc.); general equipment training (motors, heat exchangers, etc.); chemistry, process engineering, and process control basics; site overview; process-specific training; and job-specific training. Field operators complete the lower tiers; console operators and unit console operators also complete the process- and job-specific tiers.

Alarm System Design

As was discussed earlier, the financial losses due to upsets are varied and many. More functional alarm systems can help minimize losses in several ways by:

1. Avoiding overloads in upsets
2. Decreasing trips and downtime
3. Decreasing losses in efficiency
4. Identifying areas for improvements in instrumentation, operations, and/or control

It has been observed that severe upsets can corrupt DCS database files, requiring involved rebuilding of these databases. A corrupt database can result in inaccurate alarm system displays, providing the operator with an inaccurate image of the process. The possibility of inaccurate alarming could itself justify an alarm management evaluation project.

Perhaps the most commonly referenced guideline for the design and use of alarm systems is the Engineering Equipment and Materials Users Association (EEMUA) guideline entitled "Alarm Systems, a Guide to Design, Management and Procurement." The guideline provides a comprehensive look at the design and procurement of alarm systems and the assessment of alarm system performance. Additionally, the ISA SP18 committee started work in 2004 on a new ISA standard (ISA-18.00.02), tentatively titled "Alarm Systems Management and Design Guide." The Abnormal Situation Management Consortium published a guideline titled "Effective Alarm Management Practices" in 2004.

The following ten-step process, as depicted in Figure 4.11j, is recommended for managing an alarm system performance improvement project:⁶

1. Set alarm system objectives
2. Set an alarm philosophy
3. Set an alarm interface standard
4. Set up the standard approaches
5. Set up the collection infrastructure
6. Set goals and measures
7. Perform alarm analysis
8. Perform alarm rationalization (as needed)
9. Implement improvements
10. Repeat the process

FIG. 4.11j
Steps required in an alarm system performance improvement project.⁶

Alarm System Objectives

As with any course of action, setting clear objectives is critical to success. Example objectives include the elimination or optimization of:


• Chattering alarms, i.e., alarms that rapidly cycle into and out of the alarm state, causing a large load for both the DCS and the operators (see the sketch following this list)
• Nuisance event-based alarms, i.e., alarms that are merely expected steps in routine procedures or sequences of events
• Duplicate alarms, i.e., alarms that detect and annunciate both a high and a high-high condition of the same process variable; these alarms eclipse one another and are essentially duplicated entries in the alarm system
• Stale alarms, i.e., alarms that have been in their alarm condition for a protracted period and do not need to remain in the active alarm summary
• Disabled or inhibited alarms, i.e., alarms whose annunciation is not enabled, indicating a potential problem in the alarm design
• Operator alarm changes; frequent changes by the operator indicate a potential problem in the alarm design
• Other operator actions, e.g., changes to outputs or set points, that might indicate areas for improvement in instrumentation, process design, or process control

• Controls not in normal mode; if the process is not controlled in the normal mode, that might also indicate potential areas for improvement in instrumentation, process design, or process control
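Several of these objectives can be screened for automatically from the alarm and event journal. The following is a minimal sketch (illustrative only; the record layout, tag names, and thresholds are assumptions rather than any vendor's format) of flagging chattering and stale alarms:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Each journal record is assumed to be a (timestamp, tag, event) tuple,
# where event is "ALARM" or "RTN" (return to normal).
def analyze_journal(events, chatter_window_min=10, chatter_count=5, stale_hours=24):
    """Flag chattering alarms (many activations in a short window) and stale alarms
    (still active after a protracted period)."""
    activations = defaultdict(list)   # tag -> list of ALARM timestamps
    active_since = {}                 # tag -> timestamp of the unreturned ALARM
    last_event_time = None

    for ts, tag, event in sorted(events):
        last_event_time = ts
        if event == "ALARM":
            activations[tag].append(ts)
            active_since.setdefault(tag, ts)
        elif event == "RTN":
            active_since.pop(tag, None)

    chattering = set()
    window = timedelta(minutes=chatter_window_min)
    for tag, times in activations.items():
        for i in range(len(times)):
            # count activations inside a sliding window starting at times[i]
            if sum(1 for t in times[i:] if t - times[i] <= window) >= chatter_count:
                chattering.add(tag)
                break

    stale = {tag for tag, since in active_since.items()
             if last_event_time - since >= timedelta(hours=stale_hours)}
    return chattering, stale

# Example: FIC-101 chatters, LI-205 is stale.
t0 = datetime(2005, 6, 1, 8, 0)
journal = [(t0 + timedelta(minutes=i), "FIC-101", e)
           for i, e in enumerate(["ALARM", "RTN"] * 5)]
journal += [(t0, "LI-205", "ALARM"), (t0 + timedelta(days=2), "TI-300", "ALARM")]
chatter, stale = analyze_journal(journal)
print(chatter, stale)
```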

Alarm Philosophy

Setting an alarm philosophy can be difficult, since the varied stakeholders have disparate needs and generally strong opinions, and most alarm systems are expected to meet a variety of different needs. From a process safety standpoint, it is vital to start and end with the operators: the alarm system should primarily be an Operations tool. It is also important to ensure that Operations is aware of the limitations of the DCS, such as bandwidth limits that cannot be exceeded, because systems can be overburdened to the point of being unreliable in an upset. The alarm practices could easily be different for every DCS system at a given site, though some level of common ground is useful. Examples of common philosophy elements include:





• Stipulate that annunciated alarms require an operator action. If no action is needed, then it is not an alarm.
  – While it can be useful to alarm a variety of items for operator information (e.g., alarms to alert the operator that a sequence program has moved from one step to another), it is important to remember that the primary goal of the alarm system is process safety. Overloading the system with message-type information can make it fail to perform during emergency operations.
• Reserve emergency alarms for true emergencies. (Note: Emergency is defined here as the highest DCS alarm priority, recognizing that different DCS vendors have different naming conventions.)
  – It is very useful to have a sound and a color reserved for true emergencies that require critical responses from the Operations team. Deciding which alarms warrant the emergency priority could be based on, e.g., time to respond or potential impact severity.
  – It is vital to ensure that the operator response is not diluted with too-frequent emergency alarms. True emergencies should be rare occurrences.
• All alarm types need a defined rationale and response.
  – A good first step at improvement can be setting the rationale for the emergency alarms in the system. However, every alarm type should have a defined meaning and expected operator response. It is important that the operations team understand and internalize the meaning of the different types of alarms. Recommendations on how to proceed with defining the rationale are discussed under Standard Approaches below.
  – In general, three levels of alarms are considered sufficient. A proliferation of alarm types makes color and sound coding difficult and makes success at internalizing operator understanding unlikely.
  – Several publications, including the EEMUA guideline, recommend certain ratios of alarm types by priority. Such ratios are illustrated in Figure 4.11k and in the sketch following this list:
    — Emergency alarms should represent true emergencies requiring immediate operator intervention and should be less than 3 to 5% of the total configured alarms in the system.
    — The High alarm priority should represent upsets that are likely to move the process into emergency operations, requiring timely operator intervention. This category should be no more than a third of the remaining alarms, or a maximum of roughly 30%.
    — The Low alarm priority represents the balance of the annunciated alarms and should sound as the operation is moving from normal into upset conditions, requiring operator action but generally not with great urgency.


• Operator training will be provided for all deleted alarms.
  – At first glance, this might seem more of a step in the project process than a true philosophy element. However, it is important to understand that operators have an internal mental model of the process: a missing alarm will be assumed to be not active, and actions will be taken consistent with that assumption. Thus, removing alarms can have serious consequences if the operators are not adequately trained.

FIG. 4.11k
Recommended alarm priority ratios of low, high, and emergency alarms (emergency approximately 5%, high approximately 30%, and low approximately 65% of configured alarms).
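A configured alarm database can be audited against these recommended ratios with very little code. The following is a hedged sketch (the priority names, export format, and tolerance are assumptions, and the 5/30/65 split is only the approximate guidance of Figure 4.11k):

```python
from collections import Counter

# Approximate target ratios from Figure 4.11k.
TARGETS = {"EMERGENCY": 0.05, "HIGH": 0.30, "LOW": 0.65}

def audit_priorities(configured_priorities, tolerance=0.05):
    """configured_priorities: list of priority strings exported from the DCS."""
    counts = Counter(p.upper() for p in configured_priorities)
    total = sum(counts.values())
    report = {}
    for priority, target in TARGETS.items():
        actual = counts.get(priority, 0) / total if total else 0.0
        report[priority] = (actual, target, abs(actual - target) <= tolerance)
    return report

sample = ["LOW"] * 620 + ["HIGH"] * 310 + ["EMERGENCY"] * 70
for prio, (actual, target, ok) in audit_priorities(sample).items():
    print(f"{prio:9s} actual {actual:5.1%} target {target:4.0%} {'OK' if ok else 'REVIEW'}")
```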

Interface Standards

Rapid visual identification of alarm conditions can be an important part of an effective alarm system. Thus, it is important that alarms have a standard visual and audible presentation format. Some example elements of an interface standard include:







• Alarm presentation is consistent.
• Alarm presentation is redundant, e.g., by color and symbol, to ensure that it is prominent and insensitive to loss of color in the display or to color-blind operators.
• Red and yellow are reserved for alarms only.
  – This can be hard to sell initially to an operations team accustomed to red being used for tripped equipment, but once implemented, operators generally grow to appreciate the value of reserving the color for alarms. One effective means of differentiating tripped versus running equipment is the use of hollow symbols for tripped and filled symbols for running equipment.
• Ensure that the alarm enable status is redundantly apparent on the graphic, in color and with a coded symbol, e.g., INH for inhibited alarms or DIS for disabled alarms.
• Ensure that the operator is aware of critical code execution status on affected controls (e.g., if code acts on a given controller and the code is not functioning, ensure that a visual message or symbol is present on the operating graphic).

Standard Approaches

Alarm management projects can be daunting to undertake. There are, however, four things that can make the projects more likely to be successful:

1. Get strong management support and charter a cross-functional team that includes representation of all areas of the operations team.
2. Start with a basic alarm cleanup, which will generally show initial improvement.
3. Consider the use of commercial software, which will save time in setup and ensure that someone else will manage any needed updates to keep in synch with the DCS database structures.
   a. Most DCS vendors have some form of alarm management software.
   b. Other commercial packages that are available were listed at the beginning of this section.
   c. If commercial packages are not feasible, consider automation of key reports to ensure that the analysis is ready whenever time is available to work on improvement.
4. Find a specific area of concern to the stakeholders and work on that in parallel with the rest of the general project. Success at resolving a key stakeholder worry will help build enthusiasm.

Improvement projects can be most effectively managed when the engineering decisions are made at the beginning and then used as templates for specific applications. The most common solutions are:

• Database management
• Group alarming
• Alarm "silencing"
• Alarm analysis
• Alarm rationalization

Database management should be designed to run in an automated manner, providing a periodic audit of the DCS, including:

• Disabled or inhibited alarms
• Changes since the last audit

Group alarming can be used where a process upset results in numerous alarm conditions of which the operator does not need to be aware:

• For example, eliminate alarms when a given section of the plant is down and secured. Include logic to test for down and secure. Perform a safety review indicating which alarms are not needed when down and secure. Visual indication of the true state of all alarms should be available at the user interface.
• A second example of alarm grouping is to take all of the noncritical alarms in one area of the process into a single alarm. This can dramatically reduce the clutter of stale or standing alarms (defined as alarms that remain on and standing in the alarm summary). Standing alarms can be a significant issue during upsets, because they are an additional information load that the operator has to sort through to understand the state of the process.
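The group-alarming idea can be sketched in a few lines of logic. This is purely illustrative (the tag names and the "down and secured" test are assumptions, not a vendor implementation):

```python
# Noncritical alarms for one plant area are rolled into a single group alarm, and
# the whole group is suppressed when logic confirms the area is down and secured.
def area_down_and_secured(readings):
    """Placeholder logic test: feed valve closed and pressure near atmospheric."""
    return readings["XV-100.CLOSED"] and readings["PI-101.PV"] < 1.2

def evaluate_group_alarm(noncritical_alarm_states, readings):
    """Return (group_alarm_active, suppressed_tags).

    noncritical_alarm_states: dict of tag -> bool (individual alarm active)
    """
    if area_down_and_secured(readings):
        # Individual alarms are suppressed, but their true state must still be
        # visible at the user interface (per the safety review requirement).
        return False, [tag for tag, active in noncritical_alarm_states.items() if active]
    return any(noncritical_alarm_states.values()), []

states = {"TI-120.HI": True, "FI-130.LO": False, "LI-140.HI": True}
print(evaluate_group_alarm(states, {"XV-100.CLOSED": True, "PI-101.PV": 1.0}))
print(evaluate_group_alarm(states, {"XV-100.CLOSED": False, "PI-101.PV": 5.0}))
```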

Alarm "silencing" can be an effective temporary tool. Consider the automation of alarm "silencing," in which the operator is allowed to inhibit the audible annunciation of an alarm until the next shift change. This can eliminate the stress of an alarm that is malfunctioning and chattering on and off with high frequency, or of any other temporary condition. It can also help manage standing alarms caused by instrumentation problems. Such an automation step would reintroduce the alarms after shift change (checking for the absence of an abnormal condition) or on operator demand. This eliminates the possibility that the alarms could be left in their inhibited state for a long period of time.

Alarm analysis is the bulk of the effort needed in an alarm management program. A variety of analyses can be useful, as listed in Table 4.11l. The most important data are the number of alarms per operator per time period, the number of chattering alarms, and the number of standing or stale alarms.⁶

TABLE 4.11l
Types of Alarm System Analyses

Alarms per time period (minutes, hours, days, shift, etc.)
Time in alarm (both per event and cumulative over the period)
Time between new alarms
Duplicate alarms (query for alarms with *xxx)
Alarm annunciation not enabled
Operator changes to alarm parameters, controller mode, etc.
Time not in normal mode
Time to acknowledge
Time in alarm (time to return from alarm)
Sorting: by area, unit, priority, unit and priority, operating shift, day or night, etc.

Alarm rationalization is the formal process for setting alarm priorities. The alarm philosophy defines the meaning of, and the calculation methods for, alarm priorities. The most common approaches involve consideration of the time available to intervene and correct the situation and of the potential magnitude of impact (type of injuries and equipment or environmental damage). Varied commercial software packages are available to assist in the documentation and calculation of priorities. Rationalization is the process of applying the alarm priority philosophy to individual alarms. It is perhaps simplest to do this rationalization as part of the periodic process hazard analysis (PHA). Alarm rationalization can be a time-consuming process, though detailed review is likely to be needed only on those parts of the process that can lead to emergency operations, which should be identified as part of the PHA cycle.

Alarm Collection Infrastructure

Most DCS systems are designed to manage a rolling buffer of alarm information and have no standard means to store a continuous stream of alarm information. Alarm understanding is generally improved with a review of operator changes, which are also generally stored on a rolling buffer. A wide variety of options are available for the collection of alarm information, from custom to commercial applications:

• DCS vendors are starting to offer optional packages.
• Data historian companies generally offer optional tools.
• Most DCS systems can stream this information to a "system printer," which can be directed to a virtual file system. The system printer can also be replaced with a connected personal computer running terminal/modem software to manage the data, or the stream can be directed to varied third-party applications.
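As a rough illustration of the "system printer" capture route, the sketch below parses a captured text stream into a structured file for later analysis. The line format is an assumption for illustration only; real DCS journal formats vary by vendor and must be mapped accordingly:

```python
import csv
import re
from io import StringIO

# Assumed line format: "DD/MM/YYYY HH:MM:SS  TAG  EVENT  PRIORITY  description"
LINE = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<tag>\S+)\s+(?P<event>ALARM|RTN|ACK)\s+(?P<priority>\S+)\s*(?P<desc>.*)"
)

def journal_to_csv(text_stream, csv_path):
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["date", "time", "tag", "event", "priority", "description"])
        for line in text_stream:
            match = LINE.match(line.strip())
            if match:  # silently skip banner lines, page headers, etc.
                writer.writerow(match.group("date", "time", "tag", "event",
                                            "priority", "desc"))

captured = StringIO(
    "01/06/2005 08:14:02  FIC-101  ALARM  HIGH   Reactor feed flow high\n"
    "01/06/2005 08:14:40  FIC-101  ACK    HIGH   Reactor feed flow high\n"
)
journal_to_csv(captured, "alarm_journal.csv")
```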

The most critical issue is to make the data readily available to all users and to automate as much as possible. The most effective improvement programs make the alarm information available to the entire operations team on their desktops, so that access to data is not a roadblock to improvement.

Goals and Measures

Alarm management is never complete; monitoring the performance of the system is a never-ending task. Thus, it is important to set intermediate improvement goals and minimum performance limits on key performance targets. Management commitment to the goals will help make alarm management a way of life. Common problems beyond the simple redesign of alarm systems include:

• Noncritical digital indicators failing or failed
• Field switches that need new set points due to process changes
• Noncritical loops with chronic performance issues
• Critical loops in need of maintenance
• Different modes of plant operation needing different alarm trip points

Graphical User Interface

The graphical user interface can be designed in several ways to support the operators during upsets. Key issues to consider include:

• View of span of control
• Navigation design
• Dedicated displays for abnormal conditions
• Dedicated displays for task support
• Basic display design guidelines

View of Span of Control

In modern digital control systems, the full span of control for an operator is often not shown on a single screen. In an abnormal operating condition, this lack of the "full picture" can lead to errors in judgment. Thus, it is recommended that an operating graphic be designed that clearly shows the entire operator span of control. In order to manage graphic throughput constraints, alarm grouping techniques or alarm counts by area can be used to manage the information needs of this process overview display.
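A minimal sketch of the "alarm counts by area" idea for such an overview display is shown below (the tag-to-area mapping and record format are assumptions for illustration, not a DCS feature):

```python
from collections import Counter

def alarm_counts_by_area(active_alarms, tag_to_area):
    """active_alarms: iterable of (tag, priority); returns {area: {priority: count}}."""
    counts = {}
    for tag, priority in active_alarms:
        area = tag_to_area.get(tag, "UNASSIGNED")
        counts.setdefault(area, Counter())[priority] += 1
    return counts

tag_to_area = {"FIC-101": "REACTION", "TI-205": "DISTILLATION", "LI-310": "UTILITIES"}
active = [("FIC-101", "HIGH"), ("TI-205", "LOW"), ("TI-205", "HIGH"), ("LI-310", "LOW")]
for area, by_priority in alarm_counts_by_area(active, tag_to_area).items():
    print(area, dict(by_priority))
```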



Navigation Design

Navigation design can be critical because it directly impacts the time to understanding and to ultimate action. Key considerations include:



• Display every loop in the unit on at least one operator graphic.
• Keep the number of operator graphics to a minimum.
• Keep the navigation scheme fairly simple and flat.
  – Most companies have display design guidelines that limit the depth and breadth of the graphics hierarchy. There has been significant research into "menuing" and navigation design. A simple rule of thumb is that every loop should be visible within no more than three navigation moves, and graphics for critical loops should be accessible with one move (a sketch of such a check follows this list).
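The "three moves" rule of thumb can be checked automatically if the display hierarchy is exported as a simple link structure. The sketch below is illustrative only; the display names and link format are assumptions:

```python
from collections import deque

# Breadth-first search from the overview display: every display should be
# reachable within three navigation moves.
def navigation_depths(links, start="OVERVIEW"):
    depths = {start: 0}
    queue = deque([start])
    while queue:
        display = queue.popleft()
        for target in links.get(display, []):
            if target not in depths:
                depths[target] = depths[display] + 1
                queue.append(target)
    return depths

links = {
    "OVERVIEW": ["UNIT-100", "UNIT-200"],
    "UNIT-100": ["FURNACE", "TOWER"],
    "TOWER": ["TOWER-DETAIL"],
    "TOWER-DETAIL": ["TOWER-TRENDS"],
}
for display, depth in navigation_depths(links).items():
    if depth > 3:
        print(f"{display}: {depth} moves from the overview, exceeds the rule of thumb")
```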





Dedicated Displays for Abnormal Conditions

Dedicated displays for special tasks and abnormal conditions can be very effective. However, care must be taken either to use the same type of layout as in the normal operating graphics or to hold periodic training sessions that include actual use of the dedicated displays. It can be very effective to display context-sensitive information. Examples include:











• Information about interlocks; include details on permissives and bypasses, available as required.
• Conditional advice embedded in the graphics that is displayed only when logic tests are true (or is displayed at high intensity only when true).
• Sections of the graphic dedicated to the display of information relevant to the current actions of the operator.

Dedicated Displays for Task Support

Where there are specific tasks that are more prone to generate abnormal conditions, it can be valuable to perform a process mapping of the task and then to generate specific operating graphics to support the effort. Additional automation ideas are likely to be generated as part of this process mapping. There is an emerging specialty in procedure automation and verification, driven partially by OSHA 1910.119 and partially by initiatives such as abnormal situation management.

Basic Display Design

There are some basic design guidelines that can be effective in designing operating graphics that perform well under abnormal conditions. Perhaps the most important considerations are those concerned with the basic ergonomics of the operator, ensuring that the pace of information delivery matches reasonable human capabilities. Key considerations include:


• The display refresh rate should be appropriate. Just because the DCS can refresh the graphic every 1/10 of a second does not mean that this speed is needed. The refresh rate should be based on the actual response time of the process, thus reducing the information load on the operator and on the system.
• Error avoidance techniques should be implemented (illustrated in the sketch after this list).
  – Require a double confirmation on critical steps.
  – Redisplay the requested action in the verification, if appropriate.
  – Enforce high and low limits of change on typed entries.
• Design the graphic so that the equipment and piping fade into the background. While the graphic represents the process of, for example, a distillation tower, the valves, pumps, and instruments are more important than the tower or its trays.
• Detailed information should be available, but only on operator demand. Do not clutter the operating graphic with all the information that might ever be needed; instead, provide methods to display items such as tag numbers and shutdown limits on demand.
• The direction of flow and the overall layout should be consistent with the actual process.
• Reserve red and yellow for alarms only. It is also prudent to keep the number of other colors to a minimum and to apply the meaning of these colors consistently. It can be very effective to make all process equipment gray and all text some other dull color; this makes the red and yellow of alarms very prominent.
• Consider color blindness and contrast in the design of the graphics.
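The error-avoidance bullets translate naturally into entry-validation logic at the interface. The following is a minimal, hedged sketch (parameter names and limits are assumptions, not a DCS API):

```python
# Typed set-point entries are checked against high/low limits and a maximum single
# move, and critical entries require the value to be re-entered as confirmation.
def validate_entry(new_value, current_value, low, high, max_change, critical=False,
                   confirm_value=None):
    """Return (accepted, message)."""
    if not (low <= new_value <= high):
        return False, f"Entry {new_value} outside allowed range {low}..{high}"
    if abs(new_value - current_value) > max_change:
        return False, (f"Change of {abs(new_value - current_value):g} exceeds the "
                       f"maximum single move of {max_change:g}")
    if critical and confirm_value != new_value:
        return False, "Critical entry: re-enter the same value to confirm"
    return True, "Accepted"

print(validate_entry(85.0, 60.0, 0.0, 100.0, max_change=10.0))
print(validate_entry(65.0, 60.0, 0.0, 100.0, max_change=10.0,
                     critical=True, confirm_value=65.0))
```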

Advanced Detection and Advice

A key goal of abnormal condition management is to ensure that the operator gets the correct information at the correct time to be able to take the required corrective action. Advanced detection and advice systems strive to provide real-time alerts and dynamic advice to the operator. Technologies that support this include knowledge-based systems (e.g., expert systems), model-based systems, data-driven approaches (e.g., statistical process control, Six Sigma, and neural networks), and engineering process control.⁷ Research into abnormal condition management, especially within the ASM Consortium, is currently striving to create collaborative technology that will further enable the generation and delivery of this advice. Much of this is presented in context-sensitive graphics within the DCS or in a technology-specific interface on a non-DCS computing platform. One useful presentation aid is the polar star, which provides a multidimensional image of "normal" operating conditions for key controllers or properties together with a real-time view of the process in the same format.

Providing more time to orient the operator, through early detection and dynamic real-time advice, can be a very effective means of averting abnormal conditions or of minimizing their impact.
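As an illustration of the data preparation behind a polar-star display (a sketch under assumed variable names and normal bands, not an ASM Consortium implementation), each key variable can be scaled against its normal operating band so that a radius greater than 1.0 marks a spoke outside the normal envelope:

```python
def polar_star_radii(current, normal_bands):
    """current: dict of tag -> value; normal_bands: dict of tag -> (low, high).

    Returns a radius per tag: 0 at the center of the normal band, 1.0 at its edge.
    """
    radii = {}
    for tag, value in current.items():
        low, high = normal_bands[tag]
        center = (low + high) / 2.0
        half_span = (high - low) / 2.0
        radii[tag] = abs(value - center) / half_span if half_span else float("inf")
    return radii

bands = {"TI-100": (180.0, 200.0), "PI-110": (4.0, 6.0), "FI-120": (50.0, 70.0)}
now = {"TI-100": 196.0, "PI-110": 7.1, "FI-120": 61.0}
for tag, r in polar_star_radii(now, bands).items():
    flag = "outside normal" if r > 1.0 else "within normal"
    print(f"{tag}: radius {r:.2f} ({flag})")
```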

CONCLUSIONS

Abnormal conditions cost the U.S. economy billions of dollars every year. The keys to the management of abnormal conditions are the use of complex control and emergency shutdown systems, together with early detection systems, comprehensive operator training, well-performing alarm systems, dynamic and well-designed user interfaces, and advanced operator advice systems. No one of these elements is a magic bullet that solves the problem; rather, together they form a basis for context and understanding for the operations team. In reality, the problem is never solved; it is a continuous work process for operations, requiring vigilance and unremitting improvement.

ACKNOWLEDGMENTS

The author would like to recognize the influence of the ASM Consortium on this material and the critical reviews by Byron Lemmond and Ed Quesada.

References

1. Lorenzo, D., A Manager's Guide to Reducing Human Error, Washington, DC: Chemical Manufacturers Association, 1990.
2. Swain, A., and Guttman, H., Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (Final Report), Washington, DC: United States Nuclear Regulatory Commission, 1983.
3. Belke, J., "Recurring Causes of Recent Chemical Accidents," Paper Presented at MAPP, 1999.
4. Errington, J., and Bullemer, P., "Designing for Abnormal Situation Management," Proceedings of the 1998 AIChE Conference on Process Plant Safety, Houston, TX, 1998.
5. Emigholz, K., "Improving the Operator's Capabilities During Abnormal Operations: Observations from the Control House," Paper Presented at AIChE Loss Symposium: Human Factors in Plant Operations Session, Boston, MA, 1995.
6. Fitzpatrick, B., "Mustang Alarm Management," Houston, TX: Mustang Engineering, 2005.
7. Cochran, E., and Rowan, D., "Human Supervisory Control and Decision Support: State of the Art," Paper Presented at ISPE Conference, 1995.

Bibliography

Anderson, N., and Vamsikrishna, P., "Ergonomic Best Practices for Information Presentation to Operators," Paper Presented at AIChE Meeting, 2000.


Attwood, D., and Fennell, D., “Cost-Effective Human Factors Techniques for Process Safety,” Paper Presented at CCPS International Conference and Workshop, Toronto, 2001. Brown, S., Advances in Computer-Based, Multimedia Training Provide Significant Opportunities to Improve Results, Tucson, AZ: Performance Associates International, Inc., 1999. Bullemer, P., “Managing Abnormal Conditions: A New Operations Paradigm,” Paper Presented at Honeywell North American Users Group Meeting, Phoenix, 2001. Bullemer, P., Cochran, T., Harp, S., and Miller, C., “Managing Abnormal Conditions II: Collaborative Decision Support for Operations Personnel,” Paper Presented at ISA INTERKAMMA, 1999. Bullemer, P., and Nimmo, I., “A New Training Strategy: Design the Work Environment for Continuous Learning,” Paper Presented at the Mary Kay O’Connor Process Safety Center Annual Symposium, College Station, TX, 1998. Campbell Brown, D., and O’Donnell, M, “Too Much of a Good Thing? — Alarm Management Experience in BP Oil, Part 1: Generic Problems with DCS Alarm Systems,” Paper Presented IEE Colloquium on Stemming the Alarm Flood, London, 1997. Cochran, E., and Bullemer, P., “Abnormal Conditions Management: Not by New Technology Alone,” Proceedings of the 1996 AIChE Safety Conference, Houston, TX, 1996. Cochran, E., Bullemer, P., and Millner, P., “Effective Control Center Design for a Better Operating Environment,” Paper Presented at NPRA Computer Conference, 1999. Cochran, E., Miller, C., and Bullemer, P., “Abnormal Conditions Management in Petrochemical Plants: Can a Pilot’s Associate Crack Crude?” Proceedings of the 1996 NAECON Conference, 1996. Errington, J., and Nimmo, I., “Designing for an Ethylene Plant Control Room and Operator User Interface Using Best Practices,” Paper Presented at AIChE National Meeting, 2001. Gerold, J., “Managing Abnormal Conditions Pays Process Dividends,” Automation World, September 2003. Ghosh, A., Abnormal Situation Management is Your First Life of Defense, Dedham, MA: ARC Advisory Group, 2000. Hendershot, D., and Post, R., “Inherent Safety and Reliability in Plant Design,” Proceedings of the Mary Kay O’Connor Process Safety Center Annual Symposium, College Station, TX, 2000. Jamieson, G., and Vicente, K., “Modeling Techniques to Support Abnormal Situation Management in the Petrochemical Processing Industry,” Proceedings of the Symposium of Industrial Engineering and Management, Toronto, 2000. Mattiasson, C., “The Alarm System from the Operator’s Perspective,” Paper Presented at IEE People in Control Meeting, Bath, UK, 1999. Metzger, D., and Crowe, R., “Technology Enables New Alarm Management Approaches,” Paper Presented at ISA Technical Conference, Houston, TX, 2001. Mostia, B., “How to Perform Alarm Rationalization,” Control, August 2003. Mylaraswamy, D., Bullemer, P., and Emigholz, K., “Fielding a Multiple State Estimator Platform,” Paper Presented at NPRA Meeting, Chicago, 2000. Nimmo, I., “Abnormal Situation Management,” Paper Presented at NPRA Conference, 1995. Nimmo, I., “Adequately Addressing Abnormal Situation Operations,” Chemical Engineering Progress, September 1995. Nimmo, I., “Ergonomic Design of Control Centers,” Paper Presented at Honeywell Users Group Meeting, 2000. Nimmo, I., “The Importance of Alarm Management Improvement Project,” Paper Presented at ISA INTERKAMMA, 1999. Nimmo, I., and Cochran, E., “Future of Supervisory Systems in Process Industries: Lessons for Discrete Manufacturing,” Proceedings of the MVMT Workshop, Ann Arbor, MI, 1997. 
Nochur, A., Vedam, H., and Koene, J., Alarm Performance Metrics, Singapore: Honeywell Singapore Laboratory, 2001.


O’Donnell, M., and Campbell Brown, D., “Too Much of a Good Thing? — Alarm Management Experience in BP Oil, Part 2: Implementation of Alarm Management at Grangemouth Refinery,” Paper Presented IEE Colloquium on “Stemming the Alarm Flood,” London, 1997. Pankoff, J., “Look Beyond Traditional Organizational Structure: ProductionCentered Design Improves, Sustains Total Plant Performance,” Hydrocarbon Processing, January 2003. PAS (Plant Automation Services), The Cost/Benefit of Alarm Management, Houston, TX: Plant Automation Services, 2000.


PAS (Plant Automation Services), White Paper: Alarm Management Optimization, Houston, TX: Plant Automation Services, 1998. Smith, W., Howard, C., and Foord, A., “Alarm Management — Priority, Floods, Tears or Gain?” www.4-sightconsulting.co.uk: 4-sight Consulting, 2003. Walker, B., Smith, K., and Lenhart, J., “Optimize Control Room Communications,” Chemical Engineering Progress, October 2001.