The extensive blackout of Aug. 14, 2003, reawakened a somewhat sleeping industry to the importance of emergency response and system restoration, heightening awareness of risks to the physical security of the U.S. transmission grid and its overall reliability. While large-scale emergency events that can produce cascading outages on the grid are fortunately infrequent, operators must be prepared to deal with them nonetheless.
Arguably, the best method of defense against the unexpected is the design and regular execution of large-scale emergency response and system restoration drills. Such drills are a practical means of training system operators at multiple utilities and reliability entities to respond quickly and effectively, because timely response can limit the boundaries of impacted areas and accelerate restoration.
During a real event involving multiple outages, reliability coordinators and system operators can take a variety of emergency actions to mitigate the situation. If initial efforts fail and the system collapses, they must implement procedures to restore the grid. This necessitates almost continuous communication between reliability coordinators and system operators in transmission, generation and distribution control rooms. One of the most important aspects of emergency response and system restoration drills is practicing the communications that must take place during a real event. In fact, training in inter-area restoration drills is a major recommendation of the North American Electric Reliability Council (NERC).
Communication is Key
From the first moment of an emergency, communication is critical. Quite simply, many different people must communicate with each other during a real event. Conference calls are one of the most essential tools used by system operators and reliability coordinators in performing the tasks required for timely and effective emergency response and system restoration.
For example, during the August 14th blackout, a conference call set up by the Midwest ISO (MISO; Carmel, Indiana, U.S.) was a major contributor to the restoration of the grid. Let's take a look at the various lines of communication that must remain open and work hand in hand for the whole process to function effectively.
First, reliability coordinators at the affected utility must communicate with neighboring reliability coordinators, member control centers, management and support staff. In turn, management must notify state commissions, NERC, the Federal Energy Regulatory Commission (FERC) and the Department of Energy (DOE).
The transmission system operator must notify system operators in neighboring utilities, reliability coordinators and management. Control area operators must call in district personnel. Key district personnel (operations managers, team leaders and designated line electricians) must be notified, as must corporate communications, system protection, customer service, distribution dispatch and the public.
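The lines of communication described above can be pictured as a directed graph of who notifies whom. The following Python sketch is only an illustration: the role labels are shorthand chosen here, the function name `notification_hops` is hypothetical, and the graph covers only the hand-offs named in the text, not any official notification procedure.

```python
from collections import deque

# Who notifies whom, transcribed from the paragraphs above.
# Role labels are illustrative shorthand, not an official taxonomy.
NOTIFIES = {
    "reliability coordinator": [
        "neighboring reliability coordinators",
        "member control centers",
        "management",
        "support staff",
    ],
    "management": ["state commissions", "NERC", "FERC", "DOE"],
    "transmission system operator": [
        "neighboring system operators",
        "reliability coordinator",
        "management",
    ],
    "control area operator": ["district personnel"],
}


def notification_hops(start):
    """Breadth-first walk: map each reachable party to its hand-off count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        caller = queue.popleft()
        for callee in NOTIFIES.get(caller, []):
            if callee not in hops:
                hops[callee] = hops[caller] + 1
                queue.append(callee)
    return hops


hops = notification_hops("transmission system operator")
for party, count in sorted(hops.items(), key=lambda item: item[1]):
    print(count, party)
```

In this toy model, regulators such as FERC sit two hand-offs away from the transmission system operator, which is one way to see why a delay at the management step holds up every external notification behind it.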
Under normal conditions, communication is accomplished through the use of conventional or cellular telephones, pagers and radio. However, during an emergency, some or all of these channels might not be available because of congestion or equipment failure. After the first few minutes of a blackout, telephone lines might be overloaded and unusable. Communications facilities without generator backups become unavailable when their battery supplies are exhausted, which generally occurs after a few hours. For these reasons, NERC advises utilities to maintain alternate communications capabilities.
Utility telecommunications and information technology departments are directly concerned with broad issues of overall service reliability and business continuity. However, communications at the system operator level deserve special attention because of their importance to the mission of delivering reliable electrical service. Communications consoles in control rooms and operator training on that equipment are central to accomplishing that mission.
Manpower deployment is one of the first actions that require large-scale communications. Restoration procedures must include a system for promptly notifying appropriate off-duty personnel to report for duty. Generation personnel must be dispatched to locations with black start units. Switching personnel must be dispatched to major substations to clear busses and restore load when permission is granted from the system operator. In some cases, the transmission system operator must contact a distribution control center as load is being restored.
Later in the restoration process, when interconnecting islands, close communications are required among neighboring companies, reliability coordinators, the synchronizing location and the controlling plant. These communications must be immediate and direct in order to minimize time delays due to information exchange, scanning times or lag times.
One major objective of emergency drills is to test communications channels and protocols that are not typically used during normal operations. A good example is the pair of system restoration drills that MISO coordinated, one on Nov. 10, 2004, and the other on Dec. 1, 2004. Seventy drill coordinators, including managers of transmission operations and shift supervisors, met regionally over a four-month period to plan the drills and develop scenarios. More than 400 individuals took part in the two drills, including transmission system operators and reliability coordinators from MISO, the Tennessee Valley Authority (TVA; Knoxville, Tennessee, U.S.), PJM Interconnection (Valley Forge, Pennsylvania, U.S.) and the Independent Electricity System Operator (IESO; Ontario, Canada). NERC received restoration progress updates as part of the exercises, which focused on system assessment, communications and interconnection following unexpected power outages.
The exercises focused on the full complexities of system restoration that would be encountered during times of maximum system stress. Drill coordinators set up a scenario to simulate a large-scale separation with surviving islands of generation and load across the four reliability areas. The drills required reliability coordinators and system operators to carefully assess post-separation conditions and work together to implement plans for expediting system restoration of the affected area. The drills were the first of their kind conducted since the release of revised NERC policies 5, 6 and 9.
The scenarios involved a large-scale uncontrolled separation of much of the Midwest from the rest of the Eastern Interconnection. Participants were given approximately seven hours to restore as much of the system as possible, recognizing that all of the load would not be restored in that time. Since restoration in a real event can involve many operational delays, such as plant startup time, travel time for switching personnel and several unforeseen complications, the drill time line was accelerated once the assessment stage was complete.
In terms of scope, geography and number of participants, these were the largest coordinated system restoration drills held in the Eastern Interconnection to date.
MISO used its backup control center for the drills. The benefits included use of normal EMS and communications tools, while avoiding any interference with real-time operations, such as noise interference or space constraints. Participants also gained a greater familiarity with operating from the backup control center, which is helpful for potential real-time emergency situations.
Participants communicated using IPC communications consoles, which are installed at both the primary and backup control centers. As a result, participants were already very familiar with these phones, which feature built-in conferencing for up to 10 external lines and an unlimited number of internal connections. Participants also used the MISO messaging system to provide status updates, report restoration tracking data and update maps.
An external conferencing service was used to maintain an open phone line among all drill coordinators. This was useful for resolving questions as they arose during the drill progression. Additionally, a “blast call” was implemented — similar to what is available to the reliability coordinator in the real-time environment — to quickly connect all participants.
During the drills, reliability coordinators used the IPC communications consoles to set up and manage multiple conference calls.
As recommended in emergency procedures and good utility practice, technical staff from the IT department and telecommunications vendors are called out to control centers during emergencies to be available to maintain and repair critical equipment, such as SCADA and voice communications. For example, during the Midwest ISO drill, technical support personnel from IPC Information Systems LLC (New York, New York, U.S.) were present at the backup control center throughout the drill. They were available for user support and to field questions from the operators as they conducted the drill.
There were a total of 12 positions installed with IPC consoles: four reliability coordinators, two operations engineers, two spare and four in an adjacent scheduling room.
Prior to beginning the drill, coordinators set up a conference call to establish an open line for all participants for the duration of the drill. This conference call was established on a standard telephone with a single handset and speakerphone. The telephone was set to the speakerphone function, as a drill coordinator had to be on the call at all times.
The role of a drill coordinator is a stand-in for an external notification position that would be set up during a real event to communicate to parties not directly involved in electrical system operations, but needing information on the status of the emergency and system restoration.
Beyond the Conference Call
During an actual event, the positions at the communications console occupied by drill coordinators could be used to connect audio from a news or weather service to listen to broadcasts of current information and updates being released during an emergency situation. An applications module could be added to the console for providing video of these broadcasts to the desktop as well.
Another option would be an intercom module connected to the central switch on a LAN/WAN. This would enable the system operator to establish an intercom connection to give participants an open path of communication directly to the system operator without having to establish a call. The system operator could initiate the call, and the participants would receive a signal (splash tone) and could speak at any time into the hands-free open microphone.
Dedicated “hoot and holler” circuits can be used instead of a conference bridge. These circuits can be carried on the WAN, and a module or analog speaker/microphone at each site would prevent any inadvertent public access. This would be a higher reliability solution and, with use of the conference-bridge concept as a backup, would increase reliability through redundancy.
The effectiveness of communications consoles in establishing conference calls can be increased substantially by adopting blast dialing with add/drop features, which, when combined with mute, enables the system operator to control the conversations. This creates a command center for system operators, allowing them to manage conference calls and, thus, control the flow of restoration events. This is especially important when communicating under highly stressful conditions involving a large number of participants.
For management, corporate communications and external parties who require information but are not actively involved in emergency operations, a dial-in bridge for listen-only capability could be provided.
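To make the roles in such a conference concrete, here is a minimal Python sketch of an operator-controlled call with blast dialing, add/drop, mute and listen-only dial-in. It is a toy model under stated assumptions: the `Conference` class, its method names and the party names are hypothetical and do not represent any vendor's console software or API.

```python
class Conference:
    """Toy model of an operator-controlled conference call (hypothetical)."""

    def __init__(self, chair):
        self.chair = chair
        self.talkers = {chair}   # connected parties with an open microphone
        self.muted = set()       # connected parties the chair has muted
        self.listeners = set()   # dial-in parties with listen-only access

    def add(self, party):
        self.talkers.add(party)

    def blast_dial(self, parties):
        """Connect every listed party in a single action."""
        for party in parties:
            self.add(party)

    def drop(self, party):
        """Remove a party from the call, whatever its current state."""
        self.talkers.discard(party)
        self.muted.discard(party)
        self.listeners.discard(party)

    def mute(self, party):
        # The chair is never muted, so the operator keeps control of the call.
        if party in self.talkers and party != self.chair:
            self.talkers.remove(party)
            self.muted.add(party)

    def unmute(self, party):
        if party in self.muted:
            self.muted.remove(party)
            self.talkers.add(party)

    def join_listen_only(self, party):
        self.listeners.add(party)

    def can_speak(self, party):
        return party in self.talkers

    def can_hear(self, party):
        return (party in self.talkers or party in self.muted
                or party in self.listeners)


call = Conference(chair="system operator")
call.blast_dial(["plant A", "substation B", "neighboring RC"])
call.mute("plant A")                               # still hears, cannot speak
call.join_listen_only("corporate communications")  # status updates only
call.drop("substation B")                          # switching task complete
```

Keeping listen-only parties in a separate set from muted participants reflects the distinction drawn above: a muted participant can be given the floor again by the operator, while a dial-in listener never has a speaking path.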
Over very wide areas, such as the MISO footprint, with dozens of control rooms and hundreds of operators, there is a strong possibility of confusion during the initial stages of a drill and certainly at the outset of an actual emergency. This typically includes initial uncertainty over who should answer particular lines. Although this eventually resolves itself, even momentary hesitation can delay recovery efforts. Communications consoles should be programmed, and their programming reviewed with system operators, to ensure all parties understand the arrangements. Operator training to increase proficiency in the use of the communications consoles is strongly recommended, because it expedites the response during the critical first moments of an event.
Most system operators are quite familiar with the basic features of the communications consoles they use every day; however, these do not include extensive conference calling. Conferencing during drills and emergencies can be quite complex and involve parties not involved in everyday operations. System operators in one entity might want to listen to a conference call at another entity to determine system status and assess system conditions. If the system operators at one entity are participating in an internal conference call, access might not be available to the system operator at the other location. A solution would be to have conference bridges with dial-in ports for listen-only capability. In this way, system operators could listen in without actually participating in the conference and without any chance of sharing information that might violate procedures. The system operator at the other location could make the same facilities available for outsiders, such as media representatives and public officials, to listen for status updates, in the event of an actual public emergency.
There might be instances when reliability coordinators are on conference calls, and operating engineers are on calls retrieving information. This might occur during the early stages of emergencies, when gathering information is the primary activity. During these periods, calls can go through an extensive ring cycle before being answered. This might disturb those attempting to continue their work and contribute to frustration and anxiety over the possible loss of valuable information. It is important that communications consoles be programmed to forward unanswered calls to an answering point.
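The forwarding rule can be stated as a simple ring-count threshold. The Python sketch below is an assumption-laden illustration: the function `route_call`, the default of four rings and the destination labels are all hypothetical, and real consoles would be configured through their own administration tools.

```python
def route_call(rings_to_answer, max_rings=4):
    """Decide where a call ends up and how long the caller waited.

    rings_to_answer: rings before the operator picks up, or None if the
                     operator never answers (for example, tied up on a
                     conference call).
    max_rings:       hypothetical forwarding threshold set on the console.
    Returns a (destination, rings_elapsed) tuple.
    """
    if rings_to_answer is not None and rings_to_answer <= max_rings:
        return ("operator", rings_to_answer)
    # Unanswered within the threshold: forward instead of ringing on.
    return ("answering point", max_rings)


print(route_call(2))      # operator free, answers on the second ring
print(route_call(None))   # operator tied up, call is forwarded
```

A short threshold limits both the noise in the control room and the time a caller with important information waits before reaching someone.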
Conference Calling Enhancements
The deployment of LAN/WAN by utilities and independent system operators provides bandwidth that makes possible enhancements using IP communications. In addition to the advantages of VoIP, multiple conferencing models and multimedia — text, voice, video and data — become possibilities. Other advantages are presence-based services and event-based communications, the integration of voice mail and e-mail, integration with the Internet and fast service provisioning. These technologies are now available from commercial vendors, but they have not yet been fully introduced to control rooms. Evaluating these enhancements and the value added from IP communications is a worthwhile endeavor for telecommunications engineers working together with system operators at their own and at adjacent utilities.
In these times of heightened concern about the reliability of the electrical system, drills can help increase system operators' awareness of operational issues, as well as overall reliability and security concerns. Essential benefits also arise from the drill process itself. Designing the drills, conducting them and developing training programs based on their results provides important opportunities for personnel who do not interact on a daily basis to learn to cooperate effectively. Throughout the process, enhancing communications is a constant goal.
Reliability coordinators and system operators must work together planning and sharing information as part of a coordinated training effort to restore the bulk interconnected electric system to normal conditions. Drills refine the operational and communication skills of the participants, increasing the likelihood of a successful emergency response and faster system restoration should a real event occur.
The authors wish to acknowledge the contributions of Francis Esselman, transmission reliability project manager, American Transmission Co., and Paul Reber, consultant, who first articulated an approach to power system restoration drills using scenarios based on the lessons from the August 14th blackout. Under their leadership, the Emergency Response and System Restoration Working Group of MAIN conducted the first such drill in spring 2004.
Jim Reilly is an independent consultant, focusing on power system restoration planning, control center telecommunications systems, reliability analysis and emergency operations. Last year, he worked with both MISO and the TVA to organize and write power system restoration plans in accordance with the NERC Version 0 Reliability Standards following the blackout of Aug. 14. He initiated the MISO fall 2004 drill, involving MISO, the TVA, PJM and the IESO to validate the plans and practice communications. Reilly holds the BS degree from Georgetown University and the MBA degree from Columbia University. email@example.com
Kevin Sherd is a lead operations engineer at the Midwest ISO. He led the drill scenario development for the MISO 2004 fall drill and orchestrated the drill implementation for MISO participants. A registered professional engineer, Sherd holds the BSEE degree from Cedarville University and the MSEE degree from Wright State University. firstname.lastname@example.org