Editor's Note: For more on this topic, consider registering for T&D World's BLACK SKY HAZARDS & GRID RESILIENCE virtual event, taking place next week, Nov. 16-17.
The Texas power outage of February 2021 is already fading from memory, but observations made by experts at the time warrant a follow-up discussion. Experienced power grid engineers and utility managers both made statements to the effect that residents were lucky the entire Texas grid didn’t go down, because had it done so, restoration would likely have taken many weeks to complete. While that may be true, particularly considering the complex nature of restoration work and the many threats facing a large centralized grid, such statements absolutely warrant a few “why” questions.
The purpose of this piece is to start a discussion on the many hidden issues inherent in Black Starting and grid rebuilding, particularly since the winterization issues that came to light during the Texas incident occurred due to “reality therapy,” not from forethought or proactive action. The authors’ hope is that this article, and those that others will write, will bring to light opportunities for utilities to improve response times to major outages, and that those opportunities will be acted upon. Specifically, this article will explore the evolution of the grid into its current state and suggest several opportunities to both reduce risk exposure and improve restoration time.
By way of background, the power grid infrastructure has experienced many changes over the past 40 years. At a high level, some of the relevant changes to that landscape have included:
- Introduction of non-utility entities as co-generators, characterized by more efficient use of available capital and fuel resources, essentially using the same fuel to both generate electricity and perform operations integral to their core business mission.
- Mixing utility-owned and common carrier-owned telecom systems at the point of core power generation, transmission, and balancing operations. Additionally, the balancing network’s telecom infrastructure has evolved from slower analog and Time Division Multiplex communications circuits to high-speed computer-based Internet Protocol (IP).
- Shifting the role of utilities from complete responsibility to generate, transmit, and serve customer load, to owner/operators of public resources under the control of others, such as Balancing Authorities (BAs), Reliability Coordinators (RCs), and Independent System Operators (ISOs). In this scenario, much of the utilities’ historical power generation has effectively been spun off to other entities. As a result, many if not most utilities no longer own enough internal generation capacity to meet their load, but rather depend on third-party generation owners (who may not necessarily share the same moral imperative to serve customer load) to provide resource adequacy.
- Creation of regional Independent System Operators (ISOs) that keep generation and load in balance and provide a market-based vehicle to replace the formerly utility-based reliability function. The ISOs, through the market approach, are assumed to provide a more financially equitable and open forum, allowing many different entities the opportunity to participate in the wholesale electricity market. Additionally, many ISOs also provide the BA and RC functions as defined under North American Electric Reliability Corporation (NERC) Reliability Standards approved by the Federal Energy Regulatory Commission (FERC).
- The integration of unprecedented amounts of variable and sometimes intermittent solar and wind energy. This “green” energy is often purchased and coordinated over long distances, increasingly outside the geography of local utilities’ and Balancing Authorities’ direct control. This approach improves utilities’ ability to meet clean air, water, and greenhouse gas standards in accordance with evolving climate change requirements. However, considering the sheer complexity of integrating an aggressive renewable resource buildout, one needs to consider the potentially negative implications for the rapid ramping of conventional generation necessary for proper grid security, particularly when these resources are used during grid restoration.
- Introduction of national reliability standards developed by NERC and approved by the United States government’s regulatory body, FERC. These standards are promulgated to generation and transmission operators and establish a minimum level of reliability performance and training for their organizations’ personnel and related equipment. This includes, but is not limited to, aspects of the control systems, telecommunications, and protective relaying used to operate the grid. The regional reliability entities within the three major U.S. interconnections (WECC in the West, Texas RE in Texas, and MRO, SERC, NPCC, and RF in the East) audit their local grid participants against these standards every three years.
- Introduction of increasingly complex, computer-based grid monitoring and protection systems, otherwise known as Synchrophasor Networks (SN), Remedial Action Schemes (RAS), and Advanced Network Real-Time State Estimators (RTSE), that monitor grid conditions over a wide area and either invoke significant automatic action or underpin critical, sensitive human decision making, potentially interrupting or redirecting electrical flows across a wide area. These systems, due to their computer-based control and often IP-dependent communications, are subject to similar security concerns as other Supervisory Control and Data Acquisition (SCADA) type networks.
At this point it is important to stop and appreciate the marvelous complex machine that the many stakeholders within the grid have built, and the extent to which they have created a system that runs both reliably and economically. In doing so, we need to acknowledge the hard work of NERC, WECC, local ISOs (such as CAISO in California), Investor-Owned Utilities (IOUs), municipal and government-owned Public Power Utilities (PPUs), Independent Generators, and the Telecommunications Companies, which universally maintain good operating procedures and whose personnel are exceedingly good at the work they perform.
This marvelous complex machine we call the national power grid is the net result of generations of successful modernization efforts. Recall that we once had:
Small, local power generation plants, located within towns and cities, owned and operated by the local utility, and equipped with dedicated in-house provided, custom-leased, or owned telecom circuits, in which personnel relied on voice and/or written (paper and pencil) communication and control procedures for local balancing or load sharing.
This original model evolved into large IOU- and PPU-operated generation, transmission, and distribution, controlled via utility-owned and operated telecom voice and SCADA systems, with balancing control centers at the utility’s dispatch center.
Finally, we arrived at today’s environment, where different entities handle generation, transmission, and balancing activities, where ISO-based BAs and RCs often cover ever larger areas than the IOUs once covered, and where much of the telecom glue that ties voice and SCADA together is now provided by Telecom Common Carriers using high-speed IP-based transport instead of proprietary and individually owned private circuits. Routing of these common carrier transports is now beyond the direct control of the utility operator providing and using these signals. In addition, routing may now traverse many hundreds of miles beyond the local utility’s area of operation.
In light of the grid evolution described, the control points for grid balancing as well as general transmission control have shifted steadily away from the local utility level to increasingly centralized centers of control that are geographically removed from local transmission assets, generation, and load. This shift applies both to the flow of people-to-people communication and to how the SCADA is wired. Currently, generation and transmission operations are singularly focused on the wider-area BAs providing the balancing function, and on the RCs for transmission operations. If the grid were to seriously sever itself into islands, or worse yet black out completely, the plan would be to execute one of the NERC-approved strategies designed to be implemented from centralized control points, assuming existing telecom and computer/SCADA resources remained on-line. Failing that, the protocol would shift to attempting to start and balance small local pods of generation and load, followed by synchronizing the small pieces back into the larger grid.
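As a rough illustration of what “starting and balancing a small local pod” involves, the following Python sketch energizes load blocks one at a time while keeping each pickup small relative to the generation already on-line. It is a toy model under stated assumptions, not an operating procedure: the `Island` class, the `pick_up_load` function, and the 5% step limit and 20% reserve figures are all hypothetical values chosen for illustration and do not come from any NERC standard or utility restoration plan.

```python
from dataclasses import dataclass


@dataclass
class Island:
    """A small pod of generation and load (hypothetical model)."""
    online_capacity_mw: float  # generation currently synchronized in the island
    load_mw: float = 0.0       # load already energized


def pick_up_load(island, blocks_mw, max_step_fraction=0.05, reserve_fraction=0.20):
    """Energize load blocks in the order given, skipping any block that is
    too large for the island to absorb.

    Both fractions are illustrative assumptions:
    - max_step_fraction: largest single pickup as a share of online capacity
    - reserve_fraction: share of online capacity held back as headroom
    """
    energized = []
    for block in blocks_mw:
        step_limit = max_step_fraction * island.online_capacity_mw
        headroom = (1.0 - reserve_fraction) * island.online_capacity_mw - island.load_mw
        if block <= step_limit and block <= headroom:
            island.load_mw += block
            energized.append(block)
    return energized
```

With 100 MW on-line, for example, `pick_up_load(Island(100.0), [4.0, 6.0, 3.0, 2.0])` energizes the 4, 3, and 2 MW blocks and skips the 6 MW block because it exceeds the assumed 5 MW step limit. Even in this simplified form, every decision depends on someone locally knowing both the on-line generation and the size of each load block, which is exactly the visibility the local level may have lost.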
Obviously, this latter process is inherently messy, since utilities have in many cases markedly reduced the scale and responsibility of local control centers in terms of support communications for SCADA as well as basic power control. IOUs could quickly identify local control points, but would then realize the balancing SCADA is no longer provisioned to support a local balancing function, forcing both the utility and its third-party generation partners to communicate manually.
The first pinch point in this regard would be a potentially jammed or inoperative public telephone system and/or a congested cell phone network, both wholly inadequate for coordination with multiple independent generators. A small system is fragile during the early stages of restoration, and grid IP-based messaging networks may be unreliable. People-to-people coordination therefore still needs to be locally based and one-to-many, like soldier radio comms or a conference call, because dynamic circumstances frequently demand that action be taken in several places at once and that information flow across multiple groups simultaneously, rather than through an inflexible node-to-node approach. Something as simple as a common two-way radio channel would be extremely helpful in this regard.
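The scaling argument behind one-to-many coordination can be made concrete with a little arithmetic. The sketch below assumes that, without a shared channel, every pair of local participants would need its own point-to-point call to stay coordinated. The function names are hypothetical and the model ignores relaying through intermediaries.

```python
def pairwise_links(parties: int) -> int:
    """Distinct two-party calls needed for every party to reach every other."""
    if parties < 0:
        raise ValueError("parties must be non-negative")
    return parties * (parties - 1) // 2


def shared_channels(parties: int) -> int:
    """A single common channel (radio or conference call) serves any group of two or more."""
    return 1 if parties > 1 else 0
```

For ten local participants (say, a utility dispatch desk plus nine independent generators and substations), `pairwise_links(10)` gives 45 separate conversations to keep in step, against one shared channel on which everyone hears the same instruction at the same time.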
Every restoration situation is inevitably complicated from the people side of the equation. In today’s world, specialized functions are siloed (split between different organizations such as the ISO, BA, and RC, as well as third-party generators). In addition, key personnel in the same local area typically do not work or train together on the issues of balancing load and generation. That is because on any normal day, balancing is handled in a centralized manner between each respective entity and the BA, not entity-to-entity within a local area. In the case of a prolonged grid outage (where the linkages needed to give the ISO the SCADA visibility to orchestrate system restoration fail), local grid participants would need to discover each other, find an effective way to communicate, and work in concert to restart small islands of the grid, and ultimately help synchronize these small islands back into the “big grid”. The overarching question, of course, is: how long will this process take?
Like the old wives’ tale about the frog being cooked in a pot of water on the stove, everything is fine until it’s too late. We have allowed ourselves to evolve to a place where we can only effectively operate on a highly complex, wide-area scale, and have potentially lost the ability to function holistically on a local level in the event of a crisis.
Improving the following two pinch points could shorten cold grid restoration time considerably, at least in terms of restoring some local islands to service:
- The balancing SCADA is no longer provisioned to support a local balancing function, forcing both the utility and its third-party generation partners to communicate manually.
- In a prolonged grid outage (where the linkages needed to give the ISO the SCADA visibility to orchestrate system restoration fail), local grid participants would need to discover each other, find an effective way to communicate, and work in concert to restart small islands of the grid, and ultimately help synchronize these small islands back into the “big grid”.
This discussion has focused on communication, SCADA, and interoperation at the local level. Equally important, and outside this discussion, are communication, SCADA, and interoperation from the local level to the ISO level, and between ISOs and other higher authorities. Organizations like the Electric Power Research Institute (EPRI) have started to look intensely at the higher ISO-to-ISO level, but as yet have given inadequate thought to local-level control and communication issues, perhaps assuming the utilities have that contingency well covered (as they did in the past when that whole space was “their” responsibility).
This is not a recommendation to undo the “think big” approach that the grid operates within today. Rather, it is a recommendation that we pre-plan and provision separate telecom voice and SCADA networks and operation centers, purposely isolated from common carrier and other non-grid-control IP-based facilities. These would facilitate secure local startup and balancing of the many smaller islands of generation, transmission, and load, which is necessary to vastly improve restoration response should the wide-area plan fall apart as a result of a Black Sky event.
Think about the cyber security and other telecom codependency risks of today’s centralized wide area IP network that underpins the wide area control and balancing that is today’s norm. Commonly trusted network monitoring software that many organizations use, if compromised, could potentially put even the most secure private IP network at risk. Having completely separate/isolated voice and SCADA network support would better enable each small area of the grid to restart simultaneously, saving precious time, and improving the odds of “getting restoration going.”
The people side of this work centers on getting key local personnel together for training and planning on restoring their small area of the grid in concert. The reason for this should be obvious. We need to provide everyone who has a critical local role with the necessary background knowledge and situational awareness such that, in case of a total grid collapse, they fully understand their role in the local plan to restore their local area, to synchronize with neighboring local islands, and ultimately to transition operation back to the ISOs.
The time is ripe for the industry to embrace necessary reform measures before we inevitably face a time-critical grid restoration resulting from a black sky event. The time to build in the infrastructure and people capability needed to respond more effectively to a black sky event is before it happens. The next big question is how we break up the substation loads to set the stage for restart.