The utilities industry is undergoing a significant digital transformation that will change the way utilities manage operations but will also introduce new vulnerabilities through an expanded attack surface for adversaries to exploit. Over the last 10 years, the deployment of digital assets into field operations has grown rapidly. Myriad external factors, including climate change, societal pressures, advances in technology, battery storage and electric vehicles, are contributing to a shift to a more distributed, digitally enabled electric grid. Gas utilities are facing pressure from the electrification of “everything,” while water utilities are facing water quality and water availability challenges. Internal factors are also playing a significant role: operational efficiencies and cost savings offer significant benefits to a utility struggling to keep its cost of operations under control. While we have seen significant advances over the last decade, the truth is we are still in the very early stages of this transformation. With artificial intelligence, machine learning, blockchain, 3D printing, and virtual and augmented reality, we have barely scratched the surface. This underscores why we must take a pragmatic and risk-based approach to make sure the transition is done securely, and we need to start now.
Operational technology (OT) is defined as the hardware, software and networks dedicated to causing changes in physical processes through control of physical devices — or detecting changes in physical processes through monitoring physical devices. OT has existed in utility operations for decades. The most common are the industrial control systems (ICSs) that automate many activities in generating and delivering electricity and natural gas to customers. Such automation may have many hundreds or thousands of specific devices connected to the system, orchestrated together by Supervisory Control and Data Acquisition (SCADA) software and operated by code (called “ladder logic”) developed specifically to cause certain actions to occur upon the event of a specific trigger. Historically, these systems were not networked but mechanical, and those with digital controls used closed proprietary protocols. This had the same effect as a physical segmentation and provided security against internet or business network-based attacks and unauthorized changes. As a result, little, if any, attention was given to security.
Now take a moment to think about the nature of the assets being digitized. The devices are generally very rugged and purpose-built to last for many years in hostile environments that include heat, dust and vibration, not requiring regular, time-sensitive software updates/patches like systems on a business network. Their lifecycle can often be measured in decades. Each device and sensor could be made or manufactured by any of hundreds of vendors. The devices are physically dispersed over large geographic spaces – physically protected from tampering where possible. Their environment bears little resemblance to the IT environment — until now.
More recently, advances in wireless connectivity have improved the ability to remotely control and monitor physical devices. Advances in serial communications protocols allowed serial point-to-point communications to become more fault tolerant so that the signal can be sent via telephone lines, radio signals and even over the much less expensive Transmission Control Protocol (TCP).
A direct consequence of these advances is that the hardware, software and networks that were designed and deployed on a closed (segmented) network — inherently secure — are now operating in a connected network. As the number of connections increases, the attack surface expands, and we now have a complicated cybersecurity risk management challenge.
In operational technology environments, security processes should first focus on the control and availability of the OT and its communications media. Because the processes that the devices control are dependent upon precise sensor readings, the integrity of the information shared between the devices and the process computers is also critically important. The confidentiality of this information is generally less critical (at least while the device remains in the possession of the utility). This prioritization affects both the defense mechanisms available and the speed of remediation efforts. For example, traditional network and application security technologies can add latency to critical real-time communications, which could affect the reliability of operations. Defending OT from cyber-attacks requires a different set of capabilities than those used to protect IT.
Vendors complicate this further. Often, technology manufacturers will not certify security technologies to work with their OT product lines due to the potential for disruption, and applying unapproved security products without vendor certification can result in voided warranties and violations of service contracts. In some cases, particularly in generation environments, the plant control processes are not only designed by the ICS manufacturer but also managed completely by the vendor. This introduces challenges in adhering not only to standard security controls, such as remote access and removable media restrictions, but even to personnel controls, such as background screening procedures.
As the cyber attack surface increases, so do the potential threat actors — such as a nation-state or terrorist motivated by a desire to inflict physical damage, a hacktivist motivated by unacceptable environmental practices, a disgruntled employee or contractor, ransomware gangs looking for a financial payout, a group of people undertaking a synchronized attack, or automated attacks carried out by machines. As each weapon is used, it has the potential of being packaged as an attack kit on the dark web and becoming available to a much broader set of less sophisticated actors. In July 2021, Gartner predicted that by 2025, cyber attackers will have weaponized OT environments to successfully harm or kill humans.
One of the more pertinent challenges today is the availability of skill sets to defend against sophisticated threat actors who are rapidly increasing their skill levels and the frequency of their attacks. According to the Department of Energy, History of ICS Cyber Incidents (December 2018), “Many of the threat actors targeting ICSs have advanced skills and knowledge. The defenders of these systems need to have equally advanced skills and knowledge.” Attracting and retaining such skills that are limited in availability and high in demand presents an enormous challenge for utilities.
So, where does this leave us? First and foremost, we need to understand the nature of the risk. This is a business risk to the reliability of service and potentially to the safety of both employees and the public. This clarification is important as one starts to consider the accountability model and roles and responsibilities.
Secondly, regulatory compliance may not be sufficient to manage the business risk. Regulations, such as NERC CIP, are designed to protect the bulk electric system from a cascading regional blackout — they are not designed to protect an individual utility from a catastrophic event within its territory. Compliance is important, and it is mitigating some of the business risk — but is it mitigating the business risk to an acceptable level? Likely not.
Each utility needs to have an asset management program that provides visibility of the assets. The utility must analyze the consequence of a security event in different parts of its operations, the vulnerability of that part of the operations and the likelihood of a threat actor exploiting that vulnerability. Understanding the risk prioritization and exposure can then inform the level of actions that need to be considered. It also shifts resources to focus more on what matters most. Defense-in-depth control layers can be analyzed while making sure sufficient effort is allocated to detection and response — recognizing that a persistent, sophisticated actor may eventually navigate through the preventative controls. One can start to analyze the cost of actions against the level of risk reduction and find that “Goldilocks” balance. Memorializing the control expectations in risk-based policies and standards confirms communication of both the control expectations and the risk being accepted.
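As a rough illustration of the prioritization described above, the following minimal Python sketch scores each part of the operation by multiplying consequence, vulnerability and likelihood, then ranks the results. The 1-to-5 scales, the multiplicative formula, the area names and the example ratings are all assumptions made for illustration; a real program would substitute its own methodology and data.

```python
from dataclasses import dataclass


@dataclass
class OperationalArea:
    """One part of the utility's operations, rated on hypothetical 1-5 scales."""
    name: str
    consequence: int    # 1 = minor impact, 5 = catastrophic (assumed scale)
    vulnerability: int  # 1 = well hardened, 5 = highly exposed (assumed scale)
    likelihood: int     # 1 = rare, 5 = expected (assumed scale)

    @property
    def risk_score(self) -> int:
        # Simple multiplicative score; an illustrative choice, not a standard.
        return self.consequence * self.vulnerability * self.likelihood


def prioritize(areas):
    """Return the areas ordered from highest to lowest risk score."""
    return sorted(areas, key=lambda a: a.risk_score, reverse=True)


# Hypothetical areas and ratings, for illustration only.
areas = [
    OperationalArea("Smart meter network", consequence=2, vulnerability=4, likelihood=4),
    OperationalArea("Substation SCADA", consequence=5, vulnerability=3, likelihood=3),
    OperationalArea("Plant control system", consequence=5, vulnerability=2, likelihood=2),
]

ranked = prioritize(areas)
for area in ranked:
    print(f"{area.name}: {area.risk_score}")
```

A ranking like this is only a starting point for the conversation about where controls and detection effort should go; the ratings themselves should come from the people who understand the consequences, the vulnerabilities and the threat landscape.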
Finally, collaboration. No single person or function has all the information to make an informed decision. The business risk of reliability of service and safety is owned by Operations. Operations is best positioned to articulate the consequences of different levels of security disruption. In many cases, Operations will be responsible for performing control activities and will serve as the first line of defense. Operations also owns the budget, either directly or indirectly, and is also mitigating other causes of service interruption. IT plays an important role: the IT network remains one of the most exploited vectors for OT cyber events, and IT is often responsible for managing some of the IT equipment on the OT networks. Security is best positioned to assess the vulnerability and the likelihood of a threat actor exploiting it. Security is also best positioned to evaluate different mitigation strategies and to make sure the right defense in depth is being considered.
Historically, an autocratic model of security telling the business what to do was an effective way to rapidly put in place basic hygiene. That model was effective for that time and that need. What got us to where we are today won’t be successful in getting us to where we need to be tomorrow. We need to get smarter about where we are spending our limited dollars. We need to engage the business more, explain the threats and our vulnerabilities, have the business explain the ramifications to its operations and get a consensus on how to best spend the limited funds. This approach is far more collaborative and positioned around business risk. It puts the ownership of security risk on all, and the security function becomes a business partner. Being valued as a business partner and brought to the table early in digital programs can help such programs be delivered securely. Getting this operating model right is not only important for the security function but also critical for the utility.
Matt Chambers is EY’s Global and Americas Power & Utilities Risk Leader. The views reflected in this article are the views of the author and do not necessarily reflect the views of Ernst & Young LLP or other members of the global EY organization.