Design Methods
Chapter 2 — Executable design principles, failure cause analysis, isolation level decision logic, and key design dimensions
2.1 Executable Design Principles and Basis
Effective OT–IT isolation design is grounded in a set of executable principles that translate security intent into engineering decisions. These principles are not abstract guidelines — each one specifies when it applies, what the technical basis is, and how compliance is verified. The following twelve principles form the design foundation for every OT–IT boundary implementation covered by this guide.
| # | Principle | When Applied | Technical Basis | Verification Method |
|---|---|---|---|---|
| 1 | Business Continuity First — security must not become the top outage cause | Always in process control | Availability risk assessment, safety requirements | Uptime metrics; security-caused incident count |
| 2 | Minimum Interconnection — only connect what is required by a documented business need | Any OT–IT link request | Least privilege principle, audit findings | Flow matrix review; orphaned rule audit |
| 3 | Zoning & Conduits — segment by function/criticality; connect zones through controlled conduits | Any new line/station | IEC 62443 zoning concept | VLAN audit; traceroute between zones |
| 4 | Default Deny at Boundaries — start from deny-all, then whitelist precisely | Firewall/diode policies | Secure-by-default control | Negative test: blocked flows confirmed |
| 5 | Protocol Break and Content Inspection — use proxies/gateways to break direct sessions and scan content | File transfer, APIs, historian replication | Malware prevention practice | EICAR test; proxy log review |
| 6 | Strong Identity and Session Accountability — named accounts, MFA, session recording, JIT approvals | Remote O&M, engineering changes | PAM best practice | PAM report; recording sample audit |
| 7 | Offline/Staged Updates — patches and AV signatures pass through staging + verification before OT deployment | Limited windows and legacy systems | OT operational constraints | Patch manifest and hash validation |
| 8 | Deterministic Performance — enforce QoS; avoid deep inspection where it breaks latency constraints | Control loops sensitive to delay/jitter | Control engineering constraints | RTT/jitter measurement under load |
| 9 | Defense-in-Depth — firewall + hardening + monitoring + backup + response plan | All tiers | Layered security model | Control coverage matrix |
| 10 | Auditable Change Control — every ruleset change is ticketed, reviewed, tested, and reversible | Rule updates, new conduits | ISO-style operational controls | Change ticket completeness audit |
| 11 | Fail Secure, Not Fail Open — boundary failure should not expose OT to IT (unless safety requires otherwise and is explicitly approved) | HA design, bypass switches | Risk governance | Failover drill; bypass policy review |
| 12 | Separate Management Plane — manage firewalls/DMZ hosts from dedicated admin networks with strict access | Medium/large sites | Attack surface reduction | Mgmt VLAN isolation test |
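Principle 4 and its negative-test verification can be illustrated with a small model. The following is a minimal sketch, not a firewall configuration: the `Flow` record, the zone names, and the two whitelist entries are hypothetical examples standing in for a documented flow matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One directional flow between zones (hypothetical record layout)."""
    src_zone: str
    dst_zone: str
    protocol: str
    dst_port: int


# Hypothetical whitelist; in practice this would come from the flow matrix.
ALLOWED_FLOWS = frozenset({
    Flow("control", "dmz", "tcp", 4840),  # example: OPC UA towards a DMZ replica
    Flow("dmz", "it", "tcp", 443),        # example: HTTPS export from DMZ to IT
})


def is_permitted(flow, allowed=ALLOWED_FLOWS):
    """Default deny: a flow passes only if it exactly matches a whitelist entry."""
    return flow in allowed


def negative_test(expected_blocked, allowed=ALLOWED_FLOWS):
    """Return the flows that should be blocked but are actually permitted."""
    return [f for f in expected_blocked if is_permitted(f, allowed)]
```

An empty list from `negative_test` means every flow that was expected to be blocked is in fact denied. Any returned entry points to a whitelist rule that is broader than intended.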
2.2 Failure Causes and Recommendations
Many OT security incidents can be traced to a small set of recurring design or operational failures. Understanding these failure patterns and their mechanisms enables designers to build preventive controls into the architecture from the outset, rather than discovering vulnerabilities during incidents. The table below documents eight high-frequency failure groups with their mechanisms, recommended avoidance strategies, and operational checks.
| Common Failure Cause | Failure Mechanism | Recommendation (Avoidance) | Operational Check |
|---|---|---|---|
| Flat network "for convenience" | Any IT compromise reaches controllers directly | Enforce zones; no direct L3 route between IT and OT | Route table & traceroute audit |
| "Temporary" any-any rules | Rules remain forever, become permanent backdoors | Time-bound rules + auto-expiry enforcement | Weekly rules aging report |
| Direct VPN into OT | Credential theft gives full OT access | VPN terminates in DMZ only; bastion required | Verify VPN terminates in the DMZ and split tunneling is disabled |
| Shared vendor accounts | No attribution, password reuse across vendors | Named accounts + MFA + JIT approvals | PAM report and account review |
| No protocol understanding | Wrong DPI settings block process traffic | Pilot test in lab; staged rollout with monitoring | Pre-prod acceptance tests |
| Patch from Internet directly | Introduces malware or breaks system stability | Offline staging + checksum + rollback plan | Patch manifest & hash validation |
| Logs not time-synced | Correlation fails; delayed incident response | NTP/PTP relays; drift monitoring dashboard | Time drift dashboard alerts |
| Under-sized firewalls | Drops cause control disruptions during IT incidents | Size by CPS/sessions/latency with headroom | Load test and headroom policy enforcement |
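The weekly rules aging report recommended for "temporary" rules could be produced along the following lines. This is a sketch under stated assumptions: the `FirewallRule` fields are hypothetical, and a real report would read rules from the firewall's own export rather than from in-memory objects.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FirewallRule:
    """Hypothetical rule record; field names are illustrative only."""
    rule_id: str
    description: str
    temporary: bool
    expires_on: Optional[date] = None  # expected on every temporary rule


def aging_report(rules, today):
    """List temporary rules that have expired or never received an expiry date."""
    expired, missing_expiry = [], []
    for rule in rules:
        if not rule.temporary:
            continue  # permanent rules are reviewed through normal change control
        if rule.expires_on is None:
            missing_expiry.append(rule.rule_id)
        elif rule.expires_on < today:
            expired.append(rule.rule_id)
    return {"expired": expired, "missing_expiry": missing_expiry}
```

Both lists are candidates for removal or re-approval. A temporary rule with no expiry date is treated as a finding in its own right, because it cannot be expired automatically.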
2.3 Core Design / Selection Logic
The isolation level selection process follows a structured decision tree that maps the functional requirements of each OT–IT interconnect to the appropriate security mechanism. If no OT–IT communication is required, the zones remain physically isolated. Otherwise, the key driver is the directionality of the required data flow: one-way export requirements favor unidirectional gateways (data diodes), while bidirectional requirements must be evaluated to determine whether they can be safely brokered through a DMZ. Any requirement that cannot be safely brokered in a DMZ should trigger an application re-architecture review, or rejection of the request, rather than a direct OT–IT connection.
Figure 2.1: Decision Tree for Isolation Level Selection — from root question "Need OT-IT Communication?" through directionality and DMZ feasibility checks to the four outcome paths: Physical Isolation, One-Way Gateway, DMZ Controlled Interconnect, or Re-architect/Reject.
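The decision tree in Figure 2.1 can be written as a short function. This is a minimal sketch based only on the caption's three checks and four outcomes. The function and parameter names are assumptions, and a real assessment would also weigh integrity and operational needs before settling on an outcome.

```python
from enum import Enum


class IsolationLevel(Enum):
    PHYSICAL_ISOLATION = "Physical Isolation"
    ONE_WAY_GATEWAY = "One-Way Gateway"
    DMZ_INTERCONNECT = "DMZ Controlled Interconnect"
    REARCHITECT_OR_REJECT = "Re-architect/Reject"


def select_isolation_level(needs_communication, bidirectional, dmz_brokerable):
    """Walk the three checks of the decision tree in order."""
    if not needs_communication:
        return IsolationLevel.PHYSICAL_ISOLATION
    if not bidirectional:
        # One-way export: a unidirectional gateway (data diode) is favored.
        return IsolationLevel.ONE_WAY_GATEWAY
    if dmz_brokerable:
        return IsolationLevel.DMZ_INTERCONNECT
    # Bidirectional and not safely brokerable: never a direct OT-IT link.
    return IsolationLevel.REARCHITECT_OR_REJECT
```

For example, `select_isolation_level(True, True, False)` returns `IsolationLevel.REARCHITECT_OR_REJECT`, which reflects the rule that a flow that cannot be brokered in a DMZ is escalated rather than connected directly.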
Step-by-Step Design Sequence
The following nine-step sequence provides a structured methodology for designing an OT–IT isolation solution from initial assessment through operational readiness. Each step builds on the outputs of the previous step, creating a traceable design record that supports both acceptance testing and ongoing operations.
1. Build the asset inventory and classify assets by criticality (Safety, Production, Quality, Visibility).
2. Build the communication flow matrix: who talks to whom, protocol, ports, direction, frequency, latency.
3. Decide zoning: Field/Control/Core/DMZ/O&M/Data Uplink zones with VLAN/subnet assignments.
4. For each OT–IT flow, choose the isolation level based on directionality, integrity needs, and operational need.
5. Select devices (industrial firewall vs. diode vs. isolation gateway) and determine HA requirements.
6. Design whitelist rules and inspection points; define logging and alerting requirements.
7. Define the O&M model: bastion, MFA, JIT approvals, recording, file transfer controls.
8. Define the update model: offline staging, scanning, release ring, rollback procedures.
9. Define acceptance tests and O&M runbooks; establish change control procedures.
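For the update-model step in the sequence above, the patch manifest and hash validation check might look like the following sketch. SHA-256 and the manifest layout (file name mapped to expected hex digest) are assumptions; the guide does not prescribe a hash algorithm or a manifest format.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def validate_manifest(manifest, staged_files):
    """Compare staged files against a manifest before release to OT.

    manifest:     dict of file name -> expected SHA-256 hex digest (assumed format)
    staged_files: dict of file name -> file content as bytes
    """
    result = {"ok": [], "mismatch": [], "missing": [], "unlisted": []}
    for name, expected in manifest.items():
        if name not in staged_files:
            result["missing"].append(name)
        elif sha256_hex(staged_files[name]) == expected.lower():
            result["ok"].append(name)
        else:
            result["mismatch"].append(name)
    # Files present in staging but absent from the manifest are never trusted.
    result["unlisted"] = sorted(set(staged_files) - set(manifest))
    return result
```

Under this sketch a package would be released to the OT side only when `mismatch`, `missing`, and `unlisted` are all empty. Anything else stays in staging for investigation.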
2.4 Key Design Dimensions
A complete OT–IT isolation design must address eight key dimensions that span technical performance, operational sustainability, regulatory compliance, and lifecycle economics. Neglecting any dimension creates gaps that may not surface until acceptance testing or, worse, during an incident. The table below provides a structured overview of each dimension with its key considerations and typical metrics.
| Dimension | Key Considerations | Typical Metrics / Targets | Design Impact |
|---|---|---|---|
| Performance / Experience | Latency, jitter, throughput, CPS; inspection must not break control | RTT <5 ms added by FW; CPS headroom ≥4× | DPI placement, bypass for time-critical paths |
| Stability / Reliability | HA, link redundancy, failover behavior, deterministic operation | HA failover ≤30 s; no fail-open behavior | Active/standby FW pairs, dual uplinks, state sync |
| Maintain / Replace | Modular design; spare parts; version control; rollback | MTTR <4 h for boundary devices | Spare inventory, golden images, staged upgrades |
| Compatibility / Extension | Protocol support (Modbus/TCP, OPC UA, DNP3, IEC 60870-5-104), vendor interoperability | ≥6 OT protocols supported by DPI engine | Protocol testing in lab before production |
| LCC / TCO | Licensing, support, lifecycle, spares, training | 5-year TCO comparison across vendors | Avoid proprietary lock-in; require exportable configs |
| Energy & Environment | Power draw, heat, cabinet cooling, EMI tolerance | Industrial-grade: -40 °C to +70 °C, DIN rail mount | Cabinet thermal design, UPS sizing |
| Compliance / Certification | IEC 62443 component claims, security hardening guides | IEC 62443-4-2 SL2 component certification | Vendor certification documentation required |
| Operability | Logs, dashboards, runbooks, change workflow | Alert-to-ticket time <5 min; runbook coverage 100% | SIEM integration, change management tooling |
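Several of the targets in the table are numeric and can be checked mechanically during acceptance testing. The sketch below covers three of them: added RTT, CPS headroom, and HA failover time. The `BoundaryMeasurement` record is hypothetical, and computing headroom as rated CPS divided by observed peak CPS is an assumption about how the ≥4× target is meant to be read.

```python
from dataclasses import dataclass


@dataclass
class BoundaryMeasurement:
    """Hypothetical acceptance-test measurements for one boundary device."""
    added_rtt_ms: float  # extra round-trip time introduced by the firewall
    peak_cps: float      # highest observed connections per second
    rated_cps: float     # device capacity in connections per second
    failover_s: float    # measured HA failover time in seconds


def check_targets(m):
    """Evaluate measurements against the targets listed in the table."""
    if m.peak_cps <= 0:
        raise ValueError("peak_cps must be positive")
    headroom = m.rated_cps / m.peak_cps  # assumed definition of headroom
    return {
        "added_rtt_under_5ms": m.added_rtt_ms < 5.0,
        "cps_headroom_at_least_4x": headroom >= 4.0,
        "failover_within_30s": m.failover_s <= 30.0,
    }
```

A failed check does not by itself decide the design response. For instance, an RTT failure might lead to moving DPI off a time-critical path, while a headroom failure points to device sizing.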