Chapter 2: Design Methods — OT/IT Network Segmentation Design Guide

2.1 Executable Design Principles and Basis

Effective OT–IT isolation design is grounded in a set of executable principles that translate security intent into engineering decisions. These principles are not abstract guidelines: each one specifies when it applies, what its technical basis is, and how compliance is verified. The following twelve principles form the design foundation for every OT–IT boundary implementation covered by this guide.

#	Principle	When Applied	Technical Basis	Verification Method
1	Business Continuity First — security must not become the top outage cause	Always in process control	Availability risk assessment, safety requirements	Uptime metrics; security-caused incident count
2	Minimum Interconnection — only connect what is required by a documented business need	Any OT–IT link request	Least privilege principle, audit findings	Flow matrix review; orphaned rule audit
3	Zoning & Conduits — segment by function/criticality; connect zones through controlled conduits	Any new line/station	IEC 62443 zoning concept	VLAN audit; traceroute between zones
4	Default Deny at Boundaries — start from deny-all, then whitelist precisely	Firewall/diode policies	Secure-by-default control	Negative test: blocked flows confirmed
5	Protocol Break and Content Inspection — use proxies/gateways to break direct sessions and scan content	File transfer, APIs, historian replication	Malware prevention practice	EICAR test; proxy log review
6	Strong Identity and Session Accountability — named accounts, MFA, session recording, JIT approvals	Remote O&M, engineering changes	PAM best practice	PAM report; recording sample audit
7	Offline/Staged Updates — patches and AV signatures pass through staging + verification before OT deployment	Limited windows and legacy systems	OT operational constraints	Patch manifest and hash validation
8	Deterministic Performance — enforce QoS; avoid deep inspection where it breaks latency constraints	Control loops sensitive to delay/jitter	Control engineering constraints	RTT/jitter measurement under load
9	Defense-in-Depth — firewall + hardening + monitoring + backup + response plan	All tiers	Layered security model	Control coverage matrix
10	Auditable Change Control — every ruleset change is ticketed, reviewed, tested, and reversible	Rule updates, new conduits	ISO-style operational controls	Change ticket completeness audit
11	Fail Secure, Not Fail Open — boundary failure should not expose OT to IT (unless safety requires otherwise and is explicitly approved)	HA design, bypass switches	Risk governance	Failover drill; bypass policy review
12	Separate Management Plane — manage firewalls/DMZ hosts from dedicated admin networks with strict access	Medium/large sites	Attack surface reduction	Mgmt VLAN isolation test
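
Principle 4 and its verification method (a negative test confirming that blocked flows stay blocked) can be illustrated with a minimal sketch. The zone names, the `Flow` type, and the two allowlist entries are hypothetical and chosen only for illustration; a real boundary device expresses the same idea in its own policy language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One candidate flow across the boundary (hypothetical model)."""
    src_zone: str
    dst_zone: str
    protocol: str
    port: int


# Assumed allowlist: only flows with a documented business need.
ALLOWED = {
    Flow("control", "dmz", "tcp", 4840),  # OPC UA from control zone to a DMZ broker
    Flow("dmz", "it", "tcp", 443),        # HTTPS from the DMZ to an IT consumer
}


def is_permitted(flow: Flow) -> bool:
    """Default deny: a flow passes only if it exactly matches an allowlist entry."""
    return flow in ALLOWED


def negative_test(expected_blocked):
    """Return flows that should be blocked but are permitted (empty list = pass)."""
    return [f for f in expected_blocked if is_permitted(f)]
```

An empty result from `negative_test` corresponds to the "blocked flows confirmed" outcome in the table; any returned flow indicates a rule that is broader than intended.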

2.2 Failure Causes and Recommendations

Many OT security incidents can be traced to a small set of recurring design or operational failures. Understanding these failure patterns and their mechanisms enables designers to build preventive controls into the architecture from the outset, rather than discovering vulnerabilities during incidents. The table below documents eight high-frequency failure groups with their mechanisms, recommended avoidance strategies, and operational checks.

Common Failure Cause	Failure Mechanism	Recommendation (Avoidance)	Operational Check
Flat network "for convenience"	Any IT compromise reaches controllers directly	Enforce zones; no L3 route IT↔OT	Route table & traceroute audit
"Temporary" any-any rules	Rules remain forever, become permanent backdoors	Time-bound rules + auto-expiry enforcement	Weekly rules aging report
Direct VPN into OT	Credential theft gives full OT access	VPN terminates in DMZ only; bastion required	Verify VPN terminates in DMZ and split tunneling is disabled
Shared vendor accounts	No attribution, password reuse across vendors	Named accounts + MFA + JIT approvals	PAM report and account review
No protocol understanding	Wrong DPI settings block process traffic	Pilot test in lab; staged rollout with monitoring	Pre-prod acceptance tests
Patch from Internet directly	Introduces malware or breaks system stability	Offline staging + checksum + rollback plan	Patch manifest & hash validation
Logs not time-synced	Correlation fails; delayed incident response	NTP/PTP relays; drift monitoring dashboard	Time drift dashboard alerts
Under-sized firewalls	Drops cause control disruptions during IT incidents	Size by connections per second (CPS), concurrent sessions, and latency, with headroom	Load test and headroom policy enforcement
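
The weekly rules aging report recommended for "temporary" any-any rules can be sketched as follows. This is an illustration under stated assumptions: the `Rule` fields, the seven-day warning window, and the report keys are hypothetical, and a real report would read rules from the firewall's exported configuration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Rule:
    """Simplified firewall rule record (hypothetical fields)."""
    rule_id: str
    src: str
    dst: str
    service: str
    expires: Optional[date]  # None means no expiry was recorded


def aging_report(rules, today, warn_days=7):
    """Group rule IDs by aging status for a periodic review."""
    report = {"expired": [], "expiring_soon": [], "any_any_no_expiry": []}
    for r in rules:
        if r.expires is None:
            # An any-any rule without an expiry is the "permanent backdoor" pattern.
            if r.src == "any" and r.dst == "any":
                report["any_any_no_expiry"].append(r.rule_id)
        elif r.expires < today:
            report["expired"].append(r.rule_id)
        elif r.expires - today <= timedelta(days=warn_days):
            report["expiring_soon"].append(r.rule_id)
    return report
```

Rules listed under `expired` are candidates for automatic removal, while `any_any_no_expiry` entries need an owner and an end date before they are allowed to remain.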

2.3 Core Design / Selection Logic

The isolation level selection process follows a structured decision tree that maps the functional requirements of each OT–IT interconnect to the appropriate security mechanism. The key driver is the directionality of the required data flow: one-way export requirements favor unidirectional gateways (data diodes), while bidirectional requirements must be carefully evaluated to determine whether they can be safely brokered through a DMZ. Any requirement that cannot be safely brokered in a DMZ should trigger an application re-architecture review rather than a direct OT–IT connection.

Figure 2.1: Decision Tree for Isolation Level Selection — from root question "Need OT-IT Communication?" through directionality and DMZ feasibility checks to the four outcome paths: Physical Isolation, One-Way Gateway, DMZ Controlled Interconnect, or Re-architect/Reject.
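
The selection logic described above can be written as a small function. This is a sketch, not a transcription of Figure 2.1: the branch order is inferred from the text, and the function and parameter names are assumptions introduced for illustration.

```python
from enum import Enum


class IsolationLevel(Enum):
    PHYSICAL_ISOLATION = "Physical Isolation"
    ONE_WAY_GATEWAY = "One-Way Gateway"
    DMZ_INTERCONNECT = "DMZ Controlled Interconnect"
    REARCHITECT_OR_REJECT = "Re-architect/Reject"


def select_isolation_level(needs_communication: bool,
                           one_way_export_only: bool,
                           dmz_brokerable: bool) -> IsolationLevel:
    """Map one OT-IT interconnect requirement to an isolation outcome."""
    if not needs_communication:
        # No OT-IT communication required: keep the networks physically separate.
        return IsolationLevel.PHYSICAL_ISOLATION
    if one_way_export_only:
        # Data only leaves OT: a unidirectional gateway (data diode) fits.
        return IsolationLevel.ONE_WAY_GATEWAY
    if dmz_brokerable:
        # Bidirectional, but every session can be terminated in the DMZ.
        return IsolationLevel.DMZ_INTERCONNECT
    # Bidirectional and not safely brokerable: never connect directly.
    return IsolationLevel.REARCHITECT_OR_REJECT
```

For example, a bidirectional requirement that can be brokered, `select_isolation_level(True, False, True)`, yields `IsolationLevel.DMZ_INTERCONNECT`. Keeping the outcome as an enum makes the decision easy to record per flow in the design documentation.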

Step-by-Step Design Sequence

The following nine-step sequence provides a structured methodology for designing an OT–IT isolation solution from initial assessment through operational readiness. Each step builds on the outputs of the previous step, creating a traceable design record that supports both acceptance testing and ongoing operations.

1. Build asset inventory and classify by criticality (Safety, Production, Quality, Visibility).
2. Build communication flow matrix: who talks to whom, protocol, ports, direction, frequency, latency.
3. Decide zoning: Field/Control/Core/DMZ/O&M/Data Uplink zones with VLAN/subnet assignments.
4. For each OT–IT flow, choose isolation level using directionality, integrity needs, and operational need.
5. Select devices: industrial firewall vs. diode vs. isolation gateway; determine HA requirements.
6. Design whitelist rules and inspection points; define logging and alerting requirements.
7. Define O&M model: bastion, MFA, JIT approvals, recording, file transfer controls.
8. Define update model: offline staging, scanning, release ring, rollback procedures.
9. Define acceptance tests and O&M runbooks; establish change control procedures.
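
The link between the communication flow matrix and the whitelist rules in this sequence can be sketched as a simple transformation. The field names, direction labels, and rule tuple layout below are hypothetical; the point is only that every allow rule traces back to a flow matrix entry and that the ruleset ends in an explicit deny-all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowEntry:
    """One row of the communication flow matrix (hypothetical fields)."""
    source: str
    destination: str
    protocol: str
    port: int
    direction: str        # "one_way" or "bidirectional" (either side may initiate)
    frequency: str
    max_latency_ms: float


def build_whitelist(flow_matrix):
    """Derive ordered (action, src, dst, protocol, port) rules from the matrix."""
    rules = []
    for f in flow_matrix:
        rules.append(("allow", f.source, f.destination, f.protocol, f.port))
        if f.direction == "bidirectional":
            # Only needed when the destination may also initiate sessions.
            rules.append(("allow", f.destination, f.source, f.protocol, f.port))
    rules = list(dict.fromkeys(rules))  # drop duplicates, keep first-seen order
    rules.append(("deny", "any", "any", "any", "any"))  # default deny as the last rule
    return rules
```

Because the rules are generated rather than hand-written, a flow removed from the matrix disappears from the ruleset on the next build, which supports the orphaned rule audit described in Section 2.1.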

2.4 Key Design Dimensions

A complete OT–IT isolation design must address eight key dimensions that span technical performance, operational sustainability, regulatory compliance, and lifecycle economics. Neglecting any dimension creates gaps that may not surface until acceptance testing or, worse, during an incident. The table below provides a structured overview of each dimension with its key considerations, typical metrics or targets, and design impact.

Dimension	Key Considerations	Typical Metrics / Targets	Design Impact
Performance / Experience	Latency, jitter, throughput, CPS; inspection must not break control	RTT <5 ms added by FW; CPS headroom ≥4×	DPI placement, bypass for time-critical paths
Stability / Reliability	HA, link redundancy, failover behavior, deterministic operation	HA failover ≤30 s; no fail-open behavior	Active/standby FW pairs, dual uplinks, state sync
Maintain / Replace	Modular design; spare parts; version control; rollback	MTTR <4 h for boundary devices	Spare inventory, golden images, staged upgrades
Compatibility / Extension	Protocol support (Modbus/TCP, OPC UA, DNP3, IEC 104), vendor interoperability	≥6 OT protocols supported by DPI engine	Protocol testing in lab before production
LCC / TCO	Licensing, support, lifecycle, spares, training	5-year TCO comparison across vendors	Avoid proprietary lock-in; require exportable configs
Energy & Environment	Power draw, heat, cabinet cooling, EMI tolerance	Industrial-grade: -40 to +70°C, DIN rail mount	Cabinet thermal design, UPS sizing
Compliance / Certification	IEC 62443 component claims, security hardening guides	IEC 62443-4-2 SL2 component certification	Vendor certification documentation required
Operability	Logs, dashboards, runbooks, change workflow	Alert-to-ticket time <5 min; runbook coverage 100%	SIEM integration, change management tooling
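
The performance targets in the table above can be checked against load-test measurements with a short calculation. This sketch assumes that "CPS headroom ≥4×" means the rated connections per second divided by the measured peak, and that added RTT is the difference between measurements with and without the firewall in the path; the function and parameter names are hypothetical.

```python
def check_sizing(rated_cps, peak_cps, rtt_with_fw_ms, rtt_baseline_ms,
                 min_headroom=4.0, max_added_rtt_ms=5.0):
    """Compare measured values with the headroom and added-latency targets."""
    if peak_cps <= 0:
        raise ValueError("peak_cps must be positive")
    headroom = rated_cps / peak_cps               # assumed meaning of "headroom"
    added_rtt = rtt_with_fw_ms - rtt_baseline_ms  # latency attributed to the firewall
    return {
        "headroom": headroom,
        "headroom_ok": headroom >= min_headroom,
        "added_rtt_ms": added_rtt,
        "rtt_ok": added_rtt < max_added_rtt_ms,
    }
```

For example, a device rated at 20,000 CPS with a measured peak of 4,000 CPS has a headroom of 5×, which meets the ≥4× target. Measurements should be taken under load, as Principle 8 requires, since idle-path latency understates the effect of inspection.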

Design Reminder: The dimension most often neglected during initial design is operability. A technically sound architecture that lacks clear runbooks, alert thresholds, and change procedures will degrade over time as staff turn over and configuration drift accumulates. Invest in operational documentation as part of the design deliverable, not as an afterthought.