Operations & Maintenance
Chapter 12 — Ongoing operations, maintenance schedules, change management, and lifecycle management for OT/IT network segmentation systems
12.1 O&M Lifecycle Overview
Maintaining the security effectiveness of an OT/IT network segmentation system requires a structured, recurring operations and maintenance (O&M) program. Unlike IT systems, where security updates can often be applied rapidly, OT environments require carefully planned maintenance windows, extensive testing before changes, and a change management process that accounts for process safety implications. The lifecycle diagram below illustrates the complete O&M cycle, from daily monitoring tasks through annual security assessments and incident response procedures.
Figure 12.1: O&M Lifecycle — Complete operations and maintenance lifecycle for the OT/IT DMZ system. Six phases arranged in a clockwise circular diagram: Daily Monitoring (dashboard, alerts, log review), Weekly Tasks (firewall rule review, IDS tuning, backup verification), Monthly Tasks (vulnerability scan, patch assessment, access review, compliance check), Quarterly Tasks (penetration test, DR drill, configuration audit), Annual Tasks (full security assessment, architecture review, staff training, certification renewal), and Incident Response (detect, triage, contain, eradicate, recover, post-incident review). The OT/IT DMZ shield icon is at the center.
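The Incident Response phase of Figure 12.1 lists six steps in a fixed order. The sketch below assumes an incident tracker should refuse to skip steps, for example closing an incident without a post-incident review. Under that assumption, the phases can be modeled as an ordered enum with forward-only transitions. The `IRPhase` and `Incident` names are hypothetical and do not belong to any specific ticketing product.

```python
from enum import IntEnum


class IRPhase(IntEnum):
    """Incident response phases, in the order shown in Figure 12.1."""
    DETECT = 1
    TRIAGE = 2
    CONTAIN = 3
    ERADICATE = 4
    RECOVER = 5
    POST_INCIDENT_REVIEW = 6


class Incident:
    """Hypothetical incident record that only allows one-step forward transitions."""

    def __init__(self, incident_id: str) -> None:
        self.incident_id = incident_id
        self.phase = IRPhase.DETECT
        self.history = [IRPhase.DETECT]

    def advance(self, next_phase: IRPhase) -> None:
        # Reject skipped or backward transitions (e.g. DETECT -> RECOVER).
        if next_phase != self.phase + 1:
            raise ValueError(
                f"{self.incident_id}: cannot move from "
                f"{self.phase.name} to {next_phase.name}"
            )
        self.phase = next_phase
        self.history.append(next_phase)

    @property
    def closed(self) -> bool:
        # An incident counts as closed only once the review phase is reached.
        return self.phase == IRPhase.POST_INCIDENT_REVIEW


# Usage: walk an incident through every phase in order.
inc = Incident("IR-0001")
for phase in list(IRPhase)[1:]:
    inc.advance(phase)
print(inc.closed)
```

Real incidents sometimes loop back, for example returning to containment when eradication reveals further compromise. A production tracker would likely need an explicit, logged exception path instead of the strict ordering used here.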
12.2 Maintenance Schedule
The maintenance schedule defines the specific tasks, their frequency, responsible parties, and estimated effort for each maintenance activity. The schedule is designed to ensure that the security posture of the segmentation system is continuously maintained without requiring excessive operational overhead. Tasks are organized by frequency to enable efficient scheduling and resource planning.
| Frequency | Task | Responsible Party | Estimated Effort | Documentation Required |
|---|---|---|---|---|
| Daily | Review security dashboard and IDS alerts; triage and investigate anomalies | SOC / OT Security Analyst | 30–60 min | Daily alert log |
| Daily | Verify firewall HA status and system health (CPU, memory, disk) | OT Security Analyst | 15 min | Health check log |
| Daily | Review syslog for authentication failures and policy violations | OT Security Analyst | 15 min | Daily security log |
| Weekly | Review firewall rule hit counts; identify and clean up unused rules | OT Security Engineer | 1–2 hours | Rule review report |
| Weekly | Tune IDS alert thresholds; review and close false positive alerts | OT Security Engineer | 1–2 hours | IDS tuning log |
| Weekly | Verify backup integrity: test restore of firewall configuration backup | OT Security Engineer | 1 hour | Backup verification log |
| Monthly | Run OT-safe vulnerability scan; assess and prioritize findings | OT Security Engineer | 4–8 hours | Vulnerability scan report |
| Monthly | Assess available patches for all DMZ and OT systems; create patch plan | OT Security Engineer + OT Engineering | 2–4 hours | Patch assessment report |
| Monthly | Review and recertify all user access rights (least privilege review) | OT Security Engineer + Plant Manager | 2–4 hours | Access review sign-off |
| Monthly | Compliance check against IEC 62443 / NIST SP 800-82 control requirements | OT Security Engineer | 2–4 hours | Compliance check report |
| Quarterly | Conduct penetration test of IT-OT boundary; verify no new paths exist | External Security Firm | 1–2 days | Penetration test report |
| Quarterly | Conduct disaster recovery drill: simulate firewall failure, test failover and recovery | OT Security Team + Plant Operations | 4–8 hours | DR drill report |
| Quarterly | Full configuration audit: compare running config to approved baseline | OT Security Engineer | 4–8 hours | Configuration audit report |
| Annual | Full OT security assessment: architecture review, risk assessment update | External Security Firm + OT Security Team | 1–2 weeks | Annual security assessment report |
| Annual | Network architecture review: assess new threats, technology changes, business requirements | OT Security Architect | 2–3 days | Architecture review report |
| Annual | Staff security awareness training: OT security, incident response, social engineering | OT Security Team | 4–8 hours per staff member | Training completion records |
| Annual | System certification renewal: IEC 62443, NERC CIP, or applicable framework | OT Security Team + Compliance | Variable | Certification renewal documentation |
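Because the schedule is organized by frequency, the tasks that fall due on a given day can be computed directly. The minimal sketch below makes several assumptions, all hypothetical, that should be adjusted to the site's maintenance calendar:

- weekly tasks start on Mondays;
- monthly tasks start on the 1st of the month;
- quarterly tasks start on the 1st of January, April, July, and October;
- annual tasks start on 1 January.

The `Task` type, `is_due`, `tasks_due`, and the abbreviated `SCHEDULE` list are illustrative names, not part of any tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Task:
    frequency: str  # "daily", "weekly", "monthly", "quarterly" or "annual"
    name: str
    owner: str


def is_due(frequency: str, day: date) -> bool:
    """Return True if a task of this frequency starts on `day`.

    Assumed (hypothetical) anchors: weekly = Monday, monthly = 1st,
    quarterly = 1st of Jan/Apr/Jul/Oct, annual = 1 January.
    """
    if frequency == "daily":
        return True
    if frequency == "weekly":
        return day.weekday() == 0  # Monday
    if frequency == "monthly":
        return day.day == 1
    if frequency == "quarterly":
        return day.day == 1 and day.month in (1, 4, 7, 10)
    if frequency == "annual":
        return day.day == 1 and day.month == 1
    raise ValueError(f"unknown frequency: {frequency}")


def tasks_due(schedule: list[Task], day: date) -> list[Task]:
    """Filter the schedule down to the tasks that start on `day`."""
    return [t for t in schedule if is_due(t.frequency, day)]


# Abbreviated excerpt of the maintenance schedule table (one task per frequency).
SCHEDULE = [
    Task("daily", "Review security dashboard and IDS alerts", "SOC / OT Security Analyst"),
    Task("weekly", "Review firewall rule hit counts", "OT Security Engineer"),
    Task("monthly", "Run OT-safe vulnerability scan", "OT Security Engineer"),
    Task("quarterly", "Full configuration audit against approved baseline", "OT Security Engineer"),
    Task("annual", "Full OT security assessment", "External Security Firm + OT Security Team"),
]

for task in tasks_due(SCHEDULE, date(2024, 4, 1)):
    print(f"{task.frequency:>9}: {task.name} ({task.owner})")
```

Fixed calendar anchors keep the sketch simple. In practice, monthly and quarterly work that touches OT systems would be aligned with agreed maintenance windows rather than started on a fixed date.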
12.3 Change Management Process
Change management for OT/IT segmentation systems must follow a rigorous process that balances the need for timely security updates with the operational constraints of OT environments. All changes to firewall rules, switch configurations, DMZ services, or access controls must go through the formal change management process. Emergency changes (required to respond to an active security incident) follow an expedited process but must still be documented and reviewed within 24 hours of implementation.
| Change Type | Approval Required | Testing Required | Maintenance Window | Rollback Plan |
|---|---|---|---|---|
| Firewall Rule Addition (permit) | OT Security Engineer + Plant Manager | Lab test + staged deployment | Scheduled maintenance window | Rule deletion; revert to previous policy backup |
| Firewall Rule Modification | OT Security Engineer + CISO | Full regression test | Scheduled maintenance window | Restore previous policy backup |
| Firewall Firmware Upgrade | CISO + Plant Manager | Lab test on identical hardware | Planned outage window | Downgrade to previous firmware |
| Switch Configuration Change | OT Security Engineer + Network Engineer | Lab test + connectivity verification | Scheduled maintenance window | Restore previous configuration backup |
| DMZ Service Configuration | OT Security Engineer | Service functionality test | Scheduled maintenance window | Restore previous service configuration |
| Emergency Change (active incident) | CISO (verbal) + post-change written approval within 24h | Minimal; document risk acceptance | Immediate (no window required) | Revert within 24h if not confirmed effective |
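The approval column and the 24-hour rule for emergency changes can be checked mechanically against change records. The sketch below transcribes the approver roles from the table and flags missing approvals. It also flags emergency changes whose written approval is overdue or was recorded after the 24-hour deadline. The `ChangeRecord` structure, the change-type keys, and `compliance_issues` are hypothetical. The sketch also assumes that the CISO's verbal approval is captured as an approver entry at implementation time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Approver roles per change type, transcribed from the change management table.
REQUIRED_APPROVERS = {
    "firewall_rule_addition": {"OT Security Engineer", "Plant Manager"},
    "firewall_rule_modification": {"OT Security Engineer", "CISO"},
    "firewall_firmware_upgrade": {"CISO", "Plant Manager"},
    "switch_configuration_change": {"OT Security Engineer", "Network Engineer"},
    "dmz_service_configuration": {"OT Security Engineer"},
    "emergency_change": {"CISO"},  # verbal approval, recorded at implementation
}

EMERGENCY_REVIEW_DEADLINE = timedelta(hours=24)


@dataclass
class ChangeRecord:
    change_type: str
    approvers: set[str]
    implemented_at: datetime
    # Only meaningful for emergency changes: when written approval was recorded.
    written_approval_at: Optional[datetime] = None


def compliance_issues(change: ChangeRecord, now: datetime) -> list[str]:
    """Return human-readable problems; an empty list means the record is compliant."""
    required = REQUIRED_APPROVERS.get(change.change_type)
    if required is None:
        return [f"unknown change type: {change.change_type}"]
    issues = []
    missing = required - change.approvers
    if missing:
        issues.append(f"missing approval from: {', '.join(sorted(missing))}")
    if change.change_type == "emergency_change":
        deadline = change.implemented_at + EMERGENCY_REVIEW_DEADLINE
        if change.written_approval_at is None:
            # Only overdue once the 24h window has actually elapsed.
            if now > deadline:
                issues.append("written approval overdue (>24h after implementation)")
        elif change.written_approval_at > deadline:
            issues.append("written approval recorded after the 24h deadline")
    return issues


t0 = datetime(2024, 5, 1, 8, 0)
change = ChangeRecord("firewall_rule_modification", {"OT Security Engineer"}, t0)
print(compliance_issues(change, t0))
```

The check is deliberately advisory. It reports problems rather than blocking a change, so it fits the post-change review of emergency changes as well as pre-approval checks for scheduled ones.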
12.4 End-of-Life and Refresh Planning
OT/IT segmentation equipment has a defined operational lifecycle that must be proactively managed. Unlike OT field devices, which may operate for 20+ years, network security equipment (firewalls, managed switches, IDS sensors, PAM servers) typically has a vendor support lifecycle of 5–15 years, depending on equipment type. Refresh planning must account for the lead time required for procurement and testing. It must also account for the complexity of migrating configurations to new hardware without disrupting OT operations.
| Equipment Type | Typical Support Lifecycle | Refresh Trigger | Lead Time for Refresh | Migration Complexity |
|---|---|---|---|---|
| Industrial Firewall (OT-FW) | 7–10 years | End of vendor support; new threat capability gap | 6–12 months | High (full policy migration, testing) |
| Industrial Managed Switch | 10–15 years | End of vendor support; port density requirement change | 3–6 months | Medium (VLAN/STP configuration migration) |
| OT IDS Sensor | 5–7 years | End of vendor support; new OT protocol support needed | 3–6 months | Low (sensor replacement, baseline re-learning) |
| Bastion Host / PAM Server | 5–7 years | OS end of support; hardware failure | 3–6 months | Medium (user database migration, policy export) |
| Fiber Optic Cabling | 20–25 years | Physical damage; connector degradation (>3 dB loss) | 1–4 weeks | Low (cable replacement) |
| Copper Cabling (Cat6) | 15–20 years | Physical damage; test failure | 1–2 weeks | Low (cable replacement) |
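The lead-time column implies a simple back-calculation: the refresh project should start no later than the end-of-support date minus the lead time. The sketch below assumes the upper bound of each lead-time range, which is the conservative choice. Cabling is omitted because its refresh is triggered by damage or test failure rather than a vendor date. `LEAD_TIME_MONTHS`, `subtract_months`, and `latest_refresh_start` are hypothetical names. The end-of-support date would come from the vendor's published lifecycle notice.

```python
import calendar
from datetime import date

# Upper-bound lead times (months) from the "Lead Time for Refresh" column.
LEAD_TIME_MONTHS = {
    "Industrial Firewall (OT-FW)": 12,
    "Industrial Managed Switch": 6,
    "OT IDS Sensor": 6,
    "Bastion Host / PAM Server": 6,
}


def subtract_months(d: date, months: int) -> date:
    """Move `d` back by whole months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def latest_refresh_start(equipment: str, end_of_support: date) -> date:
    """Latest date to begin procurement so the replacement is in service by end of support."""
    return subtract_months(end_of_support, LEAD_TIME_MONTHS[equipment])


# Example: a switch whose vendor support ends on 31 March 2027.
print(latest_refresh_start("Industrial Managed Switch", date(2027, 3, 31)))
```

The day is clamped because month lengths differ. Six months before 31 March falls in September, which has only 30 days, so the result is 30 September rather than an invalid date.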