IBP Rank 6 Network Infrastructure Documentation

ROTKO NETWORKS OÜ (AS142108)

Executive Summary

ROTKO NETWORKS operates a highly redundant, carrier-grade network infrastructure in Bangkok, Thailand, designed to meet and exceed IBP Rank 6 requirements. Our infrastructure is fully redundant at every layer, from multiple 10G/100G transit connections to redundant route reflectors, hypervisors, and storage systems.

Network Architecture Overview

Core Routing Infrastructure

Our network utilizes a hierarchical BGP architecture with dedicated route reflectors providing redundancy and scalability:

Route Reflectors (Core Routers)

  • BKK00 (CCR2216-1G-12XS-2XQ) - Primary Route Reflector

    • Router ID: 10.155.255.4
    • 100G connections to edge routers
    • Multiple 10G transit/IX connections
  • BKK20 (CCR2216-1G-12XS-2XQ) - Secondary Route Reflector

    • Router ID: 10.155.255.2
    • 100G connections to edge routers
    • Diverse 10G transit/IX connections

The two route reflectors peer with each other over iBGP, reflect routes between their edge-router clients, and serve as aggregation points for external BGP sessions. A minimal configuration sketch follows.
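
The sketch below shows the route-reflector side of these sessions in RouterOS v7 syntax. The template name, connection names, and the 10.155.255.6 client address are illustrative placeholders, not values from the production configuration:

/routing bgp template add name=ibgp as=142108 router-id=10.155.255.4 routing-table=main
# plain iBGP session to the other route reflector
/routing bgp connection add name=ibgp-bkk20 template=ibgp remote.address=10.155.255.2 local.role=ibgp
# session to an edge router; local.role=ibgp-rr makes this router the reflector
/routing bgp connection add name=rr-bkk06 template=ibgp remote.address=10.155.255.6 local.role=ibgp-rr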

Layer 2 Switching Infrastructure

Core Switches

  • BKK30 (CRS504-4XQ) - 4x100G QSFP28 switch
  • BKK40 (CRS504-4XQ) - 4x100G QSFP28 switch
  • BKK10 (CCR2216-1G-12XS-4XQ) - 12x 25G SFP28 + 1x 1G (fallback for the 100G switches)
  • BKK60 (CRS354-48G-4S+2Q+) - 48x1G + 4xSFP+ + 2xQSFP+ switch (reserved)
  • BKK50 (CCR2004-16G-2S+) - Management network

These switches provide Q-in-Q (802.1ad) VLAN tagging for service isolation and traffic segregation; a configuration sketch follows.
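
A minimal RouterOS sketch of such a Q-in-Q setup, with placeholder port names and VLAN IDs rather than the production values:

# S-tagged bridge: outer tag uses the 802.1ad ethertype
/interface bridge add name=br-core vlan-filtering=yes ether-type=0x88a8
# customer-facing port: push/pop outer tag 2100 regardless of inner tags
/interface bridge port add bridge=br-core interface=sfp28-1 pvid=2100 tag-stacking=yes
# trunk port carries the S-VLAN tagged toward the core
/interface bridge vlan add bridge=br-core vlan-ids=2100 tagged=qsfp28-1 untagged=sfp28-1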

Multi-Layer Redundancy Architecture

1. Physical Layer Redundancy

Diverse Fiber Paths

  • 3x Independent 10G fiber uplinks from different carriers
  • Physically diverse cable routes to prevent simultaneous cuts
  • Multiple cross-connects within the data center
  • Dual power feeds to all critical equipment

Hardware Redundancy

  • Dual PSU on all servers and network equipment
  • Hot-swappable components (fans, PSUs, drives)
  • N+1 cooling in server chassis
  • Spare equipment pre-racked for rapid replacement

2. Network Layer Redundancy

Transit Diversity

Our multi-homed architecture eliminates single points of failure:

Primary Transit - HGC (AS9304)

Hong Kong Path:
├── Active: BKK00 → VLAN 2519 → 400M (burst 800M)
└── Backup: BKK20 → VLAN 2517 → 800M standby

Singapore Path:
├── Active: BKK20 → VLAN 2520 → 400M (burst 800M)
└── Backup: BKK00 → VLAN 2518 → 800M standby
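
The active/standby split is enforced with BGP attributes. A sketch of the idea using RouterOS v7 routing filters, where the chain and connection names are invented for illustration: routes learned on the active session get a higher LOCAL_PREF, so the standby session carries traffic only after a failure.

# prefer routes learned on the active session
/routing filter rule add chain=hgc-hk-active-in rule="set bgp-local-pref 200; accept"
# de-prefer the standby session
/routing filter rule add chain=hgc-hk-standby-in rule="set bgp-local-pref 100; accept"
/routing bgp connection set hgc-hk-active input.filter=hgc-hk-active-in
/routing bgp connection set hgc-hk-standby input.filter=hgc-hk-standby-in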

Internet Exchange Diversity

Local (Thailand):
├── BKNIX: 10G direct peering (200+ networks)
└── AMS-IX Bangkok: 1G peering (100+ networks)

Regional:
├── AMS-IX Hong Kong: 200M connection
└── HGC IPTx: Dual paths to HK/SG

Global:
└── AMS-IX Europe: 100M connection

BGP Redundancy Features

  • Dual Route Reflectors: Automatic failover with iBGP
  • BFD Detection: 100ms failure detection
  • Graceful Restart: Maintains forwarding during control plane restart
  • Path Diversity: Multiple valid routes to every destination
  • ECMP Load Balancing: Traffic distributed across equal-cost paths
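
As an illustration, BFD timers meeting the 100ms detection target could look like the following in RouterOS v7 (detection time ≈ receive interval × multiplier; the connection name is a placeholder, and BFD binding on BGP connections assumes a recent 7.x release):

/routing bfd configuration add interfaces=all min-rx=50ms min-tx=50ms multiplier=2
/routing bgp connection set hgc-hk-active use-bfd=yes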

3. Hypervisor Layer Redundancy

Proxmox Cluster Features

High Availability Configuration:
├── Live Migration: Zero-downtime VM movement
├── HA Manager: Automatic VM restart on node failure
├── Shared Storage: Distributed access to VM data
└── Fencing: Ensures failed nodes are isolated
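
A minimal sketch of this setup with Proxmox's ha-manager CLI; the group name and VM ID are illustrative:

# define an HA group spanning the three hypervisors, preferring BKK06
ha-manager groupadd bkk-ha --nodes "bkk06:2,bkk07:1,bkk08:1"
# register a VM as an HA resource so it is restarted on node failure
ha-manager add vm:101 --state started --group bkk-ha
# inspect resource and node state
ha-manager status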

Per-Hypervisor Network Redundancy

Each hypervisor (BKK06, BKK07, BKK08) implements:

Dual Uplink Configuration:
├── bond-bkk00 (Active Path)
│   ├── Primary: vlan1X7 @ 100G
│   └── Backup: vlan1X7 @ 100G (different switch)
└── bond-bkk20 (Standby Path)
    ├── Primary: vlan2X7 @ 100G
    └── Backup: vlan2X7 @ 100G (different switch)

Failover Time: <100ms using active-backup bonding
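
The bond itself is a standard Linux active-backup bond declared in /etc/network/interfaces on each hypervisor; a sketch with placeholder NIC names (the vlan1X7/vlan2X7 interfaces then ride on these bonds):

auto bond-bkk00
iface bond-bkk00 inet manual
    bond-slaves enp65s0f0 enp65s0f1
    bond-mode active-backup
    bond-primary enp65s0f0
    # link state checked every 100ms, matching the failover figure above
    bond-miimon 100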

4. Storage Layer Redundancy

ZFS Resilience by Node

BKK06 - Maximum Redundancy (Mirror)

Configuration: 6x mirror vdevs (2-way mirrors)
Fault Tolerance: 
- Can lose 1 disk per mirror (up to 6 disks total)
- Read performance preserved during resilver
- 50% storage efficiency for maximum protection
Recovery: Hot spare activation < 1 minute

BKK07 - Balanced Redundancy (RAIDZ2)

Configuration: 12-disk RAIDZ2
Fault Tolerance:
- Can lose any 2 disks simultaneously
- Maintains full operation during rebuild
- 83% storage efficiency
Recovery: Distributed parity reconstruction

BKK08 - Performance Redundancy (RAIDZ1)

Configuration: 2x 4-disk RAIDZ1 vdevs
Fault Tolerance:
- Can lose 1 disk per vdev (2 disks total)
- Parallel vdev operation for performance
- 75% storage efficiency
Recovery: Per-vdev independent rebuild
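
For reference, the three layouts correspond to zpool create invocations of the following shape; the pool names and device paths are placeholders, not the production values:

# BKK06: six 2-way mirrors
zpool create tank06 mirror sda sdb mirror sdc sdd mirror sde sdf \
  mirror sdg sdh mirror sdi sdj mirror sdk sdl
# BKK07: one 12-disk RAIDZ2 vdev
zpool create tank07 raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl
# BKK08: two 4-disk RAIDZ1 vdevs striped together
zpool create tank08 raidz1 sda sdb sdc sdd raidz1 sde sdf sdg sdh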

5. Service Layer Redundancy

Container Distribution Strategy

Service Deployment:
├── Primary Instance: BKK06
├── Secondary Instance: BKK07
├── Tertiary/Specialized: BKK08
└── Load Balancing: HAProxy on each node

Failover Mechanism:
- Health checks every 2 seconds
- Automatic traffic rerouting
- Session persistence via cookies
- <5 second service failover
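
A hedged haproxy.cfg sketch of this mechanism, with an invented backend name and placeholder addresses: inter 2s matches the 2-second health checks, and fall 2 keeps failure detection inside the 5-second failover target.

backend web_service
    balance roundrobin
    option httpchk GET /health
    # session persistence via an inserted cookie
    cookie SRV insert indirect nocache
    default-server inter 2s fall 2 rise 2
    server bkk06 10.0.0.6:8080 check cookie bkk06
    server bkk07 10.0.0.7:8080 check cookie bkk07
    server bkk08 10.0.0.8:8080 check cookie bkk08 backup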

Anycast Services

Global Anycast: 160.22.180.180/32
├── Announced from all locations
├── BGP-based geographic routing
└── Automatic closest-node selection

Local Anycast: 160.22.181.81/32
├── Thailand-specific services
├── Lower latency for regional users
└── Fallback to global on failure
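
Anycast here is plain BGP: each node holds the /32 on a loopback-style interface and originates it, so inter-domain routing delivers users to the closest announcer. A RouterOS v7 sketch, where the bridge-as-loopback, address-list name, and connection name are illustrative:

# loopback-style interface carrying the anycast address
/interface bridge add name=lo-anycast
/ip address add address=160.22.180.180/32 interface=lo-anycast
# originate the /32 via v7's address-list-based network origination
/ip firewall address-list add list=bgp-networks address=160.22.180.180/32
/routing bgp connection set hgc-hk-active output.network=bgp-networks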

6. Failure Scenarios & Recovery

Single Component Failures (No Impact)

  • 1 Transit Link Down: Traffic reroutes via alternate transit
  • 1 Route Reflector Down: Secondary handles all traffic
  • 1 Hypervisor Down: VMs migrate or run from replicas
  • 1 Disk Failure: ZFS continues with degraded redundancy
  • 1 PSU Failure: Secondary PSU maintains operation

Multiple Component Failures (Degraded but Operational)

  • Both HK Links Down: Singapore paths maintain connectivity
  • Entire Hypervisor Failure: Services run from remaining 2 nodes
  • Multiple Disk Failures: ZFS tolerates per design limits
  • Primary and Backup Paths Down: Tertiary network paths remain available

Disaster Recovery

  • Complete DC Failure:
    • Off-site backups available
    • DNS failover to alternate regions
    • Recovery Time Objective (RTO): 4 hours
    • Recovery Point Objective (RPO): 1 hour

Redundancy Validation & Testing

Automated Testing

  • BGP Session Monitoring: Real-time alerting on session drops
  • Path Validation: Continuous reachability tests
  • Storage Health: Weekly ZFS scrubs and continuous SMART monitoring
  • Service Health: Prometheus + Grafana dashboards
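
A sketch of the kind of Prometheus alerting rule behind the BGP session monitoring; the metric and label names depend on the exporter in use and are assumptions here:

groups:
  - name: redundancy
    rules:
      - alert: BGPSessionDown
        expr: bgp_peer_up == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "BGP session to {{ $labels.peer }} is down"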

Manual Testing Schedule

  • Monthly: Controlled failover testing
  • Quarterly: Full redundancy validation
  • Annually: Disaster recovery drill

Compliance with IBP Rank 6 Requirements

✓ No Single Point of Failure

Every critical component has at least one backup:

  • Dual route reflectors with automatic failover
  • Multiple transit providers and exchange points
  • Redundant power, cooling, and network paths
  • Distributed storage with multi-disk fault tolerance

✓ Sub-Second Network Convergence

  • BFD: 100ms detection + 200ms convergence = 300ms total
  • Bond failover: <100ms for layer 2 switchover
  • BGP: Graceful restart maintains forwarding plane

✓ Geographic & Provider Diversity

  • Transit via Hong Kong and Singapore
  • Peering in Bangkok, Hong Kong, and Amsterdam
  • Multiple submarine cable systems
  • Carrier-neutral facility

✓ Automated Recovery

  • HA cluster manages VM availability
  • BGP automatically selects best paths
  • ZFS self-healing with checksum validation
  • Container orchestration via systemd

This comprehensive redundancy architecture underpins our 99.95% uptime SLA and exceeds the IBP Rank 6 requirements for infrastructure resilience.