IBP Rank 6 Network Infrastructure Documentation

ROTKO NETWORKS OÜ (AS142108)

Executive Summary

ROTKO NETWORKS operates a highly redundant, carrier-grade network infrastructure in Bangkok, Thailand, designed to meet and exceed IBP Rank 6 requirements. Our infrastructure is fully redundant at every layer, from multiple 10G/100G transit connections to redundant route reflectors, hypervisors, and storage systems.

Network Architecture Overview

Core Routing Infrastructure

Our network utilizes a hierarchical BGP architecture with dedicated route reflectors providing redundancy and scalability:

Route Reflectors (Core Routers)

  • BKK00 (CCR2216-1G-12XS-2XQ) - Primary Route Reflector

    • Router ID: 10.155.255.4
    • 100G connections to edge routers
    • Multiple 10G transit/IX connections
  • BKK20 (CCR2216-1G-12XS-2XQ) - Secondary Route Reflector

    • Router ID: 10.155.255.2
    • 100G connections to edge routers
    • Diverse 10G transit/IX connections

The two route reflectors peer with each other over iBGP, reflect routes between their edge-router clients, and serve as aggregation points for external BGP sessions. A minimal configuration sketch follows.
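
The sketch below shows the route-reflector side of these sessions in RouterOS v7 syntax. The template name, connection names, and the 10.155.255.6 client address are illustrative placeholders, not values from the production configuration:

/routing bgp template add name=ibgp as=142108 router-id=10.155.255.4 routing-table=main
# plain iBGP session to the other route reflector
/routing bgp connection add name=ibgp-bkk20 template=ibgp remote.address=10.155.255.2 local.role=ibgp
# session to an edge router; local.role=ibgp-rr makes this router the reflector
/routing bgp connection add name=rr-bkk06 template=ibgp remote.address=10.155.255.6 local.role=ibgp-rr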

Layer 2 Switching Infrastructure

Core Switches

  • BKK30 (CRS504-4XQ) - 4x100G QSFP28 switch
  • BKK40 (CRS504-4XQ) - 4x100G QSFP28 switch
  • BKK10 (CCR2216-1G-12XS-4XQ) - 12x 25G SFP28 + 1x 1G (fallback for the 100G switches)
  • BKK60 (CRS354-48G-4S+2Q+) - 48x1G + 4xSFP+ + 2xQSFP+ switch (reserved)
  • BKK50 (CCR2004-16G-2S+) - Management network

These switches provide Q-in-Q (802.1ad) VLAN tagging for service isolation and traffic segregation; a configuration sketch follows.
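
A minimal RouterOS sketch of such a Q-in-Q setup, with placeholder port names and VLAN IDs rather than the production values:

# S-tagged bridge: outer tag uses the 802.1ad ethertype
/interface bridge add name=br-core vlan-filtering=yes ether-type=0x88a8
# customer-facing port: push/pop outer tag 2100 regardless of inner tags
/interface bridge port add bridge=br-core interface=sfp28-1 pvid=2100 tag-stacking=yes
# trunk port carries the S-VLAN tagged toward the core
/interface bridge vlan add bridge=br-core vlan-ids=2100 tagged=qsfp28-1 untagged=sfp28-1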

Multi-Layer Redundancy Architecture

1. Physical Layer Redundancy

Diverse Fiber Paths

  • 3x Independent 10G fiber uplinks from different carriers
  • Physically diverse cable routes to prevent simultaneous cuts
  • Multiple cross-connects within the data center
  • Dual power feeds to all critical equipment

Hardware Redundancy

  • Dual PSU on all servers and network equipment
  • Hot-swappable components (fans, PSUs, drives)
  • N+1 cooling in server chassis
  • Spare equipment pre-racked for rapid replacement

2. Network Layer Redundancy

Transit Diversity

Our multi-homed architecture eliminates single points of failure:

Primary Transit - HGC (AS9304)

Hong Kong Path:
├── Active: BKK00 → VLAN 2519 → 400M (burst 800M)
└── Backup: BKK20 → VLAN 2517 → 800M standby

Singapore Path:
├── Active: BKK20 → VLAN 2520 → 400M (burst 800M)
└── Backup: BKK00 → VLAN 2518 → 800M standby
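
The active/standby split is enforced with BGP attributes. A sketch of the idea using RouterOS v7 routing filters, where the chain and connection names are invented for illustration: routes learned on the active session get a higher LOCAL_PREF, so the standby session carries traffic only after a failure.

# prefer routes learned on the active session
/routing filter rule add chain=hgc-hk-active-in rule="set bgp-local-pref 200; accept"
# de-prefer the standby session
/routing filter rule add chain=hgc-hk-standby-in rule="set bgp-local-pref 100; accept"
/routing bgp connection set hgc-hk-active input.filter=hgc-hk-active-in
/routing bgp connection set hgc-hk-standby input.filter=hgc-hk-standby-in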

Internet Exchange Diversity

Local (Thailand):
├── BKNIX: 10G direct peering (200+ networks)
└── AMS-IX Bangkok: 1G peering (100+ networks)

Regional:
├── AMS-IX Hong Kong: 200M connection
└── HGC IPTx: Dual paths to HK/SG

Global:
└── AMS-IX Europe: 100M connection

BGP Redundancy Features

  • Dual Route Reflectors: Automatic failover with iBGP
  • BFD Detection: 100ms failure detection
  • Graceful Restart: Maintains forwarding during control plane restart
  • Path Diversity: Multiple valid routes to every destination
  • ECMP Load Balancing: Traffic distributed across equal-cost paths
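
As an illustration, BFD timers meeting the 100ms detection target could look like the following in RouterOS v7 (detection time ≈ receive interval × multiplier; the connection name is a placeholder, and BFD binding on BGP connections assumes a recent 7.x release):

/routing bfd configuration add interfaces=all min-rx=50ms min-tx=50ms multiplier=2
/routing bgp connection set hgc-hk-active use-bfd=yes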

3. Hypervisor Layer Redundancy

Proxmox Cluster Features

High Availability Configuration:
├── Live Migration: Zero-downtime VM movement
├── HA Manager: Automatic VM restart on node failure
├── Shared Storage: Distributed access to VM data
└── Fencing: Ensures failed nodes are isolated
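
A minimal sketch of this setup with Proxmox's ha-manager CLI; the group name and VM ID are illustrative:

# define an HA group spanning the three hypervisors, preferring BKK06
ha-manager groupadd bkk-ha --nodes "bkk06:2,bkk07:1,bkk08:1"
# register a VM as an HA resource so it is restarted on node failure
ha-manager add vm:101 --state started --group bkk-ha
# inspect resource and node state
ha-manager status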

Per-Hypervisor Network Redundancy

Each hypervisor (BKK06, BKK07, BKK08) implements:

Dual Uplink Configuration:
├── bond-bkk00 (Active Path)
│   ├── Primary: vlan1X7 @ 100G
│   └── Backup: vlan1X7 @ 100G (different switch)
└── bond-bkk20 (Standby Path)
    ├── Primary: vlan2X7 @ 100G
    └── Backup: vlan2X7 @ 100G (different switch)

Failover Time: <100ms using active-backup bonding
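
The bond itself is a standard Linux active-backup bond declared in /etc/network/interfaces on each hypervisor; a sketch with placeholder NIC names (the vlan1X7/vlan2X7 interfaces then ride on these bonds):

auto bond-bkk00
iface bond-bkk00 inet manual
    bond-slaves enp65s0f0 enp65s0f1
    bond-mode active-backup
    bond-primary enp65s0f0
    # link state checked every 100ms, matching the failover figure above
    bond-miimon 100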

4. Storage Layer Redundancy

ZFS Resilience by Node

BKK06 - Maximum Redundancy (Mirror)

Configuration: 6x mirror vdevs (2-way mirrors)
Fault Tolerance: 
- Can lose 1 disk per mirror (up to 6 disks total)
- Read performance preserved during resilver
- 50% storage efficiency for maximum protection
Recovery: Hot spare activation < 1 minute

BKK07 - Balanced Redundancy (RAIDZ2)

Configuration: 12-disk RAIDZ2
Fault Tolerance:
- Can lose any 2 disks simultaneously
- Maintains full operation during rebuild
- 83% storage efficiency
Recovery: Distributed parity reconstruction

BKK08 - Performance Redundancy (RAIDZ1)

Configuration: 2x 4-disk RAIDZ1 vdevs
Fault Tolerance:
- Can lose 1 disk per vdev (2 disks total)
- Parallel vdev operation for performance
- 75% storage efficiency
Recovery: Per-vdev independent rebuild
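
For reference, the three layouts correspond to zpool create invocations of the following shape; the pool names and device paths are placeholders, not the production values:

# BKK06: six 2-way mirrors
zpool create tank06 mirror sda sdb mirror sdc sdd mirror sde sdf \
  mirror sdg sdh mirror sdi sdj mirror sdk sdl
# BKK07: one 12-disk RAIDZ2 vdev
zpool create tank07 raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl
# BKK08: two 4-disk RAIDZ1 vdevs striped together
zpool create tank08 raidz1 sda sdb sdc sdd raidz1 sde sdf sdg sdh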

5. Service Layer Redundancy

Container Distribution Strategy

Service Deployment:
├── Primary Instance: BKK06
├── Secondary Instance: BKK07
├── Tertiary/Specialized: BKK08
└── Load Balancing: HAProxy on each node

Failover Mechanism:
- Health checks every 2 seconds
- Automatic traffic rerouting
- Session persistence via cookies
- <5 second service failover
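
A hedged haproxy.cfg sketch of this mechanism, with an invented backend name and placeholder addresses: inter 2s matches the 2-second health checks, and fall 2 keeps failure detection inside the 5-second failover target.

backend web_service
    balance roundrobin
    option httpchk GET /health
    # session persistence via an inserted cookie
    cookie SRV insert indirect nocache
    default-server inter 2s fall 2 rise 2
    server bkk06 10.0.0.6:8080 check cookie bkk06
    server bkk07 10.0.0.7:8080 check cookie bkk07
    server bkk08 10.0.0.8:8080 check cookie bkk08 backup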

Anycast Services

Global Anycast: 160.22.180.180/32
├── Announced from all locations
├── BGP-based geographic routing
└── Automatic closest-node selection

Local Anycast: 160.22.181.81/32
├── Thailand-specific services
├── Lower latency for regional users
└── Fallback to global on failure
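
Anycast here is plain BGP: each node holds the /32 on a loopback-style interface and originates it, so inter-domain routing delivers users to the closest announcer. A RouterOS v7 sketch, where the bridge-as-loopback, address-list name, and connection name are illustrative:

# loopback-style interface carrying the anycast address
/interface bridge add name=lo-anycast
/ip address add address=160.22.180.180/32 interface=lo-anycast
# originate the /32 via v7's address-list-based network origination
/ip firewall address-list add list=bgp-networks address=160.22.180.180/32
/routing bgp connection set hgc-hk-active output.network=bgp-networks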

6. Failure Scenarios & Recovery

Single Component Failures (No Impact)

  • 1 Transit Link Down: Traffic reroutes via alternate transit
  • 1 Route Reflector Down: Secondary handles all traffic
  • 1 Hypervisor Down: VMs migrate or run from replicas
  • 1 Disk Failure: ZFS continues with degraded redundancy
  • 1 PSU Failure: Secondary PSU maintains operation

Multiple Component Failures (Degraded but Operational)

  • Both HK Links Down: Singapore paths maintain connectivity
  • Entire Hypervisor Failure: Services run from remaining 2 nodes
  • Multiple Disk Failures: ZFS tolerates per design limits
  • Primary and Backup Paths Down: Tertiary network paths remain available

Disaster Recovery

  • Complete DC Failure:
    • Off-site backups available
    • DNS failover to alternate regions
    • Recovery Time Objective (RTO): 4 hours
    • Recovery Point Objective (RPO): 1 hour

Redundancy Validation & Testing

Automated Testing

  • BGP Session Monitoring: Real-time alerting on session drops
  • Path Validation: Continuous reachability tests
  • Storage Health: Weekly ZFS scrubs and continuous SMART monitoring
  • Service Health: Prometheus + Grafana dashboards
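
A sketch of the kind of Prometheus alerting rule behind the BGP session monitoring; the metric and label names depend on the exporter in use and are assumptions here:

groups:
  - name: redundancy
    rules:
      - alert: BGPSessionDown
        expr: bgp_peer_up == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "BGP session to {{ $labels.peer }} is down"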

Manual Testing Schedule

  • Monthly: Controlled failover testing
  • Quarterly: Full redundancy validation
  • Annually: Disaster recovery drill

Compliance with IBP Rank 6 Requirements

✓ No Single Point of Failure

Every critical component has at least one backup:

  • Dual route reflectors with automatic failover
  • Multiple transit providers and exchange points
  • Redundant power, cooling, and network paths
  • Distributed storage with multi-disk fault tolerance

✓ Sub-Second Network Convergence

  • BFD: 100ms detection + 200ms convergence = 300ms total
  • Bond failover: <100ms for layer 2 switchover
  • BGP: Graceful restart maintains forwarding plane

✓ Geographic & Provider Diversity

  • Transit via Hong Kong and Singapore
  • Peering in Bangkok, Hong Kong, and Amsterdam
  • Multiple submarine cable systems
  • Carrier-neutral facility

✓ Automated Recovery

  • HA cluster manages VM availability
  • BGP automatically selects best paths
  • ZFS self-healing with checksum validation
  • Container orchestration via systemd

This comprehensive redundancy architecture underpins our 99.95% uptime SLA and exceeds the IBP Rank 6 requirements for infrastructure resilience.