Introduction

These docs are the open draft space of the chief operating officer and do not necessarily reflect the setup currently deployed in production. We are in the process of moving the docs into a website/blog format, so treat the information here as raw working notes, on par with HackMD posts.

Expect low-level mumbling about internet technologies as well as the hardware specs of infrastructure we have built ourselves, part by part, to meet the unconventional (by server-world standards) requirements that hosting web3 successfully imposes. The most important components for blockchain performance are memory, and especially storage for the Merkle trie state, together with a high clock speed CPU.

As a rule of thumb when selecting components for validator infra, you want a CPU capable of running at 5GHz (single-thread performance in cpubenchmark above 3k) and at least a PCIe 4.0 NVMe drive with sustained performance of 1800MB/s (drives are usually marketed with their temporary cache performance of ~7000MB/s). Ideally collators/block builders should run on PCIe 5.0/6.0 storage and be executed with multiple cores/validators.
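
To sanity-check whether a drive actually sustains that level of write throughput once its SLC cache is exhausted, here is a minimal sketch using fio (assuming fio is installed; the file path and test size are placeholders to adjust for the target drive):

# sequential write large enough to blow past the SLC cache; run against the drive under test
fio --name=sustained-write --filename=/mnt/nvme/fio-test --rw=write --bs=1M --size=64G --ioengine=libaio --iodepth=8 --direct=1 --group_reporting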

Random Access Memory, better known as RAM, should run at 4800MHz or faster, which in practice means using a single stick per memory channel, because the motherboard's memory controller cannot drive additional DIMMs per channel at full speed. For example, populating all four slots on an AM5 motherboard limits the RAM to roughly 3600MHz instead of the full 4800-6000MHz.
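
To check what speed the installed DIMMs are actually running at (as opposed to their rated speed), dmidecode can be queried on the host; a minimal sketch (run as root):

# compare "Speed" (rated) against "Configured Memory Speed" (actual) for each DIMM
dmidecode -t memory | grep -i speed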

Polkadot's new relay chain, JAM, has target specs of a 16-core 5GHz CPU with SMT disabled, 8TB of NVMe, 64GB of DDR5, and networking at ~500Mbit/s with a global routing table that is not too oversubscribed, since networking is no longer gossip-based but direct, authenticated point-to-point.

Hardware Infrastructure

Machines at rack

Our primary goal is to deliver a high-performance and secure blockchain infrastructure that fosters trust and reliability. We aim to achieve this by focusing on the critical elements of blockchain technology - namely, ensuring high single-thread performance for validator services, and low latency for RPC services, among others.

Network Architecture

Network Diagram

For reference, our network topology includes the following components:

Upstream Providers

  • BKNIX, AMS-IX (Bangkok, Hong Kong, Europe)
  • HGC IPTX (Hong Kong, Singapore)
  • Backup circuits: HGC IPTX TH-HK and TH-SG

Edge Routers

  • BKK00 (MikroTik CCR2216)
  • BKK20 (MikroTik CCR2216)

Core Routers

  • BKK10 (MikroTik CCR2116) - VRRP Primary
  • BKK50 (MikroTik CCR2004) - VRRP Fallback

Storage Infrastructure

  • High-speed switches: BKK30, BKK40 (100G)
  • Storage servers: BKK06 (EPYC 7713), BKK07 (EPYC 9654), BKK08 (EPYC 7742)
  • HA Quorum Network for redundancy

Validator Nodes

  • BKK01, BKK02 (Ryzen 5600G)
  • BKK03 (Ryzen 7950X3D)
  • BKK04 (Ryzen 9950X)
  • BKK09 (Ryzen 7945HX)
  • BKK11, BKK13 (Ryzen 5950X)
  • BKK12 (Ryzen 7950X)

Management

  • BKK60 (MikroTik CRS354) for IPMI access to all devices

Performance Considerations

Validator services in blockchain infrastructures demand high single-thread performance due to the nature of their operations. Validators, in essence, validate transactions and blocks within the blockchain. They act as the arbitrators of the system, ensuring the veracity and accuracy of the information being added to the blockchain. This is an intensive process that involves complex computations and encryption, thus requiring a high-performance, single-thread system to maintain efficiency.

The low latency required for our RPC services is another vital factor in our hardware design. RPC, or Remote Procedure Call, is a protocol that allows a computer program to execute a procedure in another address space, usually on another network, without the programmer needing to explicitly code for this functionality. In simpler terms, it's a way for systems to talk to each other. Low latency in these operations is crucial to ensure a smooth and seamless dialogue between various systems within the blockchain. A delay or a lag in these communications can cause bottlenecks, leading to a slowdown in overall operations.

Hardware Specifications

The hardware components and their configurations we have selected are specifically designed to address these needs. By leveraging advanced technologies like the AMD Ryzen 9 7950X for its superior single-thread performance, DDR5 memory for fast data retrieval, and NVMe SSDs for their exceptional speed in data storage and retrieval, we aim to provide an infrastructure that can effectively handle the demands of blockchain technology.

Key Components

| Node Type | Processor | Memory | Storage | Purpose |
|-----------|-----------|--------|---------|---------|
| Validator | AMD Ryzen 7950X/5950X | DDR5 | NVMe SSDs | Transaction validation |
| Storage | AMD EPYC 7713/9654/7742 | High-capacity | Redundant arrays | Blockchain data storage |
| Edge | MikroTik CCR2216 | - | - | Network edge routing |
| Core | MikroTik CCR2116/CCR2004 | - | - | VRRP redundant core routing |

Scalability and Resilience

Our infrastructure is also designed to ensure scalability and flexibility. As the demands of the blockchain ecosystem grow, so too should our capacity to handle these increasing demands. Hence, our hardware design also incorporates elements that will allow us to easily scale up our operations when necessary.

The network features full redundancy with:

  • Cross-connected edge routers with multiple upstream providers
  • Geographic diversity with connections to Hong Kong, Singapore, Bangkok, and Europe
  • Backup IPTX circuits for fallback connectivity
  • VRRP router redundancy for core routing
  • High-speed 100G storage networking
  • HA Quorum Network for storage resilience

In essence, our hardware is purpose-built to deliver high-performance blockchain operations that are secure, reliable, and capable of scaling with the demands of the evolving blockchain ecosystem.

Rack

Rack from front

At the heart of our operations is a meticulously designed server infrastructure, securely housed within ten units of a top-tier 42U rack. Providing approximately 230 liters of computational capacity, our setup is powered by a robust dual 2kW, 220V power system that underlines our commitment to delivering superior performance and steadfast availability.

Our server infrastructure is hosted within a carrier-neutral facility, strategically enabling seamless and robust connections with a broad range of service providers, ISPs, and cloud platforms. This network versatility fosters enhanced performance and unyielding reliability, thus ensuring a consistently superior user experience.

More than a mere assembly of servers, our setup is a comprehensively designed ecosystem meticulously architected to achieve maximum efficiency. Leveraging location flexibility, our infrastructure can be configured across multiple strategic points to guarantee optimal network connectivity and minimized latency.

Direct peering arrangements with major local and international internet exchanges ensure broad bandwidth and unwavering connectivity. Coupled with floor and inter-floor cross-connect cabling, we have fostered a well-connected network capable of facilitating smooth data transfer between servers and racks.

Our infrastructure is further enhanced with a suite of cutting-edge networking devices, including industry-leading routers and switches. Services such as KVM over IP for remote server management, alongside on-site technical support and smart hands as a service, amplify our operational efficiency.

To guarantee optimal performance and longevity of our hardware, a tightly regulated environment is maintained. Our facility features controlled air temperature and humidity, ensuring the hardware operates within optimal conditions. Additionally, we have installed a UPS and backup power generators to mitigate the risk of power interruptions.

Security is paramount. Our facility, with ISO 27001 certification, employs a rigorous system of access control with logging and video surveillance, ensuring a safe and secure environment for our infrastructure. Additional safety measures such as fire alarms and smoke protection systems are in place to protect our hardware. A dedicated network operations center, operational 24/7, stands ready to promptly address any technical concerns.

Our setup also incorporates a raised floor design, an element that demonstrates our meticulous attention to detail. This design improves air distribution and cable management, leading to thermal efficiency and a well-organized operational environment.

Networking Hardware Overview

Edge Routers

  • CCR2216 (bkk10): Edge router for Bangkok site.
  • CCR2116 (bkk20): Edge router for Bangkok site.

Core Routers

  • CCR2004 (bkk50): Core router for Bangkok site.

High Availability (HA) Switches

  • CRS504 (bkk30): HA switch for Bangkok site.
  • CRS504 (bkk40): HA switch for Bangkok site.

BKK01 - Validator

BKK01

CPU

AMD RYZEN 5 5600G 6-Core 3.7 GHz (4.6 GHz Max Boost) Socket AM4 65W

The heart of our operations, the 6-core AMD RYZEN 5 5600G, offers excellent performance for blockchain applications. It provides robust and reliable service even under demanding workloads.

CPU Cooler

COOLSERVER P32 CPU Cooler

For managing the thermal performance of our CPU, we use the COOLSERVER P32 CPU Cooler. It's equipped with high-quality heatpipes and can handle the Ryzen 5 5600G even under intensive workloads.

RAM

2x32GB Hynix DDR4 3200MHz

Our setup uses 2 modules of 32GB DDR4 RAM from Hynix, providing us with ample bandwidth and ensuring smooth server operations.

Motherboard

MSI A550M-ITX/ac

The MSI A550M-ITX/ac motherboard is an engineering marvel that brings together the performance of the consumer world with the dependability of server-grade hardware. This motherboard supports the AMD Ryzen series CPUs and DDR4 memory, promising speed, reliability, and scalability.

Storage

4x 2TB Monster Storage NVMe 3D TLC SSD - R: 7,400MB/s W: 6,600MB/s

For storage, we use 4 Monster Storage 3D TLC NVMe SSDs, each of 2TB capacity. These high-speed SSDs are known for their exceptional performance and efficiency in data storage and retrieval.

Power unit

400W PSU

This second-hand case comes with an integrated 400W power supply unit that converts mains power into the voltages the internal components require. Despite being second-hand, the PSU is in good condition and provides a reliable power source for the system.

Chassis

Mini-ITX 1U chassis

This compact second-hand Mini-ITX 1U chassis houses the whole node in a single rack unit and includes the integrated 400W PSU described above.

KVM

BliKVM v1 CM4 "KVM over IP" Raspberry Pi CM4 HDMI CSI PiKVM v3

A modern, highly secure, and programmable KVM solution running on Arch Linux, which provides exceptional control over your server, akin to physical access. With an easy build process, it boasts minimal video latency (about 100 ms) and a lightweight Web UI accessible from any browser. It emulates mass storage drives and allows ATX power management, secure data transmission with SSL, and local Raspberry Pi health monitoring. You can also manage GPIO and USB relays via its web interface. The PiKVM OS is production-ready, supports a read-only filesystem to prevent memory card damage, offers extensible authorization methods, and enables automation with macros.

Features of PiKVM:

  • Fully-featured and modern IP-KVM: PiKVM is up-to-date with the latest KVM technologies.
  • Easy to build: PiKVM offers ready-to-use OS images and a friendly build environment.
  • Low video latency: With approximately 100 milliseconds of video latency, it provides one of the smallest delays of all existing solutions.
  • Lightweight Web UI and VNC: The user interface is accessible through any browser, with no proprietary clients required. VNC is also supported.
  • Mass Storage Drive Emulation: On Raspberry Pi 4 and ZeroW, PiKVM can emulate a virtual CD-ROM or Flash Drive. A live image can be uploaded to boot the attached server.
  • ATX power management: PiKVM supports simple circuits for controlling the power button of the attached server.
  • Security: PiKVM is designed with strong security, using SSL to protect traffic.
  • Local monitoring: PiKVM monitors the health of the Raspberry Pi board and provides warnings for potential issues.
  • GPIO management: Control GPIO and USB relays via the web interface.
  • Production-ready: PiKVM OS is based on Arch Linux ARM and can be customized for any needs.
  • Read-only filesystem: The OS runs in read-only mode to prevent damage to the memory card due to a sudden power outage.
  • Extensible authorization methods: PiKVM supports integration into existing authentication infrastructure.
  • Macro scripts: Repetitive actions can be automated with keyboard & mouse action macros.
  • Open & free: PiKVM is open-source software, released under the GPLv3.

BKK02 - Validator 2

BKK02

CPU

AMD RYZEN 5 5600G 6-Core 3.7 GHz (4.6 GHz Max Boost) Socket AM4 65W

The heart of our operations, the 6-core AMD RYZEN 5 5600G, offers excellent performance for blockchain applications. It provides robust and reliable service even under demanding workloads.

CPU Cooler

COOLSERVER P32 CPU Cooler

For managing the thermal performance of our CPU, we use the COOLSERVER P32 CPU Cooler. It's equipped with high-quality heatpipes and can handle the Ryzen 5 5600G even under intensive workloads.

RAM

2x32GB Hynix DDR4 3200MHz

Our setup uses 2 modules of 32GB DDR4 RAM from Hynix, providing us with ample bandwidth and ensuring smooth server operations.

Motherboard

MSI A520M-ITX/ac

The MSI A520M-ITX/ac motherboard is an engineering marvel that brings together the performance of the consumer world with the dependability of server-grade hardware. This motherboard supports the AMD Ryzen series CPUs and DDR4 memory, promising speed, reliability, and scalability.

Storage

4x 2TB Monster Storage NVMe 3D TLC SSD - R: 7,400MB/s W: 6,600MB/s

For storage, we use 4 Monster Storage 3D TLC NVMe SSDs, each of 2TB capacity. These high-speed SSDs are known for their exceptional performance and efficiency in data storage and retrieval. The downside is that they lack DRAM for caching.

Power unit

400W PSU

This second-hand case comes with an integrated 400W power supply unit that converts mains power into the voltages the internal components require. Despite being second-hand, the PSU is in good condition and provides a reliable power source for the system.

Chassis

Mini-ITX 1U chassis

This compact second-hand Mini-ITX 1U chassis houses the whole node in a single rack unit and includes the integrated 400W PSU described above.

KVM

BliKVM v1 CM4 "KVM over IP" Raspberry Pi CM4 HDMI CSI PiKVM v3

A modern, highly secure, and programmable KVM solution running on Arch Linux, which provides exceptional control over your server, akin to physical access. With an easy build process, it boasts minimal video latency (about 100 ms) and a lightweight Web UI accessible from any browser. It emulates mass storage drives and allows ATX power management, secure data transmission with SSL, and local Raspberry Pi health monitoring. You can also manage GPIO and USB relays via its web interface. The PiKVM OS is production-ready, supports a read-only filesystem to prevent memory card damage, offers extensible authorization methods, and enables automation with macros.

Features of PiKVM:

  • Fully-featured and modern IP-KVM: PiKVM is up-to-date with the latest KVM technologies.
  • Easy to build: PiKVM offers ready-to-use OS images and a friendly build environment.
  • Low video latency: With approximately 100 milliseconds of video latency, it provides one of the smallest delays of all existing solutions.
  • Lightweight Web UI and VNC: The user interface is accessible through any browser, with no proprietary clients required. VNC is also supported.
  • Mass Storage Drive Emulation: On Raspberry Pi 4 and ZeroW, PiKVM can emulate a virtual CD-ROM or Flash Drive. A live image can be uploaded to boot the attached server.
  • ATX power management: PiKVM supports simple circuits for controlling the power button of the attached server.
  • Security: PiKVM is designed with strong security, using SSL to protect traffic.
  • Local monitoring: PiKVM monitors the health of the Raspberry Pi board and provides warnings for potential issues.
  • GPIO management: Control GPIO and USB relays via the web interface.
  • Production-ready: PiKVM OS is based on Arch Linux ARM and can be customized for any needs.
  • Read-only filesystem: The OS runs in read-only mode to prevent damage to the memory card due to a sudden power outage.
  • Extensible authorization methods: PiKVM supports integration into existing authentication infrastructure.
  • Macro scripts: Repetitive actions can be automated with keyboard & mouse action macros.
  • Open & free: PiKVM is open-source software, released under the GPLv3.

BKK03 - Bootnode/RPC

BKK03

CPU: AMD Ryzen™ 9 7950X3D 16-Core 32-Thread 5NM

The AMD Ryzen 9 7950X3D, with its 16-core 32-thread architecture, is the driving force behind our server's high performance. The CPU's multi-core design and high clock speeds are specifically optimized for blockchain applications, ensuring efficient chain synchronization and reliable endpoint service.

Featuring advanced technologies like PCI Express® 5.0 and DDR5, the CPU provides rapid data transfer, essential for low-latency blockchain transactions. The large 128MB L3 cache further bolsters performance by facilitating quick access to frequently used data, enhancing efficiency.

CPU Cooling System

BKK03

COOLSERVER P32 CPU Cooler

Thermalright aluminium alloy AM5 frame

Cooling efficiency is paramount in maintaining stable performance. Our server utilizes the COOLSERVER P32 AM5 Server CPU Cooler, in conjunction with the Thermalright AM5 frame, to maximize cooling capabilities.

Motherboard: AsRock Rack B650D4U-2L2T/BCM(LGA 1718) Dual 10G LAN

This Micro-ATX motherboard stands as a testament to AsRock's engineering prowess, blending high-performance consumer technology with the robustness of server-grade hardware. The board offers full PCIe 5.0 support and features up to 7 M.2 slots for NVMe storage, enhancing data transfer speeds. Its compatibility with DDR5 ECC UDIMM memory further underlines its suitability for demanding server applications.

Memory: 4x 32GB MICRON DDR5 UDIMM/ECC 4800MHz

Our selection of server-grade DDR5 memory modules provides substantial bandwidth for smooth server operations. Equipped with ECC technology, these modules maintain data integrity, ensuring the reliability of our transactions.

SSD Expansion: NVMe PCIe RAID Adapter 4 Ports NVME SSD to PCI-E 4.0 X16

This expansion card plays a vital role in our data management strategy by enabling the integration of top-tier NVMe SSDs. It contributes to our server's responsiveness by facilitating faster access to stored data.

Storage: 5x 2TB Hanye ME70 NVMe PCI-E 4.0, 7,200MB/s

Our system's storage is equipped with 2TB High-Performance ME70 M.2 NVMe SSDs, providing 12TB of high-speed storage. The SSDs' Gen4 PCIe tech and LDPC error correction ensure quick data access and integrity.

Benchmarks

2023-07-31 13:55:08 Running machine benchmarks...
2023-07-31 13:55:34
+----------+----------------+-------------+-------------+-------------------+
| Category | Function       | Score       | Minimum     | Result            |
+===========================================================================+
| CPU      | BLAKE2-256     | 1.56 GiBs   | 783.27 MiBs | ✅ Pass (203.8 %) |
|----------+----------------+-------------+-------------+-------------------|
| CPU      | SR25519-Verify | 788.10 KiBs | 560.67 KiBs | ✅ Pass (140.6 %) |
|----------+----------------+-------------+-------------+-------------------|
| Memory   | Copy           | 27.73 GiBs  | 11.49 GiBs  | ✅ Pass (241.3 %) |
|----------+----------------+-------------+-------------+-------------------|
| Disk     | Seq Write      | 2.99 GiBs   | 950.00 MiBs | ✅ Pass (322.0 %) |
|----------+----------------+-------------+-------------+-------------------|
| Disk     | Rnd Write      | 1.29 GiBs   | 420.00 MiBs | ✅ Pass (313.4 %) |
+----------+----------------+-------------+-------------+-------------------+
From 5 benchmarks in total, 5 passed and 0 failed (10% fault tolerance).
2023-07-31 13:55:34 The hardware meets the requirements
Read Latency Statistics in nanoseconds:
-------------------------
Minimum: 520 ns
Maximum: 22540 ns
Mean: 914.595734 ns
Standard Deviation: 222.087316 ns
Read IOPS: 953140.861971
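
The pass/fail table above matches the output format of the standard Substrate hardware benchmark; assuming a recent polkadot binary is on the PATH, a comparable run can be reproduced with:

# runs the built-in CPU, memory and disk benchmarks and compares them against the reference hardware
polkadot benchmark machine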

Chassis: 1U Case, TGC H1-400

Power Supply Unit: 400W Compuware 80 PLUS Platinum PSU

KVM: ASRock Rack BMC/IPMI

The ASRock Rack motherboard includes a BMC (Baseboard Management Controller) for remote control, offering robust management capabilities.

BKK04 - Bootnode/RPC

BKK04

CPU

Model: AMD Ryzen 9 7950X, 16-Core 32-Thread, 5nm, 64MB L3, Socket AM5

  • Core Count: 16 cores
  • Technology: 5NM process
  • L3 Cache: 64MB
  • PCI Express: 5.0
  • Memory Support: DDR5

Capabilities: Designed to manage multiple tasks with ease, such as running multiple networks simultaneously, efficient blockchain sync, and low latency transactions.

CPU Cooler

Motherboard

Model: AsRock Rack B650D4U (LGA 1718), purchased new

  • Form Factor: Micro-ATX
  • Memory Support: DDR5 ECC UDIMM
  • PCIe Slots: Full PCIe 5.0 support, M.2 slot, x16 slot, and x4 slot
  • Storage Support: Up to 7 M.2 slots

Memory

Model: 4x Micron 32GB DDR5 UDIMM/ECC 4800MHz CL40 1.1V (MTC20C2085S1EC48BA1R)

  • Capacity: 4 modules of 32GB DDR5 each
  • Technology: ECC for increased data integrity
  • Performance: Low-latency

SSD Expansion Cards

Extension card

  • Model: NVMe PCIe RAID adapter, 4-port NVMe SSD to PCI-E 4.0 x16 expansion card

Storage

  • Primary Storage: 6x 2TB Samsung SSD Pro 980
  • Backup Storage: 3x 16TB disks in ZFS RAIDZ (32TB total)
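
A sketch of how such a backup pool can be created with ZFS, assuming a single RAIDZ1 vdev across the three 16TB disks (pool name and device paths are hypothetical):

# one RAIDZ1 vdev over three disks; one disk may fail without data loss
zpool create -o ashift=12 backup raidz /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2 /dev/disk/by-id/DISK3
zfs set compression=lz4 backup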

Benchmarks

+----------+----------------+-------------+-------------+-------------------+
| Category | Function       | Score       | Minimum     | Result            |
+===========================================================================+
| CPU      | BLAKE2-256     | 1.65 GiBs   | 783.27 MiBs | ✅ Pass (215.8 %) |
|----------+----------------+-------------+-------------+-------------------|
| CPU      | SR25519-Verify | 832.82 KiBs | 560.67 KiBs | ✅ Pass (148.5 %) |
|----------+----------------+-------------+-------------+-------------------|
| Memory   | Copy           | 16.99 GiBs  | 11.49 GiBs  | ✅ Pass (147.9 %) |
|----------+----------------+-------------+-------------+-------------------|
| Disk     | Seq Write      | 2.09 GiBs   | 950.00 MiBs | ✅ Pass (225.3 %) |
|----------+----------------+-------------+-------------+-------------------|
| Disk     | Rnd Write      | 885.35 MiBs | 420.00 MiBs | ✅ Pass (210.8 %) |
+----------+----------------+-------------+-------------+-------------------+
From 5 benchmarks in total, 5 passed and 0 failed (10% fault tolerance).
2023-08-03 00:49:00 The hardware meets the requirements
Read Latency Statistics in nanoseconds:
-------------------------
Minimum: 460 ns
Maximum: 535014 ns
Mean: 968.885148 ns
Standard Deviation: 280.737214 ns
99.99th Percentile Read Latency: 350 ns
-------------------------
Read IOPS: 906996.500117

The read latency meets the 2000 ns or lower QoS requirement

Chassis

  • Model: TGC-24550 2U
  • Design: 2U rackmount, efficient airflow design

Power Supply Unit

  • Model: T.F.SKYWINDINTL 1U MINI Flex ATX Power Supply Unit 400W Modular PSU
  • Capacity: 400W
  • Features: Built-in cooling fan, overcurrent, overvoltage, and short-circuit protection

KVM

  • Model: ASRock Rack BMC/IPMI for remote control

Summary

The BKK04 Bootnode leverages a combination of cutting-edge components to deliver high performance, reliability, and scalability. From the AMD Ryzen 9 7950X processor with 32 threads and high core clocks to the efficient memory and robust storage solutions, every part of this server is designed to handle demanding server applications, particularly those related to blockchain processing. The use of advanced cooling and power supply units ensures long-term sustainability and stable operation. All components were also chosen to be among the most energy-efficient solutions the market can currently provide.

BKK05: Pioneering RISC-V Debian Webhosting

Introduction

Welcome to the frontier of web hosting with our BKK05 server, a trailblazing platform powered by the RISC-V Debian. This server represents not just a shift in technology but an experimental leap into the future of open-source computing.

BKK05

Technical Specifications

| Component | Specifications |
|-----------|----------------|
| Processor | StarFive JH7110 64-bit SoC with RV64GC, up to 1.5GHz |
| Memory | LPDDR4, configurable up to 8GB |
| Storage | 2TB Samsung 980 NVMe |
| Networking | Dual RJ45 Gigabit Ethernet |
| Expansion | M.2 M-Key for NVMe SSDs |
| USB Ports | 2x USB 2.0 + 2x USB 3.0 |
| Video Out | HDMI 2.0, supporting 4K resolution |

Performance

The BKK05 server is equipped with a VisionFive 2 SBC at its core, featuring a StarFive JH7110 SoC. With 8GB of LPDDR4 RAM and a 2TB Samsung 980 NVMe drive, it's designed to handle web hosting and experimental server tasks with ease. Its RISC-V architecture ensures an open and versatile computing experience.

RISC-V Debian: The New Era

RISC-V brings a breath of fresh air to the server landscape, offering an open ISA (Instruction Set Architecture) that fosters innovation. Debian's adoption of RISC-V for our BKK05 server underpins our commitment to pioneering technology and community-driven development.

Web Hosting Capabilities

BKK05 runs Debian 12 (Bookworm), optimized for the RISC-V architecture. The server's configuration, which includes a robust 2TB NVMe drive, is particularly suited for web hosting, offering rapid data retrieval and ample storage for web applications.

Experimental Projects

The open nature of RISC-V and Debian makes BKK05 the perfect candidate for experimental projects. Its platform is ideal for developers looking to explore the capabilities of RISC-V architecture and contribute to the growth of the ecosystem.

Conclusion

The BKK05 server is a testament to our commitment to embracing open and innovative technologies. By leveraging the power of RISC-V Debian, we provide a stable and forward-thinking web hosting service while also contributing to an exciting new chapter in computing.

BKK06 - High-Performance Storage/Compute Node

BKK06

CPU

Model: AMD EPYC™ 7713

  • Core Count: 64 cores
  • Threads: 128
  • Max. Boost Clock: Up to 3.675GHz
  • Base Clock: 2.0GHz
  • L3 Cache: 256MB
  • TDP: 225W

Capabilities: Optimized for high-throughput storage operations and parallel processing workloads. The EPYC 7713 provides exceptional multi-threaded performance for distributed storage and compute tasks.

Motherboard

Model: Supermicro H11SSL-i

  • Chipset: System on Chip
  • Form Factor: E-ATX
  • Memory Slots: 8 x DIMM slots supporting DDR4
  • PCIe Slots: Multiple PCIe 3.0 slots

Memory

Model: 8x Micron 32GB ECC Registered DDR4 3200

  • Total Capacity: 256GB
  • Technology: ECC Registered for data integrity
  • Speed: 3200MHz

Storage Configuration

Pool: bkk06pool
State: ONLINE (with errors requiring attention)
Configuration: 6 mirror vdevs (12x Samsung 990 PRO 4TB NVMe)
Status: 16 data errors detected during scrub
Recent: Removal of vdev 5 completed, 93.6M memory used for mappings
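
A sketch of the usual follow-up for a pool in this state, using standard ZFS tooling (pool name taken from above):

zpool status -v bkk06pool   # lists the datasets/files affected by the reported data errors
zpool scrub bkk06pool       # re-reads and re-verifies every block in the pool
zpool clear bkk06pool       # resets the error counters once the affected data has been restored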

Networking

Primary Interfaces:

  • 2x Intel E810-C 25G QSFP ports (100G capable)
  • 2x Broadcom BCM5720 1G management ports

Anycast Configuration:

  • Participates in 100G anycast routing via BGP
  • Route reflector client with dedicated ASN
  • Q-in-Q VLAN tagging for service isolation
  • Bond interfaces: bond-bkk00 and bond-bkk20 for redundancy

Active Services (28 containers)

Primary RPC nodes for:

  • Polkadot relay chain and system parachains
  • Kusama relay chain and system parachains
  • Multiple Polkadot parachains (Moonbeam, Acala, Hydration, Bifrost, etc.)
  • Paseo testnet infrastructure
  • Penumbra node
  • HAProxy load balancer

Chassis and Power

  • Chassis: 2U Supermicro server chassis
  • PSU: Greatwall Dual PSU 2U 1+1 CRPS redundant 800W

BKK07 - High-Performance Storage/Compute Node

BKK07

CPU

Model: AMD EPYC™ 9654

  • Core Count: 96 cores
  • Threads: 192
  • Max. Boost Clock: Up to 3.7GHz
  • Base Clock: 2.4GHz
  • L3 Cache: 384MB
  • TDP: 360W (Configurable 320-400W)

Capabilities: Cutting-edge Genoa generation processor delivering exceptional performance for high-concurrency storage operations and compute-intensive workloads.

Motherboard

Model: Supermicro H13SSL-N

  • Chipset: System on Chip
  • Form Factor: ATX
  • Memory Slots: 12 x DIMM slots supporting DDR5

Memory

Model: 6x SuperMicro 64GB ECC Registered DDR5 4800

  • Current Capacity: 384GB (6 modules)
  • Technology: DDR5 ECC Registered
  • Speed: 4800MHz
  • Upgrade Path: Expandable to 12 modules for full bandwidth

Storage Configuration

Pool: bkk07pool
State: ONLINE
Configuration: 1x 12-disk RAIDZ2
- 12x Samsung 990 PRO 4TB NVMe
Redundancy: Can tolerate 2 simultaneous disk failures
Capacity: ~32TB usable (10 data + 2 parity)
Last scrub: 0 errors (July 13, 2025)

Networking

Primary Interfaces:

  • 2x Intel E810-C 25G QSFP ports (100G capable)
  • 2x Broadcom BCM5720 1G management ports

Anycast Configuration:

  • Participates in 100G anycast routing via BGP
  • Route reflector client with dedicated ASN
  • VLAN configuration: 100p1.400 for backbone connectivity
  • Dual bond interfaces for active-backup failover

Active Services (24 containers)

Secondary RPC nodes for:

  • Polkadot relay chain and all system parachains
  • Kusama relay chain and all system parachains
  • Paseo testnet (relay and system chains)
  • Penumbra node
  • IBP infrastructure
  • HAProxy load balancer
  • Build and Docker services

Board Management Controller (BMC)

  • BMC Model: Aspeed AST2600
  • Capabilities: Full remote management with KVM over IP

Chassis and Power

  • Chassis: Ultra Short 2U rackmount chassis
  • PSU: Greatwall Dual PSU 2U 1+1 CRPS redundant 800W

BKK08 - High-Performance Storage/Compute Node

BKK08

CPU

Model: AMD EPYC™ 7742

  • Core Count: 64 cores
  • Threads: 128
  • Max. Boost Clock: Up to 3.4GHz
  • Base Clock: 2.25GHz
  • L3 Cache: 256MB
  • TDP: 225W

Capabilities: Optimized for parallel processing and storage workloads, delivering excellent multi-threaded performance for distributed systems.

Motherboard

Model: Supermicro H11SSL-series

  • Chipset: System on Chip
  • Form Factor: E-ATX
  • Memory Slots: 8 x DIMM slots supporting DDR4

Memory

Configuration: 8x 32GB DDR4 ECC Registered

  • Total Capacity: 256GB
  • Speed: 3200MHz
  • Technology: ECC for data integrity

Storage Configuration

Pool: bkk08pool
State: ONLINE
Configuration: 2x 4-disk RAIDZ1 vdevs
- 8x Samsung 990 PRO 4TB NVMe total
Redundancy: Can tolerate 1 disk failure per vdev
Capacity: ~24TB usable (6 data + 2 parity)
Last scrub: 0 errors (July 13, 2025)

Networking

Primary Interfaces:

  • 2x Intel E810-C 25G QSFP ports (100G capable)
  • 2x Broadcom BCM5720 1G management ports

Anycast Configuration:

  • Participates in 100G anycast routing via BGP
  • Route reflector client with dedicated ASN
  • Redundant path configuration for high availability
  • Q-in-Q VLAN tagging for traffic isolation

Active Services (24 containers)

Tertiary/specialized nodes for:

  • Polkadot RPC and parachains (Moonbeam, Nexus, Acala, Kilt, Hydration, Bifrost, Ajuna, Polimec, Unique, XCavate, Mythos)
  • Paseo validator and RPC nodes
  • Penumbra node
  • IBP infrastructure
  • IX peering management
  • HAProxy load balancer
  • Build services and logging infrastructure

Board Management Controller (BMC)

  • BMC Model: Aspeed AST2500
  • Capabilities: Remote management with IPMI/KVM over IP

Chassis and Power

  • Chassis: Ultra Short 2U rackmount chassis
  • PSU: Greatwall Dual PSU 2U 1+1 CRPS redundant 800W

Network Architecture Overview

All three nodes (BKK06, BKK07, BKK08) participate in our advanced networking infrastructure:

Anycast Implementation

  • 100G Backbone: High-speed interconnects between storage nodes
  • BGP Routing: Each node operates as a route reflector client
  • Dedicated ASN: Independent autonomous system number per edge node
  • Service Isolation: Q-in-Q VLAN tagging for traffic segregation

Redundancy Features

  • Dual Uplinks: Each node has redundant connections to core routers
  • Active-Backup Bonds: Sub-second failover between interfaces
  • Path Diversity: Multiple routing paths for fault tolerance
  • VLAN Segmentation: Separate VLANs for management, storage, and service traffic

Performance Optimization

  • Hardware Offload: Intel E810-C NICs provide advanced packet processing
  • NUMA Awareness: Memory and network affinity for optimal performance
  • Jumbo Frames: MTU 9000 enabled on storage network (verification sketch after this list)
  • Flow Control: Configured for lossless Ethernet on storage paths
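
A minimal sketch for setting and verifying jumbo frames end to end on a host (interface name and peer address are hypothetical):

ip link set dev ens1f0 mtu 9000   # raise the interface MTU on the storage-facing NIC
ping -M do -s 8972 10.0.0.2       # 8972 bytes payload + 28 bytes headers = 9000; fails if any hop fragments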

BKK09

BKK11

BKK12

BKK13

Software Infrastructure

Our infrastructure leverages several powerful technologies and platforms to provide a robust and efficient environment for our operations.

Debian

Our servers run on Debian, a highly stable and reliable Linux-based operating system. Debian provides a strong foundation for our operations, with its wide array of packages, excellent package management system, and strong community support. Its stability and robustness make it an excellent choice for our server environments.

Proxmox Virtual Environment

We utilize Proxmox, an open-source server virtualization management solution. Proxmox allows us to manage virtual machines, containers, storage, virtualized networks, and HA clustering from a single, integrated platform. This is crucial in ensuring we have maximum control and efficiency in managing our various server processes. We run the Linux 6.1 LTS PVE kernel.
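
The running kernel and Proxmox package versions can be confirmed on each hypervisor with a quick check:

uname -r        # running kernel, expected to be the 6.1 LTS PVE build
pveversion -v   # full Proxmox VE package and version listing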

LXC (Linux Containers)

We leverage LXC (Linux Containers) to run multiple isolated Linux systems (containers) on a single host. This containerization technology provides us with lightweight, secure, and performant alternatives to full machine virtualization.

ZFS

ZFS, the Zettabyte File System, is an advanced filesystem and logical volume manager. It was designed to overcome many of the major issues found in previous designs and is used for storing data in our Proxmox environment. It provides robust data protection, supporting high storage capacities and efficient data compression, and allows us to create snapshots and clones of our filesystem.
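
A sketch of the snapshot workflow ZFS enables, for example before upgrading a container (the dataset name is hypothetical):

zfs snapshot bkk06pool/subvol-101-disk-0@pre-upgrade   # instant, space-efficient snapshot
zfs list -t snapshot                                   # list existing snapshots
zfs rollback bkk06pool/subvol-101-disk-0@pre-upgrade   # revert the dataset if the upgrade misbehaves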

Ansible

We use Ansible for automation of our system configuration and management tasks. Ansible enables us to define and deploy consistent configurations across multiple servers, and automate routine maintenance tasks, thus increasing efficiency and reducing the risk of errors.
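
As an illustration of the kind of routine task this covers, a couple of hedged ad-hoc examples (the inventory file and group names are hypothetical):

ansible all -i inventory.ini -m ping                                                     # verify SSH reachability of every host
ansible validators -i inventory.ini -m apt -a "update_cache=yes upgrade=dist" --become   # apply pending updates to a host group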

MikroTik RouterOS

Our network infrastructure relies on MikroTik RouterOS, a robust network operating system. This system offers a variety of features such as routing, firewall, bandwidth management, wireless access point, backhaul link, hotspot gateway, VPN server, and more. This helps us ensure secure, efficient, and reliable network operations.

All these technologies are intertwined, working together to support our operations. They are chosen not just for their individual capabilities, but also for their compatibility and interoperability, creating an integrated, efficient, and reliable software infrastructure.

Network Configuration Documentation

Overview

This document outlines the BGP routing configuration for our multi-homed network with connections to various internet exchanges and transit providers in Bangkok, Hong Kong, Singapore, and Europe. Currently 3x 10G fibers are used for physical uplinks.

Bandwidth

Scaling bandwidth

Network Topology Diagram

graph TD
    BKK50((BKK50 Gateway Router<br>CCR2004-16G-2S+<br>ECMP with 10G connections))
    BKK50 --> |10G| BKK20
    BKK50 --> |10G| BKK10

    subgraph BKK20[BKK20 Edge Router<br>CCR2216-1G-12XS-2XQ]
        B20_AMSIX[AMSIX-LAG<br>10G Physical Port]
    end

    subgraph BKK10[BKK10 Edge Router<br>CCR2116-12G-4S+]
        B10_AMSIX[AMSIX-LAG<br>10G Physical Port]
        B10_BKNIX[BKNIX-LAG<br>10G Physical Port]
    end

    B20_AMSIX --> |VLAN 911<br>1G| AMS_IX_BKK[AMS-IX Bangkok]
    B20_AMSIX --> |VLAN 3994<br>200M| AMS_IX_HK[AMS-IX Hong Kong]
    B20_AMSIX ==> |VLAN 2520<br>500M<br>Active| IPTX_SG[IPTX Singapore]
    B20_AMSIX -.-> |VLAN 2517<br>500M<br>Passive| IPTX_HK[IPTX Hong Kong]

    B10_AMSIX ==> |VLAN 2519<br>500M<br>Active| IPTX_HK
    B10_AMSIX -.-> |VLAN 2518<br>500M<br>Passive| IPTX_SG
    B10_AMSIX --> |VLAN 3995<br>100M| AMS_IX_EU[AMS-IX Europe]
    B10_BKNIX --> |10G| BKNIX[BKNIX]

    AMS_IX_BKK --> INTERNET((Internet))
    AMS_IX_HK --> INTERNET
    AMS_IX_EU --> INTERNET
    IPTX_SG --> INTERNET
    IPTX_HK --> INTERNET
    BKNIX --> INTERNET

    classDef router fill:#1a5f7a,color:#ffffff,stroke:#333,stroke-width:2px;
    classDef ix fill:#4d3e3e,color:#ffffff,stroke:#333,stroke-width:2px;
    classDef internet fill:#0077be,color:#ffffff,stroke:#333,stroke-width:2px;
    classDef active stroke:#00ff00,stroke-width:4px;
    classDef passive stroke:#ff0000,stroke-dasharray: 5 5;

    class BKK50,INTERNET internet;
    class BKK20,BKK10 router;
    class AMS_IX_BKK,AMS_IX_HK,AMS_IX_EU,IPTX_SG,IPTX_HK,BKNIX ix;

    linkStyle default stroke:#ffffff,stroke-width:2px;

Routing Configuration

| Name | Speed | Path Prepend | MED | Local Pref | External Communities | Description | Router |
|------|-------|--------------|-----|------------|----------------------|-------------|--------|
| BKNIX | 10G | 0 | 10 | 200 | 142108:1:764 | Local IX | BKK10 |
| AMS-IX Bangkok | 1G | 0 | 20 | 190 | 142108:1:764 | Local IX | BKK20 |
| IPTX Singapore-BKK10 | 500M | 1 | 50 | 170 | 142108:2:35 | Regional Transit | BKK10 |
| IPTX Singapore-BKK20 | 500M | 1 | 50 | 160 | 142108:2:35 | Regional Transit | BKK20 |
| AMS-IX Hong Kong | 200M | 2 | 100 | 150 | 142108:2:142 | Regional IX | BKK20 |
| IPTX HK-BKK20 | 500M | 2 | 100 | 130 | 142108:2:142 | Regional Transit | BKK20 |
| IPTX HK-BKK10 | 500M | 2 | 100 | 120 | 142108:2:142 | Regional Transit | BKK10 |
| AMS-IX EU | 100M | 3 | 200 | 100 | 142108:2:142 | Remote IX | BKK10 |

This way:

  1. We keep our internal communities for our own routing decisions
  2. We share geographic information allowing peers to optimize their routing
  3. Peers can make informed decisions about traffic paths to our network

Traffic Engineering Principles

  1. Local Preference: Higher values indicate more preferred routes. Local routes are preferred over regional, which are preferred over remote routes.
  2. MED (Multi-Exit Discriminator): Lower values are preferred. Used to influence inbound traffic when other attributes are equal.
  3. AS Path Prepending: Increases AS path length to make a route less preferred. Used for coarse control of inbound traffic.

HGC Circuit Configuration for ROTKO NETWORKS OÜ

The following circuits terminate at STT Bangkok 1 Data Center, MMR 3A facility in Bangkok, Thailand. All circuits are delivered to ROTKO NETWORKS OÜ infrastructure.

Primary Transit Circuits

Circuit PP9094729 (OTT00003347)

  • Service: 400M (Burst to 800M) Thailand IPTx FOB Hong Kong
  • Delivery: Customer Headend #2 (PEH8001159)
  • VLAN: 2519
  • Status: Hot upgrade + renewal

Circuit PP9094730 (OTT00003348)

  • Service: 400M (Burst to 800M) Thailand IPTx FOB Singapore
  • Delivery: Customer Headend #1 (PEH8001158)
  • VLAN: 2520
  • Status: Hot upgrade + renewal

Backup Circuits

Circuit PP9094735 (OTT00003349)

  • Service: 800M Leased Line Internet Service FOB Thailand (Backup)
  • Delivery: Customer Headend #1 (PEH8001158)
  • VLAN: 2517
  • Status: Hot upgrade + renewal

Circuit PP9094736 (OTT00003350)

  • Service: 800M Leased Line Internet Service FOB Thailand (Backup)
  • Delivery: Customer Headend #2 (PEH8001159)
  • VLAN: 2518
  • Status: Hot upgrade + renewal

Configuration Validation

BKK00: HK-HGC-IPTx-vlan2519 (Primary HK - PEH8001159)
BKK00: SG-HGC-IPTx-backup-vlan2518 (Backup SG - PEH8001159)
BKK20: SG-HGC-IPTx-vlan2520 (Primary SG - PEH8001158)  
BKK20: HK-HGC-IPTx-backup-vlan2517 (Backup HK - PEH8001158)

IBP Rank 6 Network Infrastructure Documentation

ROTKO NETWORKS OÜ (AS142108)

Executive Summary

ROTKO NETWORKS operates a highly redundant, carrier-grade network infrastructure in Bangkok, Thailand, designed to meet and exceed IBP Rank 6 requirements. Our infrastructure features full redundancy at every layer - from multiple 10G/100G transit connections to redundant route reflectors, hypervisors, and storage systems.

Network Architecture Overview

Core Routing Infrastructure

Our network utilizes a hierarchical BGP architecture with dedicated route reflectors providing redundancy and scalability:

Route Reflectors (Core Routers)

  • BKK00 (CCR2216-1G-12XS-2XQ) - Primary Route Reflector

    • Router ID: 10.155.255.4
    • 100G connections to edge routers
    • Multiple 10G transit/IX connections
  • BKK20 (CCR2216-1G-12XS-2XQ) - Secondary Route Reflector

    • Router ID: 10.155.255.2
    • 100G connections to edge routers
    • Diverse 10G transit/IX connections

Both route reflectors run iBGP with full mesh connectivity and serve as aggregation points for external BGP sessions.

Layer 2 Switching Infrastructure

Core Switches

  • BKK30 (CRS504-4XQ) - 4x100G QSFP28 switch
  • BKK40 (CRS504-4XQ) - 4x100G QSFP28 switch
  • BKK10 (CCR2116-12G-4S+) - 12x 1G + 4x 10G SFP+ (fallback for 100G)
  • BKK60 (CRS354-48G-4S+2Q+) - 48x1G + 4xSFP+ + 2xQSFP+ switch (reserved)
  • BKK50 (CCR2004-16G-2S+) - Management network

These switches provide Q-in-Q VLAN tagging for service isolation and traffic segregation.

Multi-Layer Redundancy Architecture

1. Physical Layer Redundancy

Diverse Fiber Paths

  • 3x Independent 10G fiber uplinks from different carriers
  • Physically diverse cable routes to prevent simultaneous cuts
  • Multiple cross-connects within the data center
  • Dual power feeds to all critical equipment

Hardware Redundancy

  • Dual PSU on all servers and network equipment
  • Hot-swappable components (fans, PSUs, drives)
  • N+1 cooling in server chassis
  • Spare equipment pre-racked for rapid replacement

2. Network Layer Redundancy

Transit Diversity

Our multi-homed architecture eliminates single points of failure:

Primary Transit - HGC (AS9304)

Hong Kong Path:
├── Active: BKK00 → VLAN 2519 → 400M (burst 800M)
└── Backup: BKK20 → VLAN 2517 → 800M standby

Singapore Path:
├── Active: BKK20 → VLAN 2520 → 400M (burst 800M)
└── Backup: BKK00 → VLAN 2518 → 800M standby

Internet Exchange Diversity

Local (Thailand):
├── BKNIX: 10G direct peering (200+ networks)
└── AMS-IX Bangkok: 1G peering (100+ networks)

Regional:
├── AMS-IX Hong Kong: 200M connection
└── HGC IPTx: Dual paths to HK/SG

Global:
└── AMS-IX Europe: 100M connection

BGP Redundancy Features

  • Dual Route Reflectors: Automatic failover with iBGP
  • BFD Detection: 100ms failure detection
  • Graceful Restart: Maintains forwarding during control plane restart
  • Path Diversity: Multiple valid routes to every destination
  • ECMP Load Balancing: Traffic distributed across equal-cost paths

3. Hypervisor Layer Redundancy

Proxmox Cluster Features

High Availability Configuration:
├── Live Migration: Zero-downtime VM movement
├── HA Manager: Automatic VM restart on node failure
├── Shared Storage: Distributed access to VM data
└── Fencing: Ensures failed nodes are isolated

Per-Hypervisor Network Redundancy

Each hypervisor (BKK06, BKK07, BKK08) implements:

Dual Uplink Configuration:
├── bond-bkk00 (Active Path)
│   ├── Primary: vlan1X7 @ 100G
│   └── Backup: vlan1X7 @ 100G (different switch)
└── bond-bkk20 (Standby Path)
    ├── Primary: vlan2X7 @ 100G
    └── Backup: vlan2X7 @ 100G (different switch)

Failover Time: <100ms using active-backup bonding
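
A minimal sketch of an equivalent active-backup bond built with iproute2 (interface names are hypothetical; in practice the bonds are defined persistently in the Proxmox network configuration):

ip link add bond-bkk00 type bond mode active-backup miimon 100   # 100 ms link monitoring
ip link set ens1f0 down
ip link set ens1f0 master bond-bkk00
ip link set ens1f1 down
ip link set ens1f1 master bond-bkk00
ip link set bond-bkk00 up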

4. Storage Layer Redundancy

ZFS Resilience by Node

BKK06 - Maximum Redundancy (Mirror)

Configuration: 6x mirror vdevs (2-way mirrors)
Fault Tolerance: 
- Can lose 1 disk per mirror (up to 6 disks total)
- Instant read performance during rebuild
- 50% storage efficiency for maximum protection
Recovery: Hot spare activation < 1 minute

BKK07 - Balanced Redundancy (RAIDZ2)

Configuration: 12-disk RAIDZ2
Fault Tolerance:
- Can lose any 2 disks simultaneously
- Maintains full operation during rebuild
- 83% storage efficiency
Recovery: Distributed parity reconstruction

BKK08 - Performance Redundancy (RAIDZ1)

Configuration: 2x 4-disk RAIDZ1 vdevs
Fault Tolerance:
- Can lose 1 disk per vdev (2 disks total)
- Parallel vdev operation for performance
- 75% storage efficiency
Recovery: Per-vdev independent rebuild

5. Service Layer Redundancy

Container Distribution Strategy

Service Deployment:
├── Primary Instance: BKK06
├── Secondary Instance: BKK07
├── Tertiary/Specialized: BKK08
└── Load Balancing: HAProxy on each node

Failover Mechanism:
- Health checks every 2 seconds
- Automatic traffic rerouting
- Session persistence via cookies
- <5 second service failover

Anycast Services

Global Anycast: 160.22.180.180/32
├── Announced from all locations
├── BGP-based geographic routing
└── Automatic closest-node selection

Local Anycast: 160.22.181.81/32
├── Thailand-specific services
├── Lower latency for regional users
└── Fallback to global on failure
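
On a serving node, one common way to realise this is to bind the anycast addresses to the loopback interface and let the routing daemon announce them over BGP (addresses taken from above; the BGP announcement itself lives in the router/daemon configuration):

ip addr add 160.22.180.180/32 dev lo   # global anycast service address
ip addr add 160.22.181.81/32 dev lo    # Thailand-local anycast service address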

6. Failure Scenarios & Recovery

Single Component Failures (No Impact)

  • 1 Transit Link Down: Traffic reroutes via alternate transit
  • 1 Route Reflector Down: Secondary handles all traffic
  • 1 Hypervisor Down: VMs migrate or run from replicas
  • 1 Disk Failure: ZFS continues with degraded redundancy
  • 1 PSU Failure: Secondary PSU maintains operation

Multiple Component Failures (Degraded but Operational)

  • Both HK Links Down: Singapore paths maintain connectivity
  • Entire Hypervisor Failure: Services run from remaining 2 nodes
  • Multiple Disk Failures: ZFS tolerates per design limits
  • Primary + Backup Network Path: Tertiary paths available

Disaster Recovery

  • Complete DC Failure:
    • Off-site backups available
    • DNS failover to alternate regions
    • Recovery Time Objective (RTO): 4 hours
    • Recovery Point Objective (RPO): 1 hour

Redundancy Validation & Testing

Automated Testing

  • BGP Session Monitoring: Real-time alerting on session drops
  • Path Validation: Continuous reachability tests
  • Storage Health: ZFS scrubs weekly, SMART monitoring
  • Service Health: Prometheus + Grafana dashboards

Manual Testing Schedule

  • Monthly: Controlled failover testing
  • Quarterly: Full redundancy validation
  • Annually: Disaster recovery drill

Compliance with IBP Rank 6 Requirements

✓ No Single Point of Failure

Every critical component has at least one backup:

  • Dual route reflectors with automatic failover
  • Multiple transit providers and exchange points
  • Redundant power, cooling, and network paths
  • Distributed storage with multi-disk fault tolerance

✓ Sub-Second Network Convergence

  • BFD: 100ms detection + 200ms convergence = 300ms total
  • Bond failover: <100ms for layer 2 switchover
  • BGP: Graceful restart maintains forwarding plane

✓ Geographic & Provider Diversity

  • Transit via Hong Kong and Singapore
  • Peering in Bangkok, Hong Kong, and Amsterdam
  • Multiple submarine cable systems
  • Carrier-neutral facility

✓ Automated Recovery

  • HA cluster manages VM availability
  • BGP automatically selects best paths
  • ZFS self-healing with checksum validation
  • Container orchestration via systemd

This comprehensive redundancy architecture ensures 99.95% uptime SLA compliance and exceeds all IBP Rank 6 requirements for infrastructure resilience.

Firmware Updates

Samsung 980Pro NVMe

wget https://semiconductor.samsung.com/resources/software-resources/Samsung_SSD_980_PRO_5B2QGXA7.iso
apt-get -y install gzip unzip wget cpio
mkdir /mnt/iso
sudo mount -o loop ./Samsung_SSD_980_PRO_5B2QGXA7.iso /mnt/iso/
mkdir /tmp/fwupdate
cd /tmp/fwupdate
gzip -dc /mnt/iso/initrd | cpio -idv --no-absolute-filenames
cd root/fumagician/
sudo ./fumagician

This ISO is for the 980 Pro; if you have a different model, replace the ISO with the corresponding download from https://semiconductor.samsung.com/consumer-storage/support/tools/

ASRock Rack Motherboard

This guide outlines the steps to update the firmware on your ASRock Rack motherboard. The update includes the BIOS, BMC (Baseboard Management Controller), and the networking firmware.

Before starting, download the following files:

BIOS Update

ASRock Rack provides a BIOS flash utility called ASRock Rack Instant Flash, embedded in the Flash ROM, to make the BIOS update process simple and straightforward.

  1. Preparation: Format a USB flash drive with FAT32/16/12 file system and save the new BIOS file to your USB flash drive.

  2. Access ASRock Rack Instant Flash: Restart the server and press the <F6> key during the POST or the <F2> key to enter the BIOS setup menu. From there, access the ASRock Rack Instant Flash utility.

  3. Update BIOS: Follow the instructions provided by the utility to update the BIOS.

BMC Firmware Update

In order to keep your BMC firmware up-to-date and have the latest features and improvements, regular updates are recommended. This guide provides step-by-step instructions on how to update your BMC firmware.

  1. Preparation: Download the correct BMC firmware update file from the ASRock Rack website. Ensure the firmware version is later than the one currently installed on your device. Save the firmware file on your local system.

  2. Access BMC Maintenance Portal: Open your web browser and navigate to the BMC maintenance portal using the IP address of the BMC. Typically, the URL is https://[BMC IP Address]/#maintenance/firmware_update_wizard, for instance, https://192.168.33.114/#maintenance/firmware_update_wizard.

  3. Login: Use your BMC username and password to log into the portal.

  4. Firmware Update Section: Navigate to the firmware update section.

  5. Upload Firmware Image: Click on "Select Firmware Image" and upload the firmware file you downloaded earlier. The firmware files typically end with .ima. For instance, B650D4U_2L2T_4.01.00.ima.

  6. Preserve Configuration: If you want to preserve all the current configurations during the update, check the box "Preserve all Configuration". This will maintain all the settings irrespective of the individual items marked as preserve/overwrite in the table below.

  7. Start Update: Click "Firmware Update". The system will validate the image and if successful, the update process will start. The progress will be shown on the screen.

  8. Reboot: Once the update is completed, the system will reboot automatically.

WARNING: Please note that after entering the update mode, other web pages, widgets, and services will not work. All the open widgets will be automatically closed. If the update is cancelled in the middle of the process, the device will be reset only for BMC BOOT, and APP components of Firmware.

NOTE: The IP address used in this guide is an example. Replace it with the actual IP address of your BMC. Also, remember to use a reliable network connection during the update process to prevent any interruption.

Networking Firmware Update

  1. Preparation: Format a USB flash drive and copy the "bnxtmt-lite-uefi-221.0.117.0-x64" folder from the downloaded Broadcom 57416 LAN Flashing Firmware onto it.

  2. Access UEFI Shell: Insert the USB flash drive into your ASRock Rack server board, reboot the server, and hit F11 at the POST screen. Select "UEFI: Built-in EFI Shell". The USB flash drive should be named FS0:. Type FS0: and press enter.

    UEFI Shell

  3. Verify Current Firmware Version: Execute bnxtmt.efi to display the current firmware version.

    bnxtmt efi execution

  4. Erase the Flash: Navigate to the "bnxtmt-lite-uefi-221.0.117.0-x64" folder and execute bnxtmt.efi -sysop -none -no_swap -blank to erase the current firmware.

    Erase Flash

  5. Flash the Firmware: Execute bnxtmt.efi -sysop -none -no_swap -fnvm ASRR57416-2T.PKG -log FW1.log to flash the new firmware. Here, ASRR57416-2T.PKG is the firmware file.

    Flash Firmware

  6. Verify Updated Firmware Version: Run bnxtmt.efi again to verify the new firmware version.

    bnxtmt efi execution

  7. Flash LAN Ports' MAC Addresses: Execute bnxtmt.efi -none -m -log MAC1.log.

    Flash MAC

  8. Input MAC Addresses: Enter the MAC addresses of both LAN ports when prompted. Write down these addresses beforehand.

    Enter MAC Addresses

  9. Power Cycle: Turn off the system, power cycle the PSU, and then power everything back on.

Note: The MAC addresses for your LAN ports are crucial. Write them down before starting the update process as they need to be added during the command at step 8.

Proxmox Network Configuration Guide

Overview

Networking in Proxmox is managed through the Debian network interface configuration file at /etc/network/interfaces. This guide will walk you through the process of configuring the network interfaces and creating a Linux bridge for your Proxmox server.

Pre-requisites:

Before we begin, you should have:

  1. A Proxmox VE installed and configured on your server.
  2. Administrative or root access to your Proxmox VE server.

Step 1: Understand Proxmox Network Configuration Files

Proxmox network settings are mainly configured in two files:

  • /etc/network/interfaces: This file describes the network interfaces available on your system and how to activate them. This file is critical for setting up bridged networking or configuring network interfaces manually.
  • /etc/hosts: This file contains IP address to hostname mappings.

Step 2: Configure Primary Network Interface

First, open the network interfaces configuration file for editing and set your primary network interface (e.g., enp9s0) to manual:

nano /etc/network/interfaces
auto enp9s0
iface enp9s0 inet manual

Step 3: Configure Linux Bridge

Next, create a Linux bridge (vmbr0):

auto vmbr0
iface vmbr0 inet static
    address 192.168.69.103
    netmask 255.255.255.0
    gateway 192.168.69.1
    bridge_ports enp9s0
    bridge_stp off
    bridge_fd 0

Make sure to replace the address, netmask, and gateway parameters with the correct values for your network.

Step 4: Apply Configuration

Save and exit the file, then restart the network service for the changes to take effect:

systemctl restart networking.service
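
On hosts using ifupdown2 (shipped with recent Proxmox VE releases), the configuration can instead be applied without restarting the whole networking service:

ifreload -a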

Step 5: Verify Configuration

Use the ip a command to verify that the bridge was created successfully:

ip a

Step 6: Configure the Hosts File

The /etc/hosts file maps network addresses to hostnames. Open this file in a text editor:

nano /etc/hosts

Then, define the IP address and corresponding FQDN and hostname for your Proxmox server:

127.0.0.1	localhost
192.168.69.103	bkk03.yourdomain.com	bkk03

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Remember to replace 192.168.69.103, bkk03.yourdomain.com, and bkk03 with your server's IP address, FQDN, and hostname, respectively.

After updating the /etc/hosts file, save and exit the editor.

Important: Ensure the FQDN in your /etc/hosts matches the actual FQDN of your server. This FQDN should be resolvable from the server itself and any machines that will be accessing it. The Proxmox web interface uses this hostname to generate SSL certificates for the HTTPS interface, so incorrect resolution may lead to certificate issues.

By carefully following the instructions provided in this guide, administrators can ensure a robust and secure networking setup for their Proxmox servers. This guide should provide a good starting point for both new and experienced Proxmox administrators to understand and manage the network settings of their servers effectively.

Troubleshooting

If you run into issues during this process, you can use the following commands to troubleshoot:

  • systemctl status networking.service: Displays the status of the networking service.

  • journalctl -xeu networking.service: Provides detailed logs for the networking service.

  • ip addr flush dev <interface> and ip route flush dev <interface>: Clears IP addresses and routes on a given interface.

  • ip link delete <bridge> and ip link add name <bridge> type bridge: Deletes and recreates a bridge.

  • ip link set <interface> master <bridge>: Assigns an interface to a bridge.

  • ip addr add <ip>/<subnet> dev <bridge>: Assigns an IP address to a bridge.

Remember to replace <interface>, <bridge>, <ip>, and <subnet> with the appropriate values for your network.

For more detailed information about Proxmox networking, refer to the official Proxmox documentation.

Filesystem

Blockchain nodes, such as validators and archive nodes, require a highly reliable and efficient filesystem to operate effectively. The choice of filesystem can significantly affect the performance and reliability of these nodes. In light of performance concerns with ZFS, especially in ParityDB workloads as discussed in paritytech/polkadot-sdk/pull/1792, this guide provides a detailed approach to configuring a robust filesystem in Proxmox.

Filesystem Choices and Their Impact

The extensive use of I/O operations by blockchain nodes means the filesystem must manage write and read operations efficiently. CoW filesystems, while feature-rich and robust, introduce overhead that can degrade performance, as evidenced by the cited benchmarks.
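
To reproduce this comparison on your own hardware, a small mixed random read/write test with fio gives a rough approximation of a ParityDB-style workload; the target path, size, and runtime below are illustrative only:

fio --name=trie-like --filename=/mnt/data_volume/fio-test \
    --size=8G --rw=randrw --rwmixread=75 --bs=4k \
    --ioengine=libaio --iodepth=64 --direct=1 \
    --time_based --runtime=120 --group_reporting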

Why Not ZFS or Btrfs for Blockchain Nodes?

  • ZFS: While ZFS is revered for its data integrity, the added overhead from features like snapshotting, checksums, and the dynamic block size can significantly slow down write operations crucial for blockchain databases.
  • Btrfs: Similar to ZFS, Btrfs offers advanced features such as snapshotting and volume management. However, its CoW nature means it can suffer from fragmentation and performance degradation over time, which is less than ideal for write-intensive blockchain operations.

Given these insights, a move towards a more traditional, performant, and linear filesystem is recommended.

For high I/O workloads such as those handled by blockchain validators and archive nodes, LVM-thin provisioned with ext4 stands out:

  • ext4: Offers a stable and linear write performance, which is critical for the high transaction throughput of blockchain applications.
  • LVM-thin: Allows for flexible disk space allocation, providing the benefits of thin provisioning such as snapshotting and easier resizing without the CoW overhead.

Strategic Partitioning for Maximum Reliability and Performance

A well-thought-out partitioning scheme is crucial for maintaining data integrity and ensuring high availability.

RAID 1 Configuration for the Root Partition

Using a RAID 1 setup for the root partition provides mirroring of data across two disks, thus ensuring that the system can continue to operate even if one disk fails.

Implementing RAID 1:

  1. Disk Preparation:

    • Select two identical disks (e.g., /dev/sda and /dev/sdb).
    • Partition both disks with an identical layout, reserving space for the root partition.
  2. RAID Array Creation:

    • Execute the command to create the RAID 1 array:
      mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
      
    • Format the RAID array with a resilient filesystem like ext4:
      mkfs.ext4 /dev/md0
      
    • Mount the RAID array at the root mount point during the Proxmox installation or manually afterward.

Boot Partition Configuration

Having two separate boot partitions provides redundancy, ensuring the system remains bootable in the event of a primary boot partition failure.

Configuring Boot Partitions:

  • Primary Boot Partition:

    • On the first disk, create a boot partition (e.g., /dev/sda2).
    • Install the bootloader and kernel images here.
  • Fallback Boot Partition:

    • Mirror the primary boot partition to a second disk (e.g., /dev/sdb2).
    • Configure the bootloader to fall back to this partition if the primary boot fails.

LVM-Thin Provisioning on Data Disks

LVM-thin provisioning is recommended for managing data disks. It allows for efficient disk space utilization by provisioning "thin" volumes that can be expanded dynamically as needed.

Steps for LVM-Thin Provisioning:

  1. Initialize LVM Physical Volumes:

    • Use the pvcreate command on the designated data disks:
      pvcreate /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
      
  2. Create a Volume Group:

    • Group the initialized disks into a volume group:
      vgcreate vg_data /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
      
  3. Establish a Thin Pool:

    • Create a thin pool within the volume group to hold the thin volumes:
      lvcreate --size 100G --thinpool data_tpool vg_data
      
  4. Provision Thin Volumes:

    • Create thin volumes from the pool as needed for containers or virtual machines:
      lvcreate --virtualsize 500G --thinpool data_tpool vg_data --name data_volume
      
  5. Format and Mount Thin Volumes:

    • Format the volumes with a filesystem, such as ext4, and mount them:
      mkfs.ext4 /dev/vg_data/data_volume
      mount /dev/vg_data/data_volume /mnt/data_volume
      

Integrating LVM-Thin Volumes with Proxmox

Proxmox's pct command-line tool can manage container storage by mapping LVM-thin volumes to container mount points.
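
One common pattern is to register the thin pool as a Proxmox storage backend and, where a pre-formatted volume is needed, bind-mount it into a container. A minimal sketch; the storage id, container id, and paths are illustrative values:

# register the thin pool as a Proxmox storage backend
pvesm add lvmthin data-thin --vgname vg_data --thinpool data_tpool --content rootdir,images

# or bind-mount an already formatted thin volume into an existing container (CT 101)
pct set 101 -mp0 /mnt/data_volume,mp=/data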

Unlabored

Proxmox infrastructure can be deployed effortlessly using a comprehensive collection of playbooks and roles for deploying the complete infrastructure as code; these can now be found on GitHub.

Minimizing Our Carbon Footprint

In the relentless march of blockchain technology, we've chosen a path less trodden. Our commitment isn't just to innovation, but to the delicate balance between progress and planetary health.

Polkadot

We've anchored our on-chain operations to the Polkadot network. It's not merely a blockchain; it's a testament to what's possible when brilliant minds converge on the problem of sustainable computation. Polkadot doesn't just process transactions—it redefines the very notion of blockchain efficiency.

The Migration to Conscious Computing

Our servers have found a new home in STT's cutting-edge data center. It's more than a facility; it's a bold statement against the status quo of energy-hungry tech. By 2030, it aims to erase its carbon footprint entirely. Today, it already neutralizes 100% of its emissions—a rare beacon of responsibility in our industry.

The Numbers Don't Lie

Our current rack sips a mere 595.1 kWh monthly. In Thailand's energy landscape, that translates to 335 kg of CO2 equivalent. It's a number we're not satisfied with, but one we're actively working to drive down to zero.

The Road Ahead

Our journey with STT and Polkadot isn't just about ticking boxes or greenwashing our image. It's a fundamental shift in how we view the relationship between technology and our planet. We're not waiting for regulations or market pressures—we're driving change from within.

First TerraPass certificate

Team

Tommi, the founder of Rotko Network, represents a generation that mastered the QWERTY keyboard before perfecting handwriting. With over two decades of experience in building hardware, software, and managing servers, Tommi's journey through the digital landscape is as old as the commercial internet itself.

His adventure with Bitcoin began in 2008, running the first version on a Pentium D950. However, it wasn't until 2013, when Snowden's revelations confirmed his most paranoid thoughts about mass surveillance, that Tommi fully grasped the importance of decentralized systems. He realized that the internet, once a user-driven landscape, had fallen under the control of a handful of corporations, compromising user privacy and freedom.

This eye-opening realization led to the creation of Rotko Network. Tommi's mission is to reshape the internet into a space truly owned by its users, where privacy is fundamental and centralized control is minimized. With the dedication of a seasoned software enthusiast and the heart of a digital freedom fighter, Tommi stands at the helm of this initiative, working to end data exploitation and build a user-centric, user-owned internet infrastructure.

Meet Dan, a bona fide wizard of programming who cut his teeth coding back in the 90s, drawing inspiration from the music demoscene. With more than two decades under his belt, he's a seasoned veteran who understands the ins and outs of the game.

Just like many of us old-timers, he has a deep appreciation for functional programming and a penchant for clarity in code, with Rust being his go-to tool. He's got this knack for building software that runs as close to the metal as possible, extracting every bit of performance he can get.

One of his remarkable feats is constructing intricate drum machines entirely from scratch, a testament to his understanding of complex systems and algorithmic creativity. He's not just a coder; he's a craftsman.

Meet Al, our SEA timezone DevOps maestro, whose journey from pure mathematics to the world of backend development and deployment is as fascinating as a well-optimized algorithm. With a fresh master's degree in mathematics, Al's love affair with Linux was the plot twist that redirected his career path from abstract theorems to applied mathematics in the form of code.

While his classmates were wrestling with complex integrals, Al was falling head over heels for the elegance of Linux systems. His setup is a masterpiece of minimalism and functionality, reflecting the same precision he once applied to mathematical proofs. You might catch him explaining load balancing algorithms with the same enthusiasm he once reserved for discussing the Riemann hypothesis.

But it's in the realm of deployment where Al truly shines. He treats our infrastructure like a complex equation, constantly seeking the most elegant solution. His latest obsession? Exploring how NixOS can bring the immutability and reproducibility of mathematical constants to our systems. Al's unique background brings a fresh perspective to our team, proving that in the world of tech, a solid foundation in mathematical thinking is an invaluable variable in the equation of success.

Walt, our Americas timezone NOC virtuoso, is the digital equivalent of a Swiss Army knife - versatile, reliable, and always ready to tackle the unexpected. With a rich background in full-stack development and systems administration, Walt brings a unique perspective that bridges the often treacherous gap between application development and infrastructure management. His expertise is a tapestry woven from threads of Linux wizardry, Docker sorcery, and cloud platform mastery, creating a skillset that's as diverse as it is deep.

In the realm of automation, Walt is nothing short of a maestro. Armed with Python and Bash as his instruments, he orchestrates symphonies of scripts that turn complex, time-consuming tasks into elegant, efficient processes. But Walt's true superpower lies in his approach to problem-solving. Where others see insurmountable obstacles, Walt sees puzzles waiting to be solved, often conjuring creative solutions that leave the rest of the team wondering if he's secretly a tech wizard in disguise.

With over 40 years of diverse and significant technical experience, Mikko is the ideal advisor for creating Internet and networking infrastructure at Rotko Networks. His expertise spans across all technical layers, from layer 1 hardware programming to layer 7 application interfaces, making him an essential asset, especially considering the CEO's top-to-bottom learning path.

His technical journey began in the mid-1980s at Nokia Mobile Phones, where he hand-wrote UI code in NEC assembly without a compiler, demonstrating a profound understanding of low-level programming. His most notable achievement at Nokia was the invention of the menu buttons on the display, a pioneering feature that has become ubiquitous in mobile user interfaces.

One of his most noteworthy roles was serving as IT Manager at the University of Turku, where he was responsible for managing and upgrading the entire IT infrastructure, including a modern data center and network services. He implemented crucial projects such as data center upgrades, WLAN enhancements, and network topology redesigns, and developed vital services such as private cloud storage and learning platforms.

His profound knowledge of technologies like Novell NetWare, AD, MS Exchange, backup and storage systems, IIS, ISA Firewall, DNS, and DHCP, coupled with his broad understanding of both low-level and high-level systems, makes him a tremendous asset for Rotko Networks. His broad and deep technical expertise ensures he will provide significant guidance in building a robust and efficient Internet and networking infrastructure.

Resources

Monitoring

  • speedtest-go - A simple command line interface for testing internet bandwidth globally
  • vaping - A network monitoring framework written in Python
  • smokeping - Latency Monitor & Packet Loss Tracker
  • bird - BIRD Internet Routing Daemon
  • Atlas Probe - RIPE Atlas Software Probe
  • gatus - Service health dashboard

Alerts

  • pushover - Multidevice Real-time notifications/alerts
  • snitch - Watchdog for prometheus alerts

Web Tools

Post Mortems

Why We Write Postmortems

At Rotko Network, we believe in radical transparency. While it's common in our industry to see providers minimize their technical issues or deflect blame onto others, we choose a different path. Every failure is an opportunity to learn and improve - not just for us, but for the broader network engineering community.

We've observed a concerning trend where major providers often:

  • Minimize the scope of incidents
  • Provide vague technical details
  • Deflect responsibility to third parties
  • Hide valuable learning opportunities

A prime example of this behavior can be seen in the October 2024 OVHcloud incident, where their initial response blamed a "peering partner" without acknowledging the underlying architectural vulnerabilities (lack of basic filtering) that allowed the route leak to cause such significant impact.

In contrast, our postmortems:

  • Provide detailed technical analysis
  • Acknowledge our mistakes openly
  • Share our learnings
  • Document both immediate fixes and longer-term improvements
  • Include specific timeline data for accountability
  • Reference relevant RFCs and technical standards

Directory

2024

  • 2024-12-19: Edge Router Configuration Incident
    • Impact: 95-minute connectivity loss affecting AMSIX, BKNIX, and HGC Hong Kong IPTx
    • Root Cause: Misconceptions about router-id uniqueness requirements and OSPF behavior
    • Status: Partial resolution, follow-up planned for 2025

2025

  • 2025-03-06: Router Misconfiguration Incident
    • Impact: 18-minute network outage causing lower era points for validators and downtime for RPC nodes
    • Root Cause: Misconfiguration between a router and a switch
    • Status: Fully recovered

Network Outage Postmortem (2024-12-19/20)

Summary

A planned intervention to standardize router-id configurations across our edge routing infrastructure resulted in an unexpected connectivity loss affecting our AMSIX Amsterdam, BKNIX, and HGC Hong Kong IPTx peering sessions. The incident lasted approximately 95 minutes (23:55 UTC to 01:30 UTC) and impacted our validator performance on both Kusama and Polkadot networks. Specifically, this resulted in missed votes during Kusama Session 44,359 at Era 7,496 and Polkadot Session 10,010 at Era 1,662 with a 0.624 MVR (missed vote ratio).

Misconfiguration illustrated

Technical Details

The root cause was traced to an attempt to resolve a pre-existing routing anomaly where our edge routers were operating with multiple router-ids across different uplink connections and iBGP sessions. The heterogeneous router-id configuration had been causing nexthop resolution failures and an inability to carry transit traffic in our BGP infrastructure.

The original misconfiguration stemmed from an incorrect assumption that router-ids needed to be publicly unique at Internet exchange points. This is not the case - router-ids only need to be unique within our Interior Gateway Protocol (IGP) domain. This misunderstanding led to the implementation of multiple router-ids in loopback interfaces, creating unnecessary complexity in our routing infrastructure.
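
For reference, in a BIRD-based route server (which we plan to evaluate, see Future Work) the router-id is a single global statement, unique only within the IGP domain; the address below is illustrative:

# bird.conf
router id 10.155.255.1;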

During the remediation attempt to standardize OSPF router-ids to a uniform value across the infrastructure, we encountered an unexpected failure mode that propagated through our second edge router, resulting in a total loss of connectivity despite router and uplink redundancy. The exact mechanism of the secondary failure remains under investigation - the cascade effect that caused our redundant edge router to lose connectivity suggests an underlying architectural vulnerability in our BGP session management.

Response Timeline

  • 23:55 UTC: Initiated planned router-id standardization
  • ~23:56 UTC: Primary connectivity loss detected
  • ~23:56 UTC: Secondary edge router unexpectedly lost connectivity
  • 01:30 UTC: Full service restored via configuration rollback

Mitigation

Recovery was achieved through an onsite restoration of backed-up router configurations. While this approach was successful, the 95-minute resolution time indicates a need for more robust rollback procedures and, above all, greater caution during network configuration changes.

Impact

  • Kusama validator session 44,359 experienced degraded performance with MVR 1 in Era 7,496 and missed votes in Era 7,495
  • Polkadot validator session 10,010 experienced degraded performance with 0.624 MVR in Era 1,662
  • Temporary loss of peering sessions with AMSIX, BKNIX, and HGC Hong Kong IPTx

Correct connectivity

Current Status and Future Plans

The underlying routing issue (multiple router-ids in loopback) remains unresolved. Due to the maintenance freeze in internet exchanges during the holiday period, the resolution has been postponed until next year. To ensure higher redundancy during the next maintenance window, we plan to install a third edge router before attempting the configuration standardization again.

Future Work

  1. Implementation of automated configuration validation testing
  2. Enforce usage of Safe Mode during remote maintenance to prevent cascading failures
  3. Investigation into BGP session interdependencies between edge routers
  4. Read RFC 2328 to understand the actual protocol and how vendor implementations differ
  5. Installation and configuration of third edge router to provide N+2 redundancy during upcoming maintenance
  6. Study route reflector architecture to move route management from edge routers to a centralized route server such as BIRD, which is known for the correctness of its RFC implementations.
  7. Implementation of RFC 8195 for improved traffic steering via large BGP communities

Network Outage Postmortem (2025-03-06)

Summary

On March 6, 2025, at approximately 09:31 UTC, a network misconfiguration on a router caused an outage, rendering the router and other dependent infrastructure unreachable. Reversing the misconfiguration did not immediately restore network connectivity, leading to 18 minutes of downtime until a router reboot resolved the issue. The outage resulted in lower era points for our validators, while RPC node downtime stayed within our SLA, which allows roughly 40 minutes of weekly downtime.

Technical Details

The issue arose while configuring our router and switch infrastructure. Approximately 1-3 minutes after applying the suspected change, the infrastructure became unreachable over SSH, and our SSH proxy sits inside that same infrastructure. After establishing a WireGuard connection, we were able to SSH into the router and reverse the changes. However, the reversal did not take effect immediately, leaving the router in a stuck state with SSH unreachable. This situation persisted until a manual reboot restored proper functionality.

Response Timeline

  • 09:31 UTC: Network outage occurred & connectivity loss detected
  • ~09:37 UTC: Connectivity with WireGuard established
  • ~09:38 UTC: Reverted the suspected root cause, but this did not solve the problem
  • ~09:47 UTC: Decision to reboot the router
  • 09:49 UTC: Service restored after rebooting the router

Mitigation

The immediate resolution was to reboot bkk50, which restored its functionality as the network gateway. However, this incident highlighted the lack of a fully redundant infrastructure.

Impact

  • Network downtime lasted approximately 18 minutes, affecting internal infrastructure.
  • No major disruption to services, but some validators experienced mildly lower era points in Era 7801 for Kusama and Era 1738 for Polkadot.
  • No impact on RPC services, which remain within SLA allowances (weekly downtime limit: 40 minutes).

Current Status and Future Plans

The network is now fully operational. However, this incident underscores the need for more robust redundancy and streamlined access to critical infrastructure.

Future Work

Identified Problems:

  • Lack of a redundant fallback mechanism for critical routers.
  • Current proxy method for router access relies on internal infrastructure instead of an external jump server.
  • Forced reliance on WireGuard VPN for emergency access, which can be cumbersome.

Improvement Plan:

Jump Server Access: Configure external jump server access, allowing:

  • Simple SSH access via ssh router_name@jump.rotko.net.
  • Users to have pre-configured jump configurations on their machines.
  • Access management via per-user SSH authorized_keys file.
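
A minimal client-side sketch of such a pre-configured jump path in ~/.ssh/config; the internal management address and usernames are illustrative:

Host jump
    HostName jump.rotko.net
    User your-username

Host bkk50
    HostName 10.155.255.50      # internal management address, illustrative
    User admin
    ProxyJump jump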

Network Redundancy:

  • Improve the infrastructure setup to provide more redundancy for critical routers.

Kusama Validator Outage Post-Mortem: Era 8,219

Validator: rotko.net/ksm02 Stash: DKKax6uZkiNPfd2ATd8cJhyi3c1KZD24VDdoWG9CfTmwgSp

Summary

Complete validator outage occurred at era 8,219, lasting 6 consecutive sessions. Root cause: physical power disconnection on bkk04 (7950X) during maintenance combined with a monitoring blind spot.

Impact

  • 6 consecutive sessions (over 2 eras) offline (no signing/voting/backing/block production)
  • Complete loss of validator duties during outage period

Root Cause Analysis

  1. Hardware failure: Power cable disconnection on bkk04 during routine maintenance. 1U form factor precludes redundant PSU configuration.
  2. Monitoring failure: Node excluded from Prometheus configuration, creating visibility gap. No alerts triggered during outage.

Immediate Actions

  • Migrated validation from failed bkk04 (7950X) to higher performance server (9950X on bkk03)
  • Scheduled onsite assessment for 2025-06-20

Corrective Measures Implemented

  1. Dual monitoring system:
    • Direct node metrics monitoring (CPU, memory, network, sync status)
    • Performance monitoring via forked turboflakes API with custom Prometheus exporter (era points, backing performance, block production)
  2. Redundancy: Two independent monitoring paths ensure failure detection even if one system fails
  3. Configuration audit: All active validators now properly configured in both monitoring systems

Future Prevention & Automated Failover

  • Enhanced physical maintenance protocols to prevent cable disconnections
  • Automated configuration validation for monitoring systems
  • Automated Failover System (in development):
    • Trigger: MVR=1 (100% missed votes) for single session
    • 30-second polling interval via turboflakes API
    • Automatic session key rotation to backup node
    • Maximum 1-2 session downtime (rotation delay only)
    • 30-minute cooldown prevents flip-flopping
    • Eliminates the manual detection and reaction window at possibly error-prone timeslots

The dual monitoring approach ensures immediate detection of failures. Automated failover implementation will reduce outage duration from hours to single session, providing true high availability for our validation services.
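
A rough sketch of the failover loop described above; the metrics endpoint and host names are hypothetical, while author_rotateKeys is the standard node RPC used to generate fresh session keys on the backup (the returned keys still need to be registered on-chain with session.setKeys):

#!/usr/bin/env bash
set -euo pipefail

MVR_URL="https://example-onet-api/validator/<stash>/mvr"   # hypothetical metrics endpoint
BACKUP_RPC="http://bkk03:9944"                             # backup node RPC, illustrative

while true; do
  mvr="$(curl -sf "$MVR_URL" || true)"
  if [ "$mvr" = "1" ]; then
    # generate fresh session keys on the backup node
    curl -sf -H 'Content-Type: application/json' \
      -d '{"jsonrpc":"2.0","id":1,"method":"author_rotateKeys","params":[]}' \
      "$BACKUP_RPC"
    sleep 1800   # 30-minute cooldown to prevent flip-flopping
  fi
  sleep 30       # 30-second polling interval
done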

post mortem: anycast ip 160.22.181.81 unreachable from internet

tldr

problem: external users couldn't reach anycast ip 160.22.181.81 despite correct bgp announcements
root cause: anycast ip commented out in bkk06 interface config, bird announcing as unreachable, routing table corruption preventing bird reloads
immediate fix: uncommented ip in /etc/network/interfaces, cleaned duplicate routing table entries, restarted bird
systematic fix: updated network config generator to prevent routing table corruption and improve anycast management
result: external connectivity restored in <5 minutes, prevention system deployed
action items: configure redundant anycast instances on bkk07/bkk08, add external monitoring


problem statement

external users could not ping 160.22.181.81 from the internet, despite:

  • aggregate 160.22.180.0/23 being properly announced to upstream providers
  • internal ibgp routing showing correct paths to the anycast ip
  • no obvious bgp or routing issues at the edge

investigation timeline

initial analysis

  • confirmed aggregate announcement to upstreams (amsix, hgc, bknix)
  • verified internal routing table showed 160.22.181.81/32 via 10.155.206.1 (bkk06)
  • ping from edge routers (bkk00/bkk20) to 160.22.181.81 failed with timeouts
  • discovered icmp redirects when pinging from bkk06 itself

root cause discovery

investigation of bkk06 configuration revealed multiple issues:

  1. anycast ip not configured on interface

    • /etc/network/interfaces had the line commented out:
    #up ip addr add 160.22.181.81/32 dev lo
    
  2. bird announcing unreachable route

    • bird static configuration had:
    route 160.22.181.81/32 unreachable;
    
    • this caused bird to announce the route but blackhole traffic locally
  3. routing table corruption preventing bird reloads

    • /etc/iproute2/rt_tables had multiple duplicate entries:
    100 anycast
    100 public  
    100 anycast
    100 anycast  # ... 6+ duplicates
    
    • caused bird reload failures: already defined errors
    • accumulated from repeated network interface restarts

changes implemented

1. immediate fix - restored connectivity

fixed interface configuration:

# before
#up ip addr add 160.22.181.81/32 dev lo

# after  
up ip addr add 160.22.181.81/32 dev lo

cleaned routing table duplicates:

# removed duplicates, restored clean version
255	local
254	main
253	default
0	unspec
100	anycast

restarted bird daemon:

systemctl restart bird

bird's direct protocol automatically detected the newly configured interface ip and began announcing 160.22.181.81 as reachable.
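
for reference, a minimal direct protocol stanza that picks up loopback /32s looks roughly like this (our production config differs in detail):

protocol direct {
    ipv4;
    interface "lo";
}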

2. systematic fix - prevention system

updated network config generator (commit 7f3da69):

prevented duplicate routing table entries:

# old (caused duplicates on every interface restart)
up echo "100 anycast" >> /etc/iproute2/rt_tables 2>/dev/null || true

# new (checks before adding)
add_rt_table() {
  local table_id="$1" table_name="$2"
  echo "up grep -q \"^${table_id}[[:space:]]${table_name}\" /etc/iproute2/rt_tables || echo \"${table_id} ${table_name}\" >> /etc/iproute2/rt_tables"
}
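
for reference, calling the helper as add_rt_table "100" "anycast" (as in the example further below) emits this idempotent interface line, which only appends when the entry is missing:

up grep -q "^100[[:space:]]anycast" /etc/iproute2/rt_tables || echo "100 anycast" >> /etc/iproute2/rt_tables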

improved anycast source routing:

  • only creates anycast routing table when anycast IPs are configured
  • proper load balancing across route reflectors
  • comprehensive cleanup rules prevent rule accumulation

enhanced bond configuration:

  • changed default from 802.3ad to active-backup for better compatibility
  • fixed interface naming for split uplink configurations

verification

immediate results:

  • external ping to 160.22.181.81 successful (62ms response time)
  • bird status clean with no configuration errors
  • bgp advertisements showing proper reachable routes

systematic prevention:

  • network interface restarts no longer accumulate routing table entries
  • bird configuration reloads work reliably
  • anycast routing properly configured with failover capabilities

lessons learned

configuration management issues

  • critical services had commented-out configuration
  • no monitoring to detect interface ip mismatches vs bgp announcements
  • accumulated config debris from repeated interface operations
  • fixed: automated config generation prevents manual configuration drift

anycast design gaps

  • only one server (bkk06) configured for the anycast ip
  • no failover capability - single point of failure
  • fixing: systematic anycast deployment across multiple servers

monitoring blindspots

  • edge router bgp announcements showed "valid" but didn't detect local unreachability
  • missing end-to-end connectivity monitoring from external vantage points
  • no alerting on interface configuration drift vs bgp announcements

recommendations

immediate (completed)

  • configure anycast ip on interface
  • clean routing table duplicates
  • restart bird daemon
  • deploy systematic fix to prevent recurrence

short term

  • configure 160.22.181.81 on bkk07 and bkk08 for redundancy using updated config generator
  • test failover scenarios (shutdown bkk06, verify traffic shifts)
  • add external monitoring for anycast ips

long term

  • implement configuration drift detection
  • add monitoring for interface vs bgp announcement consistency
  • document anycast ip allocation and server responsibilities
  • create runbooks for anycast troubleshooting
  • automated anycast health checking and failover testing

technical details

updated network configuration generator

key improvements:

  • duplicate prevention: routing table entries checked before addition
  • anycast routing: proper source-based routing with load balancing
  • cleanup automation: comprehensive rule cleanup prevents accumulation
  • bond compatibility: improved defaults for diverse network environments

anycast source routing example:

# creates anycast table only when needed
add_rt_table "100" "anycast"

# load balancing across route reflectors  
post-up ip route add default table anycast nexthop via 10.155.106.0 weight 1 nexthop via 10.155.206.0 weight 1

# source-based routing for anycast traffic
post-up ip rule add from 160.22.181.81 table anycast priority 100

bgp filtering optimization

during investigation, we also consolidated ibgp gateway filtering rules:

bkk20 (blocks vlans 106,107,108):

/ip/firewall/address-list/add list=ibgp-block-gw-v4 address=10.155.106.0/24
/ip/firewall/address-list/add list=ibgp-block-gw-v4 address=10.155.107.0/24  
/ip/firewall/address-list/add list=ibgp-block-gw-v4 address=10.155.108.0/24

bkk00 (blocks vlans 206,207,208):

/ip/firewall/address-list/add list=ibgp-block-gw-v4 address=10.155.206.0/24
/ip/firewall/address-list/add list=ibgp-block-gw-v4 address=10.155.207.0/24
/ip/firewall/address-list/add list=ibgp-block-gw-v4 address=10.155.208.0/24

prevents routing loops through non-existent local vlans while maintaining proper ibgp route reflection behavior.

impact

service restoration: external connectivity to anycast services restored within 5 minutes
reliability improvement: eliminated recurring bird configuration failures
operational efficiency: automated configuration management reduces manual intervention
scalability: systematic anycast deployment enables rapid service expansion

Penumbra

Ledger Penumbra Development Setup Guide

This guide covers setting up the Penumbra app for Ledger hardware wallets on Linux and macOS. It has been tested successfully only on NixOS, so reach out if you spot any faults in it. Note that you should be a fairly advanced computer user to participate in testing the hardware wallet this early.

Prerequisites

System Dependencies

Linux (Ubuntu/Debian):

sudo apt update && sudo apt-get -y install build-essential git wget cmake \
libssl-dev libgmp-dev autoconf libtool python3 python3-venv

macOS:

brew install cmake autoconf automake libtool python3

Docker

Additional Tools

  • Node.js > v14.0 (via nvm or n)
  • Rust toolchain (for running tests)

Setup Instructions

1. Clone and Initialize Repository

# Clone the repository
git clone https://github.com/Zondax/ledger-penumbra.git
cd ledger-penumbra

# Initialize submodules (CRITICAL STEP)
git submodule update --init --recursive

2. Build the Application

Build for your specific device:

# For Nano S+
make buildS2

# For Nano X
make buildX

# For Flex
make buildFL

# For Stax
make buildST

Installation Methods

Method A: Using Development Release (Easiest)

  1. Download the installer from releases page
  2. For Nano S+, download installer_s2.sh
  3. Make executable and run:
chmod +x ./installer_s2.sh
./installer_s2.sh load

Method B: Load Your Own Build

Setup Python Environment

Since many systems have read-only Python installations (like NixOS), use a virtual environment:

python3 -m venv venv
source venv/bin/activate
pip install ledgerblue

USB Permissions (Linux only)

# Download and install Ledger udev rules
wget -q -O - https://raw.githubusercontent.com/LedgerHQ/udev-rules/master/add_udev_rules.sh | sudo bash

# Replug your device after installing rules

Load to Device

# Ensure device is connected and unlocked
make loadS2  # or loadX, loadFL, loadST for other devices

Method C: Using Emulator (No Device Needed)

# Install Zemu emulator
make zemu_install

# Run tests in emulator
make zemu_test

Preparing Development Device

⚠️ WARNING: Use a dedicated development device without real funds!

1. Enter Recovery Mode

  • Hold right button while plugging in device
  • Device shows "Recovery"

2. Initialize with Test Mnemonic

make dev_init

This sets:

  • PIN: 5555
  • Mnemonic: equip will roof matter pink blind book anxiety banner elbow sun young

3. Add Development Certificate

# Still in recovery mode
make dev_ca

Running Tests

Unit Tests (Rust)

make rust_test

Integration Tests (Zemu)

make zemu_test

All Tests

make test_all

Troubleshooting

"No dongle found" Error

  1. Check device is connected and unlocked
  2. Verify USB detection: lsusb | grep Ledger
  3. On Linux, ensure udev rules are installed
  4. Try without sudo first, then with sudo if needed

"BOLOS_SDK is not set" Error

Never run make commands from the app/ subdirectory. Always run from project root.

Submodule Errors

If you see errors about missing files in deps/:

git submodule update --init --recursive

Python Installation Issues

If system Python is read-only (NixOS, etc.):

python3 -m venv venv
source venv/bin/activate
pip install ledgerblue

Development Workflow

  1. Make code changes
  2. Rebuild: make buildS2
  3. Reload: make loadS2
  4. Test: make zemu_test

Platform-Specific Notes

macOS

  • Ensure Docker Desktop is running
  • No udev rules needed
  • Device should work out of the box

Linux

  • Install udev rules for device recognition
  • May need to run with sudo for USB access
  • Some distros require user to be in plugdev group

Important Notes

  • The releases on GitHub are unvetted development builds
  • Only use devices without real funds for development
  • Production-ready versions will be available through Ledger Live after security audit