April 2, 2026 · 14 min read

The Vault Lattice

We ran a physical wire between two machines and built an operating system on top of it. Permanent link. Shared vault. Synchronized ledgers. Seven agents that treat both nodes as one brain. This is the architecture sprint that turned a loose workstation setup into a coordinated forge fabric.



The Vault Lattice: two nodes, one operating fabric.

The Split-Brain Problem

Most agent-augmented engineering environments carry a quiet structural flaw. They have models. They have automation. They might even have multiple specialist agents. But every machine is an island. Agent A on one node doesn’t know what Agent B on another node decided ten minutes ago. The human becomes the synchronization layer — and the human is bad at that job.

We had the same problem. Two capable machines, seven trained agents, a growing codebase, and a runtime state that lived in terminal history and half-remembered context windows. One node knew things the other didn’t. Agents would contradict each other across the boundary. Every handoff was a game of telephone.

The architectural response was to stop treating them as two separate machines at all.

The Physical Layer

The foundational decision was physical, not architectural. We ran a dedicated ethernet link between Machine A and Machine B on a private subnet. No router hop. No Wi-Fi latency lottery. Just copper between two systems, configured as a permanent point-to-point fabric link with static addressing.

Machine A acts as the NAT gateway. Machine B runs headless with its display off, always on, always reachable. Internet traffic from B forwards through A’s encrypted VPN tunnel with the correct firewall rules, MTU clamping, and connection tracking applied at boot by a systemd-managed service. The link survives VPN renegotiations, reboots, and power state changes.

SSH is instant. Transfers are local-speed. The agents don’t care which machine they’re operating on because the wire is always there.

# topology
Machine A (primary workstation)
    |
    +=== dedicated ethernet (permanent, private subnet) ===+
    |
Machine B (headless server node)

bidirectional SSH • NAT forwarding • VPN passthrough
headless operation • boot-persistent firewall rules
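As a sketch, the point-to-point link reduces to static addressing on both ends. The interface names and host addresses below are illustrative; the text only fixes the private 10.10.10.0/24 subnet:

```shell
# Machine A side of the link (interface name is an assumption)
ip addr add 10.10.10.1/24 dev enp4s0
ip link set enp4s0 up

# Machine B side
ip addr add 10.10.10.2/24 dev enp4s0
ip link set enp4s0 up

# Machine B sends all internet traffic to A, which NATs it into the VPN tunnel
ip route add default via 10.10.10.1
```

With no router hop between the hosts, the only moving parts are two addresses and one default route.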

ForgeNode: The Machine Identity Layer

A wire is infrastructure. ForgeNode is what gives it meaning. It is the machine-readable identity and capability layer that tells every agent what this system looks like: node manifests with hardware specs, installed runtimes, active services, network topology, and SSH paths. A central registry maps the full fabric. Capability indices enumerate every tool available on each node.

The design principle: agents consult the manifest before they act. Instead of guessing whether a tool exists or improvising a network route, they read a document that tells the truth. When we added the dedicated link, every agent immediately understood the new topology because we updated one file. No prompt engineering. No retraining. Just infrastructure-as-documentation, version-controlled and cloned to both nodes.
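A minimal sketch of the consult-before-acting pattern. The manifest schema here is hypothetical (ForgeNode's real format is not published); the point is only that agents read capabilities and routes from a document instead of guessing:

```python
import json

# Hypothetical node manifest -- field names are assumptions for illustration.
MANIFEST = json.loads("""
{
  "node": "machine-b",
  "role": "headless server",
  "fabric": {"subnet": "10.10.10.0/24", "address": "10.10.10.2"},
  "ssh": {"host": "10.10.10.2", "user": "forge"},
  "tools": ["git", "python3", "journalctl"]
}
""")

def has_tool(manifest, tool):
    """Consult the manifest before acting: is this tool known to exist here?"""
    return tool in manifest.get("tools", [])

def ssh_target(manifest):
    """Derive the SSH path from the manifest rather than improvising a route."""
    ssh = manifest["ssh"]
    return f'{ssh["user"]}@{ssh["host"]}'

print(has_tool(MANIFEST, "git"))   # True
print(ssh_target(MANIFEST))        # forge@10.10.10.2
```

Updating the topology then means editing one version-controlled file, exactly as described above.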

The Memory Plane

The thesis behind this layer is documented in The Lattice Remembers: memory is the load-bearing wall of a multi-agent system, not a feature bolted on afterward. What follows is how we implemented it.

Two layers work in concert. Below the agents sits a canonical database — runtime state, decision logs, telemetry, performance history. Everything writes to the ledger first. Above it sits a synchronized vault: human-legible documents covering architecture decisions, operational doctrine, and system state. A local REST API and an MCP bridge on both machines give agents programmatic access. A planning agent on one node deposits a strategy document; every other agent picks it up on the next cycle without anyone copying files.

The hard rule, inherited from the ForgeClaw v3 rebuild: the database is canonical. The vault, dashboards, and export artifacts are derived views. If a derived view and the database disagree, the derived view is wrong. Period.
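The ledger-first rule can be sketched in a few lines. The schema is invented for illustration; what matters is the direction of data flow, with the vault document regenerated from the database rather than edited independently:

```python
import sqlite3

# Minimal sketch of "the database is canonical" (schema is hypothetical).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ledger (seq INTEGER PRIMARY KEY, event TEXT)")

def record(event):
    """Everything writes to the ledger first."""
    db.execute("INSERT INTO ledger (event) VALUES (?)", (event,))
    db.commit()

def render_vault_view():
    """The vault document is a derived view, rebuilt from the ledger.
    If this view and the ledger ever disagree, the view is wrong."""
    rows = db.execute("SELECT seq, event FROM ledger ORDER BY seq").fetchall()
    return "\n".join(f"{seq}. {event}" for seq, event in rows)

record("planning agent deposited migration strategy")
record("infra agent provisioned service on machine B")
print(render_vault_view())
```

Because the view is always regenerated, there is no second writer to drift out of sync with the database.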

Seven Agents, One Context Plane

The Forge Agent Council is seven specialist agents orchestrated through a gateway that runs on the always-on node. Each agent has a defined domain and its own workspace, identity definition, and behavioral constraints. They range from code implementation and architecture to security, infrastructure, knowledge management, content, and market operations.

What makes this work across two physical machines is the shared context plane. Every agent reads from the same node manifests, the same vault, the same pointer files. When the infrastructure agent provisions a service on Machine B, the code agent on Machine A sees the updated topology on its next invocation. When the planning agent writes a migration strategy, the security agent can audit it without anyone copying context between terminals.

The gateway is migrating to the headless node permanently, which means the agent fabric will be available regardless of what the primary workstation is doing — accessible via messaging, CLI, GUI, or direct protocol from anywhere on the fabric.

The Council

Merlin · Implementation
Nabu · Architecture
Vulcan · QA & Security
Gaia · Infrastructure
Thoth · Knowledge
Huginn · Content
Richie · Market Ops
Striker · Primary

Live Telemetry Across the Wire

One of the first tools built on the new fabric was a real-time monitoring dashboard. Machine A opens a persistent tunnel to Machine B and tails its service journals, parsing every runtime decision into a structured, color-coded feed with extracted signals, scores, and classifications. One node watches the other work.

This pattern — cross-node observation through a permanent link — is what makes the two-node topology more than a convenience. Visibility is built into the wire, not bolted on afterward.
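The parsing half of that pipeline is straightforward. The journal line format below is hypothetical (the real feed's fields are not published), but the shape is the one described: tail a remote service journal over the permanent link, e.g. `ssh machine-b journalctl -f -u some-service`, and lift each runtime decision into a structured record:

```python
import re

# Hypothetical decision-line format -- field names are assumptions.
LINE = re.compile(
    r"decision=(?P<decision>\w+)\s+score=(?P<score>[\d.]+)\s+class=(?P<cls>\w+)"
)

def parse(line):
    """Turn one raw journal line into a structured record, or None to skip."""
    m = LINE.search(line)
    if not m:
        return None  # not a decision line; drop it from the feed
    return {"decision": m["decision"],
            "score": float(m["score"]),
            "class": m["cls"]}

sample = "Apr 02 03:14:15 machine-b svc[812]: decision=hold score=0.82 class=routine"
print(parse(sample))
```

Everything downstream (color coding, scoring, classification) is presentation over records like these.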

The Parts That Were Harder Than Expected

The physical link was trivial. The networking around it was not.

The Silent Kill: VPN vs. Forwarding

When the VPN activates on Machine A, it rewrites the host firewall with a default DROP policy on the FORWARD chain. Machine B’s internet dies silently. The first diagnostic pass was misleading: ICMP still passed. Plain HTTP worked intermittently. The breakage was selective and inconsistent, which pointed away from the actual cause.

The root cause was a two-layer problem. First, the VPN kill-switch dropped all forwarded packets by policy. Second, even after adding explicit FORWARD ACCEPT rules for the private subnet, TCP connections still hung during TLS handshakes. The VPN tunnel’s reduced MTU (1280 bytes for WireGuard) caused large TLS ClientHello packets to be silently dropped — no ICMP “fragmentation needed” response, no timeout, just a connection that never completed.

# the three-layer fix
1. FORWARD chain: ACCEPT for 10.10.10.0/24 in both directions
2. NAT: MASQUERADE on both the VPN interface and the physical LAN
3. Mangle table: TCP MSS clamp-to-PMTU on all SYN packets
# wrapped in an idempotent boot script, managed by systemd
# survives VPN renegotiations, reboots, interface changes

The MSS clamp was the non-obvious fix. It forces TCP endpoints to negotiate a maximum segment size that fits inside the VPN tunnel, preventing the silent large-packet drops that made TLS fail while ICMP succeeded. The kind of problem that takes an afternoon to diagnose and thirty seconds to explain afterward.
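As a hedged sketch, the three layers map onto iptables rules like the following. The subnet and the clamp behavior come from the text; the interface names (wg0, eth0) are assumptions:

```shell
# 1. FORWARD chain: accept fabric traffic in both directions
iptables -A FORWARD -s 10.10.10.0/24 -j ACCEPT
iptables -A FORWARD -d 10.10.10.0/24 -j ACCEPT

# 2. NAT: masquerade on both the VPN tunnel and the physical LAN
#    (wg0 / eth0 are illustrative interface names)
iptables -t nat -A POSTROUTING -o wg0  -j MASQUERADE
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# 3. Mangle table: clamp TCP MSS to path MTU on SYN packets, so endpoints
#    negotiate segments that fit inside the 1280-byte tunnel
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
         -j TCPMSS --clamp-mss-to-pmtu
```

In practice rules like these would live in the idempotent, systemd-managed boot script the fix describes, so they survive renegotiations and reboots.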

The Headless Problem: Two Daemons, One Failure

Running Machine B headless — display closed, always on — required overriding two independent power-management systems. The system-level daemon (logind) had a lid-close policy. The desktop environment had its own, separate lid-close handler. Configuring one did not disable the other. The failure was silent: close the lid, SSH connection holds for ten seconds, then the node vanishes. No log entry, no warning, just a machine that decided to sleep.

Both had to be overridden independently — the system daemon through its configuration file, the desktop environment through its settings registry — for both AC power and battery states. Four configuration values across two subsystems for a single behavioral change. These are the unglamorous details that separate a demo from a system you can leave running overnight and find operational in the morning.
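The system-daemon half of that override looks roughly like this logind configuration (these option names are standard systemd-logind settings; the desktop environment's two matching values live in its own settings registry and vary by environment, so they are not shown):

```ini
# /etc/systemd/logind.conf -- system-level lid policy
[Login]
HandleLidSwitch=ignore
HandleLidSwitchExternalPower=ignore
```

One key per power state, mirrored by the equivalent pair in the desktop environment: the four values the text counts.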

DNS: The Third Surprise

After the firewall fix, Machine B could ping external IPs but not resolve domain names. The VPN’s kill-switch had also blocked UDP forwarding on port 53. The FORWARD chain rules fixed this alongside the TCP issue, but the symptom presented differently enough that it appeared to be a separate problem. It was not. The lesson: when a VPN kill-switch says “drop everything,” it means everything — DNS, HTTP, TLS, all of it. Fix the forwarding policy once and comprehensively.

What’s Running Behind the Curtain

There is a private market-intelligence system in active development on this fabric. It runs live. It makes autonomous decisions against real conditions. We are not discussing the implementation publicly.

What we can say: the vault, the database discipline, the agent mesh, the always-on headless node, and the permanent physical link were not built for elegance. They were built because the next class of systems we are deploying requires zero ambiguity in state, continuous coordination, and agents that never lose context. The infrastructure in this chronicle is what makes that possible.

What Changed Because of This

The AI engineering conversation is stuck on models. Bigger context windows. Better benchmarks. Faster inference. None of that matters if your agents can’t remember what happened on the other node, can’t read the current system state, and can’t coordinate without a human relaying context between terminals.

The Vault Lattice is our response to that gap. It is not a framework and it is not a product. It is an operating topology: two machines, one context plane, a permanent physical link, and a set of conventions that make agents and humans reason against the same truth.

ForgeNode gives agents machine awareness. The vault gives them shared memory. The database gives them a source of truth they cannot corrupt. The wire gives them a link they cannot lose. And the council gives them specialization without isolation.

It is not glamorous plumbing. It is the difference between a pile of tools and an operating system.


Continue Reading

The Vault Lattice is the infrastructure layer. These chronicles cover the systems and doctrine built on top of it.