ARCHITECTURE

How subway works.

Transport stack, protocol design, bridge API, NAT traversal, and resilience — the full picture of agent-to-agent communication.

Transport Stack

Seven layers. Zero ceremony.

From your agent code down to raw UDP — every layer is purpose-built for agent-to-agent communication.

0 · YOUR AGENT CODE
1 · AGENTNODE
2 · SUBWAY-P2P
3 · LIBP2P
4 · QUIC + WEBTRANSPORT
5 · NOISE PROTOCOL
6 · UDP
The relay server is a separate repo. Any Rust project can depend on subway-core directly.
Why QUIC

TCP head-of-line blocks on packet loss. QUIC multiplexes independent streams over UDP, so one lost packet on stream 3 doesn't block streams 1, 2, and 4. Connection setup takes 1 RTT, versus 3 for TCP plus TLS 1.2.

Why Noise

Every agent has an Ed25519 keypair. When two agents connect, they perform a Noise handshake:

1. Both sides exchange ephemeral keys
2. They derive a shared secret
3. All subsequent traffic is encrypted with that shared secret

The relay facilitates the connection but cannot decrypt the traffic. Even if someone compromises the relay, they see gibberish.
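The shared-secret step can be illustrated with a toy Diffie-Hellman exchange over a small prime. This is NOT the actual Noise handshake (which uses X25519 and is far more involved); the numbers here are deliberately tiny and insecure, and exist only to show why both sides end up with the same key while an eavesdropper does not.

```rust
// Toy Diffie-Hellman sketch of steps 1-2 above. Illustrative only:
// the real Noise handshake uses X25519, not modular exponentiation.
fn mod_pow(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut result = 1;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % modulus;
        }
        exp >>= 1;
        base = base * base % modulus;
    }
    result
}

fn main() {
    let (p, g) = (0xFFFF_FFFB_u64, 5); // public prime and generator
    let (a, b) = (123_456u64, 654_321u64); // each side's ephemeral secret

    // 1. both sides exchange public ephemeral keys
    let pub_a = mod_pow(g, a, p);
    let pub_b = mod_pow(g, b, p);

    // 2. each derives the same shared secret from the other's public key
    let secret_a = mod_pow(pub_b, a, p);
    let secret_b = mod_pow(pub_a, b, p);
    assert_eq!(secret_a, secret_b);

    // 3. that shared secret keys the cipher for all subsequent traffic
    println!("shared secret derived: {}", secret_a == secret_b);
}
```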

Workspace

Five crates, zero bloat.

The workspace ships everything an agent needs — core API, networking, bridge, CLI, and protobuf definitions.

subway-core

AgentNode API — connect, send, call, broadcast, subscribe. The primary interface for building agents.

node.rs, identity.rs, error.rs
subway-p2p

Standalone P2P networking layer. Request-response, gossipsub, name registry. No external services required.

lib.rs, config.rs, provider.rs, codec.rs
subway-bridge

REST + WebSocket bridge for non-Rust clients. Combined actix-web server on a single port.

rest.rs, ws_actix.rs, protocol.rs, server.rs
subway-proto

Protobuf message definitions — AgentMessage, RpcRequest, RpcResponse.

agent.proto, rpc.proto
subway-cli

CLI binary. Commands: bridge, agent, send, broadcast. Agent mode with interactive shell + control socket.

main.rs
subway-relay ↗

Separate private repo. Handles relay server logic. Deployed to Fly.io.

subway-dev/subway-relay
The Relay

Resilient by Design

A hashmap and relay circuits. Nothing more.

The relay does two things. It maps names to peer IDs (an in-memory hashmap). And it bridges connections for agents behind NAT (relay circuits). That's it. No message queue. No database. No state to back up.

Think of it as DNS combined with a mail carrier. The relay knows addresses and helps agents behind firewalls reach each other. Once connected, agents talk directly. The relay sees ciphertext.
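The name-mapping half really is just a hashmap. Here's a minimal sketch (the type and method names are illustrative, and the real relay also tracks renewal state):

```rust
use std::collections::HashMap;

// Stand-in for libp2p's PeerId type, for illustration only.
type PeerId = String;

// Minimal sketch of the relay's in-memory name registry.
struct Registry {
    names: HashMap<String, PeerId>,
}

impl Registry {
    // register (or re-register) a name -> peer mapping
    fn register(&mut self, name: &str, peer: PeerId) {
        self.names.insert(name.to_string(), peer);
    }

    // resolve a name to its peer id, if registered
    fn resolve(&self, name: &str) -> Option<&PeerId> {
        self.names.get(name)
    }
}

fn main() {
    let mut reg = Registry { names: HashMap::new() };
    reg.register("alpha.relay", "12D3KooW-example".to_string());
    // known names resolve; unknown names return None
    assert!(reg.resolve("alpha.relay").is_some());
    assert!(reg.resolve("beta.relay").is_none());
    println!("ok");
}
```

Because the map lives only in memory, a relay restart simply empties it; the renewal loop below repopulates it within one cycle.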

Name Renewal
Re-registers every 30s. 5 consecutive failures trigger full reconnect.
Relay Watchdog
Monitors disconnects. Reconnects with exponential backoff (1s → 30s max).
REST + Health
Port 9001: /v1/health, /v1/stats, /multiaddr. The bridge.relay agent auto-connects.
Relay Discovery
Agents fetch multiaddr via HTTP. HTTPS fallback. PeerId extracted automatically.
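The watchdog's backoff schedule (1s doubling to a 30s cap) can be sketched as a pure function; the real watchdog may add jitter, which this omits:

```rust
use std::time::Duration;

// Exponential backoff: 1s, 2s, 4s, ... capped at 30s.
// Sketch of the schedule described above; jitter is omitted.
fn backoff(attempt: u32) -> Duration {
    let secs = 1u64.checked_shl(attempt).unwrap_or(u64::MAX).min(30);
    Duration::from_secs(secs)
}

fn main() {
    let delays: Vec<u64> = (0..7).map(|a| backoff(a).as_secs()).collect();
    // 1, 2, 4, 8, 16, 30, 30 (capped at the 30s maximum)
    println!("{delays:?}");
}
```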
Production
Public relay on Fly.io (~42MB image)
Stores
Name → PeerId hashmap. In-memory, not persisted.
Requires
~50MB RAM, 0 disk. No Postgres. No Redis. No S3.
Per Agent
~1KB memory overhead. One machine handles thousands.
Communication Patterns

The subway protocol

Three messaging primitives. Everything else builds on top.

Send
SUBWAY_MSG
Fire-and-forget (ACK)
one → one

One-way message. Agent A sends to Agent B by name. ACK prevents circuit teardown. Used for notifications, status updates, commands.

let msg = node.new_agent_message("task", payload);
node.send("beta.relay", msg).await?;
Call
SUBWAY_RPC
Request/response (30s)
one ↔ one

Blocks until target responds or 30s timeout. Correlation ID ties request to response. Async handler spawns per-request tasks. Also available by PeerId directly.

let resp = node.call("beta.relay", req).await?;
// or by peer:
node.call_by_peer_id(pid, req).await?;
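The correlation-ID mechanism can be sketched with std channels. This is illustrative only: the real implementation is async, but the shape is the same, a pending-request map keyed by correlation ID, consumed when the matching response arrives:

```rust
use std::collections::HashMap;
use std::sync::mpsc;

// Sketch of correlation-ID matching (names are illustrative).
// "call" registers an ID and waits; the response path looks it up.
fn register(
    pending: &mut HashMap<u64, mpsc::Sender<String>>,
    corr_id: u64,
) -> mpsc::Receiver<String> {
    let (tx, rx) = mpsc::channel();
    pending.insert(corr_id, tx);
    rx
}

// deliver an incoming response to whoever is waiting on its ID
fn on_response(
    pending: &mut HashMap<u64, mpsc::Sender<String>>,
    corr_id: u64,
    payload: String,
) {
    if let Some(waiter) = pending.remove(&corr_id) {
        let _ = waiter.send(payload);
    }
}

fn main() {
    let mut pending = HashMap::new();

    // caller registers correlation id 42 before sending the request
    let rx = register(&mut pending, 42);

    // later, a response carrying the same id arrives
    on_response(&mut pending, 42, "pong".to_string());

    // the caller blocks (with a 30s timeout in the real system)
    let resp = rx
        .recv_timeout(std::time::Duration::from_secs(30))
        .unwrap();
    println!("{resp}");
}
```

If no response arrives before the timeout, the entry is dropped and the caller gets a timeout error instead.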
Broadcast
gossipsub
Pub/sub multicast
one → many

Wildcard subscriptions (metrics.*) match any sub-topic. No self-delivery — broadcasters never receive their own messages. Topics routed via message metadata through the shared gossipsub mesh.

node.subscribe("metrics.*", |msg| {});
node.broadcast("metrics.cpu", m).await?;
node.unsubscribe("metrics.*").await?;
Listening
on_message(handler)

Receive all direct messages (SUBWAY_MSG)

handle_rpc(handler)

Serve RPC requests synchronously

handle_rpc_async(handler)

Serve RPC requests with async handler (spawns per-request)

Utilities
resolve(name) → PeerId

Name resolution via relay

new_agent_message(type, payload)

Create message with auto-generated ID + timestamp

connected_peer_count() → usize

Number of connected peers on the mesh

Pub/Sub

Topic-based messaging with wildcards.

Subscribe to topics. Broadcast to topics. Wildcards match sub-topics automatically. Broadcasters never receive their own messages. Production verified on the live mesh.

LIVE TEST — 3 AGENTS ON PRODUCTION RELAY
// alpha subscribes to wildcard, beta to exact, gamma to different topic
alpha → subscribe test.*
beta → subscribe test.chat
gamma → subscribe test.status
// gamma broadcasts to test.chat
gamma → broadcast test.chat "hello"
// result:
alpha ✓ received (wildcard test.* matched test.chat)
beta ✓ received (exact match test.chat)
gamma ✗ no self-delivery
Wildcard Matching
metrics.* → metrics.cpu, metrics.mem
agents.* → agents.status, agents.error
* → matches everything
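The matching rule above can be sketched in a few lines, assuming the semantics shown: a trailing `.*` matches any sub-topic under the prefix, a bare `*` matches everything, and anything else must match exactly:

```rust
// Sketch of wildcard topic matching as described above.
// Assumption: "prefix.*" matches "prefix.<anything>", not "prefixfoo".
fn topic_matches(pattern: &str, topic: &str) -> bool {
    if pattern == "*" {
        return true; // bare wildcard matches everything
    }
    if let Some(prefix) = pattern.strip_suffix(".*") {
        // require the prefix plus a literal dot before the sub-topic
        return topic
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.starts_with('.'));
    }
    pattern == topic // no wildcard: exact match only
}

fn main() {
    assert!(topic_matches("metrics.*", "metrics.cpu"));
    assert!(topic_matches("test.*", "test.chat"));
    assert!(topic_matches("*", "anything"));
    assert!(topic_matches("test.chat", "test.chat"));
    assert!(!topic_matches("metrics.*", "agents.status"));
    println!("ok");
}
```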
No Self-Delivery

When an agent broadcasts, it never receives its own message — even if subscribed to a matching topic. Filtered by sender PeerId at the subscriber level.

Metadata Routing

Topics are embedded in message metadata and routed through the shared gossipsub mesh. No per-topic gossipsub subscriptions needed. Scales to any number of topics.

Access from anywhere
Rust

node.subscribe() / node.broadcast()

CLI

subscribe <topic> / broadcast <topic> <msg>

REST / WebSocket

POST /v1/broadcast · GET /v1/subscribe SSE

Bridge

REST + WebSocket for every language.

The bridge exposes Subway's P2P capabilities over HTTP and WebSocket. TypeScript, Python, Go — anything that speaks JSON can connect.

POST /v1/send
POST /v1/call
POST /v1/broadcast
GET  /v1/subscribe?topic=X
GET  /v1/resolve/{name}
GET  /v1/health
GET  /v1/stats
Names without a . TLD get .relay appended automatically.
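The name-defaulting rule is small enough to sketch directly, assuming "without a TLD" means the name contains no dot:

```rust
// Sketch of the bridge's name defaulting: append ".relay" when the
// name has no dot. Function name is illustrative.
fn normalize_name(name: &str) -> String {
    if name.contains('.') {
        name.to_string() // already has a TLD, leave it alone
    } else {
        format!("{name}.relay") // default TLD
    }
}

fn main() {
    assert_eq!(normalize_name("beta"), "beta.relay");
    assert_eq!(normalize_name("beta.relay"), "beta.relay");
    assert_eq!(normalize_name("beta.custom"), "beta.custom");
    println!("ok");
}
```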
NAT Traversal

Agents behind firewalls connect automatically.

No port forwarding. No VPNs. No servers.

Most devices don't have public IP addresses. They sit behind NAT — outbound connections work, inbound connections are blocked. Two agents behind NAT can't directly connect to each other.

The relay has a public IP. Both agents connect outbound. The relay bridges them via a circuit. Then libp2p attempts DCUtR — a direct connection upgrade through simultaneous connect. If it works, traffic bypasses the relay entirely.

// initial connection (through relay)
Agent A ──outbound──▸ Relay ◂──outbound── Agent B
// after DCUtR hole-punch succeeds
Agent A ◂═══ direct QUIC ═══▸ Agent B
// relay no longer in path — end-to-end encrypted
Usage
# one command — detects brew, falls back to binary
$ curl -sSL https://subway.dev/install.sh | sh

# what happens:
#   brew found?  → brew install subway-dev/tap/subway
#   no brew?     → downloads binary to ~/.local/bin

  ╭─────────────────────────────────────╮
  │  subway — P2P transport for agents   │
  ╰─────────────────────────────────────╯

  Homebrew detected — installing via brew...

 installed via homebrew
 upgrade later with: brew upgrade subway

# verify
$ subway --version
subway 0.0.1
Resilience

“Subway is designed to survive network disruptions without manual intervention.”

Relay restart
Names re-register in ~30s
Automatic (name renewal loop)
Agent restart
Peers get NameNotFound briefly
Re-registers on reconnect
Network blip
In-flight messages lost
Relay watchdog reconnects with backoff
Direct conn drop
Falls back to relay circuit
ensure_circuit() dials through relay
5 renewal failures
Agent may be unreachable
Full relay reconnect triggered automatically
Relay disconnect
Watchdog detects event
Exponential backoff: 1s → 2s → 4s → ... → 30s max
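The "5 renewal failures" row amounts to a consecutive-failure counter: successes reset it, and the fifth failure in a row triggers the full reconnect. A sketch of that accounting (the struct and method names are illustrative):

```rust
// Sketch of the renewal loop's failure accounting: a success resets
// the counter; 5 consecutive failures trigger a full relay reconnect.
struct Renewal {
    consecutive_failures: u32,
}

impl Renewal {
    // returns true when a full reconnect should be triggered
    fn on_result(&mut self, ok: bool) -> bool {
        if ok {
            self.consecutive_failures = 0;
            return false;
        }
        self.consecutive_failures += 1;
        self.consecutive_failures >= 5
    }
}

fn main() {
    let mut r = Renewal { consecutive_failures: 0 };
    // two failures, a success (resets), then five straight failures:
    // only the fifth consecutive failure trips the reconnect
    let results = [false, false, true, false, false, false, false, false];
    let trips: Vec<bool> = results.iter().map(|&ok| r.on_result(ok)).collect();
    println!("{trips:?}");
}
```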
The relay is stateless
Name registry

In-memory hashmap. Not persisted to disk.

Relay circuits

libp2p connection state. Not persisted.

Identity keypair

Ed25519 keys on disk. The only state that persists.

Philosophy

Why subway exists

Agents shouldn't depend on centralized brokers. Subway provides a native peer-to-peer network for autonomous systems.

Kafka
Centralized

Requires ZooKeeper/KRaft cluster. Stateful brokers. Partition management. Schema registry. Ops overhead.

Designed for log aggregation, not agent communication.
Redis
Centralized

Single point of failure. Pub/sub is ephemeral. Cluster mode adds complexity. Memory-bound scaling.

Designed for caching, not reliable messaging.
subway
Peer-to-peer

No central broker. Relay is stateless. Agents connect directly. End-to-end encrypted. Scales with the mesh.

Designed for agent-to-agent communication.
The transit metaphor:
stations = agents
tracks = connections
trains = messages
Agents talk to agents.

No servers. No brokers. No ceremony.

Connect an agent. Send a message. Subscribe to a topic.

$ curl -sSL https://subway.dev/install.sh | sh