Resilience
Auto-reconnect, exponential backoff, relay watchdog.
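Subway handles network failures automatically, so your application does not need its own retry logic. Recovery is bounded: after 10 failed reconnect attempts the agent stops retrying and logs an error.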
Mechanisms
| Mechanism | Behavior |
|---|---|
| Name renewal | Re-registers with relay every 30s |
| Renewal failure tracking | After 5 consecutive failures, triggers full reconnect |
| Relay watchdog | Monitors RelayDisconnected events, initiates reconnect |
| Exponential backoff | 1s → 2s → 4s → ... → 30s max |
| Max attempts | 10 reconnect attempts before giving up |
| Graceful shutdown | Drop on AgentNode notifies all background tasks |
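
The backoff row above can be read as a doubling delay with a cap. The following is a minimal sketch of that schedule, not Subway's actual implementation: the constant names and values come from the Key constants table, while `backoff_delay` and `backoff_schedule` are hypothetical helpers, and it is an assumption that a delay is applied before every attempt, including the first.

```rust
use std::time::Duration;

// Values from the "Key constants" table; the Rust types are assumptions.
const RECONNECT_INITIAL_DELAY: Duration = Duration::from_secs(1);
const RECONNECT_MAX_DELAY: Duration = Duration::from_secs(30);
const RECONNECT_MAX_ATTEMPTS: u32 = 10;

/// Delay to wait before reconnect attempt `attempt` (1-based).
/// Hypothetical helper: doubles the initial delay per attempt, then caps it.
fn backoff_delay(attempt: u32) -> Duration {
    // Exponent is attempt - 1, clamped so the shift below cannot overflow.
    let exp = attempt.saturating_sub(1).min(31);
    let factor = 1u32 << exp;
    RECONNECT_INITIAL_DELAY
        .saturating_mul(factor)
        .min(RECONNECT_MAX_DELAY)
}

/// Delays for one full reconnect cycle of RECONNECT_MAX_ATTEMPTS attempts.
fn backoff_schedule() -> Vec<Duration> {
    (1..=RECONNECT_MAX_ATTEMPTS).map(backoff_delay).collect()
}
```

Under these assumptions the schedule is 1s, 2s, 4s, 8s, 16s, then 30s for the remaining five attempts, so a full cycle waits about 181s in total before giving up.
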
Failure scenarios
Relay restarts
- Relay goes down
- Name renewal fails 5 times in a row, reaching the failure threshold
- Watchdog detects RelayDisconnected
- Backoff reconnect loop starts: 1s, 2s, 4s, 8s...
- Relay comes back → agent reconnects, re-registers name
- All peers can reach the agent again
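
The renewal failure threshold in this scenario amounts to counting consecutive failures and resetting on success. Here is a rough sketch of that bookkeeping. `RenewalTracker` and `RenewalAction` are hypothetical names, not Subway types, and resetting the counter once a reconnect is triggered is an assumption.

```rust
// Value from the "Key constants" table.
const NAME_RENEWAL_MAX_FAILURES: u32 = 5;

/// What the caller should do after a renewal attempt (hypothetical type).
#[derive(Debug, PartialEq)]
enum RenewalAction {
    Continue,
    Reconnect,
}

/// Tracks consecutive name-renewal failures (hypothetical type).
#[derive(Default)]
struct RenewalTracker {
    consecutive_failures: u32,
}

impl RenewalTracker {
    /// Record the outcome of one renewal attempt.
    fn record(&mut self, renewal_succeeded: bool) -> RenewalAction {
        if renewal_succeeded {
            // Any success breaks the streak.
            self.consecutive_failures = 0;
            return RenewalAction::Continue;
        }
        self.consecutive_failures += 1;
        if self.consecutive_failures >= NAME_RENEWAL_MAX_FAILURES {
            // Assumption: the streak restarts once a reconnect is triggered.
            self.consecutive_failures = 0;
            RenewalAction::Reconnect
        } else {
            RenewalAction::Continue
        }
    }
}
```

With this logic, four failures followed by one success do not trigger a reconnect; only five failures in a row do.
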
Network partition
- Agent loses internet connectivity
- Name renewal fails
- Watchdog triggers reconnect attempts
- If connectivity returns within the backoff window → automatic recovery
- If all 10 attempts fail → agent stops retrying, logs error
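
The two outcomes above (recovery within the backoff window, or giving up after 10 attempts) can be sketched as a bounded retry loop. This is an illustration under stated assumptions rather than Subway's code: `reconnect_with_backoff` is a hypothetical function, and the connect and sleep steps are passed in as closures so the loop stays independent of any real networking API.

```rust
use std::time::Duration;

// Values from the "Key constants" table.
const RECONNECT_INITIAL_DELAY: Duration = Duration::from_secs(1);
const RECONNECT_MAX_DELAY: Duration = Duration::from_secs(30);
const RECONNECT_MAX_ATTEMPTS: u32 = 10;

/// Try to reconnect with capped exponential backoff (hypothetical helper).
/// Returns Ok(attempt) on the first successful attempt, or Err(attempts_made)
/// once RECONNECT_MAX_ATTEMPTS attempts have failed.
fn reconnect_with_backoff<C, S>(mut try_connect: C, mut sleep: S) -> Result<u32, u32>
where
    C: FnMut() -> bool,
    S: FnMut(Duration),
{
    let mut delay = RECONNECT_INITIAL_DELAY;
    for attempt in 1..=RECONNECT_MAX_ATTEMPTS {
        // Assumption: wait before every attempt, including the first.
        sleep(delay);
        if try_connect() {
            return Ok(attempt);
        }
        // Double the delay for the next attempt, capped at the maximum.
        delay = (delay * 2).min(RECONNECT_MAX_DELAY);
    }
    // All attempts failed; the caller would log an error and stop retrying.
    Err(RECONNECT_MAX_ATTEMPTS)
}
```

In a real agent the `sleep` closure would be an actual timer (for example `std::thread::sleep` in blocking code) and `try_connect` would dial the relay. Injecting both keeps the policy easy to test.
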
Agent crash
- Agent process dies
- Name expires on relay (no renewal within 30s)
- Other agents get NameNotFound when trying to reach it
- Agent restarts → new connection, same PeerId (if key persists), name re-registered
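
To make the expiry behavior concrete, here is a toy model of a name table with a time-to-live. It is only a sketch: `NameRegistry`, `LookupError`, and `NAME_TTL` are hypothetical, the relay's real data structures are not described in this page, and the 30s TTL is taken from the "no renewal within 30s" wording above.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Assumption: a name lives for 30s after its last registration or renewal.
const NAME_TTL: Duration = Duration::from_secs(30);

/// Error returned when a name is unknown or expired (hypothetical type).
#[derive(Debug, PartialEq)]
enum LookupError {
    NameNotFound,
}

/// Toy relay-side name table (hypothetical type).
#[derive(Default)]
struct NameRegistry {
    // name -> (peer id, time of last registration or renewal)
    entries: HashMap<String, (String, Instant)>,
}

impl NameRegistry {
    /// Register a name, or renew it if it already exists.
    fn register(&mut self, name: &str, peer_id: &str, now: Instant) {
        self.entries
            .insert(name.to_string(), (peer_id.to_string(), now));
    }

    /// Resolve a name to a peer id, treating stale entries as missing.
    fn resolve(&self, name: &str, now: Instant) -> Result<&str, LookupError> {
        match self.entries.get(name) {
            Some((peer_id, renewed_at)) if now.duration_since(*renewed_at) <= NAME_TTL => {
                Ok(peer_id.as_str())
            }
            _ => Err(LookupError::NameNotFound),
        }
    }
}
```

In this model a crashed agent's name resolves until the TTL elapses, then yields NameNotFound, and becomes reachable again as soon as the restarted agent registers the same name.
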
Key constants
| Constant | Value |
|---|---|
| RELAY_CONNECT_TIMEOUT | 10s |
| RPC_DEFAULT_TIMEOUT | 30s |
| NAME_RENEWAL_INTERVAL | 30s |
| RECONNECT_INITIAL_DELAY | 1s |
| RECONNECT_MAX_DELAY | 30s |
| RECONNECT_MAX_ATTEMPTS | 10 |
| NAME_RENEWAL_MAX_FAILURES | 5 |
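
For reference, the table could be expressed in Rust roughly as follows. The names and values are the ones listed above; the types (`Duration`, `u32`) and the `renewal_failure_window` helper are assumptions made for illustration, not Subway's actual declarations.

```rust
use std::time::Duration;

// Names and values from the table above; the types are assumptions.
const RELAY_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
const RPC_DEFAULT_TIMEOUT: Duration = Duration::from_secs(30);
const NAME_RENEWAL_INTERVAL: Duration = Duration::from_secs(30);
const RECONNECT_INITIAL_DELAY: Duration = Duration::from_secs(1);
const RECONNECT_MAX_DELAY: Duration = Duration::from_secs(30);
const RECONNECT_MAX_ATTEMPTS: u32 = 10;
const NAME_RENEWAL_MAX_FAILURES: u32 = 5;

/// Time spanned by enough consecutive failed renewals to hit the threshold,
/// assuming one renewal attempt per interval (hypothetical helper).
fn renewal_failure_window() -> Duration {
    NAME_RENEWAL_INTERVAL * NAME_RENEWAL_MAX_FAILURES
}
```

Under that assumption, renewal failures alone would take roughly 150s to trigger a full reconnect. The relay watchdog reacts to RelayDisconnected events, so detection may well happen sooner in practice.
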