WebSocket Connection Failures
Overview
This guide covers WebSocket connection failures in OpenClaw's gateway server, based on the connection handling logic in src/gateway/server/ws-connection.ts. WebSocket connections are critical for real-time communication between clients and agents.
Symptoms
Immediate Disconnection
- Client connects but disconnects within seconds
- No data exchanged after connection established
- Connection appears successful initially but fails quickly
Handshake Timeout
- Log message: "handshake timeout conn=<connection-id>"
- Connection closed with specific timeout reason
- Client never completes authentication/pairing handshake
Premature Closure
- Warning: "closed before connect" in gateway logs
- Connection closed during handshake phase
- Close event received before handshake completion
Noisy Helper Connections
- Swift Package Manager helper connections that close immediately
- These are filtered and logged separately (not true errors)
Root Causes
1. Handshake Timeout (Lines 256-265)
The gateway enforces a handshake timeout to prevent zombie connections:
Default timeout: Configurable via gateway settings
Trigger: Client fails to complete handshake within timeout window
Recovery: Connection closed, client must reconnect
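The timeout behavior above can be sketched as a one-shot timer armed when a socket opens. This is a minimal illustration under stated assumptions, not the code in ws-connection.ts: the PendingConn shape, the armHandshakeTimeout name, the injectable scheduler, and the use of close code 1008 (borrowed from the close-code list in this guide) are all hypothetical.

```typescript
// Hypothetical connection shape for illustration; OpenClaw's real type differs.
interface PendingConn {
  id: string;
  ready: boolean; // flipped to true once the handshake completes
  close(code: number, reason: string): void;
}

// Injectable scheduler so the timer can be driven deterministically in tests.
type Scheduler = (fn: () => void, ms: number) => unknown;

// Arms a one-shot handshake timer. If the connection is still not ready when
// the timer fires, it logs the timeout and closes the socket. The server does
// not retry; the client is expected to reconnect.
function armHandshakeTimeout(
  conn: PendingConn,
  timeoutMs: number,
  log: (msg: string) => void = console.warn,
  schedule: Scheduler = setTimeout,
): void {
  schedule(() => {
    if (conn.ready) return; // handshake finished in time; nothing to do
    log(`handshake timeout conn=${conn.id}`);
    conn.close(1008, "handshake timeout"); // code is an assumption (see close-code list)
  }, timeoutMs);
}
```

Because the callback re-checks conn.ready, a timer that fires after a successful handshake is harmless; a real implementation would typically also clear the timer once the handshake completes.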
2. Socket Connection Error (Lines 185-188)
Low-level socket errors during connection establishment:
Common causes: Network issues, proxy problems, TLS errors
Recovery: Immediate closure, no retry at server side
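A server-side error handler matching this recovery policy (log, close immediately, never retry) might look like the following. It assumes a socket built on Node's EventEmitter with a terminate() method, as the ws package's WebSocket provides; the attachSocketErrorHandler name and the log format are hypothetical.

```typescript
import { EventEmitter } from "node:events";

// Assumed socket shape: an EventEmitter that can be torn down with terminate().
type ErrorableSocket = EventEmitter & { terminate(): void };

// On a low-level socket error, log it and destroy the socket immediately.
// No retry happens on the server side; recovery is left to the client.
function attachSocketErrorHandler(
  socket: ErrorableSocket,
  connId: string,
  log: (msg: string) => void = console.warn,
): void {
  socket.on("error", (err: Error) => {
    log(`socket error conn=${connId}: ${err.message}`); // hypothetical log format
    socket.terminate();
  });
}
```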
3. Client Closed Before Handshake (Lines 215-223)
Client disconnects before completing authentication:
Detection: conn.ready flag is false at close time
Logging: Includes connection duration for diagnosis
Recovery: Clean disconnect, no state updates needed
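The ready check and duration logging can be sketched as a close handler. Only the ready flag mirrors the conn.ready field named in this guide; the ConnState shape, the summarizeClose name, and the exact log wording are assumptions for illustration.

```typescript
// Hypothetical per-connection state; `ready` mirrors conn.ready.
interface ConnState {
  id: string;
  ready: boolean;
  openedAt: number; // epoch ms when the socket opened
}

interface CloseSummary {
  beforeHandshake: boolean;
  durationMs: number;
}

// Classifies a close. When ready is false the client left before finishing the
// handshake: log it with its duration and skip any presence/state updates.
function summarizeClose(
  conn: ConnState,
  code: number,
  now: number = Date.now(),
  log: (msg: string) => void = console.warn,
): CloseSummary {
  const durationMs = now - conn.openedAt;
  if (!conn.ready) {
    log(`closed before connect conn=${conn.id} code=${code} durationMs=${durationMs}`);
    return { beforeHandshake: true, durationMs };
  }
  return { beforeHandshake: false, durationMs };
}
```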
4. Failed Ed25519 Pairing
When Ed25519 authentication is in use, invalid signatures or keys cause the handshake to fail:
Common causes:
- Mismatched public/private key pairs
- Signature verification failure
- Expired or revoked credentials
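The first two failure modes are easy to reproduce with Node's built-in crypto module, which supports Ed25519 directly. The challenge-response flow below (the gateway sends a random nonce, the client signs it) is a hypothetical stand-in for OpenClaw's actual pairing payload; signChallenge and verifyChallenge are illustrative names.

```typescript
import { generateKeyPairSync, randomBytes, sign, verify, KeyObject } from "node:crypto";

// Hypothetical challenge-response: the client signs a server-issued nonce.
function signChallenge(nonce: Buffer, privateKey: KeyObject): Buffer {
  // Ed25519 has no separate digest step, so Node takes null as the algorithm.
  return sign(null, nonce, privateKey);
}

function verifyChallenge(nonce: Buffer, signature: Buffer, publicKey: KeyObject): boolean {
  return verify(null, nonce, publicKey, signature);
}

// Demonstrates the mismatched-key failure mode listed above.
const nonce = randomBytes(32);
const client = generateKeyPairSync("ed25519");
const stranger = generateKeyPairSync("ed25519");
const signature = signChallenge(nonce, client.privateKey);
console.log(verifyChallenge(nonce, signature, client.publicKey)); // true
console.log(verifyChallenge(nonce, signature, stranger.publicKey)); // false
```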
Recovery Mechanisms
Server-Side Recovery
- Presence State Update: On disconnect, the gateway updates presence state and broadcasts a snapshot to the remaining clients
- Node Cleanup: For gateway nodes, the connection is removed from routing tables
- No Automatic Reconnect: The server does not attempt reconnection; the client is responsible
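In outline, the server-side steps above amount to: remove the client, drop any node route, broadcast a fresh presence snapshot, and stop. The PresenceRegistry class, the isNode flag, and the snapshot message shape are hypothetical and do not reflect OpenClaw's real data structures.

```typescript
// Hypothetical connected-client shape.
interface Client {
  id: string;
  isNode: boolean; // true for gateway nodes that appear in routing tables
  send(data: string): void;
}

class PresenceRegistry {
  private readonly clients = new Map<string, Client>();
  private readonly nodeRoutes = new Set<string>();

  add(client: Client): void {
    this.clients.set(client.id, client);
    if (client.isNode) this.nodeRoutes.add(client.id);
  }

  hasRoute(id: string): boolean {
    return this.nodeRoutes.has(id);
  }

  // Runs on disconnect: remove the client, drop any node route, then broadcast
  // a presence snapshot to everyone still connected. No reconnect is attempted.
  handleDisconnect(id: string): void {
    if (!this.clients.delete(id)) return; // unknown or already cleaned up
    this.nodeRoutes.delete(id);
    const snapshot = JSON.stringify({
      type: "presence",
      online: Array.from(this.clients.keys()),
    });
    this.clients.forEach((client) => client.send(snapshot));
  }
}
```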
Client-Side Recovery
- Detect Disconnect: Monitor the WebSocket close event
- Inspect Close Code: Check event.code and event.reason for the root cause
- Exponential Backoff: Implement reconnection with increasing delays
- Handshake Optimization: Ensure the handshake completes quickly after the connection opens
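A client-side sketch of these steps follows: it reads code and reason from the close event, skips reconnecting after a normal 1000 closure, and otherwise schedules a retry using exponential backoff with full jitter. The function names, the default base and cap delays, and the choice of full jitter are assumptions, not values taken from OpenClaw.

```typescript
// Only the two fields of the standard CloseEvent that this sketch reads.
interface CloseInfo {
  code: number;
  reason: string;
}

type Scheduler = (fn: () => void, ms: number) => unknown;

// Full-jitter backoff: a random delay in [0, min(maxMs, baseMs * 2^attempt)).
function backoffDelay(
  attempt: number,
  baseMs = 500,
  maxMs = 30_000,
  rand: () => number = Math.random,
): number {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * cap);
}

// Logs the close, decides whether to reconnect, and schedules the retry.
// Returns the chosen delay, or null when no reconnect is scheduled.
function onSocketClose(
  ev: CloseInfo,
  attempt: number,
  reconnect: () => void,
  schedule: Scheduler = setTimeout,
): number | null {
  console.warn(`socket closed code=${ev.code} reason=${ev.reason || "(none)"}`);
  if (ev.code === 1000) return null; // normal closure: do not reconnect
  const delay = backoffDelay(attempt);
  schedule(reconnect, delay);
  return delay;
}
```

In a browser, the CloseEvent delivered to ws.addEventListener("close", ...) already carries code and reason, so it can be passed to onSocketClose directly. Resetting the attempt counter after each successful handshake keeps retries fast for clients that only drop occasionally.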
Diagnosis Steps
Step 1: Check Gateway Logs
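Look for connection-specific log entries, such as "handshake timeout conn=<connection-id>" and the "closed before connect" warning, and note the connection ID and any durationMs value they report.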
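To pull these entries out of a saved log, a small filter can match the two messages this guide names and extract the connection ID plus any durationMs value. It assumes one entry per line with key=value fields such as conn=<connection-id> and durationMs=<n>; if the gateway logs structured JSON or another layout, the patterns need adjusting. findConnFailures is a hypothetical helper.

```typescript
interface ConnLogEntry {
  kind: "handshake-timeout" | "closed-before-connect";
  connId: string | null;
  durationMs: number | null;
  line: string;
}

// Assumed key=value fields; adjust if the gateway emits JSON or another format.
const CONN_RE = /\bconn=(\S+)/;
const DURATION_RE = /\bdurationMs=(\d+)/;

// Returns one entry per log line that matches a known failure message.
function findConnFailures(logText: string): ConnLogEntry[] {
  const entries: ConnLogEntry[] = [];
  for (const line of logText.split("\n")) {
    let kind: ConnLogEntry["kind"] | null = null;
    if (line.includes("handshake timeout")) kind = "handshake-timeout";
    else if (line.includes("closed before connect")) kind = "closed-before-connect";
    if (kind === null) continue;
    const conn = CONN_RE.exec(line);
    const dur = DURATION_RE.exec(line);
    entries.push({
      kind,
      connId: conn ? conn[1] : null,
      durationMs: dur ? Number(dur[1]) : null,
      line,
    });
  }
  return entries;
}
```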
Step 2: Analyze Close Codes
Common WebSocket close codes:
- 1000: Normal closure (clean disconnect)
- 1002: Protocol error (malformed message)
- 1008: Policy violation (handshake timeout)
- 1011: Internal error (server-side error)
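When logging client-side disconnects, it can help to attach the meaning of each code directly. This lookup simply restates the list above; the handshake-timeout gloss on 1008 reflects this guide's account of the gateway, while RFC 6455 defines 1008 only as a generic policy violation. CLOSE_CODE_HINTS and explainClose are illustrative names.

```typescript
// Restates the close-code list above; unlisted codes fall through to a default.
const CLOSE_CODE_HINTS: Readonly<Partial<Record<number, string>>> = {
  1000: "Normal closure (clean disconnect)",
  1002: "Protocol error (malformed message)",
  1008: "Policy violation (handshake timeout)",
  1011: "Internal error (server-side error)",
};

// Builds a one-line, log-friendly description of a close event.
function explainClose(code: number, reason: string): string {
  const hint = CLOSE_CODE_HINTS[code] ?? "Unlisted close code";
  return reason ? `${code}: ${hint}; reason="${reason}"` : `${code}: ${hint}`;
}
```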
Step 3: Measure Connection Duration
Check the durationMs field in logs:
- < 100ms: Likely network/firewall issue
- 100ms to timeout: Slow handshake, may need optimization
- At or just past the timeout: Handshake timeout triggered
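These buckets translate into a small classifier. The 100ms boundary comes from the list above; the toleranceMs allowance for timer jitter around the timeout is an assumption, and classifyDuration is a hypothetical name.

```typescript
type DurationVerdict = "network-or-firewall" | "slow-handshake" | "handshake-timeout";

// Buckets a connection's durationMs against the configured handshake timeout.
// toleranceMs treats closes landing just before the timeout as timeouts too;
// its default is an assumed value, not a gateway setting.
function classifyDuration(durationMs: number, timeoutMs: number, toleranceMs = 50): DurationVerdict {
  if (durationMs < 100) return "network-or-firewall";
  if (durationMs < timeoutMs - toleranceMs) return "slow-handshake";
  return "handshake-timeout";
}
```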