WebSocket Connection Failures

Overview

This guide covers WebSocket connection failures in OpenClaw's gateway server, based on the connection handling logic in src/gateway/server/ws-connection.ts. WebSocket connections are critical for real-time communication between clients and agents.

Symptoms

Immediate Disconnection

  • Client connects but disconnects within seconds
  • No data exchanged after connection established
  • Connection appears successful initially but fails quickly

Handshake Timeout

  • Log message: handshake timeout conn=<connection-id>
  • Connection closed with close code 1008 (policy violation) and reason "handshake timeout"
  • Client never completes authentication/pairing handshake

Premature Closure

  • Warning: "closed before connect" in gateway logs
  • Connection closed during handshake phase
  • Close event received before handshake completion

Noisy Helper Connections

  • Swift Package Manager helper connections that close immediately
  • These are filtered and logged separately (not true errors)

Root Causes

1. Handshake Timeout (Lines 256-265)

The gateway enforces a handshake timeout to prevent zombie connections:

const timeout = setTimeout(() => {
  closeWithReason(
    conn,
    CloseCode.PolicyViolation,
    `handshake timeout`,
    /* broadcast= */ false
  );
}, handshakeTimeoutMs);

  • Default timeout: Configurable via gateway settings
  • Trigger: Client fails to complete the handshake within the timeout window
  • Recovery: Connection is closed; the client must reconnect
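The excerpt above shows only the timer being armed. A handshake timer is normally cancelled once the handshake succeeds, so that healthy connections are not closed later. The following is a minimal sketch of that pattern, not the code from ws-connection.ts; startHandshakeTimer and its parameters are hypothetical names chosen for illustration.

```typescript
// Hypothetical helper (not from ws-connection.ts): arms a handshake timer
// and returns a function that cancels it.
type CancelTimer = () => void;

function startHandshakeTimer(
  handshakeTimeoutMs: number,
  onTimeout: () => void,
): CancelTimer {
  // Fires onTimeout unless the returned cancel function is called first.
  const timer = setTimeout(onTimeout, handshakeTimeoutMs);
  return () => clearTimeout(timer);
}
```

In this sketch the caller would invoke the returned function at the point where the connection is marked ready, and onTimeout would perform the close shown in the excerpt.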

2. Socket Connection Error (Lines 185-188)

Low-level socket errors during connection establishment:

ws.on("error", (err) => {
  logger.warn("connection error", { conn: conn.id, err });
  closeWithReason(conn, CloseCode.InternalError, "connection error", false);
});

  • Common causes: Network issues, proxy problems, TLS errors
  • Recovery: Immediate closure; the server does not retry

3. Client Closed Before Handshake (Lines 215-223)

Client disconnects before completing authentication:

if (!conn.ready) {
  logger.warn("closed before connect", {
    conn: conn.id,
    code,
    reason,
    durationMs: Date.now() - conn.createdAt,
  });
  return;
}

  • Detection: The conn.ready flag is false at close time
  • Logging: Includes the connection duration (durationMs) for diagnosis
  • Recovery: Clean disconnect; no state updates needed

4. Failed Ed25519 Pairing

When Ed25519 authentication is in use, an invalid signature or key causes the handshake to fail.

Common causes:

  • Mismatched public/private key pairs
  • Signature verification failure
  • Expired or revoked credentials
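To see what a mismatched key pair looks like in practice, the sketch below signs a challenge with one Ed25519 key and verifies it against two public keys using Node's built-in crypto module. It illustrates Ed25519 verification in general; the challenge string is made up, and how the gateway's pairing protocol actually builds and checks its signatures is not covered here.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Two independent key pairs: the client's own, and an unrelated one.
const clientKeys = generateKeyPairSync("ed25519");
const unrelatedKeys = generateKeyPairSync("ed25519");

// Hypothetical challenge payload; the real handshake payload is an assumption.
const challenge = Buffer.from("hypothetical-handshake-challenge");

// Ed25519 takes no separate digest algorithm, so the first argument is null.
const signature = sign(null, challenge, clientKeys.privateKey);

// Verification succeeds only against the public key matching the signer.
const matchingKeyOk = verify(null, challenge, clientKeys.publicKey, signature);
const mismatchedKeyOk = verify(null, challenge, unrelatedKeys.publicKey, signature);

console.log({ matchingKeyOk, mismatchedKeyOk });
```

A client whose stored private key no longer corresponds to the public key registered with the gateway would land in the second case.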

Recovery Mechanisms

Server-Side Recovery

  1. Presence State Update: On disconnect, gateway updates presence state and broadcasts snapshot to remaining clients
  2. Node Cleanup: For gateway nodes, connection is removed from routing tables
  3. No Automatic Reconnect: Server does not attempt reconnection; client is responsible

Client-Side Recovery

  1. Detect Disconnect: Monitor WebSocket close event
  2. Inspect Close Code: Check event.code and event.reason for root cause
  3. Exponential Backoff: Implement reconnection with increasing delays
  4. Handshake Optimization: Ensure handshake completes quickly after connection

Diagnosis Steps

Step 1: Check Gateway Logs

Look for connection-specific log entries:

# Search for handshake timeouts
grep "handshake timeout" gateway.log

# Find connection errors
grep "connection error" gateway.log

# Check premature closures
grep "closed before connect" gateway.log
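When one failure mode needs to be compared against the others, the same three searches can be tallied in a single pass. The sketch below counts log lines containing each message; tallyFailures is a hypothetical helper, and it assumes only that each log entry is one line containing the message text.

```typescript
// The three log messages searched for above.
const FAILURE_MESSAGES = [
  "handshake timeout",
  "connection error",
  "closed before connect",
] as const;

type FailureMessage = (typeof FAILURE_MESSAGES)[number];

// Hypothetical helper: counts how many lines mention each failure message.
function tallyFailures(
  lines: readonly string[],
): Record<FailureMessage, number> {
  const counts: Record<FailureMessage, number> = {
    "handshake timeout": 0,
    "connection error": 0,
    "closed before connect": 0,
  };
  for (const line of lines) {
    for (const message of FAILURE_MESSAGES) {
      if (line.includes(message)) counts[message] += 1;
    }
  }
  return counts;
}
```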

Step 2: Analyze Close Codes

Common WebSocket close codes:

  • 1000: Normal closure (clean disconnect)
  • 1002: Protocol error (malformed message)
  • 1008: Policy violation (handshake timeout)
  • 1011: Internal error (server-side error)
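On the client, the codes listed above can be turned into a readable diagnosis when handling the close event. This is a small sketch with a hypothetical function name; the descriptions follow the list above, and any other code is reported as unrecognized rather than guessed at.

```typescript
// Hypothetical helper: maps the close codes listed above to a short diagnosis.
function describeCloseCode(code: number): string {
  switch (code) {
    case 1000:
      return "normal closure (clean disconnect)";
    case 1002:
      return "protocol error (malformed message)";
    case 1008:
      return "policy violation (handshake timeout)";
    case 1011:
      return "internal error (server-side error)";
    default:
      return `unrecognized close code ${code}`;
  }
}
```

A close handler could log describeCloseCode(event.code) together with event.reason to make client logs easier to match against gateway logs.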

Step 3: Measure Connection Duration

Check the durationMs field in logs:

  • Under 100 ms: Likely a network or firewall issue
  • Between 100 ms and the handshake timeout: Slow handshake, may need optimization
  • At or just above the handshake timeout: Handshake timeout triggered
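The duration ranges above can be applied mechanically to logged durationMs values. The sketch below encodes them as a small classifier; classifyDuration and its labels are hypothetical, and the handshake timeout must be passed in because its configured value depends on the gateway settings.

```typescript
type DurationClass =
  | "network-or-firewall"
  | "slow-handshake"
  | "handshake-timeout";

// Hypothetical helper: buckets a logged durationMs using the ranges above.
function classifyDuration(
  durationMs: number,
  handshakeTimeoutMs: number,
): DurationClass {
  if (durationMs < 100) return "network-or-firewall";
  if (durationMs < handshakeTimeoutMs) return "slow-handshake";
  return "handshake-timeout";
}
```

The 100 ms boundary is the rule of thumb from the list, not a hard limit, so values near it deserve a look at the surrounding log entries.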

Step 4: Verify Network Path