Network Working Group                                        S. Ferguson
Internet-Draft                                                    Caucho
Intended status: Standards Track                            May 10, 2010
Expires: November 11, 2010


The WebSockets Protocol
draft-ferg-hybi-websockets-latest

Abstract

The WebSocket protocol enables a bidirectional stream of messages between a client and a server. Messages consist of a sequence of binary frames over TCP. The protocol uses HTTP for its handshake, upgrading to the bidirectional binary frames defined in this document.

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

This Internet-Draft will expire on November 11, 2010.



Table of Contents

1.  Introduction
    1.1.  Requirements
    1.2.  Protocol Overview
2.  General Requirements
    2.1.  Requirements
    2.2.  Syntax Notation
    2.3.  Terminology
    2.4.  Basic Rules
3.  Client Requirements
    3.1.  Client handshake variables
    3.2.  Client Handshake
        3.2.1.  Client Grammar
4.  Server Handshake
    4.1.  Server handshake variables
    4.2.  Server Handshake
        4.2.1.  Server Grammar
5.  Stream Protocol
    5.1.  Stream syntax
    5.2.  Frame syntax
    5.3.  Stream Close
    5.4.  Connection Keepalive
6.  HTTP Headers
    6.1.  Connection
    6.2.  Host
    6.3.  Origin
    6.4.  Sec-WebSocket-Location
    6.5.  Sec-WebSocket-Origin
    6.6.  Sec-WebSocket-Protocol
    6.7.  Upgrade
7.  Control Frames
    7.1.  NOP (op=0)
    7.2.  CLOSE (op=1)
    7.3.  INIT (op=2)
    7.4.  KEEPALIVE-REQUEST (op=3)
    7.5.  KEEPALIVE-RESPONSE (op=4)
    7.6.  [NEGOTIATE-HEADER (op=5)]
8.  [Multiplexing Extension]
    8.1.  CHANNEL-SELECT (op=20)
    8.2.  CHANNEL-CLOSE (op=21)
9.  Security Considerations
    9.1.  HTTP
    9.2.  Browser Scripting attacks
10.  Acknowledgements
11.  Normative References
§  Author's Address
§  Intellectual Property and Copyright Statements





1.  Introduction

NOTE: this document is substantially derived from Ian Hickson's "The WebSocket Protocol" http://www.whatwg.org/specs/web-socket-protocol/. The author of this document is named only for identification purposes, not to hijack any credit.

NOTE: this section is copied verbatim from the HYBI (Loreto, S., “HyBi Requirements and Features,” March 2010.) [HYBI] requirements document.

HTTP RFC2616 (Fielding, R., Gettys, J., Mogul, J., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” June 1999.) [RFC2616] is a client/server protocol, where the HTTP servers store the data and provide it when it is requested by clients. When used to retrieve data from an HTTP server, the client sends HTTP requests to the server, and the server returns the requested data in HTTP responses. So the client has to poll the server continuously in order to receive new data.

Recently, techniques that enable bidirectional communication over HTTP have become more pervasive. Those techniques reduce the need to poll the server continuously by using HTTP hanging requests and multiple connections between the client and the server [I-D.loreto-http-bidirectional].

The goal of HyBi is to provide an efficient and clean two-way communication channel between client and server.

The communication channel will:




1.1.  Requirements

NOTE: requirements are from HYBI (Loreto, S., “HyBi Requirements and Features,” March 2010.) [HYBI]

  1. It MUST be possible to send a message when the total size is either unknown or exceeds a fixed buffer size.
  2. The WebSocket server MUST have the ability to send arbitrary binary content to the client on the established communication channel, in the form of ordered discrete blocks.
  3. The WebSocket server MUST have the ability to send arbitrary text content to the client on the established communication channel, in the form of ordered discrete blocks.
  4. Textual data MUST be encoded as UTF-8.
  5. The WebSocket protocol MUST allow HTTP and WebSocket connections to be served from the same port.
  6. The WebSocket protocol MUST provide for graceful close of an active WebSocket connection on request from the user Application.
  7. The WebSocket client MUST be able to request the server, during the handshake, to use a specific WebSocket sub-protocol.
  8. WebSocket should be designed to be robust against cross-protocol attacks.




1.2.  Protocol Overview

This section is non-normative.

The WebSocket protocol has two parts: a handshake using HTTP Upgrade to negotiate the connection, and a following bidirectional frame protocol to send application messages between the peers.

The client handshake looks as follows:

  GET /demo HTTP/1.1
  Host: example.com
  Connection: Upgrade
  Sec-WebSocket-Protocol: sample.example.com/1.0
  Upgrade: WebSocket
  Origin: http://example.com

  %x81%x0A%x02WebSocket
  <Stream data follows>

The server handshake looks as follows:

  HTTP/1.1 101 WebSocket Protocol Handshake
  Upgrade: WebSocket
  Connection: Upgrade
  Sec-WebSocket-Origin: http://example.com
  Sec-WebSocket-Location: ws://example.com/demo
  Sec-WebSocket-Protocol: sample.example.com/1.0

  <Stream data follows>

The bidirectional stream consists of a sequence of data frames combined into application messages, and control frames used to manage the connection itself.

  Stream  = *( Message / control-frame )
  Message = *( non-final-frame ) final-frame

A typical message might consist of a single data frame encoding a text message using UTF-8.

  %x01%x0CHello, world

The client and server will send application messages asynchronously until the end of the stream. Each will close the stream with a Close control frame.

Clients and servers may use a Keepalive control frame to verify if a connection is still valid, which may be needed when network routers drop connections silently.




2.  General Requirements




2.1.  Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

An implementation is not compliant if it fails to satisfy one or more of the MUST or REQUIRED level requirements for the protocols it implements. An implementation that satisfies all the MUST or REQUIRED level and all the SHOULD level requirements for its protocols is said to be "unconditionally compliant"; one that satisfies all the MUST level requirements but not all the SHOULD level requirements for its protocols is said to be "conditionally compliant."




2.2.  Syntax Notation

This specification uses the Augmented Backus-Naur Form (ABNF) notation of [RFC5234] (Crocker, D. and P. Overell, “Augmented BNF for Syntax Specifications: ABNF,” January 2008.).

The following core rules are included by reference, as defined in [RFC5234] (Crocker, D. and P. Overell, “Augmented BNF for Syntax Specifications: ABNF,” January 2008.), Appendix B.1: ALPHA (letters), CR (carriage return), CRLF (CR LF), DIGIT (decimal 0-9), OCTET (any 8-bit sequence of data), SP (space), VCHAR (any visible [USASCII] character), and WSP (whitespace).




2.3.  Terminology

This specification uses the following HyBi-related terms:

connection: A transport layer virtual circuit established between a client and a server for the purpose of communication.

control-frame: A frame used to control connection behavior outside of the application data stream.

frame: The basic unit of WebSocket communication, consisting of a structured sequence of octets matching the syntax defined in the actual protocol and transmitted on the established communication channel.

message: A user message; a block of related data with identified boundaries.

origin server: The server on which a given resource resides or is to be created.




2.4.  Basic Rules

The following basic rules follow the definitions in HTTP [RFC2616] (Fielding, R., Gettys, J., Mogul, J., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” June 1999.).

  token          = 1*tchar

  tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
                 / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
                 / DIGIT / ALPHA
                 ; any VCHAR, except special

The following URI definitions come from [RFC3986] (Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” January 2005.).

  URI           = <URI, defined in [RFC3986], Section 3>
  URI-reference = <URI-reference, defined in [RFC3986], Section 4.1>
  absolute-URI  = <absolute-URI, defined in [RFC3986], Section 4.3>
  relative-part = <relative-part, defined in [RFC3986], Section 4.2>
  authority     = <authority, defined in [RFC3986], Section 3.2>
  path-abempty  = <path-abempty, defined in [RFC3986], Section 3.3>
  path-absolute = <path-absolute, defined in [RFC3986], Section 3.3>
  port          = <port, defined in [RFC3986], Section 3.2.3>
  query         = <query, defined in [RFC3986], Section 3.4>
  uri-host      = <host, defined in [RFC3986], Section 3.2.2>

  partial-URI   = relative-part [ "?" query ]

The following is the format of a WebSocket URI.

     ws-URI = "ws:" "//" authority partial-URI

The following is the format of a secure (TLS) WebSocket URI.

     wss-URI = "wss:" "//" authority partial-URI



3.  Client Requirements




3.1.  Client handshake variables

Before establishing a connection, the client will gather the following data, typically from a WebSocket URL or client API call: /secure/, /host/, /port/, /resource/ and an optional /protocol/.

/secure/ is a flag indicating whether TLS will be used or not. The WebSocket URL uses the "wss:" scheme to indicate TLS, and "ws:" to indicate non-TLS.

/host/ is the virtual host name of the WebSocket server, as defined by the uri-host of [RFC3986] (Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” January 2005.). Since it is a US-ASCII lower-case name, any non-ASCII names MUST be encoded.

/port/ is the TCP port of the WebSocket server, as defined by the port of [RFC3986] (Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” January 2005.).

/resource/ is the server's resource name, as defined by the relative-part of [RFC3986] (Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” January 2005.).

/protocol/ is the sub-protocol name, if specified.

/origin/ is calculated from the server's HTTP resource that initiated the WebSockets connection, for browser clients. [Ed: Non-browser clients will use ??]

/location/ is the WebSocket URL constructed from /secure/, /host/, /port/, and /resource/.
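
As a non-normative illustration, the variables above could be derived from a WebSocket URL roughly as follows. This is a minimal sketch: the function name handshake_variables and the default ports (80 for "ws:", 443 for "wss:") are assumptions made for the example and are not stated by this draft, and encoding of non-ASCII host names is not shown.

```python
from urllib.parse import urlsplit

# Assumed default ports; this draft does not define them.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def handshake_variables(url, protocol=None):
    """Split a WebSocket URL into /secure/, /host/, /port/ and /resource/."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError("not a WebSocket URL: %r" % url)
    if not parts.hostname:
        raise ValueError("WebSocket URL has no host: %r" % url)

    resource = parts.path or "/"
    if parts.query:
        resource += "?" + parts.query

    return {
        "secure": parts.scheme == "wss",       # "wss:" means TLS
        "host": parts.hostname,                # urlsplit lower-cases the host
        "port": parts.port or DEFAULT_PORTS[parts.scheme],
        "resource": resource,
        "protocol": protocol,                  # optional sub-protocol name
    }
```

For example, handshake_variables("ws://example.com/demo") yields /secure/ false, /host/ "example.com" and /resource/ "/demo".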




3.2.  Client Handshake

After gathering the handshake information above, the client initiates the handshake. If the handshake fails at any step, the client MUST close the connection.

The client's send portion of the handshake proceeds as follows:

  1. The client MUST wait for any other handshake to the same server identified by /host/ and /port/ to complete or fail, i.e. the client MUST serialize handshakes to a server.
  2. If /secure/ is true, the client initiates a TLS handshake.
  3. The client sends the HTTP request header consisting of the HTTP "GET" method, /resource/ as the HTTP request-target, and "HTTP/1.1" as the version.
  4. The client MUST send the following HTTP headers:
    1. The HTTP "Host" header with /host/ as its value
    2. The HTTP "Connection: Upgrade" header
    3. The HTTP "Upgrade: WebSocket" header
    4. An "Origin" header with /origin/ as its value.
    5. The "Sec-WebSocket-Protocol" header with /protocol/ as its value if and only if /protocol/ is set.
  5. The client may send additional HTTP headers.
  6. The client MUST send the final CRLF for HTTP.
  7. The client may send the Init control frame followed by initial frames to cut down on round trip times. [Ed: whether to allow this pipeline behavior is under discussion by the working group.]
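
The send steps above (after any TLS setup) can be sketched as a function that serializes the request. This is a non-normative sketch; the function name and the pipeline_init parameter are hypothetical, and the header order simply follows the list above.

```python
def build_client_handshake(host, resource, origin, protocol=None,
                           pipeline_init=False):
    """Serialize the client handshake request as bytes."""
    lines = [
        "GET %s HTTP/1.1" % resource,
        "Host: %s" % host,
        "Connection: Upgrade",
        "Upgrade: WebSocket",
        "Origin: %s" % origin,
    ]
    # Sec-WebSocket-Protocol is sent if and only if /protocol/ is set.
    if protocol is not None:
        lines.append("Sec-WebSocket-Protocol: %s" % protocol)

    # Header lines end with CRLF; an empty line ends the HTTP header.
    request = ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

    # Optionally pipeline the INIT control frame (%x81 %x0A %x02 WebSocket).
    if pipeline_init:
        request += b"\x81\x0a\x02WebSocket"
    return request
```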

After sending the HTTP handshake header (and any optional pipelined data) the client MUST wait for the server handshake. The client MUST close the connection if it detects any errors when parsing the handshake.

The handshake parsing proceeds as follows:

  1. The client parses the HTTP Status-Line. If the Status-Line version is "HTTP/1.1" and the status code is 101, proceed.
  2. The client parses the HTTP headers, reading all until the final CRLF and saving their values.
  3. If the received headers do not satisfy the following requirements, the client MUST close the connection.
    1. The "Connection" header MUST exist with value "Upgrade".
    2. The "Upgrade" header MUST exist with value "WebSocket".
    3. The "Sec-WebSocket-Origin" header MUST exist with value /origin/.
    4. The "Sec-WebSocket-Location" header MUST exist with value /location/.
    5. The "Sec-WebSocket-Protocol" header MUST exist if and only if /protocol/ is set.
  4. [Ed: here the spec could be changed to validate a server Init control frame with a digest.]
  5. The client MUST send its Init control frame if and only if it has not already sent that control frame as pipelined data.
  6. The handshake is now complete.
  7. The client sends data as defined by the Stream syntax.
  8. When complete, the client MUST send a Close control frame before closing the socket.
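
The header checks in steps 1 to 3 above might be implemented along these lines. This is a non-normative sketch: the function name is hypothetical, header names are matched case-insensitively as in HTTP, and exact comparison of header values is an assumption of the example.

```python
def check_server_handshake(raw, origin, location, protocol=None):
    """Return True if the server's handshake header passes the client checks.

    raw is the data received so far; it must contain the full HTTP header.
    """
    head, sep, _ = raw.partition(b"\r\n\r\n")
    if not sep:
        return False                      # final CRLF not received
    lines = head.decode("ascii", "replace").split("\r\n")

    # Step 1: the Status-Line must be HTTP/1.1 with status code 101.
    status = lines[0].split(" ", 2)
    if len(status) < 2 or status[0] != "HTTP/1.1" or status[1] != "101":
        return False

    # Step 2: save all header values.
    headers = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if not colon:
            return False
        headers[name.strip().lower()] = value.strip()

    # Step 3: required headers and values.
    if headers.get("connection") != "Upgrade":
        return False
    if headers.get("upgrade") != "WebSocket":
        return False
    if headers.get("sec-websocket-origin") != origin:
        return False
    if headers.get("sec-websocket-location") != location:
        return False
    # Present if and only if /protocol/ is set (None means unset or absent).
    return headers.get("sec-websocket-protocol") == protocol
```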




3.2.1.  Client Grammar

The full syntax for the WebSocket client is as follows:

   Request        = Request-Line
                    *( header-field CRLF )
                    CRLF
                    Init
                    Stream

   Request-Line   = "GET" SP request-target SP "HTTP/1.1" CRLF

   request-target = absolute-URI
                  / ( path-absolute [ "?" query ] )

   header-field   = field-name ":" SP [ field-content ]
   field-name     = token
   field-content  = *( SP / VCHAR )

   Init           = control-frame
                  ; where OP = 2 (INIT)

   client-header  = "Connection"
                  / "Host"
                  / "Origin"
                  / "Upgrade"
                  / ["Sec-WebSocket-Protocol"]
                  / *( field-name)
                  ; required, optional and extension client headers

[Ed: Because of the need to pipeline data to save on round-trip time, the "GET" method might be replaced by a method that allows a Content-Length header.]




4.  Server Handshake




4.1.  Server handshake variables

Before establishing a connection, the server will gather the following data for a WebSocket resource: /secure/, /host/, /port/, /resource/ and an optional /origin/ and /protocol/.

  1. /secure/ is a flag indicating whether [RFC2246] (Dierks, T. and C. Allen, “The TLS Protocol Version 1.0,” January 1999.) will be used or not.
  2. /host/ is the virtual host name of the WebSocket server, as defined by the uri-host of [RFC3986] (Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” January 2005.). Since it is a US-ASCII lower-case name, any non-ASCII names must be encoded.
  3. /port/ is the TCP port of the WebSocket server, as defined by the port of [RFC3986] (Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” January 2005.).
  4. /resource/ is the server's resource name, as defined by the relative-part of [RFC3986] (Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” January 2005.).
  5. /origin/ is calculated from the server's HTTP resource that initiates the WebSockets connection for browser clients, for example an HTML page that launches the client JavaScript. If the server does not launch the client in this fashion, /origin/ is unset.
  6. /protocol/ is the sub-protocol name, if specified.
  7. /location/ is the WebSocket URL constructed from /secure/, /host/, /port/, and /resource/.




4.2.  Server Handshake

After gathering the handshake information above, the server initiates the handshake. If the handshake fails at any step, the server MUST reject the request. [Ed: Can the server return a valid HTTP error response or must it close the connection?]

The server handshake parsing proceeds as follows:

  1. The server parses the HTTP Request-Line. If the Request-Line does not satisfy the following requirements, the server MUST reject the request.
    1. The HTTP method MUST be "GET".
    2. The relative-part MUST match /resource/.
    3. The HTTP version MUST be "HTTP/1.1".
  2. The server parses the HTTP headers, reading all until the final CRLF and saving their values.
  3. The received headers MUST satisfy the following requirements. If not, the server MUST reject the request.
    1. The "Connection" header MUST exist with value "Upgrade".
    2. The "Upgrade" header MUST exist with value "WebSocket".
    3. If /origin/ is set, the "Origin" header MUST exist with value /origin/.
    4. If /protocol/ is set, the "Sec-WebSocket-Protocol" header MUST exist with value /protocol/.

If the client request is valid, the server sends its handshake response as follows:

  1. The HTTP Status-Line version is "HTTP/1.1" and status code is "101". For example, "HTTP/1.1 101 WebSocket Upgrade".
  2. The server sends the following HTTP headers:
    1. "Connection" with value "Upgrade".
    2. "Upgrade" with value "WebSocket".
    3. "Sec-WebSocket-Origin" with the value of the client's "Origin" header.
    4. "Sec-WebSocket-Location" with the value /location/.
    5. "Sec-WebSocket-Protocol" header with value /protocol/ if and only if /protocol/ is set.
  3. The server may send additional HTTP headers.
  4. The server sends the final CRLF.
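
The response steps above can be sketched as follows. This is a non-normative sketch; the function name is hypothetical and the reason phrase is taken from the example in the Protocol Overview.

```python
def build_server_handshake(client_origin, location, protocol=None):
    """Serialize the server handshake response as bytes."""
    lines = [
        "HTTP/1.1 101 WebSocket Protocol Handshake",
        "Upgrade: WebSocket",
        "Connection: Upgrade",
        # Echo the value of the client's Origin header.
        "Sec-WebSocket-Origin: %s" % client_origin,
        "Sec-WebSocket-Location: %s" % location,
    ]
    # Sec-WebSocket-Protocol is sent if and only if /protocol/ is set.
    if protocol is not None:
        lines.append("Sec-WebSocket-Protocol: %s" % protocol)

    # Header lines end with CRLF; an empty line ends the HTTP header.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```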

[Ed: here the spec could be changed to add a server Init control frame with a digest.]

The server must now read the client's Init control frame. If the control frame does not satisfy the following requirements, the server MUST close the connection.

  1. The frame code MUST be a control frame (%x80-%x8F).
  2. The control code MUST be 2. (The Init control code.)
  3. The control data MUST be "WebSocket".
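
The three checks above might look like this in code. This is a non-normative sketch; the function name is hypothetical, and it assumes the caller has already buffered the bytes that follow the client's HTTP header.

```python
def is_valid_init(data):
    """Return True if data begins with a well-formed INIT control frame."""
    if not data:
        return False
    code = data[0]
    if code & 0xF0 != 0x80:               # must be a control code (%x80-8F)
        return False

    n = code & 0x0F                       # number of length bytes
    if len(data) < 1 + n:
        return False
    m = int.from_bytes(data[1:1 + n], "big")

    body = data[1 + n:1 + n + m]          # control-op followed by control-data
    if m < 1 or len(body) != m:
        return False
    return body[0] == 2 and body[1:] == b"WebSocket"
```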

The handshake is now complete.

The server now sends and receives data as defined by the Stream syntax.

The server may send a Close control frame and close the connection at any time.

When the server receives a Close control frame, it MUST stop reading, and close the connection.




4.2.1.  Server Grammar

   Response      = Status-Line
                   *( header-field CRLF )
                   CRLF
                   Stream

   Status-Line   = "HTTP/1.1" SP "101" SP Reason-Phrase CRLF
   Reason-Phrase = *( SP / VCHAR )

   header-field   = field-name ":" SP [ field-content ]
   field-name     = token
   field-content  = *( SP / VCHAR )

   server-header  = "Connection"
                  / "Upgrade"
                  / "Sec-WebSocket-Location"
                  / "Sec-WebSocket-Origin"
                  / ["Sec-WebSocket-Protocol"]
                  / *( field-name)
                  ; required, optional and extension server headers



5.  Stream Protocol




5.1.  Stream syntax

Once the handshake has been established, the message stream is symmetrical. Each side sends and reads a sequence of data messages split into frames, interleaved with any control frames.

A stream is a sequence of binary data messages, where each data message is a sequence of partial data frames. Control frames may appear between data messages to control the connection.

If the application message data is Unicode text, as in a JavaScript browser, the sender MUST encode the text as UTF-8.

Because the sending side may use a fixed-sized buffer, it may split a message into any number of non-final data frames followed by the final data frame. Short messages will fit into a single final-frame.

The syntax of each frame is defined in the next section.

When closing a connection, the client and server MUST send a CLOSE control frame. No bytes may be sent or read after the CLOSE control frame.

A receiver MUST close the connection if it detects any errors while reading, including any illegal frame syntax, too-long frame lengths, or any unknown control frame. The client and server MUST NOT attempt to recover from frame errors.

The stream syntax is defined by the following grammar. The frame section below defines the grammar of the frames themselves.

   Stream               = *( Message / control-frame) Close
   Message              = *( non-final-data-frame ) final-data-frame

   Close                = control-frame
                        ; where control-op=CLOSE
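
As a non-normative illustration of the stream grammar, the following sketch reassembles messages from already-parsed frames. The function name and the (code, payload) tuple representation are assumptions of the example. It follows a strict reading of the grammar, in which a control frame may not appear inside an unfinished message.

```python
def assemble(frames):
    """Turn (code, payload) frames into messages and control events.

    Yields ("message", data) for each complete message and
    ("control", op, data) for each control frame. Stops after CLOSE.
    """
    parts = []
    for code, payload in frames:
        kind = code & 0xF0
        if kind == 0x80:                  # control-frame
            if parts:
                raise ValueError("control frame inside a message")
            if not payload:
                raise ValueError("control frame without an op byte")
            op, data = payload[0], payload[1:]
            yield ("control", op, data)
            if op == 1:                   # CLOSE ends the stream
                return
        elif kind == 0x40:                # non-final-data-frame
            parts.append(payload)
        elif kind == 0x00:                # final-data-frame
            parts.append(payload)
            yield ("message", b"".join(parts))
            parts = []
        else:
            raise ValueError("illegal frame code 0x%02x" % code)
    raise ValueError("stream ended without a CLOSE control frame")
```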



5.2.  Frame syntax

A frame consists of an initial code byte, followed by the length of the frame encoded by a variable number of bytes, followed by the frame data.

   final-data-frame     = final-code length data
   non-final-data-frame = non-final-code length data
   control-frame        = control-code length control-op control-data

   final-code           = %x00-0F
   non-final-code       = %x40-4F
   control-code         = %x80-8F

   length               = N*OCTET
                        ; where N = (code & 0xF)

   data                 = M*OCTET
                        ; where M = length as a big-endian integer

   control-op           = OCTET
   control-data         = (M-1)*OCTET

A frame's initial byte encodes the type of the frame: control, non-final data, and final-data, and it encodes the number of bytes in the length.

The number of length bytes (N above) is variable, both to encode short messages efficiently and to allow for large frames, for example those read directly using sendfile. Because N is completely encoded in the first byte, the variable length can be easily validated, allowing for a restriction to 32 bits if desired. [Ed: the spec could require that N MUST be 4 or less.]
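
To illustrate the variable-length encoding, the following non-normative sketch builds a frame using the fewest length bytes that can hold the data length. The function name is hypothetical, and choosing the minimal N is an assumption of the example; the grammar does not appear to require it.

```python
FRAME_CODES = {"final": 0x00, "non-final": 0x40, "control": 0x80}


def encode_frame(kind, payload):
    """Encode one frame: code byte, N length bytes, then the payload.

    For a control frame, payload is the control-op byte followed by
    the control-data.
    """
    m = len(payload)
    n = (m.bit_length() + 7) // 8         # fewest bytes that can hold m
    if n > 15:
        raise ValueError("payload too long for a 4-bit length count")
    code = FRAME_CODES[kind] | n          # low four bits carry N
    return bytes([code]) + m.to_bytes(n, "big") + payload
```

With this sketch, a 12-byte text message needs one length byte (code %x01), while a 256-byte payload needs two (code %x02).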

The following non-normative pseudo-code shows parsing of the frame.

  code = read();

  is_control = (code & 0x80) != 0;
  is_final = (code & 0x40) == 0;

  meta_length = code & 0x0f;

  // optional validation of meta_length would be here

  length = 0;
  for (i = 0; i < meta_length; i++) {
    length = 256 * length + read();
  }

  read(buffer, 0, length);
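
A runnable counterpart of the pseudo-code above is sketched here, again non-normatively. The function names and the max_length_bytes parameter are hypothetical; the limit of 4 mirrors the optional restriction mentioned earlier. The sketch assumes a buffered stream whose read(n) returns fewer than n bytes only at end of stream.

```python
import io

FRAME_KINDS = {0x00: "final", 0x40: "non-final", 0x80: "control"}


def read_exact(stream, count):
    """Read exactly count bytes or raise EOFError."""
    data = stream.read(count)
    if len(data) != count:
        raise EOFError("connection closed inside a frame")
    return data


def read_frame(stream, max_length_bytes=4):
    """Read one frame and return (kind, payload)."""
    code = read_exact(stream, 1)[0]
    kind = FRAME_KINDS.get(code & 0xF0)
    if kind is None:
        raise ValueError("illegal frame code 0x%02x" % code)

    n = code & 0x0F                       # number of length bytes
    if n > max_length_bytes:
        raise ValueError("length field too long")

    length = int.from_bytes(read_exact(stream, n), "big")
    return kind, read_exact(stream, length)


stream = io.BytesIO(b"\x01\x0cHello, world\x81\x01\x01")
print(read_frame(stream))   # ('final', b'Hello, world')
print(read_frame(stream))   # ('control', b'\x01')
```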

Control frames consist of a single OP byte followed by any control data. Only control OP codes defined by WebSockets are allowed. If a client or server receives a control OP not defined by WebSockets, it MUST close the connection.

Control messages allow the client and server to manage the stream behavior, like graceful close, keepalive messages, and even allowing for multiplexing extensions.




5.3.  Stream Close

Because WebSockets needs to distinguish an intentional close from a dropped connection, the client or server MUST send a CLOSE control frame at the end of the stream. Either side may choose to close the connection gracefully at any time.

When the client or server wishes to close the stream gracefully, it MUST send a CLOSE control frame. After sending the CLOSE, no other data may be sent and the TCP socket must also be closed.




5.4.  Connection Keepalive

Because a TCP connection may drop without notification to either client or server, either by network failure or by TCP router timeouts, the WebSocket protocol defines a pair of keepalive control frames. By defining a pair of control frames, WebSockets avoids circular ping cascades.

Either the client or server may send a KEEPALIVE-REQUEST control frame to determine if the connection is still alive. The peer MUST respond with a KEEPALIVE-RESPONSE.

If the KEEPALIVE-REQUEST sending peer does not receive a response within a reasonable time, it may close the connection. The client may establish a new connection, but recovery of the original stream is not defined by WebSockets, and must be defined by the application or sub-protocol.
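
One possible way to track keepalive state is sketched below (non-normative). The class name, its methods and the timeout parameter are hypothetical; the draft leaves the "reasonable time" to the implementation. A received KEEPALIVE-RESPONSE never triggers a reply, which is what avoids a ping cascade.

```python
import time

KEEPALIVE_REQUEST = b"\x81\x01\x03"
KEEPALIVE_RESPONSE = b"\x81\x01\x04"


class KeepaliveMonitor:
    """Tracks one outstanding KEEPALIVE-REQUEST for a connection."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout            # seconds to wait for a response
        self.clock = clock
        self.sent_at = None               # time of the outstanding request

    def start_probe(self):
        """Record a new request and return the frame to send."""
        self.sent_at = self.clock()
        return KEEPALIVE_REQUEST

    def on_control(self, op):
        """Handle a received control op; return a frame to send, or None."""
        if op == 3:                       # KEEPALIVE-REQUEST: must respond
            return KEEPALIVE_RESPONSE
        if op == 4:                       # KEEPALIVE-RESPONSE: peer is alive
            self.sent_at = None
        return None

    def expired(self):
        """True if a request has gone unanswered for longer than timeout."""
        return (self.sent_at is not None
                and self.clock() - self.sent_at > self.timeout)
```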




6.  HTTP Headers

The following describes the HTTP headers used during the initial handshake. Headers defined by HTTP follow the HTTP specification [RFC2616] (Fielding, R., Gettys, J., Mogul, J., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” June 1999.).

The client required headers are "Connection", "Host", "Origin", and "Upgrade". The optional header is "Sec-WebSocket-Protocol". Other HTTP headers may be used, but are not defined or mandated by the WebSockets specification.

The server required headers are "Connection", "Sec-WebSocket-Location", "Sec-WebSocket-Origin", and "Upgrade". "Sec-WebSocket-Protocol" is required if the client sends it, and forbidden otherwise. Other HTTP headers may be used, but are not defined or mandated by the WebSockets specification.




6.1.  Connection

[RFC2616] (Fielding, R., Gettys, J., Mogul, J., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” June 1999.) defines the Connection header. WebSockets uses the Connection header following the HTTP specification.

The WebSockets client must send a "Connection: Upgrade" header as part of its handshake request to ask the server to upgrade the HTTP connection to a WebSockets connection. If the server does not receive a "Connection: Upgrade" header it must reject the connection.

The WebSockets server must return a "Connection: Upgrade" header as part of its handshake confirmation. If the client does not receive a "Connection: Upgrade" header it must reject the connection.




6.2.  Host

[RFC2616] (Fielding, R., Gettys, J., Mogul, J., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” June 1999.) defines the Host header. WebSockets uses the Host header following the HTTP specification.

The WebSockets client MUST send a "Host" header with the virtual host name as required by HTTP. The host name is case-insensitive and US-ASCII, i.e. non-ASCII names MUST be encoded.




6.3.  Origin

The client MUST send an Origin header during the handshake to inform the server of the source HTTP page.

The server may use the Origin header to reject connections from unknown origins, preventing certain kinds of browser hijacking scenarios.

[Ed: must non-browser clients send a dummy Origin even though the concept is meaningless?]




6.4.  Sec-WebSocket-Location

The server MUST return a Sec-WebSocket-Location header as part of its handshake. The value is a WebSocket URL constructed from /host/, /port/, /resource/, and the /secure/ flag.

If the client does not receive a Sec-WebSocket-Location header from the server, or the server's value does not match the client's calculated value, the client MUST reject the connection.




6.5.  Sec-WebSocket-Origin

The server MUST return a Sec-WebSocket-Origin header as part of its handshake. The value is the same as the Origin header sent by the client, /origin/.

If the client does not receive a Sec-WebSocket-Origin header, or if the value does not match /origin/, the client MUST reject the connection.




6.6.  Sec-WebSocket-Protocol

The Sec-WebSocket-Protocol is an optional header used by the client and server to negotiate any sub-protocol during the handshake.

If the client wishes to negotiate or validate a sub-protocol, it will send a Sec-WebSocket-Protocol header with the requested sub-protocol, /protocol/.

If the server does not understand the sub-protocol, it MUST reject the connection.

If the client sends a Sec-WebSocket-Protocol, but the server does not return a Sec-WebSocket-Protocol header, or the server's returned value is not /protocol/, the client MUST reject the connection.

NOTE: Although the sub-protocol value is a field-value, it is recommended to use unique names with a version to avoid conflicts, such as "tictactoe.example.com/1.0".




6.7.  Upgrade

[RFC2616] (Fielding, R., Gettys, J., Mogul, J., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” June 1999.) defines the Upgrade header. WebSockets uses the Upgrade header following the HTTP specification.

The client MUST send an "Upgrade: WebSocket" header as part of its handshake request to ask the server to upgrade the HTTP connection to a WebSockets connection. If the server does not receive an "Upgrade: WebSocket" header, it MUST reject the connection.

The server MUST return an "Upgrade: WebSocket" header as part of its handshake confirmation. If the client does not receive an "Upgrade: WebSocket" header, it MUST reject the connection.




7.  Control Frames

Each control frame has an opcode in the range %x00-7F, followed by any control data for the opcode. [Ed: %x80-FF are reserved to allow for multiple byte opcodes beyond 1 byte. In practice, those will never be defined.]

Except for the ops defined here (0-4), the codes are reserved by the specification. Applications MUST NOT define their own control frames.
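
As a non-normative summary, the defined control ops and their wire encoding can be sketched as follows. The table and function names are hypothetical, and the sketch always uses a single length byte, which limits the op plus control data to 255 bytes.

```python
# Control ops defined by this draft (Sections 7.1 to 7.5).
CONTROL_OPS = {
    "NOP": 0,
    "CLOSE": 1,
    "INIT": 2,
    "KEEPALIVE-REQUEST": 3,
    "KEEPALIVE-RESPONSE": 4,
}


def control_frame(name, data=b""):
    """Encode a control frame with one length byte (code %x81)."""
    body = bytes([CONTROL_OPS[name]]) + data   # control-op + control-data
    if len(body) > 255:
        raise ValueError("control data too long for one length byte")
    return bytes([0x81, len(body)]) + body
```

For example, control_frame("INIT", b"WebSocket") produces the 12-byte INIT frame and control_frame("CLOSE") produces the 3-byte CLOSE frame.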




7.1.  NOP (op=0)

control-op = 0 is the no-operation control frame.

The NOP control frame is three bytes as follows:

    %x81 %x01 %x00



7.2.  CLOSE (op=1)

control-op = 1 is the stream close control frame.

Either the server or client may send a CLOSE at any time. After sending the CLOSE, the stream MUST be closed.

The CLOSE control frame is three bytes as follows:

    %x81 %x01 %x01



7.3.  INIT (op=2)

control-op = 2 is the client initialization control frame. To protect against scripting attacks, the client sends an initial control frame with "WebSocket" as the control-data. If the server does not see the INIT frame or its contents are not WebSocket, the server MUST close the connection.

The INIT control frame is 12 bytes as follows:

    %x81 %x0a %x02 WebSocket

[Ed: The server could also have a Server-Init control frame with a signature of the mandatory client headers, and possibly a nonce value, e.g. MD5(/nonce/ + " " + /host/ + " " + /origin/ + " " + /resource/ + " " + /protocol/)]




7.4.  KEEPALIVE-REQUEST (op=3)

Either the client or the server may initiate a keepalive request to check if the connection is still available. The receiving end MUST respond with a KEEPALIVE-RESPONSE control frame.

Because the WebSocket connection is long-lived, intermediaries like home routers might close idle connections without notifying either end. Clients and servers may use the KEEPALIVE-REQUEST ping to check the status of the connection.

The KEEPALIVE-REQUEST control frame is 3 bytes as follows:

    %x81 %x01 %x03

[Ed: the working group has also discussed asymmetrical heartbeats as an alternative to the ping-style. The heartbeat would require more negotiation than defined in this draft.]




7.5.  KEEPALIVE-RESPONSE (op=4)

When either end receives a KEEPALIVE-REQUEST control frame, it MUST send a KEEPALIVE-RESPONSE, to let the other end know the connection is still available.

The KEEPALIVE-RESPONSE control frame is 3 bytes as follows:

    %x81 %x01 %x04

[Ed: the working group has also discussed asymmetrical heartbeats as an alternative to the ping-style.]




7.6.  [NEGOTIATE-HEADER (op=5)]

This section is non-normative.

[Ed: this section is not part of the WebSockets proposal itself. It is here to make sure dynamic negotiation is possible to support, for example for asymmetric heartbeats.]

If the specification needs to add dynamic negotiation, for example of a heartbeat timeout, a NEGOTIATE-HEADER control frame can be added. Its control-data is a "key: value" string, like an HTTP header.

The NEGOTIATE-HEADER control frame is variable length and might look like:

    %x81 %x10 %x05 Heartbeat: 120s
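A receiver might split the control-data into its key and value as sketched below. The function name parse_negotiate_data is hypothetical, and the sketch assumes the data is ASCII and that the first colon separates the key from the value, as it does in an HTTP header.

```python
from typing import Tuple


def parse_negotiate_data(control_data: bytes) -> Tuple[str, str]:
    """Split 'key: value' control-data into a (key, value) pair."""
    text = control_data.decode("ascii")  # assumption: ASCII-only data
    key, sep, value = text.partition(":")
    if not sep or not key.strip():
        raise ValueError("expected control-data of the form 'key: value'")
    return key.strip(), value.strip()


key, value = parse_negotiate_data(b"Heartbeat: 120s")
```

In the example frame the length byte is %x10 (16) because it covers the op byte plus the 15 bytes of "Heartbeat: 120s".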



8.  [Multiplexing Extension]

This section is non-normative.

[Ed: this section is not part of the WebSockets proposal itself. It is here to check that multiplexing would work as an extension to this draft.]

Because WebSocket connections are very long-lived, they impose a larger resource burden on servers than HTTP connections do. To reduce this burden, it can be more efficient to share one TCP connection among multiple virtual WebSocket streams.

In addition, multiplexing allows a large transfer, such as a video, to be interleaved with smaller, latency-sensitive messages, so that the small messages do not have to wait for the large transfer to complete.

The multiplexing extension turns a single stream into a set of interleaved channels. In this extension proposal, the active channel is switched using control frames.

Multiplexing is enabled by a "WebSocket-Multiplex" header sent by the client. The value is the number of channels the client supports. If the server supports multiplexing, it returns a "WebSocket-Multiplex" header in its handshake response with the maximum number of channels allowed.
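One way a server might handle this header is sketched below. Several details are assumptions that this section does not settle: that the header value is a decimal integer, that the server grants the smaller of the client's value and its own limit, and that header names have already been normalized to the spelling used here. The function name negotiate_multiplex is hypothetical.

```python
from typing import Dict, Optional

MULTIPLEX_HEADER = "WebSocket-Multiplex"


def negotiate_multiplex(request_headers: Dict[str, str],
                        server_limit: int) -> Optional[Dict[str, str]]:
    """Return the response header to add, or None to leave multiplexing off."""
    raw = request_headers.get(MULTIPLEX_HEADER)
    if raw is None:
        return None  # the client did not ask for multiplexing
    try:
        requested = int(raw)  # assumption: a decimal channel count
    except ValueError:
        return None
    if requested < 1 or server_limit < 1:
        return None
    # Assumption: grant the smaller of the two limits.
    return {MULTIPLEX_HEADER: str(min(requested, server_limit))}
```

Returning None here means the handshake proceeds without the extension, so a client that does not see the header in the response would use the single default stream.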




8.1.  CHANNEL-SELECT (op=20)

The CHANNEL-SELECT control frame switches the data stream to a new virtual channel. The channel identifier is a 16-bit integer carried in the two bytes of the control-data. If the channel does not exist, it is created.

The initial (default) channel is channel #0. Client-initiated channels are always even. Server-initiated channels are always odd.
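The frame layout and the even/odd numbering rule might be implemented as in the following sketch. The byte order of the channel identifier is not stated in this section, so network (big-endian) order is an assumption, as are the names encode_channel_select and ChannelIdAllocator. The leading %x81 byte and one-byte length follow the control frame examples in Section 7.

```python
import struct

OP_CHANNEL_SELECT = 20


def encode_channel_select(channel_id: int) -> bytes:
    """Build a CHANNEL-SELECT frame carrying a 16-bit channel identifier."""
    if not 0 <= channel_id <= 0xFFFF:
        raise ValueError("channel identifier must fit in 16 bits")
    data = struct.pack("!H", channel_id)  # assumption: big-endian order
    return bytes([0x81, 1 + len(data), OP_CHANNEL_SELECT]) + data


class ChannelIdAllocator:
    """Hands out even identifiers to a client and odd ones to a server."""

    def __init__(self, is_client: bool) -> None:
        # Channel 0 is the default channel, so a client starts at 2.
        self._next = 2 if is_client else 1

    def allocate(self) -> int:
        if self._next > 0xFFFF:
            raise RuntimeError("no channel identifiers left")
        channel_id = self._next
        self._next += 2  # stay on the same parity
        return channel_id


client_ids = ChannelIdAllocator(is_client=True)
first_client_channel = client_ids.allocate()
select_frame = encode_channel_select(first_client_channel)
```

Because each side draws from its own parity, the client and the server can open channels concurrently without choosing the same identifier.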




8.2.  CHANNEL-CLOSE (op=21)

The CHANNEL-CLOSE control frame closes an open channel. Like the connection-level CLOSE, it is a graceful close. Either end may send a close independently.




9.  Security Considerations

This section is meant to inform application developers and users of security issues related to WebSockets. This list is unlikely to be complete.




9.1.  HTTP

Many, if not most, of the security issues related to HTTP are also present in WebSockets, because WebSockets uses HTTP for its handshake, and because many WebSockets clients and servers will also be HTTP clients and servers.




9.2.  Browser Scripting attacks

Compromised HTTP sites or improperly designed HTTP applications can allow arbitrary JavaScript code to execute on a browser. These scripts may attempt to open WebSockets connections. The risk can be minimized by servers checking the Origin header, but this may not be sufficient.
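A server-side Origin check could look like the following sketch. The allowlist contents and the name origin_is_allowed are hypothetical, and the sketch assumes the handshake headers are available as a dictionary keyed by "Origin". As noted above, passing this check does not by itself make a connection trustworthy.

```python
from typing import Dict, FrozenSet

# Hypothetical allowlist; a real deployment would configure its own.
ALLOWED_ORIGINS: FrozenSet[str] = frozenset({"http://example.com"})


def origin_is_allowed(headers: Dict[str, str],
                      allowed: FrozenSet[str] = ALLOWED_ORIGINS) -> bool:
    """Accept the handshake only if Origin is present and on the allowlist."""
    origin = headers.get("Origin")
    if origin is None:
        return False  # no Origin header: refuse rather than guess
    # Scheme and host compare case-insensitively, so normalize first.
    return origin.strip().lower() in allowed
```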

Hijacked clients may also attempt to open a WebSocket connection using an HTTP/XML connection from the browser, attempting to spoof a valid WebSocket connection. WebSocket servers should be written to minimize these risks.

Hijacked clients may open a WebSocket connection to a non-WebSocket HTTP service.




10.  Acknowledgements

This specification draft is substantially derived from Ian Hickson's "The WebSockets Protocol" at http://www.whatwg.org/specs/web-socket-protocol/.

The author of this draft is named purely for identification purposes, not to claim authorship credit for the specification.




11.  Normative References

[HTML] Hickson, I., “HTML,” May 2010.
[HYBI] Loreto, S., “HyBi Requirements and Features,” March 2010.
[ORIGIN] Barth, A., Jackson, C., and I. Hickson, “The HTTP Origin Header,” September 2009.
[RFC2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” RFC 2119, BCP 14, March 1997.
[RFC2246] Dierks, T. and C. Allen, “The TLS Protocol Version 1.0,” RFC 2246, January 1999.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” RFC 2616, June 1999.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, “Internationalizing Domain Names in Applications (IDNA),” RFC 3490, March 2003.
[RFC3629] Yergeau, F., “UTF-8, a transformation format of ISO 10646,” STD 63, RFC 3629, November 2003.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” STD 66, RFC 3986, January 2005.
[RFC3987] Duerst, M. and M. Suignard, “Internationalized Resource Identifiers (IRIs),” RFC 3987, January 2005.
[RFC4366] Blake-Wilson, S., Nystrom, M., Hopwood, D., Mikkelsen, J., and T. Wright, “Transport Layer Security (TLS) Extensions,” RFC 4366, April 2006.
[RFC5234] Crocker, D. and P. Overell, “Augmented BNF for Syntax Specifications: ABNF,” STD 68, RFC 5234, January 2008.
[WEBADDRESSES] Connolly, D. and C. Sperberg-McQueen, “Web addresses in HTML 5,” May 2009.
[WSAPI] Hickson, I., “The Web Sockets API,” May 2010.



Author's Address

  Scott Ferguson
  Caucho Technology



Full Copyright Statement

Intellectual Property