The WebSocket protocol enables a bidirectional stream of messages between a client and a server. Messages consist of a sequence of binary frames over TCP. The protocol uses HTTP for its handshake, upgrading to the bidirectional binary frames defined in this document.
By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
This Internet-Draft will expire on November 17, 2010.
Table of Contents
1. Introduction
1.1. Requirements
1.2. Protocol Overview
2. General Requirements
2.1. Requirements
2.2. Syntax Notation
2.3. Terminology
2.4. Basic Rules
3. Client Handshake
3.1. Client handshake variables
3.2. Client Handshake
3.2.1. Client Grammar
4. Server Handshake
4.1. Server handshake variables
4.2. Server Handshake
4.2.1. Server Grammar
5. Stream Protocol
5.1. Stream syntax
5.2. Frame syntax
5.3. Stream Close
5.4. Connection Keepalive
6. Control Frames
6.1. NOP (op=0)
6.2. CLOSE (op=1)
6.3. HELLO (op=2)
6.4. ERROR (op=3)
6.5. HEADERS (op=4)
6.6. PING-REQUEST (op=5)
6.7. PING-RESPONSE (op=6)
7. HELLO Headers
7.1. url
7.2. origin
7.3. protocol
7.4. status
8. Security Considerations
8.1. HTTP
8.2. Browser Scripting attacks
9. Acknowledgements
10. Normative References
Author's Address
Intellectual Property and Copyright Statements
1. Introduction
NOTE: this section is copied verbatim from the [I.D.loreto-hybi-requirements] requirements document.
HTTP [RFC2616] is a client/server protocol, where the HTTP servers store the data and provide it when it is requested by clients. When used to retrieve data from an HTTP server, the client sends HTTP requests to the server, and the server returns the requested data in HTTP responses. So the client has to continuously poll the server in order to receive new data.
Recently, techniques that enable bidirectional communication over HTTP have become more pervasive. Those techniques reduce the need to continuously poll the server by using HTTP hanging requests and multiple connections between the client and the server [I-D.loreto-http-bidirectional].
The goal of HyBi is to provide an efficient and clean two-way communication channel between client and server.
The communication channel will:
1.1. Requirements
NOTE: requirements are from the [I.D.loreto-hybi-requirements] requirements document.
1.2. Protocol Overview
This section is non-normative.
A WebSockets client connects either to a WebSockets port with a plain TCP connection [Ed: port 880 looks like it's available] or using TLS to a possibly-shared HTTP port 443.
When using TLS, the client will use [I.D.agl-tls-nextprotoneg] to select WebSockets as the NextProtocol, letting WebSockets share the same TCP port with existing HTTP servers.
The WebSocket protocol begins with a handshake using a HELLO control frame from the client and a HELLO control frame from the server. After a peer has sent its HELLO, it may send messages and control frames until the final CLOSE control frame.
The client handshake is a HELLO control frame that looks like the following:
%x80.02.00.86
WebSocket/1.0
url: ws://example.com:880/sample/resource
origin: http://example.com/launchpage.php
protocol: tictactoe.example.com/1.0
<Stream data follows>
The server handshake is a HELLO control frame that looks like the following:
%x80.02.00.32
WebSocket/1.0
protocol: tictactoe.example.com/1.0
<Stream data follows>
The bidirectional stream consists of a sequence of data frames combined into application messages, and control frames used to manage the connection itself.
Stream  = *( Message / control-frame )
Message = *( non-final-frame ) final-frame
A typical message might consist of a single data frame encoding a text message using UTF-8.
%x00.00.00.0C Hello, world
A long message can be broken into multiple frames, where the first frame signals more data is available.
%x40.00.00.06 "Hello,"
%x00.00.00.06 " world"
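As a non-normative illustration, the following Python sketch builds the data frames shown above from the 32-bit data frame header layout. The function name data_frame and the constant names are illustrative assumptions, not part of this specification.

```python
import struct

MORE_FLAG = 0x40000000        # "M" bit: more frames of this message follow
MAX_DATA_LENGTH = 0x0FFFFFFF  # largest length that fits in 28 bits


def data_frame(payload: bytes, more: bool = False) -> bytes:
    """Encode one data frame: a 32-bit big-endian header, then the payload."""
    if len(payload) > MAX_DATA_LENGTH:
        raise ValueError("payload does not fit in a 28-bit length")
    # High bit stays clear (data frame); reserved bits stay zero.
    header = len(payload) | (MORE_FLAG if more else 0)
    return struct.pack(">I", header) + payload


# Single final frame, as in the first example.
single = data_frame("Hello, world".encode("utf-8"))
# The same message split into a non-final frame and a final frame.
split = data_frame(b"Hello,", more=True) + data_frame(b" world")
```

Here the single frame starts with %x00.00.00.0C, and the split message starts with %x40.00.00.06, followed by a second header %x00.00.00.06.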
Control frames are short WebSocket-specific frames with an 8-bit opcode used to control the connection. The following is a PING-REQUEST to test the liveness of the connection.
%x80.05.00.00
The 32-bit data frame header has the high bit clear, a "more" flag, 2 reserved bits, and a 28-bit payload length.
+---+------+------+------------+
| 0 | M(1) | X(2) | length(28) |
+---+------+------+------------+
The 32-bit control frame header has the high bit set, 7 reserved bits, an 8-bit opcode, and a 16-bit payload length.
+---+------+-------+------------+
| 1 | X(7) | op(8) | length(16) |
+---+------+-------+------------+
The client and server will send application messages asynchronously until the end of the stream. Each will close the stream with a CLOSE control frame.
Clients and servers may use the PING-REQUEST and PING-RESPONSE keepalive control frames to verify that a connection is still valid, which may be needed when network routers silently drop idle connections.
2. General Requirements
2.1. Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
An implementation is not compliant if it fails to satisfy one or more of the MUST or REQUIRED level requirements for the protocols it implements. An implementation that satisfies all the MUST or REQUIRED level and all the SHOULD level requirements for its protocols is said to be "unconditionally compliant"; one that satisfies all the MUST level requirements but not all the SHOULD level requirements for its protocols is said to be "conditionally compliant."
2.2. Syntax Notation
This specification uses the Augmented Backus-Naur Form (ABNF) notation of [RFC5234].
The following core rules are included by reference, as defined in [RFC5234], Appendix B.1: LF (line feed) and SP (space).
2.3. Terminology
This specification uses the following HyBi-related terms:
connection: A transport layer virtual circuit established between a client and a server for the purpose of communication.
control-frame: A frame used to control connection behavior outside of the application data stream.
frame: The basic unit of WebSocket communication, consisting of a structured sequence of octets matching the syntax defined in the actual protocol and transmitted on the established communication channel.
message (user message): a block of related data with identified boundaries.
origin server: The server on which a given resource resides or is to be created.
2.4. Basic Rules
The following URI definitions come from [RFC3986].
absolute-URI  = <absolute-URI, defined in [RFC3986], Section 4.3>
relative-part = <relative-part, defined in [RFC3986], Section 4.2>
authority     = <authority, defined in [RFC3986], Section 3.2>
port          = <port, defined in [RFC3986], Section 3.2.3>
query         = <query, defined in [RFC3986], Section 3.4>
uri-host      = <host, defined in [RFC3986], Section 3.2.2>
The following is the format of a WebSocket URI.
ws-URI = "ws:" "//" uri-host ":" port relative-part [ "?" query ]
The following is the format of a WebSocket TLS URI.
wss-URI = "wss:" "//" uri-host ":" port relative-part [ "?" query ]
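The ws-URI and wss-URI rules above can be approximated with a regular expression. This Python sketch is a simplification under stated assumptions: it accepts only reg-name or IPv4 hosts (no IPv6 literals or userinfo), requires a non-empty port, and does not validate percent-encoding. The names WS_URI and parse_ws_uri are hypothetical.

```python
import re

# Simplified approximation of ws-URI / wss-URI (assumption: no IPv6
# literals, no userinfo, no percent-encoding checks, non-empty port).
WS_URI = re.compile(
    r"^(?P<scheme>wss?)://"
    r"(?P<host>[^/:?#\[\]@]+)"
    r":(?P<port>[0-9]+)"
    r"(?P<path>/[^?#]*)?"
    r"(?:\?(?P<query>[^#]*))?$"
)


def parse_ws_uri(uri: str) -> dict:
    """Split a WebSocket URI into scheme, host, port, path and query."""
    match = WS_URI.match(uri)
    if match is None:
        raise ValueError(f"not a ws-URI or wss-URI: {uri!r}")
    parts = match.groupdict()       # absent path/query come back as None
    parts["port"] = int(parts["port"])
    return parts


example = parse_ws_uri("ws://example.com:880/sample/resource")
```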
3. Client Handshake
3.1. Client handshake variables
Before establishing a connection, the client will gather the following data, typically from a WebSocket URL or client API call: /secure/, /host/, /port/, /resource/ and an optional /protocol/.
/secure/ is a flag indicating whether TLS will be used or not. The WebSocket URL uses the "wss:" scheme to indicate TLS, and "ws:" to indicate non-TLS.
/url/ is the absolute URL of the server resource. For WebSockets, the URL is constructed from the virtual host, the port, and the resource name.
/protocol/ is the application protocol name. If the application does not set a protocol, use the string "text".
/origin/ is, for browser clients, calculated from the server's HTTP resource that initiated the WebSockets connection. [Ed: Non-browser clients will use ??]
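A minimal sketch of how a client library might derive these variables from a WebSocket URL, using Python's standard urllib.parse.urlsplit for URL parsing. The function name handshake_variables, the default resource "/" for an empty path, and passing /origin/ in from the caller are assumptions. IPv6 literal hosts are not handled.

```python
from typing import Optional
from urllib.parse import urlsplit


def handshake_variables(ws_url: str, origin: str,
                        protocol: Optional[str] = None) -> dict:
    """Derive /secure/, /url/, /protocol/ and /origin/ from a WebSocket URL."""
    parts = urlsplit(ws_url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError("scheme must be ws: or wss:")
    if not parts.hostname or parts.port is None:
        raise ValueError("the URL must name a host and an explicit port")
    resource = parts.path or "/"      # assumption: empty path means "/"
    if parts.query:
        resource += "?" + parts.query
    return {
        "secure": parts.scheme == "wss",  # wss: selects TLS
        "url": f"{parts.scheme}://{parts.hostname}:{parts.port}{resource}",
        "protocol": protocol or "text",   # default when the application sets none
        "origin": origin,                 # supplied by the caller (browser page)
    }
```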
3.2. Client Handshake
After gathering the handshake information above, the client initiates the handshake. If the handshake fails at any step, the client MUST close the connection.
The client side of the handshake proceeds as follows:
3.2.1. Client Grammar
The full syntax for the data sent by the WebSocket client is as follows:
Request       = HELLO Stream CLOSE
HELLO         = control-frame ; where OP = 2 (HELLO)
CLOSE         = control-frame ; where OP = 1 (CLOSE)
HELLO-payload = "WebSocket/1.0" LF
                "url:" SP /url/ LF
                "origin:" SP /origin/ LF
                "protocol:" SP /protocol/ LF
                *( header ":" SP value LF )
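The following non-normative Python sketch encodes a client HELLO according to the grammar above, using control-op 2 for HELLO. The helper names control_frame and client_hello are hypothetical. Applied to the values from the Protocol Overview example, it produces the header %x80.02.00.86, which is a 134-octet payload.

```python
import struct

HELLO = 2  # control-op for HELLO


def control_frame(op: int, payload: bytes = b"") -> bytes:
    """Control frame: %x80, the opcode octet, a 16-bit big-endian length, data."""
    if not 0 <= op <= 0x7F:
        raise ValueError("control opcodes are in the range 0-127")
    if len(payload) > 0xFFFF:
        raise ValueError("control payload does not fit in 16 bits")
    return struct.pack(">BBH", 0x80, op, len(payload)) + payload


def client_hello(url: str, origin: str, protocol: str, extra=()) -> bytes:
    """Build the client HELLO: version, url, origin, protocol, extra headers."""
    lines = ["WebSocket/1.0",
             "url: " + url,
             "origin: " + origin,
             "protocol: " + protocol]
    lines += [f"{name}: {value}" for name, value in extra]
    # Every line, including the last header, is terminated by LF.
    payload = "".join(line + "\n" for line in lines).encode("utf-8")
    return control_frame(HELLO, payload)


frame = client_hello("ws://example.com:880/sample/resource",
                     "http://example.com/launchpage.php",
                     "tictactoe.example.com/1.0")
```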
4. Server Handshake
4.1. Server handshake variables
Before establishing a connection, the server will gather the following data for a WebSocket resource: /secure/, /url/, /protocol/, and an optional /origin/.
4.2. Server Handshake
After gathering the handshake information above, the server initiates the handshake. If the handshake fails at any step, the server MUST reject the request with an ERROR control frame.
4.2.1. Server Grammar
Response      = HELLO Stream CLOSE
HELLO         = control-frame ; where OP = 2 (HELLO)
CLOSE         = control-frame ; where OP = 1 (CLOSE)
HELLO-payload = "WebSocket/1.0" LF
                "status: 200" LF
                "origin:" SP /origin/ LF
                "protocol:" SP /protocol/ LF
                *( header ":" SP value LF )
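A matching sketch for the server side, again with a hypothetical helper name. It emits the status, origin and protocol lines required by the Response grammar before any optional headers. This draft does not settle which /origin/ value the server should send, so the caller supplies it.

```python
import struct

HELLO = 2  # control-op for HELLO


def server_hello(origin: str, protocol: str, extra=()) -> bytes:
    """Server HELLO: version, status, origin, protocol, then optional headers."""
    lines = ["WebSocket/1.0", "status: 200",
             "origin: " + origin, "protocol: " + protocol]
    lines += [f"{name}: {value}" for name, value in extra]
    payload = "".join(line + "\n" for line in lines).encode("utf-8")
    if len(payload) > 0xFFFF:
        raise ValueError("HELLO payload does not fit in a 16-bit length")
    return struct.pack(">BBH", 0x80, HELLO, len(payload)) + payload


reply = server_hello("http://example.com/launchpage.php",
                     "tictactoe.example.com/1.0")
```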
5. Stream Protocol
5.1. Stream syntax
Once the handshake has been established, the message stream is symmetrical. Each side sends and reads a sequence of data messages, split into frames and interleaved with any control frames.
A stream is a sequence of binary data messages, where each data message is a sequence of partial data frames. Control frames may appear between data messages to control the connection.
If the application message data is Unicode text, as in a JavaScript browser, the sender MUST encode the text as UTF-8.
Because the sending side may use a fixed-size buffer, it may split a message into any number of non-final data frames followed by the final data frame. Short messages will fit into a single final frame.
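A sketch of the fragmentation described above, assuming a hypothetical per-frame payload limit buffer_size chosen by the sender. Every chunk except the last is sent with the "more" flag set, and an empty message becomes a single zero-length final frame.

```python
import struct

MORE = 0x40000000  # "more" flag in the data frame header


def frames_for_message(message: bytes, buffer_size: int) -> list:
    """Split one message into non-final data frames plus one final frame.

    buffer_size is a hypothetical sender limit on payload bytes per frame.
    """
    if not 0 < buffer_size <= 0x0FFFFFFF:
        raise ValueError("buffer_size must be between 1 and 2**28 - 1")
    chunks = [message[i:i + buffer_size]
              for i in range(0, len(message), buffer_size)] or [b""]
    frames = []
    for index, chunk in enumerate(chunks):
        more = index < len(chunks) - 1  # every chunk but the last
        header = len(chunk) | (MORE if more else 0)
        frames.append(struct.pack(">I", header) + chunk)
    return frames


text = "Hello, world".encode("utf-8")  # Unicode text is sent as UTF-8
frames = frames_for_message(text, buffer_size=8)
```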
The syntax of each frame is defined in the next section.
When closing a connection, the client and server MUST send a CLOSE control frame. No bytes may be sent or read after the CLOSE control frame.
A receiver MUST close the connection if it detects any errors while reading, including any illegal frame syntax, too-long frame lengths, or any unknown control frame. The client and server MUST NOT attempt to recover from frame errors.
The stream syntax is defined by the following grammar. The frame section below defines the grammar of the frames themselves.
Stream  = *( Message / control-frame ) Close
Message = *( non-final-data-frame ) final-data-frame
Close   = control-frame ; where control-op = CLOSE
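The grammar above can be read as a small state machine. This Python sketch uses a hypothetical tuple representation of already-parsed frames. It groups data frames into messages and stops at the CLOSE frame. It follows the grammar literally, so a control frame arriving in the middle of a fragmented message is treated as an error. Other control frames between messages are skipped here and would be handled elsewhere.

```python
CLOSE = 1  # control-op for CLOSE


def messages_from_stream(frames):
    """Group frames into messages, following the Stream rule.

    Each frame is a hypothetical tuple: ("data", more, payload) or
    ("control", op, payload).
    """
    messages, partial = [], []
    for kind, flag, payload in frames:
        if kind == "data":
            partial.append(payload)
            if not flag:                 # final-data-frame ends the message
                messages.append(b"".join(partial))
                partial = []
        elif partial:
            # The grammar places control frames only between messages.
            raise ValueError("control frame inside a fragmented message")
        elif flag == CLOSE:
            return messages              # Close ends the stream
    raise ValueError("stream ended without a CLOSE control frame")
```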
5.2. Frame syntax
A frame consists of a fixed 32-bit header, which identifies the frame type and encodes the payload length (28 bits for data frames, 16 bits for control frames), followed by the frame data.
final-data-frame     = final-header N*OCTET
                     ; where N = the length encoded in the header
non-final-data-frame = non-final-header N*OCTET
                     ; where N = the length encoded in the header
control-frame        = control-header N*OCTET
                     ; where N = the length encoded in the header
final-header         = %x00-0F 3*OCTET
                     ; where the last 28 bits encode the
                     ; length N as a big endian integer
non-final-header     = %x40-4F 3*OCTET
                     ; where the last 28 bits encode the
                     ; length N as a big endian integer
control-header       = %x80 OP 2*OCTET
                     ; and the last 16 bits encode the
                     ; length N as a big endian integer
OP                   = OCTET
The 32-bit data frame header has the high bit clear, a "more" flag, 2 reserved bits, and a 28-bit payload length.
+---+------+------+------------+
| 0 | M(1) | X(2) | length(28) |
+---+------+------+------------+
The 32-bit control frame header has the high bit set, 7 reserved bits, an 8-bit opcode, and a 16-bit payload length.
+---+------+---------+------------+
| 1 | X(7) | code(8) | length(16) |
+---+------+---------+------------+
The high bit of the initial byte distinguishes a control frame (1) from a data frame (0).
The following non-normative pseudo-code shows parsing of the frame.
header = read32();            // read 32 bits in network order
is_control = (header & 0x80000000) != 0;
if (is_control) {             // control frame
  control_op = (header >> 16) & 0xff;
  length = header & 0x0000ffff;
} else {                      // data frame
  is_final = (header & 0x40000000) == 0;   // "more" flag clear
  length = header & 0x0fffffff;
}
read(buffer, 0, length);
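The pseudo-code above can be made runnable. The following Python sketch reads one frame from any binary file-like object (for example io.BytesIO, or a socket wrapped with makefile("rb")). It rejects headers whose first byte falls outside the ranges allowed by the grammar, rejects control opcodes outside 0-6, and enforces max_length, which is a hypothetical receiver limit for "too-long frame lengths". The function names are illustrative.

```python
import io
import struct

KNOWN_CONTROL_OPS = range(0, 7)  # NOP (0) through PING-RESPONSE (6)


def read_exact(stream, n: int) -> bytes:
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended inside a frame")
    return data


def read_frame(stream, max_length: int = 0x0FFFFFFF):
    """Return ("control", op, payload) or ("data", more, payload)."""
    (header,) = struct.unpack(">I", read_exact(stream, 4))
    if header & 0x80000000:                  # control frame
        if header & 0x7F000000:              # first byte must be %x80
            raise ValueError("illegal control-header")
        op = (header >> 16) & 0xFF
        if op not in KNOWN_CONTROL_OPS:      # receiver MUST close
            raise ValueError(f"unknown control opcode {op}")
        length = header & 0xFFFF
        kind, flag = "control", op
    else:                                    # data frame
        if header & 0x30000000:              # first byte must be %x00-0F or %x40-4F
            raise ValueError("illegal data-header")
        length = header & 0x0FFFFFFF
        kind, flag = "data", bool(header & 0x40000000)  # True: more frames follow
    if length > max_length:
        raise ValueError(f"frame length {length} exceeds {max_length}")
    return kind, flag, read_exact(stream, length)


wire = io.BytesIO(b"\x40\x00\x00\x06Hello,"
                  b"\x00\x00\x00\x06 world"
                  b"\x80\x05\x00\x00")
frames = [read_frame(wire) for _ in range(3)]
```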
Control frame opcodes are defined by WebSockets and may not be used by applications. If a client or server receives a control opcode not defined by WebSockets, it MUST close the connection.
Control frames allow the client and server to manage stream behavior such as graceful close and keepalive, and could even allow for multiplexing extensions.
5.3. Stream Close
Because WebSockets needs to distinguish an intentional close from a dropped connection, the client or server MUST send a CLOSE control frame at the end of the stream. Either side may choose to close the connection gracefully at any time.
When the client or server wishes to close the stream gracefully, it MUST send a CLOSE control frame. After sending the CLOSE, no other data may be sent and the TCP socket must also be closed.
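A minimal sketch of a graceful close, assuming a plain TCP-style socket: the sender writes the four-byte CLOSE frame and then closes the socket. The local socket pair below only stands in for a real peer, and close_gracefully is a hypothetical name.

```python
import socket

CLOSE_FRAME = b"\x80\x01\x00\x00"  # CLOSE control frame (op=1, no payload)


def close_gracefully(sock: socket.socket) -> None:
    """Send CLOSE, then close the socket; nothing is sent afterwards."""
    try:
        sock.sendall(CLOSE_FRAME)
        sock.shutdown(socket.SHUT_WR)  # signal end of stream to the peer
    finally:
        sock.close()


# Demonstration on a connected local socket pair (not a real WebSocket peer).
local, remote = socket.socketpair()
close_gracefully(local)
received = b""
while True:
    chunk = remote.recv(64)
    if not chunk:  # peer closed: end of stream
        break
    received += chunk
remote.close()
```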
5.4. Connection Keepalive
Because a TCP connection may drop without notification to either client or server, either by network failure or by TCP router timeouts, the WebSocket protocol defines a pair of keepalive control frames. By defining a pair of control frames, WebSockets avoids circular ping cascades.
Either the client or server may send a PING-REQUEST control frame to determine if the connection is still alive. The peer MUST respond with a PING-RESPONSE.
If the PING-REQUEST sending peer does not receive a response within a reasonable time, it may close the connection. The client may establish a new connection, but recovery of the original stream is not defined by WebSockets, and must be defined by the application or sub-protocol.
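A sketch of the ping bookkeeping, assuming a hypothetical Keepalive class and a timeout chosen by the application, since the draft only asks for "a reasonable time". A peer answers every PING-REQUEST with a PING-RESPONSE and never pings back in reply, in keeping with the pair-of-frames rationale above. The clock parameter exists only so the timing can be tested.

```python
import time

PING_REQUEST = b"\x80\x05\x00\x00"
PING_RESPONSE = b"\x80\x06\x00\x00"


class Keepalive:
    """Track an unanswered PING-REQUEST (timeout is an application setting)."""

    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.sent_at = None  # time of the first unanswered PING-REQUEST

    def ping(self) -> bytes:
        if self.sent_at is None:  # keep the time of the first unanswered ping
            self.sent_at = self.clock()
        return PING_REQUEST

    def on_control(self, op: int):
        """Handle an incoming control op; return a frame to send, if any."""
        if op == 5:               # PING-REQUEST: answer, never ping back
            return PING_RESPONSE
        if op == 6:               # PING-RESPONSE: our ping was answered
            self.sent_at = None
        return None

    def expired(self) -> bool:
        """True if the peer has not answered within the timeout."""
        return (self.sent_at is not None
                and self.clock() - self.sent_at > self.timeout)
```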
6. Control Frames
Each control frame has an opcode in the range %x00-7F, followed by any control data for the opcode. [Ed: the code %x7F is reserved to allow for opcodes beyond 127, where the full opcode is encoded as the first bytes of the payload. In practice, this will never be defined.]
Except for the ops defined here (0-6), the codes are reserved by the specification. Applications MUST NOT define their own control frames.
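The opcode assignments in the following subsections can be collected into a table. This Python sketch encodes the fixed four-byte frames and rejects reserved opcodes, after which a receiver closes the connection. The names ControlOp, control_frame and parse_op are hypothetical.

```python
import struct
from enum import IntEnum


class ControlOp(IntEnum):
    NOP = 0
    CLOSE = 1
    HELLO = 2
    ERROR = 3
    HEADERS = 4
    PING_REQUEST = 5
    PING_RESPONSE = 6


def control_frame(op: ControlOp, payload: bytes = b"") -> bytes:
    """%x80, the opcode, a 16-bit big-endian length, then the payload."""
    return struct.pack(">BBH", 0x80, op, len(payload)) + payload


def parse_op(code: int) -> ControlOp:
    """Map a received opcode; anything else is reserved (close the connection)."""
    try:
        return ControlOp(code)
    except ValueError:
        raise ValueError(f"reserved control opcode {code}") from None
```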
6.1. NOP (op=0)
control-op = 0 is the no-operation control frame.
The NOP control frame has no payload.
The NOP control frame is the following 4 bytes:
%x80.00.00.00
6.2. CLOSE (op=1)
control-op = 1 is the stream close control frame.
Either the server or client may send a CLOSE at any time. After sending the CLOSE, the sender MUST NOT send any data and the receiver MUST NOT read any further data; the stream MUST be closed.
The CLOSE control frame has no payload.
The CLOSE control frame is the following four bytes:
%x80.01.00.00
6.3. HELLO (op=2)
control-op = 2 is the required initial control frame. The HELLO control frame MUST be the first data for both the client and server streams.
The HELLO payload consists of the WebSockets version followed by "header: value" pairs encoded in UTF-8. The header names must be lower-case US-ASCII.
The HELLO payload grammar, encoded as UTF-8:
HELLO-payload = "WebSocket/1.0" LF *( header ":" SP value LF )
header        = *( %x61-7A / %x30-39 / "-" )  ; "a"-"z", "0"-"9", "-"
value         = *( value-char )
value-char    = %x0020-10FFFF
SP            = %x0020
LF            = %x000A
A client HELLO control frame looks like the following:
%x80.02.00.86
WebSocket/1.0
url: ws://example.com:880/sample/resource
origin: http://example.com/launchpage.php
protocol: tictactoe.example.com/1.0
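A non-normative sketch of a payload parser for HELLO and ERROR payloads (with the version line) and HEADERS payloads (without it). It assumes every header line ends in LF, as in the client grammar of Section 3.2.1 and the lengths of the examples. It also requires non-empty header names and does not check value-char ranges. The name parse_payload is hypothetical.

```python
import re

HEADER_NAME = re.compile(r"[a-z0-9-]+")
VERSION = "WebSocket/1.0"


def parse_payload(payload: bytes, versioned: bool = True) -> dict:
    """Parse "header: value" lines; versioned payloads start with VERSION."""
    lines = payload.decode("utf-8").split("\n")  # invalid UTF-8 raises ValueError
    if lines[-1] != "":
        raise ValueError("payload must end with LF")
    lines.pop()
    if versioned:
        if not lines or lines[0] != VERSION:
            raise ValueError("missing WebSocket/1.0 version line")
        lines = lines[1:]
    headers = {}
    for line in lines:
        name, sep, value = line.partition(": ")
        if not sep or not HEADER_NAME.fullmatch(name):
            raise ValueError(f"bad header line {line!r}")
        headers[name] = value
    return headers
```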
6.4. ERROR (op=3)
control-op = 3 is an error frame informing a peer that the connection is being closed due to an error. In particular, a server will return an ERROR frame in response to a failed client handshake.
The ERROR payload consists of the WebSockets version followed by "header: value" pairs.
The ERROR payload grammar, encoded as UTF-8:
ERROR-payload = "WebSocket/1.0" LF *( header ":" SP value LF )
header        = *( %x61-7A / %x30-39 / "-" )  ; "a"-"z", "0"-"9", "-"
value         = *( value-char )
value-char    = %x0020-10FFFF
SP            = %x0020
LF            = %x000A
A server ERROR control frame looks like the following:
%x80.03.00.42
WebSocket/1.0
status: 404
message: the resource is not available.
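For completeness, here is a sketch of how a server could build the ERROR frame above when rejecting a handshake. The helper error_frame is hypothetical, and the "message" header is used only because the example uses it. The result starts with %x80.03.00.42, matching the 66-octet payload shown.

```python
import struct

ERROR = 3  # control-op for ERROR


def error_frame(status: int, message: str) -> bytes:
    """Build an ERROR control frame carrying status and message headers."""
    text = f"WebSocket/1.0\nstatus: {status}\nmessage: {message}\n"
    payload = text.encode("utf-8")
    return struct.pack(">BBH", 0x80, ERROR, len(payload)) + payload


frame = error_frame(404, "the resource is not available.")
```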
6.5. HEADERS (op=4)
The HEADERS control frame allows for dynamic renegotiation of connection values like heartbeat timeouts, flow-control windows, etc.
HEADERS consists of a list of "header: value" pairs, like the HELLO frame.
The header payload is encoded as UTF-8.
The header names are restricted to US-ASCII lower case alphanumeric characters, plus the "-" character.
The HEADERS payload grammar, encoded as UTF-8:
HEADERS-payload = *( header ":" SP value LF )
header          = *( %x61-7A / %x30-39 / "-" )  ; "a"-"z", "0"-"9", "-"
value           = *( value-char )
value-char      = %x0020-10FFFF
SP              = %x0020
LF              = %x000A
The HEADERS control frame might look like:
%x80.04.00.22
heartbeat: 120s
buffer-max: 65536
6.6. PING-REQUEST (op=5)
The PING-REQUEST may be sent by either the client or the server to check if the connection is still valid. The receiving end MUST respond with a PING-RESPONSE control frame.
Because the WebSocket connection is long-lived, intermediaries like home routers might close idle connections without notifying either end. Clients and servers may send a PING-REQUEST to check the status of the connection.
It is recommended that clients and servers do not send PING-REQUEST unless specifically configured to do so by the application.
PING-REQUEST does not have a payload.
The PING-REQUEST control frame is the following 4 bytes:
%x80.05.00.00
[Ed: the working group has also discussed asymmetrical heartbeats as an alternative to the ping-style. For the heartbeat to work, the timeouts would need to be negotiated in HELLO or HEADERS.]
6.7. PING-RESPONSE (op=6)
The PING-RESPONSE is a response control frame to the PING-REQUEST. When a peer receives a PING-REQUEST control frame, it MUST send a PING-RESPONSE, to let the other end know the connection is still available.
PING-RESPONSE does not have a payload.
The PING-RESPONSE control frame is 4 bytes as follows:
%x80.06.00.00
[Ed: the working group has also discussed asymmetrical heartbeats as an alternative to the ping-style.]
7. HELLO Headers
The following describes the HELLO headers used during the initial handshake. HELLO values are UTF-8 strings and the header names are lower-case ALPHA characters plus the "-" character.
The client required headers are "url", "origin", and "protocol". Other HELLO headers may be used, but are not defined or mandated by the WebSockets specification.
The server required headers are "status", "origin", and "protocol". Other HELLO headers may be used, but are not defined or mandated by the WebSockets specification.
7.1. url
The "url" must be a valid [RFC3986] absolute-URI. In particular, the host must be defined.
The URL value is encoded as UTF-8.
7.2. origin
The client MUST send an "origin" header during the handshake to inform the server of the source HTTP page.
The server may use the "origin" header to reject connections from unknown origins, preventing certain kinds of browser hijacking scenarios.
[Ed: must non-browser clients send a dummy "origin" even though the concept is meaningless?]
7.3. protocol
"protocol" is a required header used to validate the application protocol built on top of WebSockets.
If the server does not understand the protocol, it MUST reject the connection.
If the server does understand the protocol, it must return in its HELLO the same protocol value that the client sent.
Although the protocol value is an arbitrary header-value, it is recommended to use unique names with a version to avoid conflicts, such as "tictactoe.example.com/1.0".
The "text" protocol is reserved by WebSockets. The payload for the "text" protocol MUST be Unicode characters encoded in UTF-8.
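A sketch of the server's handshake decision for the "origin" and "protocol" headers. It assumes the client HELLO has already been parsed into a dictionary, and that supported_protocols and allowed_origins are hypothetical server settings. Echoing the client's origin in the server HELLO is also an assumption, because this draft does not say which value the server returns. On ValueError, the server would answer with an ERROR frame instead.

```python
def check_client_hello(headers: dict, supported_protocols: set,
                       allowed_origins: set) -> dict:
    """Return the headers for the server HELLO, or raise ValueError."""
    for required in ("url", "origin", "protocol"):
        if required not in headers:
            raise ValueError(f"missing required header {required!r}")
    if headers["origin"] not in allowed_origins:   # reject unknown origins
        raise ValueError("origin not accepted")
    protocol = headers["protocol"]
    if protocol not in supported_protocols:        # server MUST reject
        raise ValueError("unsupported protocol")
    # Echo the client's protocol value; echoing origin is an assumption.
    return {"status": "200", "origin": headers["origin"], "protocol": protocol}
```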
7.4. status
"status" is a required header in the server's HELLO frame giving the result of the handshake.
For WebSockets, the value must be "200" for a successful connection.
8. Security Considerations
This section is meant to inform application developers and users of security issues related to WebSockets. This list is unlikely to be complete.
8.1. HTTP
Many, if not most, of the security issues related to HTTP are also present in WebSockets, because WebSockets uses HTTP for its handshake, and because many WebSockets clients and servers will also be HTTP clients and servers.
8.2. Browser Scripting attacks
Compromised HTTP sites or improperly designed HTTP applications can allow arbitrary JavaScript code to execute on a browser. The hijacked script might attempt to use an HTTP request for a WebSocket server, or might attempt to use a WebSocket request for an HTTP server.
The script may also use a WebSocket request for an entirely different server than the requesting page. The risk can be minimized by servers checking the "origin" header, but this may not be sufficient.
Hijacked clients may also attempt to open a WebSocket connection using an HTTP/XML connection from the browser, attempting to spoof a valid WebSocket connection. WebSocket servers should be written to minimize these risks.
Hijacked clients may open a WebSocket connection to a non-WebSocket HTTP service.
9. Acknowledgements
This specification draft is substantially derived from Ian Hickson's "The WebSockets Protocol" at http://www.whatwg.org/specs/web-socket-protocol/.
This draft also incorporates discussions from the HyBi mailing list.
10. Normative References
[HTML] Hickson, I., “HTML,” May 2010.
[I.D.abarth-origin] Barth, A., Jackson, C., and I. Hickson, “The HTTP Origin Header,” September 2009.
[I.D.agl-tls-nextprotoneg] Langley, A., “Transport Layer Security (TLS) Next Protocol Negotiation Extension,” January 2010.
[I.D.loreto-hybi-requirements] Loreto, S., “HyBi Requirements and Features,” March 2010.
[RFC2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” RFC 2119, BCP 14, March 1997.
[RFC2246] Dierks, T. and C. Allen, “The TLS Protocol Version 1.0,” RFC 2246, January 1999.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” RFC 2616, June 1999.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, “Internationalizing Domain Names in Applications (IDNA),” RFC 3490, March 2003.
[RFC3629] Yergeau, F., “UTF-8, a transformation format of ISO 10646,” STD 63, RFC 3629, November 2003.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” STD 66, RFC 3986, January 2005.
[RFC3987] Duerst, M. and M. Suignard, “Internationalized Resource Identifiers (IRIs),” RFC 3987, January 2005.
[RFC4366] Blake-Wilson, S., Nystrom, M., Hopwood, D., Mikkelsen, J., and T. Wright, “Transport Layer Security (TLS) Extensions,” RFC 4366, April 2006.
[RFC5234] Crocker, D. and P. Overell, “Augmented BNF for Syntax Specifications: ABNF,” STD 68, RFC 5234, January 2008.
[WEBADDRESSES] Connolly, D. and C. Sperberg-McQueen, “Web addresses in HTML 5,” May 2009.
[WSAPI] Hickson, I., “The Web Sockets API,” May 2010.
Author's Address
Scott Ferguson
Caucho Technology
Intellectual Property and Copyright Statements
Copyright © The IETF Trust (2010).
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an “AS IS” basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.