SIPPING J. Rosenberg
Internet-Draft dynamicsoft
Expires: August 25, 2003 February 24, 2003

Interactive Connectivity Establishment (ICE): A Methodology for Network Address Translator (NAT) Traversal for the Session Initiation Protocol (SIP)
draft-rosenberg-sipping-ice-00

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on August 25, 2003.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

This document describes a methodology for Network Address Translator (NAT) traversal for the Session Initiation Protocol (SIP). This methodology is called Interactive Connectivity Establishment (ICE). ICE is not a new protocol, but rather makes use of existing protocols, such as Simple Traversal of UDP Through NAT (STUN), Traversal Using Relay NAT (TURN) and even Realm Specific IP (RSIP). ICE works through the mutual cooperation of both endpoints in a SIP dialog. By having the endpoints work together in NAT traversal, a number of important properties are obtained. ICE always works, independent of the types or number of NATs. It always represents the cheapest solution for a carrier. It always results in the minimum voice latency. It can be done with no increase in call setup delays. It is far less brittle than STUN. ICE also facilitates the transition of the Internet from IPv4 to IPv6, supporting calls between dual-stack and v6 clients behind a 4to6 NAT. Preconditions can be used in conjunction with ICE, to guarantee that the phone never rings unless the users will both hear and see each other when they pick up.




1. Introduction

The subject of NAT traversal for SIP has received a great deal of attention. SIP extensions have been defined for routing responses[11] through NAT, and for routing requests from a public network to a private one through persistent connections[12].

However, far more troubling is the traversal of SIP's media sessions through NAT. Numerous solutions have been proposed for that. These include Application Layer Gateways (ALGs), the Middlebox Control Protocol[2], Simple Traversal of UDP through NAT (STUN)[13], Traversal Using Relay NAT (TURN)[14], Realm Specific IP[3][4], Topology Insensitive Service Traversal (TIST)[15], symmetric RTP[16], along with protocol extensions needed to make them work, such as [17]. The sum total is so complex that an extensive scenarios document[18] has been written. That document outlines the various network situations, and analyzes the various solutions in each. Unsurprisingly, each situation has a specific ideal solution.

However, the result is a system which is incredibly complex and very brittle. What is needed is a single solution which is flexible enough to work well in all situations.

Our proposal for such a solution is called Interactive Connectivity Establishment, or ICE. ICE makes use of many of the protocols above, but uses them in a specific methodology which avoids many of the pitfalls of using any one alone. ICE is not a new protocol, and does not require extensions from STUN, TURN or RSIP. However, it does require some additional SDP attributes, which are discussed below.




2. Overview of ICE

ICE makes the fundamental assumption that clients exist in a network of segmented connectivity. This segmentation is the result of a number of addressing realms in which a client can simultaneously be connected. We use "realms" here in the broadest sense. A realm is defined purely by connectivity. Two clients are in the same realm if, when they exchange the addresses each has in that realm, they are able to send packets to each other. This includes IPv6 and IPv4 realms, which actually use different address spaces, in addition to private networks connected to the public Internet through NAT.

The key assumption in ICE is that a client cannot know, a priori, whether the peer it wishes to communicate with is connected to one or all of the address realms it is in. Therefore, in order to communicate, it has to try them all, and choose the best one that works.

Before a UA makes a call, it obtains as many IP address and port combinations in as many address realms as it can. These addresses all represent potential points at which the UA will receive a specific media stream. Any protocol that provides a client with an IP address and port on which it can receive traffic can be used. These include STUN, TURN, RSIP, and even a VPN. The client also uses any local interface addresses. A dual-stack v4/v6 client will obtain both a v6 and a v4 address/port. The only requirement is that, across all of these addresses, the client can be certain that at least one of them will work for any peer. This is easily guaranteed by using TURN, RSIP, MIDCOM or a VPN from a server on the public Internet to obtain one of the addresses.

The UAC then makes a STUN server available on each of the address/port combinations it has obtained. This STUN server is running locally, on the UA.

All of these addresses are placed into the SDP offer[5] sent by the UAC. Each of them is represented by a separate m-line. The SDP ALT grouping[19] is used to indicate that each of these m-lines represents an alternate point of connectivity for the media stream. They are ordered in terms of preference. Local IPv6 addresses always have the highest preference, followed by local IPv4 addresses, followed by STUN-allocated addresses, followed last by addresses allocated through protocols using relays, such as TURN and VPN. SDP attributes are also used to convey the STUN username and password which are required to gain access to the STUN server on each address/port combination.

This offer is sent to the called party. ICE also assumes that SIP itself can provide global connectivity across address realms. Indeed, the point of the SIP URI is to act as a globally useful identifier for reaching a user wherever they are.

Once the offer arrives at the UAS, it sends STUN requests to each alternate address/port in the SDP offer, similar to the intra-realm STUN mechanism[20] proposed previously. These STUN requests include the username and password obtained from the SDP. None of the flags are used. The STUN requests serve two purposes. The first is to check for connectivity. If a response is received, the UAS knows that it can reach the UAC at that address. The second purpose is to obtain more addresses at which the UAS can be contacted. If there were NATs between the UAS and UAC, the UAS may discover another address through the STUN responses. In its answer, the UAS includes all addresses that it can unilaterally determine (just as the UAC did), in addition to any that were discovered using the STUN messages to the UAC.

When the answer arrives at the UAC, it performs a similar operation. Using STUN, it checks connectivity to each of the addresses in the answer. Through the STUN responses, it may learn of additional addresses that it can use to receive media. It can therefore generate a re-INVITE or UPDATE[6] request to pass these addresses to the callee. Generally, at the end of the first exchange, both sides will have discovered one or more addresses which they are capable of successfully sending to. Each side uses the most preferred address amongst the ones which worked.




3. Terminology

Several new terms are introduced in this specification:

Transport Address: The combination of an IP address and port.
Local Transport Address: A local transport address is a transport address that has been allocated from the operating system on the host. This includes transport addresses obtained through VPNs, and also transport addresses obtained through RSIP (which lives at the operating system level). Transport addresses are typically obtained by binding to an interface.
Derived Transport Address: A derived transport address is a transport address which is associated with, but different from, a local transport address. The derived transport address is associated with the local transport address in that packets sent to the derived transport address are received on the socket bound to that local transport address. Derived addresses are obtained using protocols like STUN and TURN, and more generally, any UNSAF protocol[8].
Peer Derived Transport Address: A peer derived transport address is a derived transport address learned from a STUN server advertised by a peer in a media session.
TURN Derived Transport Address: A derived transport address obtained from a TURN server.
STUN Derived Transport Address: A derived transport address obtained from a STUN server that has been provisioned into the UA. This, by definition, excludes Peer Derived Transport Addresses.




4. Core ICE Algorithm

At its core, the ICE algorithm is an iterative process in which two cooperating entities, A and B, exchange addresses with each other in an attempt to connect. One side (say A) starts, collecting all of the addresses it can find. It sends those to B. B also collects all of the addresses it can find, including those obtained by sending address-fixing requests (such as STUN requests) to A itself. Those are passed to A. B also checks connectivity to the addresses provided by A. When A gets the set of addresses, it performs connectivity checks, and attempts to obtain further addresses based on the information sent by B. If A learns more addresses, it sends these to B, which checks connectivity to those addresses. This process iterates back and forth until both sides have obtained all the addresses which can be obtained. At least one address passed in each direction should work.




5. Detailed ICE Algorithm

This section describes the detailed processing needed for ICE.

5.1 Gathering Transport Addresses

The offerer begins the process by gathering transport addresses. There are two types of addresses it can gather - local transport addresses, and derived transport addresses. Local transport addresses are obtained by binding to an ephemeral port on an interface (physical or virtual) on the host. A multi-homed host SHOULD attempt to bind on all interfaces for all media streams it wishes to receive. For media streams carried using the Real Time Transport Protocol (RTP)[10], the offerer will need to bind to an ephemeral port for both RTP and RTCP.
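The binding step can be sketched as follows. This is a minimal illustration, assuming UDP media over IPv4 and a Python host; the helper name bind_rtp_rtcp is hypothetical and is not defined by ICE. Binding to port 0 asks the operating system for an ephemeral port.

```python
import socket

def bind_rtp_rtcp(interface_ip):
    """Bind two ephemeral UDP ports on one interface: one for RTP, one for RTCP.

    Returns the two sockets. Their bound (ip, port) pairs are the local
    transport addresses for one media stream. Hypothetical helper.
    """
    socks = []
    for _ in range(2):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind((interface_ip, 0))  # port 0: let the OS pick an ephemeral port
        socks.append(s)
    return socks[0], socks[1]
```

A multi-homed host would call such a helper once per interface and per media stream. Note that the two ports obtained this way will generally not follow the even/odd RTP convention.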

The result will be a set of local transport addresses. The offerer may also have access to servers that provide unilateral self-address fixing (UNSAF)[8]. Examples of such protocols include STUN, TURN, and TEREDO[21]. For each of these protocols, the offerer may have access to a multiplicity of servers. For example, a user connected to a natted cable access network might have access to a STUN server in the private cable network and in the public Internet. For each local transport address, the offerer SHOULD address-fix against every server for each protocol it supports. The result of this will be a set of derived transport addresses, with each derived address associated with the local transport address it is derived from.

ICE works better the more options exist for connectivity. However, in order to communicate with the peer, at least one of the offered addresses has to be guaranteed to work with any peer that might be called. This generally requires that one of the derived addresses be obtained from a relay service (such as TURN or TEREDO) that exists within the public Internet.

5.2 Enabling STUN on Each Transport Address

Once the offerer has obtained a set of transport addresses, it starts a STUN server on each local transport address (including ones used for RTCP). This, by definition, means that the STUN service will be reached for requests sent to the derived addresses.

However, the client does not need to provide STUN service on any other IP address or port, unlike the normal STUN usage as described in [13]. The need to run the service on multiple ports is to support the change flags. However, those flags are not needed with ICE, and the server SHOULD reject any requests with these flags set.

Furthermore, there is no need to support TLS or to be prepared to receive SharedSecret request messages. Those messages are used to obtain shared secrets to be used with BindingRequests. However, with ICE, usernames and passwords are exchanged in SIP itself.

It is important to note that the transport address being used by the STUN server will also need to support the media stream which is to be sent to that transport address. This will require the offerer to disambiguate STUN messages from messages for the underlying media stream protocol. In the case of RTP/RTCP, this disambiguation is easy. RTP and RTCP packets start with the bits 0b10 (v=2). The first two bits in STUN are always 0b00. Disambiguating STUN with other media stream protocols may be more complicated. However, it is guaranteed to always be possible by selecting an appropriately random username (see below).
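The RTP/RTCP case of this disambiguation can be sketched as a check on the two most significant bits of the first byte. The function name and the returned labels are illustrative assumptions; other media protocols would need a different test.

```python
def classify_packet(data: bytes) -> str:
    """Classify a datagram by the two most significant bits of its first byte.

    0b10 -> RTP or RTCP (version 2), 0b00 -> STUN, anything else -> unknown.
    """
    if not data:
        return "unknown"
    top_bits = data[0] >> 6
    if top_bits == 0b10:
        return "rtp"
    if top_bits == 0b00:
        return "stun"
    return "unknown"
```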

The need to run STUN on the same transport address as the media stream represents the "ugliest" piece of ICE. However, it is an essential part of the story. By sending STUN requests to the very same place media is sent, any bindings learned through STUN will be useful even when communicating through symmetric NATs. This results in a substantial increase in the scope of applicability of STUN, in terms of cases where it can provide connectivity. In that sense, the usage of STUN here is radically different than the usage models outlined in [13], where STUN is generally useless for dealing with symmetric NAT.

For each local transport address where a STUN server is running, the client MUST choose a username and password. The username MUST be globally unique, so that no other host will select a username with the same value. This username and password will be passed to the answerer in the SDP. They are used by the answerer to authenticate the STUN requests to the server.

The global uniqueness requirement stems from the lack of uniqueness afforded by IP addresses. Consider user agents A, B, and C. A and B are within private enterprise 1, which is using 10.0.0.0/8. C is within private enterprise 2, which is also using 10.0.0.0/8. As it turns out, B and C both have IP address 10.0.1.1. A makes a call to C. C, in its answer, provides A with its transport addresses. In this case, that's 10.0.1.1:8866 and 8877. As it turns out, B is on a call at that same time, and is also using 10.0.1.1:8866 and 8877. This means that B has a STUN server running on those ports, just as C does. A will send a STUN request to 10.0.1.1:8866 and 8877. However, these do not go to C as expected. Instead, they go to B. If B just replied to them, A would believe it has connectivity to C, when in fact it has connectivity to a completely different user, B. To fix this, the STUN username takes on the role of a unique identifier. C provides A with a unique username. A uses this username in its STUN query to 10.0.1.1:8866. This STUN query arrives at B. However, the username is unknown to B, and so the request is rejected. A treats the rejected STUN request as if there were no connectivity to C (which is actually true). Therefore, the error is avoided.
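One way to approximate the uniqueness requirement is to draw the username from a cryptographically strong random source. The sketch below is an assumption of this example, not a mechanism defined by ICE: it uses 128 random bits per value, which makes a collision between two hosts negligible but cannot strictly guarantee global uniqueness.

```python
import secrets

def new_stun_credentials():
    """Return a (username, password) pair of random hex strings.

    Each value carries 128 random bits (32 hex characters). The sizes are
    illustrative; the draft does not prescribe them.
    """
    username = secrets.token_hex(16)
    password = secrets.token_hex(16)
    return username, password
```

A fresh pair would be generated for each local transport address on which a STUN server is started.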

Once the STUN server is started, it MUST run until the first media packet arrives on that address. Once that occurs, the agent MAY terminate the server. It is still possible that a late or stray STUN message will show up, but these will generally fail any media stream validity checks and be discarded (STUN packets always fail the RTP validity checks).

While the server is running, it MUST act as a normal STUN server, but MUST only accept STUN requests from clients that authenticate using the username and password handed out for the dialog.

5.3 Prioritizing the Transport Addresses

With the STUN servers started, the next step is to prioritize the transport addresses. This priority reflects the desire that the UA has to receive media on that address, and is assigned as a value from 0 to 1 (1 being most preferred). Although any prioritization is possible, it is RECOMMENDED that the prioritization be based on the number of intermediaries that will be traversed. The fewer intermediaries, the higher the priority. As a result of this, local IPv6 transport addresses obtained from physical interfaces have highest priority (it is RECOMMENDED that 1.0 be used). Next are local IPv4 transport addresses obtained from physical interfaces (it is RECOMMENDED that 0.8 be used). Next are STUN derived transport addresses (it is RECOMMENDED that 0.6 be used), followed by TURN, RSIP or TEREDO derived transport addresses (it is RECOMMENDED that 0.4 be used). Last up are local transport addresses obtained from VPN interfaces (it is RECOMMENDED that 0.2 be used).
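The recommended values can be captured in a small lookup table. In the sketch below the q-values come from the text above, while the category labels, the tuple layout and the function name are assumptions made for illustration.

```python
# Recommended q-values from the prioritization rules; the keys are
# hypothetical labels chosen for this sketch.
RECOMMENDED_Q = {
    "local_ipv6": 1.0,
    "local_ipv4": 0.8,
    "stun_derived": 0.6,
    "relay_derived": 0.4,  # TURN, RSIP or TEREDO
    "vpn_local": 0.2,
}

def prioritize(addresses):
    """Tag each (kind, ip, port) with its q-value, most preferred first.

    Returns a list of (q, kind, ip, port) tuples sorted by decreasing q.
    """
    tagged = [(RECOMMENDED_Q[kind], kind, ip, port) for kind, ip, port in addresses]
    return sorted(tagged, key=lambda t: t[0], reverse=True)
```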

5.4 Constructing the Offer

The next step is to construct the offer. For each media stream, the client encodes its available set of transport addresses as a series of m-lines. Each m-line conveys a single transport address (or, in the case of RTP, two transport addresses - one for RTP, and one for RTCP). If, in the case of RTP, the RTP and RTCP transport addresses do not follow the even/odd convention, the SDP for NAT[17] extensions can be used to convey them.

Each m-line in a media stream is marked with a unique media stream identifier (MID)[9]. Furthermore, the Alternative Semantics for the SDP Grouping Framework[19] is used to indicate that all of those m-lines are alternatives for that media stream. This is done using the ALT group attribute.

Each m-line is also tagged with its q-value, as obtained from the processing described in Section 5.3 (Prioritizing the Transport Addresses). This is done with a new proposed SDP attribute, "qvalue". The value of this attribute is a q-value, as defined in RFC 3261[1].

The q-value attribute belongs in the ALT specification. That specification provides an ordering of media streams based on the order their MIDs are listed in the ALT attribute. However, the actual absolute values of the preferences are needed for ICE to properly choose the optimal connectivity.

Each m-line also includes the "stun" SDP attribute. This attribute, defined in Section 7 (SDP Extensions for STUN), indicates that a STUN server is running on the transport addresses associated with that m-line, and conveys the username and password used to access that server. In the case of RTP, this means that STUN servers are running on both the RTP and RTCP ports, independently of whether the RTCP port is equal to the RTP port plus one, or explicitly signaled using [17]. Note that the stun attribute is included for local transport addresses and derived transport addresses. In the case of derived transport addresses, the username and password are the same as those for the STUN server bound to the associated local transport address. Section 6 (Running STUN on a Derived Transport Addresses) discusses running STUN servers on derived transport addresses, and demonstrates that it does the "right thing".

Once the offer is constructed, it is sent.

5.5 Answerer Processing - Connectivity Checks and Gathering

Once the answerer receives the offer, it does several things in parallel. First, it performs the same processing described in Section 5.1 (Gathering Transport Addresses). Specifically, for each group of m-lines in the offer that represents a distinct media stream, the answerer allocates a set of local transport addresses and the full set of derived transport addresses.

Note that these addresses can be "pre-gathered" before the call is even received, so that a set is always "on-deck". This will avoid any increase in call setup times, at the expense of holding onto addresses which may not get used. Retaining these addresses may also require refresh traffic that consumes network bandwidth.

While the unilateral derived addresses are being obtained, the answerer sends a STUN BindingRequest from each local transport address to each STUN server identified in the offer. This BindingRequest MUST contain the USERNAME attribute, and the value of this attribute MUST equal the username from the stun attribute in the offer. The BindingRequest MUST contain a MESSAGE-INTEGRITY attribute, computed using the username and password from the stun attribute in the offer. The BindingRequest MUST NOT contain the CHANGE-REQUEST or RESPONSE-ADDRESS attribute.
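A BindingRequest meeting these constraints might be built as sketched below. The sketch assumes the STUN wire format of RFC 3489 (20-byte header with a 128-bit transaction ID, USERNAME as attribute 0x0006, MESSAGE-INTEGRITY as attribute 0x0008 holding an HMAC-SHA1 keyed with the password over the zero-padded message). Whether the header length field already counts MESSAGE-INTEGRITY when the HMAC is computed is an assumption of this sketch, and the function name is hypothetical.

```python
import hashlib
import hmac
import os
import struct

BINDING_REQUEST = 0x0001
ATTR_USERNAME = 0x0006
ATTR_MESSAGE_INTEGRITY = 0x0008

def build_binding_request(username: bytes, password: bytes) -> bytes:
    """Build a BindingRequest with USERNAME and MESSAGE-INTEGRITY only.

    No CHANGE-REQUEST or RESPONSE-ADDRESS attribute is added.
    """
    if len(username) % 4 != 0:
        raise ValueError("USERNAME length must be a multiple of 4 bytes")
    transaction_id = os.urandom(16)
    username_attr = struct.pack("!HH", ATTR_USERNAME, len(username)) + username
    # Assumption: the length field covers all attributes, including the
    # 4-byte attribute header and 20-byte HMAC of MESSAGE-INTEGRITY.
    body_len = len(username_attr) + 4 + 20
    header = struct.pack("!HH", BINDING_REQUEST, body_len) + transaction_id
    text = header + username_attr
    # The HMAC input is zero-padded to a multiple of 64 bytes.
    padded = text + b"\x00" * (-len(text) % 64)
    mac = hmac.new(password, padded, hashlib.sha1).digest()
    return text + struct.pack("!HH", ATTR_MESSAGE_INTEGRITY, len(mac)) + mac
```

The answerer would send one such request from each local transport address to each STUN server listed in the offer, using the username and password carried in the corresponding stun attribute.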

It is RECOMMENDED that these STUN requests be sent in parallel. The answerer MAY alert the user during this time. Generally, if the user is a human (and not an automaton), the STUN transactions will have completed before the call is answered.

If the STUN BindingRequest elicits a BindingResponse before the STUN transaction times out, the result is considered a success. For successful transactions, the answerer stores the offered transport address (which identifies both the STUN server and the place where media is sent), the local transport address from which the STUN request was sent, the MID from the offer, and the q-value from the offer. If the STUN transaction succeeds, the client knows for certain that the media can reach the destination as well. That is because the media traffic will be sent from the same transport address, to the same transport address, as the STUN packet was.

Once at least one successful transaction has taken place, the answerer MAY begin sending media to that corresponding transport address. It MUST send media from the local address used to send the STUN request. If another transaction completes successfully, resulting in a transport address with higher priority, that transport address MUST be used instead (along with its corresponding local address). Note that, between two transport addresses with the same q-value, a STUN derived address always has higher priority. Furthermore, once an agent sends media to a transport address with a specified priority, it MUST NOT, during the lifetime of the dialog, send media to a connected transport address with a lower priority.
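The selection rule can be sketched as a small state holder that only ever moves to a higher priority address. The class name and tuple layout are assumptions of this sketch. Ties in q-value are broken in favor of STUN derived addresses by comparing the pair (q, is_stun_derived).

```python
class MediaTargetSelector:
    """Tracks the transport address currently used for sending media."""

    def __init__(self):
        # (q, is_stun_derived, remote_addr, local_addr), or None before
        # the first successful connectivity check.
        self.current = None

    @staticmethod
    def _rank(candidate):
        q, is_stun_derived, _remote, _local = candidate
        return (q, is_stun_derived)  # True > False breaks q-value ties

    def on_check_success(self, q, is_stun_derived, remote_addr, local_addr):
        """Record a successful STUN transaction; switch only to higher priority."""
        candidate = (q, is_stun_derived, remote_addr, local_addr)
        if self.current is None or self._rank(candidate) > self._rank(self.current):
            self.current = candidate
        return self.current
```

Because the selector never replaces its current choice with a lower ranked one, it also respects the rule that media is never sent to a lower priority address later in the dialog.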

This restriction allows an agent to free derived transport addresses once it receives media on a transport address with a higher priority. The drawback of this restriction is that if connectivity should be lost during the dialog, the client cannot fall back to a lower priority address. We believe that it is more important to free unneeded resources than to hold onto them in case of the unlikely event of a problem.

For those successful STUN transactions, the answerer compares the MAPPED-ADDRESS attribute in the response to the local transport address from which the STUN request was sent. If the two differ, the answerer considers the MAPPED-ADDRESS as another transport address that has been gathered for usage in this dialog. This transport address is referred to as a STUN derived transport address (SDTA). The q-value of this transport address is set to the value of the q-value attribute from the offer. For example, if the offerer provides a transport address obtained from a local interface, it would set the q-value to 1.0. If the answerer sends a STUN request to the server and obtains a new transport address, that transport address is assigned a q-value of 1.0. That q-value will be used in comparison to other addresses gathered by the answerer.
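The comparison described above amounts to a few lines. In this sketch, addresses are (ip, port) tuples and the function name is hypothetical.

```python
def peer_derived_address(mapped_address, local_address, offered_q):
    """Return (address, q) for a newly learned address, or None.

    mapped_address: MAPPED-ADDRESS from the BindingResponse, as (ip, port).
    local_address:  the local transport address the request was sent from.
    offered_q:      the q-value of the offered m-line that was probed.
    """
    if mapped_address == local_address:
        return None  # no NAT rewrote the source; nothing new was learned
    return (mapped_address, offered_q)
```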

Note how the q-value from the offerer is used to compare with q-values set by the answerer. Because q-values are shared between users, they must have a well-defined scale and an absolute order. It is for this reason that the relative ordering defined in the current ALT specification is not sufficient.

If any STUN BindingRequest results in a BindingErrorResponse, the ERROR-CODE is examined. If it is 401, 430, 432 or 500, the client SHOULD retry the request, applying any appropriate fixes specified by the error code. In the case of 400, 431 and 600, the client MUST NOT retry. This case is treated identically to a timeout, so that it is equal to no connectivity at all.
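The error handling can be expressed as a simple classification. The code sets below come from the text above. Treating any unlisted code as "no connectivity" is an assumption of this sketch, as are the function name and return labels.

```python
RETRY_CODES = {401, 430, 432, 500}
NO_RETRY_CODES = {400, 431, 600}

def classify_stun_error(error_code: int) -> str:
    """Map a BindingErrorResponse ERROR-CODE to the action to take."""
    if error_code in RETRY_CODES:
        return "retry"
    # 400, 431 and 600 MUST NOT be retried and count as a timeout.
    # Unlisted codes are handled the same way here (an assumption).
    return "no_connectivity"
```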

5.6 Generating the Answer

At some point, the called party will decide to accept or reject the call. A rejection terminates ICE processing, of course. In the case of acceptance, the answer is constructed as follows.

At the time when the answer is to be sent, the answerer will have gathered some number of transport addresses. Some of these will be local transport addresses, some will be unilaterally derived addresses, and some will be stun derived from the peer in the dialog. Each of these will have a q-value, based on either the rules in Gathering Transport Addresses or Answerer Processing - Connectivity Checks and Gathering.

Unfortunately, due to the limitations of SDP, the number of m-lines in an answer must be the same as an offer. As a result, additional processing is needed to break the transfer of the gathered addresses into a series of offer/answer cycles, referred to as "ICE cycles".

For each media stream in the offer, let M be the number of m-lines listed as alternates for that stream. Let N be the number of gathered transport addresses for that stream at the time when the answer is to be sent (note that, in the case of RTP, the RTP and RTCP transport addresses together count as one, not two). The N transport addresses are sorted in decreasing order of q-value. If two transport addresses have the same q-value, STUN derived addresses are preferred over any others. Beyond that, the relative ordering is implementation defined. If N is less than M, all N transport addresses are placed into the answer, matching one of the M m-lines. For the remaining M-N m-lines, the transport address with the highest q-value is repeated. If N is greater than or equal to M, the top M transport addresses are placed into the answer. This results in N-M leftovers, which will be sent in the next ICE cycle.
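The placement of N gathered addresses into M m-lines can be sketched as follows. The tuple layout (q, is_stun_derived, address) and the function name are assumptions of this sketch. Ordering among entries that tie on both q-value and derivation is left to the sort, which the text allows as implementation defined.

```python
def fill_answer(gathered, m):
    """Distribute gathered transport addresses over the m m-lines of an answer.

    gathered: list of (q, is_stun_derived, address) tuples, length N >= 1.
    m:        number of alternate m-lines offered for the stream.
    Returns (placed, leftovers); leftovers wait for the next ICE cycle.
    """
    if not gathered:
        raise ValueError("at least one transport address is required")
    # Decreasing q-value; on a tie, STUN derived (True) sorts first.
    ordered = sorted(gathered, key=lambda c: (c[0], c[1]), reverse=True)
    if len(ordered) < m:
        # Repeat the highest-priority address in the remaining M-N m-lines.
        placed = ordered + [ordered[0]] * (m - len(ordered))
        return placed, []
    return ordered[:m], ordered[m:]
```

For example, with two gathered addresses and three offered m-lines, the best address appears twice in the answer and nothing is left over. With one m-line, the lower priority address is held for the next cycle.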

The q-value MUST be placed into each m-line in the answer. If the transport address was stun derived, the SDP "derived" attribute MUST be included. Each of the M m-lines is assigned a unique MID, and the ALT group attribute is added to the SDP, indicating that all M are alternates for the same stream. Each m-line MUST include the stun attribute, including STUN derived transport addresses. For those addresses, the STUN server is the one bound to the transport address where the STUN request was sent from.

The answer is then sent.

5.7 Offerer Processing of the Answer

The processing of the answer by the offerer is nearly identical to that of the answerer processing the offer. Specifically, the offerer will send STUN requests to the STUN servers listed in the answer. This results in the same connectivity processing, and will also result in the gathering of new STUN derived addresses. The offerer can begin sending media to the answerer once it has at least one transport address whose connectivity has been verified.

5.8 Additional ICE Cycles

After the completion of the offer/answer exchange, both sides may continue to obtain more derived transport addresses. This may occur because a STUN transaction took too long to complete, and so missed the "window" of the previous offer/answer exchange. Or, it may occur because the previous offer/answer exchange provided additional addresses which resulted in new STUN derived addresses.

At any point when either agent has one or more new gathered addresses, it MAY initiate a new offer/answer exchange, and a new corresponding ICE cycle. It would add m-lines to the existing set containing those new gathered addresses.

Typically, there won't be more than a small number (2-3) ICE cycles before convergence. Assuming that there is no network packet loss (which can extend the STUN transaction) and zero network latency, it appears that a maximum of two ICE cycles are needed to reach convergence.




6. Running STUN on a Derived Transport Addresses

One of the seemingly bizarre operations done during the ICE processing is the transmission of a STUN request to a transport address which is obtained through TURN or STUN itself. This actually does work, and in fact, has extremely useful properties. The subsections below go through the detailed operations that would occur at each point to demonstrate correctness and the properties derived from it.

6.1 STUN on a TURN Derived Transport Address

Consider a client A that is behind a NAT. It connects to a TURN server on the public side of the NAT. To do that, A binds to a local transport address, say 10.0.1.1:8866, and then sends a TURN request to the TURN server. The NAT translates the net-10 address to 192.0.2.88:5063. Assume that the TURN server is running on 192.0.2.1 and listening for TURN traffic on port 7764. The TURN server allocates a derived transport address 192.0.2.1:26524 to the client, and returns it in the TURN response. Remember that all traffic from the TURN server to the client is sent from 192.0.2.1:7764 to 10.0.1.1:8866.

Now, the client runs a STUN server on 10.0.1.1:8866, and advertises that its server actually runs on 192.0.2.1:26524. Another client, B, sends a STUN request to this server. It sends it from a local transport address, 192.0.2.77:1296. When it arrives at 192.0.2.1:26524, the TURN server "locks down" outgoing traffic, so that data packets received from A are sent to 192.0.2.77:1296. The STUN request is then forwarded to the client, sent with a source address of 192.0.2.1:7764 and a destination address of 192.0.2.88:5063. This passes through the NAT, which rewrites the destination address to 10.0.1.1:8866. This arrives at A's STUN server. The server observes the source address of 192.0.2.1:7764, and generates a STUN response containing this value in the MAPPED-ADDRESS attribute. The STUN response is sent with a source address of 10.0.1.1:8866, and a destination of 192.0.2.1:7764. This arrives at the TURN server, which, because of the lock-down, sends the STUN response with a source address of 192.0.2.1:26524 and destination of 192.0.2.77:1296, which is B's STUN client.

Now, as far as B is concerned, it has obtained a new STUN derived transport address of 192.0.2.1:7764. And indeed, it has! STUN derived transport addresses are scoped to the dialog, so they can only be used by the peer in the dialog. Furthermore, that peer has to send requests from the socket on which the STUN server was running. In this case, A is the peer, and its STUN server was on 10.0.1.1:8866. If it sends to 192.0.2.1:7764, the packet goes to the TURN server, and due to lock-down, is forwarded to B, and specifically, is forwarded to the transport address B sent the STUN request from. Therefore, the address is indeed a valid STUN derived transport address.

The benefit of this is that it allows two clients to share the same TURN server for media traffic in both directions. With "normal" TURN usage, both clients would obtain a derived address from their own TURN servers. The result is that, for a single call, there are two bindings allocated by each side from their respective servers, and all four are used. With ICE, that drops to two bindings allocated from a single server. Of course, all four bindings are allocated initially. However, once one of the clients begins receiving media on its STUN derived address, it can deallocate its TURN resources.

[[TODO: Include a diagram that shows this pictorially.]]

6.2 STUN on a STUN Derived Transport Address

Consider a client A that is behind a NAT. It connects to a STUN server on the public side of the NAT. To do that, A binds to a local transport address, say 10.0.1.1:8866, and then sends a STUN request to the STUN server. The NAT translates the net-10 address to 192.0.2.88:5063. Assume that the STUN server is running on 192.0.2.1 and listening for STUN traffic on port 3478, the default STUN port. The STUN server sees a source IP address of 192.0.2.88:5063, and returns that to the client in the STUN response. The NAT forwards the response to the client.

Now, the client runs a STUN server on 10.0.1.1:8866, and advertises that its server actually runs on 192.0.2.88:5063. Another client, B, sends a STUN request to this address. It sends it from a local transport address, 192.0.2.77:1296. When it arrives at 192.0.2.88:5063 (on the NAT), the NAT rewrites the destination address to 10.0.1.1:8866, assuming that it is of the full-cone variety, or is restricted, and the permission for 192.0.2.77:1296 is open. This arrives at A's STUN server. The server observes the source address of 192.0.2.77:1296, and generates a STUN response containing this value in the MAPPED-ADDRESS attribute. The STUN response is sent with a source address of 10.0.1.1:8866, and a destination of 192.0.2.77:1296. This arrives at B's STUN client.

Now, as far as B is concerned, it has obtained a new STUN derived transport address of 192.0.2.77:1296. Of course, this is the same address as the local transport address, and therefore this derived address is not used. However, had there been additional NATs between B and A's NAT, B would end up seeing the binding allocated by that outermost NAT. The net result is that STUN requests sent to a STUN derived address behave as normal STUN would. However, these STUN requests have the side-effect of creating permissions in the NATs which see those requests in the public to private direction. This turns out to be very useful for traversing restricted NATs.
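
The exchange described in this section can be illustrated with a toy model. The following Python sketch is not part of STUN or ICE. The names RestrictedNat and stun_reflect are hypothetical, and the NAT is assumed to be restricted by both address and port, with a single public IP address and sequentially allocated ports.

```python
from typing import Dict, Optional, Set, Tuple

Addr = Tuple[str, int]  # (IP address, port)


class RestrictedNat:
    """Toy model of a restricted NAT with one public IP address (an assumption)."""

    def __init__(self, public_ip: str, first_port: int) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.bindings: Dict[Addr, Addr] = {}          # private -> public
        self.permissions: Dict[Addr, Set[Addr]] = {}  # public -> allowed remotes

    def outbound(self, src: Addr, dst: Addr) -> Addr:
        """Translate an outgoing packet and open a permission for dst."""
        if src not in self.bindings:
            self.bindings[src] = (self.public_ip, self.next_port)
            self.next_port += 1
        public = self.bindings[src]
        self.permissions.setdefault(public, set()).add(dst)
        return public

    def inbound(self, src: Addr, dst: Addr) -> Optional[Addr]:
        """Return the private destination, or None if the packet is dropped."""
        if src not in self.permissions.get(dst, set()):
            return None
        for private, public in self.bindings.items():
            if public == dst:
                return private
        return None


def stun_reflect(observed_source: Addr) -> Addr:
    """A STUN server reports the source address it observed as MAPPED-ADDRESS."""
    return observed_source
```

With A at 10.0.1.1:8866 and the NAT allocating 192.0.2.88:5063, a request from B (192.0.2.77:1296) is dropped by inbound() until A has sent a packet towards B. After that, the request reaches A, and stun_reflect() returns B's own local transport address, so no new derived address results.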




7. SDP Extensions for STUN

Two new SDP attributes are defined to support STUN derived transport addresses. These attributes are "stun" and "derived".

7.1 The stun Attribute

The stun attribute MUST be present within a media stream. It contains a username and password. These are the username and password that the recipient of the SDP MUST use when connecting to that server. The username and password are scoped to the lifetime of the dialog on which the SDP is being exchanged. If the dialog terminates, the username and password are invalid.

The grammar of the stun attribute is:

stun-attribute = "stun" ":" username SP password
username       = non-ws-string
password       = non-ws-string

Note that STUN allows both the username and password to contain the space character. However, usernames and passwords used with ICE cannot contain the space.
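
As an illustration of the grammar above, a receiver might parse the attribute as in the following Python sketch. The function name parse_stun_attribute and the StunCredentials type are hypothetical. The sketch assumes that non-ws-string means one or more characters that are not whitespace, and that the caller has already removed the line terminator.

```python
from typing import NamedTuple


class StunCredentials(NamedTuple):
    username: str
    password: str


def parse_stun_attribute(line: str) -> StunCredentials:
    """Parse an SDP line of the form 'a=stun:<username> <password>'."""
    prefix = "a=stun:"
    if not line.startswith(prefix):
        raise ValueError("not a stun attribute: %r" % line)
    fields = line[len(prefix):].split(" ")
    # Exactly one SP must separate two non-empty strings free of whitespace.
    well_formed = (
        len(fields) == 2
        and all(fields)
        and not any(ch.isspace() for field in fields for ch in field)
    )
    if not well_formed:
        raise ValueError("malformed stun attribute: %r" % line)
    return StunCredentials(fields[0], fields[1])
```

Rejecting any value with more than one space reflects the restriction that usernames and passwords used with ICE cannot contain the space character.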

The security considerations associated with carrying a username and password in the clear in SIP are not as bad as one might think. If an eavesdropper should see the username and password, the worst they can do is send STUN requests to the host. Since STUN is a stateless protocol, the attacker can not alter the processing of the call or otherwise disrupt it. They could flood the server with BindingRequest packets. However, this would be no worse than if the attacker simply floods the host with any kind of packet.

However, integrity protection of the username and password is more important. If an attacker is capable of intercepting the message and modifying the username or password, they could prevent connectivity from being established between peers, and therefore disrupt the call. Of course, if the attacker can intercept the SIP message, there are many other ways in which they could do that, such as simply discarding the message. Injecting fake SDP with incorrect usernames and passwords can also disrupt a call, and does not require the compromise of an intermediate server. A similar attack is possible by modifying most of the SDP attributes. To prevent these kinds of attacks, it is RECOMMENDED that sips be used.

7.2 The derived Attribute

The derived attribute MUST be present within a media block of the SDP. It indicates that the transport addresses in the c and m lines are STUN derived transport addresses, learned from the entity to whom the SDP is being sent (referred to as the peer). The syntax of this attribute is:

derived-attribute = "derived" ":" identification-tag
   ; from RFC 3388

The identification-tag is a MID that identifies the transport address used as the STUN server. This MID is extracted from the SDP from the peer. As an example, consider users A and B. User A sends an SDP offer to B, as part of an offer/answer exchange[5]. A indicates, using the stun attribute, that the stream labeled with MID 23 runs a STUN server. B sends a STUN request to this server, and obtains a MAPPED-ADDRESS in the STUN response. This transport address is used in the answer. The m-line containing this address would contain the attribute "a=derived:23".

When a user sends media to a transport address that has been marked as derived, it MUST do so by sending from the transport address indicated by the identification-tag. Continuing the example above, if the transport address provided by A in MID 23 was 192.0.2.3:3344, when A sends media to a STUN derived transport address derived from MID 23, it MUST send the media from 192.0.2.3:3344. The need for this requirement is described in Running STUN on a Derived Transport Addresses.
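
The sending rule above can be sketched as follows. This is an illustrative Python fragment only. The function names and the own_addresses table (a mapping from the MIDs in a user's own SDP to the transport addresses it listed there) are hypothetical.

```python
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (IP address, port)


def parse_derived_attribute(line: str) -> str:
    """Return the identification-tag from an 'a=derived:<tag>' line."""
    prefix = "a=derived:"
    if not line.startswith(prefix) or len(line) == len(prefix):
        raise ValueError("not a derived attribute: %r" % line)
    return line[len(prefix):]


def source_address_for(
    derived_tag: Optional[str],
    own_addresses: Dict[str, Addr],
    default_source: Addr,
) -> Addr:
    """Pick the local address from which media must be sent.

    If the peer marked its address as derived, the media has to come from
    the transport address that hosted the STUN server the peer queried.
    """
    if derived_tag is None:
        return default_source
    if derived_tag not in own_addresses:
        raise ValueError("derived tag %r does not match any local MID" % derived_tag)
    return own_addresses[derived_tag]
```

For the example in the text, own_addresses would map "23" to 192.0.2.3:3344, and a peer address carrying "a=derived:23" would cause A to send from 192.0.2.3:3344.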




8. Connectivity Preconditions

One of the benefits of ICE is that each side knows with certainty when it is able to communicate with its peer. However, neither side knows with certainty when both sides can communicate with each other. That information represents distributed state spread between both peers. It would be extremely useful to know this piece of information, so that a device can hold off on alerting a user until connectivity has been confirmed. This is exactly the kind of function that SIP preconditions[7] have been designed to provide.

This specification therefore defines the "connected" precondition type. A media stream is considered connected from A to B when connectivity from A to B has been confirmed with STUN for at least one of the m-lines in the set of alternate transport addresses.

          A                        B
          |(1) INVITE SDP1         |
          |----------------------->|
          |(2) 183 SDP2            |
          |<-----------------------|
          |(3) PRACK               |
          |----------------------->|
          |(4) 200 PRACK           |
          |<-----------------------|
          |(5) STUN connectivity   |
          |........................|
          |(6) UPDATE SDP3         |
          |----------------------->|
          |(7) 200 UPDATE SDP4     |
          |<-----------------------|
          |(8) 180 Ringing         |
          |<-----------------------|
          |(9) PRACK               |
          |----------------------->|
          |(10) 200 PRACK          |
          |<-----------------------|
          |(11) 200 INVITE         |
          |<-----------------------|
          |(12) ACK                |
          |----------------------->|

Figure 3 shows a typical call flow used in conjunction with preconditions. User A does not want the phone to ring unless connectivity is guaranteed. As a result, it sends an INVITE with a connectivity precondition (message 1) that is mandatory in the sendrecv direction. When this arrives at B, B decides to accept the precondition, and therefore generates an answer indicating that the precondition is mandatory in the sendrecv direction. The current status is none, since connectivity hasn't been established in either direction yet. At that point, the ICE cycles begin (which may themselves use UPDATE for the offer/answer exchanges). Assume that B has established connectivity to A. When A establishes connectivity to B, it sends an UPDATE (message 6) with a current status of sendrecv. This meets the precondition. As a result, B's phone rings, causing a 180 to be sent (message 8).
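
The status reasoning in this flow can be sketched in Python as shown below. The function names are hypothetical. The status strings are the direction tags of the SIP preconditions framework [7]. Treating "send" as connectivity from the local user to its peer, and "recv" as the reverse, is an assumption of this sketch.

```python
from typing import Iterable


def direction_confirmed(alternate_results: Iterable[bool]) -> bool:
    """A direction counts as connected once any alternate m-line passed its STUN check."""
    return any(alternate_results)


def current_status(send_confirmed: bool, recv_confirmed: bool) -> str:
    """Map the two per-direction results onto a current status value."""
    if send_confirmed and recv_confirmed:
        return "sendrecv"
    if send_confirmed:
        return "send"
    if recv_confirmed:
        return "recv"
    return "none"


def may_alert(current: str, desired: str = "sendrecv") -> bool:
    """The callee may alert only when the current status covers the desired one."""
    return current == desired or current == "sendrecv"
```

At message 2 both directions are unconfirmed, so the status is "none" and B does not ring. Once both directions are confirmed, current_status() yields "sendrecv", which satisfies the mandatory sendrecv precondition.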




9. Example Use Cases

This section contains a number of example use cases. They help to demonstrate that the core ICE algorithm always results in the best connectivity. In all cases, only RTP is shown and discussed, to simplify the discussion. RTCP related operations (generally STUN queries parallel to the RTP ones) are omitted.

9.1 Public Internet

In this case, there are two clients A and B. Both are connected via a single-homed, IPv4 machine, to the public Internet. There are no NATs. A calls B. The basic call flow for this case is shown in Figure 4.

          A                   B
          |(1) INVITE         |
          |LTA1               |
          |------------------>|
          |(2) STUN LTA2->LTA1|
          |<------------------|
          |(3) STUN RESP LTA2 |
          |------------------>|
          |(4) 200 OK         |
          |LTA2               |
          |<------------------|
          |(5) ACK            |
          |------------------>|
          |(6) STUN LTA1->LTA2|
          |------------------>|
          |(7) STUN RESP LTA1 |
          |<------------------|
          |(8) RTP LTA1->LTA2 |
          |------------------>|
          |(9) RTP LTA2->LTA1 |
          |<------------------|


The caller, A, has a single interface IP1 (192.0.2.1). As a result, it allocates a local transport address LTA1 for RTP. It has no derived transport addresses. LTA1 is placed into the SDP. The SDP also indicates that a STUN server is listening on LTA1, and a username and password are provided to access the server. The INVITE (message 1) is sent to B. It would look like, in part:

INVITE sip:B@example.com SIP/2.0
Content-Length: ....
Content-Type: application/sdp

v=0
o=alice 2890844730 2890844731 IN IP4 host.example.com
s=
c=IN IP4 192.0.2.1
t=0 0
m=audio 54344 RTP/AVP 0
a=mid:1
a=qvalue:1.0
a=stun:user 9kksj==

When B receives this, it determines its transport addresses on which it can receive media. It has only one interface as well, so it constructs a single local transport address LTA2 on which to receive RTP. While the phone is ringing, B sends a STUN request to LTA1, using the username and password from the SDP. This request is sent from LTA2. The STUN response is returned to B. The response contains a MAPPED-ADDRESS of LTA2, identical to the local transport address the request was sent from. As a result, no additional derived transport addresses are available. However, B now knows it has connectivity to A. When the callee answers, the 200 OK (message 4) gets sent with its only transport address, LTA2. A will also send a STUN request to LTA2 (message 6) using the username and password provided by B. Like B, it will discover that there are no additional derived transport addresses available. However, A has learned that it has connectivity to B. Each will then send each other media.
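
The comparison each client performs on a STUN response can be sketched as follows. The function name derived_address is hypothetical, and this Python fragment only illustrates the rule stated above.

```python
from typing import Optional, Tuple

Addr = Tuple[str, int]  # (IP address, port)


def derived_address(local: Addr, mapped: Addr) -> Optional[Addr]:
    """Return a new derived transport address, or None if there is none.

    local is the transport address the STUN request was sent from, and
    mapped is the MAPPED-ADDRESS carried in the STUN response.
    """
    if mapped == local:
        # No translation happened on the path, so nothing new was learned
        # beyond the fact that the peer is reachable.
        return None
    return mapped
```

In the public Internet case both clients obtain None, since the MAPPED-ADDRESS equals the local transport address. Receipt of the response still confirms connectivity.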

9.2 Disconnected Enterprise

In this scenario, A and B are both within the enterprise. However, they work in different departments. Both departments are connected to the company backbone. However, B's department has a NAT between it and the company backbone, with B on the private side. There are no public STUN or TURN servers to use.

          A                       NAT                       B
          |(1) INVITE              |                        |
          |LTA1                    |                        |
          |------------------------------------------------>|
          |(2) STUN LTA2->LTA1     |                        |
          |<------------------------------------------------|
          |(3) STUN RESP PDTA2     |                        |
          |------------------------------------------------>|
          |(4) 200 OK              |                        |
          |PDTA2                   |                        |
          |<------------------------------------------------|
          |(5) ACK                 |                        |
          |------------------------------------------------>|
          |(6) STUN LTA1->PDTA2    |                        |
          |------------------------------------------------>|
          |(7) STUN RESP LTA1      |                        |
          |<------------------------------------------------|
          |(8) UPDATE PDTA2, LTA2  |                        |
          |<------------------------------------------------|
          |(9) 200 OK LTA1         |                        |
          |------------------------------------------------>|
          |(10) STUN LTA1->LTA2    |                        |
          |----------------------->|                        |
          |(11) RTP LTA1->PDTA2    |                        |
          |------------------------------------------------>|
          |(12) RTP LTA2->LTA1     |                        |
          |<------------------------------------------------|


The call flow for this case is shown in Figure 6. As in the previous case, A starts with a single local transport address, LTA1, which it places into the INVITE (message 1). The username and password for STUN access are provided. This message is identical to the INVITE shown above for the public Internet case. This request is received by B (recall our assumption that end-to-end SIP connectivity always exists; a proxy would typically be used to provide this, but it is omitted from the flow for clarity).

B's phone rings. In parallel with that, B sends a STUN request from its local transport address, LTA2, to LTA1. This STUN request (message 2) traverses the NAT, which translates the source IP address from LTA2 to Peer Discovered Transport Address 2 (PDTA2). The STUN response (message 3) is propagated back to B. From it, B has learned two things. First, it has a point of connectivity to A. Secondly, it now has another transport address, PDTA2. The relative preference of this transport address is equal to the relative preference of the address providing the STUN service - LTA1, which was listed with q-value of 1.0. Thus, PDTA2 has a q-value of 1.0, as does LTA2. Since there is only one m line in the offer, the answer can contain only one m line. Therefore, B will need to choose its highest preference transport address, and an UPDATE is used later to provide the other one. Given equal q-values, a derived transport address is preferred over a local one. As a result, the 200 OK contains PDTA2 as the transport address. The stun username and password are provided. The SDP also indicates that PDTA2 was derived by stunning off of LTA1 (which had a MID of 1). The 200 OK would look like, in part:

SIP/2.0 200 OK
Content-Length: ....
Content-Type: application/sdp

v=0
o=bob 280744730 28977631 IN IP4 host2.example.com
s=
c=IN IP4 192.0.2.2
t=0 0
m=audio 6886 RTP/AVP 0
a=mid:1
a=qvalue:1.0
a=stun:user asd8866
a=derived:1
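
B's choice of PDTA2 for this answer follows from the preference rule: the highest q-value wins, and for equal q-values a derived transport address is preferred over a local one. A minimal Python sketch of that rule follows. The Candidate type and the function name are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    qvalue: float
    derived: bool  # True for a STUN derived transport address


def choose_for_answer(candidates: List[Candidate]) -> Candidate:
    """Pick the single transport address to place in a one m-line answer."""
    if not candidates:
        raise ValueError("no transport addresses available")
    # Sort key: q-value first, then derived (True) ahead of local (False).
    return max(candidates, key=lambda c: (c.qvalue, c.derived))
```

Given LTA2 and PDTA2, both with a q-value of 1.0, choose_for_answer() returns PDTA2, which is the address carried in the 200 OK.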

A then attempts to verify connectivity to B. It does so by sending a STUN request (message 6) to PDTA2 (192.0.2.2:6886). Because this address is derived from LTA1 (indicated by the derived attribute), A has to send the STUN request from LTA1, which it does. This request is received by the NAT. Even if this NAT is symmetric, a binding already exists for this message. The destination address is translated from PDTA2 to LTA2, and is forwarded to B. B responds. The response (message 7) contains a MAPPED-ADDRESS of LTA1, the source transport address where the STUN request came from. A now knows two things. It knows it has connectivity to B through the derived address, and it knows that it has no additional derived transport addresses to offer.

B, however, has not offered A all of its transport addresses. It generates an UPDATE request. The SDP in this request contains two m-lines, both of which are alternates, using the ALT group semantics. This UPDATE would look like, in part:

UPDATE sip:A@host.example.com SIP/2.0
Content-Length: ....
Content-Type: application/sdp

v=0
o=bob 280744730 28977631 IN IP4 host2.example.com
s=
t=0 0
a=group:ALT 1 2
m=audio 6886 RTP/AVP 0
c=IN IP4 192.0.2.2
a=mid:1
a=qvalue:1.0
a=stun:user asd8866
a=derived:1
m=audio 22334 RTP/AVP 0
c=IN IP4 10.0.1.2
a=mid:2
a=qvalue:1.0
a=stun:user asd8866

This SDP adds LTA2 (10.0.1.2:22334) as another alternate. This UPDATE is received by A, which generates an answer. Since A has fewer transport addresses than B (just one, as opposed to two from B), A replicates the LTA1 information twice in its answer (message 9). Next, A tries to determine connectivity to LTA2. Since this is a private address, it cannot be reached, and the STUN message (message 10) is dropped at the NAT.
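
To illustrate how A might extract the alternates from such an SDP before testing connectivity to each one, the following Python sketch collects the m-lines named in the ALT group. The function name parse_alternates and the dictionary layout are hypothetical. The parser handles only the lines that appear in the examples in this document and is not a general SDP parser.

```python
from typing import Dict, List, Optional


def parse_alternates(sdp: str) -> List[Dict[str, object]]:
    """Return one entry per m-line that belongs to the ALT group."""
    session_ip: Optional[str] = None
    alt_mids: Optional[List[str]] = None
    media: List[Dict[str, object]] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        current = media[-1] if media else None
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt>; inherit the session-level address.
            media.append({"ip": session_ip, "port": int(line.split()[1]),
                          "mid": None, "qvalue": None, "derived": None})
        elif line.startswith("c="):
            ip = line.split()[2]  # c=<nettype> <addrtype> <address>
            if current is None:
                session_ip = ip
            else:
                current["ip"] = ip
        elif line.startswith("a=group:ALT"):
            alt_mids = line.split()[1:]
        elif current is not None and line.startswith("a=mid:"):
            current["mid"] = line[len("a=mid:"):]
        elif current is not None and line.startswith("a=qvalue:"):
            current["qvalue"] = float(line[len("a=qvalue:"):])
        elif current is not None and line.startswith("a=derived:"):
            current["derived"] = line[len("a=derived:"):]
    if alt_mids is None:
        return media
    return [m for m in media if m["mid"] in alt_mids]
```

Applied to the UPDATE above, this yields two entries: 192.0.2.2:6886 with MID 1, marked as derived from MID 1, and 10.0.1.2:22334 with MID 2 and no derived tag.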

The good news is that both users have verified connectivity to each other. Each uses the highest priority point of connectivity. This means that A sends to PDTA2 (sending from LTA1), and B sends to LTA1.




10. Security Considerations

Security considerations are discussed throughout the document. [[Editors Note: Need to summarize here anyway.]].




11. IANA Considerations

This specification registers a new precondition type and two new SDP attributes.

11.1 Precondition Type

Name: connectivity
Description: Confirm end-to-end connectivity with STUN.

11.2 SDP Attributes

This specification defines two new media attributes: stun and derived. Their syntax is defined in SDP Extensions for STUN.




12. IAB Considerations

The IAB has studied the problem of "Unilateral Self Address Fixing", which is the general process by which a client attempts to determine its address in another realm on the other side of a NAT through a collaborative protocol reflection mechanism [8]. ICE is an example of a protocol that performs this type of function. Interestingly, the process for ICE is not unilateral, but bilateral, and the difference has a significant impact on the issues raised by the IAB. The IAB has mandated that any protocols developed for this purpose document a specific set of considerations. This section meets those requirements.

12.1 Problem Definition

From RFC 3424, any UNSAF proposal must provide:

Precise definition of a specific, limited-scope problem that is to be solved with the UNSAF proposal. A short term fix should not be generalized to solve other problems; this is why "short term fixes usually aren't".

The specific problems being solved by ICE are:

Provide a means for two peers to determine the set of transport addresses which can be used for communication.
Provide a means for resolving many of the limitations of other UNSAF mechanisms by wrapping them in an additional layer of processing (the ICE methodology).
Provide a means for a client to determine an address that is reachable by another peer with which it wishes to communicate.

12.2 Exit Strategy

From RFC 3424, any UNSAF proposal must provide:

Description of an exit strategy/transition plan. The better short term fixes are the ones that will naturally see less and less use as the appropriate technology is deployed.

ICE itself doesn't easily get phased out. However, it is useful even in a globally connected Internet, to serve as a means for detecting whether a router failure has temporarily disrupted connectivity, for example. What ICE does do is help phase out other UNSAF mechanisms. ICE effectively selects amongst those mechanisms, prioritizing ones that are better, and deprioritizing ones that are worse. Local IPv6 addresses are always the most preferred. As NATs begin to dissipate as IPv6 is introduced, derived transport addresses from other UNSAF mechanisms simply never get used, because higher priority connectivity exists. Therefore, the servers get used less and less, and can eventually be removed when their usage goes to zero.

Indeed, ICE can assist in the transition from IPv4 to IPv6. It can be used to determine whether to use IPv6 or IPv4 when two dual-stack hosts communicate with SIP (IPv6 gets used). It can also allow a client in a v6 island to communicate with a v4 host on the other side of a 6to4 NAT, by allowing the v6 host to address-fix against the v4 host, and in the process, obtain a v4 address which can be handed to the v4 client.

12.3 Brittleness Introduced by ICE

From RFC 3424, any UNSAF proposal must provide:

Discussion of specific issues that may render systems more "brittle". For example, approaches that involve using data at multiple network layers create more dependencies, increase debugging challenges, and make it harder to transition.

ICE actually removes brittleness from existing UNSAF mechanisms. In particular, STUN has several points of brittleness. One of them is the discovery process which requires a client to try and classify the type of NAT it is behind. This process is error-prone. With ICE, that mechanism is simply not used. Rather than unilaterally assessing the value of the address, its value is dynamically determined by measuring connectivity to a peer. The process of determining connectivity is very robust. The only potential problem is that bilaterally fixed addresses through STUN can expire if traffic does not keep them alive. However, that is substantially less brittleness than the STUN discovery mechanisms.

Another point of brittleness in STUN, TURN, and any other unilateral mechanism is its reliance on an additional server. ICE can work peer-to-peer. Therefore, it will allow two clients to communicate through NATs if it is at all possible in the absence of an UNSAF protocol.

Another point of brittleness in STUN is that it assumes that the STUN server is on the public Internet. Interestingly, with ICE, that is not necessary. There can be a multitude of STUN servers in a variety of address realms. ICE will discover the one that has provided a usable address.

The most troubling point of brittleness in STUN is that it doesn't work in all network topologies. In cases where there is a shared NAT between each client and the STUN server, STUN may not work. With ICE, that restriction can be lifted.

STUN also introduces some security considerations. Unfortunately, since ICE still uses network resident STUN servers, those security considerations still exist.

12.4 Requirements for a Long Term Solution

From RFC 3424, any UNSAF proposal must provide:

Identify requirements for longer term, sound technical solutions -- contribute to the process of finding the right longer term solution.

Our conclusions from STUN remain unchanged. However, we feel ICE actually helps because we believe it can be part of the long term solution.

12.5 Issues with Existing NAPT Boxes

From RFC 3424, any UNSAF proposal must provide:

Discussion of the impact of the noted practical issues with existing, deployed NA[P]Ts and experience reports.

A number of NAT boxes are now being deployed into the market which try and provide "generic" ALG functionality. These generic ALGs hunt for IP addresses, either in text or binary form within a packet, and rewrite them if they match a binding. This will interfere with proper operation of any UNSAF mechanism, including ICE.




Informative References

[1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.
[2] Srisuresh, P., Kuthan, J., Rosenberg, J., Molitor, A. and A. Rayhan, "Middlebox communication architecture and framework", RFC 3303, August 2002.
[3] Borella, M., Lo, J., Grabelsky, D. and G. Montenegro, "Realm Specific IP: Framework", RFC 3102, October 2001.
[4] Borella, M., Grabelsky, D., Lo, J. and K. Taniguchi, "Realm Specific IP: Protocol Specification", RFC 3103, October 2001.
[5] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.
[6] Rosenberg, J., "The Session Initiation Protocol (SIP) UPDATE Method", RFC 3311, October 2002.
[7] Camarillo, G., Marshall, W. and J. Rosenberg, "Integration of Resource Management and Session Initiation Protocol (SIP)", RFC 3312, October 2002.
[8] Daigle, L. and IAB, "IAB Considerations for UNilateral Self-Address Fixing (UNSAF) Across Network Address Translation", RFC 3424, November 2002.
[9] Camarillo, G., Eriksson, G., Holler, J. and H. Schulzrinne, "Grouping of Media Lines in the Session Description Protocol (SDP)", RFC 3388, December 2002.
[10] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996.
[11] Rosenberg, J., Schulzrinne, H. and J. Weinberger, "An Extension to the Session Initiation Protocol (SIP) for Symmetric Response Routing", draft-ietf-sip-symmetric-response-00 (work in progress), September 2002.
[12] Mahy, R., "Requirements for Connection Reuse in the Session Initiation Protocol (SIP)", draft-ietf-sipping-connect-reuse-reqs-00 (work in progress), October 2002.
[13] Rosenberg, J., Huitema, C., Mahy, R. and J. Weinberger, "STUN - Simple Traversal of UDP Through Network Address Translators", draft-ietf-midcom-stun-05 (work in progress), December 2002.
[14] Rosenberg, J., "Traversal Using Relay NAT (TURN)", draft-rosenberg-midcom-turn-00 (work in progress), November 2001.
[15] Shore, M., "The TIST (Topology-Insensitive Service Traversal) Protocol", draft-shore-tist-prot-00 (work in progress), May 2002.
[16] Yon, D., "Connection-Oriented Media Transport in SDP", draft-ietf-mmusic-sdp-comedia-04 (work in progress), July 2002.
[17] Huitema, C., "RTCP attribute in SDP", draft-ietf-mmusic-sdp4nat-03 (work in progress), November 2002.
[18] Rosenberg, J., Mahy, R. and S. Sen, "NAT and Firewall Scenarios and Solutions for SIP", draft-ietf-sipping-nat-scenarios-00 (work in progress), June 2002.
[19] Rosenberg, J. and G. Camarillo, "The Alternative Semantics for the Session Description Protocol Grouping Framework", draft-camarillo-mmusic-alt-00 (work in progress), February 2003.
[20] Audet, F., Aoun, C., Sen, S. and F. Meijer, "Identifying Intra-realm Calls using STUN", draft-sen-sipping-intrarealm-stun-00 (work in progress), September 2002.
[21] Huitema, C., "Teredo: Tunneling IPv6 over UDP through NATs", draft-ietf-ngtrans-shipworm-08 (work in progress), September 2002.



Author's Address

  Jonathan Rosenberg
  dynamicsoft
  72 Eagle Rock Avenue
  East Hanover, NJ 07936
  US
Phone:  +1 973 952-5000
EMail:  jdrosen@dynamicsoft.com
URI:  http://www.jdrosen.net/


