Internet Engineering Task Force                                   SIP WG
Internet Draft                                 J.Rosenberg,H.Schulzrinne
draft-rosenberg-sip-entfw-01.txt                 dynamicsoft,Columbia U.
March 2, 2001                                   Expires: September, 2001


   SIP Traversal through Residential and Enterprise NATs and Firewalls

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as work in progress.

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

In this draft, we discuss how SIP can traverse enterprise and residential firewalls and NATs. This environment is challenging because we assume here that the end user has little or no control over the firewall or NAT, and that the firewall or NAT is completely ignorant of SIP. Despite this, our solutions for the NAT case are very workable and suffer few disadvantages.

1 Introduction

The problem of getting applications through firewalls and NATs has received a lot of attention [1]. Getting SIP through firewalls and NATs is particularly troublesome. In a previous draft [2] we discussed some of the general issues regarding traversal of firewalls, along with some solutions. Our solutions were based on having a proxy server control the firewall/NAT with a control protocol of some sort [3].
This protocol can open and close pinholes in the firewall, and/or obtain NAT address bindings to use in rewriting the SDP in a SIP message.

The use of a control protocol in the midcom architecture is ideal for carriers, but it does not work when the SIP service provider is not the same as the ISP and transport provider of the end user. This is frequently the case for users behind enterprise firewalls and NATs who are trying to access SIP services outside of their networks.

The same happens for residential NATs and firewalls. These devices are often used by consumers who have cable modem and DSL connections, and wish to connect multiple computers using the single address provided by the cable company or DSL company (see footnote [1]). Residential firewalls and NATs are often referred to as cable/DSL routers, and are manufactured by companies like Linksys, Netopia, and Netgear.

Ultimately, it is our belief and hope that NATs will disappear with the deployment of IPv6. However, that is not likely to happen for some time. Given the existence of NATs, one way to handle SIP is to embed a SIP ALG within enterprise NATs and firewalls. However, this has not happened. The top commercial firewall and NAT products continue to be SIP-unaware. Even if SIP ALG support were added tomorrow, there is still a huge installed base of firewalls and NATs that do not understand SIP. As a result, there is going to be a long period of time, probably at least two to three years, during which users will be behind firewalls or NATs that are ignorant of SIP.

The SIP community cannot wait for ubiquitous deployment of SIP-aware firewalls and NATs. Interim solutions are needed NOW to enable SIP services to be delivered to users behind these devices.

In this draft, we propose solutions for getting SIP through enterprise and residential NATs and firewalls that do not require changes to these devices or to their configurations.
NATs and firewalls are a reality, and SIP deployment is being hampered by the lack of support for SIP ALGs in these boxes. A solution MUST be found, and we provide one here.

_________________________
 [1] The author of this draft is among those who have such a residential NAT, and thus feels highly motivated to solve this particular problem.

2 Architecture

We assume that the network architecture we are dealing with looks like Figure 1. The caller is a UA in enterprise or residence A, and the called party is a UA in enterprise or residence B. The caller uses proxy X as its local outbound proxy, which forwards the call to the proxy of the called party, Y, also outside of the firewall or NAT. The call is then forwarded to the called party within enterprise or residence B. The firewall and/or NAT (FW/NAT) boxes are off-the-shelf boxes with no support for SIP ALG.

We consider NAT and firewall separately. For NATs, we consider specifically a class of devices referred to as residential NATs. Residential NATs are typically placed in the home, and allow multiple devices to make use of a single IP address provided by a cable or DSL provider. The devices generally disallow incoming traffic, but allow outbound TCP and UDP connections. Based on the terminology defined in RFC 2663 [4], residential NATs are Network Address Port Translators (NAPT). Once a connection is established outwards, data on the same connection is allowed inwards from the remote peer. This is true for UDP as well. Specifically, if a user sends UDP packets from local IP address and port pair A,B to remote IP address and port pair C,D, they are natted to have a source address of X,Y. Packets sent from C,D to X,Y have their destination address natted to A,B, and are delivered back to the host behind the NAT. The ability to NAT UDP packets in this way is critical to our solutions.
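
As an illustration only, the UDP translation just described can be modelled with a toy NAPT. This is a sketch under stated assumptions, not a description of any real product: the class name Napt, the starting port 20000, and the choice to key each binding on both the internal and the remote endpoint (so that only the original peer C,D can send back through X,Y) are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class Napt:
    """Toy NAPT with one public address (illustrative assumption)."""

    def __init__(self, public_ip: str, first_port: int = 20000) -> None:
        self.public_ip = public_ip
        self._next_port = first_port
        # (internal A,B, remote C,D) -> public X,Y
        self._out: Dict[Tuple[Endpoint, Endpoint], Endpoint] = {}
        # (public X,Y, remote C,D) -> internal A,B
        self._in: Dict[Tuple[Endpoint, Endpoint], Endpoint] = {}

    def outbound(self, src: Endpoint, dst: Endpoint) -> Endpoint:
        """Rewrite source A,B to X,Y, creating the binding on first use."""
        key = (src, dst)
        if key not in self._out:
            public = (self.public_ip, self._next_port)
            self._next_port += 1
            self._out[key] = public
            self._in[(public, dst)] = src
        return self._out[key]

    def inbound(self, src: Endpoint, dst: Endpoint) -> Optional[Endpoint]:
        """Rewrite destination X,Y back to A,B; None means the packet is dropped."""
        return self._in.get((dst, src))


nat = Napt("77.2.3.88")
a_b = ("10.0.1.100", 5060)   # host behind the NAT
c_d = ("4.5.11.3", 5060)     # remote peer
x_y = nat.outbound(a_b, c_d)            # binding is created by the first outbound packet
reply_target = nat.inbound(c_d, x_y)    # reply from C,D is delivered to A,B
unsolicited = nat.inbound(("9.9.9.9", 5060), x_y)  # no binding, so dropped
```

The point of the sketch is the ordering constraint: nothing can reach A,B until A,B has sent a packet outward, which is why the solutions in this draft always have the party behind the NAT initiate.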
We have verified this feature on the leading residential NAT products.

Many small offices and home offices (SOHO) also use these devices to allow their business to connect to the Internet over cable or DSL. Because the device is configured identically in this case, we lump it with the residential NAT.

Enterprise firewalls are used in larger enterprises. They are typically configured with much tighter security. We assume the worst case scenario, which is that these boxes will allow users inside their enterprises to browse the web, and specifically, to browse secure web sites. UDP, both inbound and outbound, is disallowed. TCP inbound is disallowed. Outbound TCP from any host within the enterprise is allowed out only to ports 80 and 443. Our assumption is that these devices are not running NAT.

Handling enterprise devices that are both firewalls and NAPT involves combining the solutions for both cases. Wherever appropriate, we discuss any issues specific to combining the two.

                +-------+                        +-------+
                |  SIP  |                        |  SIP  |
                | Proxy |                        | Proxy |
                |   X   |                        |   Y   |
                |       |                        |       |
                +-------+                        +-------+

        +-------+                        +-------+
........|FW/NAT |............    ........|FW/NAT |............
.       |       |           .    .       |       |           .
.       +-------+           .    .       +-------+           .
.                           .    .                           .
.                           .    .                           .
.                           .    .                           .
.         +-------+         .    .         +-------+         .
.         | SIP UA|         .    .         | SIP UA|         .
.         |  Joe  |         .    .         |  Bob  |         .
.         +-------+         .    .         +-------+         .
.............................    .............................
        Enterprise or                    Enterprise or
         Residence A                      Residence B

                  Figure 1: Network Architecture

In general, getting SIP services to function behind these devices requires resolution of several problems:

   Originating Requests: Getting SIP requests from the caller, Joe, to proxy X, and responses from proxy X back to Joe.

   Receiving Requests: Getting SIP requests from proxy Y to the called party, Bob, and responses from Bob back to proxy Y.
   Handling RTP: Getting media to go from Joe to Bob and Bob to Joe.

We discuss solutions for each in turn.

3 Originating requests

The first problem is originating requests from the caller through a firewall/NAT, out to a proxy, and getting the responses from this proxy back to the caller.

3.1 NAT

The residential NAT will allow both outgoing UDP and TCP traffic to port 5060. This means that there are no problems in generating an outbound INVITE. However, there are issues with the response. SIP specifies that for UDP, the response is sent to the port number in the Via header and the IP address the request came from. However, due to NAT, the port number in the Via header will be wrong. This means that the response will not be sent to the proper location. With TCP, on the other hand, responses are sent over the connection the INVITE arrived on. This means that a response sent over the TCP connection will be received properly by a caller behind a NAT.

The simplest solution, therefore, is for the caller to use a TCP connection to send the INVITE, and receive the response. We recommend that this connection be kept open permanently, to avoid the need to establish it for new calls. A persistent connection is also needed for incoming calls in any case (see Section 4).

For devices which do not support TCP, UDP may be used. However, the proxy needs to be able to send the UDP response to the address *and* port the request arrived on. This is not standardized behavior, but could potentially be configured for requests from users that are known to be behind residential NATs.

In order for this connection to be used for re-INVITEs or BYEs, the proxy needs to record route.

3.2 Firewall

We assume the firewall (FW) blocks all outgoing UDP, but will allow some outgoing TCP. In the worst case, it will only allow outgoing HTTP traffic on port 80, and HTTPS on port 443. HTTPS is nothing more than HTTP over TLS/SSL [5].
What's interesting about HTTPS is that the connection starts out with TLS, negotiates a secure channel, and then runs HTTP over this channel. All HTTP messages are encrypted. The FW never sees any HTTP messages in the clear, only TLS/SSL messages. The important implication is that there is no way for a FW to have application layer intelligence that depends on the existence of HTTP on port 443. In fact, any protocol can be run over TLS on port 443, and it will look the same to the FW. Since we assume that the FW lets HTTPS through, it should allow SIP over TLS through, running on port 443.

Thus, our proposal is to have the caller, Joe, initiate a TLS connection on port 443 to the proxy server X. Once the TLS connection is secured, the client can send SIP messages over this connection. Handling of SIP over TLS/SSL is identical to TCP. Responses from the proxy are sent over this connection as well [6]. We recommend that the client keep the TLS connection open (more on this in Section 4). This avoids the need to re-initiate the TLS connection for every outgoing call.

Fooling the FW into believing the traffic is HTTPS by running it over port 443 is not nice. We would strongly recommend that clients first try the IANA registered port for SIP over TLS, port 5061. If no response is received over this connection, the client should then try 443.

Note that outgoing requests may work with just vanilla TCP. However, we have observed that some firewalls examine TCP connections to look for specific protocols. Thus, SIP over TCP on 5060 may not work. SIP over TCP on port 80 may also not work, as some firewalls check for HTTP messages. This is why we prefer TLS; we believe that it is most likely to work.

In order for this connection to be used for re-INVITEs or BYEs, the proxy needs to record route.

4 Receiving requests

Unfortunately, receiving requests is not as simple as sending them. We consider first the NAT case, and then the firewall case.

4.1 NAT

The problem has to do with registrations. In Figure 1, the callee, Bob, will receive requests at his UA because he had previously sent a REGISTER request to his registrar, which is co-located with proxy Y. This registration contains a Contact header which lists the address where incoming requests should be sent. However, in the case of NAT, this address will be wrong. It will contain a domain name or IP address that is within the private space of enterprise B. Thus, the REGISTER might look like:

   REGISTER sip:Y.com SIP/2.0
   From: sip:bob@Y.com
   To: sip:bob@Y.com
   Contact: sip:bob@10.0.1.100

This address is not reachable by the proxy. To solve this problem, we need two things. First, we need a persistent connection to be established from Bob to Y. Secondly, we need a way for incoming requests destined for Bob to be routed over this connection.

To address this first problem, we recommend that clients that send REGISTER requests do so over a TCP or TLS connection, as described in Section 3. Furthermore, they keep this connection open permanently. REGISTER refreshes are sent over this connection. We further recommend that the proxy/registrar hold this connection in a table, where the table is indexed by the remote side of the transport connection. When the proxy wishes to send a packet to some server at IP address M, port N, transport O, it looks up the tuple (M,N,O) in the table to see if a connection already exists, and if so, uses it.

Now, a connection is available for contacting the user. However, this connection must be associated with sip:bob@Y.com. Unfortunately, it is not. Calls for sip:bob@Y.com are translated to sip:bob@10.0.1.100, which does not correspond to the remote side of the connection used to send the REGISTER, as seen by the proxy. That's because of NAT, which will make the remote side appear to be a publicly routable address.
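
A minimal sketch of such a connection table follows, assuming a simple in-memory dictionary keyed by the (M,N,O) tuple. The class and method names are hypothetical, and the stored connection is an opaque object standing in for a real TCP or TLS socket.

```python
from typing import Dict, Optional, Tuple

ConnKey = Tuple[str, int, str]  # (remote IP M, remote port N, transport O)


class ConnectionTable:
    """Persistent connections indexed by the remote side of the connection."""

    def __init__(self) -> None:
        self._conns: Dict[ConnKey, object] = {}

    @staticmethod
    def _key(ip: str, port: int, transport: str) -> ConnKey:
        # Transport names are compared case-insensitively (an assumption).
        return (ip, port, transport.lower())

    def add(self, ip: str, port: int, transport: str, conn: object) -> None:
        self._conns[self._key(ip, port, transport)] = conn

    def lookup(self, ip: str, port: int, transport: str) -> Optional[object]:
        """Return the existing connection to (M,N,O), or None if there is none."""
        return self._conns.get(self._key(ip, port, transport))

    def remove(self, ip: str, port: int, transport: str) -> None:
        self._conns.pop(self._key(ip, port, transport), None)


table = ConnectionTable()
conn = object()  # stands in for the socket Bob's REGISTER arrived on
table.add("77.2.3.88", 2937, "TCP", conn)       # remote side as seen through the NAT
found = table.lookup("77.2.3.88", 2937, "tcp")  # the stored connection
missing = table.lookup("10.0.1.100", 5060, "tcp")  # private Contact address: no entry
```

The last lookup shows the mismatch described above: the private address from the Contact header is never a key in the table, because the proxy only ever sees the natted address.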

To handle this problem, the proxy could, in principle, record the IP address and port from the remote side of the connection used to send a REGISTER. Then, it can create a Contact entry of the form sip:bob@[ip-addr]:[port], where [ip-addr] and [port] are the IP address and port of the remote side of the connection. However, this assumes that the registration is for the purpose of connecting the address in the To field with the machine the connection is coming from. That may not be the intent of the registration. The registration may be used to set up a call forwarding service, for example.

As a result, it is our proposal that clients be allowed to explicitly ask a proxy to create a Contact entry corresponding to the machine a REGISTER is sent from. We propose that a specific contact hostname value be reserved to have the meaning "I don't know what my address is, please use the IP address, port and transport from the connection over which this REGISTER was delivered". We propose that this host name be "jibufobutbmpu". This name is "I hate NATS a lot" with each letter incremented by one. This name is unlikely to be used in real systems (as opposed to something like "default", which could be a real host name).

Consider once more the architecture of Figure 1. The callee has an IP address of 10.0.1.100. It initiates a TCP connection to port 5060 on the proxy. This connection goes through the NAT, and the source address is rewritten to 77.2.3.88, and the port to 2937. The registration looks like:

   REGISTER sip:Y.com SIP/2.0
   From: sip:bob@Y.com
   To: sip:bob@Y.com
   Contact: sip:bob@jibufobutbmpu

The proxy Y then stores the incoming TCP connection into a table:

   (77.2.3.88,2937,TCP) -> [reference to TCP connection]

It also updates the contact list for sip:bob@Y.com to include the URL sip:bob@77.2.3.88:2937;transport=tcp.
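
The Contact rewriting step in this example can be sketched as below. This is an illustrative sketch only: the function name resolve_contact and its deliberately naive URI handling (scheme, optional user, host, optional port and parameters) are assumptions, not a full SIP URL parser.

```python
COOKIE_HOST = "jibufobutbmpu"  # reserved host name proposed in this draft


def resolve_contact(contact: str, remote_ip: str, remote_port: int,
                    transport: str) -> str:
    """Replace the Contact cookie with the connection's remote address.

    Contacts whose host is not the cookie are returned unchanged, so that
    registrations meant for other purposes (e.g. call forwarding) still work.
    """
    scheme, _, rest = contact.partition(":")
    user, at, hostpart = rest.rpartition("@")
    # Strip any URI parameters and port to isolate the host name.
    host = hostpart.split(";", 1)[0].split(":", 1)[0]
    if host.lower() != COOKIE_HOST:
        return contact
    prefix = f"{user}@" if at else ""
    return (f"{scheme}:{prefix}{remote_ip}:{remote_port}"
            f";transport={transport.lower()}")


# REGISTER arrived over TCP from 77.2.3.88 port 2937 (the natted address).
stored = resolve_contact("sip:bob@jibufobutbmpu", "77.2.3.88", 2937, "TCP")
untouched = resolve_contact("sip:bob@10.0.1.100", "77.2.3.88", 2937, "TCP")
```

Because the stored contact is built from the same address, port and transport that index the connection table, a later lookup for that contact finds the connection the REGISTER arrived on.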

Now, when an INVITE arrives for sip:bob@Y.com, it is looked up in the registration database. The contact is extracted, and the proxy tries to send the request to that address. To do so, it checks its connection table for an open connection to the IP address, port and transport where the request is destined. In this case, such a connection is available, and the request is forwarded over it. The response from the callee is also routed over the same connection.

In order for this connection to be used for re-INVITEs or BYEs, the proxy needs to record route.

4.2 Firewalls

The situation is somewhat simpler for the case of firewalls. We still need to have a persistent connection established from Bob out to the proxy, possibly using TLS over port 443. A registration is then sent over this connection, which will look like:

   REGISTER sip:Y.com SIP/2.0
   From: sip:bob@Y.com
   To: sip:bob@Y.com
   Contact: sip:bob@44.2.4.1;transport=tcp

For this to work, incoming calls for sip:bob@Y.com must be routed over the connection established by Bob to proxy Y. We assume the proxy maintains persistent connections in a table, indexed by remote address, port, and transport (as described above for NAT). In order for this connection to be used when contacting Bob, Bob's contact address must be the same as the connection address. This means that the remote connection address, as seen by Y, has to be 44.2.4.1:5060. However, there are several cases where it might not be.

In what cases would it not be? First off, the client might be multi-homed. Multi-homed hosts are increasingly common as VPNs become more pervasive. VPNs show up as virtual interfaces, making hosts multi-homed. The client may not be able to correctly guess which interface the REGISTER will be sent on. If the client guesses incorrectly, the IP address in the Contact header may be on a different interface than the one used to send the registration.

The second case when the connection address and contact address don't match is when the client incorrectly discovers its own IP address, even when singly homed. We have observed this to frequently be the case. In fact, we have seen some systems report back 127.0.0.1 (the loopback address) as their IP address. Thus, even without NAT, the Contact address may not match the source address of the TLS or TCP connection used to register. In fact, this problem has nothing to do with NATs or firewalls. We have observed it happening in many real world scenarios.

As a result, it is our recommendation that, as a general rule, clients use the "Contact cookie" and a persistent connection in order to ensure that they are reachable. This solution works for firewalls, NATs, multi-homed hosts, singly homed hosts, and a variety of other cases.

Storing incoming connections in a table for later reuse is useful even between proxies. If TCP or TLS is used between proxies X and Y, that connection can be stored by both X and Y, and thus reused for messaging in either direction. It is for this reason that we separate the connection table management from the registration processing. Such table management is needed if one of the proxies is on the inside of the firewall, for example. In that case, responses and requests in the reverse direction would need to be forwarded over the connection initiated by the proxy.

5 Handling RTP

Dealing with SIP was the easy part. Getting the media through a NAT or firewall is more complex. RTP is on dynamic ports, peer-to-peer, and UDP, all of which are problematic for NATs, firewalls, or both.

Our solution is to use connection-oriented media, over UDP, TCP, or TLS, with the entities behind NATs or firewalls initiating the connection. This is discussed in more detail below.

5.1 NATs

The trick to getting RTP through a NAT is to make sure it exhibits two characteristics.
First, any users behind a NAT have to send the first packet to establish a NAT binding. Secondly, media sent back to that user must be sent to the source address and port where the media came from. In other words, if Joe calls Bob, and only Joe is behind a NAT, Joe must send the first UDP packet to Bob. Let's say Joe sends from IP address and port pair A,B to Bob at public address and port C,D. The NAT will translate A,B to X,Y. Bob receives the media. To talk to Joe, it is essential that Bob send his media with source C,D to destination X,Y. This will be received by the NAT, and have the destination translated to A,B, where it is sent to Joe.

Unfortunately, RTP does not work this way. When used with SIP, a conversation between Joe and Bob will result in two RTP sessions, one from Joe to the address Bob provided in his SDP, and one from Bob to the address provided by Joe in his SDP. This will not work with NAT.

5.1.1 Bi-Directional RTP

Our solution is simple: we define bi-directional RTP. Bi-directional RTP runs over UDP. Like TCP, one side initiates a connection to the other side. As a result, one side is active (initiates the connection), and the other side is passive (waits for the connection). Like TCP, data in the reverse direction is sent to the port where the connection came from. Unlike TCP, a bi-directional RTP connection is created when the first packet arrives; there is no explicit handshake or setup. There are no retransmissions or changes to the RTP protocol operation. The only difference is that bidirectional RTP involves sending media on the same socket used to receive it.

An example flow using bidirectional media is shown in Figure 2. Joe calls Bob. Assume for this flow that Joe is behind a NAT, and Bob is not. For simplicity's sake, we don't show proxies, and don't show much of the SIP detail.
Joe indicates, in his SDP in the INVITE, that he is capable of bi-directional RTP, and wishes to be the active side of the connection (more on this later). Bob receives the INVITE, and responds with a 200 OK. His SDP indicates that he can be the passive side, and he provides the IP address and port to connect to. When Joe receives the 200 OK, an ACK is sent. Then, Joe sends an RTP packet to the IP address and port provided by Bob. The RTP packet passes through the NAT, and has its source address rewritten. When Bob receives this packet, the connection is established. Bob now has the IP address and port to send media back to. This address/port is the one from the source address of the RTP packet Bob just received (which has been natted). Bob sends media to this address. Those packets have their destination address natted, translated back to the address Joe used to send the first packet.

In traditional unidirectional RTP, Joe would have included an IP address and port in the INVITE, and Bob would have sent media to this address, rather than the one in the RTP packet received from Joe. This does not work through NAT, since this address is wrong, and since no NAT binding has been established. Bidirectional RTP does not suffer this problem; note how Joe does not actually need to provide an IP address in the SDP in his INVITE.

The call flow when Bob is behind the NAT is very similar, and is shown in Figure 3. Instead of Joe being the active side of the connection, Bob is the active side. It is important to note that the role of active or passive for the RTP connection is not tied to who makes the call.

As a result, when only one of the participants is behind a NAT, a direct UDP connection can be used between them. When both are behind NATs, an RTP translator is needed. This is described in Section 5.1.3.

5.1.2 Signaling Support

SDP extensions are needed to allow the signaling discussed above to take place.
Specifically, extensions are needed to indicate that a media stream is bidirectional RTP, and to allow each side to indicate that they are active, passive, or can play either role.

As it turns out, this is exactly the kind of signaling provided in the SDP extensions for TCP media [7]. That draft only handles TCP and TLS, but the semantics for TCP are identical to bidirectional UDP. Therefore, we propose that a new keyword, BAVP, be used to signal that the RTP is bidirectional. The direction attribute and the exchange procedures defined in [7] work as described for BAVP.

Revisiting the flow in Figure 2, the SDP in the INVITE would actually appear as:

   c=IN IP4 10.0.1.1/127
   m=audio 9 RTP/BAVP 0
   a=direction:active

and in the 200 OK as:

   c=IN IP4 4.5.11.3/127
   m=audio 4444 RTP/BAVP 0
   a=direction:passive

5.1.3 Both parties behind NAT

The approach described above works if (1) only one of the two parties is behind a NAT, and (2) the party behind a NAT knows they are behind a NAT.

To handle the cases where these conditions do not hold, we introduce functionality into the proxies. The proxies can detect, by inspecting components of the messages, which parties are behind NATs. They can rewrite SDP in order to ensure that those parties behind NATs are active. Furthermore, when both are behind a NAT, the proxies can bring an RTP translator into the call. RTP translators can be thought of as RTP routers; they receive RTP packets on a particular incoming port, and send them out on a different port/address. When both parties are behind a NAT, the proxies will rewrite the SDP so that both sides initiate outward connections to the RTP translator. The RTP translator then hands packets back and forth between the connections.

We show these boxes incorporated into the architecture in Figure 4. Only one translator is needed per call.
Our architecture will only result in usage of the box when both parties are behind NATs, which is the only case when one is needed. Our solution will result in the invocation of RTP forwarding services by the domain of the called party.

   |                  |                              |
   |------------------------------------------------>|
   |                  |  INV sip:bob@Y.com           |
   |                  |  active                      |
   |                  |                              |
   |<------------------------------------------------|
   |                  |  200 OK                      |
   |                  |  passive                     |
   |                  |  4.5.11.3:4444               |
   |                  |                              |
   |------------------------------------------------>|
   |                  |  ACK                         |
   |                  |                              |
   |                  |  RTP from Joe to Bob         |
   |----------------->|----------------------------->|
   |S:10.0.1.1:12     |S:7.1.1.1:227                 |
   |D:4.5.11.3:4444   |D:4.5.11.3:4444               |
   |                  |                              |
   |                  |  RTP from Bob to Joe         |
   |<-----------------|<-----------------------------|
   |S:4.5.11.3:4444   |S:4.5.11.3:4444               |
   |D:10.0.1.1:12     |D:7.1.1.1:227                 |
   |                  |                              |
  Joe                NAT                            Bob

               Figure 2: Bi-directional RTP Flow

   |                  |                              |
   |------------------------------------------------>|
   |                  |  INV sip:bob@Y.com           |
   |                  |  either                      |
   |                  |  7.1.1.1:88                  |
   |                  |                              |
   |<------------------------------------------------|
   |                  |  200 OK                      |
   |                  |  active                      |
   |                  |                              |
   |------------------------------------------------>|
   |                  |  ACK                         |
   |                  |                              |
   |                  |  RTP from Bob to Joe         |
   |<-----------------|<-----------------------------|
   |S:4.5.11.3:654    |S:10.0.1.1:44                 |
   |D:7.1.1.1:88      |D:7.1.1.1:88                  |
   |                  |                              |
   |                  |  RTP from Joe to Bob         |
   |----------------->|----------------------------->|
   |S:7.1.1.1:88      |S:7.1.1.1:88                  |
   |D:4.5.11.3:654    |D:10.0.1.1:44                 |
   |                  |                              |
  Joe                NAT                            Bob

       Figure 3: Bi-directional RTP Flow, NAT role reversed

The basic idea behind the solution is this.
User agents must be able to initiate or terminate bidirectional RTP connections. The calling side always indicates support for both. When a proxy for a user in some domain receives a call (either to or from that user), that proxy accepts the responsibility for setting the direction attribute in the SDP in such a way that the client will be able to successfully handle media.

Consider first proxy X, representing Joe. When Joe makes an outgoing call, Joe's UA will set the direction attribute in the SDP to "both" and include the IP address and port Joe is prepared to receive media on. This INVITE is sent to proxy X. Proxy X determines if Joe is behind a NAT. This can be done either through configuration (when the user signs up, they indicate whether they are behind a NAT or not), or through packet inspection. If the source address of the INVITE does not match the address and port in the Via header (especially if the ports don't match), Joe is behind a NAT. If Joe is behind a NAT, proxy X knows that Joe cannot accept incoming connections. Thus, Joe cannot actually play both roles; he must be active. Proxy X therefore rewrites the SDP to indicate a direction of active. If, for some reason, Joe's UA had set the SDP to indicate either active or passive, this can be taken as an indicator that Joe knows he is (active) or is not (passive) behind a NAT, in which case no action is needed by the proxy.

When the call arrives at proxy Y, proxy Y first determines the call routing. If it discovers that the call is to be routed to the called party's machine (which it knows based on whether the user registered with the Contact cookie), and it determines that the called party is behind a NAT (based on the source address of the REGISTER compared to the address in the top Via header of the REGISTER), the proxy may need to modify the SDP.
If the SDP in the incoming INVITE indicates a direction of both, it is changed to passive (this way, the called party initiates the connection). If the direction is passive, nothing is done.

If the SDP in the incoming INVITE indicates a direction of active, there is a problem. Both parties are only capable of initiating active connections. To handle this, proxy Y needs to involve an RTP translator. It allocates a pair of address/port pairs, A and B, from the translator. It rewrites the SDP in the INVITE to indicate a direction of passive, and sets the IP address and port pair to A. This will ensure that the called party initiates an RTP connection out to the translator. Similarly, in the SDP in the response, the direction (which will be active) is rewritten to passive, and the IP address and port pair is set to B. This will ensure that the calling party initiates an RTP connection out to the translator. The proxy then tells the translator that packets received on A should be relayed to the connection on B, and vice versa.

                +-------+                        +-------+
                |  SIP  |                        |  SIP  |
                | Proxy |                        | Proxy |
                |   X   |                        |   Y   |
                |       |                        |       |
                +-------+                        +-------+

                               ----
                              /RTP \
                             | Forw.|
                              \    /
                               ----

        +-------+                        +-------+
........|FW/NAT |............    ........|FW/NAT |............
.       |       |           .    .       |       |           .
.       +-------+           .    .       +-------+           .
.                           .    .                           .
.                           .    .                           .
.                           .    .                           .
.         +-------+         .    .         +-------+         .
.         |  Joe  |         .    .         |  Bob  |         .
.         | SIP UA|         .    .         | SIP UA|         .
.         +-------+         .    .         +-------+         .
.............................    .............................
        Enterprise A                     Enterprise B

                    Figure 4: RTP Translators

The actions at the proxies for incoming and outgoing calls are summarized in Table 1.
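
The rules that Table 1 summarizes can also be written as a small decision function. This is a sketch under stated assumptions: the function name and argument names are hypothetical, the rules are applied only when the proxy's own user is behind a NAT (as in the walk-through of the four cases), and allocating the translator's address/port pairs A and B is left out.

```python
from typing import Tuple


def rewrite_direction(call: str, direction: str,
                      behind_nat: bool) -> Tuple[str, bool]:
    """Return (direction to place in the SDP, whether an RTP translator is needed).

    call is "incoming" or "outgoing" relative to the proxy's own user;
    direction is the SDP direction attribute: "both", "active" or "passive".
    """
    if direction not in ("both", "active", "passive"):
        raise ValueError(f"unknown direction: {direction}")
    if call not in ("incoming", "outgoing"):
        raise ValueError(f"unknown call direction: {call}")
    if not behind_nat:
        return direction, False          # user is reachable; leave the SDP alone
    if call == "outgoing":
        # A natted caller can only initiate, so "both" collapses to "active".
        return ("active" if direction == "both" else direction), False
    if direction == "both":
        return "passive", False          # callee behind NAT will answer "active"
    if direction == "active":
        return "passive", True           # both sides must initiate: use translator
    return direction, False              # "passive": nothing to do


# Fourth case: both Joe and Bob are behind NATs.
at_x = rewrite_direction("outgoing", "both", behind_nat=True)
at_y = rewrite_direction("incoming", at_x[0], behind_nat=True)
```

Chaining the two calls mirrors the path of the INVITE through proxy X and then proxy Y; only the combination of a natted caller and a natted callee asks for a translator.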

   Call Direction   SDP direction   rewrite to   note
   --------------   -------------   ----------   ------------------------
   Incoming         both            passive
   Incoming         active          passive      introduce RTP translator
   Incoming         passive         -
   Outgoing         both            active
   Outgoing         active          -
   Outgoing         passive         -

                 Table 1: Rules for SDP Rewriting

Based on these rules, we can analyze the four cases. In case one, neither party is behind a NAT. The caller indicates a direction of "both" in the SDP. The local outbound proxy does not change that, since it detects that the caller is not behind a NAT. The call is forwarded to the proxy for the called party. It doesn't modify the SDP either, and forwards the call to the called party. In its response, the called party indicates that it can support a direction of "both". When the response is delivered to the calling party, both sides initiate bidirectional RTP connections to each other. One of them is chosen, and is used for media.

In the second case, the caller is behind a NAT, but the called party is not. The caller indicates a direction of "both" in the SDP. The local outbound proxy detects that the caller is behind a NAT. It therefore modifies the SDP to indicate a direction of "active". The call is forwarded to the proxy for the called party. It determines that the called party is not behind a NAT. So, it leaves the SDP alone. The called party sees that the caller requested the active side of the connection. So, in the 200 OK response, the called party indicates passive. This 200 OK is forwarded back to the caller. The caller initiates a bidirectional RTP connection to the called party, which succeeds. The media is sent over that connection.

In the third case, the caller is not behind a NAT, but the called party is. The caller indicates a direction of "both" in the SDP. The local outbound proxy does not change that, since it detects that the caller is not behind a NAT. The call is forwarded to the proxy for the called party.
This proxy determines that the called party is behind a NAT. It rewrites the direction tag in the SDP in the INVITE from "both" to "passive". This is received at the called party. It has no choice but to respond with a direction of "active" in its 200 OK. This is forwarded to the calling party. The called party then initiates a bidirectional RTP connection to the caller, which succeeds. The media is sent over that connection.

In the fourth and worst-case scenario, both are behind NATs. The caller indicates a direction of "both" in the SDP. The local outbound proxy detects that the caller is behind a NAT. It therefore modifies the SDP to indicate a direction of "active". The call is forwarded to the proxy for the called party. This proxy also detects that the called party is behind a NAT. However, the SDP indicates a direction of "active", which is a problem, since neither party can accept an incoming connection. The proxy then brings in an RTP translator, and rewrites the direction to be "passive". It also sets the c line and m line to contain address/port pair A of the translator. This INVITE is received at the called party. It has no choice but to respond with a direction of "active" in its 200 OK. The 200 OK is received at the proxy, where it rewrites the direction tag from "active" to "passive". It also sets the c line and m line to contain address/port pair B of the translator. This 200 OK is received at the calling party. Both sides then initiate outbound connections. The caller sends RTP to address/port B, and the callee sends RTP to address/port A. The translator exchanges media between these two connections.

Either the proxy or the RTP translator can manage the lifecycle of the connection binding. If the proxy does it, the proxy must record-route. When the call is over (known through the BYE), the proxy destroys the connections and connection bindings at the translator. If the RTP translator manages the lifecycle, the proxy need not ever record-route or maintain call state.
When the call is over, the caller and callee both disconnect their RTP connections to the translator (this is done with an RTCP BYE). When both connections disconnect, the translator can destroy the bindings.

In cases where there is no RTP translator available, and both parties are behind a NAT, media cannot flow. In some cases, this will be detectable by the called party or their proxy (if the incoming SDP has bidirectional media with a direction of active, the called party is behind a NAT, and no translator is available). In this case, the called party or proxy responds with a 488 Not Acceptable Here, and includes a Warning header indicating a code 308 - NAT Traversal Failure.

5.2 Firewalls

Because firewalls restrict connections to outbound only, the same problem that plagues NATs also plagues firewalls. The same solution as described above can also solve it, with a few minor tweaks.

The solution in Section 5.1 is defined for UDP. UDP will not work through firewalls. Therefore, RTP over TCP or TLS is used instead. In the worst case, the RTP would need to be carried over a TLS connection on port 443. Besides this difference, the solution for firewalls is the same as described for NATs. Note that since SIP may be over TLS to port 443 as well, the proxy and the RTP translator should not be on the same IP address.

6 Caveats

There are many caveats with our proposed solutions, especially for firewalls.

6.1 NAT Solutions

o  RTP translators are horrible. The author spent much time arguing against such devices, on the grounds that the underlying IP network already provides routing capabilities, and that these do not need to be replicated at the voice transport layer. They will increase overall voice latency, introduce another point of failure, and incur additional costs to providers.
However, they are unavoidable given that the fundamental semantic of the IP address, that it is a globally reachable point for communications, has been violated by NATs. Perhaps this argument can be rephrased as, "unreliable and delayed communication beats no communication."

o  If the RTP translator is not co-resident with the proxy, some kind of control protocol is needed to allocate addresses and to establish bindings. No such protocol exists right now. The midcom protocol [3] or MGCP [8] might be used for this purpose. We expect these translators to be bundled with proxies, and thus to make use of proprietary protocols initially.

o  It is possible that both caller and called party are behind a NAT, but are behind *the same* NAT. In this case, no RTP translator is needed. In theory, this case can be hard to detect, but in practice, it can frequently be determined administratively. As an example, a SIP provider might be providing Centrex-type services to users in a network behind a NAT. The proxy providing these services will know which users belong to the same enterprise, and it can modify its behavior accordingly. Even if the proxy is wrong, the worst case is that an RTP translator is involved, increasing voice latency.

o  If the calling party is behind a NAT, an RTP connection cannot be established until the 200 OK is returned to the caller. This means that the post-pickup delay increases by an RTT, which introduces additional clipping. This can be solved through early media. The SDP is returned in a 183, allowing the media connection to be established before the 200 OK.

o  The use of persistent TCP or TLS connections for SIP between the user agents and their proxies makes clustering more complex. With traditional UDP, a call for some user, say Bob, could arrive at any proxy that has access to the location service, and that proxy could route the call to Bob. Not so any longer.
With persistent connections, the users are partitioned across the proxies in a cluster.

6.2 Firewall Solutions

o  Riding on top of port 443 for SIP over TLS goes against the principles of the guidelines established by the IESG [9].

o  TLS or TCP will result in very bad voice delays as soon as the packet loss is nonzero. Interestingly, with zero packet loss, the delays for voice over TCP will be equal to those of voice over UDP. Clients will need adaptive voice buffer algorithms that can tolerate wide swings in latency.

o  Current SIP client implementations do not require a TCP stack. The firewall solution will require TCP and/or TLS.

o  For firewalls, our approach requires a TLS server process (to receive RTP) embedded within a SIP-enabled communications client. This will require a public/private key pair and its associated certificate, available to the client, issued by a Certification Authority (CA) that is known to the other party. Similarly, use of a TLS client will require that the client be configured with the keys of a set of well-known CAs.

The need for TCP and/or TLS support in softphones can be mitigated by deploying UDP to TCP/TLS translation proxies inside of the firewall.

7 Security Considerations

RTP translators are effectively man-in-the-middle systems. As a result, a rogue proxy and RTP translator can listen in on the media of all users initiating calls through it. To prevent this, clients initiating TLS connections to a server should verify that the server name in the SDP is a subdomain of the name presented in the certificate. Furthermore, the client should only connect to servers whose domains are subdomains of their service provider, or the provider of the other party in the call.

8 Conclusion

In this draft, we have proposed some modifications to SIP operation which allow it to successfully pass through NATs and firewalls. We believe our NAT solution is very workable.
It has minimal impact on clients, allows voice to run over UDP, and uses direct UDP transport in all but the worst case. Our solutions for firewalls are less palatable. The ideal solution is for firewall administrators to allow SIP (over TCP on 5060 or TLS on 5061) out through the firewall, and to eventually deploy ALGs, preferably using the midcom architecture.

We believe that solving the firewall and NAT problems is critical for deployment of SIP.

9 Acknowledgements

We would like to thank Jeffrey Citron and John Butz from Vonage for their efforts at verifying UDP NAT capabilities in existing commercial products.

10 Authors' Addresses

Jonathan Rosenberg
dynamicsoft
72 Eagle Rock Avenue
First Floor
East Hanover, NJ 07936
email: jdrosen@dynamicsoft.com

Henning Schulzrinne
Columbia University
M/S 0401
1214 Amsterdam Ave.
New York, NY 10027-7003
email: schulzrinne@cs.columbia.edu

11 Bibliography

[1] M. Holdrege and P. Srisuresh, "Protocol complications with the IP network address translator (NAT)," Internet Draft, Internet Engineering Task Force, Oct. 2000. Work in progress.

[2] J. Rosenberg, D. Drew, and H. Schulzrinne, "Getting SIP through firewalls and NATs," Internet Draft, Internet Engineering Task Force, Feb. 2000. Work in progress.

[3] P. Srisuresh, J. Kuthan, and J. Rosenberg, "Middlebox communication architecture and framework," Internet Draft, Internet Engineering Task Force, Feb. 2001. Work in progress.

[4] P. Srisuresh and M. Holdrege, "IP network address translator (NAT) terminology and considerations," Request for Comments 2663, Internet Engineering Task Force, Aug. 1999.

[5] E. Rescorla, "HTTP over TLS," Request for Comments 2818, Internet Engineering Task Force, May 2000.

[6] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP: session initiation protocol," Request for Comments 2543, Internet Engineering Task Force, Mar. 1999.

[7] D. Yon, "TCP-Based media transport in SDP," Internet Draft, Internet Engineering Task Force, Nov. 2000. Work in progress.

[8] M. Arango, A. Dugan, I. Elliott, C. Huitema, and S. Pickett, "Media gateway control protocol (MGCP) version 1.0," Request for Comments 2705, Internet Engineering Task Force, Oct. 1999.

[9] K. Moore, "On the use of HTTP as a substrate for other protocols," Internet Draft, Internet Engineering Task Force, Oct. 2000. Work in progress.