Draft: draft-ietf-sipping-sbc-funcs-00.txt
Reviewer: Spencer Dawkins [spencer@mcsr-labs.org]
Review Date: Tuesday 12/12/2006 5:40 PM CST
Review Deadline: 12/20/2006
Status: WGLC

Summary: This draft is on the right track, but has some issues that should be fixed before forwarding to the IESG.

Comments: (as follows - technical comments are "Spencer:", editorial comments are "Spencer: Nit:")

Abstract

This documents describes functions implemented in Session Initiation Protocol (SIP) intermediaries known as Session Border Controllers (SBCs). Although the goal of this document is to describe all the

Spencer: Nit - since describing all the functions of an undefined component in the SIP reference architecture is probably unrealistic, is this perhaps "commonly provided functions"?

functions of SBCs, a special focus is given to those practices that are viewed to be in conflict with SIP architectural principles. It also explores the underlying requirements of network operators that have led to the use of these functions and practices in order to identify protocol requirements and determine whether those requirements are satisfied by existing specifications or additional standards work is required.

1. Introduction

In the past few years there has been a rapid adoption of Session Initiation Protocol (SIP) [1] and deployment of SIP-based communications networks. This has often out-paced the development

Spencer: Nit: s/out-paced/outpaced/

and implementation of protocol specifications to meet network operator requirements. This has led to the development of proprietary solutions. Often these proprietary solutions are implemented in network intermediaries known in the marketplace as Session Border Controllers (SBCs) because they typically are deployed at the border between two networks. The reason for this is that network policies are typically enforced at the edge of the network.

Even though many SBCs currently break things like end-to-end security

Spencer: "things like" seems awfully vague for a technical specification - the "Architectural Issues" subsections probably constitute a canonical list of breakages, but it would be nice to have a canonical list gathered in one place.

and can impact feature negotiations, there is clearly a market for them. Network operators need many of the features current SBCs provide and many times there are no standard mechanisms available to provide them in a better way.

This document describes the most common functions of current SBCs and the reasons that network operators require them. It also describes the architectural issues with these functions. Although this document focuses on functions common to SBCs, many of the issues raised apply to other types of Back-to-Back User Agents (B2BUAs.)

2. Background on SBCs

SBCs usually sit between two service provider networks in a peering environment, or between an access network and a backbone network to provide service to residential and/or enterprise customers. They provide a variety of functions to enable or enhance session-based multi-media services (e.g., Voice over IP). These functions include:

a) perimeter defense (access control, topology hiding, DoS prevention, and detection);
b) functionality not available in the endpoints (NAT traversal, protocol interworking or repair); and
c) network management (traffic monitoring, shaping, and QoS).

Some of these functions may also get integrated into other SIP elements (like pre-paid platforms, 3GPP P-CSCF, 3GPP I-CSCF etc).

Spencer: Nit - It would be great to have some reference for these acronyms, but please expand them, at a minimum.

2.1. Peering Scenario

Spencer: Sections 2.1 and 2.2 seem to conflate architecture and functions in the description text - is there any reason to think that functions are only deployed in one of the two scenarios?
If not, I'd drop the function lists from these sections, because they aren't explained until Section 3 and only distract from the architectural description here.

A typical peering scenario involves two network operators who exchange traffic with each other. For example, in a toll bypass application, a gateway in operator A's network sends an INVITE that is routed to the softswitch (proxy) in operator B's network. The proxy responds with a redirect (3xx) message back to the originating gateway that points to the appropriate terminating gateway in operator B's network. The originating gateway then sends the INVITE to the terminating gateway. Figure 2 illustrates the peering arrangement with a SBC where Operator A is the outer network, and Operator B is the inner network.

Spencer: in Section 2.1, I'd end this section here (based on previous comment).

Operator B uses the SBC to control access to its network, protect its gateways and softswitches from unauthorized use and DoS attacks, and monitor the signaling and media traffic. It also simplifies network management by minimizing the number ACL (Access Control List) entries in the gateways. The gateways do not need to be exposed to the peer network, and they can restrict access (both media and signaling) to the SBCs. The SBC helps ensure that only media from sessions the SBC authorizes will reach the gateway.

3.1.2. Architectural Issues

This functionality is based on a hop-by-hop trust model as opposed to an end-to-end trust model. The messages are modified without subscriber consent and could potentially modify or remove information about the user's privacy, security requirements and higher layer applications which are communicating end-to-end using SIP. Either user in an end-to-end call may perceive this as a Man In The Middle (MitM) attack.

Spencer: this seems understated.
The text in Section 3.2.2 seems better: "user agents do not have any way to distinguish the SBC actions from an attack by a MitM (Man-in-the-Middle)."

Modification of IP addresses in Unifor Resource Indetifiers (URIs)

Spencer: Nit: s/Unifor Resource Indetifiers/Uniform Resource Identifiers/

within SIP headers can lead to application failures if these URIs are communicated to other SIP servers outside the current dialog. These URIs could appear in a REFER request or in the body of NOTIFY request as part of an event package. If these messages traverse the same SBC, it has the opportunity to restore the original IP address. On the other hand, if the REFER or NOTIFY message returns to the original network through a different SBC that does not have access to the address mapping, the recipient of the message will not see the original address. This may cause the application function to fail.
[[Comment.1: Do we have a sane example of where this is a real problem? It sounds somewhat contrived to me, but I agree it is a theoretical concern - Alan.]][[Comment.2: I personally would like to include this text, although it might be more of a theoretical concern. - Jani]]

Spencer: You guys are the experts, but if the SBC is acting as a B2BUA and manages to NOT be on the path for responses, something sounds REALLY broken to me... especially if the SBC inserts a Record-Route with its own SIP URI, as shown in the example.

Like a regular proxy server that inserts a Record-Route entry, the SBC handles every single message of a given SIP dialog. If the SBC loses state (e.g., the SBC restarts for some reason), it may not be able to route messages properly. For example, if the SBC removes "Via" entries from a request and then restarts losing state, the SBC may not be able to route responses to that request; depending on the information that was lost when the SBC restarted.
[[Comment.3: There are techniques to mitigate this problem, not all SBCs suffer from this.
Is this worth capturing in the text? [Alan]]][[Comment.4: No, not all suffer from this, but some do, so I believe we shouldn't remove this text. - Jani]]

Spencer: agree with Jani here, since the text says "may", so it's describing a possible problem, not a problem with all SBCs.

This is only one example of topology hiding, in some cases, SBCs may modify other headers, including the Contact header field values.

Spencer: Is there a canonical list? Perhaps saying "as described in Sections 4.1 and 5 of [2]" would give more of a clue...

3.2. Media Traffic Shaping

3.2.1. General Information and Requirements

Since the media path is independent of the signaling path, the media may not traverse through the operator's network unless the SBC modifies the session description. By modifying the session

Spencer: seems slightly backwards - suggest "Since the media path is independent of the signaling path, the SBC must modify the session description to ensure that the media traverses through the operator's network"?

description the SBC can force the media to be sent through a media relay which may be co-located with the SBC.

Some operators do not want to reshape the traffic, but only to monitor it for collecting statistics and making sure that they are able to meet any business service level agreements with their subscribers and/or partners. The protocol techniques needed for monitoring media traffic are the same as for reshaping media traffic.

Spencer: I'm not sure I'm getting this one. You're saying that the operator redirects the media to be sent through a media relay in order to monitor it, but this isn't traffic shaping? I viewed "monitor" as much less intrusive than this...

SBCs on the media path are also capable of dealing with the "lost BYE" issue if either endpoint dies in the middle of the session.
The SBC can detect that the media has stopped flowing and issue a BYE to the both sides to cleanup any state in other intermediate elements

Spencer: Nit: s/the both/both/

and the endpoints.

3.2.3. Example

One problem with media traffic shaping is that the SBC needs to understand the session description protocol and all extensions used by the user agents. This means that in order to use a new extension

Spencer: this "One problem" is applicable to more than just Section 3.2 - for example, SBCs may need to understand all SIP headers in use in order to perform topology hiding. This is one of several broader problems.

(e.g., an extension to implement a new service) or a new session description protocol, SBCs in the network may need to be upgraded in conjunction with the endpoints. Certain extensions that do not require active manipulation of the session descriptors to facilitate traffic shaping will be able to be deployed without upgrading existing SBCs, depending on the degree of transparency the SBC implementation affords. In cases requiring an SBC modification to support the new protocol features, the rate of service deployment may be affected.
[[Comment.5: I do not think this will slow down innovation; innovation is a distinct phase of development and separable from operational network deployment. -Alan]][[Comment.6: I don't quite get what you are suggesting. If you want to change the text, go ahead. - Jani]]

3.3. Fixing Capability Mismatches

3.3.2. Architectural Issues

SBCs fixing capability mismatches insert a media element in the media path using the procedures described in Section 3.2. Therefore, these SBCs have the same concerns as SBCs performing traffic shaping: the SBC modifies SIP messages without explicit consent from any of the user agents. This may break end-to-end security and application extensions negotiation.
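As a rough illustration of the Section 3.2 procedure that these SBCs reuse, the sketch below rewrites a session description so that the connection address and media ports point at a media relay. It is a minimal sketch under stated assumptions, not how any particular SBC works: the helper name, relay address, and port list are hypothetical, and it assumes unicast SDP as defined in RFC 4566. A real SBC would also have to handle attributes such as a=rtcp and ICE candidates, which is the "needs to understand all extensions" problem raised in Section 3.2. Deriving the address type from the relay address shows how IP version splicing can fall out of the same rewrite.

```python
import ipaddress


def anchor_media_at_relay(sdp: str, relay_ip: str, relay_ports: list[int]) -> str:
    """Point every c= line at relay_ip and every m= line at a relay port.

    Hypothetical helper: relay_ip and relay_ports stand in for whatever the
    SBC's media relay allocates for this session. Assumes unicast c= lines
    (no multicast TTL suffix) and at least one relay port per m= line.
    """
    # Derive the SDP address type from the relay address, so an IPv6 offer
    # can be spliced onto an IPv4 relay (or the reverse) by the same rewrite.
    addrtype = "IP6" if ipaddress.ip_address(relay_ip).version == 6 else "IP4"
    ports = iter(relay_ports)
    rewritten = []
    for line in sdp.splitlines():
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            nettype, _old_addrtype, _old_addr = line[2:].split(" ", 2)
            line = f"c={nettype} {addrtype} {relay_ip}"
        elif line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...; only the port changes.
            media, _old_port, rest = line[2:].split(" ", 2)
            line = f"m={media} {next(ports)} {rest}"
        # All other lines, including o=, pass through unchanged.
        rewritten.append(line)
    # SDP lines are CRLF-terminated (RFC 4566).
    return "\r\n".join(rewritten) + "\r\n"


offer = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP6 2001:db8::10\r\n"
    "s=-\r\n"
    "c=IN IP6 2001:db8::10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)
print(anchor_media_at_relay(offer, "192.0.2.1", [20000]))
```

Applied to this offer with relay 192.0.2.1 and port 20000, the result carries c=IN IP4 192.0.2.1 and m=audio 20000 RTP/AVP 0, while the o= line is left untouched. Even this toy rewrite modifies the message without either user agent's consent, which is the concern stated above.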
Spencer: IP version number splicing (as shown in the example) is one of the more benign mismatches; the 3GPP/PacketCable mismatch (as mentioned in 3.2.1) is less benign. This section should mention the fragility of fixing capability mismatches in the long term, if over time an increasing number of incompatibilities are built into various network elements that the SBCs must then adjust in order to allow interworking.

[[Comment.7: I have removed the network engineering concern; this is an unrealistic anti-Apple-Pie problem that could only arise through a fundamental bug in either configuration or SBC implementation. -Alan]][[Comment.8: Ok. - Jani]]

3.4. NAT Traversal

3.4.1. General Information and Requirements

NAT traversal in this instance refers to the specific message modifications required to assist a user-agent in maintaining SIP and media connectivity when there is a NAT device located between the

Spencer: "NAT traversal" means something more like ICE to me, and I may not be the only one who's confused. Wasn't this why "traversal" became "toolkit" in the recent STUN rebranding (since STUN wasn't quite "NAT traversal", either)? "Maintaining SIP-related NAT Bindings" as a section title, and elsewhere?

user-agent and the proxy/registrar and, most likely, any other user-agent.

Spencer: I am confused - why isn't this "between a user-agent and a proxy/registrar"? Even if it's correct, "most likely" seems awfully imprecise.

An SBC performing a NAT (Network Address Translator) traversal function for a user agent behind a NAT sits between the user agent and the registrar of the domain. NATs are widely deployed in various access networks today, so operators have a requirement to support it. When the registrar receives a REGISTER request from the user agent and responds with a 200 (OK) response, the SBC modifies such a response decreasing the validity of the registration (i.e., the registration expires sooner).
This forces the user agent to send a new REGISTER to refresh the registration sooner that it would have done on receiving the original response from the registrar. The REGISTER requests sent by the user agent refresh the binding of the NAT before the binding expires.

Spencer: is there any guidance you can point to on how the SBC chooses a new registration lifetime? I know this is an Informational spec, not a protocol spec, but if not, the heuristic nature of the timer probably needs to be mentioned in Section 3.4.2.

3.5. Access Control

3.5.1. General Information and Requirements

Network operators may wish to control what kind of signaling and media traffic their network carries. There is strong motivation and a requirement to do access control on the edge of an operator's network. Access control can be based on, for example, IP addresses or SIP identities.

Spencer: hmmm. Isn't this most commonly done in a wireless environment? Most of the wireless carriers I worked with were still using link-layer identifiers - might be clearer if this possibility was also mentioned ("can be based on link-layer identifiers, IP addresses, or SIP identities")...

This function can be implemented by protecting the inner network with firewalls and configuring them so that they only accept SIP traffic from the SBC. This way, all the SIP traffic entering the inner network needs to be routed though the SBC, which only routes messages from authorized parties or traffic that meets a specific policy that is expressed in the SBC administratively.

Spencer: you don't include the firewalls in Figure 12, which is fine, but it would be nice to show a picture like Figures 2/3 with the user agent, firewall, and SBC, so it's more obvious how "protecting the inner network with firewalls" actually works.

3.5.2. Architectural Issues

Since the SBC needs to handle all SIP messages, this function has scalability implications.
In addition, the SBC is a single point of failure from an architectural point of view. Although, in practice, many current SBCs have the capability to support redundant configuration, which prevents the loss of calls and/or sessions in the event of a failure on a single node.
[[Comment.11: I am tempted to remove this paragraph; this is a general architectural problem that is not truly specific to SBCs. A proxy configured into a SIP architecture that Record-Route'd requests would ALSO be a single point of failure. Provisioning a network to deal with the outage of a single element is just good design. -Alan]][[Comment.12: I agree that this is not specific only to SBCs, but is specific also to SBCs. I wouldn't like to remove this paragraph. - Jani]]

Spencer: I would prefer to see the paragraph stay.

3.5.3. Example

In this scenario, the SBC first identifies the caller, so it can determine whether or not to give signaling access for the caller. Some SBCs may rely on the proxy to authenticate the user-agent placing the call. After authentication, the SBC modifies the session descriptors in INVITE and 200 OK messages in a way that the media is going to flow through SBC itself. When the media starts flowing, the SBC can inspect whether the callee and caller use the codec(s) that they had previously agreed on.

Spencer: is not using the codec(s) you negotiated a common problem? :-(

3.6. Protocol Repair

3.6.2. Architectural Issues

In most cases, this function can be seen as being compatible with SIP architectural principles, and it does not violate the end-to-end model of SIP. The SBC repairing protocol messages behaves as a proxy server that is liberal in what it accepts and strict in what it sends.
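One way to picture "liberal in what it accepts and strict in what it sends" is a header normalizer that tolerates sloppy input and always emits a single canonical form. The Python sketch below is hypothetical and deliberately narrow. It assumes the only repairs needed are expanding the compact header names defined in RFC 3261, fixing capitalization and spacing around the colon, and unfolding continuation lines. It also silently drops lines that contain no colon at all; that rule is an arbitrary policy choice, and each such rule encodes a guess about what the sender meant.

```python
# Compact header names defined by RFC 3261 (Section 20).
COMPACT_FORMS = {
    "i": "Call-ID", "m": "Contact", "e": "Content-Encoding",
    "l": "Content-Length", "c": "Content-Type", "f": "From",
    "s": "Subject", "k": "Supported", "t": "To", "v": "Via",
}

# Canonical spellings that plain title-casing would get wrong.
SPECIAL_CASES = {
    "call-id": "Call-ID",
    "cseq": "CSeq",
    "www-authenticate": "WWW-Authenticate",
}


def canonical_name(raw_name: str) -> str:
    """Map a header name in any case or compact form to one spelling."""
    lower = raw_name.strip().lower()
    if lower in COMPACT_FORMS:
        return COMPACT_FORMS[lower]
    if lower in SPECIAL_CASES:
        return SPECIAL_CASES[lower]
    return "-".join(part.capitalize() for part in lower.split("-"))


def repair_headers(raw_headers: str) -> list[str]:
    """Accept loosely formatted header lines; emit 'Name: value' lines."""
    unfolded: list[str] = []
    for line in raw_headers.splitlines():
        if not line.strip():
            continue  # tolerate stray blank lines
        if line[0] in " \t" and unfolded:
            # Continuation line (folded LWS): join it to the previous header.
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)

    repaired = []
    for line in unfolded:
        name, colon, value = line.partition(":")
        if not colon:
            continue  # hypothetical policy: drop lines that are not headers
        repaired.append(f"{canonical_name(name)}: {value.strip()}")
    return repaired
```

For example, repair_headers("v : SIP/2.0/UDP host.example.com;branch=z9hG4bK776\r\nCSEQ:  1 INVITE\r\n") returns ["Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK776", "CSeq: 1 INVITE"].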
Spencer: this section should also point out that using SBCs to repair non-compliant implementations is a short-term solution to get Vendor A Release 3.2 talking to Vendor B Release 5.1, but if you have three or four vendors, and are slowly upgrading the devices so that you have two (or even three) releases of each vendor's software operational in your network, the complexity of the SBC implementation grows, and the chance for "false positives" that are tagged incorrectly for repair also grows. Knowing what to do with capability mismatches is a lot easier than knowing what to do with protocol repairs.

4. Derived Requirements

Spencer: I wish this list appeared earlier in the document - it's almost lost at this stage.

Some of the functions listed in this document are more SIP-unfriendly than others. This list requirements that are derived from the functions that break the principles of SIP in one way or the other.

Spencer: Nit: s/the other/another/

The derived requirements are:

Req-1: There should be a SIP-friendly way to hide network topology information. Currently this is done e.g., by stripping and replacing header fields, which is against the principles of SIP.

Spencer: I'm not sure why Req-1 is not "SIP-friendly" - B2BUAs are UAs, so what happens on each side of the SBC seems reasonable if the SBC is a B2BUA.

Req-2: There should be a SIP-friendly way to direct media traffic through intermediaries. Currently this is done e.g., by modifying session descriptors, which is against the principles of SIP.

Spencer: of the three derived requirements, Req-2 gives me the most heartburn - shouldn't it be "direct media traffic through intermediaries without user consent"?

Req-3: There should be a SIP-friendly way to fix capability mismatches in SIP messages. Currently this is done by modifying SIP messages, which violates e.g., end-to-end security.

Spencer: Req-3 seems to need scoping - the example given in this document, IPv4/IPv6 mismatch, is easily identified and obviously fine (as long as there is a need for IPv6 transition mechanisms), but the 3GPP/PacketCable mismatch isn't as easy to identify (can you *know* you need to do the repair, without manual configuration?), and this could even be morphed into a requirement to do protocol repair in a SIP-friendly way, which I have a lot of concerns about.

5. Security Considerations

Many of the functions this document describes have important security and privacy implications. If the IETF decides to develop standard mechanisms to address those functions, security and privacy-related aspects will need to be taken into consideration.
[[Comment.13: I wonder if it is worth classifying the specific type of security problems and assembling them here. The remainder of this document can then refer to the specific problem a given operational activity has given today's typically implementation mechanisms. [Alan]]][[Comment.14: My gut feeling is that it would require a lot of work. If you want to do it, go ahead, but at least I don't have the time to do it. - Jani]]

Spencer: just looking through the "architectural issues" subsections - haven't most of the obvious security considerations been called out previously in the draft? This section needs more content, but I don't think a lot of work is required to develop the content.