Draft: draft-ietf-sipping-service-identification-00.txt
Reviewer: Spencer Dawkins [spencer@mcsr-labs.org]
Review Date: 11/2/2007
Review Deadline: 10/31/2007
Status: RAI review

Summary: This draft is on the right track. I have more nit-level comments than anything else.

I also wanted to thank you for putting this together. It's one of the most helpful things anyone has done lately to improve interaction with other SDOs ("we're too busy telling other SDOs something is a bad idea to write a draft explaining why"), and it's very clear for a 00 draft.

I have a couple of additional topics at the end of Section 6.

/Spencer:/ for substantive comments, /Spencer (nit):/ for nits.

Comments:
---------

Abstract

This document considers the problem of service identification in the Session Initiation Protocol (SIP). Service identification is the process of determining the user-level use case that is driving the signaling being utilized by the user agent. While seemingly simple, this process is quite complex, and when not addressed properly, can lead to fraud, interoperability problems, and stifling of innovation.

Spencer: is this ordering (fraud, interoperability problems, and stifling of innovation) the right one?

This document discusses these problems and makes recommendations on how to address them.

Spencer: It might be nice to give some indication of WHY the process is complex...

1. Introduction

This breadth of applicability is SIPs greatest asset, but it also

Spencer (nit): s/SIPs/SIP's/

introduces numerous challenges. One of these is that, when an endpoint generates a SIP INVITE for a session, or receives one, that session can potentially be within the context of any number of different use cases and endpoint types. For example, a SIP INVITE with a single audio stream could represent a Push-To-Talk session

Spencer (nit): got a reference for Push-to-Talk?

between mobile devices, a VoIP session between softphones, or audio-based access to stored content on a server.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [1].

Spencer: I'm questioning the use of NOT RECOMMENDED in the security section (I don't think it is really 2119 language), and if that's changed, there are no other uses of 2119 language in this document.

3. Services and Service Identification

The problem of identifying services within SIP is not a new one. The problem has been considered extensively in the context of presence. In particular, the presence data model for SIP [3] defines the concept of a service as one of the core notions that presence describes. Services are described in Section 3.3 of RFC 4479, which has this to say on the topic:

3.3. Service

Spencer (extreme nit): it's probably good to drop the "3.3", since you've already given the detailed reference in the previous paragraph, and it's misleading at first glance. Yeah, this insertion is indented. Yeah, I was still confused.

4.1. IPTV vs. Multimedia

IP Television (IPTV) is the usage of IP networks to access traditional television content, such as movies and shows. SIP can be utilized to establish a session to a media server in a network, which then serves up multimedia content and streams it as an audio and video stream towards the client. Whether SIP is ideal for IPTV is, in itself, a good question. However, such a discussion is outside the scope of this document.

Spencer (nit): the "whether/however" text in this paragraph could reasonably be moved to Section 4, because it's applicable to multiple example services...

Consider multimedia conferencing. The user accesses a voice and video conference at a conference server. The user might join in listen-only mode, in which case the user receives audio and video streams, but does not send.

4.3. Configuration vs. Pager Messaging

However, MESSAGE is sometimes used for the delivery of content to a device for other purposes. For example, some providers use it to deliver configuration updates, such as new phone settings or parameters, or to indicate that a new version of firmware is available. Though not designed for this purpose, MESSAGE gets used since, in existing wireless networks, SMS are used for this purpose,

Spencer (nit): "SMS" is singular - s/SMS are/SMS is/, which is also more correct...

and MESSAGE is the SIP equivalent of SMS.

5. Using Service Identification

It is important to understand what the service identity would be utilized for, if known. The discussions in Section 4 give some hints to the possible usages. Here, we explicitly discuss them.

Spencer: I may be having pronoun problems ("them"?), but if this paragraph says "hints to the possible usages of service identity", I am confused - I didn't see anything that looked like that. Maybe "The description of possible services in Section 4 provides a basis for discussing how service identity could be used in this section"?

5.2. Application Invocation in the Network

Another usage of a service identifier would be to cause servers in the SIP network to provide additional processing, based on the service. For example, an INVITE issued by a user agent for IPTV would pass through a server that does some kind of content rights management, authorizing whether the user is allowed to access that content. On the other hand, an INVITE issued by a user for multimedia conferencing would pass through a server providing "traditional" telephony features, such as outbound call screening and call recording. It would make no sense for the INVITE associated with IPTV to have outbound call screening and call recording applied, and it would make no sense for the multimedia conferencing INVITE to be processed by the content rights management server.
Indeed, in these cases, its not just an efficiency issue (invoking servers when

Spencer (nit): s/its/it's/

not needed), but rather, truly incorrect behavior can occur. For example, if an outbound call screening application is set to block outbound calls to everything except for the phone numbers of friends and family, an IPTV request that gets processed by such a server would be blocked (as its not targeted to the AOR of a friend or

Spencer (nit): s/its/it's/

family member). This would block a user's attempt to access IPTV services, when that was not the goal at all.

5.5. Accounting and Billing

Service authorization and accounting/billing go hand in hand. Presumably, one of the primary reasons for authorizing that a user

Spencer (nit): I think you can drop the "presumably"... :-)

can utilize a service is that they are being billed differently based on the type of service. Consequently, one of the goals of a service identity is to be able to include it in accounting records, so that the appropriate billing model can be applied.

5.6. Negotiation of Service

As an example, s user can do both the game and the voice chat service of Section 4.2. They initiate a session to a target AOR, but the devices used by that user can only support voice chat. Consequently,

Spencer: this wasn't quite clear enough to me - s/Consequently/The called device responds based on its service capability and/?

voice chat gets utilized for the session.

6. Key Principles of Service Identification

In this section, we describe some of the key principles of performing service identification.

Spencer: this is totally presentation-layer, but it would help me to "get it" if you collected the principles scattered throughout this section, and put them together in one list (probably at this point in the document). I'm saying this because I think they are really important, and they would "stand alone", even if the explanatory text is included in subsequent subsections. Think "numbered list"...

6.1. Services are a By-Product of Signaling

This is ultimately an expression of the principle of DWIM vs. DWIS (Do-What-I-Mean vs. Do-What-I-Say). Explicit signaling is DWIS - the user is asking for a service by invoking the signaling that results in the desired effect. A service identifier is DWIM - an unspecific request for something that is ill-defined and non-interoperable.

Spencer: this is slightly harsher than I understand the situation to be - perhaps "A service identifier is DWIM - the user is asking for a service but is including information that isn't necessary to result in the desired effect, and there is no direct connection between the service identifier that the user provides and the service the user receives"?

6.2. Perils of Explicit Identifiers

Clearly, if the signaling message itself contains enough information to identify the service, inclusion of an extra field to say the same thing is going to be redundant. Redundancy by itself is not a big deal. However, redundancy can lead to other,more significant problems.

Spencer (nit): s/other,more/other, more/

6.2.1. Fraud

First and foremost, it can lead to fraud. If a provider uses the

Spencer (nit): s/it/the use of a service identifier/

service identifier for billing and accounting purposes, or for authorization purposes, it opens an avenue for attack. The user can construct the signaling message so that its actual effect (which is the service the user will receive), is what the user desires, but the service identity (which is what is used for billing and authorization) doesn't match, and indicates a cheaper service, or one

Spencer: this text doesn't seem quite right - "the signaling and service identity (which is what is used for billing and authorization) don't match, and the service identifier indicates ..."?

that the user is authorized to receive. If, however, the service identity used by the domain admistrator is derived from the signaling itself, the user cannot lie.
If they did lie, they wouldn't get the desired service.

Consider the example of IPTV vs. multimedia conferencing. If multimedia conferencing is cheaper, the user could send an INVITE for an IPTV session, but include a service identifier which indicates multimedia conferencing. They get the service associated with IPTV,

Spencer (nit): s/They get/The user gets/

but at the cost of multimedia conferencing.

This same principle shows up in other places. For example, in the identification of an emergency services call [6]. It is desirable to give emergency services calls special treatment, such as being free, authorized even when the user cannot otherwise make calls, and to give them priority. If emergency calls where indicated through something other than the target of the call being an emergency services URN [7], it would open an avenue for fraud. The user could place any desired URI in the request-URI, and indicate that the call is an emergency services call. This could would then get special treatment, but of course get routed to the target URI.

The only way to prevent this fraud is to consider an emergency call as any call whose target is an emergency services URN. Thus, the service identification here is based on the target of the request. When the target is an emergency services URN, the request can get special treatment. The user cannot lie, since there is no way to separately indicate this is an emergency call, besides targeting it to an emergency URN.

Spencer: somewhere around here, you might add something like "if the network operator must verify the service identity using signaling, the service identifier provides no additional value beyond the signaling."

6.2.2. Systematic Interoperability Failures

How can inclusion of an explicit service identifier cause loss of interoperability?
When such an identifier is used to drive functionality - such as dispatch on the phones, in the network, or QoS authorization, it means that the wrong thing can happen when this field is not set properly.

Consider a user in domain 1, calling a user in domain 2. Domain 1 provides the user with a service they call "voice chat", which utilizes voice and IM for real time conversation, driven off of a buddy list application on a PC. Domain 2 provides their users with a service they call, "text telephony", which is a voice service on a wireless device that also allows the user to send text messages.

Consider the case where domain 1 and domain 2 both have their user agents insert a service identifiers into the request, and then use that to derive QoS authorization, accounting, and invocation of applications in the network and in the device. The user in domain 1 calls the user in domain 2, and inserts the identifier "Voice Chat" into the INVITE. When this arrives at the proxy in domain 2, the service is unknown. Consequently, the request does not get the proper QoS treatment. When it gets delivered to the User Agent of the user in domain 2, the user agent does not see a service it understands, and so consequently, does not

Spencer: this is probably clear to everyone but me ... but I'm confused between "does not get the proper QoS treatment" and "does not see a service it understands" - are you talking about two different failures here? If your point is "I told you to do GROMMET service and you don't know what that is, so you fail the call, but if you had ignored what I told you and looked at the signaling, the call would have succeeded", maybe dropping the proxy/QoS treatment side trip would be clearer.

know to dispatch the request to the right application software. Thus, this call has completely failed, even when it could have succeeded.

This illustrates the following key point:

...
Usage of explicit service identifiers in the request will result in inconsistencies with results of any SIP negotiation that might

Spencer: s/with/between the service identifier and the/

otherwise be applied in the session.

Of course, there are cases where negotiating to a common baseline is not what is desired. SIP provides tools (such as Require), to force the call to fail unless the desired capabilities are supported. However, this is not recommended as a general rule [4].

Spencer: again, totally presentation-layer, but this paragraph isn't particularly helpful to me - the section would be clearer if the paragraph was deleted completely.

6.2.3. Stifling of Service Innovation

The probability that any two pair of service providers end up with the same set of services, and give them the same names, becomes decreasingly small as the number of providers grow. Indeed, it would almost certainly require a centralized authority to identify what the services are, how they work, and what they are named.

This, in turn, leads to a requirement for complete homogeneity in order to facilitate interconnection. Two providers cannot usefully interconnect unless they agree on the set of services they are offering to their customers, and each do the same thing. This is, in

Spencer: well, if I've followed you to this point, there's no requirement for each provider to do the same thing, right? because there's no connection between what you say you're asking for and what you ask for in the signaling, soooo... If I tell you "I'm invoking the GROMMET service" and you say "cool, dude, I'll ignore that completely and read the signaling", everything works fine, right? You just have to agree you won't choke when I say "I'm invoking the GROMMET service". :-)

a very real sense, anathema to the entire notion of SIP, which is built on the idea that heterogeneous domains can interconnect and still get interoperability: Metcalfe's law, when combined with explicit service identifiers, will stifle the ability of providers to develop new SIP services, since they have no hope of interconnecting them with anyone else.

Spencer: slightly overstated, perhaps? "since they cannot interconnect new SIP services without the agreement of other providers", and maybe even "other, competing providers"?

Normally, this service would be backwards compatible with a regular audio-video endpoint, which would just reject the third media stream. However, because a large network has been deployed that is expecting to see the token, "multimedia conversation" and its associated audio+video service, it is nearly impossible for the new provider to roll out this new service. If they did, it would fail completely, or partially fail, when their users call users in other provider domains.

Spencer: I'm thinking this document is missing something I've heard a lot of discussion about - the idea that there are various "levels" of service, and we don't have any clue how to mix-and-match service identifiers. If you have an audio+video+avatar service and a push-to-talk service, how do you offer an audio+video+avatar+push-to-talk service? We know how to do this based on signaling (conceptually, when PTT signaling actually reflects what happens in PTT), but how do we tell the other end what to do using service identifiers?

Spencer: you may not want to go into too much more detail about a fundamentally bad idea, but there is also a problem with the service identifier namespace - flat? hierarchical? If 3GPP has a "push-to-talk" and Nintendo has a "push-to-talk" capability for gamers that doesn't work the same way, what happens then? If I have "3GPP/push-to-talk" and you have "Nintendo/push-to-talk" and they DO work the same way, can we communicate?

8. Security Considerations

Oftentimes, the service associated with a request is utilized for purposes such as authorization, accounting, and billing. When service identification is not done properly, the possibility of

Spencer: perhaps clearer if "possibility of unauthorized service use and"?

network fraud is introduced. It is for this reason, discussed extensively in Section 6.2.1, that the usage of explicit service identifiers inserted by a UA is NOT RECOMMENDED.

Spencer: NOT RECOMMENDED here does not look like 2119 text to me - it looks like caps-for-emphasis.

9. IANA Considerations

There are no IANA considerations associated with this specification.

Spencer: "IANA is requested to drop silently all requests for a service identifier registry" :-)
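
Illustration for the Section 6.2.1 emergency-call argument (a minimal sketch, not taken from the draft): the code below contrasts trusting a UA-supplied service identifier with deriving the treatment from the request target. The Request type, the claimed_service field, and the function names are hypothetical and exist only for this example. It also assumes that emergency service URNs are "urn:service:sos" or a dotted sub-service of it.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: emergency service URNs are "urn:service:sos" or a dotted
# sub-service of it (for example "urn:service:sos.fire").
EMERGENCY_URN = "urn:service:sos"


@dataclass(frozen=True)
class Request:
    """Hypothetical, stripped-down view of a SIP request."""
    request_uri: str
    claimed_service: Optional[str] = None  # hypothetical UA-inserted identifier


def targets_emergency_urn(req: Request) -> bool:
    """True when the request target itself is an emergency service URN."""
    uri = req.request_uri.lower()
    return uri == EMERGENCY_URN or uri.startswith(EMERGENCY_URN + ".")


def treatment_from_claim(req: Request) -> str:
    """Fraud-prone: trusts whatever the UA says the service is."""
    return "emergency" if req.claimed_service == "emergency" else "normal"


def treatment_from_signaling(req: Request) -> str:
    """Derives the treatment from the request target and ignores the claim."""
    return "emergency" if targets_emergency_urn(req) else "normal"


# A request that claims to be an emergency call but targets an ordinary URI,
# and a request whose target really is an emergency service URN.
spoofed = Request("sip:friend@example.com", claimed_service="emergency")
genuine = Request("urn:service:sos.fire")

print(treatment_from_claim(spoofed))      # the claim is trusted
print(treatment_from_signaling(spoofed))  # the claim is ignored
print(treatment_from_signaling(genuine))
```

In this sketch the claim-based function would hand special treatment to a call routed to an arbitrary URI, while the signaling-based function cannot be fooled, because the only way to get emergency treatment is to target the emergency URN.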