                                                                R. Mahy
Internet Draft                                            Cisco Systems
Document: draft-mahy-sip-cc-models-00.txt                      Jul 2001
Expires: Jan, 2002


                     A Call Control Model for SIP

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [RFC2026].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

1. Abstract

This document defines an abstract call model for describing the media relationships required for call control features in SIP, and discusses other issues related to SIP call control as part of the SIP Call Control Framework.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [RFC2119].

3. Overview

This document defines an abstract call model for describing the changes in media relationships (actions) which are needed to fulfill most call control features in SIP. The call model is an integral part of the SIP Call Control Framework defined in [cc-framework]. The model and actions described here are specifically chosen to be independent of the SIP signaling and/or mixing approach chosen to actually set up the media relationships.
Implementations may set up the media relationships described in this model using the approach described in [3pcc]. The 3pcc approach relies on only the following 3 primitive operations:

   new INVITE
   reINVITE (modify the session, hold, resume from hold, etc.)
   BYE

The main advantage of the 3pcc approach is that it requires only very basic SIP support from end systems to support call control features. It also has the advantage and disadvantage that new features can/must be implemented in one place only (the controller), and it neither requires enhanced client functionality nor takes advantage of it.

In addition, a peer-to-peer approach is discussed at length in this draft. The primary drawback of the peer-to-peer model is additional end system complexity. The benefits of the peer-to-peer model include:

- state remains at the edges
- signaling is the same whether mixing is performed by one of the participants, or by a central conference server
- calls only go through the participants involved (no additional points of failure)
- does not reproduce the MGCP/Megaco command/control trust model
- less complex to set up end-to-end QoS and security
- shorter setup time (fewer messages and round trips required)
- many additional features applied at other UAs are transparent

The peer-to-peer approach relies on additional "primitive" operations, some of which are identified here:

   INVITE with [Replaces] semantics
   INVITE with Join semantics
   INVITE or [REFER] with [Media Forking] semantics
   [REFER] to ask another UA to send a request on your behalf
   [PHONECTL] for desktop call control

Many of the features, primitives, and actions described in this document require some type of mixing/combining/selection.

4. "Conversation Space" Model

This document introduces the concept of an abstract "conversation space" (essentially a set of participants who believe they are all communicating with one another).
Each conversation space contains one or more participants.

Participants are SIP User Agents which send original media to, or terminate and receive media from, other members of the conversation space. Logically, every participant in the conversation space has access to all the media generated in that space (this is true if all participants share a common media type). A SIP User Agent which merely forwards, transcodes, mixes, or selects media originating elsewhere in the conversation space is NOT a participant.

[Note that a conversation space consists of one or more SIP calls or SIP conferences. A conversation space is similar to the definition of a "call" in some other call models.]

Participants may represent human users or non-human users (referred to as robots or automatons in this document). Some participants may be hidden within a conversation space. Some examples of hidden participants include: robots which generate tones, images, or announcements during a conference to announce users arriving and departing; a human call center supervisor monitoring a conversation between a trainee and a customer; and robots which record media for training or archival purposes.

Participants may also be active or passive. Active participants are expected to be intelligent enough to leave a conversation space when they no longer desire to participate. (An attentive human participant is obviously active.) Some robotic participants (such as a voice messaging system, an instant messaging agent, or a voice dialog system) may be active participants if they can leave the conversation space when there is no human interaction. Other robots (for example, the tone generating robot from the previous example) are passive participants. A human participant "on-hold" is passive.

A conversation space can be drawn as a "bubble" or oval, or written as a "set" in curly or square brace notation.
Each set, oval, or "bubble" represents a conversation space. Hidden participants are shown in lowercase letters.

   { A , B }       [ A , B ]         .-.
                                    /   \
                                   /  A  \
                                  (       )
                                   \  B  /
                                    \   /
                                     '-'

Some examples of the relationship between conversation spaces and SIP calls, SIP call legs, and SIP sessions are listed below. In each example, a human user will perceive that there is a single call.

A simple two-party call is a single conversation space, a single call, a single session, and a single call-leg.

A locally mixed three-way call is one or two calls (one if the mixer invited all the other participants, two otherwise), two sessions, and two call-legs. It is also a single conversation space.

A simple dial-in audio conference is a single conversation space, but is represented by as many calls, call-legs, and sessions as there are human participants.

A multicast conference is a single conversation space, a single session, one or more calls, and as many call-legs as participants.

5. Catalog of call control actions and sample features

Below are listed several call control "actions" which modify the participants in a conversation space. The names of the actions listed are for descriptive purposes only (they are not normative). This list of actions is not meant to be exhaustive.

In the examples, all actions are initiated by the user "Alice" represented by UA "A".

5.1 Transfer

The conversation space changes as follows:

     before         after
   { A , B } --> { C , B }

A replaces itself with C.

To make this happen using the peer-to-peer approach, "A" would send two SIP requests.
A shorthand for those requests is shown below:

   REFER B Refer-To:C
   BYE B

To make this happen instead using the 3pcc approach, the controller sends requests represented by the shorthand below:

   INVITE C (w/SDP of B)
   reINVITE B (w/SDP of C)
   BYE A

Features enabled by this action:
- blind transfer
- transfer to a central mixer (some type of conference or forking)
- transfer to park server (park)
- transfer to music on hold or announcement server
- transfer to a "queue"
- transfer to a service (such as Voice Dialogs service)
- transition from local mixer to central mixer

5.2 Take

The conversation space changes as follows:

   { B , C } --> { B , A }

A forcibly replaces C with itself. In most uses of this primitive, A is just "un-replacing" itself.

Using the peer-to-peer approach, "A" sends:

   INVITE B Replaces:

Using the 3pcc approach (all requests sent from the controller):

   INVITE A (w/SDP of B)
   reINVITE B (w/SDP of A)
   BYE C

Features enabled by this action:
- transferee completes an attended transfer
- retrieve from central mixer (not recommended)
- retrieve from music on hold or park
- retrieve from queue
- call center take
- voice portal resuming ownership of a call it originated
- answering-machine style screening (pickup)

5.3 Add

The conversation space changes as follows:

   { A , B } --> { A, B, C }

A adds C to the conversation.

Using the peer-to-peer approach, adding a party using local mixing requires no signaling.
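
As an informal illustration, the conversation space changes given so far for Transfer (5.1), Take (5.2), and Add (5.3) can be sketched as operations on sets of participants. This sketch models only the abstract membership changes, not any SIP signaling. The draft defines no programming interface, so the function names and argument order below are hypothetical and chosen purely for illustration.

```python
# Hypothetical sketch: the draft defines no API for these actions.
# A conversation space is modeled as a frozenset of participant names.

def _require(space, member):
    """Raise if 'member' is not currently in the conversation space."""
    if member not in space:
        raise ValueError(f"{member!r} is not in {sorted(space)!r}")

def transfer(space, a, c):
    """5.1 Transfer: A replaces itself with C.  { A, B } --> { C, B }"""
    _require(space, a)
    return (space - {a}) | {c}

def take(space, a, c):
    """5.2 Take: A forcibly replaces C with itself.  { B, C } --> { B, A }"""
    _require(space, c)
    return (space - {c}) | {a}

def add(space, a, c):
    """5.3 Add: A adds C to the conversation.  { A, B } --> { A, B, C }"""
    _require(space, a)
    return space | {c}

if __name__ == "__main__":
    start = frozenset({"A", "B"})
    print(sorted(transfer(start, "A", "C")))
    print(sorted(take(frozenset({"B", "C"}), "A", "C")))
    print(sorted(add(start, "A", "C")))
```

Returning a new frozenset for each action, rather than mutating the space in place, keeps the "before" and "after" states available for comparison, which mirrors the before/after notation used in this section.
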
To transition from a 2-party call or a locally mixed conference to central mixing, A could send the following requests:

   REFER B Refer-To: mixer
   INVITE mixer
   BYE B

To add a party to a central mixer:

   REFER C Refer-To: mixer

or

   REFER mixer Refer-To: C

Using the 3pcc approach to transition to central mixing, the controller would send:

   INVITE mixer leg 1 (w/SDP of A)
   INVITE mixer leg 2 (w/SDP of B)
   INVITE C (late SDP)
   reINVITE A (w/SDP of mixer leg 1)
   reINVITE B (w/SDP of mixer leg 2)
   INVITE mixer leg 3 (w/SDP of C)

To add a party to a central mixer:

   INVITE C (late SDP)
   INVITE mixer (w/SDP of C)

Features enabled:
- standard conference feature
- call recording
- answering-machine style screening (screening)

5.4 Local Join

The conversation space changes like this:

   { A, B } , { A, C } --> { A, B, C }

or like this:

   { A, B } , { C, D } --> { A, B, C, D }

A takes two conversation spaces and joins them together into a single space.

Using the peer-to-peer approach, A can mix locally, or REFER the participants of both conversation spaces to the same central mixer (as in 5.3).

For the 3pcc approach, the call flows for inserting participants, and for joining and splitting conversation spaces, are tedious yet straightforward, so these are left as an exercise for the reader.

Features enabled:
- standard conference feature
- leaving a sidebar to rejoin a larger conference

5.5 Insert

The conversation space changes like this:

   { B , C } --> { A, B, C }

A inserts itself into a conversation space.

A proposed mechanism for signaling this using the peer-to-peer approach is to send a new header in an INVITE with "joining" semantics.
For example:

   INVITE B Join:

If B accepted the INVITE, B would accept responsibility for setting up the necessary call legs and mixing (for example, to mix locally or to transfer the participants to a central mixer).

Features enabled:
- barge-in
- call center monitoring
- call recording

5.6 Split

   { A, B, C, D } --> { A, B } , { C, D }

If using a central mixer with the peer-to-peer approach:

   REFER C Refer-To: mixer (new URI)
   REFER D Refer-To: mixer (new URI)
   BYE C
   BYE D

Features enabled:
- sidebar conversations during a larger conference

5.7 Near-fork

A participates in two conversation spaces simultaneously:

   { A, B } --> { B , [ A } , C ]

A is a participant in two conversation spaces such that A sends the same media to both spaces, and renders media from both spaces, presumably by mixing or rendering the media from both. We can define that A is the "anchor" point for both forks, each of which is a separate conversation space.

This action is purely a local implementation matter (it requires no special signaling). Local features such as switching calls between the background and foreground are possible using this media relationship.

5.8 Far fork

The conversation space diagram...

   { A, B } --> { A , [ B } , C ]

A requests B to be the "anchor" of two conversation spaces.

For an example of using 3pcc to set up media forking, see [Media forking]. The session descriptions for forking are quite complex. Controllers should verify that endpoints can handle forked media by using some type of Require header token.

Two ways to set up this media relationship using peer-to-peer call control have been proposed:
- the anchor receives a REFER with Require: forked-media (implicit)
- the anchor receives an INVITE with Fork-with header (explicit)

Features enabled:
- barge-in
- voice portal services
- whisper
- hotword detection
- sending DTMF somewhere else

6. Other Call Control Issues

6.1 Transparent feature interaction

Combinations of features must work in SIP call control. For example, let us examine the transfer of a call which is conferenced. Alice calls Bob. Alice silently "conferences in" her robotic assistant Albert as a hidden party. Bob transfers Alice to Carol. If Bob asks Alice to Replace her leg with a new one to Carol, then both Alice and Albert should be communicating with Carol (transparently).

Using the peer-to-peer model, this combination of features works fine if A is doing local mixing (Alice replaces Bob's call-leg with Carol's), or if A is using a central mixer (the mixer replaces Bob's call-leg with Carol's). A clever implementation using the 3pcc model can generate similar results.

New extensions to the SIP Call Control Framework should attempt to preserve this property.

6.2 Presenting information to the user or application

Participants should have access to the names of the other participants in a conversation space, so that this information can be rendered to a human user or processed by an automaton. Although some of this information may be available from To, From, Remote-Party-Id, or other SIP headers, another mechanism for reporting this information may be necessary. [The author believes that the data reported by RTCP is insufficient for these purposes.]

For example, a mixer involved in a conversation space may wish to provide URLs for conference status, and/or conference/floor control.

6.3 Use of different mixing models

Several conferencing models are discussed in [conf-models]. For brevity, only the two most popular conferencing models (local and centralized mixing) are discussed in any depth in this document. Applications of the conversation spaces model to distributed full mesh and multicast conferences are left as an exercise for the reader.
Note that a distributed full mesh conference can be used for basic conferences, but does not allow for more complex conferencing actions like splitting, joining, and forking.

Call control features should be designed to allow a mixer (local or centralized) to decide when to reduce a conference back to a 2-party call, or to drop all the participants (for example, if only two automatons are communicating). The actual heuristics used to release calls are beyond the scope of this document, but may depend on properties of the conversation space, such as the number of active, passive, or hidden participants, and the send-only, receive-only, or send-and-receive orientation of various participants.

6.4 Effect when one user is represented by multiple UAs in the same call

Multiple participants in the same conversation space may represent the same human user. For example, the user may use one participant for video, chat, and whiteboard media on a PC and another for audio media on a SIP phone. In addition, human users may add robot participants which act on their behalf (for example, a call recording service or a calendar reminder). Call control features in SIP should continue to function as expected in such an environment.

6.5 "Special" participants

Call control implementations are encouraged to make intelligent decisions based on the type of participants (active/passive, hidden, human/robot) in a conversation space. Currently there is no standard way to convey this information about participants in a conversation space, but work in this area is encouraged. For example, a music on hold service may take the sensible approach that if there are two or more unhidden participants, it should not provide hold music, or that it will not send hold music to robots.

6.6 Billing issues

Billing in the PSTN is typically based on who initiated a call. At the moment, billing in a SIP network is consistent neither with itself nor with the PSTN.
(A billing model for SIP should allow for both PSTN-style billing and non-PSTN billing.) The example below demonstrates one such inconsistency.

Alice places a call to Bob. Alice then blind transfers Bob to Carol through a PSTN gateway. In current usage of REFER and BYE/Also, Bob may be billed for a call he did not initiate (although his UA originated the outgoing call leg). This is not necessarily a terrible thing, but it demonstrates a security concern (Bob must have appropriate local policy to prevent fraud). Also, Alice may wish to pay for Bob's session with Carol. There should be a way to signal this in SIP.

Likewise, a Replacement call may maintain the same billing relationship as a Replaced call, so if Alice first calls Carol, then asks Bob to Replace this call, Alice may continue to receive a bill.

Further work in SIP billing should define a way to set or discover the direction of billing.

7. Security Considerations

Let us first examine the security of the primitives used by the 3pcc approach (INVITE, reINVITE, and BYE). All signaling goes through the controller, which is a trusted entity. Initial INVITEs are frequently authenticated and may also be encrypted hop-by-hop (e.g. IPsec or TLS) or end-to-end (e.g. PGP or S/MIME). Also, the human or robot user receiving the INVITE may accept or decline the INVITE based on any number of factors.

An attacker can do many "rude" things to a SIP call-leg today (place calls on hold, send BYEs, reINVITE to a session of their choosing) if they have knowledge of the correct To, From, Call-ID, and CSeq headers. Encrypting or integrity protecting the signaling between User Agents and 3pcc controllers can prevent these attacks.

When using the peer-to-peer approach, the call control actions and primitives are initiated by a) an existing participant in the conversation space, b) a former participant in the conversation space, or c) an entity trusted by one of the participants.
For example, a participant always initiates a transfer; a retrieve from Park (a take) is initiated on behalf of a former participant; and a barge-in (insert or far-fork) is initiated by a trusted entity (an operator, for example).

Both REFER and PHONECTL primitives can be secured in the same manner as an initial INVITE.

To authorize call control primitives that trigger special behavior (such as an INVITE with Replaces, Join, or Fork semantics), the receiving user agent needs some credentials with which to challenge or authorize the call, as the sender may be completely unknown to the receiver, except through the introduction of a third party. As future work, some form of generic authorization token is probably needed.

8. References

[SIP] M. Handley, E. Schooler, and H. Schulzrinne, "SIP: Session Initiation Protocol", RFC2543, Internet Engineering Task Force, Nov 1998.

[3pcc] J. Rosenberg, J. Peterson, H. Schulzrinne, G. Camarillo, "Third Party Call Control in SIP", Internet Draft, IETF, March 2001. Work in progress.

[RFC2026] S. Bradner, "The Internet Standards Process -- Revision 3", RFC2026 (BCP), IETF, October 1996.

[RFC2119] S. Bradner, "Key words for use in RFCs to indicate requirement levels", Request for Comments (Best Current Practice) 2119, Internet Engineering Task Force, Mar. 1997.

[cc-framework] B. Campbell, "SIP Call Control - Framework", Internet Draft, IETF, Mar. 2001. Work in progress.

[REFER] R. Sparks, "SIP Call Control - Transfer", Internet Draft, IETF, Feb. 2001. Work in progress.

[Replaces] B. Biggs, R. Dean, "The SIP Replaces Header", Internet Draft, IETF, Nov. 2000. Work in progress.

[Media forking] M. Shankar, "SIP Forked Media", Internet Draft, IETF, Feb. 2001. Work in progress.

[PHONECTL] R. Dean, Belkind, B. Biggs, "PHONECTL: A Protocol for Remote Phone Control", Internet Draft, IETF, Jan. 2001. Work in progress.

[conf-models] J. Rosenberg, H. Schulzrinne, "Models for Multi Party Conferencing in SIP", Internet Draft, IETF, Nov. 2000. Work in progress.

10. Acknowledgments

Thanks to all who attended the SIP interim meeting in February 2001 for their support of the ideas behind this document.

11. Author's Addresses

Rohan Mahy
Cisco Systems
170 West Tasman Dr, MS: SJC-21/3/3
Phone: +1 408 526 8570
Email: rohan@cisco.com

Full Copyright Statement

Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.