R. Mahy 
Internet Draft                                            Cisco Systems 
Document: draft-mahy-sip-cc-models-00.txt                      Jul 2001 
Expires: Jan, 2002                                                      
 
 
                      A Call Control Model for SIP 
 
 
Status of this Memo 
 
   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.
    
 
1. Abstract 
    
   This document defines an abstract call model for describing the 
   media relationships required for call control features in SIP, and 
   discusses other issues related to SIP call control as part of the 
   SIP Call Control Framework. 
    
    
2. Conventions used in this document 
    
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in 
   this document are to be interpreted as described in RFC-2119 
   [RFC2119]. 
    
    
    
3. Overview 
    
   This document defines an abstract call model for describing the
   changes in media relationships (actions) which are needed to fulfill
   most call control features in SIP.  The call model is an integral
   part of the SIP Call Control Framework defined in [cc-framework].
   The model and actions described here are specifically chosen to be
   independent of the SIP signaling and/or mixing approach chosen to
   actually set up the media relationships.
 Mahy                     Expires: Jan 2002                         1 
 
                      Call Control Model for SIP 
 
 
    
   Implementations may set up the media relationships described in this
   model using the approach described in [3pcc]. The 3pcc approach
   relies on only the following three primitive operations:
    
        new INVITE 
        reINVITE (modify the session, hold, resume from hold, etc.) 
        BYE 
    
   The main advantage of the 3pcc approach is that it requires only
   very basic SIP support from end systems to support call control
   features.  New features can, and must, be implemented in one place
   only (the controller), which is both an advantage and a
   disadvantage: the approach neither requires enhanced client
   functionality nor takes advantage of it.
    
   In addition, a peer-to-peer approach is discussed at length in this
   draft.  The primary drawback of the peer-to-peer model is additional
   end system complexity.  The benefits of the peer-to-peer model
   include:
   - state remains at the edges
   - signaling is the same whether mixing is performed by one of the
     participants, or by a central conference server
   - calls only go through participants involved (no additional points
     of failure)
   - does not reproduce the MGCP/Megaco command/control trust model
   - less complex to set up end-to-end QoS and security
   - shorter setup time (fewer messages and round trips required)
   - many additional features applied at other UAs are transparent
 
   The peer-to-peer approach relies on additional "primitive" 
   operations, some of which are identified here. 
    
        INVITE with [Replaces] semantics 
        INVITE with Join semantics 
        INVITE or [REFER] with [Media Forking] semantics 
        [REFER] to ask another UA to send a request on your behalf 
        [PHONECTL] for desktop call control  
    
   Many of the features, primitives, and actions described in this 
   document require some type of mixing/combining/selection.  
 
    
4. "Conversation Space" Model 
    
   This document introduces the concept of an abstract "conversation
   space" (essentially a set of participants who believe they are all
   communicating with one another).  Each conversation space contains
   one or more participants.
    
   Participants are SIP User Agents which send original media to, or
   terminate and receive media from, other members of the conversation
   space.  Logically, every participant in the conversation space has
   access to all the media generated in that space (this holds when all
   participants share a common media type).  A SIP User Agent which
   merely forwards, transcodes, mixes, or selects media originating
   elsewhere in the conversation space is NOT a participant.  [Note
   that a conversation space consists of one or more SIP calls or SIP
   conferences.  A conversation space is similar to the definition of a
   "call" in some other call models.]
    
   Participants may represent human users or non-human users (referred 
   to as robots or automatons in this document).  Some participants may 
   be hidden within a conversation space. Some examples of hidden 
   participants include: robots which generate tones, images, or 
   announcements during a conference to announce users arriving and 
   departing, a human call center supervisor monitoring a conversation  
   between a trainee and a customer, and robots which record media for 
   training or archival purposes. 
    
   Participants may also be active or passive.  Active participants are 
   expected to be intelligent enough to leave a conversation space when 
   they no longer desire to participate.  (An attentive human 
   participant is obviously active.)  Some robotic participants (such 
   as a voice messaging system, an instant messaging agent, or a voice 
   dialog system) may be active participants if they can leave the 
   conversation space when there is no human interaction.  Other robots 
   (for example our tone generating robot from the previous example) 
   are passive participants.  A human participant "on-hold" is passive. 
    
   The example diagram below shows a conversation space as a "bubble"
   or oval, and as a "set" in curly or square brace notation. Each set,
   oval, or "bubble" represents a conversation space. Hidden
   participants are shown in lowercase letters.
    
    
   { A , B }            [ A , B ] 
    
      .-. 
     /   \ 
    /  A  \ 
   (       ) 
    \  B  / 
     \   / 
      '-' 
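
   As an informal illustration only, the set notation can be sketched
   in Python.  The Participant class, its fields, and the render
   function are names assumed for this sketch; they are not defined by
   this document or by SIP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    name: str             # single-letter label, as in the diagrams
    hidden: bool = False  # hidden participants are shown in lowercase
    active: bool = True   # passive participants do not leave on their own


def render(space):
    """Return the curly-brace set notation for a conversation space."""
    labels = sorted(
        p.name.lower() if p.hidden else p.name.upper() for p in space
    )
    return "{ " + " , ".join(labels) + " }"


alice = Participant("A")
bob = Participant("B")
recorder = Participant("R", hidden=True, active=False)  # a recording robot

print(render({alice, bob}))            # { A , B }
print(render({alice, bob, recorder}))  # { A , B , r }
```

   Modelling a conversation space as a plain set keeps the sketch
   independent of how the media relationships are actually signaled.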
    
    
   Some examples of the relationship between conversation spaces and 
   SIP calls, SIP call legs, and SIP sessions are listed below.  In 
   each example, a human user will perceive that there is a single 
   call. 
    
       A simple two-party call is a single conversation space, a single 
       call, a single session, and a single call-leg. 
        
       A locally mixed three-way call is one or two calls (one if the 
       mixer invited all the other participants, two otherwise), two 
       sessions, and two call-legs.  It is also a single conversation 
       space. 
        
       A simple dial-in audio conference is a single conversation 
       space, but is represented by as many calls, call-legs, and 
       sessions as there are human participants. 
        
       A multicast conference is a single conversation space, a single 
       session, one or more calls, and as many call-legs as 
       participants. 
    
    
5. Catalog of call control actions and sample features 
    
   Below are listed several call control "actions" which modify the 
   participants in a conversation space. The names of the actions 
   listed are for descriptive purposes only (they are not normative).  
   This list of actions is not meant to be exhaustive.   
    
   In the examples, all actions are initiated by the user "Alice" 
   represented by UA "A". 
    
    
5.1 Transfer 
    
   The conversation space changes as follows: 
    
         before            after 
        { A , B }  -->   { C , B }   
    
   A replaces itself with C. 
    
   To make this happen using the peer-to-peer approach, "A" would send 
   two SIP requests.  A shorthand for those requests is shown below: 
        REFER B  Refer-To:C 
        BYE B  
    
   To make this happen instead using the 3pcc approach, the controller 
   sends requests represented by the shorthand below: 
        INVITE C (w/SDP of B) 
        reINVITE B (w/SDP of C) 
        BYE A 
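
   The transfer action and both shorthand listings above can be
   reproduced with a small Python sketch.  The function names are
   assumptions made for illustration, and the strings are only the
   shorthand used in this document, not real SIP messages.

```python
def transfer(space, transferor, target):
    """Return the space after `transferor` replaces itself with `target`."""
    if transferor not in space:
        raise ValueError("only a current participant can transfer")
    return (space - {transferor}) | {target}


def peer_to_peer_requests(remaining, target):
    """Shorthand for the requests sent by the transferor itself."""
    return [f"REFER {remaining}  Refer-To:{target}", f"BYE {remaining}"]


def third_party_requests(transferor, remaining, target):
    """Shorthand for the requests sent by the 3pcc controller."""
    return [
        f"INVITE {target} (w/SDP of {remaining})",
        f"reINVITE {remaining} (w/SDP of {target})",
        f"BYE {transferor}",
    ]


before = frozenset({"A", "B"})
after = transfer(before, "A", "C")
print(sorted(after))  # ['B', 'C']
for line in peer_to_peer_requests("B", "C"):
    print(line)
for line in third_party_requests("A", "B", "C"):
    print(line)
```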
    
   Features enabled by this action: 
   - blind transfer 
   - transfer to a central mixer (some type of conference or forking) 
   - transfer to park server (park) 
   - transfer to music on hold or announcement server 
   - transfer to a "queue" 
   - transfer to a service (such as Voice Dialogs service) 
   - transition from local mixer to central mixer 
    
5.2 Take 
    
   The conversation space changes as follows: 
    
        { B , C }  -->   { B , A } 
    
   A forcibly replaces C with itself.  In most uses of this primitive, 
   A is just "un-replacing" itself. 
    
   Using the peer-to-peer approach, "A" sends: 
        INVITE B  Replaces: <call leg between B and C> 
    
   Using the 3pcc approach (all requests sent from the controller):
        INVITE A (w/SDP of B) 
        reINVITE B (w/SDP of A) 
        BYE C 
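
   A minimal Python sketch of the take action follows.  The
   precondition checks reflect one reading of the description above
   (the taker is outside the space and the replaced party is inside
   it); the function names are assumed for illustration.

```python
def take(space, taker, replaced):
    """Return the space after `taker` forcibly replaces `replaced`."""
    if replaced not in space:
        raise ValueError(f"{replaced} is not in the conversation space")
    if taker in space:
        raise ValueError(f"{taker} is already in the conversation space")
    return (space - {replaced}) | {taker}


def replaces_shorthand(peer, replaced):
    """Shorthand for the single request the taker sends to `peer`."""
    return f"INVITE {peer}  Replaces: <call leg between {peer} and {replaced}>"


# B is currently talking to C (for example a park server); A retrieves B.
parked = frozenset({"B", "C"})
retrieved = take(parked, "A", "C")
print(sorted(retrieved))  # ['A', 'B']
print(replaces_shorthand("B", "C"))
```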
    
   Features enabled by this action: 
   - transferee completes an attended transfer 
   - retrieve from central mixer (not recommended) 
   - retrieve from music on hold or park 
   - retrieve from queue 
   - call center take 
   - voice portal resuming ownership of a call it originated 
   - answering-machine style screening (pickup) 
    
5.3 Add 
    
   The conversation space changes as follows: 
    
        { A , B } -->    { A, B, C } 
    
   A adds C to the conversation.   
    
   Using the peer-to-peer approach, adding a party using local mixing
   requires no signaling.  To transition from a 2-party call or a
   locally mixed conference to central mixing, A could send the
   following requests:
        REFER B  Refer-To: mixer 
        INVITE mixer 
        BYE B 
    
   To add a party to a central mixer: 
        REFER C  Refer-To: mixer 
                or 
        REFER mixer  Refer-To: C 
    
   Using the 3pcc approach to transition to centrally mixed, the
   controller would send:
        INVITE mixer leg 1 (w/SDP of A)
        INVITE mixer leg 2 (w/SDP of B)
        INVITE C (late SDP)
        reINVITE A (w/SDP of mixer leg 1)
        reINVITE B (w/SDP of mixer leg 2)
        INVITE mixer leg 3 (w/SDP of C)
    
   To add a party to a central mixer: 
        INVITE C (late SDP) 
        INVITE mixer (w/SDP of C) 
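
   The 3pcc transition to a central mixer follows a regular pattern,
   which the Python sketch below generalizes to any number of existing
   participants.  The generalization beyond two parties and the
   function name are assumptions of this sketch; the output is only the
   shorthand used in this document.

```python
def transition_to_central_mixer(existing, newcomer):
    """Return the controller's shorthand requests, in sending order."""
    requests = []
    # One mixer leg per existing participant, offered with that party's SDP.
    for leg, party in enumerate(existing, start=1):
        requests.append(f"INVITE mixer leg {leg} (w/SDP of {party})")
    # The newcomer is invited without an offer (late SDP).
    requests.append(f"INVITE {newcomer} (late SDP)")
    # Each existing participant is re-pointed at its own mixer leg.
    for leg, party in enumerate(existing, start=1):
        requests.append(f"reINVITE {party} (w/SDP of mixer leg {leg})")
    # The newcomer's SDP is offered to one more mixer leg.
    last_leg = len(existing) + 1
    requests.append(f"INVITE mixer leg {last_leg} (w/SDP of {newcomer})")
    return requests


for line in transition_to_central_mixer(["A", "B"], "C"):
    print(line)
```

   With existing participants A and B and newcomer C, the sketch yields
   the six requests listed above for the transition.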
    
   Features enabled: 
   - standard conference feature 
   - call recording 
   - answering-machine style screening (screening) 
    
5.4 Local Join 
 
   The conversation space changes like this: 
    
        { A, B}  , {A, C}  -->  {A, B, C} 
         
                or like this 
         
        { A, B}  , {C, D}  -->  {A, B, C, D}   
         
   A takes two conversation spaces and joins them together into a 
   single space. 
    
   Using the peer-to-peer approach, A can mix locally, or REFER the
   participants of both conversation spaces to the same central mixer
   (as in Section 5.3).
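
   In set terms a join is simply a union.  The Python sketch below
   shows both examples from this section, plus the REFER shorthand A
   might send when moving everyone to a central mixer.  It assumes A is
   able to send a REFER to every other participant; the function names
   and the "mixer" label are illustrative only.

```python
def local_join(*spaces):
    """Return the single conversation space formed by joining `spaces`."""
    joined = frozenset()
    for space in spaces:
        joined = joined | frozenset(space)
    return joined


def refer_to_mixer(initiator, joined, mixer="mixer"):
    """REFER shorthand asking every other participant to call the mixer."""
    return [
        f"REFER {party}  Refer-To: {mixer}"
        for party in sorted(joined - {initiator})
    ]


first = local_join({"A", "B"}, {"A", "C"})
second = local_join({"A", "B"}, {"C", "D"})
print(sorted(first))   # ['A', 'B', 'C']
print(sorted(second))  # ['A', 'B', 'C', 'D']
for line in refer_to_mixer("A", first):
    print(line)
```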
 
   For the 3pcc approach, the call flows for inserting participants, 
   and joining and splitting conversation spaces are tedious yet 
   straightforward, so these are left as an exercise for the reader. 
    
   Features enabled: 
   - standard conference feature 
   - leaving a sidebar to rejoin a larger conference 
    
5.5 Insert 
    
   The conversation space changes like this: 
    
        { B , C }  -->  {A, B, C } 
    
   A inserts itself into a conversation space. 
    
   A proposed mechanism for signaling this using the peer-to-peer 
   approach is to send a new header in an INVITE with "joining" 
   semantics.  For example:   
        INVITE B  Join: <call id of B and C> 
    
   If B accepted the INVITE, B would accept responsibility to set up
   the call legs and mixing necessary (for example: to mix locally or
   to transfer the participants to a central mixer).
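
   The insert action differs from an add in that the initiator starts
   outside the conversation space.  A short Python sketch, with
   function names assumed for illustration and the Join header shown
   only in the proposed shorthand form:

```python
def insert(space, joiner):
    """Return the space after `joiner` inserts itself into it."""
    if joiner in space:
        raise ValueError(f"{joiner} is already a participant")
    return space | {joiner}


def join_shorthand(target, other):
    """Shorthand for the INVITE the joiner sends to `target`."""
    return f"INVITE {target}  Join: <call id of {target} and {other}>"


before = frozenset({"B", "C"})
print(sorted(insert(before, "A")))  # ['A', 'B', 'C']
print(join_shorthand("B", "C"))
```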
    
   Features enabled: 
   - barge-in 
   - call center monitoring 
   - call recording 
    
5.6 Split

   The conversation space changes as follows:

        { A, B, C, D } --> { A, B } , { C, D }

   If using a central mixer with the peer-to-peer approach:
        REFER C  Refer-To: mixer (new URI)
        REFER D  Refer-To: mixer (new URI)
        BYE C
        BYE D
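
   A split can be viewed as partitioning one conversation space into
   several.  The Python sketch below checks that the resulting spaces
   cover the original participants without overlap.  Requiring a
   strict partition is an assumption of this sketch, as is the function
   name.

```python
def split(space, *parts):
    """Split `space` into `parts`, which must partition it exactly."""
    parts = [frozenset(part) for part in parts]
    if any(not part for part in parts):
        raise ValueError("every resulting space needs a participant")
    if frozenset().union(*parts) != frozenset(space):
        raise ValueError("parts must cover exactly the original space")
    if sum(len(part) for part in parts) != len(space):
        raise ValueError("parts must not overlap")
    return parts


conference = frozenset({"A", "B", "C", "D"})
main, sidebar = split(conference, {"A", "B"}, {"C", "D"})
print(sorted(main), sorted(sidebar))  # ['A', 'B'] ['C', 'D']
```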
    
   Features enabled: 
   - sidebar conversations during a larger conference 
    
    
5.7 Near-fork 
 
   A participates in two conversation spaces simultaneously:  
    
        { A, B } --> { B , [ A } , C ] 
    
    
   A is a participant in two conversation spaces such that A sends the 
   same media to both spaces, and renders media from both spaces, 
   presumably by mixing or rendering the media from both.  We can 
   define that A is the "anchor" point for both forks, each of which is 
   a separate conversation space. 
    
   This action is purely a local implementation matter (it requires no
   special signaling).  Local features such as switching calls between
   the background and foreground are possible using this media
   relationship.
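
   The effect of a near-fork on who receives whose media can be
   sketched in Python as follows.  The audience function is a
   hypothetical helper; it relies only on the model's rule that every
   participant has access to the media generated in its own space.

```python
def audience(sender, spaces):
    """Return the participants who receive media sent by `sender`."""
    receivers = set()
    for space in spaces:
        if sender in space:
            receivers |= space - {sender}
    return receivers


# A anchors two conversation spaces: { A, B } and { A, C }.
spaces = [frozenset({"A", "B"}), frozenset({"A", "C"})]
print(sorted(audience("A", spaces)))  # ['B', 'C']
print(sorted(audience("B", spaces)))  # ['A']
print(sorted(audience("C", spaces)))  # ['A']
```

   B and C each reach only the anchor, because they remain in separate
   conversation spaces.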
    
5.8 Far fork 
    
   The conversation space changes as follows:
    
        { A, B } --> { A , [ B } , C ] 
    
   A requests B to be the "anchor" of two conversation spaces. 
    
   For an example of using 3pcc to set up media forking, see [Media
   forking].  The session descriptions for forking are quite complex.
   Controllers should verify that endpoints can handle forked media by
   using some type of Require header token.
    
   Two ways to set up this media relationship using peer-to-peer call
   control have been proposed:
   - the anchor receives a REFER with Require: forked-media (implicit)
   - the anchor receives an INVITE with Fork-with header (explicit)
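
   The difference between a near-fork and a far fork is only which
   participant anchors the two conversation spaces.  The Python sketch
   below makes that explicit; fork_kind is a hypothetical helper, and
   it assumes the two spaces share exactly one participant.

```python
def fork_kind(requester, first, second):
    """Classify a fork by comparing the requester with the shared anchor."""
    shared = first & second
    if len(shared) != 1:
        raise ValueError("the two spaces must share exactly one anchor")
    (anchor,) = shared
    kind = "near-fork" if anchor == requester else "far fork"
    return kind, anchor


# { B , [ A } , C ]: A requested the fork and anchors both spaces.
print(fork_kind("A", frozenset({"A", "B"}), frozenset({"A", "C"})))
# { A , [ B } , C ]: A requested the fork but B anchors both spaces.
print(fork_kind("A", frozenset({"A", "B"}), frozenset({"B", "C"})))
```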
 
   Features enabled: 
   - barge-in 
   - voice portal services 
   - whisper 
   - hotword detection 
   - sending DTMF somewhere else 
    
    
6. Other Call Control Issues  
    
6.1 Transparent feature interaction 
    
   Combinations of features must work in SIP call control.  For
   example, consider the transfer of a call which is also conferenced.
    
   Alice calls Bob.  Alice silently "conferences in" her robotic 
   assistant Albert as a hidden party.  Bob transfers Alice to Carol.  
   If Bob asks Alice to Replace her leg with a new one to Carol then 
   both Alice and Albert should be communicating with Carol 
   (transparently). 
    
   Using the peer-to-peer model, this combination of features works 
   fine if A is doing local mixing (Alice replaces Bob's call-leg with 
   Carol's), or if A is using a central mixer (the mixer replaces Bob's 
   call leg with Carol's).  A clever implementation using the 3pcc 
   model can generate similar results. 
    
   New extensions to the SIP Call Control Framework should attempt to 
   preserve this property. 
    
6.2 Presenting information to the user or application 
    
   Participants should have access to the names of the other 
   participants in a conversation space, so that this information can 
   be rendered to a human user or processed by an automaton.  Although 
   some of this information may be available from To, From, Remote-
   Party-Id, or other SIP headers, another mechanism of reporting this 
   information may be necessary.  [The author believes that the data 
   reported by RTCP is insufficient for these purposes.]   
    
   For example, a mixer involved in a conversation space may wish to 
   provide URLs for conference status, and/or conference/floor control. 
    
6.3 Use of different mixing models 
    
   Several conferencing models are discussed in [conf-models].  For
   brevity, only the two most popular conferencing models (local and
   centralized mixing) are discussed in detail in this document.
   Applications of the conversation spaces model to distributed full
   mesh and multicast conferences are left as an exercise for the
   reader.

   Note that a distributed full mesh conference can be used for basic
   conferences, but does not allow for more complex conferencing
   actions like splitting, joining, and forking.
    
   Call control features should be designed to allow a mixer (local or 
   centralized) to decide when to reduce a conference back to a 2-party 
   call, or drop all the participants (for example if only two 
   automatons are communicating). 
    
   The actual heuristics used to release calls are beyond the scope of 
   this document, but may depend on properties in the conversation 
   space, such as the number of active, passive, or hidden 
   participants; and the send-only, receive-only, or send-and-receive 
   orientation of various participants. 
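
   Purely as an illustration of the kind of decision a mixer might
   make, the Python sketch below applies one hypothetical heuristic.
   The rules, the Participant fields, and the returned labels are all
   assumptions of this sketch; this document does not define any
   release heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    name: str
    human: bool = True    # False for robots (automatons)
    hidden: bool = False  # not used by this simple heuristic


def mixer_decision(participants):
    """Return a hypothetical mixer action for the current participants."""
    if not any(p.human for p in participants):
        # Only automatons remain, so nobody benefits from the conference.
        return "drop all participants"
    if len(participants) == 2:
        return "reduce to 2-party call"
    return "keep mixing"


alice = Participant("A")
bob = Participant("B")
tones = Participant("t", human=False, hidden=True)
recorder = Participant("r", human=False, hidden=True)

print(mixer_decision([alice, bob, tones]))  # keep mixing
print(mixer_decision([alice, bob]))         # reduce to 2-party call
print(mixer_decision([tones, recorder]))    # drop all participants
```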
 
6.4 Effect when one user is represented by multiple UAs in same call 
    
   Multiple participants in the same conversation space may represent 
   the same human user.  For example, the user may use one participant 
   for video, chat, and whiteboard media on a PC and another for audio 
   media on a SIP phone.  In addition, human users may add robot 
   participants which act on their behalf (for example a 
   call recording service, or a calendar reminder).  Call Control 
   features in SIP should continue to function as expected in such an 
   environment. 
    
6.5 "Special" participants 
    
   Call control implementations are encouraged to make intelligent
   decisions based on the type of participants (active/passive, hidden, 
   human/robot) in a conversation space.  Currently there is no 
   standard way to convey this information about participants in a 
   conversation space, but work in this area is encouraged. 
    
   For example, a music on hold service may take the sensible approach 
   that if there are two or more unhidden participants, it should not 
   provide hold music; or that it will not send hold music to robots. 
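
   The music on hold example can be written as a small policy function.
   The Python sketch below is one possible reading of the two rules
   above.  It assumes the list of participants excludes the music on
   hold service itself; the Participant fields and function name are
   hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    name: str
    human: bool = True
    hidden: bool = False


def hold_music_targets(participants):
    """Return the names of the participants that should hear hold music."""
    unhidden = [p for p in participants if not p.hidden]
    if len(unhidden) >= 2:
        # Two or more unhidden participants can talk to each other.
        return []
    # Never send hold music to robots.
    return [p.name for p in participants if p.human]


bob = Participant("B")
carol = Participant("C")
recorder = Participant("r", human=False, hidden=True)

print(hold_music_targets([bob, recorder]))  # ['B']
print(hold_music_targets([bob, carol]))     # []
```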
    
6.6 Billing issues 
    
   Billing in the PSTN is typically based on who initiated a call.  At the
   moment billing in a SIP network is neither consistent with itself, 
   nor with the PSTN.  (A billing model for SIP should allow for both 
   PSTN-style billing, and non-PSTN billing.)  The example below 
   demonstrates one such inconsistency.    
    
   Alice places a call to Bob.  Alice then blind transfers Bob to Carol
   through a PSTN gateway.  In current usage of REFER and BYE/Also, Bob
   may be billed for a call he did not initiate (although his UA did
   originate the outgoing call leg).  This is not necessarily a
   terrible thing, but it demonstrates a security concern (Bob must
   have appropriate local policy to prevent fraud).  Also, Alice may
   wish to pay for Bob's session with Carol.  There should be a way to
   signal this in SIP.
    
   Likewise a Replacement call may maintain the same billing 
   relationship as a Replaced call, so if Alice first calls Carol, then 
   asks Bob to Replace this call, Alice may continue to receive a bill. 
    
   Further work in SIP billing should define a way to set or discover 
   the direction of billing. 
    
7. Security Considerations 
    
   Let us first examine the security of the primitives used by the 3pcc 
   approach (INVITE, reINVITE, and BYE).  All signaling goes through 
   the controller, which is a trusted entity.  Initial INVITEs are 
   frequently authenticated and may also be hop-by-hop (e.g. IPsec or 
   TLS) or end-to-end (e.g. PGP or S/MIME) encrypted.  Also, the human 
   or robot user receiving the INVITE may accept or decline the INVITE 
   based on any number of factors. 
    
   An attacker can do many "rude" things to a SIP call-leg today (place 
   calls on hold, send BYEs, reINVITE to a session of their choosing), 
   if they have knowledge of the correct To, From, Call-ID, and CSeq 
   headers.  Encrypting or integrity protecting the signaling between 
   User Agents and 3pcc controllers can prevent these attacks.   
    
   When using the peer-to-peer approach, the call control actions and 
   primitives are initiated by a) an existing participant in the 
   conversation space, b) a former participant in the conversation 
   space, or c) an entity trusted by one of the participants.  For 
   example, a participant always initiates a transfer; a retrieve from 
   Park (a take) is initiated on behalf of a former participant; and a 
   barge-in (insert or far-fork) is initiated by a trusted entity (an 
   operator for example).  
    
   Both REFER and PHONECTL primitives can be secured in the same manner 
   as for an initial INVITE. To authorize call control primitives that 
   trigger special behavior (such as an INVITE with Replaces, Join, or
   Fork semantics), the receiving user agent needs some credentials 
   with which to challenge or authorize the call, as the sender may be 
   completely unknown to the receiver, except through the introduction 
   of a third party.  As future work, some form of generic 
   authorization token is probably needed. 
 
    
8. References 
    
    
   [SIP] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg,
   "SIP: Session Initiation Protocol", RFC2543, Internet Engineering
   Task Force, Mar. 1999.
    
   [3pcc] J. Rosenberg, J. Peterson, H. Schulzrinne, G. Camarillo,
   "Third Party Call Control in SIP", Internet Draft <draft-rosenberg-
   sip-3pcc-02.txt>, IETF, Mar. 2001.  Work in progress.

   [RFC2026] S. Bradner, "The Internet Standards Process -- Revision 3",
   RFC2026 (BCP), IETF, Oct. 1996.

   [RFC2119] S. Bradner, "Key words for use in RFCs to indicate
   requirement levels", RFC2119 (BCP), IETF, Mar. 1997.

   [cc-framework] B. Campbell, "SIP Call Control - Framework",
   Internet Draft <draft-campbell-sip-cc-framework-02.txt>, IETF, Mar.
   2001.  Work in progress.

   [REFER] R. Sparks, "SIP Call Control - Transfer", Internet Draft
   <draft-ietf-sip-cc-transfer-04.txt>, IETF, Feb. 2001.  Work in
   progress.

   [Replaces] B. Biggs, R. Dean, "The SIP Replaces Header", Internet
   Draft <draft-biggs-sip-replaces-00.txt>, IETF, Nov. 2000.  Work in
   progress.

   [Media forking] M. Shankar, "SIP Forked Media", Internet Draft
   <draft-shankar-sip-forked-media-00.txt>, IETF, Feb. 2001.  Work in
   progress.

   [PHONECTL] R. Dean, Belkind, B. Biggs, "PHONECTL: A Protocol for
   Remote Phone Control", Internet Draft <draft-dean-phonectl-03.txt>,
   IETF, Jan. 2001.  Work in progress.

   [conf-models] J. Rosenberg, H. Schulzrinne, "Models for Multi Party
   Conferencing in SIP", Internet Draft <draft-rosenberg-sip-
   conferencing-models-00.txt>, IETF, Nov. 2000.  Work in progress.
    
9. Acknowledgments

   Thanks to all who attended the SIP interim meeting in February 2001
   for their support of the ideas behind this document.


10. Author's Address
    
   Rohan Mahy 
   Cisco Systems 
   170 West Tasman Dr, MS: SJC-21/3/3 
   Phone: +1 408 526 8570 
   Email: rohan@cisco.com 
    
    
Full Copyright Statement 
   "Copyright (C) The Internet Society (date). All Rights Reserved. 
   This document and translations of it may be copied and furnished to 
   others, and derivative works that comment on or otherwise explain it 
   or assist in its implementation may be prepared, copied, published 
   and distributed, in whole or in part, without restriction of any 
   kind, provided that the above copyright notice and this paragraph 
   are included on all such copies and derivative works. However, this 
   document itself may not be modified in any way, such as by removing 
   the copyright notice or references to the Internet Society or other 
   Internet organizations, except as needed for the purpose of 
   developing Internet standards in which case the procedures for 
   copyrights defined in the Internet Standards process must be 
   followed, or as required to translate it into languages other than   
   English. 
       
   The limited permissions granted above are perpetual and will not be 
   revoked by the Internet Society or its successors or assigns. 
   This document and the information contained herein is provided on an 
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
    
 