INTERNET-DRAFT                                              M. Barnes
Document: draft-barnes-xcon-framework-00.txt          Nortel Networks
Category: Informational                                    C. Boulton
Expires: April 14, 2005                                      Ubiquity
                                                         Oct 14, 2004

                A Framework for Centralized Conferencing

Status of this Memo

By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on April 14th, 2005.

Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

This document describes a framework for Centralized Conferencing (XCON). The XCON framework provides an enhanced framework for conferencing that is protocol agnostic. It expands upon the interfaces between the functional elements introduced in the SIP conferencing framework by describing the characteristics of the connecting protocols and providing a related data model. Although it builds on the SIP conferencing framework, this framework is applicable to a variety of signaling protocols besides SIP, including H.323, XMPP, and even PSTN signaling protocols.

Barnes & Boulton          Expires April 14th, 2005              [Page 1]
XCON Framework                                      October 14th, 2004

Table of Contents

1. Introduction
2. Conventions and Terminology
3. Overview of Conferencing Architecture
   3.1 Usage of URIs
4. Component Functionality
   4.1 Focus
   4.2 Conference Policy Server
   4.3 Mixers
   4.4 Conference Notification Service
   4.5 Participants
   4.6 Conference Policy
5. Common Operations
   5.1 Creating Conferences
   5.2 Adding Participants
   5.3 Conditional Joins
   5.4 Removing Participants
   5.5 Creating Sidebars
   5.6 Destroying Conferences
   5.7 Obtaining Membership Information
   5.8 Adding and Removing Media
   5.9 Conference Announcements and Recordings
   5.10 Floor Control
   5.11 Whispering or Private Messages
6. XCON Data Model
7. Security Considerations
8. IANA Considerations
Informational References

1. Introduction

The SIP conferencing framework [SIPCONFW] presents a general architectural model for tightly coupled conferences. While the primary focus of that document is to provide a model for SIP-based conferencing, the model itself was intended to be general purpose and applicable to non-SIP protocols.
This document outlines a generic XCON architecture for tightly coupled conferences. It also provides details of the connecting protocols and a data model used to expose interfaces to the primary XCON entities (e.g. Conference Policy Server, Floor Control Server) and to provide a clear depiction of the primary data relationships between entities.

This XCON framework is not intended to change how fundamental SIP conferencing is supported. Rather, it extends and enhances the architectural model of [SIPCONFW] as necessary to provide a more general conference architecture that is protocol agnostic. For example, this framework applies equally well to an H.323, Jabber (XMPP), or even PSTN conferencing system.

2. Conventions and Terminology

This framework uses many of the terms introduced in the SIP conferencing framework. In addition, it introduces new terms associated with the new protocols and functionality, as well as terms that describe, in a protocol-agnostic manner, the signaling interface between the conference participants and the conference focus (Signaling I/F, Establish, Modify and Tear down). The convention in this document is to describe signaling processing using the new terms, while using SIP [RFC3261] to provide concrete examples of the operations where applicable.

o Conference Policy Control Protocol (CPCP): A protocol used by clients to manipulate the conference policy, including the membership policy.

o Establish: A protocol operation applied to the signaling interface between the focus and a participant to set up a multimedia stream (e.g. SIP INVITE).

o Floor: A set of data or resources associated with a conference instance, to which a conference participant is granted temporary input access.

o Floor chair: A user (or an entity) who is authorized to manage one floor (grant, deny, or revoke a floor). The floor chair does not have to be a participant in the conference.
o Floor Control: A mechanism enabling applications or users to gain mutually exclusive or non-exclusive input access to the shared object or resource associated with a specific conference instance. Control of the "floor" is viewed as a temporary permission.

o Floor Control Policy: A set of rules, used as an alternative to or in conjunction with a chair-controlled floor, that defines the policy for automatically generating floor request decisions (grant, reject, or revoke a floor).

o Floor Control Protocol: A protocol used by XCON-enabled clients to gain, modify, or release control of a floor, by manipulating the floor control policy and thereby effecting changes to the conference policy.

o Floor Control Server: A logical entity that maintains the state of the floor(s), including which floors exist, who the floor chairs are, who holds a floor, etc. Requests to manipulate a floor are directed at the floor control server using the Floor Control Protocol.

o Modify: A protocol operation applied to the signaling interface between the focus and a participant to change the characteristics of the multimedia stream (e.g. SDP manipulation within a SIP re-INVITE).

o Multimedia stream: In the context of this framework, the media composition of the conference, which is established via the signaling protocol interface between the focus and a participant. The stream can include voice, video, session-mode instant messaging and interactive text.

o Signaling Interface (I/F): The interface between a participant and the focus.

o Tear down: A protocol operation applied to the signaling interface between the focus and a participant to remove a participant from a conference (e.g. SIP BYE).

3. Overview of Conferencing Architecture

                           +-----------+
                           |           |
                           |Participant|
                           |     4     |
                           |           |
                           +-----------+
                                 |
                                 |Signaling
                                 |I/F 4
                                 |
   +-----------+           +-----------+           +-----------+
   |           |           |           |           |           |
   |Participant|-----------|   Focus   |-----------|Participant|
   |     1     | Signaling |           | Signaling |     3     |
   |           | I/F 1     |           | I/F 3     |           |
   +-----------+           +-----------+           +-----------+
                                 |
                                 |Signaling
                                 |I/F 2
                                 |
                           +-----------+
                           |           |
                           |Participant|
                           |     2     |
                           |           |
                           +-----------+

                                Figure 1

The central component in a conference is the focus. The only difference between the model put forth in [SIPCONFW] and the model for the XCON framework is that the signaling relationship maintained by the focus with each participant in the conference is not restricted to the SIP protocol. Any multimedia signaling protocol that defines procedures for establishing a relationship between the focus and a participant could be used over that interface. As a result, the signaling communications associated with a centralized conference still logically form a star topology, as shown in Figure 1.

The XCON framework does not change the role or logical functionality of the focus as put forth in [SIPCONFW]. While the interface between the focus and the conference policy remains implementation specific, the data associated with the conference policy, which relates to specific functionality provided by the focus, is discussed in greater detail in this and other XCON WG documents. The primary difference between the architectural model proposed in this document and the one in [SIPCONFW] is that the interface between the participant and the focus is protocol agnostic (i.e. not SIP specific). For example, ejecting a user from the conference consists of invoking the tear down operation specific to the protocol supported by that user (e.g. SIP BYE).
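Because the focus reasons only in terms of the abstract Establish, Modify and Tear down operations, a protocol-agnostic focus needs some way to map each operation onto the request used by a given participant's signaling protocol. The sketch below shows one possible shape for that mapping, assuming a simple lookup table. The names `Operation`, `SIGNALING_MAP`, `concrete_request` and `eject` are hypothetical, and only the SIP entries are taken from this document's own examples.

```python
from enum import Enum


class Operation(Enum):
    """Protocol-agnostic signaling operations (see Section 2)."""
    ESTABLISH = "establish"
    MODIFY = "modify"
    TEAR_DOWN = "tear down"


# Hypothetical per-protocol table. Only the SIP row reflects examples
# given in this document; other protocols would contribute their own rows.
SIGNALING_MAP: dict[str, dict[Operation, str]] = {
    "sip": {
        Operation.ESTABLISH: "INVITE",
        Operation.MODIFY: "re-INVITE",
        Operation.TEAR_DOWN: "BYE",
    },
}


def concrete_request(protocol: str, op: Operation) -> str:
    """Return the protocol-specific request that realizes an abstract operation."""
    try:
        return SIGNALING_MAP[protocol][op]
    except KeyError:
        raise ValueError(f"no mapping for {op.value!r} over {protocol!r}") from None


def eject(participant_protocol: str) -> str:
    """Ejecting a user is a tear down over whatever protocol that user supports."""
    return concrete_request(participant_protocol, Operation.TEAR_DOWN)
```

With this table, `eject("sip")` yields `"BYE"`. Supporting another signaling protocol means adding a row, not changing the focus logic.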
As discussed in [SIPCONFW], a conference instance is represented by a URI, which identifies the focus responsible for the conference state associated with that URI. Each conference has a unique focus and a unique URI identifying that focus. Requests to the conference URI are routed to the focus for that specific conference. Further detail on the usage of URIs is provided in Section 3.1.

Users usually join the conference by invoking the establish operation specific to the protocol they support (e.g. SIP INVITE), using the conference URI as the target. As long as the conference policy allows it (and the establish request is appropriately authenticated), the focus accepts the establish operation and the user is added to the conference. Users can leave the conference by invoking the tear down operation specific to their protocol (e.g. SIP BYE), as they would in a normal multimedia session for that protocol.

Similarly, the focus can terminate a multimedia session with a participant by invoking the tear down operation, should the conference policy change to indicate that the participant is no longer allowed in the conference. A focus can also invoke the establish operation to add a participant, should the conference policy (as manipulated by an authorized user) indicate that the focus needs to bring that participant into the conference.

                       .........................................
                       .                                       .
                       .                        Conference     .
                       .                          Policy       .
       Conference      .  +------------+        /-----------\  .
       Policy          .  |            |        \-----------/  .
       Control         .  | Conference |        |           |  .
       Protocol        .  |   Policy   |        | Membership|  .
         +--------------->|   Server   |------->|     &     |  .
         |             .  |            |        |   Media   |  .
         |             .  +------------+        |   Policy  |  .
         |             .                        \-----------/  .
         |             .                              |        .
   +-----------+       .  +------------+              |        .
   |           |       .  |            |              |        .
   |Participant|<-------->|   Focus    |              |        .
   |           |Signaling |            |              |        .
   |           |   I/F    |            |<-------------+        .
   +-----------+       .  |............|                       .
         ^             .  | Conference |                       .
         |             .  |Notification|                       .
         +--------------->|  Service   |                       .
         Conference    .  +------------+                       .
         State         .                                       .
         Notifications .                                       .
                       .........................................

                             Conference Functions

                                   Figure 2

As outlined in [SIPCONFW], a conference-aware participant is one that has access to advanced conference functionality through additional protocol interfaces. The client uses these protocols to interact with the conference policy server and the focus. A model for this interaction is shown in Figure 2. A conference-unaware participant does not implement the XCON protocols and is therefore not discussed further in this document.

A conference-aware participant can use the unique conference URI to request conference state updates. This involves connecting to the conference notification service provided by the focus using the appropriate signaling mechanism. Through this mechanism, the participant can be notified of changes in participants (effectively, the state of the signaling interfaces between the participants and the focus), the media policy, and the membership policy.

The participant can communicate with the conference policy server using a conference policy control protocol (CPCP). Through this protocol, it can manipulate the conference policy. The requirements for a CPCP are specified in a separate document [XCONCPRQ]. An Extensible Markup Language (XML) [XML] schema enabling a user to define a conference policy is defined in [XCONCPCP]. The assignment of privileges that allow a user to manipulate the conference policy is defined in [XCONCPRV]. The XML Configuration Access Protocol (XCAP) is one proposed protocol mechanism [XCONCPXC] for manipulating the conference policy data. Although [XCONCPXC] defines a specific protocol mechanism, other interfaces (e.g. Web-based ones) can also be used to manipulate the conference policy data, provided the constraints defined in [XCONCPCP] are respected.
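The focus behavior described earlier in this section (accepting or refusing establish requests according to the membership policy, tearing down participants who are no longer allowed, and bringing in users the policy says should be present), together with the moderator-approval and mass-invitation cases detailed in Section 4.1, can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the policy is reduced to three hypothetical sets (`allowed`, `needs_approval`, `invite`), invited users are treated as allowed, and an invitation is assumed to succeed immediately. Since the interpretation of the membership policy is a local matter, none of these names come from an XCON specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    DEFER = "defer"  # e.g. parked on a music-on-hold server awaiting a moderator


@dataclass
class MembershipPolicy:
    # Hypothetical, much-simplified membership policy.
    allowed: set[str] = field(default_factory=set)         # may join
    needs_approval: set[str] = field(default_factory=set)  # need a moderator's OK
    invite: set[str] = field(default_factory=set)          # focus should bring in


@dataclass
class Focus:
    policy: MembershipPolicy
    participants: set[str] = field(default_factory=set)
    pending: set[str] = field(default_factory=set)

    def _is_allowed(self, user: str) -> bool:
        # Assumption: users the policy asks the focus to invite are allowed.
        return user in self.policy.allowed or user in self.policy.invite

    def on_establish(self, user: str, authenticated: bool) -> Decision:
        """Handle an incoming establish request (e.g. a SIP INVITE)."""
        if not authenticated:
            return Decision.REJECT
        if self._is_allowed(user):
            self.participants.add(user)
            return Decision.ACCEPT
        if user in self.policy.needs_approval:
            self.pending.add(user)  # a moderator would be notified here
            return Decision.DEFER
        return Decision.REJECT

    def on_policy_change(self, new_policy: MembershipPolicy) -> list[tuple[str, str]]:
        """Reconcile the conference with an updated membership policy.

        Returns the signaling actions the focus would take, in order.
        """
        self.policy = new_policy
        actions: list[tuple[str, str]] = []
        # Ejection: tear down anyone who is no longer allowed.
        for user in sorted(self.participants):
            if not self._is_allowed(user):
                self.participants.discard(user)
                actions.append(("tear down", user))
        # Deferred requests: accept if now allowed, reject if no longer pending.
        for user in sorted(self.pending):
            if self._is_allowed(user):
                self.pending.discard(user)
                self.participants.add(user)
                actions.append(("accept", user))
            elif user not in new_policy.needs_approval:
                self.pending.discard(user)
                actions.append(("reject", user))
        # Mass invitation: establish only towards users not already present.
        for user in sorted(new_policy.invite - self.participants):
            self.participants.add(user)  # assumes the invitation succeeds
            actions.append(("establish", user))
        return actions
```

For example, approving a deferred user by moving them into `allowed` produces an `accept` action, while re-applying an unchanged invitation list produces no `establish` actions at all, mirroring the observation in Section 4.1 that a request to add users may not require any establish operations.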
The interfaces between the focus and the conference policy, and between the conference policy server and the conference policy, are not standardized within this framework per se. Rather, the data related to those interfaces is discussed in the context of the logical roles, with an associated data model provided in Section 6. As such, these interfaces show the logical roles involved in a conference, as opposed to suggesting a physical decomposition.

3.1 Usage of URIs

As discussed in [SIPCONFW], it is fundamental to this framework that a conference is uniquely identified by a URI, and that this URI identifies the focus responsible for the conference. The conference URI is unique, such that no two conferences have the same conference URI at any one point in time. Some examples of conference URIs include:

   h323:conf312334@example.net
   xmpp:conf.example.com
   tel:+12025551212
   sip:9023453@sip.example.net

The conference URI is opaque to any participants that might use it. There is no way to look at the URI and know for certain whether it identifies a focus, as opposed to a user or an interface on a PSTN gateway. This is in line with the general philosophy of URI usage [RFC2396]. However, contextual information surrounding the URI (for example, SIP header parameters) may indicate that the URI represents a conference.

When an establish request (e.g. SIP INVITE) is sent to the conference URI, that request is routed to the associated focus instance. The element or system that creates the conference URI is responsible for guaranteeing this property.

Ideally, a conference URI is never constructed or guessed by a user. Rather, conference URIs are learned through many mechanisms. A conference URI can be emailed or sent in an instant message. A conference URI can be linked on a web page.
A conference URI can also be obtained from a conference policy control protocol, which can be used to create conferences and the policies associated with them.

The other functions in a conference are also represented by URIs. If the conference policy server is implemented through web pages, it is identified by HTTP URIs. If it is accessed using an explicit protocol, it is identified by a URI defined for that protocol. Starting with the conference URI, the URIs for the other logical entities in the conference can be learned using the conference notification service. The exact method is protocol specific and outside the scope of this document.

4. Component Functionality

This section provides a more detailed description of the functions typically implemented in each of the elements that comprise an XCON conference server. The primary difference between the functionality in this framework and that described in [SIPCONFW] is that the functionality is described in general rather than SIP-specific terms. Thus, some information in this section is duplicated from [SIPCONFW], both to set the context for and to familiarize the reader with the general terminology introduced in this framework.

4.1 Focus

As its name implies, the focus instance is the central component of the conference. All participants in a conference are connected to a focus instance through the signaling interface established with each conference participant. The focus is responsible for maintaining the signaling interfaces connected to it. It ensures that the signaling interfaces are connected to a set of participants who are authorized to participate in the conference, as defined by the membership policy. The focus also uses the signaling interface to manipulate the media sessions, in order to make sure each participant obtains all the appropriate media for the conference.
To do that, the focus makes use of mixers in conjunction with the media policy.

When a focus receives an establish request for the signaling interface, it checks the membership policy. The membership policy might indicate that this participant is not allowed to join, in which case the request can be rejected. It might indicate that another participant, acting as a moderator, needs to approve this new participant. In that case, the establish operation might be deferred (e.g. parked on a music-on-hold server), or an in-progress indication (e.g. a SIP provisional response) might be sent to the participant. A notification, using the conference notification service, would be sent to the moderator. The moderator then has the ability to manipulate the policies using a conference policy control protocol (e.g. CPCP). If the policies are changed to allow this new participant, the focus can accept the establish request (e.g. unpark it from the music-on-hold server). The interpretation of the membership policy by the focus is, itself, a matter of local policy, and not subject to standardization.

If a participant manipulated the membership policy to indicate that a certain other participant was no longer allowed in the conference, the focus would invoke a tear down operation (e.g. SIP BYE) towards that participant to remove them. This is often referred to as "ejecting" a user from the conference. Similarly, if a user manipulated the membership policy to indicate that a number of users need to be added to the conference, the focus would send establish requests to those users. This is often referred to as the "mass invitation" function. A policy request to add a set of users might not require any establish operations at all, since those users might already be participants in the conference.

The media policy model is very similar to that previously described for membership policy.
If the media policy requires a modification, the focus instance implements it appropriately, either by manipulating signaling via the signaling interface or by interacting directly with the media mixer. The explicit operations required for enforcing media policy are considered out of scope for this document.

4.2 Conference Policy Server

The conference policy server allows clients to manipulate and interact with the conference policy. The conference policy is used by the focus to make authorization decisions and to guide its overall behavior. Logically speaking, there is a one-to-one mapping between a conference policy and a focus instance.

The conference policy is represented by a URI. There is a unique conference policy for each conference instance. The conference policy URI points to a conference policy server that can manipulate that particular conference policy.

A conference policy server also has a "top level" URI that can be used to access functions that are independent of any conference. Perhaps the most important of these functions is the creation of a new conference. Creating a new conference results in the construction of a new focus, a media policy and a conference policy, together with a corresponding conference URI, which can then be used to join the conference itself.

The conference policy server is accessed using a client-server transactional protocol. The client can be a participant in the conference, or it can be a third party. Access control lists for who can modify a conference policy are themselves part of the conference policy. The conference policy server is responsible for reconciling potentially conflicting requests regarding the policy for the conference instance.

The client of the conference policy server can be any entity interested in manipulating the conference policy. Clearly, participants might be interested in manipulating conference policy.
A participant might want to raise or lower the volume for one of the other participants it is hearing. Or, a participant might want to add a user to the conference.

A client of the conference policy server could also be another server whose job is to determine the conference policy. As an example, a floor control server is responsible for determining which participants in a conference are allowed to speak at any given time, based on participant requests and access rules. The floor control server would act as a client of the conference policy server, and change the media policy based on who is allowed to speak.

The client of the conference policy server could also be another conference policy server.

4.3 Mixers

A mixer is responsible for combining the media streams that make up the conference, and generating one or more output streams that are distributed to recipients (which could be participants or other mixers). The process of combining media is specific to the media type, and is directed by the focus, under the guidance of the rules described in the media policy.

A mixer is not aware of a "conference" as an entity, per se. A mixer receives media streams as inputs and, based on directions provided by the focus, generates media streams as outputs. Media streams can be grouped and labeled by the focus, for example using SDP [SDPMLABL]. This allows policies and operations to be directed against a particular stream.

A mixer is always under the control of a focus, either directly or indirectly. The focus is responsible for interpreting the media policy, and then installing the appropriate rules in the mixer. If the focus is directly controlling a mixer, the mixer can either be co-resident with the focus or be controlled through some kind of protocol.
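To make this division of labor concrete, the mixer below has no notion of a conference and simply turns labeled input streams into output streams according to a rule the focus has installed. The rule shown is one of the media policy examples given in Section 4.6 (audio from each participant mixed with equal weight and distributed to all other participants), implemented as a so-called mix-minus so that no recipient hears its own input. This is a minimal sketch, assuming single frames of 16-bit PCM samples of equal length and saturating addition. The function name `mix_minus` and the frame representation are hypothetical and are not part of any XCON document.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """Produce one output frame per recipient from equal-weight inputs.

    `frames` maps a stream label (as assigned by the focus) to one frame of
    16-bit PCM samples. Each recipient receives the sum of every *other*
    input, saturated to the 16-bit range, so nobody hears their own audio.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    if any(len(frame) != length for frame in frames.values()):
        raise ValueError("all input frames must have the same length")
    # Sum of all inputs per sample position, computed once.
    totals = [sum(frame[i] for frame in frames.values()) for i in range(length)]
    outputs: dict[str, list[int]] = {}
    for label, frame in frames.items():
        # Remove the recipient's own contribution, then clip to int16.
        outputs[label] = [
            max(INT16_MIN, min(INT16_MAX, totals[i] - frame[i]))
            for i in range(length)
        ]
    return outputs
```

Subtracting each recipient's own input from a precomputed total keeps the cost linear in the number of participants, rather than re-summing all other inputs for every recipient.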
If the focus is indirectly controlling a mixer, it delegates the mixing to the participants, each of which has its own mixer. This is described in the context of SIP in Section 6.4 of [SIPCONFW].

A mechanism to manipulate and describe the media mixing for the various media types is described in the Media Policy Control document [XCONMPCP], with scenarios defined in [XCONSCEN].

4.4 Conference Notification Service

The focus can provide a conference notification service. In this role, the focus allows authenticated clients to request notification of conference state updates (e.g. in SIP using [RFC3265]). Once an XCON conference-aware entity has requested such notifications, it will receive conference state update information at appropriate times. The conference state is composed of both focus state and conference policy state, and the endpoint will be informed of changes in either. The notification protocol selected might provide a mechanism for limiting the information provided by the conference notification service (e.g. capabilities defined in the SIP events framework [RFC3265] allow requests to receive focus state changes only, conference policy state changes only, or both).

The state of the focus includes the participants connected to the focus, and detailed information regarding each connection. As new participants join, this state changes, and the change is reported through the notification service. Similarly, when a participant leaves, this state also changes, allowing entities that have registered an interest to learn of the event.

As described previously, the conference policy includes the membership policy and the media policy. As those policies change, whether through use of the CPCP, direct change by the focus, or through an application, the conference notification service informs entities that have registered an interest of these changes.
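The filtering behavior described above, where a subscriber asks for focus state changes only, conference policy state changes only, or both, can be sketched as follows. In a SIP deployment this role is played by SUBSCRIBE/NOTIFY [RFC3265]. The classes here are hypothetical, protocol-neutral stand-ins, and the callback simply represents delivering a notification to the subscriber.

```python
from dataclasses import dataclass, field
from enum import Flag
from typing import Callable


class StateKind(Flag):
    FOCUS = 1              # participants connected to the focus, connection details
    POLICY = 2             # membership and media policy
    BOTH = FOCUS | POLICY


Callback = Callable[[str, str], None]


@dataclass
class NotificationService:
    # client id -> (kinds of state the client asked for, delivery callback)
    subscribers: dict[str, tuple[StateKind, Callback]] = field(default_factory=dict)

    def subscribe(self, client: str, authenticated: bool,
                  kinds: StateKind, callback: Callback) -> bool:
        """Register interest in state updates; only authenticated clients."""
        if not authenticated:
            return False
        self.subscribers[client] = (kinds, callback)
        return True

    def publish(self, kind: StateKind, update: str) -> int:
        """Deliver an update to every subscriber whose filter matches it.

        Returns the number of subscribers notified.
        """
        delivered = 0
        for client, (kinds, callback) in self.subscribers.items():
            if kind & kinds:  # non-empty intersection means the filter matches
                callback(client, update)
                delivered += 1
        return delivered
```

A client interested only in who is connected subscribes with `StateKind.FOCUS` and is not disturbed by media policy changes; a moderator would typically subscribe with `StateKind.BOTH`.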
4.5 Participants

This framework defines a participant as an endpoint that has a signaling relationship with the focus. Note that a participant can also be another focus. A conference that has a participant that is the focus of another conference is called a cascaded conference. Cascaded conferences can also be used to provide scalable conferences in which there are regional sub-conferences, each connected to the main conference.

A participant may support a CPCP, the conference notification interface, and/or a floor control protocol in order to make full use of the XCON functionality described in this framework.

4.6 Conference Policy

The conference policy contains the rules that guide the operation of the focus. The rules can be simple, such as an access list that defines the set of allowed participants in a conference. The rules can also be complex, such as time-of-day based rules on participation that are conditional on the presence of other participants. There is no restriction on the type of rules that can be encapsulated in a conference policy.

The conference policy can be manipulated using web applications or voice applications. It can also be manipulated with proprietary protocols. A conference policy control protocol (CPCP) is proposed as a standardized means of manipulating the conference policy, as described in the CPCP requirements [XCONCPRQ]. An [XML] data schema enabling a user to define a conference policy is defined in [XCONCPCP]. The assignment of privileges allowing a user to manipulate the conference policy is defined in [XCONCPRV]. The XML Configuration Access Protocol (XCAP) is proposed as one protocol mechanism [XCONCPXC] to store and manipulate the conference policy data. By the nature of conference policies, not all aspects of the policy can be manipulated with a conference policy control protocol.

The conference policy includes the membership policy and the media policy.
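The two kinds of rule mentioned at the start of this section, a simple access list and a time-of-day rule conditional on the presence of another participant, can be sketched as composable predicates over a join attempt. This is an illustrative sketch only: it assumes a policy is a list of rules that must all allow a join, and the names `JoinContext`, `access_list`, `during_hours_with` and `admit` are hypothetical rather than taken from the XML schema in [XCONCPCP].

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable


@dataclass(frozen=True)
class JoinContext:
    user: str
    now: time                # local time of the join attempt
    present: frozenset[str]  # users already in the conference


Rule = Callable[[JoinContext], bool]


def access_list(allowed: set[str]) -> Rule:
    """Simple rule: only users on the list may participate."""
    return lambda ctx: ctx.user in allowed


def during_hours_with(start: time, end: time, required: str) -> Rule:
    """Complex rule: joins are allowed only between `start` and `end`,
    and only while the `required` user (e.g. a chair) is present."""
    return lambda ctx: start <= ctx.now <= end and required in ctx.present


def admit(rules: list[Rule], ctx: JoinContext) -> bool:
    """Assumed combination: a join is allowed only if every rule allows it."""
    return all(rule(ctx) for rule in rules)
```

Because rules are plain predicates, further conditions can be added without changing `admit`, in keeping with the statement that there is no restriction on the type of rules a conference policy can encapsulate.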
The membership policy includes per-participant policies that specify how the focus is to handle a particular participant, for example whether or not the participant is anonymous.

The media policy describes the way in which the set of inputs to a mixer is combined to generate the set of outputs. Media policies can span media types. In other words, the policy on how one media stream is mixed can be based on characteristics of other media streams. Media policies can be based on any quantifiable characteristic of the media stream (its source, volume, codecs, speaking/silence, etc.), and they can be based on internal or external variables accessible by the media policy. Some examples of media policies include:

o The video output is the picture of the loudest speaker (video follows audio).

o The audio from each participant is mixed with equal weight, and distributed to all other participants.

o The audio and video that is distributed is the one selected by the floor control server.

[Editor's note: Will provide more media policy detail in next revision of this document.]

5. Common Operations

There are a large number of ways in which users can interact with a conference. They can join, leave, set policies, approve members, and so on. This section gives an overview of the major conferencing operations, summarizing how they operate. In addition, it addresses how some of the scenarios identified in [XCONSCEN] can be realized with the functionality provided by the components. The SIP-specific mechanisms for some of these common operations are described in [SIPCONFW].

Note that non-automated means, such as a web page or an IVR interface, could be used for these operations. However, these are outside the scope of this framework, which defines automated means and protocols.

5.1 Creating Conferences

There are many ways in which a conference can be created. The creation of a conference actually constructs several elements all at the same time.
It results in the creation of a focus and a conference policy. It also results in the construction of a conference URI, which uniquely identifies the focus. Since the conference URI needs to be unique, the element that creates conferences is responsible for guaranteeing that uniqueness. This can be accomplished deterministically, by keeping records of conference URIs or by generating URIs algorithmically, or probabilistically, by creating random URIs with a sufficiently low probability of collision.

When a media policy and a conference policy are created, they are established with default rules that are implementation dependent. If the creator of the conference wishes to change those rules, they would do so using, for example, a conference policy control protocol (CPCP). Of course, using a CPCP requires that an element know the URI for manipulating the policy. That requires a means to learn the conference policy URI from the conference URI, since the conference URI is frequently the sole result returned to the client as a result of conference creation. Any other URIs associated with the conference are learned through the conference notification service, in which they are carried as elements of the notifications.

5.1.1 CPCP Mechanisms

An XCON conference instance can be created through interaction with the conference policy server, as described in Section 4.2. The creation process involves the creation of a membership policy resource as defined in [XCONCPCP]. The protocol interaction between the requesting entity and the policy server is defined in separate XCON documents, such as [XCONCPXC]. A new membership policy resource is required to conform to the schema detailed in [XCONCPCP]. In many cases, the creator of the conference policy resource is the sole user with access rights to the conference policy, and other users have no rights to either view or modify the document.
However, some scenarios require different privileges, allowing other users to modify certain parts of the conference policy XML document. The mechanism to provide these user privileges is defined in [XCONCPRV]. The constraints imposed on the creation of a new conference instance using this method must be enforced by the conference policy server; any additional constraints are subject to local policy (e.g. maintenance and handling of unique conference URIs). A successful membership policy creation results in the automatic generation of all other required conference state components. If not otherwise specified, the mandatory conference state components include a default media policy and its URI, floor control and its URI, a focus instance, a Signaling I/F instance, etc.

5.2 Adding Participants

There are many mechanisms for adding participants to a conference. These include the Signaling I/F (described for SIP in [SIPCONFW]), the conference policy control protocol, and non-automated means. In all cases, participant additions can be first party (a user adds themselves) or third party (a user adds another user).

5.2.1 CPCP Mechanisms

The conference membership policy semantics are defined in [XCONCPCP]. The semantics allow participants of a conference instance to be added both at instantiation and during the lifetime of a conference. A request to add participants must comply with the constraints detailed in [XCONCPRV]; violations result in failure of the operation. The supporting protocol interaction between the requesting entity and the policy server is defined in separate XCON documents, such as [XCONCPXC]. A successful participant addition results in implicit operations that complement the updated membership policy.
These include the creation and application of media policy, the triggering of conference notification service messages, and the appropriate focus signaling over the Signaling I/F. The floor control server should also be able to accept floor control requests from the added participants.

5.3 Conditional Joins

Conference policies are installed during conference instantiation to define both the membership and media policies for a unique conference instance. Defining conference policy can be a bidirectional process, because a participant might only wish to join the conference instance if certain policies are set in a particular way. How far such manipulation is possible depends on the security policies enforced by the conference policy server. Examples can be given for both media and membership policy.

On receiving a conference URI, an XCON-aware endpoint can use the appropriate policy interface to manipulate conference policy before joining. For example, a user might wish to enter the conference instance anonymously. The user can achieve this by manipulating the conference policy before accepting a conference invitation. The participant could then join the conference instance by sending an XCON establish request to the focus. The endpoint's identity would not be revealed to the remaining participants, although they would still be informed that a new participant has joined. The same approach can be applied to any conference policy feature.

Similar examples can be given for media policy. Following on from the previous example, the 'Anonymous' participant may be a supervisor who only wishes to observe a current conference instance. The XCON-capable endpoint would manipulate the media policy (using the appropriate XCON interface) before joining the conference instance.
This might involve muting the endpoint's input media stream so that it can observe the output media without injecting any media into the mix. Requiring, as in these examples, that particular conference policy features be in force before joining a conference instance is referred to as a conditional join.

5.4 Removing Participants

A client can use CPCP to remove any participant (including itself) as long as the semantics defined in [XCONCPCP] are obeyed and the initiator of the request has sufficient authentication/authorization as defined in [XCONCPRV]. When CPCP is used for this purpose, the focus uses the signaling interface to send a termination request to the participant being removed. The focus also executes any other signaling needed to remove the participant (for example, manipulating other signaling connections). The change in membership policy results in focus-initiated updates of conference state through the conference notification service and the signaling interface.

The conference policy control protocol can also be used to remove a large number of users at once. This is generally referred to as mass ejection.

5.5 Creating Sidebars

A sidebar is a "conference within a conference", allowing a subset of the participants to converse amongst themselves. Participants in a sidebar frequently still receive media from the main conference, but "in the background". For audio, for example, this may mean that the volume of the main conference media is reduced.

A sidebar is represented by a separate conference URI. This URI is a type of "alias" for the main conference URI, and both route to the same focus. Like any other conference URI, the sidebar conference URI has a conference policy and a media policy associated with it. As with any other conference, one can join it by sending an establish request to this URI, or ask others to join by referring them to it. However, it differs from a normal conference URI in several ways.
First, users in the main conference do not need to establish a separate signaling relationship to the sidebar conference. The focus recognizes the sidebar as a special URI and uses the existing dialog to the main conference as a "virtual" connection to the sidebar URI.

The second difference is the way in which conference and media policies are implemented. If the conference policy control protocol is used to add a user to a normal conference, the focus typically uses the signaling interface to send the participant an establish request asking them to join. A sidebar works differently. If the conference policy control protocol is used to add a user to the sidebar, and that user is already part of the main conference, the focus uses the conference notification service to alert the existing participant that they have been asked to join the sidebar. The invited user can then use the CPCP to be formally added to the sidebar. Further detail on sidebars is provided in [XCONSIDE].

5.6 Destroying Conferences

Conferences can be destroyed in several ways. Whether a particular method applies to a given conference is generally determined by the conference policy.

When a conference is destroyed, the conference and media policies associated with it are destroyed. Any attempt to read or write those policies results in a protocol error. The conference URI also becomes invalid. Any attempt to send an establish request to it, or to request conference notifications from it, results in an error response.

Typically, if a conference is destroyed while participants remain, the focus sends each remaining participant a teardown request before destroying the conference. Similarly, the server terminates any subscriptions to the conference notification service before the conference is destroyed.
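The destruction sequence described above can be illustrated with a small, non-normative Python sketch. The sequence has three steps: tear down the remaining participants, terminate notification subscriptions, and then discard the conference and its policies so that later requests fail. The Focus and Conference classes, the ConferenceNotFound error, and the example URI are hypothetical and do not correspond to any XCON protocol element. The "signaling" is simply recorded in a log list.

```python
from dataclasses import dataclass, field


class ConferenceNotFound(Exception):
    """Hypothetical error for any request aimed at a destroyed or unknown conference URI."""


@dataclass
class Conference:
    uri: str
    participants: set = field(default_factory=set)  # users with a signaling connection
    subscribers: set = field(default_factory=set)   # users subscribed to notifications
    conference_policy: dict = field(default_factory=dict)
    media_policy: dict = field(default_factory=dict)


class Focus:
    """Toy focus: signaling is appended to self.log instead of being sent."""

    def __init__(self) -> None:
        self.conferences: dict = {}
        self.log: list = []

    def _lookup(self, uri: str) -> Conference:
        if uri not in self.conferences:
            raise ConferenceNotFound(uri)
        return self.conferences[uri]

    def create(self, uri: str) -> None:
        self.conferences[uri] = Conference(uri)

    def establish(self, uri: str, user: str) -> None:
        self._lookup(uri).participants.add(user)

    def subscribe(self, uri: str, user: str) -> None:
        self._lookup(uri).subscribers.add(user)

    def read_policy(self, uri: str) -> dict:
        return self._lookup(uri).conference_policy

    def destroy(self, uri: str) -> None:
        conf = self._lookup(uri)
        # 1. Tear down the remaining participants first.
        for user in sorted(conf.participants):
            self.log.append(f"teardown {user}")
        # 2. Then terminate notification subscriptions.
        for user in sorted(conf.subscribers):
            self.log.append(f"terminate-subscription {user}")
        # 3. Only now discard the conference and both policies; the URI becomes invalid.
        del self.conferences[uri]
        self.log.append(f"destroyed {uri}")


focus = Focus()
uri = "sip:conf1@example.com"  # hypothetical conference URI
focus.create(uri)
focus.establish(uri, "alice")
focus.subscribe(uri, "carol")
focus.destroy(uri)
print(focus.log)  # ['teardown alice', 'terminate-subscription carol', 'destroyed sip:conf1@example.com']
```

After destroy() returns, read_policy(), establish() and subscribe() on the same URI all raise ConferenceNotFound. This mirrors the error responses required above.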
5.6.1 CPCP Mechanisms

A client can use a CPCP to destroy a conference instance as long as the semantics defined in [XCONCPCP] are obeyed and the initiator of the request has sufficient authentication/authorization as defined in [XCONCPRV]. When CPCP is used for this purpose, the focus first uses the signaling interface to send termination requests to all participants in the conference instance and to terminate all conference notification subscriptions. The focus also executes any other signaling needed to remove the conference instance (for example, manipulating other signaling connections). Once all relevant signaling has occurred, the focus instance and all related policy state information can be destroyed.

5.7 Obtaining Membership Information

A participant in a conference will frequently wish to know the set of other users in the conference. This information can be obtained in many ways.

5.7.1 CPCP Mechanisms

A client can use the CPCP to retrieve the members of a conference instance as long as the semantics defined in [XCONCPCP] are obeyed and the client has the necessary privilege as defined in [XCONCPRV]. The supporting protocol interaction between the requesting entity and the policy server for carrying out the retrieval is defined in separate XCON documents, such as [XCONCPXC].

5.8 Adding and Removing Media

Each conference is composed of a particular set of media that the focus manages. For example, a conference might contain a video stream and an audio stream. Participants can change the set of media streams that make up the conference. When that set changes, the focus needs to send a modify request to each participant to add the media stream to, or remove it from, that participant's session.

When a media stream is being added, a participant can reject the offered stream, in which case it will neither receive nor contribute to that stream.
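The distinction between the conference's set of media and each participant's involvement in it can be sketched as follows. This non-normative Python model rests on several simplifying assumptions. Stream names stand in for SDP media lines, and each modify request is recorded as a (user, action, stream) tuple. Participant rejections are supplied up front through the hypothetical rejects argument rather than through real offer/answer exchanges.

```python
from dataclasses import dataclass, field


@dataclass
class MediaConference:
    streams: set = field(default_factory=set)         # media that make up the conference
    participants: dict = field(default_factory=dict)  # user -> set of streams they accepted
    modify_requests: list = field(default_factory=list)

    def add_stream(self, stream: str, rejects: frozenset = frozenset()) -> None:
        """Add a stream and offer it to every participant; users in `rejects` decline it."""
        self.streams.add(stream)
        for user, accepted in self.participants.items():
            self.modify_requests.append((user, "add", stream))
            if user not in rejects:
                accepted.add(stream)
        # A rejection only affects that participant: the stream stays in self.streams.

    def remove_stream(self, stream: str) -> None:
        """Remove a stream from the conference and from every participant's session."""
        self.streams.discard(stream)
        for user, accepted in self.participants.items():
            self.modify_requests.append((user, "remove", stream))
            accepted.discard(stream)


conf = MediaConference(streams={"audio"}, participants={"alice": {"audio"}, "bob": {"audio"}})
conf.add_stream("video", rejects=frozenset({"bob"}))
print(sorted(conf.streams))               # ['audio', 'video']
print(sorted(conf.participants["bob"]))   # ['audio']
```

After bob rejects it, the video stream still belongs to the conference, and alice continues to send and receive it.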
Rejection of a stream by a participant does not imply that the stream is no longer part of the conference, only that the participant is not involved in it. A media stream can be added to or removed from a conference in several ways.

5.8.1 MPCP Mechanisms

A client can use the MPCP to add or remove media streams of a conference instance as long as the semantics defined in [XCONMPCP] are obeyed and the initiator of the request has sufficient authentication/authorization. The supporting protocol interaction between the requesting entity and the policy server for adding or removing media is defined in separate XCON documents, such as [XCONMPCP].

Adding media to, or removing media from, a conference instance results in focus operations such as updates to connection signaling and notification service updates over the signaling interface. Media updates also affect media policy and floor control (e.g., creation or deletion of a conference floor).

5.9 Conference Announcements and Recordings

Conference announcements and recordings play a key role in many real conferencing systems.
Examples of such features include:

o  Asking a user to state their name before joining the conference, in order to support a roll call

o  Allowing a user to request a roll call, so they can hear who else is in the conference

o  Allowing a user to press some keys on their keypad in order to record the conference

o  Allowing a user to press some keys on their keypad in order to be connected with a human operator

o  Allowing a user to press some keys on their keypad to mute or un-mute their line

                          User 1
                       +-----------+
                       |           |
                       |Participant|
                       |     1     |
                       |           |
                       +-----------+
                             |Signaling
                             |I/F 1               Conference
                             |                      Policy
  User 2                 +---|--------+             Server
+-----------+            |            |                |
|           |            |            |                |CPCP
|           |            |            |          *************
|Participant|------------|   Focus    |----------*Participant*
|     2     | Signaling  |            |Signaling *     4     *
|           |   I/F 2    |            |  I/F 4   *           *
+-----------+            +---|--------+          *************
                             |Signaling           Application
                             |I/F 3
                       +-----------+
                       |           |
                       |Participant|
                       |     3     |
                       |           |
                       +-----------+
                          User 3

                          Figure 4

In this framework, these capabilities are modeled as an application that acts as a participant in the conference. This is shown in Figure 4. The conference has four participants. Three of them are end users, and the fourth is the announcement application.

If the announcement application wishes to play an announcement to all conference members (for example, to announce a join), it simply sends media to the mixer as any other participant would. The announcement is mixed in with the conversation and played to the participants. The application would already have used the appropriate XCON interface to configure a media policy that lets the media functions act in this role. For example, the application's input stream to the mix would be enabled, while the gain on its output stream would be set to mute.
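The media policy settings described above can be pictured as per-participant routing rules applied by the mixer. The following non-normative Python sketch uses a hypothetical MediaPolicy with three fields. The first says whether a participant's input is mixed, the second optionally restricts who hears it, and the third says whether the participant receives the mix. Audio is modelled as short lists of integer samples that are simply summed. Real mixers and the actual XCON media policy schema are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaPolicy:
    send_enabled: bool = True          # is this participant's input mixed at all?
    audience: Optional[set] = None     # None: everyone may hear it; else only these users
    receive_enabled: bool = True       # False: output gain muted, participant hears silence


class Mixer:
    def __init__(self, policies: dict) -> None:
        self.policies = policies  # participant name -> MediaPolicy

    def mix_for(self, listener: str, frames: dict) -> list:
        """Sum the sample frames that `listener` is allowed to hear (never its own)."""
        length = len(next(iter(frames.values())))
        out = [0] * length
        if not self.policies[listener].receive_enabled:
            return out
        for source, samples in frames.items():
            src = self.policies[source]
            if source == listener or not src.send_enabled:
                continue
            if src.audience is not None and listener not in src.audience:
                continue
            out = [a + b for a, b in zip(out, samples)]
        return out


policies = {
    "user1": MediaPolicy(), "user2": MediaPolicy(), "user3": MediaPolicy(),
    # Announcement application: its input is mixed, but it receives no output.
    "announcer": MediaPolicy(receive_enabled=False),
}
mixer = Mixer(policies)
frames = {"user1": [1, 1], "user2": [2, 2], "user3": [4, 4], "announcer": [100, 100]}
print(mixer.mix_for("user1", frames))  # [106, 106]: the announcement reaches everyone
policies["announcer"].audience = {"user2"}  # restrict the announcement to one user
print(mixer.mix_for("user1", frames))  # [6, 6]
```

The same audience field could model the roll-call "disconnect" step, where a user's media is routed only to the application.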
Similarly, the announcement application can play an announcement to a specific user by using the CPCP to configure its media policy so that only the target user hears the media it generates. The application then generates the desired announcement, which only the selected recipient hears.

The announcement application can also receive input from a specific user through the conference. The announcement application would use a CPCP to have in-band DTMF dropped from the mix and sent only to itself. When a user wishes to invoke an operation, such as obtaining a roll call, the user presses the appropriate key sequence. Only the announcement application hears that sequence. Once the application determines that the user wishes to hear a roll call, it can use the CPCP to set the media policy so that media from that user is delivered only to the announcement application. This "disconnects" the user from the rest of the conference so they can interact with the application. Once the interaction is done, the announcement application uses the CPCP to "reconnect" the user to the conference.

5.10 Floor Control

Within this framework, floor control is defined as a mechanism that gives applications or users safe input access to the shared object or resource associated with a specific conference instantiation. That access may be either mutually exclusive or non-exclusive.

Floor control is managed by an entity referred to as a "chair". The chair does not have to be a participant in the conference. A floor chair is not required for grant, deny, or revoke operations. Such decisions can be generated automatically based on floor control policy (e.g., granting the floor based on queue position).

A floor control server is a logical entity that maintains the state of the floor(s). This state includes which floors exist, who the floor chairs are, who holds a floor, etc.
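The floor state listed above, and the option of automatic decisions when no chair is assigned, can be sketched as a small in-memory floor control server. This is a non-normative Python illustration, not BFCP [XCONBFCP]. The class and method names are hypothetical. Setting max_holders=1 models a mutually exclusive floor, and larger values model non-exclusive access. The automatic policy is a simple first-come, first-served queue.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Floor:
    floor_id: str
    chair: Optional[str] = None   # None: decisions are generated automatically
    max_holders: int = 1          # 1 = mutually exclusive; >1 = non-exclusive
    holders: list = field(default_factory=list)
    queue: deque = field(default_factory=deque)


class FloorControlServer:
    def __init__(self) -> None:
        self.floors = {}  # floor_id -> Floor

    def create_floor(self, floor_id: str, chair: Optional[str] = None, max_holders: int = 1) -> None:
        self.floors[floor_id] = Floor(floor_id, chair, max_holders)

    def request(self, floor_id: str, user: str) -> str:
        floor = self.floors[floor_id]
        if floor.chair is not None:
            floor.queue.append(user)
            return "pending"  # the chair will grant or deny
        if len(floor.holders) < floor.max_holders:
            floor.holders.append(user)
            return "granted"
        floor.queue.append(user)
        return "queued"  # granted automatically on reaching the head of the queue

    def decide(self, floor_id: str, chair: str, user: str, grant: bool) -> str:
        floor = self.floors[floor_id]
        if chair != floor.chair:
            raise PermissionError(f"{chair} does not chair floor {floor_id}")
        floor.queue.remove(user)
        if grant and len(floor.holders) < floor.max_holders:
            floor.holders.append(user)
            return "granted"
        return "denied"

    def release(self, floor_id: str, user: str) -> Optional[str]:
        """Release the floor; with no chair, grant it to the next queued user (if any)."""
        floor = self.floors[floor_id]
        floor.holders.remove(user)
        if floor.chair is None and floor.queue:
            next_user = floor.queue.popleft()
            floor.holders.append(next_user)
            return next_user
        return None


fcs = FloorControlServer()
fcs.create_floor("1")              # exclusive floor, no chair
print(fcs.request("1", "alice"))   # granted
print(fcs.request("1", "bob"))     # queued
print(fcs.release("1", "alice"))   # bob
```

Revocation and notifications to the chair or to other users are not modelled here.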
Requests to manipulate a floor are directed at the floor control server. The chair may use CPCP to enforce the resulting floor control decisions by manipulating the conference policy. However, the floor control protocol requirements identified in [XCONFCRQ] are independent of the use of CPCP. A proposal for a binary floor control protocol is defined in [XCONBFCP]. Figure 5 provides an overview of the functionality supported by the floor control protocol:

                         +---------+
                         |  Floor  |
                         |  Chair  |
                         |         |
                         +---------+
                            ^   |
                            |   |
               Notification |   | Decision
                            |   |
                Floor       |   v
  +---------+  Request   +---------+              +---------+
  |         |----------->|  Floor  | Notification |         |
  |  User   |            | Control |------------->|  User   |
  |         |<-----------|  Server |              |         |
  +---------+ Granted or +---------+              +---------+
                Denied

       Figure 5: Functionality provided by Floor Control Protocol

A floor has a 1:1 mapping with a media type contained within a conference instance. Such media is represented using the Session Description Protocol (SDP) [RFC2327]. Each media type, as defined by an 'm=' line in an SDP description, can have an associated floor if implemented. A correlation is needed so that media lines contained in SDP can be mapped to a floor instance in the media policy of the conference instance. This can be achieved with the media label attribute [SDPMLABL], which creates an identifier for an SDP media line, for example:

   m=audio 6967 RTP/AVP 0
   a=label:1

A floor defined within a media policy also has an identifying attribute that distinguishes it from other floors [XCONMDTP]. The value of this attribute maps directly to the value conveyed in the 'label' attribute. For example, the media line in the previous example has an SDP 'label' attribute value of '1'. The media policy for this conference instance would then also contain a floor definition whose identifying attribute equals '1'.
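The label-based correlation described above amounts to a lookup from 'a=label' values to 'm=' lines. The non-normative Python sketch below assumes that the media policy's floor identifiers are available as a plain set of strings. The actual floor attribute is still to be defined in [XCONMDTP]. The function names and the second (video) media line in the sample offer are illustrative only. The sketch does only the minimal line scanning needed here and is not a general SDP parser.

```python
def media_labels(sdp: str) -> dict:
    """Map each media-level 'a=label:' value to the 'm=' line it follows."""
    labels = {}
    current_m_line = None  # session-level attributes (before any m=) are ignored
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            current_m_line = line
        elif line.startswith("a=label:") and current_m_line is not None:
            labels[line[len("a=label:"):]] = current_m_line
    return labels


def correlate_floors(sdp: str, policy_floor_ids: set) -> dict:
    """Return floor id -> 'm=' line for every policy floor whose id matches an SDP label."""
    labels = media_labels(sdp)
    return {fid: labels[fid] for fid in sorted(policy_floor_ids) if fid in labels}


offer = (
    "v=0\r\n"
    "o=- 0 0 IN IP4 192.0.2.1\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.1\r\n"
    "t=0 0\r\n"
    "m=audio 6967 RTP/AVP 0\r\n"
    "a=label:1\r\n"
    "m=video 51372 RTP/AVP 31\r\n"
    "a=label:2\r\n"
)
print(correlate_floors(offer, {"1"}))  # {'1': 'm=audio 6967 RTP/AVP 0'}
```

Because the lookup is keyed on the label value, a change to the policy for floor '1' can be applied to the audio stream without depending on the order of media lines in the SDP.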
If the media policy is altered for this particular floor, the new policy can be applied to the correct media stream in the conference instance using this correlating identifier.

5.11 Whispering or Private Messages

A whisper is a private message sent between participants in a conference or a conference sidebar. A whisper manifests itself as a temporary alteration to the media policy. The alteration instructs the mixer to temporarily restrict the distribution of a particular media stream to a single conference participant or to a subset of the participants. The whispered media stream is marked as "private" so that the recipient can render it appropriately. For example, a private instant message could be rendered along with the rest of the messages in the conference, but in a different color or tagged as "private".

The way in which whispered media streams are marked as private depends on the type of the media stream. For example, an instant message could have a dedicated command for sending a private message, or an explicit indicator embedded in the message header. This indicator would instruct the mixer how to handle the message and the recipient how to render it.

Whether whispering is allowed in a conference is a configurable option. This option is set as part of the conference policy using CPCP, and support for this feature is negotiated with the focus when a participant joins the conference.

OPEN ISSUE: How will the whisper mode be set? It probably needs to be defined per media, so a natural place would then be the media policy?

The difference between a sidebar and a whisper is that a sidebar creates a context for the (potentially) private discussion. A whisper, by contrast, is logically part of the existing context of the conference or conference sidebar and establishes no additional context.

6. XCON Data Model

This section defines a data model that supports and expands upon the fundamental logical conferencing model defined in Figure 2. This model provides the basis for the functionality realized by the protocols and mechanisms defined in the individual XCON documents referenced in the previous sections of this document.

[Editor's note: Nice ascii art diagram to be inserted here once we've finalized a model using some friendlier drawing tools.]

7. Security Considerations

The framework put forth in this draft introduces signaling interfaces that are subject to a variety of potential threats. Each of the specific protocols defined in support of this framework must adequately address those threats.

8. IANA Considerations

This draft introduces no considerations for IANA.

Informational References

[RFC2327] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

[RFC2396] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 2396, August 1998.

[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[RFC3265] Roach, A., "Session Initiation Protocol (SIP)-Specific Event Notification", RFC 3265, June 2002.

[SDPMLABL] Levin, O. and G. Camarillo, "The SDP (Session Description Protocol) Label Attribute", draft-ietf-mmusic-sdp-media-label-00.txt, Work in Progress, September 28, 2004.

[SIPCONFW] Rosenberg, J., "A Framework for Conferencing with the Session Initiation Protocol", draft-ietf-sipping-conferencing-framework-02, Work in Progress, June 29, 2004.

[XCONBFCP] Camarillo, G., Ott, J. and K. Drage, "The Binary Floor Control Protocol (BFCP)", draft-ietf-xcon-bfcp-00.txt, Work in Progress, July 6, 2004.

[XCONCPRQ] Koskelainen, P. and H. Khartabil, "Requirements for Conference Policy Control Protocol", draft-ietf-xcon-cpcp-reqs-04, Work in Progress, August 12, 2004.

[XCONCPCP] Khartabil, H., Koskelainen, P. and A. Niemi, "The Conference Policy Control Protocol (CPCP)", draft-ietf-xcon-cpcp-01, Work in Progress, October 12, 2004.

[XCONCPRV] Khartabil, H. and A. Niemi, "Privileges for Manipulating a Conference Policy", draft-ietf-xcon-conference-policy-privileges-01, Work in Progress, October 12, 2004.

[XCONCPXC] Khartabil, H., "An Extensible Markup Language (XML) Configuration Access Protocol (XCAP) Usages for Conference Policy Manipulation and Conference Policy Privileges Manipulation", draft-ietf-xcon-cpcp-xcap-03, Work in Progress, October 12, 2004.

[XCONFCRQ] Koskelainen, P., Ott, J., Schulzrinne, H. and X. Wu, "Requirements for Floor Control Protocol", draft-ietf-xcon-floor-control-req-01.txt, Work in Progress, July 19, 2004.

[XCONMPCP] Jennings, C. and B. Rosen, "Media Conference Server Control for XCON", draft-jennings-xcon-media-control-01, Work in Progress, July 12, 2004.

[XCONMDTP] Boulton, C., TBD.

[XCONSCEN] Even, R. and N. Ismail, "Conferencing Scenarios", draft-ietf-xcon-conference-scenarios-02.txt, Work in Progress, June 2004.

[XCONSIDE] Rosen, B. and A. Johnston, "SIP Conferencing: Sub-conferences and Sidebars", draft-rosen-xcon-conf-sidebars-01, Work in Progress, July 16, 2004.

[XML] Bray, T., Paoli, J., Sperberg-McQueen, C. and E. Maler, "Extensible Markup Language (XML) 1.0 (Second Edition)", W3C REC-xml-20001006, October 2000.

Acknowledgements

The initial text for this framework was based on [SIPCONFW] and modified to provide the more general context of this framework. The excellent work of Jonathan Rosenberg and the original conferencing design team in providing the starting point for this framework document is therefore much appreciated.
The constructive input and guidance from Alan Johnston on this document are appreciated. Aki Niemi provided the initial text for the section on "whispering". And, of course, the ongoing work of the XCON WG in shaping the content of this draft is appreciated.

Authors' Addresses

Mary Barnes
Nortel Networks
2380 Performance Drive
Richardson, TX
USA

Phone: 1-972-684-5432
Email: mary.barnes@nortelnetworks.com

Chris Boulton
Ubiquity Software
Langstone Park
Newport, South Wales, UK, NP18 2LH

Phone: +44 (0)1633 765600
Email: cboulton@ubiquitysoftware.com

Full Copyright Statement

Copyright (C) The Internet Society (2004). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.