                                                              O. Levin
Internet Draft                                               RADVISION
Document: draft-levin-sip-for-video-01.txt                     R. Even
                                                               Polycom
Expires: July 2002                                       February 2002


      Multimedia Conferencing Requirements for SIP Based Systems

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This document outlines requirements for SIP-based entities essential
for participation and optional hosting of multimedia conferencing.
The requirements are grouped in two categories: point-to-point
primitives and conferencing (i.e. "beyond two participants")
primitives. A separate document will suggest the mapping of the
requirements to corresponding protocols' primitives.

Levin, Even               Expires: July 2002                         1

Table of Contents

STATUS OF THIS MEMO
ABSTRACT
MOTIVATION
REQUIREMENTS' CLASSIFICATION
CONFERENCING CONTROL
   GENERAL
   CONFERENCE DESCRIPTION
   CONFERENCE IDENTIFICATION
   COMMANDS
   INDICATIONS
   VISUALIZATION COMMANDS
CHAIR CONTROL
CONFERENCE HIERARCHY
POINT-TO-POINT AUTONOMOUS MEDIA CONTROL
   VIDEO FULL PICTURE FAST UPDATE REQUEST
   VIDEO GOB FAST UPDATE REQUEST
   VIDEO MB FAST UPDATE REQUEST
   VIDEO FREEZE PICTURE REQUEST
   VIDEO CHANGE SPATIAL TRADEOFF REQUEST
   VIDEO MB NOT DECODED INDICATION
   LOST PICTURE AND LOST PARTIAL PICTURE INDICATION
POINT-TO-POINT CALL (LEG) ESTABLISHMENT AND MANAGEMENT
   DEFAULT CAPABILITIES' LOWEST COMMON DENOMINATOR
   CAPABILITIES' EXCHANGE
   APPLICATION DRIVEN MEDIA CONTROL
POINT-TO-POINT APPLICATION CONTROL
   REMOTE DEVICE CONTROL
   T.120 AND OTHER APPLICATIONS' STREAMS
CONVENTIONS USED IN THIS DOCUMENT
SECURITY CONSIDERATIONS
AUTHOR'S ADDRESSES
REFERENCES

Motivation

The goal of this document is to identify requirements for SIP
entities in order to participate in, and optionally host, multimedia
conferencing applications in a standard, interoperable manner.

The requirements are grouped according to their subjects and are
presented as lists of primitives with motivation for each.

A separate document will provide the mapping from the requirements
to existing protocols' primitives (whenever possible) and suggest
directions for required extensions (whenever the desired
functionality doesn't exist).

It is our goal that together both documents would provide a guide
for building interoperable SIP-based multimedia conferencing
applications.

Requirements' Classification

There are many ways to classify conferencing models, requirements,
and solutions. Documents [6-10] present a number of approaches.

We chose to arrange the requirements in two main categories: those
that relate to point-to-point procedures and those that relate to
conferencing (i.e. "more than two participants involved")
procedures. Support for both categories is required for building
multimedia conferencing applications. Each category is further
divided into sub-topics.

We believe that the primitives listed in this document are essential
for building the various conferencing models that are discussed in
[6]. A standard definition of these primitives would suffice for
building conferencing models using Conferencing Servers.

More generally, these requirements address multimedia systems
comprising terminals, gateways and conferencing servers (i.e. MCUs,
Multipoint Control Units).

The general model for multimedia communications does not differ from
that of "single media" communications, except for the presence of
more than one media stream, each related to the other streams in the
session. Thus, one of the goals is to provide the same architecture
and the same user experience for both audio-only and multimedia
conferencing.

Requirements related to support of video are presented as a separate
sub-topic of point-to-point procedures.

Conferencing Control

General

Some conferencing models [6] can be implemented with only minor
additions to the SIP baseline specification or by using
"out-of-band" conventions. Nevertheless, standard "advanced"
functionality and expressiveness are highly desirable.

Standard SIP primitives used for building various services SHOULD be
used by conferencing applications whenever possible.

The same conference may be perceived by its participants as
following different conferencing models (such as dial-in vs.
dial-out and conferencing server vs. end user mixing). Additionally,
the same user agent may participate in or host conferences of
different models. Therefore, different conferencing models SHOULD be
built out of common basic "conferencing" blocks.

We believe that in order to allow smooth interoperability between
services and future expansion of features, the conferencing service
should be explicitly identified.

It is highly RECOMMENDED that SIP entities that don't explicitly
support conferencing extensions be able to participate in SIP
conferences in a basic mode, i.e. without benefiting from the
additional information and functionality that these extensions would
provide.

The following should be specified:

Conference Description

Conference Description is a way of specifying a desired (but
unknown) conference in terms of its capabilities, modes, location,
etc. One example of using a Conference Description is creating a new
conference.

Conference Identification

A Global Conference Identifier should allow for matching existing
(active or reserved) conferences. One example of using the
Conference Identification is joining a specific conference. The
Conference Identifier SHALL be included in each conference-related
primitive (such as commands and indications).

Note: It is a separate design issue how to signal these values: new
vs. existing methods, new vs. existing headers, and INVITE vs.
Events model. Our preference is NOT to overload the existing fields,
because of the same service-interoperability issue.

Commands

A standard way of issuing the following actions should be defined:

   Create/Terminate Conference
   Invite/Disconnect Participant
   Get/Receive Conference Details
   Get/Receive Participant Details

Indications

A standard way of conveying the following indications should be
defined:

   Participant joined/left the conference
   Participant is seen by at least one other terminal
   Participant is seen by all
   Participant you are seeing

Visualization Commands

A standard way of issuing the following actions should be defined:

   Broadcast my session, listing selected streams (if not all of
   them are required).
   Broadcast another participant's stream

Chair Control

Chair commands, such as Chair Token Management, including:

   Make Me Chair
   Request Floor
   Cancel Chair

We suggest designing these commands using a general token mechanism
where the "chair" is one of the token types.

Conference Hierarchy

Conferencing identifiers and primitives SHALL allow for the future
creation of sub-conferences and for moving participants to and from
the sub-conferences.
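
As an illustration only, the conference control requirements above
(a Conference Identifier carried in every primitive, joined/left
indications, and chair control built on a general token mechanism)
could be modeled roughly as follows. This is a minimal sketch and not
a SIP mapping: the class and method names, the dictionary layout of
an indication, and the FLOOR token type are all assumptions made for
the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set


class TokenType(Enum):
    # "Chair" is just one token type; FLOOR is a hypothetical second one.
    CHAIR = "chair"
    FLOOR = "floor"


@dataclass
class Conference:
    """Hypothetical in-memory model of one conference (not a SIP API)."""
    conference_id: str
    participants: Set[str] = field(default_factory=set)
    token_holders: Dict[TokenType, Optional[str]] = field(default_factory=dict)

    def _indication(self, event: str, participant: str) -> dict:
        # Every conference-related primitive carries the Conference Identifier.
        return {
            "conference_id": self.conference_id,
            "event": event,
            "participant": participant,
        }

    def join(self, participant: str) -> dict:
        self.participants.add(participant)
        return self._indication("participant-joined", participant)

    def leave(self, participant: str) -> dict:
        self.participants.discard(participant)
        # Free any token the departing participant was holding.
        for token in list(self.token_holders):
            if self.token_holders[token] == participant:
                self.token_holders[token] = None
        return self._indication("participant-left", participant)

    def request_token(self, token: TokenType, participant: str) -> bool:
        """Grant the token if it is free or already held by the requester."""
        if participant not in self.participants:
            return False
        holder = self.token_holders.get(token)
        if holder is None or holder == participant:
            self.token_holders[token] = participant
            return True
        return False

    def release_token(self, token: TokenType, participant: str) -> bool:
        """Release the token only if the caller currently holds it."""
        if self.token_holders.get(token) == participant:
            self.token_holders[token] = None
            return True
        return False
```

In this sketch "Make Me Chair" corresponds to
request_token(TokenType.CHAIR, ...) and "Cancel Chair" to
release_token(TokenType.CHAIR, ...). A real design would also need
queuing, timeouts, and authorization, none of which are shown here.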

Point-to-Point Autonomous Media Control

This includes commands and indications exchanged between CODECs
(CODer DECoder) as a result of CODEC algorithms. In some cases it is
performed independently from the upper layers such as call control
and application actions.

These commands and indications are typical for video media.
Applications involving video are particularly prone to frequent
network changes causing packet loss, error conditions, etc.

The video information includes full picture frames and frames that
reflect changes from previous frames. Losing IP packets causes
synchronization problems for the decoder.

Various video-specific techniques have been used in today's networks
in order to cope with fluctuating conditions with minimum service
degradation. The required primitives (together with their
motivations) are presented below:

Video Full Picture Fast Update Request
Video GOB Fast Update Request
Video MB Fast Update Request

These commands are to be sent from a decoder to an encoder. Video
CODECs (such as H.261 [4] and H.263 [5]) have a notion of a
picture's building blocks: "full picture", Group of Blocks (GOB) and
MacroBlock (MB). The decoder is able to recognize degradation in
synchronization and explicitly request a "full picture", a whole GOB
or a whole MacroBlock from the encoder.

Video Freeze Picture Request

This command is to be sent from an encoder to a decoder. If the
encoder is aware of changes in the transmitted picture that would
cause loss of synchronization, it requests the decoding side to
freeze the picture, i.e. to stop presenting the changes, until a new
stable image is encoded and transmitted.

Video Change Spatial Tradeoff Request

This command is to be sent from a decoder to an encoder (if the
encoder has the capability to dynamically change the tradeoff). This
is a request to change the tradeoff between temporal and spatial
resolutions, i.e.
the tradeoff between the rate of the samples and the resolution of
the picture.

There are also indications from decoder to encoder that inform the
encoder of a problem but leave the decision about how to respond to
the encoder. These indications are usually tied to a CODEC
algorithm. They include:

Video MB Not Decoded Indication

This indication is used to inform the encoder not to use specific
macroblocks for prediction.

Lost Picture and Lost Partial Picture Indication

These indications are used to inform the encoder that it should use
some error resilience technique to solve the problem.

Point-to-Point Call (Leg) Establishment and Management

This section describes the requirements for establishing call legs
and managing them. Most of the requirements are similar across
different SIP systems and are not limited to multimedia conferencing
applications, but we include them here for completeness of the
model.

Default Capabilities' Lowest Common Denominator

In audio calls we expect all endpoints to support G.711 as a common
mode. We need to specify the mandatory video CODEC if we want to
have a high probability of establishing a video call with a common
CODEC. The typical common video CODEC, for enabling interoperability
with other protocols, is H.261 at QCIF resolution.

Capabilities' Exchange

During call setup there is a need to establish the session based on
the capabilities of the participants. Since an endpoint, which can
be a terminal, a gateway or an MCU, can support one or more streams,
there is a need to be able to specify alternate and simultaneous
capabilities.

The endpoint needs to be able to explicitly request symmetric CODECs
in the send and receive paths. This is important for gateways and
MCUs that may need to work symmetrically on all legs of the call. An
example is a gateway from SIP to H.320 [3]. These calls need
symmetric CODECs on the switched network side.
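
The symmetric CODEC requirement can be illustrated with a small
selection routine. This is a sketch under stated assumptions, not a
description of SDP or of any existing negotiation procedure: the
Capabilities type, its fields, and the function name are hypothetical,
and each endpoint is assumed to list the CODECs it can encode (send)
and decode (receive), with the send list in preference order.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical per-endpoint capability set for one media type."""
    send: Tuple[str, ...]     # CODECs the endpoint can encode, most preferred first
    receive: Tuple[str, ...]  # CODECs the endpoint can decode


def pick_symmetric_codec(local: Capabilities, remote: Capabilities) -> Optional[str]:
    """Return the first locally preferred CODEC usable in both directions.

    A CODEC qualifies only if both endpoints can send it and both can
    receive it, so the same CODEC runs on the send and receive paths.
    """
    for codec in local.send:
        if (codec in local.receive
                and codec in remote.send
                and codec in remote.receive):
            return codec
    return None  # no symmetric choice exists


# Example: a gateway preferring H.263 talking to a terminal that can
# decode H.263 but only encode H.261.
gateway = Capabilities(send=("H.263", "H.261"), receive=("H.263", "H.261"))
terminal = Capabilities(send=("H.261",), receive=("H.263", "H.261"))
chosen = pick_symmetric_codec(gateway, terminal)
```

Here the asymmetric option (H.263 toward the terminal, H.261 back)
is rejected and the common fallback H.261 is chosen, which is the
behavior a gateway toward a switched network would need.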

Video capabilities need to include the following parameters:

   Algorithm (H.261, H.263, MPEG)
   Maximum Bandwidth Supported
   Maximum Resolution Supported (QCIF, CIF, 4CIF, custom formats)
   Maximum Video Frame Rate

Application Driven Media Control

These are requirements for requesting a change of media stream or
bandwidth during the session. These requirements are typical for
multipoint and gateway calls when there is a need to change the
CODECs during the session.

Changing the bandwidth is required for the video stream in order to
adapt to congestion by reducing the video rate.

Flow control mechanisms for controlling the bandwidth per stream and
per session are important. The selected algorithm does not define
the video bandwidth, and a session can include many streams. The
need is to be able to define the bandwidth at the start of the
connection and to be able to change it during the session. The
reasons for bandwidth control include the need to gateway a video
stream from the packet network to the switched network, where the
bandwidth is defined and fixed [3]. A reason for a change during the
session is to adjust the maximum bandwidth depending on network
congestion, for example to reduce the video rate when packet loss is
high.

The requirement is to be able to specify the maximum and the exact
bandwidth to be used by the session and by a specific media stream.

A mode request mechanism is needed to request the sender to send a
stream with the requested parameters. The parameters include
algorithm, resolution, frame rate and algorithm-specific parameters
such as interlaced picture for H.263. In cases where the
communication is going to a gateway or MCU there is a need to
request symmetric streams. Terminals that cannot encode and decode
different streams may require this as well.
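
The video capability parameters and the mode request described above
can be sketched as two small records plus a check that a request
stays within the advertised limits. This is an illustration only:
the type and field names are hypothetical, custom picture formats are
not handled, and the congestion policy (halve the rate when loss
exceeds 5%, never below 64 kbit/s) is an arbitrary assumption rather
than anything this document specifies.

```python
from dataclasses import dataclass

# Standard picture formats in increasing size; custom formats are
# deliberately left out of this sketch.
RESOLUTION_ORDER = ["QCIF", "CIF", "4CIF"]


@dataclass(frozen=True)
class VideoCapability:
    """Hypothetical record of the capability parameters listed above."""
    algorithm: str
    max_bandwidth_kbps: int
    max_resolution: str
    max_frame_rate: float


@dataclass(frozen=True)
class ModeRequest:
    """Hypothetical request asking the sender for specific stream parameters."""
    algorithm: str
    bandwidth_kbps: int
    resolution: str
    frame_rate: float


def can_honor(cap: VideoCapability, req: ModeRequest) -> bool:
    """True if the request stays within the sender's advertised limits."""
    return (
        req.algorithm == cap.algorithm
        and req.bandwidth_kbps <= cap.max_bandwidth_kbps
        and RESOLUTION_ORDER.index(req.resolution)
        <= RESOLUTION_ORDER.index(cap.max_resolution)
        and req.frame_rate <= cap.max_frame_rate
    )


def reduce_bandwidth(current_kbps: int, loss_fraction: float,
                     floor_kbps: int = 64, threshold: float = 0.05) -> int:
    """Assumed policy: halve the video rate when packet loss is high."""
    if loss_fraction > threshold:
        return max(floor_kbps, current_kbps // 2)
    return current_kbps
```

A receiver observing heavy packet loss could, for example, compute a
lower rate with reduce_bandwidth() and send a ModeRequest carrying
that rate, provided can_honor() holds for the sender's capability.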

Point-to-Point Application Control

Remote Device Control

Far End Camera Control is a basic feature in video conferencing.
This feature enables a user to control the camera at the remote site
(zoom and pan). The feature is used, for example, in medical
applications, where the remote viewer can select the view without
having to interrupt the people performing the operation. A standard
way of doing this is required to enable interoperability of the
feature between different video conferencing (VC) terminals.

Far End Camera Control is a special case of a general Remote Device
Control.

It is required that the innate SIP/SDP objects (such as media
streams) be addressable at the level of the Remote Device Protocol,
so that a specific remote device (such as a camera) can be
associated with specific media streams.

T.120 and Other Applications' Streams

Standard integration with T.120 [2] is required. T.120 is a data
conferencing protocol. Microsoft NetMeeting is one of the well-known
T.120 clients. The requirement is to be able to have the T.120
communication and the video/audio session as components of a common
session (or conference).

T.120 is an example of a more general requirement to be able to
associate different types of application streams within a common
session (or conference).

Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [1].

Security Considerations

Security requirements will be covered in the next version of this
document.

Author's Addresses

Orit Levin
RADVISION
575 Corporate Drive              Phone: +1-201-529-4300
Mahwah, NJ USA                   Email: orit@radvision.com

Roni Even
Polycom
94 Derech Em Hamoshavot          Phone: +972-3-925-1200
Petach Tikva                     Email: roni.even@polycom.co.il
Israel

References

[1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
     Levels", BCP 14, RFC 2119, March 1997.

[2]  ITU-T Recommendation T.120 (1996), Data protocols for
     multimedia conferencing.

[3]  ITU-T Recommendation H.320 (1997), Narrow-band visual telephone
     systems and terminal equipment.

[4]  ITU-T Recommendation H.261 (1993), Video codec for audiovisual
     services at p x 64 kbit/s.

[5]  ITU-T Recommendation H.263 (1998), Video coding for low bit
     rate communication.

[6]  J. Rosenberg, H. Schulzrinne, "Models for Multi Party
     Conferencing in SIP",
     draft-ietf-sipping-conferencing-models-00.txt, Nov. 2001, IETF
     Draft, Work in progress.

[7]  H. Khartabil, "Conferencing using SIP",
     draft-khartabil-sip-conferencing-00.txt, Sep. 2001, IETF Draft,
     Work in progress.

[8]  R. Mahy, D. Petrie, "The SIP Join and Fork Headers",
     draft-mahy-sipping-join-and-fork-00.txt, Nov. 2001, IETF Draft,
     Work in progress.

[9]  I. Miladinovic, J. Stadler, "SIP Extension for Multiparty
     Conferencing", draft-miladinovic-sip-multiparty-ext-00.txt,
     Feb. 2002, IETF Draft, Work in progress.

[10] Wu/Koskelainen/Schulzrinne/Chen, "Use SIP and SOAP for
     conference floor control",
     draft-wu-sipping-floor-control-00.txt, Feb. 2002, IETF Draft,
     Work in progress.