R. Mahy
Internet Draft                                          Cisco Systems
Document: draft-mahy-sipping-signaled-digits-00.txt          Nov 2001
Expires: May 2002

Signaled Digits in SIP

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [RFC2026].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

1. Abstract

This document demonstrates a way for interested SIP User Agents that are not a party to the media of a call or session to receive SIP event notifications when signaled digits or other specific telephony-related events are detected. This is useful for a variety of applications that monitor calls for a specific event (e.g., a long pound, a special sequence of digits, or a fax signal) and--only then--take an active role in the monitored calls.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [RFC2119].

Throughout this document, the author refers to "DTMF" and other "tones" as audio media. Similar types of information conveyed as signaling are called "digits" or "signals". This convention is consistent with RFC2833 [AVT], which provides a more detailed discussion of the issue.

Mahy Expires: May 2002 1 SIP Signaled Digits

3.
Motivational text

RFC2833 "AVT Tones" is widely acknowledged within the IP telephony community as the best way to transport telephony-related tones between end systems which already terminate media using [RTP]. This approach maintains synchronization of speech audio with tone audio, tolerates loss, provides event duration and volume information, and avoids detection delay.

While AVT tones are ideal for conveying telephone-events in the context of straightforward sessions like a 2-party call or a simple, centrally mixed conference, other environments have additional or alternative requirements. These other environments include protocol translation, complex call control, and decomposed applications.

3.1. Protocol translation

Protocol translators between SIP and other IP protocols which use RTP (ex: H.323, MGCP, Megaco, RTSP) are frequently implemented as signaling-only entities which arrange for RTP media streams to travel directly between the final endpoints. This is an efficient arrangement in terms of limiting jitter and latency in the media, and allows the translator to support many more simultaneous sessions than if the translator terminated media as well.

The protocol translators may receive telephony-related events (especially signaled digits) via signaling. Likewise, a SIP 3pcc controller, or a protocol translator which uses a traditional CTI protocol for control (ex: TAPI, TSAPI, JTAPI), may receive CTI commands to "insert" digits which may have originated from another application (for example, a desktop call control application). The protocol translator or 3pcc controller may send these signals as RFC2833 media to the target SIP User Agent, or it may want to send a SIP signaled digit instead.

RTP implementations must be able to receive media from more than one source on the same receive port, so it would seem straightforward to send RTP to the target User Agent. This approach has two problems, however.
If the translator and the target SIP User Agent are separated by a firewall, it is likely that this traffic from a different IP address will be discarded. It is also unlikely that most low-end RTP implementations (IP phones and software User Agents) will render this additional media correctly. More problematic still, there is no mechanism to determine whether a SIP User Agent can properly insert telephony events received in an RTP stream separate from its other audio media.

This document proposes that the protocol translator send the audio/telephone-event MIME type defined in RFC2833 in the body of an INFO method to the target User Agent, for it to render. The INFO method means: here is some information relevant to the call. A valid way to handle information in this situation is to render that information.

Under this proposal, a User Agent MUST NOT send signaled digits or telephone-events using the INFO method if the event was ever represented as a tone (as media). Only signals originated as pure signaling MAY generate an INFO method. Failure to heed this requirement will result in double-detection of digits/events. If INFO is used incorrectly by a pair of PSTN gateways (for example), the source gateway may detect a digit, send an INFO request which is lost, and retransmit that request. The target gateway would send the original in-band tone to the PSTN when the audio media arrives. Later, when the INFO arrives, the target gateway would render the same tone again.

3.2. Complex call control

Some applications are interested in the telephony signals represented by telephony tones, but do not desire to be a party to the speech portion of the audio media. This document addresses the transport requirements of these signals in this context.
Synchronizing speech is a non-issue in these topologies, as there is no audio media with which to synchronize; SIP provides its own reliability mechanism to prevent loss; and since this proposal reuses the encoding specified in [AVT], volume and duration are preserved, and detection delay is minimized.

For example, in some application scenarios, a user contacts an application, places a new call in the context of the application (an "outcall"), and returns to the application after the new call is finished. Examples of such scenarios include calling card systems, voicemail or messaging systems which allow outgoing calls, and voice browsers or voice portals which allow outgoing calls.

All of these applications require a way for the user to get back to the application if something has gone wrong with the outgoing call (ex: wrong number), or if the user changes his or her mind. If the originating user is using a TDM telephone, or a simple IP endpoint, the application will typically expect a sequence of signaled digits (ex: a pound or hash (#) of long duration, three stars (*) in a row, etc.)

        +-------------+
        |             |
        | Originating |
        |    User     |
        |             |
        +-------------+
          |  ^      ^
          |  |      |
   NOTIFY |  | SIP  | RTP
          |  |      |
          v  v      v
+-------------+  +-------------+
|             |  |             |
| Waiting for |  | Target User |
|   trigger   |  | or Service  |
|             |  |             |
+-------------+  +-------------+

Below are several possible [SIP] topologies that would enable this type of behavior. Most of these approaches fall into two categories: the application could receive DTMF media corresponding to the signaled digits, or it could receive the signaled digits using SIP.

Below are three approaches to encoding this information as media. None of these approaches is very attractive.

- The application could relay all the media itself. This wastes network resources and is inefficient for the application.

- The application could set up a conference and INVITE itself to the conference.
  This method requires setting up a complex set of call legs and wastes network and conferencing resources. It also requires that the application verify that the tone media originated exclusively from the desired source, which may be impossible.

- The application could request "forked-media" [Forked-Media] of just the RFC2833 media. While the best media-related proposal, this method requires rather complex functionality in the "forking" UAs, requires [3pcc], and is problematic for firewalls because of the complexity of the [SDP]. Also, experience at interoperability tests shows that most current SDP implementations are much less robust than their SIP counterparts.

This draft will summarize a few non-media approaches as well:

- The application could expect to receive [INFO] messages containing a representation of the signaled digits. There are a number of disadvantages to this method as well: a) it requires 3pcc, b) there is no way to turn the INFO messages on or off, c) simultaneous use of AVT tones and INFO may cause double detection of events.

- The application could ask the originating UA to execute a script (ex: in [Java]) or render a markup language (ex: [VoiceXML]) to watch for an event and transfer the call back to the application. This is a very elegant solution, but requires significant resources and implementation on each UA.

- The application subscribes to a SIP-specific event-package which notifies the application of signaled digits from the originating UA (proposed in this document). Although this requires wide deployment on UAs, it is fairly easy to implement and works with both 3pcc and fully distributed call control models.

While this proposal only provides examples using signaled digits, it could be used to detect other telephony-related signals (for example FAX signals, or call progress signals). Notification of a telephone-event MUST NOT be used to generate a tone (using RFC2833 media or an INFO).

3.3.
Decomposition

The extensions in this document also allow for a clean decomposition of some services into media and signaling components. For example, below is a diagram of a VoiceXML browser split into media and non-media handling parts.

           +-------------+
           |             |
           |  VoiceXML   |
           | Interpreter |
           | (signaling) |
           +-------------+
             ^          ^
             |          |
         SIP |          | [RTSP]
             |          |
             |          |
             v          v
+-------------+        +-------------+
|             |        |             |
|   SIP UA    |  RTP   | RTSP Server |
|             |<------>|   (media)   |
|             |        |             |
+-------------+        +-------------+

The requirements are almost identical to the requirements for complex call control as discussed in the previous section.

4. Event Package Formal Definition

4.1. Event Package Name

This document defines a SIP Event Package as defined in [SUB/NTFY]. The event-package token name for this package is:

   "telephone-event"

4.2. Event Package Parameters

This package defines the following event package parameters: "events" and "duration". The following syntax specification uses the augmented Backus-Naur Form (BNF) as described in RFC-2234 [BNF]. This document adds a new event-package to the definition of event-package in the Event header.

   event-package  = tpackage | token
   tpackage       = "telephone-event" *[";" tparams ]
   tparams        = event-param | duration-param
   event-param    = "events" "=" numbers *["," numbers ]
   duration-param = "duration" "=" num
   numbers        = range | num
   range          = num "-" num

The duration parameter in a SUBSCRIBE indicates the number of milliseconds a signal must exist before it should be reported with a NOTIFY. If the duration parameter is not specified, the default duration is 40ms. The duration parameter MUST NOT appear in a NOTIFY.

The events parameter in a SUBSCRIBE specifies an event mask--the list of events in which the Subscriber is interested. The events parameter and its syntax are defined in RFC2833. The default event mask for this event-package is 0-15. The events parameter MUST NOT appear in a NOTIFY.

4.3.
SUBSCRIBE Bodies

This package does not define any SUBSCRIBE bodies.

4.4. Subscription Duration

Subscriptions to this event package MAY range from seconds to days. Subscriptions in minutes or hours are more typical and are RECOMMENDED.

4.5. NOTIFY Bodies

This package re-uses the audio/telephone-event MIME type defined in RFC2833. Each telephone-event consists of a four octet structure. Multiple telephone-events MAY be concatenated.

The structure of the audio/telephone-event MIME type is reproduced here from RFC2833 for the convenience of the reader.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     event     |E|R| volume    |          duration             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

When used with this event-package, the duration portion of telephone-event MUST be expressed in milliseconds (the rate used for this package is always 1000Hz).

4.6. Subscriber generation of SUBSCRIBE requests

Subscribers SUBSCRIBE to telephone-events for a period of minutes or hours, and automatically attempt to re-SUBSCRIBE when the subscription is half-expired. If re-subscription fails, the Subscriber SHOULD retry periodically.

The Subscriber SHOULD only SUBSCRIBE to events if it is no longer a party to the media of the subscribed call. Because it is impossible to perfectly synchronize a SUBSCRIBE with the reINVITE or transfer by which the subscriber divests itself of the media, it is sufficient for the application to ignore RFC2833 media which it receives while subscribed to this package for the same call.

The application MAY specify an event mask. The specified event mask MUST at least include the default events: 0-15.

The subscriber MAY specify a "minimum" duration in milliseconds. The subscriber can be assured that it will not receive any notifications for events which have been in progress for less than this duration.
The subscriber MUST still accept "final" events (the end bit is set) with shorter durations.

4.7. Notifier processing of SUBSCRIBE requests

If a SUBSCRIBE request arrives using the same call-leg as an existing call, the Notifier MAY authenticate the subscription request (this is RECOMMENDED). If a SUBSCRIBE request arrives using a new call-leg, the Notifier SHOULD authenticate the subscription request.

The Notifier MAY limit the duration of the subscription to an administrator defined amount of time.

4.8. Notifier generation of NOTIFY requests

Immediately after a subscription is accepted, the Notifier MUST send a NOTIFY with no body.

When the end of a telephone-event which matches the event-mask for a subscription is detected, the Notifier MUST send a NOTIFY message to all subscribers with matching event masks.

For each subscription, when a telephone-event a) is still in progress, b) matches the event-mask for a subscription, c) has occurred for at least the duration specified in a subscription, and d) has not yet triggered a NOTIFY, then the Notifier MUST send a NOTIFY message to that subscriber.

Note that events which are detected and end at approximately the same time (for example a DTMF "9" of 45ms) MUST generate two separate events (ex: detected at 40ms, ended at 45ms), even if their durations are the same (ex: ended at exactly 40ms).

The notifier MUST send a NOTIFY when it detects either the end of a subscribed event, or the continuation of a subscribed event for a sufficient duration. The notifier SHOULD NOT send events outside the subscribed event mask.

If the notifier detects that an event has begun and continued for at least the subscribed duration, it MUST send a NOTIFY for that event. The notifier SHOULD NOT wait for the end of the event.
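
The notification rules above (an interim NOTIFY once the subscribed duration is reached, and a final NOTIFY when the event ends, even for a short event) can be sketched as a small decision function. This is an illustrative sketch only, not part of the package definition: the names Subscription and notify_kind are hypothetical, and the caller is assumed to track whether an interim NOTIFY was already sent for the current event.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

DEFAULT_MIN_MS = 40  # default "duration" when the SUBSCRIBE omits it


@dataclass
class Subscription:
    # Hypothetical record of one subscription's parameters.
    event_mask: Set[int] = field(default_factory=lambda: set(range(16)))  # default mask 0-15
    min_duration_ms: int = DEFAULT_MIN_MS


def notify_kind(sub: Subscription, event: int, elapsed_ms: int,
                ended: bool, interim_sent: bool) -> Optional[str]:
    """Return "final", "interim", or None for one detected event state."""
    if event not in sub.event_mask:
        return None          # outside the subscribed event mask
    if ended:
        return "final"       # end of event is always reported, even if short
    if elapsed_ms >= sub.min_duration_ms and not interim_sent:
        return "interim"     # still in progress and long enough, not yet reported
    return None
```

With the default subscription, a DTMF "9" lasting 45ms yields "interim" at 40ms and "final" at 45ms, which matches the two separate events described above.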
If the notifier detects that an event has ended, it MUST send a NOTIFY for that event, even if that event previously generated a NOTIFY, and even if the event was shorter than the minimum duration requested.

The notifier MUST NOT send the same event three times as required for AVT conveyed in RTP. SIP provides its own redundancy mechanism, and without the timestamp header of the RTP packet available in SIP, there would be no way to determine if these were duplicate events.

Note that multiple applications may subscribe to signaled digits (possibly with different parameters) for the same call simultaneously. A practical example is a calling card call to a voicemail application during an outcall. The calling card application may wait for a long pound, while the messaging system waits for a different sequence.

The Notifier SHOULD concatenate all unsent events into a single NOTIFY.

The Notifier SHOULD send a NOTIFY with no body, and an "Expires" header of "0", when the subscribed call-leg is terminated.

4.9. Subscriber processing of NOTIFY requests

Upon receipt of a valid NOTIFY request for this package, the subscriber MUST verify that the event, duration, and volume are of interest to the subscriber. The subscriber MUST also check the end bit.

If the subscriber receives a NOTIFY for an event in the range 0-15, it SHOULD verify that the volume parameter is less than or equal to 36 (at least -36 dBm). It MAY accept an event in this range with a volume parameter as large as 55 (volume as low as -55 dBm). For all other events, the volume MUST be zero.

Most applications will only act either on interim events (end bit is zero), or on final events (end bit is one), but not both. For example, an application which watches for "***" would look for the "*" event (10), the end bit equal to zero, and a duration greater than or equal to 40ms based on the current rate. It could ignore all final events.
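
As an illustration of the subscriber-side checks above, the following sketch decodes a NOTIFY body made of concatenated four-octet telephone-events and picks out interim "*" events. It is a minimal sketch under stated assumptions: the names TelephoneEvent, parse_events, and is_interim_star are hypothetical, and the bit layout assumed is the RFC2833 one reproduced in this document (8-bit event, end bit, reserved bit, 6-bit volume, 16-bit duration in network byte order).

```python
import struct
from typing import List, NamedTuple


class TelephoneEvent(NamedTuple):
    event: int        # event code, e.g. 10 for "*", 11 for "#"
    end: bool         # True for a "final" event
    volume: int       # 0-63, power level expressed as a positive number
    duration_ms: int  # milliseconds, since this package uses a 1000Hz rate


def parse_events(body: bytes) -> List[TelephoneEvent]:
    """Decode a body of concatenated 4-octet telephone-event structures."""
    if len(body) % 4 != 0:
        raise ValueError("body length must be a multiple of 4 octets")
    events = []
    for event, flags, duration in struct.iter_unpack("!BBH", body):
        events.append(TelephoneEvent(event=event,
                                     end=bool(flags & 0x80),   # E bit
                                     volume=flags & 0x3F,      # low 6 bits
                                     duration_ms=duration))
    return events


def is_interim_star(ev: TelephoneEvent, min_ms: int = 40) -> bool:
    """True for an in-progress "*" (event 10) of at least min_ms."""
    return ev.event == 10 and not ev.end and ev.duration_ms >= min_ms
```

An application watching for "***" could count consecutive events for which is_interim_star returns True and ignore all final events, as described above.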
Further processing of the event is implementation specific.

4.10. Handling of Forked Requests

Forked requests are not permitted for this event-package.

4.11. Rate of notifications

Note that in SIP, NOTIFY transactions MUST NOT overlap each other. The rate of Notifications is effectively limited by the round trip time between the Notifier and Subscriber. Notifiers MUST buffer and consolidate multiple events that occur while waiting for outstanding transactions to complete. Notifiers MUST NOT generate NOTIFY requests for this package more frequently than once every 50ms. Notifiers MAY buffer NOTIFYs for an administrator defined period of time.

4.12. State Agents

Usage of State Agents is not appropriate for this event-package.

4.13. Behavior of a Proxy Server

There are no additional requirements on a SIP Proxy, other than to transparently forward the SUBSCRIBE and NOTIFY methods as required in SIP. However, Proxies are encouraged to support routing to specific Contacts based on the existence of an Accept-Contact header, as specified in the caller preferences specification.

5. Usage with INFO

To be written

6. Examples of usage

6.1. NOTIFY in 3pcc environment

The example below shows a typical scenario used for calling cards. The Application acts as both an ordinary UA and as a 3pcc controller.

Original         App        Target
   UA                         UA
|              |             |
|--INVITE----->|             |
|<---200-------|             |
|----ACK------>|             |
|              |             |
|<===RTP======>|             |
|              |             |
|   ..time..   |             |
|              |             |
|<--SUBSCRIBE--|             |
|----200------>|             |
|              |             |
|              |--INVITE---->|
|              |<---180------|
|<--reINVITE---|<---200------|
|----200------>|             |
|<---ACK-------|----ACK----->|
|              |             |
|<=========RTP==============>|
|              |             |
|   ..time..   |             |
|              |             |
|---NOTIFY---->|             |
|<---200-------|             |
|              |             |
|<--reINVITE---|----BYE----->|
|-----200----->|<---200------|
|<----ACK------|             |
|              |             |
|<====RTP=====>|             |
|              |             |

SUBSCRIBE sip:gateway.itsp.net SIP/2.0
Call-Id: 100@gateway.itsp.net
To:
From: ;tag=abcd
CSeq: 1 SUBSCRIBE
Event: telephone-event;duration=2000
Expires: 3600
Content-Length: 0

NOTIFY sip:
Call-Id: 100@gateway.itsp.net
To: ;tag=abcd
From: ;tag=efgh
CSeq: 5 NOTIFY
Event: telephone-event;rate=1000
Content-Type: audio/telephone-event
Content-Length: 4

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     event     |E|R| volume    |          duration             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

event=11, end=0, reserved=0, volume=15, duration=2048

A "#" tone at -15 dBm volume has been in progress for 2.048 seconds (2048 units of duration at a sampling rate of 1000Hz).

6.2. NOTIFY with several short tones (as in VoiceXML decomposition example)

6.3. NOTIFY with distributed call control

6.4. INFO

7. Simple Implementation on IP phones

IP phones only generate DTMF for compatibility with the PSTN. The concepts of volume and minimum duration in this context are irrelevant. Therefore, a simple IP phone MAY a) only support events zero through eleven (most phones do not have keys for ABCD), b) always set the volume to zero, c) only use the default rate, and d) never send an event shorter than 40ms. Long key presses (ex: 2 seconds) MUST still be correctly detected and reported.

Accept or refuse SUBSCRIBE messages according to local authorization policy. For example, always accept messages from your SIP peer for an active call. Ignore any parameters in the subscriptions.

When key activity occurs, check if there are any subscriptions which correspond to the active "line".
If so, send a NOTIFY (to each subscriber for this call-id) once when a "DTMF keypad" key is depressed (set the duration to 40ms). Also, send a NOTIFY with the end bit set, and the approximate duration of the keypress, when the key is released. In both cases, always use the default rate of 1000Hz, and set the volume to zero.

This description does not intend to limit implementation to physical telephones with a "DTMF keypad".

8. Security Considerations

Signaled Digits may convey private information such as PINs, credit card numbers, or account numbers. UAs SHOULD authenticate these subscriptions. In addition, UAs are encouraged to encrypt this information using a suitable mechanism as available in SIP (e.g. [PGP]).

9. References

[SIP] M. Handley, E. Schooler, and H. Schulzrinne, "SIP: Session Initiation Protocol", RFC2543, Internet Engineering Task Force, Nov 1998.

[SDP] M. Handley and V. Jacobson, "SDP: session description protocol," Request for Comments 2327, Internet Engineering Task Force, April 1998.

[AVT] H. Schulzrinne, S. Petrack, "RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals", RFC2833, Internet Engineering Task Force, May 2000.

[RTP] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996.

[SUB/NTFY] Adam Roach, "Event Notification in SIP", Internet Draft, IETF; January 2001. Work in progress.

[RTSP] H. Schulzrinne, A. Rao, and R. Lanphier, "Real time streaming protocol (RTSP)," RFC2326, Internet Engineering Task Force, Apr. 1998.

[3pcc] J. Rosenberg, J. Peterson, H. Schulzrinne, "Third Party Call Control in SIP", Internet Draft, IETF; Nov. 2000. Work in progress.

[INFO] S. Donovan, "The SIP INFO method," Request for Comments 2976, Internet Engineering Task Force, Oct. 2000.

[Java] J. Gosling, B. Joy, G.
Steele, "The Java Language Specification," Addison Wesley, 1996.

[VoiceXML] VoiceXML Forum, "Voice extensible markup language (voicexml) version 1.00," VoiceXML forum specification, VoiceXML Forum, Mar. 2000.

[PGP] D. Atkins, W. Stallings, and P. Zimmermann, "PGP message exchange formats," Request for Comments 1991, Internet Engineering Task Force, Aug. 1996.

[RFC2026] S. Bradner, "The Internet Standards Process -- Revision 3", RFC2026 (BCP), IETF, October 1996.

[RFC2119] S. Bradner, "Key words for use in RFCs to indicate requirement levels," Request for Comments (Best Current Practice) 2119, Internet Engineering Task Force, Mar. 1997.

[BNF] D. Crocker, P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC2234, IETF, Nov 1997.

10. Acknowledgments

Funding for the RFC Editor is currently provided by the Internet Society.

11. Author's Addresses

Rohan Mahy
Cisco Systems
170 West Tasman Dr, MS: SJC-21/3
Phone: +1 408 526 8570
Email: rohan@cisco.com

Full Copyright Statement

Copyright (C) The Internet Society (date). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.