SIPPING WG                                                       R. Mahy
Internet-Draft                                       Cisco Systems, Inc.
Expires: December 27, 2002                                 June 28, 2002

Signaled Telephony Events in the Session Initiation Protocol (SIP)
draft-mahy-sipping-signaled-digits-01.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on December 27, 2002.

Copyright Notice

Copyright (C) The Internet Society (2002). All Rights Reserved.

Abstract

This document describes a way for SIP User Agents that are not a party to the media of a call or session to receive SIP event notifications when signaled digits or other specific telephony-related events are detected. This is useful for a variety of applications that monitor calls for a specific event (e.g., a long pound, a special sequence of digits, or a fax signal) and, only then, take an active role in the monitored calls.




1. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119[2]. Throughout this document, the author refers to "DTMF" and other "tones" as audio media. Similar types of information conveyed as signaling are called "digits" or "signals". This convention is consistent with RFC2833[4], which provides a more detailed discussion of the issue.




2. Motivation

RFC2833 "AVT Tones" is widely acknowledged within the IP telephony community as the best way to transport telephony-related tones between end systems which already terminate media using RTP[11]. This approach maintains synchronization of speech audio with tone audio, tolerates loss, provides event duration and volume information, and avoids detection delay. While AVT tones are ideal for conveying telephone-events in straightforward sessions such as a 2-party call or a simple, centrally mixed conference, other environments impose additional or alternative requirements. These environments include protocol translation, complex call control, and decomposed applications.

2.1 Protocol translation

Protocol translators between SIP and other IP protocols which use RTP (ex: H.323[16], MGCP[14], Megaco[15], RTSP[13]) are frequently implemented as a signaling-only entity which arranges for RTP media streams to travel directly between the final endpoints. This is an efficient arrangement in terms of limiting jitter and latency in the media, and allows the translator to support many more simultaneous sessions than if the translator terminated media as well.

The protocol translators may receive telephony-related events (especially signaled digits) via signaling. Likewise, a SIP 3pcc[10] controller, or a protocol translator which uses a traditional CTI (Computer Telephony Integration) protocol for control (ex: TAPI, TSAPI, JTAPI), may receive CTI commands to "insert" digits which may have originated from another application (for example, a desktop call control application). The protocol translator or 3pcc controller may send these signals as RFC2833 media to the target SIP User Agent, or it may want to send a SIP signaled digit instead.

RTP implementations must be able to receive media from more than one source on the same receive port, so it would seem straightforward for the translator to send RTP to the target User Agent. This approach has two problems, however. First, if the translator and the target SIP User Agent are separated by a firewall, it is likely that this traffic from a different IP address will be discarded.

Second, it is unlikely that most low-end RTP implementations (IP phones and software User Agents) will render this additional media correctly. More problematic still, there is no mechanism to determine whether a SIP User Agent can properly insert telephony events received in an RTP stream separate from its other audio media.

This document proposes that the protocol translator send the audio/telephone-event MIME type defined in RFC2833 in the body of an INFO[5] method to the target User Agent, for it to render.

The INFO method means: here is some information relevant to the call. A Content-Disposition header can provide a suggestion to handle information in this situation by rendering that information.

Under this proposal, a User Agent MUST NOT send signaled digits or telephone-events using the INFO method if the event was ever represented as a tone (as media). Only signals originated as pure signaling MAY generate an INFO method. Failure to heed this requirement will result in double-detection of digits/events.

If INFO is used incorrectly by a pair of PSTN gateways (for example), the source gateway may detect a digit, send an INFO request which is lost, and retransmit that request. The target gateway would send the original in-band tone to the PSTN when the audio media arrives; later, when the INFO arrives, it would render the same tone again. It is quite likely that the INFO will arrive after the media tone has been played (especially if multiple intermediaries are involved), and this will frequently cause systems in the PSTN to detect the same digit twice.

2.2 Complex call control

Some applications are interested in the telephony signals represented by telephony tones, but do not desire to be a party to the speech portion of the audio media. This document addresses the transport requirements of these signals in this context. Synchronizing speech is a non-issue in these topologies, as there is no audio media with which to synchronize; SIP provides its own reliability mechanism to prevent loss; and since this proposal reuses the encoding specified in RFC2833, volume and duration are preserved, and detection delay is minimized.

For example, in some application scenarios, a user contacts an application, places a new call in the context of the application (an "outcall"), and returns to the application after the new call is finished. Examples of such scenarios include calling card systems, voicemail or messaging systems which allow outgoing calls, and voice browsers or voice portals which allow outgoing calls.

All of these applications require a way for the user to get back to the application if something has gone wrong with the outgoing call (ex: wrong number), or if the user changes his or her mind. If the originating user is using a TDM telephone or a simple IP endpoint, the application will typically expect a sequence of signaled digits (ex: a pound or hash (#) of long duration, three stars (*) in a row, etc.).

                 +-------------+
                 |             |
                 | Originating |
                 |    User     |
                 |             |
                 +-------------+
                  |         ^ ^
                            | |
           NOTIFY |     SIP | | RTP
                            | |
                  |         | |
                  v         v v
      +-------------+      +-------------+
      |             |      |             |
      | Waiting for |      | Target User |
      |   trigger   |      |  or Service |
      |             |      |             |
      +-------------+      +-------------+

Below are several possible SIP topologies that would enable this type of behavior. Most of these approaches fall into two categories: the application could receive DTMF media corresponding to the signaled digits, or it could receive the signaled digits using SIP.

Below are three approaches to encoding this information as media. None of these approaches are very attractive.

This draft will summarize a few non-media approaches as well:

While this proposal only provides examples using signaled digits, it could be used to detect other telephony-related signals (for example FAX signals, or call progress signals).

Notification of a telephone-event MUST NOT be used to generate a tone (using RFC2833 media or an INFO).

2.3 Decomposition

The extensions in this document also allow for a clean decomposition of some services into media and signaling components. For example, below is a diagram of a VoiceXML browser split into media and non-media handling parts.

                 +-------------+
                 |             |
                 | VoiceXML    |
                 | Interpreter |
                 | (signaling) |
                 +-------------+
                   ^          ^
                   |          |
               SIP |          | RTSP
                   |          |
                   |          |
                   v          v
      +-------------+        +-------------+
      |             |        |             |
      |  SIP UA     |   RTP  | RTSP Server |
      |             |<------>|   (media)   |
      |             |        |             |
      +-------------+        +-------------+

The requirements are almost identical to the requirements for complex call control as discussed in the previous section.




3. Event Package Formal Definition

3.1 Event Package Name

This document defines a SIP Event Package as defined in SIP Events[6]. The event-package token name for this package is: "telephone-event"

3.2 Event Package Parameters

This package defines the following event package parameters: "events" and "duration". The following syntax specification uses the augmented Backus-Naur Form (BNF) as described in RFC-2234[3]. This document adds a new event-package to the definition of event-package in the Event header.

event-package        =  ... / tpackage / token
tpackage             =  "telephone-event" *( SEMI tparams )
tparams              =  digit-event-param / duration-param
digit-event-param    =  "events" EQUAL numbers *( COMMA numbers )
duration-param       =  "duration" EQUAL num
numbers              =  range / num
range                =  num DASH num

The duration parameter in a SUBSCRIBE indicates the number of milliseconds a signal must exist before it should be reported with a NOTIFY. If the duration parameter is not specified, the default duration is 40ms. The duration parameter MUST NOT appear in a NOTIFY.

The events parameter in a SUBSCRIBE specifies an event mask: the list of events in which the Subscriber is interested. The events parameter and its syntax are defined in RFC2833. The default event mask for this event-package is 0-15. The events parameter MUST NOT appear in a NOTIFY.
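
As an illustration of the parameter rules above, the following Python sketch parses the value of an Event header for this package and applies the defaults (event mask 0-15, duration 40ms). The function names are hypothetical, the parameter spelling "events" follows the ABNF, and real SIP header parsing (quoting, whitespace folding, case-insensitivity) is deliberately left out.

```python
def parse_numbers(spec):
    """Expand an RFC2833-style list such as "0-11,16" into a set of ints."""
    events = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            events.update(range(int(low), int(high) + 1))
        else:
            events.add(int(part))
    return events


def parse_event_header(value):
    """Return (event_mask, duration_ms) for a telephone-event Event value."""
    package, *params = [item.strip() for item in value.split(";")]
    if package != "telephone-event":
        raise ValueError("not a telephone-event subscription: " + package)
    mask = set(range(16))  # default event mask 0-15
    duration_ms = 40       # default minimum duration
    for param in params:
        name, _, raw = param.partition("=")
        if name == "events":
            mask = parse_numbers(raw)
        elif name == "duration":
            duration_ms = int(raw)
    return mask, duration_ms
```

For example, parse_event_header("telephone-event;duration=2000") keeps the default mask and sets the minimum duration to 2000 milliseconds.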

3.3 SUBSCRIBE Bodies

This package does not define any SUBSCRIBE bodies.

3.4 Subscription Duration

Subscriptions to this event package MAY range from seconds to days. Subscriptions in minutes or hours are more typical and are RECOMMENDED.

3.5 NOTIFY Bodies

This package re-uses the audio/telephone-event MIME type defined in RFC2833. Each telephone-event consists of a four octet structure. Multiple telephone-events MAY be concatenated.

The structure of the audio/telephone-event MIME type is reproduced here from RFC2833 for the convenience of the reader.

0                   1                   2                   3 
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     event     |E|R|  volume   |         duration              | 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

When used with this event-package, the duration portion of telephone-event MUST be expressed in milliseconds (the rate used for this package is always 1000Hz).
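
The four octet structure can be sketched with Python's standard struct module. This is only an illustration under the layout shown above (8-bit event, E bit, R bit, 6-bit volume, 16-bit duration in network order); pack_event and unpack_events are hypothetical helper names, not part of any specification.

```python
import struct


def pack_event(event, end, volume, duration_ms):
    """Encode one telephone-event as four octets in network order."""
    if not (0 <= event <= 255 and 0 <= volume <= 63
            and 0 <= duration_ms <= 0xFFFF):
        raise ValueError("field out of range")
    # E bit is the top bit, R bit stays zero, volume fills the low 6 bits.
    flags = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", event, flags, duration_ms)


def unpack_events(body):
    """Decode a body holding one or more concatenated telephone-events."""
    if len(body) % 4:
        raise ValueError("body length must be a multiple of four octets")
    events = []
    for event, flags, duration_ms in struct.iter_unpack("!BBH", body):
        events.append((event, bool(flags & 0x80), flags & 0x3F, duration_ms))
    return events
```

With these helpers, an interim "#" (event 11) at volume 15 and a duration of 2048 milliseconds packs to the octets 0B 0F 08 00.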

3.6 Subscriber generation of SUBSCRIBE requests

Subscribers SUBSCRIBE to telephone-events for a period of minutes or hours, and automatically attempt to re-SUBSCRIBE when the subscription is half-expired. If re-subscription fails, the Subscriber SHOULD retry periodically. The Subscriber SHOULD SUBSCRIBE to events only if it is no longer a party to the media of the subscribed call. Because it is impossible to perfectly synchronize a SUBSCRIBE with the reINVITE or transfer needed when the subscriber divests itself of the media, it is sufficient for the application to ignore RFC2833 media which it receives while subscribed to this package for the same call.

The application MAY specify an event mask. The specified event mask MUST at least include the default events: 0-15. The subscriber MAY specify a "minimum" duration in milliseconds. The subscriber can be assured that it will not receive any notifications for events which have been in progress for less than this duration. The subscriber MUST still accept "final" events (the end bit is set) with shorter durations.

3.7 Notifier processing of SUBSCRIBE requests

If a SUBSCRIBE request arrives using the same call-leg as an existing call, the Notifier MAY authenticate the subscription request (this is RECOMMENDED). If a SUBSCRIBE request arrives using a new call-leg, the Notifier SHOULD authenticate the subscription request. The Notifier MAY limit the duration of the subscription to an administrator defined amount of time.

3.8 Notifier generation of NOTIFY requests

Immediately after a subscription is accepted, the Notifier MUST send a NOTIFY with no body.

When the end of a telephone-event which matches the event-mask for a subscription is detected, the Notifier MUST send a NOTIFY message to all subscribers with matching event masks.

For each subscription, when a telephone-event a) is still in progress, b) matches the event-mask for a subscription, c) has occurred for at least the duration specified in a subscription, and d) has not yet triggered a NOTIFY, then the Notifier MUST send a NOTIFY message to that subscriber.

Note that events which are detected and end at approximately the same time (for example a DTMF "9" of 45ms) MUST generate two separate reports (ex: detected at 40ms, ended at 45ms), even if their durations are the same (ex: ended at exactly 40ms).

The notifier MUST send a NOTIFY when it detects either the end of a subscribed event, or the continuation of a subscribed event for a sufficient duration. The notifier SHOULD NOT send events outside the subscribed event mask.

If the notifier detects that an event has begun and continued for at least the subscribed duration, it MUST send a NOTIFY for that event. The notifier SHOULD NOT wait for the end of the event. If the notifier detects that an event has ended, it MUST send a NOTIFY for that event, even if that event previously generated a NOTIFY, and even if the event was shorter than the minimum duration requested.
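
One possible reading of these rules is sketched below in Python. The Subscription class and the reports_for function are hypothetical, and the sketch assumes the Notifier polls each active event with its elapsed time; it returns the (event, end bit, duration) reports that should be placed in a NOTIFY for one subscription.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    mask: set = field(default_factory=lambda: set(range(16)))
    min_duration_ms: int = 40
    # Events already reported once while still in progress.
    interim_sent: set = field(default_factory=set)


def reports_for(sub, event, elapsed_ms, ended):
    """Return the reports owed to this subscription for one observation."""
    if event not in sub.mask:
        return []
    reports = []
    # Interim report: sent once, as soon as the minimum duration is reached.
    if elapsed_ms >= sub.min_duration_ms and event not in sub.interim_sent:
        sub.interim_sent.add(event)
        reports.append((event, False, elapsed_ms))
    # Final report: always sent, even for events shorter than the minimum.
    if ended:
        sub.interim_sent.discard(event)
        reports.append((event, True, elapsed_ms))
    return reports
```

Under this reading, an event that reaches the minimum duration and ends in the same observation yields both an interim and a final report, which matches the note about events that are detected and end at approximately the same time.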

The notifier MUST NOT send the same event three times as required for AVT tones conveyed in RTP. SIP provides its own redundancy mechanism, and without the timestamp header of the RTP packet available in SIP, there would be no way to determine if these were duplicate events.

Note that multiple applications may subscribe to signaled digits (possibly with different parameters) for the same call simultaneously. A practical example is a calling card call to a voicemail application during an outcall. The calling card application may wait for a long pound, while the messaging system waits for a different sequence.

The Notifier SHOULD concatenate all unsent events into a single NOTIFY.

The Notifier SHOULD send a NOTIFY with no body, and an "Expires" header of "0", when the subscribed call-leg is terminated.

3.9 Subscriber processing of NOTIFY requests

Upon receipt of a valid NOTIFY request for this package, the subscriber MUST verify that the event, duration, and volume are of interest to the subscriber. The subscriber MUST also check the end bit.

If the Subscriber receives a NOTIFY for an event in the range 0-15, it SHOULD verify that the volume parameter is less than or equal to 36 (at least -36 dBm0). It MAY accept an event in this range with a volume parameter as large as 55 (volume as low as -55 dBm0). For all other events, the volume MUST be zero.
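
The volume check described above might look like the following Python sketch. The function name and the lenient flag are assumptions made for illustration; the thresholds (36, 55, and zero for non-DTMF events) are taken from the text.

```python
def volume_acceptable(event, volume, lenient=False):
    """Check the volume field of a received telephone-event.

    For DTMF events (0-15) the volume should be at most 36; a lenient
    receiver may accept values up to 55. All other events must carry zero.
    """
    if 0 <= event <= 15:
        limit = 55 if lenient else 36
        return 0 <= volume <= limit
    return volume == 0
```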

Most applications will only act either on interim events (end bit is zero), or on final events (end bit is one), but not both. For example, an application which watches for "***" would look for the "*" event (10), the end bit equal to zero, and a duration greater than or equal to 40ms based on the current rate. It could ignore all final events.
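
A subscriber that watches for "***" in this way could be sketched as follows. The class name is hypothetical, and the choice to reset the count when any other interim event arrives is an assumption of this sketch rather than something the package requires.

```python
STAR = 10  # RFC2833 event code for the "*" key


class TripleStarDetector:
    """Count consecutive interim "*" events and ignore all final events."""

    def __init__(self, needed=3, min_duration_ms=40):
        self.needed = needed
        self.min_duration_ms = min_duration_ms
        self.count = 0

    def feed(self, event, end, duration_ms):
        """Return True once `needed` consecutive star presses are seen."""
        if end:
            return False  # final events (end bit set) are not of interest
        if event == STAR and duration_ms >= self.min_duration_ms:
            self.count += 1
        else:
            self.count = 0  # any other key breaks the sequence
        if self.count >= self.needed:
            self.count = 0
            return True
        return False
```

Each key press normally produces one interim and one final report, so feeding both to the detector counts each press exactly once.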

Further processing of the event is implementation specific.

3.10 Handling of Forked Requests

Forked requests are not permitted for this event-package.

3.11 Rate of notifications

Note that in SIP, NOTIFY transactions MUST NOT overlap each other. The rate of Notifications is effectively limited by the round trip time between the Notifier and Subscriber. Notifiers MUST buffer and consolidate multiple events that occur while waiting for outstanding transactions to complete. Notifiers MUST NOT generate NOTIFY requests for this package more frequently than once every 50ms. Notifiers MAY buffer NOTIFYs for an administrator defined period of time.
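
The buffering and pacing rules above can be sketched as a small Python class. NotifyPacer and its methods are hypothetical names, and time is passed in explicitly as milliseconds so the sketch stays independent of any particular clock or SIP stack.

```python
class NotifyPacer:
    """Consolidate pending events and pace NOTIFY requests."""

    MIN_INTERVAL_MS = 50  # no more than one NOTIFY every 50ms

    def __init__(self):
        self.pending = []        # packed four-octet events not yet sent
        self.in_flight = False   # True while a NOTIFY transaction is open
        self.last_sent_ms = None

    def add(self, packed_event):
        self.pending.append(packed_event)

    def next_body(self, now_ms):
        """Return the body of the next NOTIFY, or None if it must wait."""
        if not self.pending or self.in_flight:
            return None
        if (self.last_sent_ms is not None
                and now_ms - self.last_sent_ms < self.MIN_INTERVAL_MS):
            return None
        body = b"".join(self.pending)  # all unsent events in one NOTIFY
        self.pending.clear()
        self.in_flight = True
        self.last_sent_ms = now_ms
        return body

    def transaction_done(self):
        """Call when the response to the outstanding NOTIFY arrives."""
        self.in_flight = False
```

A caller would invoke next_body whenever an event is added, a transaction completes, or a timer fires, and would send a NOTIFY only when a body is returned.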

3.12 State Agents

State Agents are not appropriate for usage with this event-package. They MUST NOT be used.

3.13 Behavior of a Proxy Server

There are no additional requirements on a SIP Proxy, other than to transparently forward the SUBSCRIBE and NOTIFY methods as required in SIP.




4. Usage with INFO

To be written.




5. Examples of Usage

5.1 NOTIFY in 3pcc environment

The example below shows a typical scenario used for calling cards. The Application acts as both an ordinary UA and as a 3pcc controller.

  Original         App          Target
    UA                           UA
     |              |             |
     |--INVITE----->|             |
     |<---200-------|             |
     |----ACK------>|             |
     |              |             |
     |<===RTP======>|             |
     |              |             |
     |  ..time..    |             |
     |              |             |
     |<--SUBSCRIBE--|             |
     |----200------>|             |
     |---NOTIFY---->|             |
     |<---200-------|             |
     |              |--INVITE---->|
     |              |<---180------|
     |<--reINVITE---|<---200------|
     |----200------>|             |
     |<---ACK-------|----ACK----->|
     |              |             |
     |<=========RTP==============>|
     |              |             |
     |  ..time..    |             |
     |              |             |
     |---NOTIFY---->|             |
     |<---200-------|             |
     |              |             |
     |<--reINVITE---|----BYE----->|
     |-----200----->|<---200------|
     |<----ACK------|             |
     |              |             |
     |<====RTP=====>|             |
     |              |             |


SUBSCRIBE sip:gateway.itsp.net SIP/2.0
Call-Id: 100@gateway.itsp.net
To: <sip:service@asp.com>
From: <sip:appserver.com>;tag=abcd
CSeq: 1 SUBSCRIBE
Event: telephone-event;duration=2000
Expires: 3600
Content-Length: 0


NOTIFY sip:appserver.com SIP/2.0
Call-Id: 100@gateway.itsp.net
To: <sip:appserver.com>;tag=abcd
From: <sip:service@asp.com>;tag=efgh
CSeq: 5 NOTIFY
Event: telephone-event;rate=1000
Subscription-State: active
Content-Type: audio/telephone-event
Content-Length: 0


NOTIFY sip:appserver.com SIP/2.0
Call-Id: 100@gateway.itsp.net
To: <sip:appserver.com>;tag=abcd
From: <sip:service@asp.com>;tag=efgh
CSeq: 7 NOTIFY
Event: telephone-event;rate=1000
Subscription-State: active
Content-Type: audio/telephone-event
Content-Length: 4

<four octets in network order, corresponding to hexadecimal 0x0B0F0800>
0                   1                   2                   3 
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     event     |E|R|  volume   |         duration              | 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

event=11, end=0, reserved=0, volume=15, duration=2048

A "hash" tone at a volume of -15 dBm0 has been in progress for 2.048 seconds (2048 units of duration at a sampling rate of 1000Hz).

5.2 NOTIFY with several short tones

(as in VoiceXML decomposition example)

5.3 NOTIFY with distributed call control

5.4 INFO

INFO sip:8472@translator.itsp.net SIP/2.0
...
Content-Disposition: render
Content-Type: audio/telephone-event
Content-Length: 4
...



6. Simple Implementation on IP phones

IP phones only generate DTMF for compatibility with the PSTN. The concepts of volume and minimum duration in this context are irrelevant. Therefore, a simple IP phone MAY a) only support events zero through eleven (most phones do not have keys for ABCD), b) always set the volume to zero, c) only use the default rate, and d) never send an event shorter than 40ms. Long key presses (ex: 2 seconds) MUST still be correctly detected and reported.

Accept or refuse SUBSCRIBE messages according to local authorization policy. For example, always accept messages from your SIP peer for an active call. Ignore any parameters in the subscriptions.

When key activity occurs, check if there are any subscriptions which correspond to the active "line". If so, send a NOTIFY (to each subscriber for this call-id) once when a "DTMF keypad" key is depressed (set the duration to 40ms). Also, send a NOTIFY with the end bit set, and the approximate duration of the keypress when the key is released. In both cases, always use the default rate of 1000Hz, and set the volume to zero.
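
The key handling described above might be sketched in Python as follows. The helper names are hypothetical; the key-to-event mapping (digits 0-9, "*" as 10, "#" as 11) follows RFC2833, the volume is always zero, and durations are in milliseconds at the default rate of 1000Hz.

```python
import struct

# RFC2833 event codes for the twelve keys of a simple keypad.
KEY_EVENTS = {str(digit): digit for digit in range(10)}
KEY_EVENTS.update({"*": 10, "#": 11})


def key_down_body(key):
    """NOTIFY body sent once when a key is depressed (duration 40ms)."""
    return struct.pack("!BBH", KEY_EVENTS[key], 0x00, 40)


def key_up_body(key, held_ms):
    """NOTIFY body sent when the key is released, with the end bit set."""
    # Never report less than 40ms, and clamp to the 16-bit duration field.
    duration_ms = max(40, min(held_ms, 0xFFFF))
    return struct.pack("!BBH", KEY_EVENTS[key], 0x80, duration_ms)
```

A phone would send key_down_body(key) and later key_up_body(key, held_ms) to each subscriber for the call-id of the active line.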

This description does not intend to limit implementation to physical telephones with a "DTMF keypad".




7. Security Considerations

Signaled Digits may convey private information such as PINs, credit card numbers, or account numbers. UAs MUST authenticate these subscriptions and authorize them according to local policy. In addition, UAs are encouraged to provide message integrity and encryption of this information using a suitable hop-by-hop or end-to-end mechanism (e.g. TLS[7], SMIME[8]).




8. IANA Considerations

This section serves as the IANA registration for the telephone-event SIP event package.

      Package name: telephone-event

      Type: package

      Contact: [Mahy]

      Published Specification: This document.



9. Open Issues

Should we remove the INFO mechanism and use an explicit REFER to subscribe to the package, or use the Call-Info header with the URI of the appropriate subscription instead?

Should the sampling rate be fixed at 8000Hz, 1000Hz, or variable? If fixed we can remove the rate parameter.




Normative References

[1] Rosenberg, J. and H. Schulzrinne, "SIP: Session Initiation Protocol", draft-ietf-sip-rfc2543bis-09 (work in progress), February 2002.
[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[3] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997.
[4] Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals", RFC 2833, May 2000.
[5] Donovan, S., "The SIP INFO Method", RFC 2976, October 2000.
[6] Roach, A., "SIP-Specific Event Notification", draft-ietf-sip-events-05 (work in progress), March 2002.
[7] Dierks, T., Allen, C., Treese, W., Karlton, P., Freier, A. and P. Kocher, "The TLS Protocol Version 1.0", RFC 2246, January 1999.
[8] Dusse, S., Hoffman, P., Ramsdell, B., Lundblade, L. and L. Repka, "S/MIME Version 2 Message Specification", RFC 2311, March 1998.
[9] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with SDP", draft-ietf-mmusic-sdp-offer-answer-02 (work in progress), February 2002.



Informational References

[10] Rosenberg, J., Schulzrinne, H., Camarillo, G. and J. Peterson, "Best Current Practices for Third Party Call Control in the Session Initiation Protocol", draft-ietf-sipping-3pcc-02 (work in progress), June 2002.
[11] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996.
[12] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.
[13] Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming Protocol (RTSP)", RFC 2326, April 1998.
[14] Arango, M., Dugan, A., Elliott, I., Huitema, C. and S. Pickett, "Media Gateway Control Protocol (MGCP) Version 1.0", RFC 2705, October 1999.
[15] Cuervo, F., Greene, N., Rayhan, A., Huitema, C., Rosen, B. and J. Segers, "Megaco Protocol Version 1.0", RFC 3015, November 2000.
[16] "Packet-based Multimedia Communications Systems (includes Annex C - H.323 on ATM)", ITU-T Recommendation H.323v3, September 1999.
[17] Rosenberg, J., "A Framework for Stimulus Signaling in SIP Using Markup", draft-rosenberg-sipping-markup-00 (work in progress), April 2002.
[18] McGlashan, S., "Voice eXtensible Markup Language, Version 2.0", April 2002.



Author's Address

  Rohan Mahy
  Cisco Systems, Inc.
  170 West Tasman Drive
  San Jose, CA 95134
  USA
  EMail: rohan@cisco.com



Full Copyright Statement

Acknowledgement