SIPPING WG                                                      R. Mahy
Internet-Draft                                      Cisco Systems, Inc.
Expires: December 21, 2003                                June 22, 2003

Conveying Tones in the Session Initiation Protocol (SIP)
draft-mahy-sipping-tones-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on December 21, 2003.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

One of the major applications of the Session Initiation Protocol (SIP) is in Internet telephony. In this context it is often useful or convenient for a SIP entity to request that another SIP User Agent generate some type of tone, without generating this tone as part of a session. This document describes how SIP can be used without modification to carry such tones and explores issues relating to their use. Finally, this document defines a new MIME type which is useful for conveying computer-generated tones.




1. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119[2].




2. Overview

One of the major applications of SIP[1] is in Internet telephony. In this context it is often useful or convenient for a SIP entity to request that another SIP User Agent generate some type of tone, without generating this tone as part of a session. These tones are presented as MIME content in the body of a SIP message or as indirected content[7]. As such, these tones can be conveyed without any modifications to SIP whatsoever.

This document makes use of the audio/telephone-event MIME type defined in RFC2833[5] and also defines a new MIME type (audio/tone-info+xml) to carry a textual, semantic description of tone information in XML, which is suitable for tone generators. When tones carried in the body of the message are used with the Alert-Info header, each tone is correlated with the appropriate body part by a cid: URI that matches the Content-ID MIME header from RFC2045[3], following the grammar used in the Referred-By Mechanism[16].

Although a variety of tones with well-defined semantics already exist, use of this mechanism may be especially useful to convey special ringback tones or country-specific cadences without encouraging the proliferation of early media[17]. Of course, playing tones suggested by another party has a variety of security consequences. Also, the use of certain tones may leak potentially private information (such as the home country of the callee). Both privacy and security issues are discussed in the Security Considerations Section.




3. Example Tone Scenarios

3.1 Playing Ringback with a 180 Response

Perhaps the simplest tone scenario involves using the Alert-Info header in a 180 Ringing response to provide a specific sound file used to render ringback specific to the target country.

SIP/2.0 180 Ringing
...
Alert-Info: <https://server.example.net/ringback-france.wav>

Note that ringing tone or ringback tones specified in an Alert-Info header SHOULD be repeated for as long as local ringback would have been generated. In the future it may be desirable to define an explicit Alert-Info header parameter which indicates if the file should be repeated or not.

Our next example requests special (call waiting) ringback using inline MIME content of type audio/telephone-event from RFC2833[5]. The body of the SIP message consists of 4 octets of binary data. A UAS MUST NOT include this type of content in a 180 response unless support for the audio/telephone-event MIME type and the appropriate event(s) is advertised in an Accept header in the corresponding INVITE request. (For example: Accept: application/sdp, audio/telephone-event;events="0-11,70,71" )
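As an illustration only, a UAS could decide whether a given event was advertised with a check like the following. This is a minimal sketch, not part of this specification: the function names are hypothetical, and treating a missing events parameter as "not advertised" is an assumption made here for simplicity.

```python
def split_outside_quotes(value, sep):
    """Split value on sep, ignoring separators inside double quotes."""
    parts, current, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_event_ranges(spec):
    """Expand an events value such as 0-11,70,71 into a set of integers."""
    events = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            low, high = item.split("-", 1)
            events.update(range(int(low), int(high) + 1))
        else:
            events.add(int(item))
    return events


def accepts_telephone_event(accept_value, event):
    """Return True if the Accept value advertises audio/telephone-event
    with the given event number (assumption: no events parameter means no)."""
    for media_range in split_outside_quotes(accept_value, ","):
        fields = split_outside_quotes(media_range, ";")
        if fields[0].lower() != "audio/telephone-event":
            continue
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "events":
                return event in parse_event_ranges(value.strip().strip('"'))
    return False


accept = 'application/sdp, audio/telephone-event;events="0-11,70,71"'
print(accepts_telephone_event(accept, 71))
```

With the Accept value from the example above, event 71 (the one used in the 180 response that follows) is accepted, while an unlisted event such as 72 is not.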

Note that this MIME type was originally defined only for use in RTP[12], so its use here may be considered irregular. The community should carefully consider whether reuse of this MIME type is appropriate for the usage described here.

early media ringback :  

SIP/2.0 180 Ringing
Alert-Info: <cid:foo@bar>
Content-Type: audio/telephone-event   
Content-ID: <foo@bar>
Content-Length: 4

  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |     event     |E R| volume    |          duration             |
  |       71      |0 0|     3     |             4000              |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
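The 4-octet body shown in the diagram can be assembled as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, the field widths (8-bit event, E and R bits, 6-bit volume, 16-bit duration) are read from the diagram above, and the fields are packed in network byte order.

```python
import struct


def pack_telephone_event(event, volume, duration, end=False):
    """Pack one telephone-event payload into 4 octets (network byte order)."""
    if not 0 <= event <= 255:
        raise ValueError("event must fit in 8 bits")
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # The E bit is the top bit of the second octet; the R bit is left at zero.
    flags_and_volume = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", event, flags_and_volume, duration)


# Values from the diagram: event 71, E=0, R=0, volume 3, duration 4000.
body = pack_telephone_event(71, 3, 4000)
print(len(body), body.hex())
```

The resulting body is 4 octets long, which matches the Content-Length of the example.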
   

This next example conveys a US ringback tone using the XML tone description format defined in the Formal Syntax section of this document. Note also that in this example, the 180 also contains an SDP[9] offer or answer[13], so the tone description is part of a multipart MIME[4] body. Notice that the Content-ID MIME header is not a SIP header and so it is included inside the specific MIME part which carries the tone description.

SIP/2.0 180 Ringing
Alert-Info: <cid:foo@bar>
Content-Type: multipart/mixed;boundary=godzilla
Content-Length: xxx

--godzilla
Content-Type: audio/tone-info+xml 
Content-Disposition: render;handling=optional
Content-ID: <foo@bar>
Content-Length: yyy

<?xml version="1.0" ?>
<tones repeat="true">
  <tone>
    <modulation hz="0"/>
    <volume db="-3"/>
    <duration ms="2000"/>
    <frequency hz="440"/>
    <frequency hz="480"/>
  </tone>
  <tone>
    <modulation hz="0"/>
    <volume db="-63"/>
    <duration ms="4000"/>
    <frequency hz="0"/>
  </tone>
</tones>
--godzilla
Content-Type: application/sdp
Content-Disposition: session;handling=required
Content-Length: zzz

<SDP contents>
--godzilla--
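One way a UAC might locate the body part named by the cid: URI in Alert-Info is sketched below, using the MIME parser from the Python standard library. This is an illustration only: the function name and the abbreviated message body are hypothetical, and a real UA would take the Content-Type value and body from the parsed SIP response.

```python
from email import message_from_string
from urllib.parse import unquote


def find_part_by_cid(content_type, body, cid_uri):
    """Return the MIME part whose Content-ID matches cid_uri, or None."""
    if not cid_uri.lower().startswith("cid:"):
        return None
    # A cid: URI carries the Content-ID value without its angle brackets.
    wanted = "<" + unquote(cid_uri[4:]) + ">"
    # The parser needs the SIP Content-Type header to learn the boundary.
    message = message_from_string(
        "Content-Type: " + content_type + "\n\n" + body)
    for part in message.walk():
        if part.get("Content-ID", "").strip() == wanted:
            return part
    return None


# Abbreviated stand-in for the multipart body of the 180 response.
body = "\n".join([
    "--godzilla",
    "Content-Type: audio/tone-info+xml",
    "Content-ID: <foo@bar>",
    "",
    '<tones repeat="true"/>',
    "--godzilla",
    "Content-Type: application/sdp",
    "",
    "v=0",
    "--godzilla--",
    "",
])
part = find_part_by_cid("multipart/mixed;boundary=godzilla", body,
                        "cid:foo@bar")
print(part.get_content_type())
```

If no part carries a matching Content-ID, the function returns None and the UA would presumably fall back to its local ringback.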

Other mechanisms for suggesting country-specific ringback were proposed as far back as IETF 47 (Adelaide). Specifically, in a long-expired draft, Adam Roach proposed an extension which carried the ISO country code of the appropriate ringback tone.

3.2 Playing Ringback and other tones as a result of 3pcc

In the following flow, Alice receives an invitation from a Third-Party Call Control[8] (3pcc) controller. Alice may have been invited due to a click-to-dial service or some type of scheduled callback or reminder service. Alice answers immediately, but when the Controller invites Bob, he does not answer immediately. It is desirable for Alice to receive an indication that her call to Bob is ringing. The Controller could negotiate a session with Alice just to provide a stream of RTP media for ringback, but this requires that the Controller be able to provide such media and deal with all the security and middlebox traversal issues associated with RTP, all for a simple tone which is transient in nature. Worse yet, in this flow Bob is willing to negotiate session details before answering, which ordinarily would minimize or eliminate clipping of Bob's initial "Hello". However, if the Controller negotiates a session with Alice to send early media, the Controller must set up a session with Bob and discard or relay Bob's media, which is likely to introduce clipping.

Instead, the Controller sends an UPDATE[11] request to Alice which contains the tone description file from the previous example. (It could have been a more traditional audio file instead, such as a wave file.) The UPDATE also contains a Reason header[6] which provides additional information about the cause of the UPDATE request. Alice's UA plays the tone until media is received from Bob. After the successful offer/answer exchange with Alice, the controller sends a PRACK[10] request to acknowledge Bob's reliable 180 response.

   

  Alice              Controller                 Bob
  |(1) INVITE offer1      |                       |
  |no media               |                       |
  |<----------------------|                       |
  |(2) 200 answer1        |                       |
  |no media               |                       |
  |---------------------->|                       |
  |(3) ACK                |                       |
  |<----------------------|                       |
  |                       |(4) INVITE no SDP      |
  |                       |---------------------->|
  |                       |(5) 180 Ringing offer2 |
  |                       |<----------------------|
  |(6) UPDATE offer2'     |                       |
  |    and ringback tone  |                       |
  |<----------------------|                       |
  |(7) 200 answer2'       |                       |
  |---------------------->|(8) PRACK answer2      |
  |                       |---------------------->|
  |                       |(9) 200 OK (PRACK)     |
  |                       |<----------------------|
  |                       |                       |



UPDATE sip:alice@a.example.com SIP/2.0
...
Reason: SIP;cause=180
Content-Type: multipart/mixed;boundary=gorilla
Content-Length: xxx

--gorilla
Content-Type: audio/tone-info+xml 
Content-Disposition: render;handling=optional
Content-Length: yyy

(tone description file from the previous example)
--gorilla
Content-Type: application/sdp
Content-Disposition: session;handling=required
Content-Length: zzz

(SDP contents of offer 2')
--gorilla--
   
   
   

3.3 Playing other tones

The following is a non-exhaustive list of other tones (from RFC2833[5]) which may be provided during certain services. These tones correspond to events 85, 80, 84, and 79, respectively, of audio/telephone-event.

Calling card service tone: The calling card service tone consists of 60 ms of the sum of 941 Hz and 1477 Hz tones (DTMF '#'), followed by 940 ms of 350 Hz and 440 Hz (U.S. dial tone), decaying exponentially with a time constant of 200 ms.

Pay tone: The caller, at a payphone, is reminded to deposit additional coins.

Intrusion tone: The call is being monitored, e.g., by an operator.

Call waiting tone: Another party wants to reach the subscriber.

Note that there does not seem to be any good reason to indicate the Call Waiting tone remotely rather than directly at the User Agent and therefore sending this tone over SIP to implement Call Waiting is inadvisable.

In the example that follows, a pre-paid calling card application sends a calling card service tone to a SIP UA, along with instructions to collect digit stimulus using the App-Info header[14] and the Keypad Markup Language[15] (KPML). It provides the tone description as indirect content[7]. These tones could be sent as an audio/telephone-event body instead.

UPDATE sip:a.example.com SIP/2.0
...
App-Info: <http://app.example.net/collect-digits.kpml>
Content-Length: xxx
Content-Type: message/external-body; access-type="URL";
                       expiration="Tue, 24 July 2003 09:00:00 GMT";
                       URL="http://app.example.net/calingcard.xml"
                       
Content-Type: audio/tone-info+xml
Content-Disposition: render;handling=required
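A UA could extract the indirect content URL from the message/external-body Content-Type header roughly as follows. This is a sketch only, using the header parser from the Python standard library; the function name is hypothetical, and the decision whether to actually fetch the URL (authorization, expiration) is deliberately left out.

```python
from email import message_from_string


def external_body_url(header_block):
    """Return the URL parameter of a message/external-body Content-Type
    header whose access-type is URL, or None if it does not apply."""
    msg = message_from_string(header_block + "\n\n")
    if msg.get_content_type() != "message/external-body":
        return None
    access = msg.get_param("access-type")
    if access is None or access.upper() != "URL":
        return None
    # Parameter names are matched case-insensitively; quotes are removed.
    return msg.get_param("url")


# The folded Content-Type header from the UPDATE request above.
header_block = (
    'Content-Type: message/external-body; access-type="URL";\n'
    '    expiration="Tue, 24 July 2003 09:00:00 GMT";\n'
    '    URL="http://app.example.net/calingcard.xml"'
)
print(external_body_url(header_block))
```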

   

After authorizing the request, the UA fetches the following tone description file from the http: URL in the UPDATE, and renders this "be-dong" tone to the user.

<?xml version="1.0" ?>
<tones repeat="false">
  <tone>
    <modulation hz="0"/>
    <volume db="-3"/>
    <duration ms="60"/>
    <frequency hz="941"/>
    <frequency hz="1477"/>
  </tone>
  <tone>
    <modulation hz="0"/>
    <volume db="-3"/>
    <duration ms="940"/>
    <frequency hz="350"/>
    <frequency hz="440"/>
    <decay type="exponential" scale="200"/>
  </tone>
</tones>
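To make the semantics of such a description concrete, a tone generator might render it along the following lines. This is a rough sketch under several assumptions that this document does not state: an 8000 Hz sample rate, 0 dB mapped to full scale, the simultaneous frequencies mixed with equal weight, and the decay scale interpreted as the time constant of the exponential in milliseconds. The function name is hypothetical.

```python
import math


def render_tone(freqs_hz, duration_ms, volume_db, decay_ms=None,
                sample_rate=8000):
    """Return float samples in [-1, 1] for one <tone> element."""
    count = sample_rate * duration_ms // 1000
    # Assumption: 0 dB is full scale; divide by the number of
    # frequencies so that the sum of sines cannot clip.
    amplitude = 10 ** (volume_db / 20) / len(freqs_hz)
    samples = []
    for n in range(count):
        t = n / sample_rate
        value = sum(math.sin(2 * math.pi * f * t) for f in freqs_hz)
        # Assumption: exponential decay with time constant decay_ms.
        envelope = math.exp(-t * 1000 / decay_ms) if decay_ms else 1.0
        samples.append(amplitude * envelope * value)
    return samples


# The two segments of the calling card service tone described above.
be_dong = (render_tone([941, 1477], 60, -3)
           + render_tone([350, 440], 940, -3, decay_ms=200))
print(len(be_dong))
```

Since repeat is "false" in this description, the sequence would be played once; a repeating description would loop over the same segments until told to stop.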



4. Formal Syntax

The following is an XML Schema description of the audio/tone-info+xml syntax.

repeat indicates if the tone sequence repeats continuously or plays only once.

modulation and frequency are measured in Hz. Up to 5 simultaneous frequencies are allowed.

duration and decay scale are measured in milliseconds.

volume is measured in decibels from line level. The range -3 to -63 is recommended for consistency with RFC2833.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="tones">
      <xs:complexType>
         <xs:sequence>
            <xs:element maxOccurs="unbounded" minOccurs="1" 
                name="tone" type="toneType"/>
         </xs:sequence>
         <xs:attribute name="repeat" type="xs:boolean" 
                 use="required"/>
      </xs:complexType>
   </xs:element>
   <xs:complexType name="toneType">
      <xs:sequence>
         <xs:element maxOccurs="1" minOccurs="0" 
              name="modulation" type="modulationType"/>
         <xs:element maxOccurs="1" minOccurs="1" 
             name="volume" type="volumeType"/>
         <xs:element maxOccurs="1" minOccurs="1" 
             name="duration" type="durationType"/>
         <xs:element maxOccurs="5" minOccurs="1" 
             name="frequency" type="frequencyType"/>
         <xs:element maxOccurs="1" minOccurs="0" 
             name="decay" type="decayType"/>
      </xs:sequence>
   </xs:complexType>
   <xs:complexType name="modulationType">
      <xs:attribute name="hz" type="xs:decimal" use="required"/>
   </xs:complexType>
   <xs:complexType name="volumeType">
      <xs:attribute name="db" type="xs:integer" use="required"/>
   </xs:complexType>
   <xs:complexType name="durationType">
      <xs:attribute name="ms" type="xs:integer" use="required"/>
   </xs:complexType>
   <xs:complexType name="frequencyType">
      <xs:attribute name="hz" type="xs:integer" use="required"/>
   </xs:complexType>
   <xs:complexType name="decayType">
      <xs:attribute name="type" use="required">
         <xs:simpleType>
            <xs:restriction base="xs:string">
               <xs:enumeration value="none"/>
               <xs:enumeration value="linear"/>
               <xs:enumeration value="exponential"/>
            </xs:restriction>
         </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="scale" type="xs:integer" use="optional"/>
   </xs:complexType>
</xs:schema>
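The Python standard library has no XML Schema validator, so a receiver written in Python might instead parse the document and check a few of the constraints by hand. The following is a minimal sketch with hypothetical function names; it checks only some of the schema's constraints (the required repeat attribute, at least one tone, one to five frequencies per tone, and the presence of volume and duration) and ignores modulation and decay.

```python
import xml.etree.ElementTree as ET


def _int_attr(parent, child_name, attr):
    """Read a required integer attribute from a required child element."""
    child = parent.find(child_name)
    if child is None or child.get(attr) is None:
        raise ValueError("missing <%s %s=...>" % (child_name, attr))
    return int(child.get(attr))


def parse_tone_info(xml_text):
    """Return (repeat, tones) for an audio/tone-info+xml document."""
    root = ET.fromstring(xml_text)
    if root.tag != "tones":
        raise ValueError("root element must be <tones>")
    repeat = root.get("repeat")
    if repeat not in ("true", "false", "1", "0"):
        raise ValueError("repeat is required and must be an xs:boolean")
    tones = []
    for tone in root.findall("tone"):
        freqs = [int(f.attrib["hz"]) for f in tone.findall("frequency")]
        if not 1 <= len(freqs) <= 5:
            raise ValueError("each tone needs 1 to 5 <frequency> elements")
        tones.append({
            "volume_db": _int_attr(tone, "volume", "db"),
            "duration_ms": _int_attr(tone, "duration", "ms"),
            "frequencies_hz": freqs,
        })
    if not tones:
        raise ValueError("at least one <tone> is required")
    return repeat in ("true", "1"), tones


# The US ringback description from the 180 Ringing example, condensed.
US_RINGBACK = (
    '<?xml version="1.0" ?>'
    '<tones repeat="true">'
    '<tone><volume db="-3"/><duration ms="2000"/>'
    '<frequency hz="440"/><frequency hz="480"/></tone>'
    '<tone><volume db="-63"/><duration ms="4000"/>'
    '<frequency hz="0"/></tone>'
    '</tones>'
)
repeat, tones = parse_tone_info(US_RINGBACK)
print(repeat, tones[0]["frequencies_hz"])
```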




5. Security Considerations

TODO




6. IANA Considerations

TODO - MIME registration for audio/tone-info+xml




7. Acknowledgments




Normative References

[1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.
[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[3] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.
[4] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.
[5] Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals", RFC 2833, May 2000.
[6] Schulzrinne, H., Oran, D. and G. Camarillo, "The Reason Header Field for the Session Initiation Protocol (SIP)", RFC 3326, December 2002.
[7] Olson, S., "A Mechanism for Content Indirection in Session Initiation Protocol (SIP) Messages", draft-ietf-sip-content-indirect-mech-03 (work in progress), June 2003.
[8] Rosenberg, J., Schulzrinne, H., Camarillo, G. and J. Peterson, "Best Current Practices for Third Party Call Control in the Session Initiation Protocol", draft-ietf-sipping-3pcc-03 (work in progress), March 2003.



Informational References

[9] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.
[10] Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional Responses in Session Initiation Protocol (SIP)", RFC 3262, June 2002.
[11] Rosenberg, J., "The Session Initiation Protocol (SIP) UPDATE Method", RFC 3311, October 2002.
[12] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996.
[13] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.
[14] Jennings, C., "SIP Support for Application Initiation", draft-jennings-sip-app-info-00 (work in progress), October 2002.
[15] Burger, E., "Keypad Markup Language (KPML)", draft-burger-sipping-kpml-01 (work in progress), March 2003.
[16] Sparks, R., "The SIP Referred-By Mechanism", draft-ietf-sip-referredby-01 (work in progress), February 2003.
[17] Schulzrinne, H. and G. Camarillo, "Early Media and Ringback Tone Generation in the Session Initiation Protocol", draft-camarillo-sipping-early-media-01 (work in progress), February 2003.



Author's Address

  Rohan Mahy
  Cisco Systems, Inc.
  101 Cooper Street
  Santa Cruz, CA 95060
  USA
EMail:  rohan@cisco.com



Intellectual Property Statement

Full Copyright Statement

Acknowledgement