Network Work Group E. Burger Internet Draft SnowShore Networks, Inc. Document: draft-burger-sipping-kpml-00a.txt Category: Standards Track October 28, 2002 Expires: April 28, 2002 Keypad Markup Language (KPML) Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1]. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract Keypad Markup Language (KPML) is a markup language used in conjunction with SIP and HTTP to provide instructions to SIP User Agents for the reporting of user digit presses. Burger Draft - Expires 4/2002 1 KPML October 28, 2002 Table of Contents 1. Conventions used in this document..................................2 2. Introduction.......................................................2 3. Overview...........................................................3 4. Examples...........................................................4 4.1. Monitoring for Octothorpe........................................4 4.2. VoiceXML Digit Collection........................................5 4.3. Dial String Collection...........................................6 4.4. Interactive Digit Collection.....................................7 5. Report Body........................................................8 6. Formal Syntax......................................................9 7. Security Considerations............................................9 8. Open Issues........................................................9 9. References.........................................................9 10. Contributors.....................................................10 11. Acknowledgments..................................................10 12. Author's Address.................................................10 1. Conventions used in this document In the narrative discussion, the "user device" is a User Agent that will report stimulus. An "application" is a User Agent requesting the user device to report stimulus. The "user" is an entity that stimulates the user device. In English, the user device is a phone, the application is an application server or proxy server, and the user presses keys to generate stimulus. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [2]. 2. Introduction This document describes the Keypad Markup Language, KPML. KPML is a markup [3] that enables "dumb phones" to report on basic user key- press interactions. This document refers to a "dumb phone" as a user device that does not have a display. Otherwise, it is actually a rather smart device. KPML requires the user device to be an http [4] client and interpret KPML markup. The name of the markup, KPML, reflects its legacy support role. The public switched telephony network (PSTN) accomplished end-to-end signaling by transporting Dual-Tone, Multi-Frequency (DTMF) tones in the bearer channel. This is in-band signaling. Burger Draft - Expires 4/2003 2 KPML October 28, 2002 The spelunking community already took the name KML for their cave data base interchange format. From the point of view of an application being signaled, what is important is the fact of the stimulus, not the tones used to transport the stimulus. For example, an application may ask the caller to press the "1" key. What the application cares about is the key press, not that there were two cosine waves at 697 Hz and 1209 Hz transmitted. A SIP-signaled [5] network transports end-to-end signaling with RFC 2833 [6] packets. In RFC 2833, the signaling application inserts RFC 2833 signal packets instead of generating tones. The receiving application gets the signal information, which is what it wanted in the first place. RFC 2833 is the only method that can correlate the time the end user pressed a digit with the user's media. However, out-of-band signaling methods, as are appropriate for user device to application signaling, do not need millisecond accuracy. On the other hand, they do need reliability, which RFC 2833 does not provide. An interested application could request notifications of every key press. However, many of the use cases for such signaling has the application interested in only one or a few keystrokes. Thus we need a mechanism for specifying to the user device what stimulus the application would like notification of. 3. Overview KPML is a stateless, declarative markup. A KPML document contains a tag with a series of tags. The tag has a value attribute which is a RFC 3015 (H.248) [7] digit map. NOTE: We use H.248 digit maps instead of MGCP digit maps because the former is an IETF standard and the latter is not. For HTTP reporting, each tag in the markup has an href attribute. When the user enters keypress(es) that match a tag, the user device will issue an http POST to the URI specified by the href. The body of the POST is a report of the actual digits entered. This is so the user device can indicate what digit string matched a pattern with wildcards. It is possible that the page returned by the target URI of the http POST is another KPML document. In this situation, the user device needs to decide what to do with user key presses collected between the time the user device posted the last result and the fetch and interpretation of the next KPML document. Burger Draft - Expires 4/2003 3 KPML October 28, 2002 For many applications, the user device needs to quarantine (buffer) those digits. Some applications use modal interfaces where the first few key presses determine what the following digits mean. For a novice user, the application may play a prompt describing what mode the application is in. However, "power users" often barge through the prompt. KPML provides a barge attribute to the tag. The default is "barge=yes". Enabling barge means that the user device buffers digits and applies them immediately when the next KPML document arrives. Disabling barge by specifying "barge=no" means the user device flushes any collected digits before collecting more digits and comparing them against the tags. NOTE: Quarantine and barge are separate actions. However, the barge action directly determines the quarantine action. Thus KPML only specifies the barge action request. If the user presses a key not matched by the tags, the user device discards the key press from consideration against the current or future KPML documents. However, as described above, once there is a match, the user device quarantines any keys the user enters subsequent to the match. Because it is not possible to know if the signaled digits are for local KPML processing or for other recipients of the media stream, the user device transmits the digits to the far end in real time, using either RFC 2833 or by generating the appropriate tones. NOTE: If KPML did not have this behavior, then a user device executing KPML could easily break called applications. For example, take a personal assistant that uses "*9" for attention. If the user presses the "*" key, KPML will hold the digit, looking for the "9". What if the user just enters a "*" key, possibly because they accessed an IVR system that looks for "*"? In this case, the "*" would get held by the user device, because it is looking for the "*9" pattern. The user would probably press the "*" key again, hoping that the called IVR system just didn't hear the key press. At that point, the user device would send both "*" entries, as "**" does not match "*9". However, that would not have the effect the user intended when they pressed "*". 4. Examples 4.1. Monitoring for Octothorpe A common need for pre-paid and personal assistant applications is to monitor a conversation for a signal indicating a change in user focus from the party they called through the application to the Burger Draft - Expires 4/2003 4 KPML October 28, 2002 application itself. For example, if you call a party using a pre- paid calling card and the party you call redirects you to voice mail, digits you press are for the voice mail system. However, many applications have a special key sequence, such as the octothorpe (#, or pound sign) or *9 that terminate the called party leg and shift the user's focus to the application. Figure 1 shows the KPML for long octothorpe. Figure 1 - Long Octothorpe Example In this example, the parameter "session=19fsjcalksd" associates the http POST with the SIP call session. One can use other methods to associate the POST with a SIP call. The following examples will show these various methods. The regex value Z indicates the following digit needs to be a long- duration key press. F, from the H.248 DTMF package, is the octothorpe key. In fact, KPML supports all digits, 1-9, *, #, A-D from the H.248 DTMF package. 4.2. VoiceXML Digit Collection One could imagine a VoiceXML platform that wants to have the user device signal the user's key presses, while the VoiceXML platform still streams prompts to the user device. Of course, by definition, the VoiceXML platform receives all of the user device's media. This is because the user hears prompts from the VoiceXML platform and the platform hears all of the user's utterances (e.g., for recording a message). However, let us say that the VoiceXML platform would like to receive the stimulus in http, rather than in RFC 2833. Moreover, the KPML mechanism enables the user device to immediately barge the prompt, saving at least a round-trip-time of latency. In this example, a VoiceXML script builds a menu. The VoiceXML interpreter has pulls out a grammar definition similar to the following. Burger Draft - Expires 4/2003 5 KPML October 28, 2002 For sports press 1, For weather press 2, For Stargazer astrophysics press 3. To speak to a person press 0. Figure 2 - VoiceXML Code Figure 3 - VoiceXML KPML Example Note the targets of the href's are opaque strings that have meaning only to the VoiceXML platform. 4.3. Dial String Collection In this example, the user device collects a dial string. The application uses KPML to quickly determine when the user enters a target number. In addition, KPML indicates what type of number the user entered. Burger Draft - Expires 4/2003 6 KPML October 28, 2002 Figure 5 - IVR KPML Example The target of the http post, "sess$9aej08asd7", identifies the SIP session. NOTE: It is unclear if this usage of KPML is better than using a device control protocol like H.248. From the application's point of view, it has to do the low-level prompt-collect logic. Granted, it is relatively easy to change the key mappings for a given menu. However, often more of the call flow than a given menu mapping gets changed. 5. Report Body For HTTP or SIP responses, the body of the response from the user device is a KPML response form. The tag has an attribute, "digits". The digits attribute is the digit string. The digit string uses the conventional characters '*' and '#' for star and octothorpe respectively. X shows a sample response body to the example in section 4.3. Figure 6 - Response Body Burger Draft - Expires 4/2003 8 KPML October 28, 2002 6. Formal Syntax The following syntax specification uses the augmented Data Type Definition (DTD) as described in XML [3]. 7. Security Considerations KPML presents no further security issues beyond the startup issues addressed in the companion documents to this document. 8. Open Issues o Describe SIP mechanism fully. o Timeout and inter-digit timing (platform-specific, right now). o Should we return entered digits in message body or URI? Body makes SIP & HTTP XML identical; URI may be easier for some application servers to parse. 9. References 1 Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996. INFORMATIVE 2 Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. NORMATIVE 3 Bray, T. et. al., "Extensible Markup Language (XML) 1.0 (Second Edition)", W3C Recommendation, http://www.w3.org/TR/REC-xml, October 2000. NORMATIVE Burger Draft - Expires 4/2003 9 KPML October 28, 2002 4 Fielding, R. et. al., "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. INFORMATIVE 5 Rosenberg, J. et. al., "SIP: Session Initiation Protocol", RFC 3261, June 2002. INFORMATIVE 6 Schulzrinne, H. and Petrack, S., "RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals", RFC 2833, May 2000. INFORMATIVE 7 Cuervo, F. et. al., "Megaco Protocol Version 1.0", RFC 3015, November 2000. NORMATIVE 10. Contributors Robert Fairlie-Cuninghame, Cullen Jennings, Jonathan Rosenberg, and I were the members of the Application Stimulus Signaling Design Team. All members of the team contributed significantly to this work. In addition, Jonathan Rosenberg postulated DML in his "A Framework for Stimulus Signaling in SIP Using Markup" draft. This version of KPML has significant influence from MSCML, the SnowShore Media Server Control Markup Language. Jeff Van Dyke, Andy Spitzer, and Walter O'Connor were the primary contributors to that effort. That said, any errors, misinterpretation, or fouls in this document are my own. 11. Acknowledgments 12. Author's Address Eric Burger SnowShore Networks, Inc. 285 Billerica Rd. Chelmsford, MA 01824-4120 USA E-mail: eburger@snowshore.com Phone: +1 978/367-8400 Burger Draft - Expires 4/2003 10 KPML October 28, 2002 Full Copyright Statement Copyright (C) The Internet Society (2002). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Acknowledgement The Internet Society currently provides funding for the RFC Editor function. SnowShore Networks, Inc. is a member of the Internet Society. Burger Draft - Expires 4/2003 11