Document: draft-engelsma-dmsp-04.txt
Reviewer: Paul Kyzivat [pkyzivat@cisco.com]
Review Date:  4/13/2008
Review Due:  4/21/2008
Review Type: Cross Area


Summary:
* This draft is on the right track but has open issues, described in the review.

=============

General Comment:

It appears that this *ought* to be a normative document, but there is virtually no normative language (RFC 2119) in it. Thus it at least needs a major rewrite. Part of this document is an architecture, and part is the specification of a protocol. These might be served better as separate documents, though they perhaps could remain as one. I see that one of the references (http://www.w3.org/TR/mmi-arch/) is an architecture document that overlaps with this to some extent. But it is much more general so it doesn't obviate the need for something new. That document is also getting old now, and it is only referenced informatively. There is another architecture document referenced ("OMA Multimodal and Multi-device Enabler Architecture") but I don't have access to it.

Section 2:
    ...  The aim of multimodal user interfaces is to enhance
    the usability of mobile devices.

    The term "multimodal interface" in the context of this specification
    refers to augmenting the existing graphical user interface (GUI) of
    mobile devices with a voice user interface (VUI).  In this way, the
    strengths of one modality offset the weaknesses of the other.  ...

I find it a little disconcerting that the term "multimodal interface" has been co-opted for such a narrow meaning. While the goal seems good, it might be better to come up with a different term to name it. Multimodal interfaces in general need not include voice, and need not involve multiple devices.

    ...  The
    scope of this specification is limited to one particular
    architectural configuration within this framework: the coordination
    of a GUI browser or application running on a mobile device with a
    VoiceXML browser running in the network.

Unless the proposed mechanism is much broader than this we need a name that is consistent with this scope.

I am only complaining about terminology here. The problem being addressed seems to be a significant one for which a solution would be valuable.

Section 4.1:

Use of unacknowledged events seems dubious. What happens if they are lost, or not understood? Are you perhaps assuming that they are acknowledged at the transport layer, but not at the application protocol layer? If so, that becomes a requirement on the transport.

I agree with Eric Burger that justification for two encodings is dubious. If you need the binary encoding, then you might as well use it universally.

Section 4.2.2.4

    Reserved ranges are left for implementations to provide response
    messages for additional primitive and complex types and error codes.

These are reserved for *implementations*??? You mean for proprietary use? There is no reservation for future *standard* use? What is the plan for future enhancements to this protocol? Something ought to be said about this.

Section 4.2.2.8:

The prior section defined "Field". This section now defines a new structure with a column entitled "Field", that contains several rows, one of which is called "Fields". It appears that the Field named Fields is actually of type Field, while the Fields named "Score", "Raw Utterance" and "Field Count" are not of type Field.

The terminology could stand to be improved to help the confused.

I would suggest that the name of a tuple in a structure be called something other than "Field" - perhaps "Row Name", or "Tuple". But anything that makes it less confusing would be fine.

Section 4.2.3.1:

There is no explanation of how session IDs should be constructed in order to guarantee that they are unique. What is the scope within which these must be unique?
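As one illustration of what such guidance could look like, here is a minimal sketch in Python. It assumes that session IDs may be opaque strings and that global uniqueness is the intended scope; neither is stated in the draft, and the function name is my own.

```python
import uuid


def new_session_id() -> str:
    """Return a session ID that is unique with overwhelming probability.

    Assumption: the protocol can carry a 32-character opaque string.
    The draft does not say what format or length a session ID has.
    """
    # uuid4() is generated from random bits, so no coordination between
    # the parties is needed to avoid collisions.
    return uuid.uuid4().hex
```

If the required scope is narrower (say, unique per transport connection), a simple counter would do, which is why the scope needs to be stated.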

There is also no explanation of how values of the User Agent field are to be constructed to ensure that the same value isn't used to mean two different things. The section says this is "analogous" to the User Agent in HTTP, but it doesn't say it is the same format. It suggests one can infer properties from the value, but this isn't possible without some standardization of the format.

The SIG_INIT message is used both to initiate the protocol and to ack that initiation. It seems like this could potentially be ambiguous. Both sides may decide to initiate and send a message at about the same time.
When each receives a SIG_INIT it will be ambiguous whether it should send another in response. After reading further, this is apparently resolved by assigning distinct roles to the two parties in the session, which run different state machines. It seems there could be a problem if two nodes that thought they were playing the same role were accidentally connected. That may not be possible if the transport session is always established in one direction, but that part isn't included in this draft. In my opinion it would be helpful to clarify how the roles of the participants are determined. Or else change the message types so there isn't any ambiguity.
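
To make the ambiguity concrete, here is a rough sketch in Python of how a receiver without any role knowledge would have to interpret an incoming message. SIG_INIT_ACK is a hypothetical message type of my own invention, not something in the draft; it only illustrates the "change the message types" alternative.

```python
from enum import Enum


class Msg(Enum):
    SIG_INIT = "SIG_INIT"
    SIG_INIT_ACK = "SIG_INIT_ACK"  # hypothetical; not defined in the draft


def interpret_single_type(sent_init: bool, received: Msg) -> str:
    """Interpretation when SIG_INIT is the only initiation message."""
    if received is not Msg.SIG_INIT:
        return "unexpected"
    # Having already sent SIG_INIT, the receiver cannot tell whether the
    # peer is acknowledging it or initiating on its own at the same time.
    return "ack-or-glare?" if sent_init else "initiation"


def interpret_two_types(sent_init: bool, received: Msg) -> str:
    """Interpretation when the acknowledgement has its own message type."""
    if received is Msg.SIG_INIT:
        return "glare" if sent_init else "initiation"
    if received is Msg.SIG_INIT_ACK:
        return "ack" if sent_init else "unexpected"
    return "unexpected"
```

With a distinct acknowledgement type the simultaneous-initiation case becomes detectable, and the draft could then say what to do about it.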

Section 4.2.5.1

This carries a row called "Response To", which is described as

    Response To:  The sequence number to the corresponding command that
       is being confirmed.

but none of the commands contain anything called "sequence number". They do all contain something called "Correlation". I presume this is what is meant. But the terminology ought to be aligned. Also, there is never any specification of how correlation values are assigned, how long they are valid, etc.
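
For what it is worth, this is roughly the behavior I would expect to see specified, sketched in Python. The class name, the starting value of 1, and the 32-bit wrap are all my assumptions; the draft gives no field width or assignment rule for Correlation.

```python
import itertools


class CorrelationTable:
    """Assigns Correlation values to commands and matches "Response To"."""

    def __init__(self, modulus: int = 2**32):  # field width is an assumption
        self._counter = itertools.count(1)
        self._modulus = modulus
        self._pending = {}  # correlation value -> command name

    def send(self, command: str) -> int:
        corr = next(self._counter) % self._modulus
        # Open question for the draft: what if the value wraps around
        # while an older command with the same value is still pending?
        self._pending[corr] = command
        return corr

    def receive(self, response_to: int):
        """Return the confirmed command, or None if nothing matches."""
        # A value is valid only until its first response arrives.
        return self._pending.pop(response_to, None)
```

Even a sketch this small forces answers to the questions above: the range of values, when a value may be reused, and what a non-matching "Response To" means.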

Section 4.2.6.11

The names of custom events are simply strings. There is no guidance on how these names may be assigned to avoid accidental collisions. Nor do I understand how one would negotiate to determine if a peer supports these custom events.

Section 4.3 and Appendix A:

I didn't review these since I am not an expert in XML. But as noted above I can see no merit in supporting the XML form in addition to the binary form.

Section 4.4:

    ...  Responses to the CMD_EXEC_FORM (i.e.
    either the RESP_ERROR or RESP_OK) are processed only if their
    sequence number matches what the GUA state machine is expecting.  If
    the sequence numbers match, the response is processed and the GUA
    state machine transitions to the next state.  Responses that do not
    have a sequence number matching with what the GUA is expecting are
    ignored and there state transition is effected.

What are the circumstances under which an unmatching response is received? Can this happen with properly functioning participants and transport? If not, why specify this? If so, what are the impacts of ignoring the response?

Section 4.4.1.3:

The state machine model is insufficient to represent the intermediate states resulting from individual ADD_LISTENER requests. It's not clear what happens if no response is received for one of these. Similarly for REMOVE_LISTENER. Must state be retained for every active listener?
Similarly, how long must the upper layer wait for a response to one of these requests? What happens if a LOAD_DOC is executed while there are unanswered ADD/REMOVE requests outstanding?
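
As a rough illustration of the extra state that seems to be needed alongside the state machine, consider the following Python sketch. The class, its methods, and the event names used with it are hypothetical; the draft defines none of this.

```python
class ListenerState:
    """Bookkeeping that the state diagram itself does not capture."""

    def __init__(self):
        self.active = set()   # events with a confirmed listener
        self.pending = {}     # correlation value -> (operation, event)

    def request(self, correlation: int, op: str, event: str) -> None:
        # op is "ADD_LISTENER" or "REMOVE_LISTENER"
        self.pending[correlation] = (op, event)

    def on_response(self, correlation: int, ok: bool) -> bool:
        """Apply a response; return False if it matches no request."""
        entry = self.pending.pop(correlation, None)
        if entry is None:
            return False
        op, event = entry
        if ok and op == "ADD_LISTENER":
            self.active.add(event)
        elif ok and op == "REMOVE_LISTENER":
            self.active.discard(event)
        return True

    def outstanding(self):
        """Requests still unanswered, e.g. at the moment of a LOAD_DOC."""
        return sorted((c, op, ev) for c, (op, ev) in self.pending.items())
```

Whatever outstanding() returns when a LOAD_DOC is issued is exactly the set of requests whose fate the draft leaves undefined.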

Section 4.4.2.3:

Similar comment to 4.4.1.3 - this can't be fully modeled as a state machine. Where is the extra state kept?

Section 5:

Woefully inadequate. There is no indication of the required properties of the underlying transport. Presumably it must be reliable and order preserving. How are errors to be reflected to the state machines? This much should be specified.

There is also no discussion of connection establishment and verification of roles in the protocol. Which party is responsible for establishing the transport connections? There is an assumption that the parties at the two ends of a transport are running complementary state machines, but there is no specified mechanism for determining this. Further specification seems necessary.

There is also no provision for extensibility in this protocol. There is a version number field, but no provision for two sides running different versions. Nor is there much if any provision for additions to any of the messages. So the first time there is need for any small addition there will be need for a new version and no interop between old and new.
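
One common approach, sketched here in Python purely as an example, is for each side to advertise the versions it supports and to pick the highest one in common. This assumes a hypothetical exchange of version lists; the draft as written carries only a single version number.

```python
def negotiate_version(local_supported, peer_offered):
    """Return the highest version both sides support, or None.

    Hypothetical: assumes versions are comparable integers and that each
    side can learn the other's supported set during initiation.
    """
    common = set(local_supported) & set(peer_offered)
    return max(common) if common else None
```

For example, negotiate_version({1, 2}, {2, 3}) selects 2, while disjoint sets yield None and the session would have to be refused with a defined error.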

Section 7:

The security considerations are feeble, but I'll defer to a security person for detail on what ought to be present.