Document: draft-ietf-mediactrl-sip-control-framework-02
Reviewer: Ben Campbell [ben@estacado.net]
Review Date: 5/20/2008
Review Deadline: 5/19/2008
Review Type: Initial Review

Summary: My overall review is that this document is either "on the right track with open issues", or "has serious issues and should be rethought", depending on the response to a couple of critical concerns.

--Critical Concerns:

1) It is not clear to me why the media control protocol needs to be negotiated via SIP. SIP primarily offers rendezvous and media negotiation. I'm not sure why this protocol needs either. There are several mentions of rendezvous, but no discussion of why it is needed. Are application servers or media servers likely to move around, or have dynamically assigned IP addresses, etc? Do we need registration, forking, etc? Or more to the point, why can't a media control client just open the connection to the server directly, rather than going through an SDP offer/answer? What makes it different from thousands of other protocols that get by just fine without SIP? I'd like to at least see motivation for this discussed in the draft.

2) Why do we need an entirely new protocol for this? Could this not have been done with one of many existing protocols that allow us to send requests with MIME bodies and get responses (e.g. HTTP)? I assume the work group thought about this, and has good reasons. It would help to have the reasoning documented in the draft.

--Lesser Substantive Comments:

Section 1, paragraph 1:

"It is intended that the framework contained in this document will be applicable for a variety of device control scenarios."

That's very open ended. Can the scope be defined more clearly? I hope that requirements were not added for this framework simply because some device control scenario might someday need them.

paragraph 2:

"This document does not define a SIP based extension that can be used directly for the control of external components.
The framework mechanism must be extended by other documents that are known as "Control Packages""

I'm confused by the words "SIP based" in this paragraph.

paragraph 4:

"Application servers traditionally use SIP third party call control [RFC3725] to establish media sessions from SIP user agents to a media server."

Some do, some don't. Is this attempting to suggest that they should use 3PCC, or merely acknowledging it? I propose striking the paragraph.

Section 2, definition of Transaction-Timeout:

Is it safe to hard-code the transaction timeout? How was this value chosen?

Section 3, paragraph 1:

Is the requirement here really for SIP, or for SDP offer/answer? I know the answer might be that there aren't many such protocols, but for example we went to some trouble in the MSRP work to avoid making it SIP specific.

Also, the sentence structure is easily read to imply you mean to use SIP to control an external server, rather than use SIP to establish a channel to control the external server.

Section 3, list of arguments for SIP, 3rd bullet:

This spec talks later about auditing parameters using _this_ protocol -- how does SIP contribute to that?

Section 4.1:

"The UAC MAY include a valid session description (an 'offer' as defined in [RFC3264]) in an INVITE request using the Session Description Protocol defined in [RFC4566] (*note - SIP also allows an 'offer-less' INVITE which is also maintained by this specification)."

While SIP allows this, it is a common source of interop problems. Unless there is a specific need otherwise, I suggest the MAY should be a SHOULD.

"[cfw-id] MUST contain an appropriately random value that will not clash with other offer/answer exchanges that will take place and is globally unique over space and time."

Can you provide more precise guidance on what level of randomness is needed? (i.e. how many bits of randomness, etc.)

"The client generating the offer should act as it would normally on receiving this response, as per [RFC3261]."
Should that be normative? (I think you just stated a requirement that the client does what the client does.)

"Media streams can also be rejected by setting the port to "0" in the "m=" line of the session description."

Need to be explicit that this is in the _answer_. (In general, watch for imprecise language around offers and answers.)

"A client using this specification should be prepared to receive an answer where the "m=" line it inserted for using the Control Framework has been set to "0"."

How should it behave when this happens?

Section 6, second bullet point:

Is the dialog-id correlated with a dialog, or a media description? Can we have more than one dialog-id value in a single SDP document?

Section 6.1, first paragraph:

You need more guidance on the uniqueness requirements for the transaction identifiers.

Paragraph 4:

What does a client do when a transaction times out? Is that specific to the package?

Section 6.2, last paragraph:

The draft should say more than "might result in a resubmission". Who gets to decide? Is this package specific?

Section 6.3, 2nd paragraph:

Need a little more about how extension methods are handled. Are they package specific? What does a device do if it gets a method it does not understand?

6.2.1, first paragraph:

What goes in Content-Type? Does it use MIME media types? Do we need to allow other MIME headers than Content-Type and Content-Length?

6.3.2.1, paragraph 2:

What if a client does not want to wait as long as is indicated in the 202? Does it have any recourse?

paragraph 3:

Do the range suggestions for Timeout in REPORT also apply to the value in a 202 response?

paragraph 5:

Is there any use for a terminating REPORT message without a body? Is there a header where it can put the results of the original request?

6.3.3, first paragraph:

Is support of the outbound keep-alive mechanism a requirement for devices implementing this framework?

2nd paragraph:

How would you indicate support for such future keep-alive mechanisms?
6.4.3.1:

Isn't it true that the "active" participant always sends the sync? If so, then this list could be greatly simplified by simply saying something to the effect that "The active party as negotiated via COMEDIA sends the SYNC..."

7.2.2:

Is the "completion of a successful transaction" part really true? Can we have responses along the line of "Trying...still trying...oops, that failed"?

7.9:

"Recipient already has a transaction with the same transaction ID."

7.10:

Is the first type of 481 response only for REPORT requests?

7.11:

How is 500 as defined practically different from 400?

8.1:

What is the practical use of the version number? I don't recall mention of version numbers in the package support negotiation.

8.3:

This section seems to need elaboration. What do you mean by media dialog and Conference reference? Also, need more elaboration on what sort of MUST strength statements.

8.4:

Does the Control message body section need to discuss MIME media types?

8.6:

Can you define what you mean by the auditing of a control package's properties? And how is this different from defining any other operation in a package?

9.1:

Is Content-Length missing from the expansion of "header-name"?

11.1:

Need more here -- what are you actually recommending? Also, need to address the fact that the id parameter in the SDP is crucial to determining that the correct host is connecting.

11.2, last paragraph:

Need to talk more about TLS authentication here. What is actually being authenticated? What certificate fields are checked? Can you use self-signed certs (maybe with fingerprints sent in SIP)?

11.3:

In my opinion, section 12.3 needs a complete rewrite. I really cannot tell what it attempts to say other than something to the effect of "You can authorize particular actions based on the authenticated identity of the sender, and send 403 if an action is not allowed."
--Nits and Editorial Comments:

Global:

There's lots of use of passive-voice language that obscures which device is responsible for some particular action. For example, "... the control-channel is terminated" would be better phrased as "The [UAC or UAS] terminates the control channel."

All of the bullet lists are missing white space between entries. This makes them hard to read, at least for me.

Section 1, paragraph 1:

It would help to elaborate on what you mean by "logic" and "processing" and why they are different.

Section 4.1, first paragraph:

Sentence is a little confusing. I assume you mean to say something to the effect of "when a UAC wishes to establish a control channel, it MUST construct and transmit INVITE..."

"A non-2xx class error (4xx, 5xx and 6xx) SIP response received for the INVITE request indicates that no SIP dialog has been created and is treated as specified [RFC3261]"

s/error/"final response"

4.2, general:

I think this section would be more clear if written in terms of SDP offers and answers, rather than SIP messages, given that the offers and answers do not always show up in the same SIP messages.

"If the UAS does not support the extension defined in this document, as identified by the media contained in the Session Description, it SHOULD respond as detailed in [RFC3261] with a "SIP 488" response code. If multiple media descriptions exist it MAY choose to continue processing the request and mark the port field equal to "0"."

You can't really put normative requirements on devices that _don't_ implement this protocol. This would be better written as a non-normative statement of what SIP devices that do not implement this spec would do.

Section 5, second paragraph:

It's not clear to me what the "(B2BUA functionality)" means in context.

Third paragraph:

I don't understand this paragraph. Can you rephrase?

Section 6.2, paragraph 2, 2nd sentence:

The sentence is hard to follow.

"...will return a 202 status code..."
s/return/"result in"

Section 6.2, first sentence:

s/messages/requests

6.3.4.2, third paragraph:

I assume the 422 response means there is no package supported _in_common_, not that no package is supported at all, right?

9.1:

The ABNF is inconsistent in handling of literal strings. For example, method names are spelled out in text, but header names are not.

10, step 6:

Didn't the TCP connection get opened in step 4?

11.3, paragraph 1:

I don't recall a discussion of failover and redundancy.

12.1:

What RFC tracks are appropriate for package definitions?

12.4:

Is "Content-Length" missing?