Draft: draft-ietf-sipping-race-examples-01.txt
Reviewer: Vijay K. Gurbani
Review Date: 18 May 2007
Review Deadline: 18 May 2007 (Extended)
Status: WGLC

Summary: This draft is on the right track but has open issues, described in the review.

This is a good draft. It attempts to fill in the blanks of rfc3261 with respect to some race conditions, and it should be helpful to implementors. I am not sure whether it makes sense to post some of these tests on the SIPit site for new implementations, but it may well be worth considering.

Global comments:

1) On a philosophical note, I believe that you are attempting to clarify the race conditions inherent in rfc3261, not in rfc4235. To that extent, it may be best to simply start S2 with the dialog state machine from rfc3261: early, confirmed, and terminated. (Also, pedantically speaking, these states are not so much "shown in RFC3261" as "described in RFC3261." I believe that the closest definition is in the last paragraph of S12 in rfc3261.) So, if you accept my premise, then starting with something like the following should help the reader, regardless of whether or not they are familiar with rfc4235:

   For the UAC (R: Received, S: Sent):

                            | R(1xx)
                            v
                     +-------------+  R(1xx)
                     |             |<-------+
         R(non-2xx)  |    Early    |        |
        +------------|             |--------+
        |            +-------------+
        |              |        |
        |       R(2xx) |        | S(CANCEL) && R(487-INV)
        |              v        +---------------+
        |        +-----------+                  |
        |        | Confirmed |                  |
        |        +-----------+                  |
        |          |       |                    |
        |   S(BYE) |       | R(BYE)             |
        |          v       v                    |
        |        +------------+                 |
        +------->| Terminated |<----------------+
                 +------------+

   For the UAS (R: Received, S: Sent):

                            | R(INV) && S(1xx)
                            v
                     +-------------+
         S(non-2xx)  |             |
        +------------|    Early    |
        |            |             |
        |            +-------------+
        |              |        |
        |       S(2xx) |        | R(BYE) or R(CAN)
        |              v        +---------------+
        |        +-----------+  R(CAN)          |
        |        |           |<-------+         |
        |        | Confirmed |        |         |
        |        |           |--------+         |
        |        +-----------+                  |
        |          |       |                    |
        |   S(BYE) |       | R(BYE)             |
        |          v       v                    |
        |        +------------+                 |
        +------->| Terminated |<----------------+
                 +------------+

Then, you can add your super-state of Preparative to these, further subdivide the Confirmed and Terminated states, and proceed as you do in the draft.
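If it helps in checking the transitions above, here is a minimal Python sketch that encodes both diagrams as transition tables. This is an illustration only: the table layout, the "Init" pre-dialog state, and the run() helper are my own hypothetical constructs, not something from the draft or rfc3261. The tables cover only the transitions drawn; a real UA must handle more (for example, a 2xx arriving with no preceding 1xx).

```python
# Hypothetical transition tables for the dialog state machines drawn above.
# Event labels follow the diagrams: R(...) = received, S(...) = sent.
# "Init" is an assumed pre-dialog state; the diagrams draw it as a bare arrow.

UAC = {
    ("Init", "R(1xx)"): "Early",
    ("Early", "R(1xx)"): "Early",
    ("Early", "R(2xx)"): "Confirmed",
    ("Early", "R(non-2xx)"): "Terminated",
    ("Early", "S(CANCEL) && R(487-INV)"): "Terminated",
    ("Confirmed", "S(BYE)"): "Terminated",
    ("Confirmed", "R(BYE)"): "Terminated",
}

UAS = {
    ("Init", "R(INV) && S(1xx)"): "Early",
    ("Early", "S(2xx)"): "Confirmed",
    ("Early", "S(non-2xx)"): "Terminated",
    ("Early", "R(BYE)"): "Terminated",
    ("Early", "R(CAN)"): "Terminated",
    ("Confirmed", "R(CAN)"): "Confirmed",  # late CANCEL: self-loop, no effect
    ("Confirmed", "S(BYE)"): "Terminated",
    ("Confirmed", "R(BYE)"): "Terminated",
}


def run(table, events, state="Init"):
    """Feed events through a table; raise on any transition not drawn."""
    for event in events:
        try:
            state = table[(state, event)]
        except KeyError:
            raise ValueError(f"no transition from {state} on {event}") from None
    return state
```

For example, run(UAS, ["R(INV) && S(1xx)", "S(2xx)", "R(CAN)"]) ends in "Confirmed", showing that a CANCEL arriving after the 2xx leaves the dialog alone.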
2) In several places, the prefix "ini-" is used before the word "INVITE". I presume you mean the initial INVITE. If so, explain it as such on the first use of the prefix.

3) You may want to consider putting a blank line between paragraphs for readability.

4) This is a BCP, right? If so, I find the normative text a bit distracting.

5) In various places, you use the word "signals", e.g., "Only significant signals are illustrated." What you mean, I believe, is "signaling"; so, you may want to consider changing occurrences of "signals" to "signaling", as in: "Only significant signaling messages are shown."

6) In some call flows, a substantial amount of information on the signaling aspect is provided, including SDP. Other flows simply include a mnemonic like "F12 200 OK Alice -> Bob". Is the additional information beyond a mnemonic germane to the reader? That is, should the reader spend more (careful) time reading the SDP in the signaling? Please clarify.

The rest of the comments are sorted by section.

1) S2, Figures 1 and 2 - The timers (Timer K and Timer J) may confuse the reader. Essentially, you are mixing two ideas in one place: a dialog-level state machine augmented with transaction-level timers. It may be better to simply label Timers K and J as "Timeout" and then explain in the writeup that "This timeout corresponds to the lifetime of the BYE transaction at the UAS and UAC."

2) S3.1.3 - You should (again) emphasize why sending BYE on an early dialog is not recommended. Yes, rfc3261 allows the UAC to do so, but point the reader to Appendix A one more time and exhort them NOT to do this.

3) S3.1.4 - I do not understand the resolution to this problem. Wouldn't an easier (and better?) resolution be for the UAC to simply wait a bit longer (at least 2xT1 or 3xT1) before sending a re-INV? In most cases, that should ensure that if the ACK from the UAC was lost, the UAC will see a retransmitted 200 OK.

4) S3.1.5 - Same feedback as above.
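To put rough numbers on the wait suggested for S3.1.4, here is a small Python sketch of the UAS 2xx retransmission schedule from S13.3.1.4 of rfc3261 (interval starts at T1, doubles up to T2, and retransmission stops after 64*T1), assuming the default T1 = 500 ms and T2 = 4 s. The helper names are hypothetical, and the count deliberately ignores variation in transit delay and loss of the retransmissions themselves.

```python
T1_MS = 500   # rfc3261 default for T1 (RTT estimate)
T2_MS = 4000  # rfc3261 default for T2 (cap on the retransmit interval)


def uas_2xx_retransmit_times(t1=T1_MS, t2=T2_MS):
    """Offsets (ms after the first 200 OK) at which a UAS that has not
    seen an ACK retransmits the 2xx: the interval starts at T1, doubles
    up to T2, and retransmission stops after 64*T1."""
    times, t, interval = [], 0, t1
    while t + interval < 64 * t1:
        t += interval
        times.append(t)
        interval = min(2 * interval, t2)
    return times


def retransmits_before(wait_ms, t1=T1_MS, t2=T2_MS):
    """How many 2xx retransmissions a UAC that waits wait_ms before sending
    a re-INVITE could have seen, ignoring transit delay and further loss."""
    return sum(1 for t in uas_2xx_retransmit_times(t1, t2) if t <= wait_ms)
```

With the defaults, waiting 2xT1 (1000 ms) leaves room for one retransmission (at 500 ms) and 3xT1 for two (at 500 ms and 1500 ms), so the extra margin is what absorbs real network delay.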
5) S3.1.6 - How feasible is the scenario here? That is, are there UAs in the wild that will start a session when, *after* sending a BYE, they see a retransmitted 200 OK (INVITE)? In other words, I don't necessarily view this as a race condition so much as a need to apply solid defensive programming: if I get a 200 OK whose dialog identifiers match a dialog that is now in the terminated state, why would I want to resurrect the call? Clearly, the state machine driving my behavior as a UAC tells me that the late-arriving 200 OK (INVITE) should be discarded (much like we discard a CANCEL at a UAS if it arrives *after* the UAS has generated a final response).

6) S3.3 - Nit - s/Here explains/Here, we explain

7) S3.3 - I am not sure what the authors mean by the phrase "What is established by SIP." Please elaborate.

8) Appendix B - I am not sure I understand the reasoning here. The UAC sends a BYE immediately following a re-INV. This will typically happen if the user associated with the UAC presses the "Mute" button (say), followed immediately by hanging up. Here is where the actions of the user should be decoupled from the actions at the protocol level. When the user presses the "Mute" button, the transaction layer generates a re-INV. If the user then immediately hangs up, the transaction layer can be smart and *defer* sending the BYE until the re-INV transaction has completed. In other words, the transaction layer at the sender knows what has just occurred, so by quarantining the BYE until a response to the re-INV has been received, it can save the UAS a fair amount of protocol grief. Would it be better to simply provide guidance to quarantine the BYE in such cases until the re-INV transaction has finished? That seems easy enough.

9) In Appendix D, you should point out that a CANCEL request, regardless of when it is received, must always elicit a final response.
A long time ago in SIP, we had a mantra: "every transaction completes independently of the others", and it is worth documenting this mantra here. Thus, if a CANCEL elicits a 487 to the INVITE, the 487 completes the INVITE transaction, and the CANCEL itself should be answered by a 200.

10) I think it is worth adding a test case that documents forking at a proxy, with all branches returning a non-2xx, and the proxy aggregating them, choosing the best response, and inserting its own tag in the To header of that non-2xx response. The UAC will have received multiple 1xx responses with different tags, but when it gets the final response, the tag in the To header will not match any of the early dialogs. This is okay, and a UAC must be prepared to deal with this eventuality; rfc3261 only mandates that a 200 OK carry the same To-tag as the 1xx response corresponding to that 200 OK. Proxies may want to insert their own To-tag in an aggregated non-2xx response for various reasons, some of which are detailed in the last paragraph of Step 6 in S16.7 of rfc3261. This is an area where I am sure some implementations may falter by expecting the To-tag of the final response to match one seen in a previous 1xx from a forked branch. If you like, I can send you a prototypical example.
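For the forking test case in item 10, the UAC-side behavior I have in mind can be sketched in a few lines of Python. The class and method names are hypothetical; the one normative anchor is S13.2.2.3 of rfc3261, which says all early dialogs are considered terminated upon reception of the non-2xx final response. The sketch deliberately never looks up the To-tag of that response, so a proxy-chosen tag cannot trip it up. It ignores retransmissions and multiple 2xx responses.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InviteClientState:
    """Hypothetical UAC bookkeeping for one outstanding (possibly forked) INVITE."""

    early_dialogs: Dict[str, str] = field(default_factory=dict)  # To-tag -> state
    outcome: str = "pending"

    def on_response(self, status: int, to_tag: Optional[str]) -> None:
        if status < 200:
            # Each distinct To-tag in a 1xx is a separate early dialog;
            # 100 Trying carries no To-tag and creates none.
            if to_tag:
                self.early_dialogs.setdefault(to_tag, "early")
        elif status < 300:
            # A 2xx carries the same To-tag as its branch's 1xx (if any).
            self.early_dialogs[to_tag] = "confirmed"
            self.outcome = "answered"
        else:
            # Non-2xx final: terminate every early dialog without trying
            # to match this response's (possibly proxy-chosen) To-tag.
            self.early_dialogs = {
                tag: ("terminated" if state == "early" else state)
                for tag, state in self.early_dialogs.items()
            }
            self.outcome = "failed"
```

A test along these lines would feed it a 180 with tag "b1", a 183 with tag "b2", and then an aggregated 486 carrying a To-tag that matches neither; the expected result is that both early dialogs end up terminated and no error is raised.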