Draft: draft-ietf-sipping-race-examples-01.txt
Reviewer: Vijay K. Gurbani
Review Date: 18 May 2007
Review Deadline: 18 May 2007 (Extended)
Status: WGLC

Summary: This draft is on the right track but has open issues, described in the review.

This is a good draft.  It attempts to fill in the blanks of rfc3261 with respect to some race conditions, and it should be helpful to implementors.  I am not sure whether it makes sense to post some of these tests on the SIPit site for new implementations, but it may be well worth considering.

Global comments:

1) On a philosophical note, I believe that you are attempting to
   clarify the race conditions inherent in rfc3261, not in rfc4235.
   To that extent, it may be best to simply start S2 with the state
   machine of a dialog from rfc3261: early, confirmed, and
   terminated.  (Pedantically speaking, these states are not so
   much "shown in RFC3261" as they are "described in RFC3261"; I
   believe that the closest definition is in the last paragraph of
   S12 in rfc3261.)

   So, if you accept my premise, then starting with something like
   this should help the reader, regardless of whether or not they
   are familiar with rfc4235:

   For the UAC:
   R: Received
   S: Sent
               +--- R(1xx)
               v
          +->+-------+
    R(1xx)|  | Early |-----------------+
          +--+--+----+                 |
             |  | R(2xx)               |
          +--+  v                      |
          |  +-----------+             |
          |  | Confirmed |---+         | S(CANCEL) &&
    R(non-|  +-----------+   | R(BYE)  | R(487-INV)
      2xx)|     |S(BYE) +----+         |
          |     v       v              |
          |  +------------+            |
          +->| Terminated |<-----------+
             +------------+

   For the UAS:
   R: Received
   S: Sent
                      +--- R(INV) && S(1xx)
                      v
                     +-------+
   R(BYE) +----------| Early |-----------------+
     or   |          +--+----+                 |
   R(CAN) |             | S(2xx)               |
          |             v                      |
          |       +->+-----------+             |
          | R(CAN)|  | Confirmed |---+         | S(non-2xx)
          |       +--+-----------+   | R(BYE)  |
          |             |S(BYE) +----+         |
          |             v       v              |
          |          +------------+            |
          +--------->| Terminated |<-----------+
                     +------------+

   Then, you can add your super-state of Preparative to these, as
   well as further subdivide Confirmed and Terminated states and
   proceed as you do in the draft.
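
   As a rough illustration of the UAC diagram above, the transitions
   could be written as a lookup table.  This is only a sketch in
   Python; the event strings and the names UAC_TRANSITIONS,
   next_state, and run are my own and do not come from the draft or
   from rfc3261.

```python
# Hypothetical encoding of the UAC diagram above.
# R = Received, S = Sent; the event strings are illustrative labels only.
UAC_TRANSITIONS = {
    ("Early", "R(1xx)"): "Early",
    ("Early", "R(2xx)"): "Confirmed",
    ("Early", "R(non-2xx)"): "Terminated",
    ("Early", "S(CANCEL)&&R(487-INV)"): "Terminated",
    ("Confirmed", "S(BYE)"): "Terminated",
    ("Confirmed", "R(BYE)"): "Terminated",
}


def next_state(state, event):
    """Return the next dialog state; events not in the table leave the state unchanged."""
    return UAC_TRANSITIONS.get((state, event), state)


def run(events, state="Early"):
    """Feed a sequence of events through the table, starting in the Early state."""
    for event in events:
        state = next_state(state, event)
    return state
```

   For example, run(["R(1xx)", "R(2xx)", "S(BYE)"]) walks Early,
   Confirmed, Terminated.  The Preparative super-state and the
   subdivided states would be added as further rows.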

2) In several places, the prefix "ini-" is used before the word
   "INVITE".  I presume you mean the initial INVITE.  If so, just
   explain it as such on the first use of the prefix.

3)  You may want to consider putting a blank line between paragraphs
   for readability.

4)  This is a BCP, right?  If so, I find the normative text a bit
   distracting.

5)  At various places, you use the word "signals", e.g., "Only
   significant signals are illustrated."  What you mean, I believe,
   is "signaling"; so, you may want to consider changing each
   occurrence of signals to signaling, as in: "Only significant
   signaling messages are shown."

6)  In some call flows, a substantial amount of information on the
  signaling aspect is provided, including SDP.  Other flows simply
  include a mnemonic like "F12 200 OK Alice -> Bob".  Is the fact
  that more information is provided beyond a mnemonic germane to
  the reader?  That is, should the reader spend more (careful) time
  reading the SDP in the signaling?  Please clarify.

The rest of the comments are sorted by the section.

1)  S2, Figures 1 and 2 - The timers (Timer K and Timer J)
  may confuse the reader.  Essentially, you are using two ideas in
  one place: a dialog-level state machine augmented with
  transaction-related timers.  It may be better to simply label
  Timers K and J as "Timeout" and then explain in the writeup
  that "This timeout corresponds to the lifetime of the BYE
  transaction at the UAS and UAC."

2)  S3.1.3 - You should (again) emphasize why sending BYE on an
   early dialog is not recommended.  Yes, rfc3261 allows the UAC to
   do so, but point the reader to Appendix A one more time and
   exhort them NOT to do this.

3)  S3.1.4 - I do not understand the resolution to this problem.
   Wouldn't an easier (and better?) resolution be that the UAC
   simply waits a bit longer (at least 2xT1 or 3xT1) before sending
   a re-INV?  That should, in most cases, ensure that if the
   ACK from the UAC was lost, the UAC will see a retransmitted
   200 OK.
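
   To make the 2xT1 / 3xT1 figures concrete, here is a small sketch
   of when a UAS retransmits an unacknowledged 200 OK.  It assumes
   the rfc3261 behavior of starting at T1, doubling up to T2, and
   giving up after 64xT1, with the default T1 = 0.5s and T2 = 4s.
   The function names are mine, and network delay is ignored.

```python
T1 = 0.5  # seconds; rfc3261 default round-trip estimate
T2 = 4.0  # seconds; rfc3261 default cap on the retransmit interval


def retransmit_times(t1=T1, t2=T2):
    """Seconds after the first 200 OK at which the UAS retransmits it.

    Assumes no ACK ever arrives; stops once 64*t1 would be reached.
    """
    times = []
    elapsed, interval = 0.0, t1
    while elapsed + interval < 64 * t1:
        elapsed += interval
        times.append(elapsed)
        interval = min(2 * interval, t2)  # double, but never beyond T2
    return times


def retransmits_seen(wait, t1=T1, t2=T2):
    """Number of retransmissions sent within `wait` seconds (delay ignored)."""
    return sum(1 for t in retransmit_times(t1, t2) if t <= wait)
```

   Under these assumptions the first retransmissions go out at 0.5s,
   1.5s, and 3.5s, so a wait of 2xT1 covers one retransmission and a
   wait of 3xT1 covers two.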

4)  S3.1.5 - Same feedback as above.

5)  S3.1.6 - How feasible is the scenario here?  That is, are
   there UAs in the wild that will start a session when, *after*
   sending a BYE, they see a retransmitted 200 OK (INVITE)?
   In other words, I don't view this necessarily as a race
   condition as much as I view it as the need to apply solid
   defensive programming: If I get a 200 OK whose dialog
   identifiers match a dialog that is now in the terminated state,
   why would I want to resurrect the call again?  Clearly, the
   state machine driving my behavior as a UAC tells me that the
   late-arriving 200 OK (INVITE) should be discarded (much like
   we discard a CANCEL at a UAS if it arrives *after* the UAS has
   generated a final response.)
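
   The defensive check I have in mind is roughly the following.
   This is a sketch only: the dialog table, the tuple used as the
   dialog identifier, and the function name are hypothetical, and
   the sketch does not address whether the UAC should still ACK the
   retransmission.

```python
def on_200_ok_invite(dialogs, dialog_id):
    """Decide what a UAC does with a 200 OK (INVITE) for `dialog_id`.

    `dialogs` maps a hypothetical (call_id, local_tag, remote_tag) tuple
    to a state name. Returns "discard" or "confirm".
    """
    if dialogs.get(dialog_id) == "Terminated":
        # The dialog already ended (e.g. we sent BYE): do not resurrect it.
        return "discard"
    dialogs[dialog_id] = "Confirmed"
    return "confirm"
```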

6)  S3.3 - Nit - s/Here explains/Here, we explain

7)  S3.3 - I am not sure what the authors mean by the phrase
  "What is established by SIP."  Please elaborate.

8)  Appendix B - I am not sure I understand the reasoning here.
   The UAC sends a BYE immediately following a re-INV.  This will
   typically happen if the user associated with the UAC presses
   the "Mute" button (say), followed immediately by hanging up.

   Now, here is where the actions of the user should be decoupled
   from the actions at the protocol level.  When the user presses
   the "Mute" button, the transaction layer generates a re-INV.
   Now, if the user immediately hangs up, the transaction layer
   can be smart and *defer* sending the BYE until the re-INV has
   completed successfully.  In other words, the transaction layer
   at the sender knows what has just occurred, so by quarantining
   the BYE until a response to the re-INV has been received, it
   can save the UAS a fair amount of protocol grief.

   Would it be better to simply provide guidance to quarantine
   the BYE in such cases until the re-INV transaction has finished?
   That seems easy enough.
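
   A minimal sketch of the quarantine idea follows.  The class and
   method names are hypothetical, and the sketch records only the
   re-INVITE and BYE requests (no ACK, no timers), so treat it as an
   illustration rather than a design.

```python
class DialogSender:
    """Holds back a BYE while a re-INVITE transaction is still pending."""

    def __init__(self):
        self.reinvite_pending = False
        self.deferred = []  # requests quarantined until the re-INVITE ends
        self.sent = []      # requests actually put on the wire, in order

    def send_reinvite(self):
        self.reinvite_pending = True
        self.sent.append("re-INVITE")

    def send_bye(self):
        if self.reinvite_pending:
            self.deferred.append("BYE")  # quarantine instead of racing
        else:
            self.sent.append("BYE")

    def on_reinvite_final_response(self):
        """Called when the re-INVITE gets its final response."""
        self.reinvite_pending = False
        self.sent.extend(self.deferred)
        self.deferred.clear()
```

   With this, a "Mute" followed immediately by a hang-up produces
   the re-INVITE first and the BYE only after the re-INVITE has been
   answered.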

9) In Appendix D, you should point out that a CANCEL request,
   regardless of when it is received, must always elicit a final
   response.  A long time ago in SIP, we had a mantra: "every
   transaction completes independent of others" -- and it is worth
   documenting this mantra here.  Thus, if a CANCEL elicits a 487
   (INVITE), the 487 completes the INVITE transaction and the CANCEL
   itself should be answered by a 200.
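
   The mantra can be sketched as below.  The function name and the
   (transaction, status) tuples are my own shorthand, and the sketch
   assumes the CANCEL matched an existing INVITE transaction.

```python
def handle_cancel(invite_final_sent):
    """Responses a UAS sends on receiving a CANCEL that matches an INVITE.

    Returns a list of (transaction, status) pairs.
    """
    # The CANCEL transaction always completes with its own final response.
    responses = [("CANCEL", 200)]
    if not invite_final_sent:
        # The INVITE is still pending, so it is completed separately with 487.
        responses.append(("INVITE", 487))
    return responses
```

   Note that the 200 for the CANCEL is present in both cases; only
   the 487 for the INVITE depends on timing.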

10)  I think it is worth adding a test case that documents forking
   at a proxy, all branches returning a non-2xx, and the proxy
   aggregating and choosing the best response, but inserting its
   own tag in the To header of the non-2xx response.  The UAC will
   have got multiple 1xx responses with different tags, but when it
   gets a final response, the tag in the To will not match any of
   the early dialogs.

   But this is okay.  A UAC must be prepared to deal with this
   eventuality; rfc3261 only mandates that the 200 OK has the same
   From-tag that the 1xx response corresponding to the 200 OK did.
   Proxies may want to insert their own To-tag in an aggregated
   non-2xx response for various reasons, some of which are detailed
   in the last paragraph of Step 6 in S16.7 of rfc3261.  This is
   an area where I am sure some implementations may falter and
   expect a tag in a previous 1xx corresponding to a forked branch.
   If you like, I can send you a prototypical example.
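
   In the meantime, here is a sketch of the UAC behavior I would
   expect.  It relies on the rfc3261 rule that a non-2xx final
   response to the initial request terminates any early dialogs
   created by its provisional responses.  The function name and the
   use of a plain set of To-tags are hypothetical simplifications.

```python
def early_dialogs_after_non_2xx(early_to_tags, status, to_tag):
    """Return the early dialogs left after a non-2xx final response.

    `early_to_tags` is the set of To-tags seen in earlier 1xx responses.
    `to_tag` is the To-tag of the final response; it is deliberately not
    compared against `early_to_tags`, since a forking proxy may have
    inserted its own tag.
    """
    if not 300 <= status <= 699:
        raise ValueError("this sketch only covers non-2xx final responses")
    # Every early dialog from this INVITE ends, whatever the To-tag is.
    return set()
```

   An implementation that instead looks up to_tag among the early
   dialogs and ignores the response on a miss would leave those
   dialogs dangling, which is the failure mode described above.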