Draft: draft-ietf-sipping-cc-framework-07
Reviewer: John Elwell
Review Date: 2007-04-13
Review Deadline: 2007-04-13
Status: WGLC

Summary: This draft is basically ready for publication, but has nits that should be fixed before publication.

General comment

The draft seems to cover an extremely broad spectrum of capabilities, and it is not always obvious that they are specifically to do with multi-party usage of SIP, which is the stated aim of the draft. For example, the description of caller preferences is really about how to choose a particular contact for establishing a two-party session, and not specifically to do with multi-party. As another example the Voice XML section doesn't explicitly talk about use in support of multi-party communications. The draft might benefit from slimming down and focusing on core multi-party usage of SIP. However, I don't see this as a major restructuring - just removal of a few less relevant parts.

Because of limited time, I was only able to review small parts of the document in detail. I happened to notice the following specific points:

1. "In this context a "mixer" refers to combining media in an appropriate, media-specific way." Can we change "combining media" to "combining media of the same type", to avoid misinterpretation as combining media of different types? A transcoder might do this, but not a mixer.

2. "Some participants may be hidden within a conversation space." It then goes on to give examples but without providing a definition of "hidden". For the tone generator example, this is not hidden in the sense that other participants are aware that a tone is being injected, although it does not need to be associated with any identified participant. For the call centre supervisor monitoring example it seems like the participant is truly hidden, in that other participants might not be aware of that user's presence (other than perhaps being told that calls may be monitored).
So is the single term "hidden" applicable to all these examples, and can we find some sort of definition that works for all examples?

3. "A human participant "on-hold" is passive." This seems to be in conflict with the definition of active earlier in the paragraph. Since a human participant on hold can decide to leave the conversation space, surely that participant is active?

4. There seems to be some inconsistency in terminology: the terms "dialog", "SIP dialog" and "session dialog" seem to be used interchangeably. A single term should be selected, preferably SIP dialog (since the term dialog is used also in other contexts, such as "voice dialog").

5. "setup time is shorter (fewer messages and round trips are required)" (referring to the peer-to-peer approach). This assertion doesn't seem to be backed up. Consider the case where A is in a call with B and A refers B into a conference focus/mixer C. For the 3PCC approach assume there is one B2BUA in the middle and for the P2P approach assume there is a record-routing proxy in the middle. Assume REFER with norefersub for simplicity and skip 100 responses.

With the 3PCC approach:

   1. REFER request A to B2BUA
   2. 2xx response B2BUA to A
   3. INVITE request B2BUA to C
   4. 180 response C to B2BUA
   5. 200 response C to B2BUA
   6. Re-INVITE request B2BUA to B
   7. 200 response B to B2BUA
   8. ACK B2BUA to C
   9. ACK B2BUA to B
   10. BYE B2BUA to A
   11. 200 response A to B2BUA

   Total: 11 messages.

With the P2P approach:

   1. REFER request A to proxy
   2. REFER request proxy to B
   3. 2xx response B to proxy
   4. 2xx response proxy to A
   5. INVITE request B to proxy
   6. INVITE request proxy to C
   7. 180 response C to proxy
   8. 180 response proxy to B
   9. 200 response C to proxy
   10. 200 response proxy to B
   11. ACK B to proxy
   12. ACK proxy to C
   13. BYE request B to proxy
   14. BYE request proxy to A
   15. 200 response A to proxy
   16. 200 response proxy to B

   Total: 16 messages.

So, whilst the message sequences might not be quite accurate, in this example the P2P approach seems to be significantly more expensive than the 3PCC approach. The use of NOTIFY requests would increase the difference further. Whilst there might be particular cases where the P2P approach is cheaper (particularly where there is no record routing), it is certainly not the general case.

6. "Support SIP conference policy control" What is this?

7. "Locally perform media forking (multi-unicast)" Why does the P2P approach rely on this primitive? I guess it depends on the mixing model.

8. "Each participant joins that multicast groups" Change "that" to "those".

9. "This concept is described in more detail in the context of dialog operations in section" This sentence is incomplete.
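
As a rough cross-check of the tallies in point 5, the following Python sketch counts transmitted messages per hop. It is illustrative only: the names THREE_PCC_MESSAGES, P2P_MESSAGES and total_messages are hypothetical, and it assumes, as in the example, that the B2BUA terminates every message (one hop each) while each record-routing proxy relays every message once.

```python
# Illustrative tally only; all names here are hypothetical, not from the draft.
# End-to-end messages per flow, taken from point 5 (100 responses skipped,
# REFER with norefersub assumed).
THREE_PCC_MESSAGES = [
    "REFER", "2xx",                  # A <-> B2BUA
    "INVITE", "180", "200",          # B2BUA <-> C
    "re-INVITE", "200",              # B2BUA <-> B
    "ACK", "ACK",                    # B2BUA -> C and B2BUA -> B
    "BYE", "200",                    # B2BUA <-> A
]
P2P_MESSAGES = [
    "REFER", "2xx",                  # A <-> B
    "INVITE", "180", "200", "ACK",   # B <-> C
    "BYE", "200",                    # B <-> A
]


def total_messages(messages, record_routing_proxies=0):
    """Count transmitted messages: each end-to-end message crosses one hop,
    plus one extra hop for every record-routing proxy on the path."""
    return len(messages) * (1 + record_routing_proxies)


# The B2BUA is an endpoint of every message, so no relaying is counted.
three_pcc_total = total_messages(THREE_PCC_MESSAGES)
# One record-routing proxy relays every P2P message.
p2p_total = total_messages(P2P_MESSAGES, record_routing_proxies=1)
# P2P with no record routing, for comparison.
p2p_direct = total_messages(P2P_MESSAGES)

print(three_pcc_total, p2p_total, p2p_direct)
```

Under these assumptions the sketch gives 11 messages for 3PCC, 16 for P2P through one record-routing proxy, and 8 for P2P with no record routing, which matches the caveat that P2P can be cheaper only in that last case.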