[Apologies to the author: This document slipped through the gen-art last call review net and so I only got to volunteer on this one 24 hours ago. Later... Having slept on the comments I see there are a couple that were items that I missed and should be modified or removed. In particular about whitespace (where I missed the definition in terminology). Further I have now read (very quickly) the address draft which clarifies some matters on what are possible JIDs but doesn't totally solve some of the issues.] Document: draft-ietf-xmpp-3920bis-19 Reviewer: Elwyn Davies Review Date: 29 November 2010 IETF LC End Date: IESG Telechat date: 2 December 2010 Summary: This is a very well written document that is mostly easy to read and follow. However I believe it is not (quite) ready for the IESG for the following reasons: It has (IMO) one major issue as an EXTENSIBLE system - the three level-1 stanzas are hard coded and it is not immediately possible to add new types of level-1 stanza should this be thought useful in future. This is, I guess a philosophical issue, and as a -bis document it probably isn't going to get changed at this stage, so this need not concern us deeply. However I think that it might be useful to provide an explanation of the general semantics of the three types of stanza early on in the document - not having participated in XMPP, it wasn't until I stepped back from the details of the document (as well as finding out what IQ stood for when I reached section 8) that I realized that the three existing categories of stanza are effectively semantic types that are appropriate for the three classes of message for which they are named, but could be used for other sorts of function - this might avoid people thinking that they need extra stanza types and complaining that it isn't extensible :-( . There are several more minor issues with these three being important: There are a largish number of SHOULD requirements where the alternative is not discussed or specified. There seems to be a lack of clarity or definiteness about the effects of support of SASL being mandatory. There are a number of places in the document where the text is written as if SASL were not mandatory, and others where the reverse is true but the surrounding text doesn't seem to think it is mandatory.. All these places need to be looked at critically to ensure that the wording is appropriate for SASL being mandatory and that the whole is consistent. The separation of the address format into a separate document means that there should be some information about the terms that are imported in the terminology section. The early sections on stream setup talking about 'from' and 'to' addresses have some difficulties with the use of hostname versus the term used in the JID/addressing draft of domainname. Almost the whole of this document implies that a JID *necessarily* has a localpart, whereas the definition in the addressing draft allows it to be just a bare (or mere!) domainname. The couple of places where JIDs may or in some cases, must not have a localpart confuse matters because they don't describe the hostname only forms as JIDs in the main text, but they are suddenly transformed into JIDs in summaries (in particular the summary table in s4.6.6 where it turns out that, yes, all the 'to' and 'from' addresses are actually JIDs but (1) only restricted forms are allowed for some interactions and (2) these restricted forms were carefully *not* described as JIDs earlier in s4.6). Again, the term 'bare JID' is almost everywhere shown as except for in s8.1.2.1. This term is not actually defined in the addressing draft and appears only in one place as a term for . Apart from these there are a number of other minor issues and various nits/editorial issues. Caveat: As usual I haven't tried to verify the XML in Appendix A, and my scan of the examples in s9 was very cursory. Major issues: s4.1, Definition of XML Stanza: Given that xmpp is supposed to be extensible, is there a really good reason why the first-level stanza-defining elements (message, presence, iq) are hard-coded into the specification rather than being possibly extensible? There is already a stream features mechanism where additional first-level items could be signaled. This might be alleviated by describing the semantic types of message that are covered by the available stanzas - the naming covers up that they are quite general types of message that could be used for other purposes. Minor issues: s2.5, first bullet describing server respnsibilities: The term 'local clients' is slightly confusing. My immediate understanding of 'local' would be something attached to the server from the local network or machine. In this case it is presumably intended to imply clients that have a session connected to this server rather than via some other server but may of course be very remote as regards physical and network location. Maybe 'attached clients' or 'associated clients' (this term is used elsewhere in the document) might be clearer, or maybe this just needs a terminology definition for how 'local' is used - there is the use of 'local entities' previously in s2.5 but this didn't jar so much. s2.5, last para: Does the use of 'local services' imply anything about the disposition of the code implementing the service? This use of 'local' may or may not conflict with the implications of 'local mentioned in the last comment. Terminology needs to be made a little more rigorous. s4.2.2: Towards the end of this section, there are two statements stating 'the initiating entity is cleared to send XML stanzas'. Should this be qualified with 'unless a stream restart is required as a result of the feature negotiation' (at least at the first appearance)? This probably depends on whether voluntary-to-negotiate features can require stream restarts if agreed but not otherwise. s4.2.3 and s4.4: In s4.2.3, if a stream has to be restarted, the parties are supposed to consider the the stream 'replaced'. This presumably means that the old stream is closed. It is not absolutely clear whether this requires that a is exchanged or definitely not exchanged in this case. [OK: This is cleared up for the specific TLS case at item 3 in s5.4.3.3 and for SASL in s6.4.6 - a general note at s4.2.3 would be appropriate since these are not necessarily the only restart cases.] s4.2.6, paras 2 and 3: The first parts of these paragraphs are written as if SASL autrhentication was mandatory (which it is) and is the only possibility (which it might not be). Para 2 (the client-to-server case) does not appear to allow for any other options. Para 3, OTOH, goes on to deal with Server Dialback but the first part still reads as if SASL was the only option. Both paras need to be rewritten to cater for other options (and the version 0.9 case?). s4.6.1, para 6: 'For initial stream headers in server-to-server communication, a server MUST include the 'from' attribute and MUST set the value to the domainpart of the 'from' attribute of the stanza that caused the stream to be established...': Is it absolutely always the case that a server-to-server connection would be established because some stanza came in (presumably from an attached client)? Could this be indirectly through some local service requiring a connection for any other reason? It is clear why the server needs to specify 'from' but the constraint on the origin seems overly restrictive. s5.3.5: 'Because it is relatively inexpensive to establish streams in XMPP, for the first two cases it is preferable to use an XMPP stream reset (as described under Section 4.8.3.16) instead of performing TLS renegotiation.': Is this always true? Probably so for simple instant messaging but if the client is using some other services provided by the server, this may an unwarranted generalization. I am not sufficiently expert in either XMPP or TLS renegotiation to know what the trade-offs would be. s5.3.5: 'However, the third case is sufficiently rare that XMPP entities SHOULD NOT blindly attempt TLS renegotiation.' (the third case is separating client authentication from server authentication): I don't understand what is being said here. Is this an opinion that the risk of client credential leakage is sufficiently low that the expense of two stage negotiation is not really worth it? Or is this something that should be controlled by a configurable client policy? Or is this an implementation choice? Some clarification would help. s5.4.3.1, rule 3: Why should the receiving entity not request a certificate? Why would the initiating entity not send its certificate if it had one? (Is this related to the renegotiation issue from the last comment?) s5.4.3.1, rule 4: How should the receiving entity make the choice given the 'to' attribute (presumably some field in the certificate)? What happens if it decides not to follow this should? s5.4.3.2, para 2: The comment about not sending relates to the comment on s4.3.2 above. However, the reasoning for not sending when the TCP connection is closed due to TLS failure differs from reasoning given elsewhere (e.g., s5.3.5, para 8 - here the error is said to be at the TLS layer and not the XMPP layer so that it is not appropriate to send ). A consistemt story is needed here. s5.4.3.3, item 5: Given that SASL support is mandatory, why is offering the SASL feature not a MUST? Is this to do with possible future alternatives? s6.3.8, para 1: I suspect the two SHOULDs in this para ought to be MUSTs. What other options does initiating entity have if it does/does not want to act on behalf of another entity and SASL is being used? However maybe all this should be qualified by 'If SASL is being used...' s6.4.1, para 5: 'If the receiving entity supports SASL, the stream features SHOULD include an advertisement for support of SASL negotiation,...': s6.2 says that XMPP entities MUST support SASL. I think that this sentence probably needs to be turned round making it something like 'Unless support for SASL negotiation is only allowed once STARTLS negotiation has been completed, the stream features MUST include an advertisement for the support of SASL.' Aside from the STARTLS case (or equivalent cases with future secure stream mechanisms), what happens if the SASL advertisement is not made? s6.4.2, last para: '... the server SHOULD discard the ongoing handshake...': What happens if the server (receiving entity) chooses not to comply with the new proposed SASL mechanism choice? s6.4.5, paras 4 and 6: 'SHOULD close the stream': What would the entity do instead? s6.4.6, para 1: The two SHOULDs in this paragraph ought to be MUSTs. (if there is a 'from' address then the receiving entity MUST do the correlation - what else could it do? If it does the match and they don't match, what would do other than terminate the attempt?) s8.1.2.1, item 2; also s10.5.1 and s10.5.2: s8.1.2.1, item 2 appears to introduce the idea that the is the 'bare JID' of a server. This concept does not appear anywhere else in this document (or from a brief Google search elsewhere). It was not used in RFC 3920. It appears that the term is used once but not explicitly defined in the addressing draft - there it is used for the form. It also doesn't match the 'definition' of Bare JID in s10.5.3.2. Previously in this document this was known as the 'hostname'. However s10.5.1 refers to a 'Mere Domain' as a JID and s10.5.2 to a as a JID. This seems rather inconsistent although technically it is covered by the addressing draft. This all probably be helped by summarizing the terminology used in the addressing draft in the terminology section, and defining the term 'bare JID' properly (in the addressing draft). s8.1.2.1, items 2 and 4: As written, both items refer to stanzas 'generated by the server'. I take it that the distinction is that item 2 refers to messages from the server that are not generated by the server as a proxy for the client account but are some sort of generic control message or the like, whereas item 4 refers to server generated messages that are generated by the server as such a proxy for the client account. The wording is somewhat confusing at present. Presumably the lack of a 'from' attribute is intended to distinguish item 1 messages from item 4 messages - this could be spelt out more positively if this is correct. s13.7.1.1. Both instances of item 4: 'The signatureAlgorithm SHOULD be the PKCS #1 version 1.5 signature algorithm with SHA-256 as defined by [PKIX-ALGO].': What happens if it isn't? s13.7.1.1: First item 5: 'The certificate SHOULD include an Authority Information Access (AIA) extension that specifies the address of an Online Certificate Status Protocol [OCSP] responder.' What happens if it doesn't? s13.9.5 and s15: The requirement for support of TLS-NEG if TLS Renegotiation is to be allowed should be mentioned here. This is also a conformance requirement. s14: I seem to remember that giving a working group as a registrant contact for an URI is deprecated as it is likely to go out of existence fairly shortly. Nits/editorial comments: s3.2.1, bullet labeled '5': 'The initiating entity uses the IP address(es) from the first successfully resolved hostname...': 'first' implies that it depends on how fast the answers come back. Presumably what is wanted is the IP address of the highest priority SRV record that is successfully resolved, maybe subject to some local policies? s3.2.1,bullets labelled '3' and '8': Bullet 3 suggests that the s3.2.2 fallaback process 'SHOULD' be used whereas bullet 8 has an implicit MAY (that might be better if was explicit). Why the distinction? [Presumably, having read on, there is little point in trying the first possibility in the fallback if the initiating entity has already tried the default port, but the alternatives might still work.] Section 3.2.2 could mention that there is no point in trying the default ports if they were already tried under s3.2.1. s3.2.2: A pointer to s15.7 where the default port numbers are registered would be useful. s3.2.4: Where is the appropriate XML stanza for defining add-on services defined? s4.1, Definition of XML Stanza: The term/title 'iq'/'IQ' is not expanded until s8.2.3 in this document where it is expanded as info/query. It would be useful to mention this here (first occurrence) and indeed maybe to put all three of message, presence and iq into the terminology section. The usage of IQ stanzas would then be clearer and their use in examples would be more transparent - currently they just appear with no introduction (e.g, in s4.1 and then in s4.8.3.11). s4.1, final set of bullets: Adding section references to the five topics would be helpful. s4.2.2, first para: A pointer to s4.6.5 which is very prescriptive about how to do the version comparison would be useful. Figure 3: It would help if the figure could be made one or two lines shorter so that the title appears on the same page. (e.g., move the bottom line 'yes' and 'no' above the transition lines that they apply to, remove one line below the 'process complete' box.) s4.6.1, para 8 (also s4.6.2): Although it may seem a bit pedantic, it is probably desirable to clarify that by 'hostname' the document means a fully qualified DNS name. The term is used ambiguously by many operating systems. (The document could also stick to the 'domainname' term used in the addressing draft.) s4.6.6: The entries for to/initiating to receiving and from/receiving to initiating are hostnames which are rstricted forms of JID. The other two to and from entries are hostnames rather than arbitrary JIDs for the server-to-server case. [But the comments on s8.1.2.1 item 2 and ss10.5.1/2 under minor issues may have some bearing here.] [Comment withdrawn: I missed the definition of whitespace in the terminology section: s5.3.3 and s6.3.5: Does 'whitespace' mean all the 'white space' items referred to in Section 2.10 of the current version of the XML spec (i.e., including line separators). The term whitespace is used earlier whan referring just to spaces used in the context of keep alives. [It turns out much later that s11.7 clarifies this but maybe doesn't totally resolve s2.10.] Note that the XML specification uses 'white space' rather than 'whitespace'.] s6.4., last para: For consistency and avoiding confusion between XMPP server and SASL server: s/server/receiving entity/ s8.2.1: A reference to s10.3.1, where the meaning of a message with 'to' attribute is explained, would clarify the first SHOULD. Adding 'if possible' to route or deliver it to the intended recipient' would explain the second SHOULD which shoudl probably be owercase 'should' as this isn't a conformance issue but a matter of whether the connections are available. s8.2.2: The second and third SHOULDs probably need 'if possible' to explain them and they should probably be lowercase 'should' as this isn't a conformance issue but a matter of whether the connections are available. s8.3.2/s8.3.3: All the error specifications in s8.3.3 contain a SHOULD specification of the error type recommended. A note in s8.3.2 or s8.3.3 that the values of the error type given in s8.3.3 are the usual expected error type, but that there are conceivably circumstances when some other value would be more appropriate. (the only one I copuld think of so far, apart from the remote-server-timeout case already in the draft, was associated with payment-required where a modify or cancel error type might be appropriate if the reason was not that the requestor wasn't actually authorized but that it didn't have enough credit.) s11.4: 'A client SHOULD NOT rely on the ability to send data that does not conform to the schemas': This is not controllable by the protocol. s/SHOULD NOT/should not/ s11.4: What should the client do if it doesn't ignore non-conformant elements? s8.3.3.15, Security Note: s/communicated/communicate/ s14. The IANA Considerations mostly duplicate those in RFC 3920. Is the intention that IANA should update the references to RFC3920 to point at this document? If so some statement should be made about this. s14: Presumably 'XXXX' is to be replaced by the RFC nnnn identifier for the eventual RFC. This isn't stated. Should the various Namespace names refer to document sections?