Document: draft-ietf-nsis-rmd-16.txt
    RMD-QOSM - The Resource Management in Diffserv QOS Model
Reviewer: Joel M. Halpern
Review Date: 13-Mar-2010
IETF LC End Date: 22-Mar-2010
IESG Telechat date: N/A

Summary: This document is not ready for publication as an Experimental
RFC.

Clarity Issue:

The document makes repeated use of the term "severe congestion". It
seems inevitable that a somewhat fuzzy definition will be used for that,
and I would not have concern about such fuzziness. However, the
definition used in the document, in section 2, presumably with the
understanding and agreement of the working group, is "congestion that
occurs when a node or link fails and the traffic is rerouted through
another node or link." This property (being caused by node or link
failure) has nothing to do with the severity of the congestion. The text
goes on to talk about this type of congestion not being addressable via
admission control.

It is possible that the document means severe congestion (in the more
conventional sense) with the added caveat that it is brought about by
failure. But that is not what the definition says. (If that is indeed
the intent, then clarifying the definition will suffice to resolve this
issue.)

Also, as a lesser matter, there are systems which do address / prevent
element failures from causing severe congestion by using admission
control, so the claim in the definition that it cannot be addressed by
admission control is at best misleading. Those systems require very
different behaviors than RMD, so they are presumably inapplicable to
this situation.

Major issues:

Section 3.2.3 on applicability seems to state that although there are
multiple RMD-QOSM schemes, none are mandatory to implement, and that a
domain must use a single scheme throughout. I am not sure if "scheme"
here refers to this document as distinct from some other document, or
refers to the variations (such as reduced state, and two varieties of
stateless) on interior node behavior.
If, as seems to be the case since the following text defines 5 schemes,
it is referring to the interior behavior choices, it would seem that
there needs to be a mandatory-to-implement scheme in order for this
document to promote interoperability rather than fragmentation of the
network.

In this day and age, it seems surprising that the protocol specifies
that the interior messages are to be sent with no security. The IETF is
actively working to improve the security of intra-domain and
inter-domain routing, so this decision seems wrong. (Even for an
experiment.) (Section 4.4, 4th bullet.) At the very least, some
explanation of this choice is necessary.

The text in section 4.1.2 states that the 8 bit overload % field
contains a real value. However, I could not find a description of the
encoding by which a real value (between 0 and 1?) should be encoded in
the message.

Minor issues:

The measurement based admission control mechanism used here looks
remarkably similar to the classical RSVP Predictive service. Both of
these are based on the assumption that current measured characteristics
are an indicator of future load. It is not at all apparent that there is
any such relationship. It seems that the text ought to include some
indication as to what the basis for suggesting this be used is, and why
it is thought to be meaningful. Even if the argument is "it is worth
trying", it seems worth stating that, and stating why it is thought that
it will work now.

It would probably be helpful to explain why it is necessary or desirable
to use two different RESERVE messages across the same domain, traversing
the same set of devices, with different but closely related information
(particularly in light of the comments about reducing load on
intermediate devices).

The applicability section states that this mechanism can only be used
with the EF DSCP.
Is it further the case that it can only be used for traffic which
consistently uses a stable amount of bandwidth (per reservation)? One of
the difficulties with the style of reservation based on measurement of
load is that the endpoint requesting the measurement must be aware of
whether the measurement data includes the flow being considered for
admission. Otherwise, large flows can cause significant confusion. With
very stable flows, as long as the measurements are not requested too
often, this is achievable. Otherwise, it is not at all clear to this
reader how the proposed mechanism would work (particularly when
refreshing a reservation).

Continuing this line of questioning, the mechanism for modification
seems to send the new bandwidth through the stateless intra-domain
routers. Since they are stateless, those routers do not know what the
old reservation was. And the measurements presumably include traffic
under the old reservation. If these are added together, significant
double-counting would seem to occur. (This is listed as minor on the
premise that the protocol presumably actually works, and therefore the
problem is one of reader comprehension, rather than more serious
technical issues.)

I was not able to understand the purpose or use of the K bit. I may have
missed it in the dense text. Assuming there is an explanation, a pointer
at the point where the bit is defined to the text which explains its use
would be a very good idea.
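
As an illustration of the double-counting question raised under the
modification issue above, the following is a minimal Python sketch.
Everything in it is an assumption made for illustration and is not taken
from the draft: the function name, the capacity and bandwidth figures,
and the premise that a stateless interior node admits a request by
adding the requested bandwidth to its currently measured load.

```python
# Hypothetical model of a stateless, measurement-based admission check.
# Assumption: the node only knows its measured load and the bandwidth
# carried in the request; it keeps no per-reservation state.

def admits(measured_load, requested, capacity):
    """Admit if the measured load plus the requested bandwidth fits."""
    return measured_load + requested <= capacity


CAPACITY = 100.0         # illustrative link capacity
OTHER_TRAFFIC = 60.0     # load from unrelated reservations
OLD_RESERVATION = 30.0   # bandwidth of the flow before modification
NEW_RESERVATION = 40.0   # bandwidth the flow asks for after modification

# The measurement already contains the traffic of the old reservation.
measured = OTHER_TRAFFIC + OLD_RESERVATION

# Case 1: the modification carries the full new bandwidth.
# The old 30 units are counted twice (once measured, once requested).
full_request = admits(measured, NEW_RESERVATION, CAPACITY)

# Case 2: the modification carries only the increase over the old value.
delta_request = admits(measured, NEW_RESERVATION - OLD_RESERVATION, CAPACITY)

print("full new bandwidth admitted:", full_request)
print("delta only admitted:", delta_request)
```

Under these assumed numbers the link has room for the modified flow
(60 + 40 = 100), yet the first check refuses it because the old
reservation is counted in both terms. Whether the protocol sends the
full value or the difference is exactly what the text should make
explicit.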