Document: draft-ietf-avt-rtp-atrac-family-16.txt
Reviewer: Scott Brim
Review Date: 24 June 2008
IESG Telechat date: 02 July 2008

Summary: This draft is on the right track, but has open issues, described in the review.

Comments:

This is being submitted as a proposed standard. Therefore I am asking that it be very clear. My concerns are mainly with what I see as some ambiguities and some possible errors in documenting protocol behavior. There aren't many so I have left them in the order they occur in the draft instead of categorizing them.

1. Introduction

> The need for real-time streaming of audio data has grown, and
> this document details our efforts in increasing the product and
> application space for the ATRAC family of codecs.

This is a draft for a proposed standard technical specification. Whether it is motivated by a desire to increase product and application space is irrelevant. I would delete this.

4.5.2 Scalable Multi-Session Streaming

> While there may be alternative methods for synchronization of the
> layers, it is RECOMMENDED that the timestamp will be used for
> synchronizing the base layer with its enhancement. Applications

"It is RECOMMENDED" does not conform to RFC 2119. This should be a SHOULD, along with an explanation of the conditions under which it is reasonable not to implement (so that implementors are not left guessing).

> If the enhancement layer's session data cannot arrive until the
> presentation time, the decoder SHALL decode the Base layer
> session's data only, ignoring the enhancement layer's data.

Change SHALL to MUST globally.

5.1 Global Structure of Payload Format

> The structure of ATRAC Payload is illustrated in Figure 3. The
> RTP payload following the RTP header contains three octet-aligned
> data sections.

Only two data sections are described. Do you mean that the RTP header plus ATRAC header section plus payload section form three sections?

5.3.1 Usage of ATRAC Header Section

> Fragment Number (FrgNo): 3 bits
> In the event of data fragmentation, this value is one for the
> first packet, and increases sequentially for the remaining
> fragmented data packets. This value SHOULD be zero for an
> unfragmented frame.

Earlier it was said: "The ATRAC codec can handle very large frames. As most IP networks have significantly smaller MTU sizes than the frame sizes ATRAC can handle ...". If there can be such a significant difference -- and if you want to allow for larger frames in the future -- is there special handling for when this 3-bit counter rolls over (more than 7 fragments)? If not, at least mention that you do not expect it to roll over -- or that you expect the receiver to be able to handle rollovers.

5.3.2.2 Frame Fragmentation

> However, if even a single ATRAC frame will not fit into a
> complete RTP packet, the ATRAC frame SHOULD be fragmented.

What is the alternative to fragmenting it? If there is no alternative, make the SHOULD a MUST. If there is an alternative, what is it and under what conditions is it acceptable to do it? For example, you might say: ... "the ATRAC frame SHOULD be fragmented unless the receiver is non-compliant and has indicated it is incapable of receiving fragments, in which case the session MUST be terminated."

> As subsequent packets do not contain any new frames, the Number
> of Frames field SHOULD be ignored.

Should this SHOULD be a MUST? I would think so. If not, under what conditions is it acceptable NOT to ignore the Number of Frames field?

6.1 Example Multi-frame Packet

First, NFrames=5 means there are 6 frames in the packet but only 5 are shown.

Second, up in 4.5.1 you said: "In multiplexed streaming, the base layer and enhancement layer are coupled together in each packet, utilizing only one session as illustrated in Figure 1. While the packet may begin with either layer type, the two layer types MUST interleave."
In this example you show 3 base layer frames, an enhancement frame, and then a base layer frame. Since

  - you have begun interleaving in the middle of a packet, and
  - interleaving can begin with either layer type, and
  - there are no frame numbers,

how can you tell that the enhancement layer frame is not the _beginning_ of the interleaving, and that it is not associated with the _following_ base layer frame? There seem to be some implicit assumptions that should be made explicit, so that implementors can avoid incompatibility.

7.5.2 For Media subtype ATRAC-X

> The "baseLayer" parameter MUST be the first entry on this
> line. It is RECOMMENDED that the "channelID" parameter be the
> next entry.

Again, make this a SHOULD, and explain under what conditions it is acceptable not to do so. Why are you allowing implementors NOT to have channelID be second? Why do you want them to?

7.5.3 For Media subtype ATRAC Advanced Lossless

> o The Media subtype (payload format name) goes in SDP "a=rtpmap"
> as the encoding name. This SHOULD be followed by the
> "sampleRate" (as the RTP clock rate), and then the actual
> number of channels regardless of the channelID parameter.

What is the problem if this order isn't followed? If you have a SHOULD, it's good to tell implementors under what conditions it is acceptable for them not to do it. Otherwise you get inconsistent implementations. Some just ignore all SHOULDs.

> It is RECOMMENDED

Make it a SHOULD, with explanation. The same comment applies to the uses of RECOMMENDED that follow. I'll stop mentioning them.

7.6 Offer-Answer Model Considerations

> In order to establish an interoperable transmission framework, an
> Offer-Answer negotiation in SDP SHOULD observe the following
> considerations.

Under what conditions is it acceptable not to?

7.6.3 For Media subtype ATRAC-X

> o When creating an offer with considerably high requirements
> (such as 8 channels at 96kHz), it is RECOMMENDED that the
> offer also contain a configuration with lower requirements
> (such as a stereo only option). Although multiple alternative
> configurations may be offered, care SHOULD be taken not to
> offer too many payload types.

I'm not sure what this SHOULD means. If this is just a general bit of advice, make the SHOULD lower case should -- or perhaps just delete it. If this is an important guide to implementation, then should the SHOULD be a MUST? If so, what specifically do you mean by "too many"? Is it possible for the offerer to know? If it should be a SHOULD, what is the impact of offering too many? Under what conditions is it acceptable to offer too many? When the receiver's capabilities are not known?

> For best performance, we suggest an answer SHALL NOT contain
> any values requiring further capabilities than the offer
> contains,

"suggest ... SHALL NOT". Either they MUST NOT or SHOULD NOT, but I wouldn't just "suggest" a requirement. What happens if an answer _does_ contain further capabilities?
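
Returning to the 5.3.1 comment on FrgNo: to make the rollover question concrete, here is a minimal sketch of the arithmetic. The only facts taken from the draft are that FrgNo is 3 bits wide, is one for the first fragment, and is zero for an unfragmented frame. The function names and the 1400-octet payload budget used in the example are my own assumptions for illustration, not anything the draft defines.

```python
from typing import List

FRGNO_BITS = 3                       # field width quoted from section 5.3.1
MAX_FRGNO = (1 << FRGNO_BITS) - 1    # 7; the value 0 marks an unfragmented frame


def fragments_needed(frame_octets: int, payload_budget: int) -> int:
    """Number of RTP packets needed to carry one frame.

    payload_budget is a hypothetical per-packet octet budget left for
    frame data after all headers; the draft does not fix this value.
    """
    if frame_octets <= 0 or payload_budget <= 0:
        raise ValueError("sizes must be positive")
    return -(-frame_octets // payload_budget)  # ceiling division


def frgno_sequence(frame_octets: int, payload_budget: int) -> List[int]:
    """FrgNo values a sender would emit, if the counter never rolls over."""
    count = fragments_needed(frame_octets, payload_budget)
    if count == 1:
        return [0]                   # unfragmented frame
    if count > MAX_FRGNO:
        # This is the case the draft is silent about.
        raise OverflowError(
            "frame needs %d fragments, FrgNo can label only %d"
            % (count, MAX_FRGNO)
        )
    return list(range(1, count + 1))
```

Under the assumed 1400-octet budget, any frame larger than 9800 octets needs an eighth fragment, which is where the draft should say what happens.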