Draft: draft-ietf-sipping-profile-datasets-01
Reviewer: Dale R. Worley
Review Date: 11 Sept. 2008
Review Deadline: 10 Sept. 2008
Status: pre-WGLC

Summary: This draft is on the right track but has open issues, described in the review.

I have the following major concerns:

1. The draft is not tightly written, in that it talks *about* the user agent profile dataset system but does not carefully define the system; it presumes prior knowledge. E.g., several significant terms are not defined in the glossary; the "application profile" is mentioned in a few places, but its source and purpose are nowhere described; and no clear distinction is kept between the profile dataset system, the various profile dataset schemas, and individual profile dataset instances. All of these problems seem to be of the sort caused by authors who are so familiar with the subject matter as to be unaware of what the reader might not know, and they could be cured by careful editing.

2. The intended scope of the RFC is unclear. One possibility is that the RFC is intended to be a conceptual guide to the profile dataset system, giving little more information than that profiles are separated into datasets and that the UA must merge multiple profiles, and listing all of the information that must be specified in the dataset definition RFCs. If this possibility is chosen, much of the material in the draft should not be present, as it consists of long-winded examples of matters about which the dataset definitions have complete freedom: e.g., the allowed value spaces of properties, the common setting attributes, and the merge algorithms. In short, essentially all of section 5 should be omitted, because it is not normative or binding on the writer of a dataset definition.
The other possibility (and the one that I vastly prefer) is that the RFC defines the operation of the profile dataset system clearly enough that from it one could implement a UA profile dataset toolkit: a toolkit into which one could plug the dataset schemas to produce (without further programming effort) software that would handle all the processing that is discussed in the draft. To implement this second possibility would require that the draft specify a number of matters that it does not at present specify, including:

- the datatypes allowed for the settings, and how they are specified by the dataset schemas

  In particular, the draft seems to envision that all settings are either "scalar" datatypes or subsets of the values of a "scalar" datatype. The allowed set of scalar datatypes is not specified. (Section 5.12 has a placeholder for this specification.) The subset types are not clearly defined, so I am certain that various odd special cases will lead to interoperability problems. How the subset types would be specified in the dataset schemas is also not specified.

- what merge algorithms will be allowed, and how they are to be specified

  The least desirable choice is that each dataset definition will have a free hand to define its merge algorithms in English, as that makes it impossible to build a toolkit to handle the merge process. However, it seems unlikely that this RFC could specify a fixed set of merge algorithms without the danger of omitting some algorithm that would turn out to be essential. One possibility is that the specification could require that the merge algorithms be specified in the schemas using XSLT or another standardized language.

In any case, the authors need to decide what the intention of the RFC is and adjust the text to match.

The remaining two problems are questions regarding the overall data model used for profile information: that all the data can be modeled as triples "schema-URN/setting-name -> scalar value".
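To make the toolkit idea concrete, here is a minimal sketch of the kind of merge step such a toolkit could perform. All names, the data layout, and the source precedence order are hypothetical (the draft specifies none of them); the point is only that the RFC would have to pin down the merge algorithms, or a language for expressing them, precisely enough that a registry like MERGE_ALGORITHMS below could be populated mechanically from the dataset schemas.

```python
# Hypothetical sketch of a profile dataset merge toolkit.
# Assumes each profile source yields a map {schema_urn: {setting: value}},
# and that each dataset schema can register a merge algorithm for its URN.

# Assumed precedence, lowest to highest; the draft does not fix this order.
SOURCE_ORDER = ["local-network", "device", "application", "user"]

# Registry of merge algorithms keyed by schema URN.  The RFC would need to
# enumerate the allowed algorithms (or define a language for writing them)
# for this table to be filled in without per-dataset programming.
MERGE_ALGORITHMS = {}

def register_merge(schema_urn, fn):
    MERGE_ALGORITHMS[schema_urn] = fn

def override_merge(low, high):
    """Simplest conceivable algorithm: the higher-priority setting wins."""
    merged = dict(low)
    merged.update(high)
    return merged

def merge_profiles(profiles_by_source):
    """profiles_by_source: {source: {schema_urn: {setting: value}}}.
    Returns one merged dataset per schema URN."""
    merged = {}
    for source in SOURCE_ORDER:
        for urn, dataset in profiles_by_source.get(source, {}).items():
            algo = MERGE_ALGORITHMS.get(urn, override_merge)
            merged[urn] = algo(merged.get(urn, {}), dataset)
    return merged
```

With only the default higher-priority-wins algorithm, a setting in the user profile overrides the same setting from the device profile; a real toolkit would plug in per-schema algorithms (e.g., set union or constrained override) taken from the dataset definitions themselves.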
3. The handling of non-scalar settings is not clear. In all UAs that I know of, some aspects of the configuration information are conceptually a "structure containing an array of structures". E.g., there are a number of "lines", and for each line the UA needs a set of data: user-part, domain, outgoing-proxy, auth-user, auth-password, registration-interval, etc. There is no clear mapping from this conceptual structure into the datatypes that seem to be envisioned: named scalars and named sets of scalars.

4. The handling of repeated datasets is not clear. In the framework, it seems to be assumed that each of the 4 sources of profiles (user, application, device, local-network) will provide a set of datasets, and that within each set, no two profiles will have the same schema URN. The schema URNs allow the UA to determine which datasets are to be merged with each other, resulting in a set of merged datasets, each of which has a unique schema URN. From the merged datasets, datasets that are not needed by the UA are discarded, and from those that are not discarded, the non-supported extensions are discarded. However, this system assumes that it would never be meaningful to have more than one (merged) dataset with the same schema URN. In practice, this means that if a UA can contain an "entity" that is configured by a dataset, no UA will ever want to contain more than one entity of that type, because there would be no way to provide configuration to each of the entities separately. This assumption of no duplication seems (to me) unlikely to hold in practice.
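To illustrate concern 3, here is a hypothetical sketch (not anything the draft specifies) of the mapping problem: per-line configuration is naturally an array of structures, and to express it as the envisioned named scalars it would have to be flattened somehow, e.g. by folding the array index into the setting name. The encoding shown is invented for illustration; the draft gives no rules for any such mapping.

```python
# Conceptual UA configuration: a structure containing an array of
# structures (one element per "line").  Values here are invented.
lines = [
    {"user-part": "alice", "domain": "example.com", "auth-user": "alice1"},
    {"user-part": "bob", "domain": "example.net", "auth-user": "bob1"},
]

# One conceivable (but unspecified) flattening: fold the array index into
# the setting name, yielding flat "setting-name -> scalar value" pairs.
flat = {}
for i, line in enumerate(lines, start=1):
    for name, value in line.items():
        flat["line[%d].%s" % (i, name)] = value
```

Without the RFC defining such an encoding (and its interaction with the merge algorithms), every dataset definition would invent its own, defeating the purpose of a common data model.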