Hi Sarvi,
Thanks for the feedback. One or two additional comments / clarifications below.
Dave
----- Original Message -----
From: Shanmugham, Saravanan
To: Dave Burke ; speechsc@ietf.org
Sent: Friday, March 17, 2006 1:21 AM
Subject: RE: [Speechsc] Speaker Verification (Section 11) review comments - additional comments
--------------------------------------------------------------------------------
From: speechsc-bounces@ietf.org [mailto:speechsc-bounces@ietf.org] On Behalf
Of Dave Burke
Sent: Sunday, January 15, 2006 10:11 AM
To: Dave Burke; speechsc@ietf.org
Subject: Re: [Speechsc] Speaker Verification (Section 11) review comments - additional comments
Six more comments on Section 11:
11. Missing state machine diagram
[Sarvi>>] Can be added
12. In section 11.4.3, it says "...voiceprint identifier headers of the VERIFY
method". However, Voiceprint-Identifier is placed in START-SESSION not VERIFY
[Sarvi>>] It should be START-SESSION
13. The BNF is restricting the Voiceprint-Identifier to have only 3 characters
after the period. None of the examples follow this. Why the restriction in
length? Suggest:
voiceprint-identifier = "Voiceprint-Identifier" ":"
1*VCHAR "." 1*VCHAR
*[";" 1*VCHAR "." 1*VCHAR] CRLF
[Sarvi>>] I don't see the restriction above. The above BNF should match
AAAAAA.BBBBBB as well. Am I looking at this wrong?
DB> The BNF above is the fix I'm suggesting. The one in the spec on page 129
uses 3VCHAR in place of 1*VCHAR and would only match AAAAAA.BBB
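To make the difference between the two rules concrete, here is a minimal sketch that approximates each one with a regular expression and tries both identifiers mentioned above. It covers only a single identifier value (no header name, no ";" list), and the names VCHAR, RESTRICTED, RELAXED and matches are made up for this illustration; they do not come from the spec.

```python
import re

# VCHAR in ABNF is any visible ASCII character (%x21-7E).
VCHAR = r"[\x21-\x7E]"

# Approximation of the rule as described for the draft:
# exactly three characters after the period (3VCHAR).
RESTRICTED = re.compile(rf"{VCHAR}+\.{VCHAR}{{3}}")

# Approximation of the rule suggested in this thread:
# one or more characters after the period (1*VCHAR).
RELAXED = re.compile(rf"{VCHAR}+\.{VCHAR}+")


def matches(pattern, value):
    """Return True if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None


for value in ("AAAAAA.BBB", "AAAAAA.BBBBBB"):
    print(value, matches(RESTRICTED, value), matches(RELAXED, value))
```

Under these assumptions, AAAAAA.BBB matches both patterns, while AAAAAA.BBBBBB matches only the relaxed one.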
14. What kind of values does <decision> take when (a) training has been
performed or (b) for multi-verification (can more than one voice-print
be "accepted"?)
[Sarvi>>] For training, I don't think any of the <voiceprint> elements should
contain a <decision> element. For a multi-verification result, the value
could be rejected, accepted, or undecided. I believe there should be only one
<voice-print> with a decision element of the above possible values.
DB> Sorry - my query is only relevant to training. Perhaps 11.5.4 should then
have a sentence clarifying that <decision> is not present for training results?
15. Minor inconsistency: Why does <verification-score> range from -1.0 to 1.0
whereas confidence (for ASR) ranges from 0 to 1.0? Why not align the
<verification-score> range with the confidence range?
I don't remember this clearly, but I believe there was an earlier discussion
on the meaning of this verification score and how it should be interpreted, and
the decision was to go with a negative-to-positive range.
DB> Fine - I suppose partly it's because the value is not to be treated as a
formal probability.
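For illustration only: a client that wanted to show a <verification-score> on the same 0 to 1.0 scale as ASR confidence could rescale it linearly, as sketched below. Nothing in the spec defines this mapping; the function name is hypothetical, and the rescaled value is no more a formal probability than the original score.

```python
def score_to_unit_interval(score: float) -> float:
    """Linearly rescale a value in [-1.0, 1.0] to [0.0, 1.0]."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return (score + 1.0) / 2.0


for s in (-1.0, 0.0, 0.5, 1.0):
    print(s, score_to_unit_interval(s))
```

With this mapping -1.0 becomes 0.0, 0.0 becomes 0.5, and 1.0 stays 1.0.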
16. Editorial: Use voice-print or voiceprint but be consistent.
[Sarvi>>] OK, I'd go with voiceprint; it seems the more common occurrence.
----- Original Message -----
From: Dave Burke
To: speechsc@ietf.org
Sent: Saturday, January 14, 2006 2:26 PM
Subject: [Speechsc] Speaker Verification (Section 11) review comments
Hello,
Had cause to review Section 11 of MRCPv2-09. Needs editorial attention -
please see below:
Dave
1. Typos
Respository-URI -> Repository-URI
Voiceprint-Identity -> Voiceprint-Identifier
[Sarvi>>] ok
2. Error in examples
According to the spec:
The value of the Verification-Mode header MUST be one of either "train"
or "verify".
... yet none of the examples include said header (and one erroneously places
it in the VERIFY-FROM-BUFFER message - it is only meant to be present in the
START-SESSION message).
[Sarvi>>] correct.
3. Not well defined how to specify shared resources:
The current text for sharing sessions between a co-resident recogniser or
recorder and a speaker verification engine is restrictive and not accurately
specified. The key point is that the related resources are related because
they were allocated within the same SIP dialog and not that they were
allocated within the same (INVITE) message transaction.
Suggest changing:
It is possible for a speaker verification resource to share the same
session with a recognizer resource or to operate in independently.
In order to share the same session, the SDP/SIP INVITE message for
the verification resource MUST also include the recognizer resource
request
to:
It is possible for a speaker verification resource to share the same
session with a recognizer resource or to operate independently.
In order to share the same session, the verification and recognizer
resources must be allocated from within the same SIP dialog.
[Sarvi>>] I believe this was the intent. The idea was that we may want to
start with just a Recorder/Recognizer and then add/drop the verification
engine as needed, through a re-INVITE.
This clarification will be made.
4. <result-type> not defined anywhere in the spec. Doesn't appear in schema.
Probably not necessary.
[Sarvi>>] I think the schema needs to be fixed. I believe the verification
result carries this information to differentiate a training result from a
verification result. Though the client should already know that, I think it
might help to make the distinction within the XML.
5. <num-frames> not defined anywhere in the spec.
[Sarvi>>] Will add a definition for this. Will send out proposed text for
this to make sure there is no objection.
DB> Actually, maybe <utterance-length> is the element intended to contain this
information in which case we just need to replace <num-frames> in the examples
with <utterance-length>?
6. Not clear for some elements if they're required or optional (section 11.5.x)
[Sarvi>>] Will clarify
7. Define values in section 11.5.6. Presumably "et-phoned-home" is in context
only if we publish on 04/01/xx?
[Sarvi>>] I do not understand. Could you please explain?
DB> It would be good if each of the <device> value types had a brief
explanation of its meaning. It seems like "et-phoned-home" might be a
joke referring to a certain Spielberg movie about an extraterrestrial ("ET
phone home")?
8. Examples missing the xmlns in NLSML in VERIFICATION-COMPLETE message
bodies. Actually, shouldn't the http://www.ietf.org/xml/ns/mrcpv2 namespace
apply to all NLSML documents throughout the specification not just those
associated with verification?
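A minimal sketch of why the missing xmlns matters to a namespace-aware client: the same lookup finds <decision> when the namespace is declared and finds nothing when it is not. The two XML fragments are simplified, hypothetical bodies written for this illustration, not copies of the spec's examples, and find_decision is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URI quoted in the comment above.
NS = {"m": "http://www.ietf.org/xml/ns/mrcpv2"}

# Hypothetical, simplified bodies; not copied from the spec's examples.
WITH_XMLNS = (
    '<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">'
    "<voiceprint><decision>accepted</decision></voiceprint>"
    "</result>"
)
WITHOUT_XMLNS = (
    "<result>"
    "<voiceprint><decision>accepted</decision></voiceprint>"
    "</result>"
)


def find_decision(xml_text):
    """Return the text of the first namespaced <decision>, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(".//m:decision", NS)
    return None if node is None else node.text


print(find_decision(WITH_XMLNS))
print(find_decision(WITHOUT_XMLNS))
```

The first call returns "accepted"; the second returns None, because without the xmlns declaration the elements are in no namespace at all.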
9. What does the grammar attribute on <result> mean in the context of
verification?
[Sarvi>>] I believe this could contain the grammar URI that was matched with
the RECOGNIZE command. But it probably wouldn't make much sense in many cases
where there may not be an associated RECOGNIZE operation.
I think we should say that the grammar attribute on <result> should be ignored
for verification results.
10. Many examples include <extensions> in their NLSML. Presumably this needs
to be deleted (since the element is neither defined nor specified)?
[Sarvi>>] Yes.
Thx,
Sarvi
-------------------------------------------------------------------------------
-
_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc