Folks: Adam Roach; Ben Campbell; Cullen Jennings; Daniel Petrie; Dean Willis; Gonzalo Camarillo; Jonathan Rosenberg; Robert Sparks

Conclusion: Identified that we need to tear this apart, addressing the problem of something taking a long time to generate a response separately from the problem of something that was going to respond going away.

Summary: The "Random Notes" below are the stuff I typed in the meeting. Right here I am going to describe what I took out of the meeting - this could be a totally warped impression of what happened - I was pretty tired. So read the random notes too and make your own summary.

The problem we are trying to solve is how to clean up state on all the elements. The two major cases we need to deal with are: 1) something died and will never respond, and 2) something is taking a long time to respond. Case 2 splits into two sub-cases: 2a, where an automaton is holding up the response, and 2b, where something is waiting for input from a human before responding.

Some things to keep in mind when evaluating solutions: systems use different values of T1; DNS SRV failover complicates things; and a similar problem may arise in INVITE transactions when things fork or are CANCELed.

We brainstormed a few possible approaches to this problem.
- Set an expire time of some sort on the transaction
  o If this is absolute, then we need synchronized time - yuck
  o If we ignore the network time (the hops) and only count the processing time at each node (the hipitty), then absolute time would not be needed
  o Make the time start at 64*T1 for the first node and shrink as the message traverses nodes - won't work
- Make the time at each node large enough (i.e. current max hops * 64 * T1) that downstream nodes will time out first
  o 64*T1*70 = 37 minutes; at 1000 transactions per second that is a lot of state (the arithmetic is worked in the sketch below)
- Have any transaction that is going to take a while return some code that more or less amounts to "try back later with the same request and I will give you the answer then"
  o Not clear how this works when nodes fail instead of just being slow

The reason I came to the meeting was to say that no node should have to hold state forever, and that it is reasonable to put some upper bound on how long things are allowed to take and still be a single transaction. I don't think the problem is solvable without this constraint.

Random Notes: First question - should we kill proposal B? Considered how HTTP deals with the problem. The feeling was that HTTP did not deal with it yet but would need to in the future with SOAP. Decided that right now we don't have enough information to kill proposal B - we need to brainstorm some options.

Jonathan posed the problem as: imagine the timeout is very long, then figure out how to clean up state. There is state in the user agents and the proxies. Idea: allow the proxy to go stateless at some point. Consider the INVITE case: a proxy may have to maintain state arbitrarily long. The solution there was to have the UAS send provisionals to refresh timers. Not sure this was true. If we make things arbitrarily long, how do we deal with crashed things...

The idea of shrinking the timer: the UAC will have a timer of 70*64*T1. Each proxy has (max hops)*64*T1. The UAS has a timer of 64*T1, or possibly (max hops)*64*T1. This results in keeping state for 37 minutes - all agree 37 minutes does not sound good. INVITE has this same problem: if the CANCEL fails you can get this cascade. There are really two problems here - one is failure; the other is that the device is just slow doing its backend processing or something.
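To make the timer arithmetic above concrete, here is a minimal sketch (Python, purely illustrative; the function and variable names are mine - only T1 = 500 ms, the 70-hop figure, and the 1000 transactions/second load come from the discussion):

    # Arithmetic behind the shrinking-timer / max-hops-timer ideas above.
    # T1 and MAX_HOPS are the meeting's figures; the names are invented here.

    T1 = 0.5          # seconds; note that not all systems use 500 ms
    MAX_HOPS = 70     # the "current max hops" figure used in the meeting

    def node_timer(hops, t1=T1):
        """Timer a node runs so that downstream nodes time out first."""
        return hops * 64 * t1

    uac_timer = node_timer(MAX_HOPS)       # 70 * 64 * 0.5 s = 2240 s
    print(uac_timer / 60)                  # ~37.3 minutes

    # At 1000 new transactions per second, holding each for ~37 minutes
    # means on the order of 2.24 million transactions of live state:
    print(int(1000 * uac_timer))           # 2240000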
Important point - not all systems have T1 at 500 ms; this can mess things up and we have to take it into account. DNS SRV failover needs to be considered. Discussed the idea of saying "you have this much time to process this message", where the allowance gets decremented along the way (a sketch follows below). Is the whole discussion purely that the timer value is too low? If we want to deal with human interaction, then it is a different problem than the time for an automaton to respond. If the UAS timer is large enough, then the only time there is no response is when something has failed. One proposal is to make the timer bigger than 64*T1 and then toss out the 408 somehow. Do we let the client continue the transaction using some sort of message back, or do we redo the request at some future time when the far side has completed the answer?
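A rough sketch of the decremented-processing-time idea, under loud assumptions: "Time-Remaining" is a hypothetical header invented here for illustration (no such SIP header exists), and the 64*T1 default budget is just a placeholder. Counting only each node's own processing time, not the network time, is what avoids needing synchronized clocks:

    # Sketch of the "you have this much time to process this message" idea.
    # "Time-Remaining" is a hypothetical header, invented for illustration.

    import time
    from dataclasses import dataclass, field

    @dataclass
    class Request:
        method: str
        headers: dict = field(default_factory=dict)

    def handle_and_forward(req, do_local_work, forward, default_budget=64 * 0.5):
        budget = float(req.headers.get("Time-Remaining", default_budget))
        start = time.monotonic()
        do_local_work(req)                  # e.g. routing decision, DB lookup
        budget -= time.monotonic() - start  # charge only our processing time
        if budget <= 0:
            # Out of budget: stop here and let upstream clean up its state.
            raise TimeoutError("transaction time budget exhausted")
        req.headers["Time-Remaining"] = str(budget)
        forward(req)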