Draft: draft-ietf-sipping-spam-03
Reviewer: Vijay K. Gurbani
Review Date: Jan 3, 2007
Review Deadline: Dec 31, 2006
Status: WGLC

Review Summary: This draft is on the right track but has open issues, described in the review.

Draft Summary: This is an excellent survey of spam in general, and SIP-related spam in particular. It nicely ties together the various extensions to SIP (consent, identity, presence, etc.) to solve a particular problem, namely preventing SIP spam. In my opinion, the sooner it gets to RFC status, the better it will be for the SIP user community. Issues needing discussion are listed below, followed by some nits.

Issues needing discussion:

* S2.1, paragraph "This number (9 deliveries ... will be more effective." I am not sure how you are characterizing the first sentence in that paragraph. Certainly, an SMTP server (probably the closest equivalent to a SIP proxy) would be able to deliver 9 or more messages a second, and probably already does. For some existing analysis of SMTP server performance, see [1,2], which describe systems that can handle anywhere from 52 to 5000 messages/second. If, however, the 9 deliveries per second refers to the MUA (the closest equivalent to a SIP UA), then I would agree that it is hard to pin down how many messages per second an individual receives. For instance, sending a message to an enterprise reflector almost always elicits "On vacation" responses, and many times such responses arrive at a rate greater than 9 messages per second.

* S2.2, s/which only allow spam to be/which only allow IM to be/ I think you meant that the *IM* will be delivered only if the sender is on a white list, not that the *spam* will be delivered only if the sender is on a white list.

* In S3.1, the third paragraph is redundant; it restates what the first sentence of the second paragraph already said.
I would suggest taking it out completely, and modifying the first sentence of the second paragraph as follows:

  OLD: Unfortunately, this type of spam filtering is almost completely useless for call spam.

  NEW: Unfortunately, this type of spam filtering, while successful for email spam, is completely useless for call spam.

* In S3.1, to be complete, you should also add a blurb on whether or not content filtering helps for presence spam (since you mention that it does not help for call spam and may help for IM spam). I would suggest adding the following paragraph at the end of the section:

  Content filtering will not help for presence spam because, by definition, a request subscribing to the presence of a user will be devoid of any content.

* In S3.2, you may want to change spammer@domain.com and spammer.com. domain.com is registered to someone in Vancouver, WA, and spammer.com is registered to someone in Houston, TX. I suggest using spammer@spam.example.com for spammer@domain.com and spam.example.com for spammer.com. spammers.com also appears in S5.

* In S3.3, it is worth discussing that in addition to Turing tests, the "introduction problem" can also be solved by tagged addresses, of which the Tagged Message Delivery Agent (TMDA, http://tmda.net/index.html) is a good example. Using tagged addresses, you can limit an email address by temporal interval (e.g., valid only for 5 days) or make it sender-specific. I do think they are worth discussing (for email, they work well; they have limited utility for call spam, but could be used for IM spam and presence spam). I will leave it up to you to determine whether a new sub-section should be added in Section 3 to discuss tagged addresses, or whether to discuss them as part of S3.7 "Limited Use Addresses".

* In S3.4, first complete paragraph on page 11: it is stated in the last sentence that "Fortunately, traditional content-based filtering can be applied to this type of information."
I presume the phrase "this type of information" refers to the SIP address "sip:please-buy-my-product-at-this-website@spam.example.com". If so, then this is not content, but a header; more specifically, a From header. Would white- or black-lists not be a better match than content-based filtering in this case?

* In S3.5, I don't think I agree with your assertion that "Reputation systems have not enjoyed much success outside of the instant messaging space." (third paragraph in that section). Certainly, in the business space occupied by eBay and Amazon, reputation is intrinsic and an important contributor to the smooth functioning of the enterprise. Since eBay and Amazon have authority over their namespace, they can warn other users of identity mining (eBay, for instance, displays a special icon if you have recently changed your identity). Likewise, anyone who buys a used book or CD through Amazon first scans the reputation of the seller before pressing the "1-click" button. My suggestion would be to begin the third paragraph using the text suggested below:

  OLD: Reputation systems have not enjoyed much success outside of the instant messaging space. This is in part because few other communications systems admit of the same degree of centralization and monolithic control.

  NEW: Reputation systems have been successful in communication systems where centralization of resources (user identities, authentication, etc.) and monolithic control dominate.

* S3.8, page 15, it is stated at the top of the page that "Automata cannot easily perform the image recognition needed to extract the word or number sequence, but a human user usually can." Note that there exist programs that can achieve a 93% correct recognition rate on visual puzzles of this kind, known as CAPTCHAs (see http://www.captcha.net, and [3]).

* S3.8, of the two problems with the voice Turing tests mentioned towards the end of the section, the first one appears to *not* be a problem.
Yes, voice Turing tests would need to be language specific, but as you note, SIP already has provisions to help here. Even if it did not, the experience is no different from what happens now when A calls B and is instead redirected to B's voicemail, which prompts him to enter "1" for A, "2" for A's wife and so on. If A is in the US, and B is in Lisbon, the instructions are rendered in Portuguese. A swears a bit, and hangs up, vowing to call B at a later time. The second problem -- cheap labor -- is indeed a problem.

* In S3.9, there is no mention of Cullen's hashcash draft. Is that effort dead?

* In S3.9, one disadvantage of the Payments at Risk scheme is the discrepancy in currency value between the sender and the recipient. If the sender is in a third world country (albeit being financed by a spamming corporation in a first world country) and the receiver is in a first world country, then the sender really does not care about not getting his money refunded. The benefactor funding the sender is able to use the currency valuation of his host country to conduct spam. The sender deposits the payment in his local currency, the exchange value of which is much lower than that of his benefactor's country.

* Isn't S3.13 similar to VoIP peering? If domains establish a peering relationship, then they could optionally add spamming fines to the service level agreement. In fact, the Circle of Trust idea of S3.12 is a simpler case of what is described in S3.13. Suggestion: remove S3.12, or subsume it in S3.13.

* The last statement of S3.13 is a strongly worded opinion of the authors. While in and of itself this is perfectly okay, I feel that standards documents ought to be a bit more impartial (okay, beat me here if you want.)
Instead, I would suggest that the last two sentences of S3.13 be reworded as follows to convey the authors' distaste in an objective manner:

  OLD: However, it puts back into SIP much of the complexity and monopolistic structures that SIP promised to eliminate. As such, it is a solution that the authors find distasteful and contrary to the SIP design and architecture.

  NEW: However, centralized architectures such as these are deliberately eschewed because they put back into SIP much of the complexity and monopolistic structures that the protocol promises to eliminate.

* In S6, I would suggest that the three core recommendations be given labels; something like:

  1. Enforce SIP Identity ...
  2. Leverage White Lists ...
  3. Provide a Solution for the Introduction Problem ...

  Implicitly, the last paragraph contains the fourth recommendation:

  4. Do Not Ignore Spam ...

* There is no IANA Considerations section (I realize that this is an Informational track RFC, so an IANA Considerations section may not be required. I do not remember if the IESG requires such a section in all drafts, even Informationals.)

References

[1] Wyman Miles, "A High-Availability High-Performance E-Mail Cluster," Proceedings of the 30th annual ACM SIGUCCS conference on User services, pp. 84-88, November 20-23, 2002.

[2] Yasushi Saito, Brian N. Bershad, and Henry M. Levy, "Manageability, availability, and performance in Porcupine: a highly scalable, cluster-based mail service," ACM Transactions on Computer Systems (TOCS), pp. 298-332, 18(3), August 2000.

[3] Luis von Ahn, Manuel Blum, and John Langford, "Telling Humans and Computers Apart Automatically," Communications of the ACM, pp. 57-60, 47(2), February 2004.

Nits

* S2.1, page 6, s/addresses, in that, in concert/addresses, such that in concert/
* S3.4, page 11, s/but to make the permission is extremely limited/but to ensure that the permission is extremely limited/
* S3.5, first paragraph, second sentence: s/and they attempt/and A attempts to/
* S3.5, first paragraph, third sentence: s/from A, a reputation score/from A, and along with the consent, a reputation score/
* S3.5, regarding "reputation mafias" in the fourth paragraph, I am curious: is there any published reference? It seems that such a mafia would necessarily be a large, distributed, self-organizing system that shares a common goal, although the victims change over time.
* S3.8, last paragraph: s/turing tests/Turing tests/
* In S3.9, I am not sure what the last sentence of the section is supposed to convey. Any research is subject to patent and IP constraints, I suppose; SIP spam is no exception. Best to simply take that sentence out.
* S3.10, first paragraph: s/A sends to user B/A sends email to user B/
* S3.10, third paragraph: s/for small transactions, for transaction as small as one cent./for small transactions, as small as one cent./
* S3.11, second sentence of second paragraph: I am not sure I can parse what it says. Maybe you would consider simply saying instead: Whether or not they are effective remains to be seen.
* S3.11, last sentence of the section: suggest rewriting as follows:

  OLD: For example, Habeas inserts material in the header that, if a spammer inserted it without an appropriate license, allegedly causes the spammer to be violating US copyright and trademark laws, possibly reciprocal laws, and similar laws in many countries.

  NEW: For example, Habeas inserts material in the header that, if it was inserted by a spammer without an appropriate license, would allegedly cause the spammer to violate US copyright and trademark laws, possibly reciprocal laws, and similar laws in many countries.

* S4.1, expand MTA and MUA on their first use.
* Reference 24, the terminating comma should appear before Third Annual Workshop...