So I implemented this idea. I've got a decent Notes-client-based Java spam filter working. I used different base64 code from the one in the previous post. Also, I had to break writes to a rich text field into chunks of 25 KB or smaller. See the web links below for details on the theory behind my implementation.

One Java agent builds the "probability table" and stores it in a Notes document, and another assigns each new incoming mail a "spamminess" number. I started saving my spam last Thursday morning. (I'm up to 255 spam messages.) I'm just about ready to trust the agent to file away some spam without my looking at it.

Right now I run the agent against new mail manually. My probability hash table is already at 1 MB, and base64-decoding that for every new mail individually seems a bit heavy-handed. So for now the decoding happens only when I run the agent, usually once per several emails. Any ideas on how to efficiently store a >1 MB Java object in a Notes database? I'm going to start filtering out some tokens, but I'll also be adding many more as I get more spam, so I think the object size will only grow.

This message may only make sense in terms of this (rather old) thread. If folks are interested in how to implement a pretty good adaptive spam filter based on http://www.paulgraham.com/spam.html and http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html let me know, and I'll put together all the details of my implementation. It's currently ugly code with cryptic comments, so it might take me a few days to clean it up.

Jamie Fraser
jamie.fraser@doblin.com
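
For readers following the thread, the storage step described above (serialize the probability table, base64-encode it, and write it in pieces of 25 KB or less) could be sketched roughly as follows. This is not the code from the post: the class name TableChunker, the methods toChunks and fromChunks, and the HashMap<String, Double> table type are all hypothetical, the sketch assumes Java 8 or later for java.util.Base64 (the post used separate base64 code), and the Notes API calls that would append each chunk to a rich text field are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;

// Hypothetical helper, not the author's implementation.
class TableChunker {
    // Assumption: "25KB" is read as 25 * 1024 characters; base64 output is
    // plain ASCII, so one character corresponds to one byte.
    static final int MAX_CHUNK_CHARS = 25 * 1024;

    // Serialize the table, base64-encode it, and split the text into pieces
    // of at most maxChars characters each.
    static List<String> toChunks(HashMap<String, Double> table, int maxChars)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(table);
        }
        String encoded = Base64.getEncoder().encodeToString(bytes.toByteArray());
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < encoded.length(); i += maxChars) {
            chunks.add(encoded.substring(i, Math.min(encoded.length(), i + maxChars)));
        }
        return chunks;
    }

    // Rejoin the pieces in order, decode the base64 text, and deserialize.
    @SuppressWarnings("unchecked")
    static HashMap<String, Double> fromChunks(List<String> chunks)
            throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(String.join("", chunks));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (HashMap<String, Double>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up token probabilities, for illustration only.
        HashMap<String, Double> table = new HashMap<>();
        table.put("free", 0.91);
        table.put("meeting", 0.07);
        List<String> chunks = toChunks(table, MAX_CHUNK_CHARS);
        System.out.println(chunks.size() + " chunk(s); round trip ok: "
                + fromChunks(chunks).equals(table));
    }
}
```

In a Notes agent, each returned string would be written to the rich text field as its own append, and fromChunks would receive the field text read back in the same order. Decoding once per agent run and reusing the resulting map for every message in that run matches the workaround described in the post.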