kSpam - Feature Request: Increase efficiency of loading words for Bayesian filter

Created on: Mar 14, 2004
Last modified on: Dec 15, 2004
Created by: LJ Wilson
Last modified by: Tom Lyne
Status: Added to app

Use goodlist.txt, spamlist.txt, mailgood.nsf, and mailspam.nsf to build a cache of tokens that have already been loaded. For example, each time an email in mailgood.nsf or mailspam.nsf is examined, mark it so it isn't examined again for token extraction. The next time tokens are loaded, read the existing counts from spamlist.txt and goodlist.txt, then check only the messages in mailgood.nsf and mailspam.nsf that aren't marked as already used, loading new tokens and updating the counts of existing ones.
Of course, you would want to be able to reinitialize the "cache" to start fresh, and maybe even have a setting that recreates it from scratch every so often (during off-peak hours, for example).
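
The idea above could be sketched roughly as follows. This is only an illustration in Python, not kSpam's actual code: the messages are stand-in dictionaries rather than documents in mailgood.nsf or mailspam.nsf, the `tokens_loaded` flag and all function names are hypothetical, and the tab-separated "token, count" file layout is an assumption, since the real format of goodlist.txt and spamlist.txt isn't described here.

```python
import re
from collections import Counter

# Assumed token pattern; the real filter may split words differently.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split a message body into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def load_counts(path):
    """Read 'token<TAB>count' lines; a missing file yields an empty cache."""
    counts = Counter()
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                token, _, n = line.rstrip("\n").partition("\t")
                if token and n.isdigit():
                    counts[token] = int(n)
    except FileNotFoundError:
        pass
    return counts


def save_counts(path, counts):
    """Write the cache back out, one 'token<TAB>count' line per token."""
    with open(path, "w", encoding="utf-8") as fh:
        for token, n in sorted(counts.items()):
            fh.write(f"{token}\t{n}\n")


def update_counts(counts, messages, rebuild=False):
    """Add tokens from messages not yet marked, then mark them.

    With rebuild=True the cache is cleared and every message is rescanned.
    Returns the number of messages actually scanned.
    """
    if rebuild:
        counts.clear()
        for msg in messages:
            msg["tokens_loaded"] = False
    scanned = 0
    for msg in messages:
        if msg.get("tokens_loaded"):
            continue  # already counted on an earlier run
        counts.update(tokenize(msg["body"]))
        msg["tokens_loaded"] = True
        scanned += 1
    return scanned
```

A typical run would call `load_counts("spamlist.txt")`, pass the result and the spam messages to `update_counts`, and then call `save_counts`. A scheduled off-peak job could make the same call with `rebuild=True` to recreate the cache from scratch.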