Project: kSpam (Managed by Nico Vis, Tom Lyne)
Bayesian becomes ineffective (MIME & Token-limit)
Daniel Stelter on 03/18/2008 at 12:06 PM
I've been running kSpam for a long time now (great tool!) and I
think I have it well under control. ;)
(for reference: running kSpam 1.6b2 on Notes 7.0.2FP3 on W2k.)
For some weeks now I have noticed that the Bayesian probability system
is becoming more and more ineffective.
The problem is with MIME encoded mails.
kSpam limits the Bayesian Tokens to 15, and I have seen that
when receiving MIME mails (spam), most of the Tokens are useless ones like
"Content-Type", "multipart", "Content-Transfer-Encoding",
"NextPart" and other "-----" stuff.
As far as I can see, kSpam takes the raw text (via NSFItemGetText)
of the mail and goes from top to bottom to get the first 15 Tokens...
The "Subject" is picked up fine, but after that
the mail body starts with its MIME encoding/decoding text.
And before kSpam gets to the real Body content, the 15 Tokens are already used up.
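To illustrate what I think is happening, here is a small Python sketch. It is not kSpam's code: the tokenizer regex, the sample mail and the function names are all my own assumptions, and only the limit of 15 comes from what I see in kSpam. It compares taking the first 15 Tokens from the raw MIME text with taking them from the Subject plus the decoded text/plain part.

```python
import re
from email import message_from_string

TOKEN_LIMIT = 15  # the limit observed in kSpam; everything else here is assumed

# Hypothetical sample mail, made up for this illustration.
RAW_MAIL = """\
Subject: Cheap watches
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_NextPart_000"

------=_NextPart_000
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Buy cheap replica watches now, lowest prices guaranteed
------=_NextPart_000--
"""


def tokenize(text):
    """Very simple tokenizer (an assumption, not kSpam's real one)."""
    return re.findall(r"[A-Za-z][A-Za-z0-9-]*", text)


def first_tokens_raw(raw, limit=TOKEN_LIMIT):
    """Top-to-bottom over the raw text: MIME headers eat the token budget."""
    return tokenize(raw)[:limit]


def first_tokens_decoded(raw, limit=TOKEN_LIMIT):
    """Tokenize only the Subject and the decoded text/plain parts."""
    msg = message_from_string(raw)
    pieces = [msg.get("Subject", "")]
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            pieces.append(payload.decode(charset, errors="replace"))
    return tokenize(" ".join(pieces))[:limit]


if __name__ == "__main__":
    print("raw:    ", first_tokens_raw(RAW_MAIL))
    print("decoded:", first_tokens_decoded(RAW_MAIL))
```

With this sample, the raw variant never reaches the word "replica", while the decoded variant does. Whether kSpam could get at the decoded body this way inside Notes is a separate question.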
I tried several things... I wrote an Agent to get rid of (better: convert)
MIME mails to RichText (with exactly 1 Body containing the SPAM message) in
the mailspam.nsf to let bload calculate only "real" SPAM text. So at least I
now have real SPAM-Tokens with real probabilities.
That doesn't make much sense when these Tokens are never "touched" for the calculation
on incoming mails, I know...
Even setting up Bayesian to ignore the described Tokens will not help, because it's
not the Tokens themselves that are ignored, but only their probabilities, so they still use up the 15 slots (think about THIS!).
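The difference between the two kinds of "ignore" can be sketched like this. Again this is only my own Python illustration with made-up names and a made-up Token list, not how kSpam is actually written: in the first variant the ignored Tokens still fill the window of 15, in the second they are dropped before the limit is applied.

```python
TOKEN_LIMIT = 15  # the limit observed in kSpam

# Hypothetical ignore list, based on the Tokens named above.
IGNORED = {"content-type", "multipart", "content-transfer-encoding", "nextpart"}


def select_neutralise(tokens, limit=TOKEN_LIMIT):
    """Ignored Tokens still occupy a slot; only their probability is dropped."""
    window = tokens[:limit]
    return [t for t in window if t.lower() not in IGNORED]


def select_skip(tokens, limit=TOKEN_LIMIT):
    """Ignored Tokens are removed first, then the limit is applied."""
    kept = [t for t in tokens if t.lower() not in IGNORED]
    return kept[:limit]


if __name__ == "__main__":
    # 15 MIME Tokens in front of the real spam words.
    tokens = ["Content-Type", "multipart", "NextPart"] * 5 + ["cheap", "replica", "watches"]
    print(select_neutralise(tokens))  # nothing left to score
    print(select_skip(tokens))        # the real words survive
```

If the ignore setting behaves like the first function, nothing is left to calculate a probability from, which would match what I am seeing.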
So what...I don't know...
Is anybody experiencing similar things?
I know (do I?) that kSpam is no longer in development... it's really a pity...
so what chances do we (okay, I) have of keeping kSpam alive?
|Entered 18-Mar-2008 12:06 by Daniel Stelter. Last Modified <none> by <none>.|