OpenNTF.org - The Open Source Community for Collaboration Solutions

Bayesian becomes ineffective (MIME & Token-limit)
By Daniel Stelter 2 decades ago

I have been running kSpam for a long time now (great tool!) and I think I have it well under control. ;)

(For reference: running kSpam 1.6b2 on Notes 7.0.2FP3 on W2k.)

For some weeks now I have noticed that the Bayesian probability system is becoming less and less effective.

The problem is with MIME-encoded mails. kSpam limits the Bayesian tokens to 15 per mail, and I have seen that when receiving MIME (spam), most of the tokens are useless ones: "Content-Type", "multipart", "Content-Transfer-Encoding", "NextPart" and other "-----" stuff.

As far as I can see, kSpam takes the raw text of the mail (via NSFItemGetText) and goes from top to bottom to get the first 15 tokens… The "Subject" is picked up fine, and after that the mail body begins with its MIME encoding/decoding text. And before kSpam gets to the real body content, the 15 tokens are used up. Hm…
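
The effect described above can be reproduced with a minimal Python sketch. It assumes a plain top-to-bottom tokenizer with a cap of 15 tokens; the regex, the sample mail and the function name `first_tokens` are made up for illustration and are not kSpam's actual code.

```python
import re

TOKEN_LIMIT = 15  # per-mail cap described in the post

# Hypothetical raw text of a small MIME spam mail.
RAW_MAIL = """\
Subject: Cheap watches
Content-Type: multipart/alternative; boundary="----=_NextPart_000_0001"

------=_NextPart_000_0001
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Buy replica watches now at unbeatable prices
------=_NextPart_000_0001--
"""

def first_tokens(raw_text, limit=TOKEN_LIMIT):
    """Return the first `limit` word-like tokens, reading top to bottom."""
    # Assumed tokenizer: a letter followed by letters, digits or hyphens.
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9-]*", raw_text)
    return tokens[:limit]

tokens = first_tokens(RAW_MAIL)
print(tokens)
print("replica" in tokens)  # is any real body word among the 15 tokens?
```

In this sample, the subject words make it into the list, but the remaining slots are taken by MIME structure before the first word of the body is reached.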

I tried several things… I wrote an agent to get rid of (better: convert) MIME mails to rich text (with exactly one Body item containing the spam message) in mailspam.nsf, so that bload calculates only "real" spam text. So at least I now have real spam tokens with real probabilities.

That doesn't make sense when these tokens are never "touched" for calculation in incoming mails, I know…
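
As an illustration of the idea behind that conversion (not the Notes agent itself), here is a small Python sketch that strips the MIME structure and keeps only the decoded text. It uses the standard `email` package; the helper name `plain_text_body` and the sample mail are hypothetical.

```python
import base64
from email import message_from_string

def plain_text_body(raw_mail):
    """Return the decoded text/plain content of a mail, without MIME structure."""
    msg = message_from_string(raw_mail)
    parts = []
    for part in msg.walk():
        # Multipart containers have a multipart/* type and are skipped here.
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
            charset = part.get_content_charset() or "utf-8"
            # Assumes the declared charset is one Python knows.
            parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)

# Hypothetical sample: one base64-encoded text part inside a multipart mail.
encoded = base64.b64encode(b"Buy replica watches now").decode("ascii")
RAW_MAIL = (
    "Subject: Cheap watches\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "MIME-Version: 1.0\n"
    "\n"
    "--XYZ\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    f"{encoded}\n"
    "--XYZ--\n"
)

body = plain_text_body(RAW_MAIL)
print(body)
```

Tokenizing `body` instead of the raw text would give tokens from the actual message rather than from headers and boundaries.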

Even setting up Bayesian to ignore the described tokens will not help, because it's not the tokens themselves that get ignored, only their probabilities (think about THIS!).

So what… I don't know…

Is anybody experiencing similar things?

I know (do I?) that kSpam is no longer in development… it's really a pity… so what chances do we (okay, I) have of keeping kSpam alive?

-Daniel.

kspam is alive
By Nico Vis 2 decades ago

Hi Daniel,

thank you for your post.

Even though the last version was released in October-November, we are always looking for new ideas to implement.

The calculation of the Bayesian tokens is always a "problem". Depending on the method you use, there are good and bad points.

For the Bayesian part I populate the dbs with internet mail only, in both databases, so that "Content-Type" and other stuff like this is balanced. In addition, you can set these tokens to be ignored in the configuration document.

Sometimes it is useful to tokenize MIME messages, because parts of encoded images and files can be tokenized (it works with viruses), and this helps to block later messages that have different image or zip names but the same content.

We'll think about it.

Regards

Nico

I'm glad to hear
By Daniel Stelter 2 decades ago

Hi Nico -

thank you for reading… ;)

Sure, you're right. There are good reasons for keeping MIME messages tokenized. But at least on my side, other software is taking care of viruses…

With the "Ignore Token" option it's up to the (kSpam) admin whether he wants to take care of the "Content-Type" stuff or not… that's good!

I'm still playing with adding ignore tokens to the conf doc, but they still fill up the 15 available tokens for the probability calculation.

Looking from the "outside", an easy solution to fit all needs would be an option to really ignore the ignored tokens… so that in the end, each mail has 15 "real" tokens… but I think on the programming side it's not so easy. As far as I understand, bload handles the ignore tokens only at startup, while loading/calculating tokens… to really ignore tokens, bload would have to check the ignore list in real time when receiving mails… tricky…
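
The difference between the two behaviours can be sketched in a few lines of Python. This is only a model built on the assumptions in this thread (ignored tokens currently still occupy slots; the proposal is to drop them before the cap is applied). The function names, the ignore list and the token list are invented for illustration.

```python
TOKEN_LIMIT = 15

# Hypothetical ignore list, compared case-insensitively.
IGNORE = {"content-type", "multipart", "content-transfer-encoding", "nextpart"}

# Hypothetical token stream of one mail, in top-to-bottom order.
HEADER = ["Cheap", "watches"]
NOISE = ["Content-Type", "multipart", "Content-Transfer-Encoding", "NextPart"] * 4
BODY = ["Buy", "replica", "watches", "now"]
MAIL_TOKENS = HEADER + NOISE + BODY

def capped_then_ignored(tokens, limit=TOKEN_LIMIT):
    """Assumed current behaviour: ignored tokens still use up slots."""
    return [t for t in tokens[:limit] if t.lower() not in IGNORE]

def ignored_then_capped(tokens, limit=TOKEN_LIMIT):
    """Proposed behaviour: drop ignored tokens first, then apply the cap."""
    kept = [t for t in tokens if t.lower() not in IGNORE]
    return kept[:limit]

print(capped_then_ignored(MAIL_TOKENS))  # ['Cheap', 'watches']
print(ignored_then_capped(MAIL_TOKENS))  # ['Cheap', 'watches', 'Buy', 'replica', 'watches', 'now']
```

The second variant needs the ignore list at the moment each mail is tokenized, which is the real-time check mentioned above.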

What might (maybe?) be easier…: what about increasing the number of tokens a mail gets? Maybe configurable in the conf doc. I'm not sure what other effects this may have (performance? memory usage?)…

Other thoughts/ideas…?!

It's time for Easter now…have a good time!

Cheers -Daniel.

More tokens
By Nico Vis 2 decades ago

Hi Daniel,

thank you for your post. It's a good idea, I will check this possibility.

Have a nice Easter!

;-)
