• Strange behaviour with Bayesian filtering

    By Peter Gloor 2 decades ago

    I am currently seeing very strange behaviour with Bayesian filtering. For the last three weeks or so, I have noticed an increasing number of spam messages ending up in users' mailboxes. kSPAM currently detects far less than 50% of the spam.

    Example: this morning I cleaned up my new-mail folder and confirmed all new spam found in MailSpam as spam. Since then, 13 new spam messages have arrived in my mailbox, but only a single message has landed in MailSpam, and that one was put there by a deny rule.

    I've been using kSPAM for about 18 months and have never seen the situation this bad for this long (usually more than 95% of spam goes to MailSpam).

    Today I took a closer look at some of the spam messages in my mailbox and found, to my great surprise, words in KS_BL_TOKENS that don't appear anywhere in the corresponding message's body or subject. In addition, KS_BL_TOKENS included words listed in KS_BL_IGNORE (in Notes.INI).
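
    The manual check described above can be automated with a small script. This is a minimal sketch, not part of kSPAM: it assumes KS_BL_TOKENS and KS_BL_IGNORE hold comma- or whitespace-separated words, and the tokenizer (lowercase, split on anything other than letters, digits, apostrophes and hyphens) is an assumption that may not match how kSPAM actually tokenizes mail. The function names `tokenize`, `parse_list` and `audit_tokens` are hypothetical.

```python
import re

# Assumption: kSPAM's real tokenizer is not documented here; this one keeps
# runs of letters, digits, apostrophes and hyphens and lowercases them.
TOKEN_RE = re.compile(r"[A-Za-z0-9'\-]+")


def tokenize(text: str) -> set[str]:
    """Return the set of lowercased words found in text."""
    return {m.group(0).lower() for m in TOKEN_RE.finditer(text)}


def parse_list(value: str) -> list[str]:
    """Split an (assumed) comma/whitespace-separated field into lowercased words."""
    return [w.lower() for w in re.split(r"[,\s]+", value) if w]


def audit_tokens(ks_bl_tokens: str, subject: str, body: str, ks_bl_ignore: str):
    """Report tokens that look wrong in KS_BL_TOKENS.

    Returns (missing, ignored):
      missing - tokens that appear in neither the subject nor the body
      ignored - tokens that are on the KS_BL_IGNORE list
    """
    message_words = tokenize(subject) | tokenize(body)
    ignore = set(parse_list(ks_bl_ignore))
    tokens = parse_list(ks_bl_tokens)
    missing = [t for t in tokens if t not in message_words]
    ignored = [t for t in tokens if t in ignore]
    return missing, ignored


if __name__ == "__main__":
    # Hypothetical field values, for illustration only.
    print(audit_tokens("viagra cheap meeting the", "Cheap pills", "Buy viagra now", "the,and"))
```

    With healthy filter data both returned lists should be empty; any entry in either list points to the kind of inconsistency described in this post.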

    Does anybody have an explanation for this strange behaviour? Has anybody else seen this?

    For the moment, the only thing I can do is restart the Domino server and hope that helps.

    Depending on the number of documents, I regularly clean up the databases (mailspam/mailgood) by deleting some of the oldest documents. I'm not sure, but I have a feeling that the problem started after the last cleanup.

    Domino is still 5.0.11 and kSPAM is 1.39b2. I'm not aware of any significant changes to the Notes installation/configuration or the kSPAM configuration (apart from adding a few subject deny rules and address allow rules).

    • Error messages found in ks_log

      By Peter Gloor 2 decades ago

      Sorry, I had no access to the server when writing my post. In the meantime I have checked ks_log and found many messages with the following text: "scanning message… Could not open shared memory, is addin running? Bayesian data not loaded". In addition, the goodlist and spamlist files looked garbled.

      When I run bload, I see some error messages about corrupted documents (not the usual end-of-data warnings). I suspect that one of the databases (mailspam or mailgood) is corrupt. I will try running fixup and compact on both; if that does not help, I will replace them with fresh databases and rebuild the Bayesian filter data from scratch.

      Peter

      • Re:

        By Tom Lyne 2 decades ago

        You seem to be on the right track: the shared memory containing all the tokens and probabilities will not be created if bload doesn't finish its calculations successfully.
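
        The all-or-nothing behaviour can be illustrated with a small sketch. This is not bload's actual code: the per-token formula is a generic Graham-style ratio of document frequencies swapped in for illustration, a `None` entry stands in for a corrupt document, a plain dict stands in for the shared-memory segment, and `CorruptDocumentError`, `count_tokens`, `build_probabilities` and `load` are hypothetical names. The point it shows is that the table is built in a temporary structure and published only once the whole calculation has succeeded, so a single bad document leaves nothing loaded.

```python
class CorruptDocumentError(Exception):
    """Hypothetical error raised when a stored message cannot be read."""


def count_tokens(docs):
    """Count in how many documents each token appears."""
    counts = {}
    for doc in docs:
        if doc is None:  # stand-in for an unreadable/corrupt document
            raise CorruptDocumentError("corrupt document in training data")
        for token in set(doc.lower().split()):
            counts[token] = counts.get(token, 0) + 1
    return counts


def build_probabilities(good_docs, spam_docs):
    """Per-token spam probability from good/spam document frequencies."""
    good = count_tokens(good_docs)
    spam = count_tokens(spam_docs)
    n_good, n_spam = max(len(good_docs), 1), max(len(spam_docs), 1)
    probs = {}
    for token in set(good) | set(spam):
        g = good.get(token, 0) / n_good
        b = spam.get(token, 0) / n_spam
        # Clamp so no single token is ever treated as absolute proof.
        probs[token] = min(0.99, max(0.01, b / (g + b)))
    return probs


def load(good_docs, spam_docs, store):
    """Fill store (the shared-memory stand-in) only if every step succeeded."""
    try:
        table = build_probabilities(good_docs, spam_docs)
    except CorruptDocumentError as exc:
        print(f"load failed, nothing published: {exc}")
        return False
    store.clear()
    store.update(table)
    return True


if __name__ == "__main__":
    shared = {}
    load(["meeting agenda", None], ["cheap viagra"], shared)
    # shared is still empty, the analogue of "Bayesian data not loaded".
    print(shared)
```

        Building into a temporary table first means the scanner never sees half-computed probabilities, at the cost of having no Bayesian data at all until the bad documents are repaired or removed.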

        -tom