• Greatest token prob = 0.5

    By LJ Wilson 2 decades ago

    I've been playing with the Linux version for the last 5 days. The comblist.txt file has 24737 items, 24691 of them with a prob of 0.5 and the rest lower than that. I've got 3223 docs in mailspam.nsf and 407 in mailgood.nsf, but I have yet to see any messages go into mailspam.nsf on their own. This is a RHEL 3.0 ES system running Domino 6.5.1. Everything seems to work OK except for the probabilities being assigned to the tokens. The ks_log.txt showed "Empty buffer returned by NIFReadEntries." until I added a couple of rules; other than that it looks OK to me. Typical excerpt for a message:

    03/10/2004 03:40:51 PM - Based on form: Memo

    SUBJECT:Fwd: Discount V|@Gra % Valiu+m+ ) _X_ANAx > V|c^od|n Som.a. >/P/nter

    min fdqncvxubkcy


    FROM:"jayme whitener"

    MAIL_FROM:lspn4027@tpnet.pl

    BODY LENGTH: 768

    scanning message…



    Running Callback

    Running Event

    Running Callback

    Running Event

    03/10/2004 03:42:32 PM - Based on form: Memo

    SUBJECT:Your two cents is worth something

    FROM:PaidSurvey

    MAIL_FROM:b-0Ywxbcefbhfg-ddhjbfe-@msg.tunnelvisson.com

    BODY LENGTH: 1907

    scanning message…



    Running Callback

    Running Event



    notes.ini:

    KS_BAYESIAN_FILTER=1

    KS_BAYESIAN_MARK=1

    KS_BAYESIAN_PREP_2=1

    KS_BL_DUMPLISTS=1

    KS_BL_PERIOD=360

    KS_STATS=1

    KS_DEBUG=1

    KS_BAYESIAN_BOUNDARY=90

    KS_BAYESIAN_RATIO=10

    KS_MIN_FROM_LENGTH=4

    KS_MARK=1



    show stat smtp.kspam.

    SMTP.kSpam.Allow = 1

    SMTP.kSpam.Bayesian.Boundary = 90

    SMTP.kSpam.Bayesian.Filter = On

    SMTP.kSpam.Bayesian.Mark = On

    SMTP.kSpam.Bayesian.Prep = Off

    SMTP.kSpam.Bayesian.Prep_2 = On

    SMTP.kSpam.Bayesian.Ratio = 10

    SMTP.kSpam.Conf.CopyDB = mailspam.nsf

    SMTP.kSpam.Conf.Debug = On

    SMTP.kSpam.Conf.DefaultAction = Accept

    SMTP.kSpam.Conf.FilterFromInt = Off

    SMTP.kSpam.Conf.Mark = On

    SMTP.kSpam.Conf.MaxNumbersInFrom = 0

    SMTP.kSpam.Conf.MinFromLength = 4

    SMTP.kSpam.Conf.Reload = On

    SMTP.kSpam.LastMajorError = Could not create mutex

    SMTP.kSpam.Rule.1 = 1

    SMTP.kSpam.Version = 1.38 beta 2

    18 statistics found

    ***

    Thanks…jack

    • Hmmmm...

      By Tom Lyne 2 decades ago

      I see you have KS_BAYESIAN_MARK enabled. Does any of your mail actually have the KS_BL_PROB and KS_BL_TOKENS Notes items when it comes through?



      What is the value of the Notes item KS_IID?



      You could try starting out with tainted email in mailgood.nsf and mailspam.nsf, i.e. copy the same spam email into mailspam.nsf 100 times and the same ham email into mailgood.nsf 100 times. Both emails must be completely different, and in fact really only need one or two lines of text in each.



      Then examine the comblist.txt file to see what it contains. Also send the same spam message through to see what happens to it.



      The SMTP.kSpam.LastMajorError could also indicate a problem. If the two KS_BL_XXX fields aren't present on any email, then the mutex wasn't created when the Bayesian part of kSpam wanted to get the probabilities from the kSpam.bload addin.



      -tom

      • answers...

        By LJ Wilson 2 decades ago

        Tom, thanks for the quick response!


        1. Yes, I'm seeing the KS_BL_PROB, KS_BL_TOKENS, and KS_IID items on the messages when they come through. From my last spam email:

          Field Name: KS_BL_PROB

          Data Type: Text

          Data Length: 6 bytes

          Seq Num: 1

          Dup Item ID: 0

          Field Flags: SUMMARY



          "0.0050"



          Field Name: KS_BL_TOKENS

          Data Type: Text List

          Data Length: 245 bytes

          Seq Num: 1

          Dup Item ID: 0

          Field Flags: SUMMARY



          "Fwd: 0.5000"

          "Val: 0.5000"

          "GRa: 0.5000"

          "Nax: 0.5000"

          "Pnt: 0.5000"

          "rmin: 0.5000"

          "oma: 0.5000"

          "ltnjlauoxkvr: 0.4000"

          "Content-Type: 0.5000"

          "multipart: 0.5000"

          "mixed: 0.4000"

          "boundary: 0.5000"

          "–32454412886230924: 0.4000"

          "and: 0.1256"

          "the: 0.1059"

          **




        2. The value of the Notes item KS_IID is always "WCS", which is the Domino mail domain. I have my email forwarded from our main (production) Domino server running SMTP (in Domino domain DAV) to an internal server (in a different domain, WCS) so I can test.

          Field Name: KS_IID

          Data Type: Text

          Data Length: 3 bytes

          Seq Num: 1

          Dup Item ID: 0

          Field Flags: SUMMARY

          "WCS"


        3. I'll try this next: "You could try starting out with tainted email in mailgood.nsf and mailspam.nsf, i.e. copy the same spam email into mailspam.nsf 100 times and the same ham email into mailgood.nsf 100 times. Both emails must be completely different, and in fact really only need one or two lines of text in each."



          Then examine the comblist.txt files to see what it contains. Also send through the same spam message to see what happens to it.


        4. I think the "Could not create mutex" LastMajorError was happening when I did a tell kspam quit; load kspam and mail came in during the "loading words" part. A KS_IID was always assigned to those emails, but the other two KS_BL_XXX fields were not.
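
For what it's worth, the KS_BL_PROB in item 1 is what you get if the listed token values are combined with the naive Bayes product from Paul Graham's "A Plan for Spam". That kSpam combines tokens this way is an assumption, since its combining code isn't quoted in this thread; `combine_probs` and `reported_tokens` are names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Combine per-token spam probabilities into one message score:
 *   P = prod(p_i) / (prod(p_i) + prod(1 - p_i))
 * ASSUMPTION: this is Graham's formula, not code taken from kSpam. */
double combine_probs(const double *p, size_t n)
{
    assert(p != NULL || n == 0);

    double spam = 1.0; /* running product of p_i       */
    double ham = 1.0;  /* running product of (1 - p_i) */

    for (size_t i = 0; i < n; i++) {
        spam *= p[i];
        ham *= 1.0 - p[i];
    }
    return spam / (spam + ham);
}

/* The 15 values listed under KS_BL_TOKENS in item 1, in the same order. */
static const double reported_tokens[] = {
    0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.4,
    0.5, 0.5, 0.4, 0.5, 0.4, 0.1256, 0.1059
};
```

Calling `combine_probs(reported_tokens, 15)` gives about 0.0050, matching the KS_BL_PROB above. Tokens at 0.5 pull the score in neither direction, so the result is decided entirely by the three 0.4 tokens and by "and" and "the".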
        • Results from 100 in each db:

          By LJ Wilson 2 decades ago
          1. I took one good and one bad email and made 100 copies of each in the mailgood and mailspam db's respectively as you suggested. In comblist.txt I now get 58 items, 24 have 0.500000 prob, and the rest less than that.


          2. Results from sending the same spam message back through after doing the above:



            Field Name: KS_BL_PROB

            Data Type: Text

            Data Length: 6 bytes

            Seq Num: 1

            Dup Item ID: 0

            Field Flags: SUMMARY



            "0.0050"



            Field Name: KS_BL_TOKENS

            Data Type: Text List

            Data Length: 245 bytes

            Seq Num: 1

            Dup Item ID: 0

            Field Flags: SUMMARY



            "Fwd: 0.5000"

            "Val: 0.5000"

            "GRa: 0.5000"

            "Nax: 0.5000"

            "Pnt: 0.5000"

            "rmin: 0.5000"

            "oma: 0.5000"

            "ltnjlauoxkvr: 0.4000"

            "Content-Type: 0.5000"

            "multipart: 0.5000"

            "mixed: 0.4000"

            "boundary: 0.5000"

            "–32454412886230924: 0.4000"

            "and: 0.1256"

            "the: 0.1059"



            Field Name: KS_IID

            Data Type: Text

            Data Length: 3 bytes

            Seq Num: 1

            Dup Item ID: 0

            Field Flags: SUMMARY



            "WCS"
          • Is anyone else having these problems on Linux?

            By Tom Lyne 2 decades ago
            • I am afraid, yes

              By Christian Brandlehner 2 decades ago

              I haven't had the time to dive that deep into Bayesian filtering, but even after collecting hundreds of bad mails the KS_BL_PROB is always too low to have emails filtered. Body/Subject rules are working fine, so it is still a powerful tool.

              I am running Domino 6.5 on Linux. Please let me know if you need any additional information.



              Christian

            • Yes.. Me too

              By Bjarte Lunde 2 decades ago
              • Solution for Linux Version

                By LJ Wilson 2 decades ago

                In bload/WordList.c:


                #ifndef min

                #define min( x, y) ( (x) > (y) ? (x) : (y) )

                #endif



                Should read:


                #ifndef min

                #define min( x, y) ( (x) > (y) ? (y) : (x) )

                #endif



                After making that change, I now get probabilities of 0.99 in the comblist.txt file.
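
A small sketch of why swapping the two result operands matters. The two macros are the before and after versions from WordList.c, renamed `MIN_SHIPPED` and `MIN_FIXED` here so both can live in one file. `token_prob` is hypothetical: it assumes per-token probabilities are computed the way Paul Graham's "A Plan for Spam" does it (each ratio capped at 1 with min), which nothing in this thread confirms for kSpam.

```c
#include <assert.h>

/* As shipped: despite the name, this yields the LARGER argument. */
#define MIN_SHIPPED(x, y) ((x) > (y) ? (x) : (y))
/* After the fix: yields the smaller argument. */
#define MIN_FIXED(x, y)   ((x) > (y) ? (y) : (x))

/* Per-token spam probability in the style of Graham's essay:
 *   g = min(1, 2*good/ngood), b = min(1, bad/nbad), p = b / (g + b)
 * ASSUMPTION: hypothetical stand-in, not the actual WordList.c code.
 * good/bad are occurrence counts, ngood/nbad are message counts. */
double token_prob(int use_fixed_min, double good, double bad,
                  double ngood, double nbad)
{
    assert(ngood > 0.0 && nbad > 0.0);

    if (good == 0.0 && bad == 0.0)
        return 0.4; /* Graham's default for a never-seen token */

    double g = 2.0 * good / ngood;
    double b = bad / nbad;

    if (use_fixed_min) {
        g = MIN_FIXED(1.0, g);
        b = MIN_FIXED(1.0, b);
    } else {
        g = MIN_SHIPPED(1.0, g);
        b = MIN_SHIPPED(1.0, b);
    }

    double p = b / (g + b);

    /* Clamp with plain comparisons so only the caps above depend on min. */
    if (p < 0.01) p = 0.01;
    if (p > 0.99) p = 0.99;
    return p;
}
```

For a token seen in all 100 spam copies and in none of the 100 good copies, `token_prob(0, 0, 100, 100, 100)` comes out at 0.5 and `token_prob(1, 0, 100, 100, 100)` at 0.99. Under this assumed formula the shipped macro can never push a token above 0.5, which lines up with the comblist.txt values seen before and after the change.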



                …jack

                • Thanks LJ

                  By Tom Lyne 2 decades ago

                  Is there a chance you could send a recompiled binary to tomlyne com>. That way I can post it here for others who are less adventurous than yourself.



                  -tom

                  • Linux compile notes, Binaries on their way...

                    By LJ Wilson 2 decades ago

                    Glad to do it. It's the least I can do. Thanks so much for making this available; next week I'll be recommending we implement this in production.



                    If anyone wants to compile this themselves, here's what I did on my box, a Red Hat Enterprise Linux 3.0 ES system running Domino 6.5.1.


                    cat /proc/version

                    Linux version 2.4.21-9.0.1.EL (bhcompile@bugs.devel.redhat.com) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-26)) #1 Mon Feb 9 22:44:14 EST 2004


                    gcc -v

                    Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.3/specs

                    Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --host=i386-redhat-linux

                    Thread model: posix

                    gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-24)



                    **

                    Changes made for the kSpam_v1.39b2_SRC.zip files to compile (and return probabilities > 0.5 to comblist.txt):


                    1. bload/WordList.c (fix of the min function; it was returning max!)

                      27c27

                      < #define min( x, y) ( (x) > (y) ? (x) : (y) )

                      ---

                      > #define min( x, y) ( (x) > (y) ? (y) : (x) )


                    2. bload/btree.c (missing a semicolon)

                      120c120

                      < free(node_list[i])

                      ---

                      > free(node_list[i]);


                    3. bload/linux.mak (copied linux.mak to Makefile before running make; extmgr/linux.mak needed no changes other than being copied to Makefile).

                      45c45

                      < LIBS = -lnotes -lm -lnsl -lc -ldl -lpthread -lresolv

                      ---

                      > LIBS = -lnotes -lm -lnsl -lc -ldl -lpthread -lresolv -lndgts

                      53a54,56

                      >

                      > clean :

                      > rm -f BayesianAddin.o btree.o Kerrors.o LoadProbabilities.o log.o SaveProbabilities.o WordList.o


                    4. Set up these environment variables (I put mine in root's ~/.bash_profile). These are from the Domino Notes API user guide:

                      # lotus capi setup

                      export LOTUS=/opt/lotus

                      export NOTES_DATA_DIR=/home/notes/data

                      export Notes_ExecDirectory=/opt/lotus/notes/latest/linux

                      export PATH="$NOTES_DATA_DIR:$PATH"


                    5. Change the permissions/owner of the binaries after they are compiled:

                      as root:

                      cd /opt/lotus/notes/latest/linux

                      chmod 555 kspam

                      chown root.daemon kspam



                      So they should look like:

                      -r-xr-xr-x 1 root daemon 56704 Mar 13 07:40 kspam

                      -r-xr-xr-x 1 root daemon 61428 Mar 13 07:34 libkspam.so



                      …jack
                    • new binaries?

                      By Bjarte Lunde 2 decades ago

                      I can't make my own binaries…

                      Any idea of when these will be available?



                      Cheers,

                      -Bjarte
                      • I'll post them when I receive them...

                        By Tom Lyne 2 decades ago

                        I don't have a Linux box to compile them on either. I've asked Hans to recompile them for me.



                        -tom

                        • Sent them 3/13-just resent again (3/16)

                          By LJ Wilson 2 decades ago

                          Tom, let me know if you get them.

                          • libndgts.so

                            By Bjarte Lunde 2 decades ago

                            When I try to load the new kspam I get the following error:



                            /opt/lotus/notes/latest/linux/kspam: error while loading shared libraries: libndgts.so: cannot open shared object file: No such file or directory



                            Why? And where to find this file?



                            Cheers,

                            -Bjarte
                            • fixed

                              By Bjarte Lunde 2 decades ago

                              Upgraded the server last night to version 6.5.1

                              And now everything works fine…

                              Thanx for a great program and good help!

                              • Thanks for the update! <eom>

                                By LJ Wilson 2 decades ago