I've been trying to figure out why some obvious spam was getting through kspam, so I worked out the calculations in Excel. I've included that below.
I finally got a handle on most of my HTML spam by adding 20, 50, or 100 copies each of a bunch of HTML formatting codes (e.g. font, FACE, BODY) that are valid but don't show up much in normal email. I added enough of these to goodmail.nsf by pasting the same words into a single text mail over and over.
One note: it seems that to get the new probabilities used, it's not enough just to let kspam recalculate, which it does every 6 hours by default. I had to unload nspam and reload it (or restart Domino). That forced the new calculations to be applied to new email.
Anyway, here is the method I used in Excel.
Token        Probability   1 - Probability
You          0.2181        0.7819
center       0.2449        0.7551
idea         0.0272        0.9728
font         0.1142        0.8858
div          0.1005        0.8995
bgcolor      0.0773        0.9227
corp         0.01          0.99
FF0000       0.99          0.01
52:01.0      0.99          0.01
44:01.0      0.99          0.01
51:01.0      0.99          0.01
Verdana      0.99          0.01
50:00.0      0.01          0.99
Editor       0.01          0.99
FrontPage    0.99          0.01
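With the 15 probabilities in E1:E15 and the matching 1 - Probability values in F1:F15, the formulas are:
Product of the probabilities: =PRODUCT(E1:E15)
Product of the 1 - Probability values: =PRODUCT(F1:F15)
Combined probability: =PRODUCT(E1:E15)/(PRODUCT(E1:E15)+PRODUCT(F1:F15))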
Try it, and it works out to within 5 decimal places of what kspam reports as KS_BL_PROB.
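For anyone who'd rather check this outside Excel, here is a rough Python sketch of the same spreadsheet math: multiply the per-token probabilities, multiply their complements, and take prod(p) / (prod(p) + prod(1 - p)). The token values are copied verbatim from the table above. The log-space version is my own addition, not something from kspam or the spreadsheet; it gives the same answer but won't underflow to zero if you combine many more tokens than these 15. The names `PROBS`, `combined_probability`, and `combined_probability_log` are just made up for illustration.

```python
import math

# (token, probability) pairs copied from the table above.
PROBS_BY_TOKEN = [
    ("You", 0.2181), ("center", 0.2449), ("idea", 0.0272),
    ("font", 0.1142), ("div", 0.1005), ("bgcolor", 0.0773),
    ("corp", 0.01), ("FF0000", 0.99), ("52:01.0", 0.99),
    ("44:01.0", 0.99), ("51:01.0", 0.99), ("Verdana", 0.99),
    ("50:00.0", 0.01), ("Editor", 0.01), ("FrontPage", 0.99),
]
PROBS = [p for _, p in PROBS_BY_TOKEN]


def combined_probability(probs):
    """Direct mirror of the Excel formulas."""
    spam = math.prod(probs)                  # =PRODUCT(E1:E15)
    ham = math.prod(1.0 - p for p in probs)  # =PRODUCT(F1:F15)
    return spam / (spam + ham)


def _sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def combined_probability_log(probs):
    """Same result in log space (an addition, not part of the sheet).

    log(ham/spam) = sum(log(1 - p)) - sum(log(p)), and
    P = 1 / (1 + ham/spam) = sigmoid(-log(ham/spam)).
    """
    log_ratio = sum(math.log1p(-p) - math.log(p) for p in probs)
    return _sigmoid(-log_ratio)


if __name__ == "__main__":
    print(f"direct:    {combined_probability(PROBS):.5f}")
    print(f"log space: {combined_probability_log(PROBS):.5f}")
```

Both functions should print the same value to 5 decimal places; that's the number to compare against KS_BL_PROB.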