• Anybody have a REGEX rule to filter garbage characters?

    By Wade J Dugas 2 decades ago

    I get a lot of spam with nothing but garbage characters in the subject (the entire email is garbage as far as I'm concerned). Here is an example:



    ªÂó║ ┴°┴ñ ▒·▓²ÃÐ ÃÃ║╬©ª ┐°Ã¤¢├│¬┐õ? ©­░° ▒Ý╝¸ÃÐ░¸┐í ÃÃ┴÷©ª ┴ª░┼ÃÏ¥▀ Ãı┤¤┤┘ 0,.0 g



    Here is KS_BL_TOKENS for this email:



    Content-Type: 0.4000,multipart: 0.4000,alternative: 0.4000,boundary: 0.4000,219: 0.4000,DF9: 0.4000,–D: 0.4000,Content-Transfer-Encoding: 0.4000,quoted-printable: 0.4000,text: 0.0580,html: 0.1027,new: 0.0727,uid: 0.0238,bufer: 0.9900,626: 0.9900



    KS_BL_PROB is 0.0034. So, this email makes it through the filter. I have KS_BL_IGNORE set up to ignore most of the MIME/HTML tokens, but kSpam seems to be ignoring KS_BL_IGNORE :-) (see my other post: http://www.openntf.org/Projects/pmt.nsf/b521bae9087a360985256bed004b4ec4/504c928fbfbd9f948625707c0037749c!OpenDocument )



    There must be a way to block this type of email with a REGEX rule. ( The Bayesian filter in my brain tells me there is only a 0.0001 probability of me being able to write it ! )



    Any ideas?

    • Garbage Character REGEX

      By Dan Cihon 2 decades ago

      I am new to REGEX. Here is something I found on the Internet. I'm not sure how to take what it says and make it work here, but maybe someone else would know.





      Match Non Standard Characters



      The following one is a little odd. Basically, Gravity regexp lists are not preprogrammed ranges; instead, the lists are continuous character ranges according to the current code page. So, the following condition will match a subject line which contains any character outside the "normal" or lower range of letters, digits, and punctuation marks.

      Note: If you can't read it clearly this is a caret followed by a space followed by a hyphen followed by a tilde.



      Subject contains reg. expr. "[^ -~]"

      Would hit the following article subject. (It may not even appear properly in your browser, it should be an upside down question mark followed by a Y with two horizontal lines through it).



      Hello there ¿ ¥

      If you want to find one oddball character use the character's number with the ALT key. So, to find the funny Y hold ALT and type 157. This method works for the European characters too.



      Subject contains reg. expr. "¥"

      NOTE: You may not be able to enter this with the ALT key in the rule window. You probably need to add a dummy string then edit the string directly in the condition window. Also, you don't need a regular expression to do this. It seems to work in a non-regexp rule condition.
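
      For anyone who wants to try the "[^ -~]" idea outside Gravity, here is a minimal Python sketch. It is only an illustration: Python's re module compares Unicode code points rather than code-page positions, so the range from space to tilde is exactly printable ASCII. The function names are made up for this example and are not part of Gravity or kSpam.

```python
import re

# Any single character outside the range space (0x20) .. tilde (0x7E),
# i.e. anything that is not printable ASCII.
NON_ASCII_PRINTABLE = re.compile(r"[^ -~]")


def has_nonstandard_char(subject: str) -> bool:
    """Return True if the subject contains at least one character outside ' '..'~'."""
    return NON_ASCII_PRINTABLE.search(subject) is not None


def nonstandard_ratio(subject: str) -> float:
    """Return the fraction of characters in the subject that fall outside ' '..'~'."""
    if not subject:
        return 0.0
    return len(NON_ASCII_PRINTABLE.findall(subject)) / len(subject)
```

      Since a single accented letter is enough to trigger has_nonstandard_char, a ratio threshold (the hypothetical nonstandard_ratio above) may be a gentler test if you also receive legitimate mail with a few European characters.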

    • Try this REGEX

      By Peter Gloor 2 decades ago

      Try this in the Subject field of a deny rule:


      REGEX#:[^a-zA-Z0-9_ÅÆÇÖÜßäåæçèéêëïöü]{3,}



      Matches if three or more consecutive characters are outside the given set of characters:

      a-z

      A-Z

      0-9

      _

      ÅÆÇÖÜ

      ß

      äåæçèéêëïöü



      This list covers most characters used in northwestern Europe, including France (but excluding Ireland).

      Add additional characters if you need to accept characters from other languages as well.
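
      To see what the pattern above does to a few sample subjects, here is a rough Python sketch. It assumes the "REGEX#:" prefix is kSpam rule syntax rather than part of the pattern, so it is left out, and it assumes kSpam's regex engine treats the character class the same way Python's re module does. The names ALLOWED, GARBAGE_RUN and looks_like_garbage are invented for this example.

```python
import re

# Characters accepted by the rule; everything else counts as "outside the set".
ALLOWED = "a-zA-Z0-9_ÅÆÇÖÜßäåæçèéêëïöü"

# Three or more consecutive characters that are NOT in the allowed set.
GARBAGE_RUN = re.compile("[^" + ALLOWED + "]{3,}")


def looks_like_garbage(subject: str) -> bool:
    """Return True if the subject contains a run of 3+ characters outside ALLOWED."""
    return GARBAGE_RUN.search(subject) is not None
```

      Under those assumptions, looks_like_garbage("ªÂó║ ┴°┴ñ") is True, while "Hello there" and "Grüße" pass. Note that spaces and punctuation are not in the allowed set either, so a subject such as "Re: Meeting - notes" also matches (the run " - " is three characters long). Adding the space and common punctuation to the set may be worth considering before using this in a deny rule.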