Networking

  • New plugin - asking for feedback

    3 answers - 2054 bytes

    Hi all,
    maybe this list can give me some feedback on a plugin I wrote a few
    weeks ago.
    The plugin is based on parts of the 'NiXSpam' project by the German IT
    magazine iX. NiXSpam is an elaborate procmail recipe (for more info see
    - it's German, though), and it uses a
    cool way of computing hashes from the body of mails to detect highly
    similar ones (which are probably spam).
    Example:
    Given a mail that has at least 16 spaces in it, NiXSpam does the following:
    - reduce all duplicate occurrences of [:space:]-chars to just one
    - remove all characters of the [:graph:]-class
    - then compute an MD5 hash and compare that to existing ones.
    (For procmail code see end of post)
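The three steps listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the NiXSpam or plugin code: the names `nixspam_style_hash` and `min_spaces` are made up for illustration, the [:space:] and [:graph:] classes are approximated for ASCII input in the C locale, and tabs are counted towards the threshold along with spaces.

```python
import hashlib
import re

# Assumption: C locale, ASCII input. [:graph:] is then every printable
# character except the space, i.e. bytes 0x21 through 0x7e.
_GRAPH = re.compile(rb"[\x21-\x7e]")
# A run of the same whitespace character, as squeezed by `tr -s '[:space:]'`.
_SPACE_RUN = re.compile(rb"([ \t\n\r\x0b\x0c])\1+")


def nixspam_style_hash(body: bytes, min_spaces: int = 16):
    """Return an MD5 hex digest of the normalized body, or None when the
    body contains fewer than min_spaces space/tab characters."""
    if body.count(b" ") + body.count(b"\t") < min_spaces:
        return None
    squeezed = _SPACE_RUN.sub(rb"\1", body)  # collapse repeated whitespace
    stripped = _GRAPH.sub(b"", squeezed)     # drop all [:graph:] characters
    return hashlib.md5(stripped).hexdigest()


if __name__ == "__main__":
    print(nixspam_style_hash(b"too short"))
    print(nixspam_style_hash(b"some words " * 10))
```

Short bodies return `None`, which mirrors the "at least 16 spaces" precondition; everything else is reduced to whatever survives the two filters before hashing.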
    Now, in NiXSpam this is a purely local thing - hashes are written to a
    file, and procmail subsequently simply does a grep on it.
    However, in March somebody volunteered to feed the hashes computed by
    the iX mail server into a blacklist DNS server. I subsequently wrote a
    plugin for SA () and even
    managed to set up another DNS server with hashes from our own spam.
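A DNS-based hash blacklist of this kind can be queried roughly as sketched below. The post does not state the query format, so this assumes the conventional DNSBL scheme: the hash is prepended as a label to the zone name, and any successful A-record lookup counts as "listed". The zone names and the helper names `build_query_name` and `listed_in` are hypothetical.

```python
import socket


def build_query_name(md5_hex: str, zone: str) -> str:
    """Build the DNS name to look up (assumed layout: <hash>.<zone>)."""
    return f"{md5_hex.lower()}.{zone.strip('.')}"


def listed_in(md5_hex, zones, resolve=socket.gethostbyname):
    """Return the zones that answer for this hash.

    `resolve` is injectable so the logic can be exercised without
    touching the network.
    """
    hits = []
    for zone in zones:
        try:
            resolve(build_query_name(md5_hex, zone))
        except OSError:  # socket.gaierror (e.g. NXDOMAIN) is an OSError
            continue
        hits.append(zone)
    return hits


if __name__ == "__main__":
    # Hypothetical zones: one for a third-party list, one for a local list.
    zones = ["hashes.ix.example", "hashes.own.example"]
    print(build_query_name("D41D8CD98F00B204E9800998ECF8427E", zones[0]))
```

Passing several zones at once is one way to check a third-party list and a locally fed list in the same pass.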
    Would anyone be interested in trying out the plugin and reporting some
    results? I, for my part, was surprised to find that the spam hitting us
    is apparently quite different from that hitting iX.
    BTW: Tests against both our blacklist and iX's hit about 50% of all
    incoming mail, and I have yet to find a false positive. For me, this works.
    Dirk
    PS: I think I should mention
    - Bert Ungerer (iX)
    - Manuel Schmitt, who hosts iX's blacklist
    - KungFuHasi, who posted the Perl code computing the hashes @ heise
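    In procmail, the steps described above read as follows:
    :0B
    # Start the score at -15 ...
    * -15^0
    # This checksum requires at least 16 spaces/tabs:
    # ... and add 1 per match, so the score only turns positive at the 16th.
    * 1^1 [ ]
    {
      # Feed the body through the filters and keep the result in md5hash
      :0 bw
      md5hash=|tr -s '[:space:]' \
              |tr -d '[:graph:]' \
              |md5sum
      # Is the hash already in the file?
      :0 Aw
      * ? fgrep -s $md5hash $HASHFILE
      { KNWN=YES }
    }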
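For readers who do not use procmail, the lookup half of the recipe (the `fgrep` against `$HASHFILE`) might look like this in Python. It is only a sketch: the helper names `is_known_hash` and `remember_hash` and the one-hash-per-line file layout are assumptions, and the recipe shown above does not itself include the step that appends new hashes.

```python
from pathlib import Path


def is_known_hash(md5_hex: str, hash_file: str) -> bool:
    """Fixed-string search for the hash, line by line (like fgrep)."""
    path = Path(hash_file)
    if not path.exists():
        return False
    with path.open("r", encoding="ascii", errors="replace") as fh:
        return any(md5_hex in line for line in fh)


def remember_hash(md5_hex: str, hash_file: str) -> None:
    """Append a hash to the file (assumed layout: one hash per line)."""
    with open(hash_file, "a", encoding="ascii") as fh:
        fh.write(md5_hex + "\n")
```

A linear scan is fine for a small local file; a DNS zone or an indexed store becomes attractive once the list grows or has to be shared.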
  • No.1 - 771 bytes

    Hi Dirk,

    Dirk Bonengel wrote:

    > Hi all,
    >
    > maybe this list can give me some feedback on a plugin I wrote a
    > few weeks ago.
    > The plugin is based on parts of the 'NiXSpam' project by the German IT
    > magazine iX. NiXSpam is an elaborate procmail recipe (for more info
    > see - it's German, though), and it
    > uses a cool way of computing hashes from the body of mails to detect
    > highly similar ones (which are probably spam).

    Correct me if I'm wrong, but this sounds very similar to razor/pyzor/dcc!

    Chris T

  • No.2 - 1138 bytes

    On Tue, Jul 26, 2005 at 03:27:09PM +0200, Dirk Bonengel wrote:
    > Given a mail that has at least 16 spaces in it, NiXSpam does the following:
    > - reduce all duplicate occurrences of [:space:]-chars to just one
    > - remove all characters of the [:graph:]-class

    Since [:graph:] is anything "printable" except spaces, this basically
    is a hash of punctuation characters. For instance, one test mail of mine
    broke down into: -:< ":/./-$/:/./?"></>-:-

    > - then compute an MD5 hash and compare that to existing ones.

    This is a troubling aspect, since it is trivial to break it with
    hash-busting techniques. A related problem is that it works on the
    pristine body, so encoding etc. is an issue.
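Both concerns can be illustrated with a few lines of Python. The sketch below uses a plain MD5 over the bytes rather than the NiXSpam normalization, and the sample text and the appended token are invented; the point is only that an exact-match digest changes completely when a spammer appends a random token or when the same text arrives base64-encoded.

```python
import base64
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


original = b"Buy now! Visit our site today."
busted = original + b" qzx81"                # hypothetical random "hash buster"
encoded = base64.encodebytes(original)       # same text, base64 transfer encoding

same_after_busting = md5_hex(original) == md5_hex(busted)
same_when_encoded = md5_hex(original) == md5_hex(encoded)
same_after_decoding = md5_hex(base64.decodebytes(encoded)) == md5_hex(original)

print(same_after_busting)    # False
print(same_when_encoded)     # False
print(same_after_decoding)   # True
```

Decoding the body before hashing addresses the second problem; the first is inherent in any exact-match digest and is only softened by how much the normalization throws away.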

    The results seem to be decent for now, though, ymmv:

    OVERALL%     SPAM%     HAM%    S/O   RANK  SCORE  NAME
       25869     22571     3298  0.873   0.00   0.00  (all messages)
     100.000   87.2512  12.7488  0.873   0.00   0.00  (all messages as %)
      23.631   27.0701   0.0000  1.000   0.00   1.50  IXHASH2
      23.662   27.1011   0.0000  1.000   0.00   1.50  IXHASH
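The "all messages" rows of the table can be reproduced from the raw counts. The sketch below assumes that the fourth column is the spam percentage divided by the sum of the spam and ham percentages; that reading is consistent with every row shown, but it is an assumption and is not taken from the tool that produced the table. The function names are made up.

```python
def corpus_percentages(spam_count: int, ham_count: int):
    """Return (spam %, ham %) of the whole corpus."""
    total = spam_count + ham_count
    return 100.0 * spam_count / total, 100.0 * ham_count / total


def spam_ratio(spam_pct: float, ham_pct: float) -> float:
    """Assumed definition of the ratio column: spam% / (spam% + ham%)."""
    denom = spam_pct + ham_pct
    return spam_pct / denom if denom else 0.0


spam_pct, ham_pct = corpus_percentages(22571, 3298)
print(f"{spam_pct:.4f} {ham_pct:.4f} {spam_ratio(spam_pct, ham_pct):.3f}")
print(f"{spam_ratio(27.1011, 0.0):.3f}")  # a rule that hit no ham at all
```

Under that assumption, a ratio of 1.000 for both rules simply says that none of the 3298 ham messages in this corpus matched.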

    I have some comments about the code too, but that's another discussion. :)
  • No.3 - 1359 bytes

    Yes, you're right.

    The advantage - at least for me - is that, after having set up rbldnsd,
    I can easily access the data I get from the spamtrap addresses we run.
    With razor you can't run your own server, and whether you're actually
    allowed to use the client when you run SpamAssassin on a commercial
    basis is unclear (to me at least).
    With DCC and Pyzor you could set up your own server, but you would
    either lose much of the efficiency of DCC/Pyzor in doing so or would
    have to tinker with the C/Python code to use, say, both your own and
    the 'real' pyzor data store, I guess.

    Chris Thielen wrote:

    > Hi Dirk,
    >
    > Dirk Bonengel wrote:
    >
    >> Hi all,
    >>
    >> maybe this list can give me some feedback on a plugin I wrote a
    >> few weeks ago.
    >> The plugin is based on parts of the 'NiXSpam' project by the German
    >> IT magazine iX. NiXSpam is an elaborate procmail recipe (for more
    >> info see - it's German, though), and
    >> it uses a cool way of computing hashes from the body of mails to
    >> detect highly similar ones (which are probably spam).
    >
    > Correct me if I'm wrong, but this sounds very similar to razor/pyzor/dcc!
    >
    > Chris T
