Networking

  • Scoring base64 blob messages


    I received a spam today where the text was only a base64-encoded blob.
    Content-Type: text/html;
    charset="us-ascii"
    Content-Transfer-Encoding: base64
    Subject: feel young and strong again
    Does SA convert the blob into text before scanning? It contains a number
    of drug-related words and a URI that points to "pharmconnect.org".
    Also is there an SA rule that scores messages that contain only a single
    base64 part (as opposed to a base64-encoded attachment)? I doubt many
    legitimate messages arrive with only a single base64 part.
    Peter
  • No.1

    Peter H. Lemieux wrote:
    I received a spam today where the text was only a base64-encoded blob.

    Content-Type: text/html;
    charset="us-ascii"
    Content-Transfer-Encoding: base64
    Subject: feel young and strong again

    --
    Does SA convert the blob into text before scanning?
    Yes. It's done that for a LONG time; even SA 2.3x did that. Even
    "rawbody" rules are run after decoding base64.

    Otherwise this would be a huge hole in SA, and every spammer would very
    quickly use base64 for all their spam. (Yes, spammers DO very
    aggressively study SpamAssassin and tune their mail to fit its
    weaknesses. VERY aggressively. Anything this obvious and easy would be
    discovered and become widespread within two months of an SA release.)
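Here is a minimal Python sketch of why decoding matters, using only the standard-library `email` package. It is not SpamAssassin's code (SA is written in Perl and has its own MIME parser). The sample message and the `DRUG_WORDS` pattern are hypothetical stand-ins for a spam and a body rule. A pattern run against the raw text misses the word hidden in the base64 blob, while the same pattern run against the decoded part finds it.

```python
import base64
import re
from email import message_from_string
from email.policy import default

# Hypothetical one-part spam: the entire text/html body is a base64 blob.
HTML = "<b>buy pills</b>"
RAW = (
    "Subject: feel young and strong again\n"
    'Content-Type: text/html; charset="us-ascii"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(HTML.encode("ascii")).decode("ascii")
    + "\n"
)

# Hypothetical stand-in for a body rule (not a real SA rule).
DRUG_WORDS = re.compile(r"\bpills?\b", re.IGNORECASE)


def decoded_text(raw: str) -> str:
    """Concatenate every text/* part after undoing its transfer encoding."""
    msg = message_from_string(raw, policy=default)
    texts = [part.get_content() for part in msg.walk()
             if part.get_content_maintype() == "text"]
    return "\n".join(texts)


print(bool(DRUG_WORDS.search(RAW)))                # False: hidden in the blob
print(bool(DRUG_WORDS.search(decoded_text(RAW))))  # True: visible once decoded
```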
    It contains a number of drug-related words and a URI that points to
    "pharmconnect.org".

    Also is there an SA rule that scores messages that contain only a
    single base64 part (as opposed to a base64-encoded attachment)? I
    doubt many legitimate messages arrive with only a single base64 part.
    No, but there is one that detects base64 encoding of text sections:
    MIME_BASE64_TEXT.
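The sketch below approximates two tests in Python. The first is Peter's proposed test: a message whose only part is a base64-encoded text part. The second is the broader idea behind MIME_BASE64_TEXT: any inline text section encoded as base64. This is a hypothetical sketch with made-up function names. It does not reproduce SpamAssassin's actual rule definition. As later replies in this thread note, legitimate mail from Blackberry devices and Ticketmaster takes the same shape, so either test is a weak signal on its own.

```python
from email import message_from_string
from email.message import EmailMessage
from email.policy import default


def _cte(part: EmailMessage) -> str:
    """Content-Transfer-Encoding of a part, lower-cased ("" if absent)."""
    return str(part.get("Content-Transfer-Encoding", "")).strip().lower()


def is_single_base64_text_part(msg: EmailMessage) -> bool:
    """Hypothetical rule: the whole message is one base64-encoded text/* part."""
    return (not msg.is_multipart()
            and msg.get_content_maintype() == "text"
            and _cte(msg) == "base64")


def has_base64_text_part(msg: EmailMessage) -> bool:
    """Hypothetical rule: some inline (non-attachment) text/* part is base64."""
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no body of their own
        if (part.get_content_maintype() == "text"
                and _cte(part) == "base64"
                and not part.is_attachment()):
            return True
    return False


SINGLE = (
    'Content-Type: text/html; charset="us-ascii"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8=\n"
)
MULTI = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "See attached.\n"
    "--XYZ\n"
    "Content-Type: application/pdf\n"
    "Content-Transfer-Encoding: base64\n"
    'Content-Disposition: attachment; filename="a.pdf"\n'
    "\n"
    "aGVsbG8=\n"
    "--XYZ--\n"
)

for name, raw in (("single", SINGLE), ("multi", MULTI)):
    msg = message_from_string(raw, policy=default)
    print(name, is_single_base64_text_part(msg), has_base64_text_part(msg))
# single True True
# multi False False
```

In the multipart sample, only the PDF attachment is base64. That matches Peter's distinction between a base64 body and a base64-encoded attachment.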
  • No.2

    Thu, 26, 2006 at 09:46:28AM -0400, Peter H. Lemieux wrote:
    Does SA convert the blob into text before scanning? It contains a number
    of drug-related words and a URI that points to "pharmconnect.org".

    Yes.

    Also is there an SA rule that scores messages that contain only a single
    base64 part (as opposed to a base64-encoded attachment)? I doubt many
    legitimate messages arrive with only a single base64 part.

    No, because there are going to be a lot of mails that would hit that.
  • No.3

    Theo Van Dinter wrote:
    Thu, 26, 2006 at 09:46:28AM -0400, Peter H. Lemieux wrote:
    >Does SA convert the blob into text before scanning? It contains a number
    >of drug-related words and a URI that points to "pharmconnect.org".


    Yes.

    I was pretty sure this was the case but wanted to confirm it.

    >Also is there an SA rule that scores messages that contain only a single
    >base64 part (as opposed to a base64-encoded attachment)? I doubt many
    >legitimate messages arrive with only a single base64 part.


    No, because there are going to be a lot of mails that would hit that.

    Really? Maybe it's because I live in the US, but I can't think of a
    legitimate message I've ever received consisting only of a base64 blob.
    Out of curiosity, how frequently does this appear in the SA ham corpus?
    Rather than making anyone else do the work for me, is there something I
    can read about how to determine the frequency of different message
    features appearing in the corpus?

    Thanks, Theo.

    Peter
  • No.4

    Thu, 26, 2006 at 12:19:23PM -0400, Peter H. Lemieux wrote:
    >No, because there are going to be a lot of mails that would hit that.


    Really? Maybe it's because I live in the US, but I can't think of a
    legitimate message I've ever received consisting only of a base64 blob.

    You look at a lot of raw messages? ;)

    Out of curiosity, how frequently does this appear in the SA ham corpus?

    Well, there isn't "a" SA corpus, so there's no answer to that question. As
    for how often it happens in my corpus, I don't know; I'd have to write a rule
    and run it against the messages.

    Rather than making anyone else do the work for me, is there something I
    can read about how to determine the frequency of different message
    features appearing in the corpus?

    You can generate some rules and use mass-check to run against your own corpus
    to gather some statistics. I'm willing to run some rules for you against my
    corpus if you want. I just don't have time to come up with the rules right
    now.
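mass-check is the SpamAssassin tool Theo refers to, and it is the right tool for real rule statistics. The Python sketch below is only a rough stand-in. It counts how many messages in a local corpus match the hypothetical single-base64-text-part test from earlier in this thread. It assumes a hypothetical layout of one message per file under `corpus/ham` and `corpus/spam`. It does not use mass-check or SA's rule engine.

```python
import sys
from collections.abc import Iterable, Iterator
from email import message_from_bytes
from email.policy import default
from pathlib import Path


def single_base64_text(raw: bytes) -> bool:
    """Hypothetical test: a non-multipart message whose text/* body is base64."""
    msg = message_from_bytes(raw, policy=default)
    if msg.is_multipart():
        return False
    cte = str(msg.get("Content-Transfer-Encoding", "")).strip().lower()
    return msg.get_content_maintype() == "text" and cte == "base64"


def hit_count(messages: Iterable[bytes]) -> tuple[int, int]:
    """Return (messages matching the test, messages examined)."""
    hits = total = 0
    for raw in messages:
        total += 1
        if single_base64_text(raw):
            hits += 1
    return hits, total


def read_dir(directory: Path) -> Iterator[bytes]:
    """Yield the raw bytes of each file in a one-message-per-file directory."""
    for path in sorted(directory.iterdir()):
        if path.is_file():
            yield path.read_bytes()


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("corpus")
    for label in ("ham", "spam"):  # hypothetical subdirectory names
        directory = root / label
        if not directory.is_dir():
            continue
        hits, total = hit_count(read_dir(directory))
        pct = 100.0 * hits / total if total else 0.0
        print(f"{label}: {hits}/{total} messages ({pct:.2f}%)")
```

You could run it as `python count_b64.py /path/to/corpus`; the script name is arbitrary. A high hit rate in the ham directory would support Theo's point about false positives.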
  • No.5

    Peter H. Lemieux wrote:
    Theo Van Dinter wrote:
    >Thu, 26, 2006 at 09:46:28AM -0400, Peter H. Lemieux wrote:

    Also is there an SA rule that scores messages that contain only a
    single base64 part (as opposed to a base64-encoded attachment)? I
    doubt many legitimate messages arrive with only a single base64 part.
    >>

    >No, because there are going to be a lot of mails that would hit that.


    Really? Maybe it's because I live in the US, but I can't think of a
    legitimate message I've ever received consisting only of a base64 blob.
    Out of curiosity, how frequently does this appear in the SA ham corpus?
    Rather than making anyone else do the work for me, is there something I
    can read about how to determine the frequency of different message
    features appearing in the corpus?

    Most messages sent from a Blackberry would hit this rule, for example.
  • No.6

    Peter H. Lemieux wrote:
    Theo Van Dinter wrote:
    >Thu, 26, 2006 at 09:46:28AM -0400, Peter H. Lemieux wrote:


    Also is there an SA rule that scores messages that contain only a
    single base64 part (as opposed to a base64-encoded attachment)? I
    doubt many legitimate messages arrive with only a single base64 part.
    >>

    >No, because there are going to be a lot of mails that would hit that.


    Really? Maybe it's because I live in the US, but I can't think of a
    legitimate message I've ever received consisting only of a base64 blob.

    I'm not sure what to say to that. ;)

    Out of curiosity, how frequently does this appear in the SA ham corpus?

    Ticketmaster sends out *a lot* of their mail this way. I'm sure it's
    partly in an attempt to avoid having their mail FP against crappy filters.

    Daryl
  • No.7

    Fri, 27, 2006 at 11:44:48AM -0400, Daryl C. W. O'Shea wrote:
    Ticketmaster sends out *a lot* of their mail this way. I'm sure it's
    partly in an attempt to avoid having their mail FP against crappy filters.

    I'd also imagine that sometimes it's just easier to do this than try to pay
    attention to what is being sent and determine if encoding is necessary.
    Programmers tend to be lazy after all. :)
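Theo's point that it is easier to encode everything than to decide case by case can be illustrated with Python's standard `email` package. This is an assumption for illustration only; nothing in the thread says what software Ticketmaster uses. By default `EmailMessage.set_content` picks a transfer encoding suited to the text. Passing `cte="base64"` forces base64 regardless of content.

```python
from email.message import EmailMessage


def build(html: str, force_base64: bool) -> EmailMessage:
    """Build a one-part text/html message, optionally forcing base64."""
    msg = EmailMessage()
    msg["Subject"] = "Your order"
    if force_base64:
        # Encode everything as base64, whether or not the text needs it.
        msg.set_content(html, subtype="html", cte="base64")
    else:
        # Let the email package choose a transfer encoding for this text.
        msg.set_content(html, subtype="html")
    return msg


html = "<p>Your order is confirmed.</p>\n"
for force in (False, True):
    msg = build(html, force)
    print(msg["Content-Transfer-Encoding"], msg.get_content() == html)
# 7bit True
# base64 True
```

Both messages decode to the same text, but only the forced one produces the single-base64-part shape discussed above.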
  • No.8

    Theo Van Dinter wrote:
    Thu, 26, 2006 at 12:19:23PM -0400, Peter H. Lemieux wrote:
    No, because there are going to be a lot of mails that would hit that.
    >Really? Maybe it's because I live in the US, but I can't think of a
    >legitimate message I've ever received consisting only of a base64 blob.


    You look at a lot of raw messages? ;)

    Doesn't everybody?

    Seriously, I do look at a lot of raw messages; for instance, I review the
    full text of nearly every spam message that doesn't get caught by my
    filters and shows up in my inbox. I don't get much mail from
    Blackberry users or Ticketmaster!

    >Rather than making anyone else do the work for me, is there something I
    >can read about how to determine the frequency of different message
    >features appearing in the corpus?


    Well, there isn't "a" SA corpus, so there's no answer to that question.

    Ah, I hadn't read this page before:

    My recollection was that 2.x used a centrally-defined corpus rather than
    a variety of developers' corpora (see, I read the wiki). Either things
    changed with the switch in scoring algorithms in 3.x, or my recollection
    is shoddy. Probably the latter.

    You can generate some rules and use mass-check to run against your own corpus
    to gather some statistics. I'm willing to run some rules for you against my
    corpus if you want. I just don't have time to come up with the rules right
    now.

    Thanks for the offer, Theo, but don't spend your valuable time on this.
    I'll give it a shot someday when I've got some spare moments. If I do get
    some candidate rules, I'll pass them along to you for testing.

    Thanks again!
    Peter
  • No.9

    Fri, 27, 2006 at 05:24:58PM -0400, Peter H. Lemieux wrote:
    >Well, there isn't "a" SA corpus, so there's no answer to that question.


    Ah, I hadn't read this page before:

    My recollection was that 2.x used a centrally-defined corpus rather than
    a variety of developers' corpora (see, I read the wiki). Either things
    changed with the switch in scoring algorithms in 3.x, or my recollection
    is shoddy. Probably the latter.

    Yeah, sorry. We've had separate corpora since I started with SA several years
    ago. There was a "public corpus" of mail made available which could be
    confusing your memory. :)

EMSDN.COM