[ILUG] some fun bayes tokens

Justin Mason jm at jmason.org
Mon Nov 4 17:59:09 GMT 2002


Padraig Brady said:
> > This is nice -- mail sent from a Red Hat Linux box is only 0.1% likely to
> > be spam, in my corpus ;)
> 
> How many messages?

129, as far as I can see... (see below)

> >   N:H:X-Mailer:iNNN-redhat-linux 129 0.00173812278080732
> 
> What resolution do you require? Since it's multiplicative
> wouldn't 0.01 be enough or at most 0.001?

Yes -- it's just an artifact of the float representation.  BTW it's
actually got 0 spam signs against it, but it's capped at 0.01 so that 1
strong non-spam sign can't outweigh many not-quite-as-strong spam signs.
Statistically, it works better that way.
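
To illustrate the capping idea, here is a minimal sketch in Python. It is not
the actual classifier code: the naive-Bayes combining formula, the symmetric
0.99 ceiling, the 0.9 spam-token probability and the names `clamp` and
`combine` are all assumptions made for the example. Only the 0.01 floor comes
from the message above.

```python
from math import prod

FLOOR = 0.01         # cap mentioned in the message
CEIL = 1.0 - FLOOR   # assumed symmetric upper cap (not stated in the message)


def clamp(p, lo=FLOOR, hi=CEIL):
    """Keep a per-token spam probability inside [lo, hi]."""
    return max(lo, min(hi, p))


def combine(probs):
    """Naive-Bayes style combination of per-token spam probabilities.

    Assumed formula: prod(p) / (prod(p) + prod(1 - p)).
    Undefined if the list holds both an exact 0.0 and an exact 1.0.
    """
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


# Hypothetical message: one token never seen in spam (raw estimate 0.0)
# plus five moderately spammy tokens at 0.9 each.
raw = [0.0] + [0.9] * 5

unclamped = combine(raw)                     # the single 0.0 zeroes the product
clamped = combine([clamp(p) for p in raw])   # the spam tokens can now win

print(unclamped, clamped)
```

Without the cap, the single never-in-spam token forces the combined score to
exactly 0.0 no matter how many spam tokens follow it. With the cap applied,
the five 0.9 tokens push the combined score above 0.99.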

John -- dunno about release just yet, there's a good bit of QA we
have to do first.  But CVS works quite nicely right now ;)

--j.


