[ILUG] sa-learn

Darragh Bailey felix at compsoc.nuigalway.ie
Wed Nov 3 12:47:12 GMT 2004


Quoting Timothy Murphy <tim at birdsnest.maths.tcd.ie>:

> On Wednesday 03 November 2004 11:37, Darragh Bailey wrote:
>
> > I'm using mbox format as well and the time taken to perform
> > sa-learn --spam --no-rebuild --mbox ~/mail/spam/spam && sa-learn --ham
> > --no-rebuild --mbox ~/mail/spam/ham && sa-learn --rebuild
> >
> > is about 1 minute.
>
> What does --no-rebuild do?
> It does not seem to be listed by "man sa-learn" on my system (Fedora-2).
>
> Incidentally, timing info seems pretty useless to me
> unless you give some indication of the machine you're using.
>
> My Sony Picturebook (660MHz, 256MB RAM) would certainly take
> at least 9 minutes with the specified load,
> and would certainly not process 2500 messages in 1 minute, as stated.
>
> By the way, what is a "false positive"
> and how did the OP collect 2500 of them?
>

I'm surprised that your man sa-learn doesn't list that option:
--no-rebuild = Skip building databases after scan

When scanning lots of spam/ham mails it speeds up the process, since the
database is only resynced once, when sa-learn --rebuild is run at the end.
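
As a rough sketch of that pattern (SA_LEARN and learn_batch are just names
made up for this example, not part of SpamAssassin, and I believe later
SpamAssassin releases call these options --no-sync and --sync):

```shell
# Sketch only: SA_LEARN and learn_batch are hypothetical names for this
# example. Override SA_LEARN to try the function without SpamAssassin.
SA_LEARN="${SA_LEARN:-sa-learn}"

# learn_batch SPAM_MBOX HAM_MBOX
# Learn from both mboxes without syncing, then sync the database once.
learn_batch() {
    "$SA_LEARN" --spam --no-rebuild --mbox "$1" &&
    "$SA_LEARN" --ham --no-rebuild --mbox "$2" &&
    "$SA_LEARN" --rebuild
}
```

With my folders that would be called as
learn_batch ~/mail/spam/spam ~/mail/spam/ham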


The machine is a dual PIII 850MHz with 512MB RAM. While SpamAssassin doesn't
benefit from the 2 CPUs, the second one does allow some other work to be done
at the same time. But then there are also 20 other users on the machine (right
now), so I would imagine it balances itself out.

I actually deleted my current database and ran
time sa-learn --spam --mbox ~/mail/spam/spam && time sa-learn --ham --no-rebuild
--mbox ~/mail/spam/ham && time sa-learn --rebuild

to get a better estimate, since scanning each mail from scratch obviously
takes longer than normal.

Learned from 1156 message(s) (1232 message(s) examined).

real    2m13.788s
user    1m51.481s
sys     0m9.910s
Learned from 1033 message(s) (1033 message(s) examined).

real    1m5.182s
user    1m3.650s
sys     0m0.818s

synced Bayes databases from journal in 75 seconds: 218531 unique entries (218531
total entries)

real    1m16.769s
user    1m1.198s
sys     0m10.403s

Adding up those times, the elapsed (real) time comes to just over 4m 30s, and
the CPU time (user + sys) is only about 20 seconds less than that. The 1 minute
real time does hold up for subsequent scans, though.
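
For anyone who wants to check the sums, here is a small sketch that totals
the three sets of figures above. sum_times is a name invented for this
example, and it assumes the "XmY.YYYs" format that bash's time prints:

```shell
# sum_times: read `time` output on stdin and print the total seconds for
# real, user and sys, plus cpu = user + sys. (Hypothetical helper.)
sum_times() {
    awk '
        $1 == "real" || $1 == "user" || $1 == "sys" {
            split($2, t, "m")      # t[1] = minutes, t[2] = e.g. "13.788s"
            sub(/s$/, "", t[2])    # drop the trailing "s"
            total[$1] += t[1] * 60 + t[2]
        }
        END {
            printf "real %.3f\nuser %.3f\nsys %.3f\ncpu %.3f\n",
                total["real"], total["user"], total["sys"],
                total["user"] + total["sys"]
        }'
}

# The figures from the three runs above.
sum_times <<'EOF'
real    2m13.788s
user    1m51.481s
sys     0m9.910s
real    1m5.182s
user    1m3.650s
sys     0m0.818s
real    1m16.769s
user    1m1.198s
sys     0m10.403s
EOF
```

That gives real 275.739, user 236.329, sys 21.131 and cpu 257.460 seconds,
i.e. about 4m 36s elapsed against about 4m 17s of CPU time.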

I still think the attachments are the more likely cause of the long
processing time in this case.


--
Darragh

"Nothing's foolproof to a sufficiently talented fool"


