Exim and Spam Filtering

Question

For years I've been using Exim as my MTA. I had three strategies to prevent spam: first, blacklists; second, delaying mail delivery; and third, SpamAssassin.

But this year, more and more spam got through the filter. IMHO the spam is being sent from hacked servers and accounts.

On the other hand, I hear from customers with a firewall subscription such as Sophos or WatchGuard that they get almost no spam any more.

I also tried to adapt the config and switch from SpamAssassin to Rspamd, but I got more and more false positives instead.

I also tried the methods described on the Exim GitHub wiki: https://github.com/Exim/exim/wiki/SpamFiltering

But most of the information is outdated.

Can someone tell me what the 2018 method is to get rid of spam with Exim?

I'm still pretty happy with my Exim + SpamAssassin setup. I do a verification callout on the envelope sender; this stops a lot. Some so-called legitimate mail does fail this, so I have a cron job that reports which senders were temporarily blocked, so I can whitelist them if needed. Also check that the HELO hostname they send isn't your own; that stops an amazing amount too.
– wurtel, Commented Oct 1, 2018 at 8:41
Rspamd needs a bit of setup before it works well. Go through the modules and make sure that things make sense. Make sure greylisting is enabled from Exim, and try to work out what is going wrong with your false positives.
– okapi, Commented Dec 9, 2019 at 11:57

Kondybas · Accepted Answer · 2018-10-01 08:45:57Z

Tools like Sophos, WatchGuard or IronPort all collect statistics from their clients and build a generalized set of Bayesian tokens that is distributed back among the clients. Bayesian filtering is the ultimate weapon, but it needs continuous, heavy updating.

A standalone system with a small volume of email is the worst case for Bayesian filtering, because the statistics are too small for training. Relatively good results can be achieved if at least 500 incoming messages are processed per day.

First, all messages should be tested for basic RFC compliance. If the sending host has no reverse DNS record, has skipped the HELO, or fails a similar check, the message can be classified as spam.
Second, we check the sending host's name against our own blacklist (described below).

All messages caught by these two stages are passed to the Bayesian filter for learning. No messages should be dropped or rejected. If it's ham, it should be delivered. If it's spam, it should train our Bayesian filter.

Third, all messages not caught by stages 1 and 2 are evaluated by the Bayesian filter. I prefer SpamAssassin, so these messages are evaluated not only by Bayes but also by a large set of heuristics.

Every message that successfully passes all three stages is delivered to the user's mailbox. Of course, false positives and false negatives are still possible, but users can interact with the mail system by marking messages as ham or spam as they wish. Every marked message is passed on for Bayesian learning.
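
As a rough illustration, the three stages above could be sketched in Python as follows. The field names, the Action values and the 5.0 threshold (SpamAssassin's default required score) are assumptions made for this example and are not part of the setup described; in a real installation these checks would live in Exim ACLs and in SpamAssassin rather than in a script.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    LEARN_AS_SPAM = "learn_as_spam"  # caught by stage 1 or 2: feed to Bayes training
    TREAT_AS_SPAM = "treat_as_spam"  # caught by stage 3: the content filter scored it as spam
    DELIVER = "deliver"              # passed all three stages


@dataclass
class Incoming:
    """Hypothetical summary of what the MTA knows about one message."""
    host: str              # sending host name
    has_reverse_dns: bool  # stage 1 input
    sent_helo: bool        # stage 1 input
    spam_score: float      # stage 3 input, e.g. a SpamAssassin score


def decide(msg: Incoming, blacklist: set[str], threshold: float = 5.0) -> Action:
    # Stage 1: basic RFC compliance (only two of the possible checks are shown).
    if not msg.has_reverse_dns or not msg.sent_helo:
        return Action.LEARN_AS_SPAM
    # Stage 2: our own blacklist of known spamming hosts.
    if msg.host.lower() in blacklist:
        return Action.LEARN_AS_SPAM
    # Stage 3: Bayes plus heuristics, reduced here to a single score.
    if msg.spam_score >= threshold:
        return Action.TREAT_AS_SPAM
    return Action.DELIVER
```

For example, decide(Incoming("mx.example.org", True, True, 1.2), {"bad.example.net"}) returns Action.DELIVER, while the same message from bad.example.net returns Action.LEARN_AS_SPAM.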

Weekly or monthly, the postmaster should inspect the logs. All sending hosts should be ranked by the number of spam messages caught. The top 10 are good candidates to be added to the blacklist. The blacklist is just a list of hosts we know to be pure spammers. All messages from those hosts should be passed for Bayesian learning immediately.
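
The ranking step can be automated with a short script. The sketch below assumes a hypothetical log format in which every spam hit is logged on one line containing the word "spam" followed by "host=<name>"; real Exim or SpamAssassin log lines look different, so the regular expression would have to be adapted to whatever your setup actually writes.

```python
import re
from collections import Counter

# Hypothetical log line: "2018-10-01 08:00:01 spam host=mail.example.net score=9.1"
HOST_RE = re.compile(r"\bspam\b.*?\bhost=(\S+)")


def top_spam_hosts(lines, n=10):
    """Return the n sending hosts with the most spam hits as (host, count) pairs."""
    counts = Counter()
    for line in lines:
        match = HOST_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts.most_common(n)
```

Calling top_spam_hosts(open("/path/to/logfile")) gives the blacklist candidates for the period covered by that file; the path is a placeholder.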

The more spam is sent to this setup, the better it is filtered out. The only interaction needed from the postmaster is to update the blacklist on a regular basis. After a few months of training, this setup will let through no more than 1 false negative per 3000-5000 incoming messages.

Thank you for your comment. May I ask how you pass the mails that match the first two criteria to SA to train the Bayesian filter? And how can a user interact with the mail system to mark a mail as spam? Adding a string at the bottom of the mail seems like a bad idea.
– user39063, Commented Oct 1, 2018 at 10:44
Exim allows you to create a transport that uses pipe as its driver. A router can use that transport to pass the message directly to sa-learn. But that is not very useful, because sa-learn has a huge warm-up overhead. It is far more efficient to store all the spam messages in a directory and launch sa-learn periodically, say once an hour or even once a day. Users can interact with the system via an IMAP server like Dovecot, which can launch scripts in reaction to a message being moved to or from the Spam folder.
– Kondybas, Commented Oct 1, 2018 at 11:24
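
A minimal sketch of the periodic batch run described in the comment above, meant to be started from cron hourly or daily. The directory layout and function names are assumptions for illustration; the only external command used is sa-learn with its --spam / --ham options, which accept a directory of messages.

```python
import subprocess
from pathlib import Path


def sa_learn_command(directory, kind="spam"):
    """Build the sa-learn command line for one directory of stored messages."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", str(directory)]


def learn_batch(directory, kind="spam"):
    """Run sa-learn once over the whole directory; skip the run if it is empty."""
    directory = Path(directory)
    if not any(directory.iterdir()):
        return None  # nothing collected since the last run
    # One invocation for the whole batch, so the start-up cost is paid once.
    return subprocess.run(sa_learn_command(directory, kind), check=True)
```

A cron job could then call learn_batch("/path/to/collected-spam") and, for messages users moved out of the Spam folder, learn_batch("/path/to/collected-ham", kind="ham"); both paths are placeholders.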
