Tools like Sophos, WatchGuard or IronPort all collect statistics from their clients and build a generalized set of Bayesian tokens that is distributed back to those clients. Bayesian filtering is a very powerful weapon, but it needs continuous, heavy updating.
A standalone system handling a small volume of email is the worst case for a Bayesian filter, because the statistics are too small for training. Reasonably good results can be achieved if at least 500 incoming messages are processed per day.
- First, all messages should be tested for basic RFC compliance. If the sending host has no reverse DNS record, skips the HELO, or fails a similar check, the message can be classified as spam.
- Second, we check the sending host's name against our own blacklist (described below).
All messages caught by these two stages are passed to the Bayesian filter for learning. No message should be dropped or rejected. If it's ham, it should be delivered. If it's spam, it should train our Bayesian filter.
- Third, all messages not caught by stages 1 and 2 are evaluated by the Bayesian filter. I prefer SpamAssassin, so these messages are evaluated not only by the Bayesian filter but also by a large set of heuristics.
Every message that successfully passes all three stages is delivered to the user's mailbox. Of course, false positives and false negatives are still possible. But users can interact with the mail system, marking messages as ham or spam as they see fit. Every marked message is passed on for Bayesian learning.
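The three stages above can be sketched roughly as follows. This is only an illustration, not a real mail filter: the `Message` and `Pipeline` classes, the `spam_threshold` value and the keyword-based `score` method are all hypothetical stand-ins (in a real setup the score would come from SpamAssassin or a similar engine).

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender_host: str        # hostname of the connecting server
    has_reverse_dns: bool   # True if the sending IP has a reverse DNS record
    sent_helo: bool         # True if the client issued HELO/EHLO
    body: str = ""


@dataclass
class Pipeline:
    blacklist: set = field(default_factory=set)
    spam_threshold: float = 5.0   # assumed cutoff, not taken from the post
    trained_as_spam: list = field(default_factory=list)

    def score(self, msg: Message) -> float:
        # Hypothetical placeholder for the Bayesian + heuristic scorer.
        return 10.0 if "viagra" in msg.body.lower() else 0.0

    def classify(self, msg: Message) -> str:
        # Stage 1: basic protocol sanity checks.
        if not msg.has_reverse_dns or not msg.sent_helo:
            self.trained_as_spam.append(msg)   # queue for Bayesian learning
            return "spam:rfc"
        # Stage 2: local blacklist of known spam hosts.
        if msg.sender_host.lower() in self.blacklist:
            self.trained_as_spam.append(msg)   # queue for Bayesian learning
            return "spam:blacklist"
        # Stage 3: Bayesian/heuristic score for everything else.
        if self.score(msg) >= self.spam_threshold:
            return "spam:bayes"
        return "ham"
```

Messages caught by stages 1 and 2 are collected in `trained_as_spam` instead of being rejected, which mirrors the rule that nothing is dropped and caught spam feeds the filter.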
Weekly or monthly, the postmaster should inspect the logs. All sending hosts should be ranked by the number of spam messages caught. The top 10 are good candidates to be added to the blacklist. The blacklist is just a list of hosts we know to be pure spammers. All messages from those hosts should be passed for Bayesian learning immediately.
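The log review could be automated along these lines. This is a minimal sketch that assumes the logs have already been parsed into `(host, verdict)` pairs. That format and the `top_spam_hosts` helper are hypothetical, and real mail logs would need their own parsing step first.

```python
from collections import Counter


def top_spam_hosts(log_entries, n=10):
    """Return the n hosts with the most spam verdicts, highest first.

    log_entries is an iterable of (host, verdict) pairs, where verdict
    is the string "spam" or "ham" (an assumed, simplified log format).
    """
    counts = Counter(
        host.lower() for host, verdict in log_entries if verdict == "spam"
    )
    return counts.most_common(n)
```

The postmaster would still review the resulting list by hand before adding anything to the blacklist, since a busy legitimate host can also rank high.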
The more spam is sent to this setup, the better it filters it out. The only interaction needed from the postmaster is to update the blacklist on a regular basis. After a few months of training, this setup will let through no more than 1 false negative per 3000-5000 incoming messages.