Thanks to GerryM from the [iRedMail forum](https://forum.iredmail.org/).
Auto learn spam/ham with Dovecot `imap_sieve` plugin
[TOC]
!!! attention

    This feature is enabled by default if your iRedMail server was deployed
    with our [iRedMail Easy platform](https://www.iredmail.org/easy.html).
!!! warning

    The Bayesian classifier can only score new messages after it has already
    learned 200 known spam and 200 known ham messages.
Summary
Dovecot offers the `imap_sieve` plugin, which runs Sieve scripts when IMAP
clients copy or append messages to a mailbox. This lets end users report
spam/ham from webmail or a mail client (MUA). On the server side, we then call
SpamAssassin to learn the reported messages. The more spam/ham end users
report, the more precisely SpamAssassin can catch spam.
This tutorial shows you how to enable the Dovecot `imap_sieve` plugin and
create the required shell and Sieve scripts to learn spam automatically.
After setup, you can encourage end users to report spam by moving or dragging
it to the `Junk` folder. The more spam they report, the more precisely your
iRedMail server can catch spam.
Requirements
- A working iRedMail server.
- Dovecot version 2.2.24 or later. The `imap_sieve` plugin is available in
    Dovecot 2.2.24 and later releases.
    - CentOS 7 ships Dovecot-2.2.36
    - Debian 9 ships Dovecot-2.2.27
    - Ubuntu 16.04 ships Dovecot-2.2.22 (WARNING: Not qualified)
    - Ubuntu 18.04 ships Dovecot-2.2.33
    - OpenBSD 6.4 ships Dovecot-2.2.36
    - FreeBSD ships Dovecot-2.3.x in ports tree
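If you're not sure which Dovecot version your server runs, a small check like the one below can help. This is only a sketch. It assumes a `sort` command that supports version sorting with `-V` (GNU coreutils does), and the helper name `version_ge` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2.
# Assumes `sort -V` (version sort) is available, e.g. GNU coreutils.
version_ge() {
    # Sort both versions; if the smaller one is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# `dovecot --version` prints something like "2.2.36 (1f10bfa63)";
# keep only the first field.
if command -v dovecot >/dev/null 2>&1; then
    installed="$(dovecot --version | awk '{print $1}')"
    if version_ge "${installed}" 2.2.24; then
        echo "Dovecot ${installed}: imap_sieve is available."
    else
        echo "Dovecot ${installed}: too old for imap_sieve (need 2.2.24+)."
    fi
fi
```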
Enable `imap_sieve` plugin
Please update the Dovecot config file `/etc/dovecot/dovecot.conf` to:

- Enable the new parameter `mail_attribute_dict` globally.
- Enable the new plugin `imap_sieve` in the `protocol imap {}` section.
- Add the required settings for `imap_sieve` in the `plugin {}` section.
```
# Store METADATA information within user's HOME directory
mail_attribute_dict = file:%Lh/dovecot-attributes

protocol imap {
    ...
    mail_plugins = ... imap_sieve
}

plugin {
    sieve_plugins = sieve_imapsieve sieve_extprograms
    imapsieve_url = sieve://127.0.0.1:4190

    # From elsewhere to Junk folder
    imapsieve_mailbox1_name = Junk
    imapsieve_mailbox1_causes = COPY APPEND
    imapsieve_mailbox1_before = file:/var/vmail/sieve/report_spam.sieve

    # From Junk folder to elsewhere
    imapsieve_mailbox2_name = *
    imapsieve_mailbox2_from = Junk
    imapsieve_mailbox2_causes = COPY
    imapsieve_mailbox2_before = file:/var/vmail/sieve/report_ham.sieve

    sieve_pipe_bin_dir = /etc/dovecot/sieve/pipe
    sieve_global_extensions = +vnd.dovecot.pipe +vnd.dovecot.environment
}
```
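After editing `dovecot.conf`, you can sanity-check the result with `doveconf -n`, which prints the settings that differ from the defaults. The function below is a hypothetical helper, not shipped with Dovecot or iRedMail. It reads such output on stdin and lists the settings above that it cannot find. It only does a simple text match, so it does not prove that the values are correct:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Dovecot config text on stdin (e.g. `doveconf -n`)
# and print every required imap_sieve-related setting that is missing.
# Returns 0 if all were found, 1 otherwise.
check_imapsieve_settings() {
    local conf key missing=0
    conf="$(cat)"
    for key in mail_attribute_dict sieve_plugins imapsieve_mailbox1_name \
               imapsieve_mailbox2_name sieve_pipe_bin_dir sieve_global_extensions; do
        # Match "key = value" at the start of a line, allowing indentation.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" <<< "${conf}"; then
            echo "missing: ${key}"
            missing=1
        fi
    done
    # imap_sieve must also be listed in a mail_plugins line.
    if ! grep -Eq "^[[:space:]]*mail_plugins[[:space:]]*=.*imap_sieve" <<< "${conf}"; then
        echo "missing: imap_sieve in mail_plugins"
        missing=1
    fi
    return "${missing}"
}

if command -v doveconf >/dev/null 2>&1; then
    doveconf -n | check_imapsieve_settings && echo "imap_sieve settings found."
fi
```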
Create required directories and files
We will create a few directories and files used by the `imap_sieve` plugin:

- Directories:
    - `/etc/dovecot/sieve/pipe`: used to store the script called by the `imap_sieve` plugin.
    - `/var/vmail/imapsieve_copy`: used to store reported spam/ham emails.
- Files:
    - `/var/vmail/sieve/report_spam.sieve`: used to save a copy of reported spam.
    - `/var/vmail/sieve/report_ham.sieve`: used to save a copy of reported ham.
- Shell script:
    - `/etc/dovecot/sieve/pipe/imapsieve_copy`
Create directories:

```
mkdir -p /etc/dovecot/sieve/pipe
mkdir -p /var/vmail/imapsieve_copy
chown vmail:vmail /var/vmail/imapsieve_copy
chmod 0700 /var/vmail/imapsieve_copy
```
Create file `/var/vmail/sieve/report_spam.sieve` with the content below:

```
require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"];

if environment :matches "imap.user" "*" {
    set "username" "${1}";
}

pipe :copy "imapsieve_copy" [ "${username}", "spam" ];
```
Create file `/var/vmail/sieve/report_ham.sieve` with the content below:

```
require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"];

if environment :matches "imap.mailbox" "*" {
    set "mailbox" "${1}";
}

if string "${mailbox}" "Trash" {
    stop;
}

if environment :matches "imap.user" "*" {
    set "username" "${1}";
}

pipe :copy "imapsieve_copy" [ "${username}", "ham" ];
```
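The two `imapsieve_mailboxN_*` rules work together with the `Trash` check in `report_ham.sieve` to decide which report a user action produces. The hypothetical bash function below mirrors that decision for a message moved from one mailbox to another. It is only an illustration, and Dovecot does not use it:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of iRedMail or Dovecot): print which report,
# if any, the imapsieve rules above produce when a message is copied/moved
# from mailbox $1 to mailbox $2.
classify_move() {
    local from="$1" to="$2"
    if [[ "${to}" == "Junk" ]]; then
        # imapsieve_mailbox1_*: anything arriving in Junk runs report_spam.sieve.
        echo "spam"
    elif [[ "${from}" == "Junk" && "${to}" != "Trash" ]]; then
        # imapsieve_mailbox2_*: leaving Junk runs report_ham.sieve,
        # which stops early when the destination is Trash.
        echo "ham"
    else
        echo "none"
    fi
}

classify_move INBOX Junk      # spam
classify_move Junk INBOX      # ham
classify_move Junk Trash      # none
classify_move INBOX Archive   # none
```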
Create file `/etc/dovecot/sieve/pipe/imapsieve_copy` with the content below:

```
#!/usr/bin/env bash
# Author: Zhang Huangbin <zhb@iredmail.org>
# Purpose: Read full email message from stdin, and save to a local file.
# Usage: bash imapsieve_copy <email> <spam|ham>

export USER="$1"
export MSG_TYPE="$2"

export OUTPUT_BASE_DIR="/var/vmail/imapsieve_copy"
export OUTPUT_DIR="${OUTPUT_BASE_DIR}/${MSG_TYPE}"
export FILE="${OUTPUT_DIR}/${USER}-$(date +%Y%m%d%H%M%S)-${RANDOM}${RANDOM}.eml"

export OWNER="vmail"
export GROUP="vmail"

# Logging (optional): uncomment this line and the one right after `cat` below.
#export LOG='logger -p local5.info -t imapsieve_copy'

for dir in "${OUTPUT_BASE_DIR}" "${OUTPUT_DIR}"; do
    if [[ ! -d "${dir}" ]]; then
        mkdir -p "${dir}"
        chown ${OWNER}:${GROUP} "${dir}"
        chmod 0700 "${dir}"
    fi
done

# Save the message read from stdin.
cat > "${FILE}"
# `$?` must be checked right after `cat` to reflect its exit status.
#[[ $? == 0 ]] && ${LOG} "Copied one ${MSG_TYPE} email reported by ${USER}: ${FILE}"
```
Set correct file owner and permissions:

```
chown vmail:vmail /var/vmail/sieve/report_spam.sieve \
    /var/vmail/sieve/report_ham.sieve \
    /etc/dovecot/sieve/pipe/imapsieve_copy

chmod 0700 /var/vmail/sieve/report_spam.sieve \
    /var/vmail/sieve/report_ham.sieve \
    /etc/dovecot/sieve/pipe/imapsieve_copy
```
Restart the Dovecot service to enable this plugin:

```
service dovecot restart
```
Set up a cron job to scan and learn spam/ham messages
Dovecot can now save a copy of reported spam/ham automatically, but we still
need a shell script that periodically calls SpamAssassin to actually learn it.

Create the script `/etc/dovecot/sieve/scan_reported_mails.sh` with the content
below. It calls the `sa-learn` command to learn reported spam/ham emails:
!!! attention

    If you're running FreeBSD or OpenBSD, please change the Amavisd daemon
    user name in the variable `AMAVISD_USER` below.
```
#!/usr/bin/env bash
# Author: Zhang Huangbin <zhb@iredmail.org>
# Purpose: Copy spam/ham to another directory and call sa-learn to learn.

# Paths to find program.
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH"

export OWNER="vmail"
export GROUP="vmail"

# The Amavisd daemon user.
# Note: on OpenBSD, it's "_vscan". On FreeBSD, it's "vscan".
export AMAVISD_USER='amavis'
export AMAVISD_USER_HOMEDIR="$(eval echo ~${AMAVISD_USER})"

# Kernel name, in upper cases.
export KERNEL_NAME="$(uname -s | tr '[a-z]' '[A-Z]')"

# A temporary lock file. It's removed after messages have been examined.
export LOCK_FILE='/tmp/scan_reported_mails.lock'

# Logging to syslog with 'logger' command.
export LOG='logger -p local5.info -t scan_reported_mails'

# `sa-learn` command, with optional arguments.
export SA_LEARN="sa-learn -u ${AMAVISD_USER} --dbpath ${AMAVISD_USER_HOMEDIR}/.spamassassin"

# Spool directory.
# Must be owned by vmail:vmail.
export SPOOL_DIR='/var/vmail/imapsieve_copy'

# Directories which store spam and ham emails.
# These 2 are created by the `imapsieve_copy` script.
export SPOOL_SPAM_DIR="${SPOOL_DIR}/spam"
export SPOOL_HAM_DIR="${SPOOL_DIR}/ham"

# Directories used to store emails we're going to process.
# We move new spam/ham messages to these directories, scan them, then
# remove them.
export SPOOL_LEARN_SPAM_DIR="${SPOOL_DIR}/processing/spam"
export SPOOL_LEARN_HAM_DIR="${SPOOL_DIR}/processing/ham"

if [ -e "${LOCK_FILE}" ]; then
    # A lock file older than 24 hours is stale (left by a crashed run), so
    # remove it. Otherwise another instance is still running, so abort.
    stale="$(find "$(dirname "${LOCK_FILE}")" -maxdepth 1 -name "$(basename "${LOCK_FILE}")" -mtime +0 2>/dev/null)"
    if [ -n "${stale}" ]; then
        rm -f "${LOCK_FILE}" >/dev/null 2>&1
    else
        ${LOG} "Lock file exists (${LOCK_FILE}), abort."
        exit
    fi
fi

# Create the lock file so that concurrent runs abort.
touch "${LOCK_FILE}"

for dir in "${SPOOL_DIR}" "${SPOOL_LEARN_SPAM_DIR}" "${SPOOL_LEARN_HAM_DIR}"; do
    if [[ ! -d "${dir}" ]]; then
        mkdir -p "${dir}"
    fi
    chown ${OWNER}:${GROUP} "${dir}"
    chmod 0700 "${dir}"
done

# If there are a lot of files, a direct `mv` command may fail with an error
# like `argument list too long`, so we use `find` instead.
# BSD `mv` has no `-t` option, so files are moved one by one there.
if [[ X"${KERNEL_NAME}" == X'OPENBSD' ]] || [[ X"${KERNEL_NAME}" == X'FREEBSD' ]]; then
    [[ -d "${SPOOL_SPAM_DIR}" ]] && find "${SPOOL_SPAM_DIR}" -name '*.eml' -exec mv {} "${SPOOL_LEARN_SPAM_DIR}/" \;
    [[ -d "${SPOOL_HAM_DIR}" ]] && find "${SPOOL_HAM_DIR}" -name '*.eml' -exec mv {} "${SPOOL_LEARN_HAM_DIR}/" \;
else
    [[ -d "${SPOOL_SPAM_DIR}" ]] && find "${SPOOL_SPAM_DIR}" -name '*.eml' -exec mv -t "${SPOOL_LEARN_SPAM_DIR}/" {} +
    [[ -d "${SPOOL_HAM_DIR}" ]] && find "${SPOOL_HAM_DIR}" -name '*.eml' -exec mv -t "${SPOOL_LEARN_HAM_DIR}/" {} +
fi

# Try to delete the empty directory. If that fails, there are messages to
# scan.
rmdir "${SPOOL_LEARN_SPAM_DIR}" &>/dev/null
if [[ X"$?" != X'0' ]]; then
    output="$(${SA_LEARN} --spam "${SPOOL_LEARN_SPAM_DIR}")"
    rm -rf "${SPOOL_LEARN_SPAM_DIR}" &>/dev/null
    ${LOG} '[SPAM]' ${output}
fi

rmdir "${SPOOL_LEARN_HAM_DIR}" &>/dev/null
if [[ X"$?" != X'0' ]]; then
    output="$(${SA_LEARN} --ham "${SPOOL_LEARN_HAM_DIR}")"
    rm -rf "${SPOOL_LEARN_HAM_DIR}" &>/dev/null
    ${LOG} '[CLEAN]' ${output}
fi

rm -f "${LOCK_FILE}" &>/dev/null
```
Run the command `crontab -e -u root` to set up a cron job for the root user
that scans emails every 10 minutes:

```
# iRedMail: Scan reported mails.
*/10 * * * * /bin/bash /etc/dovecot/sieve/scan_reported_mails.sh
```
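Between cron runs, reported messages accumulate in the spool directories. The sketch below counts how many `.eml` files are waiting to be learned. It assumes the `spam/` and `ham/` layout used above, and `count_pending` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reported messages still waiting for the next
# cron run. Assumes the spool layout used above: <spool>/spam and <spool>/ham.
count_pending() {
    local spool="${1:-/var/vmail/imapsieve_copy}" type n
    for type in spam ham; do
        if [[ -d "${spool}/${type}" ]]; then
            n="$(find "${spool}/${type}" -maxdepth 1 -type f -name '*.eml' | wc -l)"
        else
            n=0
        fi
        # BSD `wc -l` pads its output with spaces; arithmetic expansion strips them.
        echo "${type}: $((n))"
    done
}
```

For example, run `count_pending` as root to check the default spool directory, or pass another directory as the first argument.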
Tests
Report spam: Move email from Inbox to Junk
!!! attention

    If you're running Roundcube webmail, you can enable its `markasjunk`
    plugin to move spam to the Junk folder with one click.
- Log in to webmail or any IMAP client like Outlook/Thunderbird.
- Move an email from the `Inbox` folder to the `Junk` folder.
In the Dovecot log file `/var/log/dovecot/imap.log` (or `dovecot.log` if you
didn't configure the syslog daemon to separate log content), you should see
log lines like below:

```
Jan 31 21:10:42 c7 dovecot: imap(<email>): sieve: pipe action: piped message to program `imapsieve_copy'
Jan 31 21:10:42 c7 dovecot: imap(<email>): sieve: left message in mailbox 'Junk'
Jan 31 21:10:42 c7 dovecot: imap(<email>): expunge: box=INBOX, uid=7, msgid=, size=7805, from=<email>, subject=<subject>
```
At the same time, you should see a new file in
`/var/vmail/imapsieve_copy/spam/`, named in the format
`<email>-<timestamp>-<random_number>.eml`.
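To check the saved copy from the command line, you can match its file name against that format. The sketch below assumes the naming scheme produced by `imapsieve_copy`: `date +%Y%m%d%H%M%S` gives a 14-digit timestamp, followed by digits from `$RANDOM`. `is_reported_copy` is a hypothetical helper, and the same check works for the `ham/` directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file name matches the pattern written by
# imapsieve_copy: <email>-<YYYYmmddHHMMSS>-<random digits>.eml
is_reported_copy() {
    local name
    name="$(basename "$1")"
    [[ "${name}" =~ ^.+@.+-[0-9]{14}-[0-9]+\.eml$ ]]
}

# Example: pick the newest saved spam copy (if any) and check its name.
latest="$(ls -t /var/vmail/imapsieve_copy/spam/*.eml 2>/dev/null | head -n 1)"
if [[ -n "${latest}" ]] && is_reported_copy "${latest}"; then
    echo "Found reported spam: ${latest}"
fi
```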
Report ham: Move email from Junk to any other folder (except Trash)
If you find a clean email in the `Junk` folder, just move it from `Junk` to any
other folder except `Trash`.
In the Dovecot log file `/var/log/dovecot/imap.log` (or `dovecot.log`), you
should see log lines like below:

```
Jan 31 21:15:51 c7 dovecot: imap(<email>): sieve: pipe action: piped message to program `imapsieve_copy'
Jan 31 21:15:51 c7 dovecot: imap(<email>): sieve: left message in mailbox 'INBOX'
Jan 31 21:15:51 c7 dovecot: imap(<email>): expunge: box=Junk, uid=7, msgid=, size=7805, from=<email>, subject=<subject>
```
At the same time, you should see a new file in
`/var/vmail/imapsieve_copy/ham/`, named in the format
`<email>-<timestamp>-<random_number>.eml`.
Scan reported mails
You can also run the script manually to scan reported mails:

```
bash /etc/dovecot/sieve/scan_reported_mails.sh
```

If it scanned any messages, it logs a message in `/var/log/syslog` or
`/var/log/messages` like this:

```
Jan 31 04:51:34 mail scan_reported_mails: [CLEAN] Learned tokens from 1 message(s) (1 message(s) examined)
Jan 31 05:03:16 mail scan_reported_mails: [SPAM] Learned tokens from 1 message(s) (1 message(s) examined)
```
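If you want to track how many messages were learned over time, the log format above is easy to parse. Here is a minimal sketch that assumes log lines in exactly the format shown. The `parse_learn_line` function is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract "<type> <learned> <examined>" (e.g. "SPAM 1 1")
# from one log line written by scan_reported_mails.sh. Prints nothing if the
# line does not match.
parse_learn_line() {
    sed -nE 's/.*\[(SPAM|CLEAN)] Learned tokens from ([0-9]+) message\(s\) \(([0-9]+) message\(s\) examined\).*/\1 \2 \3/p' <<< "$1"
}
```

For example: `grep scan_reported_mails /var/log/syslog | while read -r line; do parse_learn_line "$line"; done`.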
Check detailed bayes learning log on command line
You can either turn on debug mode in Amavisd and SpamAssassin to check how
Bayes learning works in SpamAssassin, or run `spamassassin` manually against a
sample email.

To check on the command line, upload/save a sample email to `/opt/sample.eml`,
then run `spamassassin` with Bayes debug output as the Amavisd daemon user
(the command below is run from a root shell):

```
# su -s /bin/bash - amavis -c "spamassassin -D bayes < /opt/sample.eml"
May 21 05:27:08.244 [32241] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x2fe8cb8), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
May 21 05:27:08.264 [32241] dbg: bayes: using username: amavis
May 21 05:27:08.264 [32241] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x387a1c8)
...
```
Check bayes data
Running `sa-learn` as the Amavisd daemon user with the `--dump magic` argument
shows the Bayes data, like below:

```
# su -s /bin/bash amavis -c "sa-learn --dump magic"
0.000          0          3          0  non-token data: bayes db version
0.000          0    3778575          0  non-token data: nspam
0.000          0    6326326          0  non-token data: nham
0.000          0     539978          0  non-token data: ntokens
0.000          0 1558372204          0  non-token data: oldest atime
0.000          0 1558415857          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1558415403          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0      59325          0  non-token data: last expire reduction count
```
- `nspam`: number of learned spam emails.
- `nham`: number of learned ham/clean emails.
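The Bayesian classifier only starts scoring after 200 spam and 200 ham messages have been learned. These are SpamAssassin's defaults for `bayes_min_spam_num` and `bayes_min_ham_num`. So it's handy to compare `nspam` and `nham` against those thresholds. The sketch below assumes the `--dump magic` output format shown above, and `bayes_ready` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `sa-learn --dump magic` output on stdin, print
# nspam/nham, and succeed only if both reached the default minimum of 200.
bayes_ready() {
    awk '
        / non-token data: nspam$/ { nspam = $3 }
        / non-token data: nham$/  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            exit !(nspam >= 200 && nham >= 200)
        }'
}
```

For example: `su -s /bin/bash amavis -c "sa-learn --dump magic" | bayes_ready`. If you changed those thresholds in your SpamAssassin config, adjust the numbers accordingly.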
See also
References
- Dovecot wiki: Antispam with Sieve
You may notice a difference between this tutorial and the Dovecot wiki
tutorial. Our setup saves reported mails and scans them later with `sa-learn`
from a cron job, while the Dovecot wiki setup calls `sa-learn` directly. We
made this change for performance reasons. When a user moves a message to the
`Junk` folder, the webmail waits for `sa-learn` to finish before responding.
If the user moves a lot of messages at the same time, the webmail hangs and
the user has to wait, which is not a good user experience.