# Auto learn spam/ham with Dovecot `imap_sieve` plugin [TOC] ## Summary Dovecot offers plugin `imap_sieve` to run sieve script for spam/virus scanning, it's useful to let end users report spam/ham messages within webmail or MUA, then on server side we call SpamAssassin to learn the reported messages. The more spams/hams end users reported, the more precisely SpamAssassin can catch the spams. This tutorial shows you how to enable Dovecot plugin `imap_sieve` and create required shell/sieve scripts to learn spams automatically. After setup, you can encourage end users to report spam messages by moving/dragging spam to `Junk` folder. With more spams reported, your iRedMail server can precisely catch more spams. ## Requirements - A working iRedMail server. - Dovecot version 2.2.24 or later. `imap_sieve` plugin is available in version 2.2.24 and later releases. - CentOS 7 ships Dovecot-2.2.36 - Debian 9 ships Dovecot-2.2.27 - Ubuntu 16.04 ships Dovecot-2.2.22 (__WARNING: Not qualified__) - Ubuntu 18.04 ships Dovecot-2.2.33 - OpenBSD 6.4 ships Dovecot-2.2.36 - FreeBSD ships Dovecot-3.x in ports tree ## Enable `imap_sieve` plugin Please update Dovecot config file `/etc/dovecot/dovecot.conf` to: * Enable new parameter `mail_attribute_dict` globally. * Enable new plugin `imap_sieve` in `protocol imap {}` section. * Add required settings for `imap_sieve` in `plugin {}` section. ``` # Store METADATA information within user's HOME directory mail_attribute_dict = file:%h/dovecot-attributes protocol imap { ... mail_plugins = ... imap_sieve } plugin { sieve_plugins = sieve_imapsieve sieve_extprograms imapsieve_url = sieve://127.0.0.1:4190 # From elsewhere to Junk folder imapsieve_mailbox1_name = Junk imapsieve_mailbox1_causes = COPY APPEND imapsieve_mailbox1_before = file:/var/vmail/sieve/report_spam.sieve # From Junk folder to elsewhere imapsieve_mailbox2_name = * imapsieve_mailbox2_from = Junk imapsieve_mailbox2_causes = COPY imapsieve_mailbox2_before = file:/var/vmail/sieve/report_ham.sieve sieve_pipe_bin_dir = /etc/dovecot/sieve/pipe sieve_global_extensions = +vnd.dovecot.pipe +vnd.dovecot.environment } ``` ## Create required directories and files We will create few directories and files used by `imap_sieve` plugin: * Directories: - `/etc/dovecot/sieve/pipe`: used to store script called by `imap_sieve` plugin. - `/var/vmail/imapsieve_copy`: used to store reported spam/ham emails. * Files: - `/var/vmail/sieve/report_spam.sieve`: used to save a copy of reported spam. - `/var/vmail/sieve/report_ham.sieve`: used to save a copy of reported ham. * Shell script: - `/etc/dovecot/sieve/pipe/imapsieve_copy` Create directories: ``` mkdir -p /etc/dovecot/sieve/pipe mkdir -p /var/vmail/imapsieve_copy chown vmail:vmail /var/vmail/imapsieve_copy chmod 0700 /var/vmail/imapsieve_copy ``` Create file `/var/vmail/sieve/report_spam.sieve` with content below: ``` require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"]; if environment :matches "imap.user" "*" { set "username" "${1}"; } pipe :copy "imapsieve_copy" [ "${username}", "spam" ]; ``` Create file `/var/vmail/sieve/report_ham.sieve` with content below: ``` require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"]; if environment :matches "imap.mailbox" "*" { set "mailbox" "${1}"; } if string "${mailbox}" "Trash" { stop; } if environment :matches "imap.user" "*" { set "username" "${1}"; } pipe :copy "imapsieve_copy" [ "${username}", "ham" ]; ``` Create file `/etc/dovecot/sieve/pipe/imapsieve_copy` with content below: ``` #!/usr/bin/env bash # Author: Zhang Huangbin # Purpose: Read full email message from stdin, and save to a local file. # Usage: bash imapsieve_copy export USER="$1" export MSG_TYPE="$2" export OUTPUT_BASE_DIR="/var/vmail/imapsieve_copy" export OUTPUT_DIR="${OUTPUT_BASE_DIR}/${MSG_TYPE}" export FILE="${OUTPUT_DIR}/${USER}-$(date +%Y%m%d%H%M%S)-${RANDOM}${RANDOM}.eml" export OWNER="vmail" export GROUP="vmail" for dir in "${OUTPUT_BASE_DIR}" "${OUTPUT_DIR}"; do if [[ ! -d ${dir} ]]; then mkdir -p ${dir} chown ${OWNER}:${GROUP} ${dir} chmod 0700 ${dir} fi done cat > ${FILE} < /dev/stdin # Logging #export LOG='logger -p local5.info -t imapsieve_copy' #[[ $? == 0 ]] && ${LOG} "Copied one ${MSG_TYPE} email reported by ${USER}: ${FILE}" ``` Set correct file owner and permissions: ``` chown vmail:vmail /var/vmail/sieve/report_spam.sieve \ /var/vmail/sieve/report_ham.sieve \ /etc/dovecot/sieve/pipe/imapsieve_copy chmod 0700 /var/vmail/sieve/report_spam.sieve \ /var/vmail/sieve/report_ham.sieve \ /etc/dovecot/sieve/pipe/imapsieve_copy ``` Restart Dovecot service to enable this plugin. ``` service dovecot restart ``` ## Setup cron job to scan and learn spam/ham messages Dovecot can now save a copy of reported spam/ham automatically, we still need a shell script to call SpamAssassin to actually learn spam/ham periodly. Create script `/etc/dovecot/sieve/scan_reported_mails.sh` with content below, it's used to call `sa-learn` command to learn reported spam/ham emails: !!! attention If you're running FreeBSD or OpenBSD, please change the Amavisd daemon user name in variable `AMAVISD_USER` below. ``` #!/usr/bin/env bash # Author: Zhang Huangbin # Purpose: Copy spam/ham to another directory and call sa-learn to learn. # Paths to find program. export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" export OWNER="vmail" export GROUP="vmail" # The Amavisd daemon user. # Note: on OpenBSD, it's "_vscan". On FreeBSD, it's "vscan". export AMAVISD_USER='amavis' # Kernel name, in upper cases. export KERNEL_NAME="$(uname -s | tr '[a-z]' '[A-Z]')" # A temporary lock file. should be removed after successfully examed messages. export LOCK_FILE='/tmp/scan_reported_mails.lock' # Logging to syslog with 'logger' command. export LOG='logger -p local5.info -t scan_reported_mails' # `sa-learn` command, with optional arguments. export SA_LEARN="sa-learn -u ${AMAVISD_USER}" # Spool directory. # Must be owned by vmail:vmail. export SPOOL_DIR='/var/vmail/imapsieve_copy' # Directories which store spam and ham emails. # These 2 should be created while setup Dovecot antispam plugin. export SPOOL_SPAM_DIR="${SPOOL_DIR}/spam" export SPOOL_HAM_DIR="${SPOOL_DIR}/ham" # Directory used to store emails we're going to process. # We will copy new spam/ham messages to these directories, scan them, then # remove them. export SPOOL_LEARN_SPAM_DIR="${SPOOL_DIR}/processing/spam" export SPOOL_LEARN_HAM_DIR="${SPOOL_DIR}/processing/ham" if [ -e ${LOCK_FILE} ]; then find $(dirname ${LOCK_FILE}) -maxdepth 1 -ctime 1 "$(basename ${LOCK_FILE})" >/dev/null 2>&1 if [ X"$?" == X'0' ]; then rm -f ${LOCK_FILE} >/dev/null 2>&1 else ${LOG} "Lock file exists (${LOCK_FILE}), abort." exit fi fi for dir in "${SPOOL_DIR}" "${SPOOL_LEARN_SPAM_DIR}" "${SPOOL_LEARN_HAM_DIR}"; do if [[ ! -d ${dir} ]]; then mkdir -p ${dir} fi chown ${OWNER}:${GROUP} ${dir} chmod 0700 ${dir} done # If there're a lot files, direct `mv` command may fail with error like # `argument list too long`, so we need `find` in this case. if [[ X"${KERNEL_NAME}" == X'OPENBSD' ]]; then [[ -d ${SPOOL_SPAM_DIR} ]] && find ${SPOOL_SPAM_DIR} -name '*.eml' -exec mv {} ${SPOOL_LEARN_SPAM_DIR}/ \; [[ -d ${SPOOL_HAM_DIR} ]] && find ${SPOOL_HAM_DIR} -name '*.eml' -exec mv {} ${SPOOL_LEARN_HAM_DIR}/ \; else [[ -d ${SPOOL_SPAM_DIR} ]] && find ${SPOOL_SPAM_DIR} -name '*.eml' -exec mv -t ${SPOOL_LEARN_SPAM_DIR}/ {} + [[ -d ${SPOOL_HAM_DIR} ]] && find ${SPOOL_HAM_DIR} -name '*.eml' -exec mv -t ${SPOOL_LEARN_HAM_DIR}/ {} + fi # Try to delete empty directory, if failed, that means we have some messages to # scan. rmdir ${SPOOL_LEARN_SPAM_DIR} &>/dev/null if [[ X"$?" != X'0' ]]; then output="$(${SA_LEARN} --spam ${SPOOL_LEARN_SPAM_DIR})" rm -rf ${SPOOL_LEARN_SPAM_DIR} &>/dev/null ${LOG} '[SPAM]' ${output} fi rmdir ${SPOOL_LEARN_HAM_DIR} &>/dev/null if [[ X"$?" != X'0' ]]; then output="$(${SA_LEARN} --ham ${SPOOL_LEARN_HAM_DIR})" rm -rf ${SPOOL_LEARN_HAM_DIR} &>/dev/null ${LOG} '[CLEAN]' ${output} fi rm -f ${LOCK_FILE} &>/dev/null ``` Run command `crontab -e -u root` to setup cron job for root user, scan emails every 10 minutes: ``` # iRedMail: Scan reported mails. */10 * * * * /bin/bash /etc/dovecot/sieve/scan_reported_mails.sh ``` ## Tests ### Report spam: Move email from Inbox to Junk - Login to webmail or any IMAP client like Outlook/Thunderbird. - Move an email from `Inbox` folder to `Junk` folder. In Dovecot log file `/var/log/dovecot/imap.log` (or `dovecot.log` if you didn't configure syslog daemon to separate log content), you should see log lines like below: ``` Jan 31 21:10:42 c7 dovecot: imap(): sieve: pipe action: piped message to program `imapsieve_copy' Jan 31 21:10:42 c7 dovecot: imap(): sieve: left message in mailbox 'Junk' Jan 31 21:10:42 c7 dovecot: imap(): expunge: box=INBOX, uid=7, msgid=, size=7805, from=, subject= ``` In the meantime, you should see an email in `/var/vmail/imapsieve_copy/spam/`, file name in `--.eml` format. ### Report ham: Move email from Junk to any other folder (except `Trash`) If you found a clean email in `Junk` folder, just move it from `Junk` to any other folder except `Trash`. In Dovecot log file `/var/log/dovecot/imap.log` (or `dovecot.log`), you should see log lines like below: ``` Jan 31 21:15:51 c7 dovecot: imap(): sieve: pipe action: piped message to program `imapsieve_copy' Jan 31 21:15:51 c7 dovecot: imap(): sieve: left message in mailbox 'INBOX' Jan 31 21:15:51 c7 dovecot: imap(): expunge: box=Junk, uid=7, msgid=, size=7805, from=, subject= ``` In the meantime, you should see an email in `/var/vmail/imapsieve_copy/ham/`, file name in `--.eml` format. ### Scan reported mails It's ok to run the script manually to scan reported mails: ``` bash /etc/dovecot/sieve/scan_reported_mails.sh ``` If it scanned messages, it will log a message in `/var/log/syslog` or `/var/log/messages` like this: ``` Jan 31 04:51:34 mail scan_reported_mails: [CLEAN] Learned tokens from 1 message(s) (1 message(s) examined) Jan 31 05:03:16 mail scan_reported_mails: [SPAM] Learned tokens from 1 message(s) (1 message(s) examined) ``` ### Check detailed bayes learning log on command line You can either [turn on debug mode in Amavisd and SpamAssassin](./debug.amavisd.html) to check how bayes learning works in SpamAssassin, or run `sa-learn` manually to check it with a sample email. To check on command line, please upload/save a sample email to `/opt/sample.eml`, then run `sa-learn` as root user: ``` # su -s /bin/bash amavis -c "spamassassin -D bayes < /opt/sample.eml" May 21 05:27:08.244 [32241] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x2fe8cb8), bayes_store_module=Mail::SpamAssassin::BayesStore::MySQL May 21 05:27:08.264 [32241] dbg: bayes: using username: amavis May 21 05:27:08.264 [32241] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x387a1c8) M ... ``` ## Check bayes data Run `sa-learn` with `--dump` argument will show the bayes data like below: ``` # sa-learn --dump magic 0.000 0 3 0 non-token data: bayes db version 0.000 0 3778575 0 non-token data: nspam 0.000 0 6326326 0 non-token data: nham 0.000 0 539978 0 non-token data: ntokens 0.000 0 1558372204 0 non-token data: oldest atime 0.000 0 1558415857 0 non-token data: newest atime 0.000 0 0 0 non-token data: last journal sync atime 0.000 0 1558415403 0 non-token data: last expiry atime 0.000 0 43200 0 non-token data: last expire atime delta 0.000 0 59325 0 non-token data: last expire reduction count ``` * `nspam` means number of learnt spams. * `nham` means number of learnt ham/clean emails. ## References * [Dovecot wiki: Antispam with Sieve](https://wiki.dovecot.org/HowTo/AntispamWithSieve) You may notice a difference between current tutorial and Dovecot wiki tutorial: our setup saves reported mails and scan it later with `sa-learn` by cron job, but setup in Dovecot wiki calls `sa-learn` directly. We make this change due to performance issue: when user moves a message to `Junk` folder, webmail will wait for `sa-learn` to finish the scan then return responsive, but if user moves a log messages at the same time, webmail will hang and user have to wait there. This is not good user experience.