<!DOCTYPE html>
< html >
< head >
< meta http-equiv = "Content-Type" content = "text/html; charset=utf-8" / >
< title > Auto learn spam/ham with Dovecot imap_sieve plugin< / title >
< link rel = "stylesheet" type = "text/css" href = "./css/markdown.css" / >
< / head >
< body >
< h1 id = "auto-learn-spamham-with-dovecot-imap_sieve-plugin" > Auto learn spam/ham with Dovecot < code > imap_sieve< / code > plugin< / h1 >
< div class = "toc" >
< ul >
< li > < a href = "#auto-learn-spamham-with-dovecot-imap_sieve-plugin" > Auto learn spam/ham with Dovecot imap_sieve plugin< / a > < ul >
< li > < a href = "#summary" > Summary< / a > < / li >
< li > < a href = "#requirements" > Requirements< / a > < / li >
< li > < a href = "#enable-imap_sieve-plugin" > Enable imap_sieve plugin< / a > < / li >
< li > < a href = "#create-required-directories-and-files" > Create required directories and files< / a > < / li >
< li > < a href = "#setup-cron-job-to-scan-and-learn-spamham-messages" > Setup cron job to scan and learn spam/ham messages< / a > < / li >
< li > < a href = "#tests" > Tests< / a > < ul >
< li > < a href = "#report-spam-move-email-from-inbox-to-junk" > Report spam: Move email from Inbox to Junk< / a > < / li >
< li > < a href = "#report-ham-move-email-from-junk-to-any-other-folder-except-trash" > Report ham: Move email from Junk to any other folder (except Trash)< / a > < / li >
< li > < a href = "#scan-reported-mails" > Scan reported mails< / a > < / li >
< li > < a href = "#check-detailed-bayes-learning-log-on-command-line" > Check detailed bayes learning log on command line< / a > < / li >
< / ul >
< / li >
< li > < a href = "#check-bayes-data" > Check bayes data< / a > < / li >
< li > < a href = "#references" > References< / a > < / li >
< / ul >
< / li >
< / ul >
< / div >
< h2 id = "summary" > Summary< / h2 >
< p > Dovecot's < code > imap_sieve< / code > plugin runs a sieve script whenever a message is copied or
appended to a mailbox over IMAP. This lets end users report spam/ham from
webmail or any MUA simply by moving messages between folders, while the server
calls SpamAssassin to learn from the reported messages. The more spam/ham end
users report, the more precisely SpamAssassin can catch spam.< / p >
< p > This tutorial shows you how to enable the Dovecot < code > imap_sieve< / code > plugin and create
the shell/sieve scripts required to learn spam and ham automatically.< / p >
< p > After setup, you can encourage end users to report spam by moving/dragging
it to the < code > Junk< / code > folder, and to report ham by moving it out of < code > Junk< / code >.< / p >
< h2 id = "requirements" > Requirements< / h2 >
< ul >
< li > A working iRedMail server.< / li >
< li > Dovecot version 2.2.24 or later; earlier releases do not provide the
< code > imap_sieve< / code > plugin.< ul >
< li > CentOS 7 ships Dovecot-2.2.36< / li >
< li > Debian 9 ships Dovecot-2.2.27< / li >
< li > Ubuntu 16.04 ships Dovecot-2.2.22 (< strong > WARNING: Not qualified< / strong > )< / li >
< li > Ubuntu 18.04 ships Dovecot-2.2.33< / li >
< li > OpenBSD 6.4 ships Dovecot-2.2.36< / li >
< li > FreeBSD ships Dovecot-2.3.x in ports tree< / li >
< / ul >
< / li >
< / ul >
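< p > The version requirement above can be checked from a shell. The sketch below is
only an illustration: the helper name < code > version_ge< / code > is hypothetical, and it assumes
that < code > dovecot --version< / code > prints a dotted numeric version as its first word.< / p >

```shell
#!/usr/bin/env bash
# Sketch: compare dotted version strings, e.g. to check the 2.2.24 minimum.
# Assumption: every compared component is a plain decimal number.

# version_ge A B: succeeds when version A is greater than or equal to B.
version_ge() {
    local IFS=.
    local -a a=($1) b=($2)
    local i x y
    for i in 0 1 2 3; do
        x=${a[i]:-0}
        y=${b[i]:-0}
        if (( 10#$x > 10#$y )); then return 0; fi
        if (( 10#$x < 10#$y )); then return 1; fi
    done
    return 0
}

# Example usage (only the first word of the version output is compared):
#   installed="$(dovecot --version)"
#   version_ge "${installed%% *}" "2.2.24" && echo "imap_sieve is available"
```

< p > Comparing the components numerically avoids the pitfall of a plain string
comparison, which would rank 2.2.9 above 2.2.24.< / p >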
< h2 id = "enable-imap_sieve-plugin" > Enable < code > imap_sieve< / code > plugin< / h2 >
< p > Please update Dovecot config file < code > /etc/dovecot/dovecot.conf< / code > to:< / p >
< ul >
< li > Enable new parameter < code > mail_attribute_dict< / code > globally.< / li >
< li > Enable new plugin < code > imap_sieve< / code > in < code > protocol imap {}< / code > section.< / li >
< li > Add required settings for < code > imap_sieve< / code > in < code > plugin {}< / code > section.< / li >
< / ul >
< pre > < code > # Store METADATA information within user's HOME directory
mail_attribute_dict = file:%h/dovecot-attributes

protocol imap {
    ...
    mail_plugins = ... imap_sieve
}

plugin {
    sieve_plugins = sieve_imapsieve sieve_extprograms
    imapsieve_url = sieve://127.0.0.1:4190

    # From elsewhere to Junk folder
    imapsieve_mailbox1_name = Junk
    imapsieve_mailbox1_causes = COPY APPEND
    imapsieve_mailbox1_before = file:/var/vmail/sieve/report_spam.sieve

    # From Junk folder to elsewhere
    imapsieve_mailbox2_name = *
    imapsieve_mailbox2_from = Junk
    imapsieve_mailbox2_causes = COPY
    imapsieve_mailbox2_before = file:/var/vmail/sieve/report_ham.sieve

    sieve_pipe_bin_dir = /etc/dovecot/sieve/pipe
    sieve_global_extensions = +vnd.dovecot.pipe +vnd.dovecot.environment
}
< / code > < / pre >
< h2 id = "create-required-directories-and-files" > Create required directories and files< / h2 >
< p > We will create a few directories and files used by the < code > imap_sieve< / code > plugin:< / p >
< ul >
< li > Directories:< ul >
< li > < code > /etc/dovecot/sieve/pipe< / code > : used to store script called by < code > imap_sieve< / code > plugin.< / li >
< li > < code > /var/vmail/imapsieve_copy< / code > : used to store reported spam/ham emails.< / li >
< / ul >
< / li >
< li > Files:< ul >
< li > < code > /var/vmail/sieve/report_spam.sieve< / code > : used to save a copy of reported spam.< / li >
< li > < code > /var/vmail/sieve/report_ham.sieve< / code > : used to save a copy of reported ham.< / li >
< / ul >
< / li >
< li > Shell script:< ul >
< li > < code > /etc/dovecot/sieve/pipe/imapsieve_copy< / code > < / li >
< / ul >
< / li >
< / ul >
< p > Create directories:< / p >
< pre > < code > mkdir -p /etc/dovecot/sieve/pipe
mkdir -p /var/vmail/imapsieve_copy
chown vmail:vmail /var/vmail/imapsieve_copy
chmod 0700 /var/vmail/imapsieve_copy
< / code > < / pre >
< p > Create file < code > /var/vmail/sieve/report_spam.sieve< / code > with content below:< / p >
< pre > < code > require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"];

if environment :matches "imap.user" "*" {
    set "username" "${1}";
}

pipe :copy "imapsieve_copy" [ "${username}", "spam" ];
< / code > < / pre >
< p > Create file < code > /var/vmail/sieve/report_ham.sieve< / code > with content below:< / p >
< pre > < code > require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"];

if environment :matches "imap.mailbox" "*" {
    set "mailbox" "${1}";
}

if string "${mailbox}" "Trash" {
    stop;
}

if environment :matches "imap.user" "*" {
    set "username" "${1}";
}

pipe :copy "imapsieve_copy" [ "${username}", "ham" ];
< / code > < / pre >
< p > Create file < code > /etc/dovecot/sieve/pipe/imapsieve_copy< / code > with content below:< / p >
< pre > < code > #!/usr/bin/env bash
# Author: Zhang Huangbin <zhb@iredmail.org>
# Purpose: Read full email message from stdin, and save to a local file.
# Usage: bash imapsieve_copy <email> <spam|ham>

export USER="$1"
export MSG_TYPE="$2"

export OUTPUT_BASE_DIR="/var/vmail/imapsieve_copy"
export OUTPUT_DIR="${OUTPUT_BASE_DIR}/${MSG_TYPE}"
export FILE="${OUTPUT_DIR}/${USER}-$(date +%Y%m%d%H%M%S)-${RANDOM}${RANDOM}.eml"

export OWNER="vmail"
export GROUP="vmail"

for dir in "${OUTPUT_BASE_DIR}" "${OUTPUT_DIR}"; do
    if [[ ! -d "${dir}" ]]; then
        mkdir -p "${dir}"
        chown ${OWNER}:${GROUP} "${dir}"
        chmod 0700 "${dir}"
    fi
done

# Save the message piped in on stdin.
cat > "${FILE}" < /dev/stdin

# Logging
#export LOG='logger -p local5.info -t imapsieve_copy'
#[[ $? == 0 ]] && ${LOG} "Copied one ${MSG_TYPE} email reported by ${USER}: ${FILE}"
< / code > < / pre >
< p > Set correct file owner and permissions:< / p >
< pre > < code > chown vmail:vmail /var/vmail/sieve/report_spam.sieve \
/var/vmail/sieve/report_ham.sieve \
/etc/dovecot/sieve/pipe/imapsieve_copy
chmod 0700 /var/vmail/sieve/report_spam.sieve \
/var/vmail/sieve/report_ham.sieve \
/etc/dovecot/sieve/pipe/imapsieve_copy
< / code > < / pre >
< p > Restart Dovecot service to enable this plugin.< / p >
< pre > < code > service dovecot restart
< / code > < / pre >
< h2 id = "setup-cron-job-to-scan-and-learn-spamham-messages" > Setup cron job to scan and learn spam/ham messages< / h2 >
< p > Dovecot can now save a copy of reported spam/ham automatically, but we still
need a shell script that periodically calls SpamAssassin to actually learn from it.< / p >
< p > Create script < code > /etc/dovecot/sieve/scan_reported_mails.sh< / code > with the content below.
It calls the < code > sa-learn< / code > command to learn reported spam/ham emails:< / p >
< div class = "admonition attention" >
< p class = "admonition-title" > Attention< / p >
< p > If you're running FreeBSD or OpenBSD, please change the Amavisd daemon
user name in variable < code > AMAVISD_USER< / code > below.< / p >
< / div >
< pre > < code > #!/usr/bin/env bash
# Author: Zhang Huangbin <zhb@iredmail.org>
# Purpose: Copy spam/ham to another directory and call sa-learn to learn.

# Paths to find program.
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH"

export OWNER="vmail"
export GROUP="vmail"

# The Amavisd daemon user.
# Note: on OpenBSD, it's "_vscan". On FreeBSD, it's "vscan".
export AMAVISD_USER='amavis'

# Kernel name, in upper cases.
export KERNEL_NAME="$(uname -s | tr '[a-z]' '[A-Z]')"

# A temporary lock file, removed after the messages have been examined.
export LOCK_FILE='/tmp/scan_reported_mails.lock'

# Logging to syslog with 'logger' command.
export LOG='logger -p local5.info -t scan_reported_mails'

# `sa-learn` command, with optional arguments.
export SA_LEARN="sa-learn -u ${AMAVISD_USER}"

# Spool directory.
# Must be owned by vmail:vmail.
export SPOOL_DIR='/var/vmail/imapsieve_copy'

# Directories which store spam and ham emails.
# These 2 are created by the `imapsieve_copy` script on first report.
export SPOOL_SPAM_DIR="${SPOOL_DIR}/spam"
export SPOOL_HAM_DIR="${SPOOL_DIR}/ham"

# Directory used to store emails we're going to process.
# We will move new spam/ham messages to these directories, scan them, then
# remove them.
export SPOOL_LEARN_SPAM_DIR="${SPOOL_DIR}/processing/spam"
export SPOOL_LEARN_HAM_DIR="${SPOOL_DIR}/processing/ham"

if [ -e "${LOCK_FILE}" ]; then
    # Remove a stale lock (older than one day), otherwise abort.
    if [ -n "$(find "${LOCK_FILE}" -mtime +0 2> /dev/null)" ]; then
        rm -f "${LOCK_FILE}" > /dev/null 2>&1
    else
        ${LOG} "Lock file exists (${LOCK_FILE}), abort."
        exit
    fi
fi

# Take the lock so that overlapping cron runs abort early.
touch "${LOCK_FILE}"

for dir in "${SPOOL_DIR}" "${SPOOL_LEARN_SPAM_DIR}" "${SPOOL_LEARN_HAM_DIR}"; do
    if [[ ! -d ${dir} ]]; then
        mkdir -p ${dir}
    fi

    chown ${OWNER}:${GROUP} ${dir}
    chmod 0700 ${dir}
done

# If there are a lot of files, a direct `mv` command may fail with an error like
# `argument list too long`, so we need `find` in this case.
# `mv -t` is a GNU extension, so it is used on Linux only.
if [[ X"${KERNEL_NAME}" == X'LINUX' ]]; then
    [[ -d ${SPOOL_SPAM_DIR} ]] && find ${SPOOL_SPAM_DIR} -name '*.eml' -exec mv -t ${SPOOL_LEARN_SPAM_DIR}/ {} +
    [[ -d ${SPOOL_HAM_DIR} ]] && find ${SPOOL_HAM_DIR} -name '*.eml' -exec mv -t ${SPOOL_LEARN_HAM_DIR}/ {} +
else
    [[ -d ${SPOOL_SPAM_DIR} ]] && find ${SPOOL_SPAM_DIR} -name '*.eml' -exec mv {} ${SPOOL_LEARN_SPAM_DIR}/ \;
    [[ -d ${SPOOL_HAM_DIR} ]] && find ${SPOOL_HAM_DIR} -name '*.eml' -exec mv {} ${SPOOL_LEARN_HAM_DIR}/ \;
fi

# Try to delete the empty directory. If that fails, we have some messages to
# scan.
rmdir ${SPOOL_LEARN_SPAM_DIR} &> /dev/null
if [[ X"$?" != X'0' ]]; then
    output="$(${SA_LEARN} --spam ${SPOOL_LEARN_SPAM_DIR})"
    rm -rf ${SPOOL_LEARN_SPAM_DIR} &> /dev/null
    ${LOG} '[SPAM]' ${output}
fi

rmdir ${SPOOL_LEARN_HAM_DIR} &> /dev/null
if [[ X"$?" != X'0' ]]; then
    output="$(${SA_LEARN} --ham ${SPOOL_LEARN_HAM_DIR})"
    rm -rf ${SPOOL_LEARN_HAM_DIR} &> /dev/null
    ${LOG} '[CLEAN]' ${output}
fi

rm -f ${LOCK_FILE} &> /dev/null
< / code > < / pre >
< p > Run command < code > crontab -e -u root< / code > to set up a cron job for the root user that
scans emails every 10 minutes:< / p >
< pre > < code > # iRedMail: Scan reported mails.
*/10 * * * * /bin/bash /etc/dovecot/sieve/scan_reported_mails.sh
< / code > < / pre >
< h2 id = "tests" > Tests< / h2 >
< h3 id = "report-spam-move-email-from-inbox-to-junk" > Report spam: Move email from Inbox to Junk< / h3 >
< ul >
< li > Log in to webmail or any IMAP client like Outlook/Thunderbird.< / li >
< li > Move an email from < code > Inbox< / code > folder to < code > Junk< / code > folder.< / li >
< / ul >
< p > In Dovecot log file < code > /var/log/dovecot/imap.log< / code > (or < code > dovecot.log< / code > if you didn't
configure syslog daemon to separate log content), you should see log lines like
below:< / p >
< pre > < code > Jan 31 21:10:42 c7 dovecot: imap(<email>): sieve: pipe action: piped message to program `imapsieve_copy'
Jan 31 21:10:42 c7 dovecot: imap(<email>): sieve: left message in mailbox 'Junk'
Jan 31 21:10:42 c7 dovecot: imap(<email>): expunge: box=INBOX, uid=7, msgid=, size=7805, from=<email>, subject=<subject>
< / code > < / pre >
< p > In the meantime, you should see an email in < code > /var/vmail/imapsieve_copy/spam/< / code >,
with a file name in < code > <email>-<timestamp>-<random_number>.eml< / code > format.< / p >
< h3 id = "report-ham-move-email-from-junk-to-any-other-folder-except-trash" > Report ham: Move email from Junk to any other folder (except < code > Trash< / code > )< / h3 >
< p > If you find a clean email in the < code > Junk< / code > folder, just move it from < code > Junk< / code > to any
other folder except < code > Trash< / code >.< / p >
< p > In Dovecot log file < code > /var/log/dovecot/imap.log< / code > (or < code > dovecot.log< / code > ), you should
see log lines like below:< / p >
< pre > < code > Jan 31 21:15:51 c7 dovecot: imap(<email>): sieve: pipe action: piped message to program `imapsieve_copy'
Jan 31 21:15:51 c7 dovecot: imap(<email>): sieve: left message in mailbox 'INBOX'
Jan 31 21:15:51 c7 dovecot: imap(<email>): expunge: box=Junk, uid=7, msgid=, size=7805, from=<email>, subject=<subject>
< / code > < / pre >
< p > In the meantime, you should see an email in < code > /var/vmail/imapsieve_copy/ham/< / code >,
with a file name in < code > <email>-<timestamp>-<random_number>.eml< / code > format.< / p >
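< p > To confirm that both tests queued a message, the saved copies can be counted.
This is a minimal sketch: the function name < code > count_reported< / code > is hypothetical, and
it assumes the spool layout used in this tutorial. The spool directory has mode
0700, so run it as root or as the < code > vmail< / code > user.< / p >

```shell
#!/usr/bin/env bash
# Sketch: count reported messages waiting in the spool directory.
# Assumption: messages are stored as <base>/spam/*.eml and <base>/ham/*.eml.

# count_reported [BASE_DIR]: print the number of queued .eml files per type.
count_reported() {
    local base="${1:-/var/vmail/imapsieve_copy}"
    local kind n
    for kind in spam ham; do
        n=0
        if [ -d "${base}/${kind}" ]; then
            # `tr` strips the padding some `wc` implementations add.
            n="$(find "${base}/${kind}" -type f -name '*.eml' | wc -l | tr -d ' ')"
        fi
        echo "${kind}: ${n}"
    done
}

# Example usage:
#   count_reported
#   count_reported /var/vmail/imapsieve_copy
```

< p > A missing < code > spam< / code > or < code > ham< / code > directory is reported as zero, because the directories
only appear after the first message of that type has been reported.< / p >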
< h3 id = "scan-reported-mails" > Scan reported mails< / h3 >
< p > It's ok to run the script manually to scan reported mails:< / p >
< pre > < code > bash /etc/dovecot/sieve/scan_reported_mails.sh
< / code > < / pre >
< p > If it scanned messages, it will log a message in < code > /var/log/syslog< / code > or
< code > /var/log/messages< / code > like this:< / p >
< pre > < code > Jan 31 04:51:34 mail scan_reported_mails: [CLEAN] Learned tokens from 1 message(s) (1 message(s) examined)
Jan 31 05:03:16 mail scan_reported_mails: [SPAM] Learned tokens from 1 message(s) (1 message(s) examined)
< / code > < / pre >
< h3 id = "check-detailed-bayes-learning-log-on-command-line" > Check detailed bayes learning log on command line< / h3 >
< p > You can either < a href = "./debug.amavisd.html" > turn on debug mode in Amavisd and SpamAssassin< / a >
to check how bayes learning works in SpamAssassin, or run < code > sa-learn< / code > manually
to check it with a sample email.< / p >
< p > To check on command line, please upload/save a sample email to
< code > /opt/sample.eml< / code >, then, as root, run < code > spamassassin< / code > in bayes debug mode as
the Amavisd user:< / p >
< pre > < code > # su -s /bin/bash amavis -c "spamassassin -D bayes < /opt/sample.eml"
May 21 05:27:08.244 [32241] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x2fe8cb8), bayes_store_module=Mail::SpamAssassin::BayesStore::MySQL
May 21 05:27:08.264 [32241] dbg: bayes: using username: amavis
May 21 05:27:08.264 [32241] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x387a1c8)
M
...
< / code > < / pre >
< h2 id = "check-bayes-data" > Check bayes data< / h2 >
< p > Running < code > sa-learn< / code > with the < code > --dump magic< / code > argument shows the bayes data like below:< / p >
< pre > < code > # sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 3778575 0 non-token data: nspam
0.000 0 6326326 0 non-token data: nham
0.000 0 539978 0 non-token data: ntokens
0.000 0 1558372204 0 non-token data: oldest atime
0.000 0 1558415857 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 1558415403 0 non-token data: last expiry atime
0.000 0 43200 0 non-token data: last expire atime delta
0.000 0 59325 0 non-token data: last expire reduction count
< / code > < / pre >
< ul >
< li > < code > nspam< / code > means number of learnt spams.< / li >
< li > < code > nham< / code > means number of learnt ham/clean emails.< / li >
< / ul >
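< p > The two counters can also be extracted with a small filter. The sketch below is
illustrative only: the function name < code > bayes_summary< / code > is hypothetical, and it assumes
the column layout shown above. The default threshold of 200 reflects SpamAssassin's
default < code > bayes_min_spam_num< / code > and < code > bayes_min_ham_num< / code > settings, below which
bayes scoring is not applied.< / p >

```shell
#!/usr/bin/env bash
# Sketch: summarise `sa-learn --dump magic` output read from stdin.
# Assumption: the counter is the 3rd column and its name is the last column.

# bayes_summary [MIN]: print nspam/nham and warn when either is below MIN.
bayes_summary() {
    awk -v min="${1:-200}" '
        $NF == "nspam" { spam = $3 }
        $NF == "nham"  { ham = $3 }
        END {
            printf "nspam=%d nham=%d\n", spam, ham
            if (spam < min || ham < min) {
                print "bayes: fewer than " min " messages learnt for spam or ham"
            }
        }'
}

# Example usage:
#   sa-learn --dump magic | bayes_summary
```

< p > Run it as the same user whose bayes database you want to inspect, since
< code > sa-learn< / code > reads the database of the invoking user.< / p >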
< h2 id = "references" > References< / h2 >
< ul >
< li >
< p > < a href = "https://wiki.dovecot.org/HowTo/AntispamWithSieve" > Dovecot wiki: Antispam with Sieve< / a > < / p >
< p > You may notice a difference between this tutorial and the Dovecot wiki
tutorial: our setup saves reported mails and scans them later with < code > sa-learn< / code >
from a cron job, while the setup in the Dovecot wiki calls < code > sa-learn< / code > directly. We
made this change for performance reasons: when a user moves a message to the
< code > Junk< / code > folder, webmail has to wait for < code > sa-learn< / code > to finish before it
becomes responsive again, so if a user moves a lot of messages at the same time,
webmail hangs and the user has to wait. This is not a good user experience.< / p >
< / li >
< / ul > < div class = "footer" >
< p style = "text-align: center; color: grey;" > All documents are available in < a href = "https://github.com/iredmail/docs/" > GitHub repository< / a > , and published under < a href = "http://creativecommons.org/licenses/by-nd/3.0/us/" target = "_blank" > Creative Commons< / a > license. You can < a href = "https://github.com/iredmail/docs/archive/master.zip" > download the latest version< / a > for offline reading. If you found something wrong, please do < a href = "https://www.iredmail.org/contact.html" > contact us< / a > to fix it.< / p >
< / div >
< / body > < / html >