Review: store.spamassassin.bayes.in.sql.md.

Zhang Huangbin 2014-09-17 17:32:53 +08:00
parent 8f0bcd4799
commit 56d7836357
4 changed files with 285 additions and 35 deletions


@@ -1,10 +1,12 @@
-<http://www.iredmail.org/wiki/index.php?title=IRedMail/FAQ/Store.SpamAssassin.Bayes.In.SQL>
-#How to store spamassassin bayes in SQL
+# How to store spamassassin bayes in SQL
 __THIS ARTICLE IS STILL A DRAFT, DO NOT APPLY IT IN PRODUCTION SERVER.__
-##Summary
+## Summary
-This article is used to configure related components to store SpamAssassin Bayes data in SQL server, and allow webmail users to report spam with one click.
+This article will guide you to configure related components to store
+SpamAssassin Bayes data in SQL server, and allow webmail users to report spam
+with one click.
 Tested with:
@@ -17,25 +19,35 @@ Tested with:
 Notes:
-* This article should work with all iRedMail releases. We take iRedMail-__0.8.0__ for example.
-* This article should work with all backends: OpenLDAP, MySQL, PostgreSQL. We take MySQL backend for example.
-* This article should work with Amavisd-new-2.6.0 and later versions, includes Amavisd-new-2.7.x.
+* This article should work with all iRedMail releases. We take iRedMail-0.8.0 for example.
+* This article should work with all backends: OpenLDAP, MySQL, MariaDB, PostgreSQL. We take MySQL backend for example.
+* This article should work with Amavisd-new-2.6.0 and later versions.
-IMPORTANT NOTE:
-* The bayesian classifier can only score new messages if it already has 200 known spams and 200 known hams.
-* If Spamassassin fails to identify a spam, teach it so it can do better next time. e.g. Mark it as spam in roundcube webmail.
-* Read __References__ section at the end of this article before asking.
+__IMPORTANT NOTE__:
+* The bayesian classifier can only score new messages if it already has 200
+known spams and 200 known hams.
+* If Spamassassin fails to identify a spam, teach it so it can do better next
+time. e.g. Mark it as spam in roundcube webmail.
+* Read `References` section at the end of this article before asking questions.
-##Create required SQL database used to store bayes data
+## Create required SQL database used to store bayes data
-We need to create a SQL database and necessary tables to store SpamAssassin bayes data. The RPM package installed on CentOS 6 doesn't ship SQL template for bayes database, but we can download it from Apache web site. We're running SpamAssassin-3.3.1, so what we need is this SQL template file: <http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_release_3_3_1/sql/bayes_mysql.sql> (if you're running different version, please find the proper SQL file here: <http://svn.apache.org/repos/asf/spamassassin/tags/>)
+We need to create a SQL database and necessary tables to store SpamAssassin
+bayes data. The RPM package installed on CentOS 6 doesn't ship SQL template
+for bayes database, so we have to download it from Apache web site. We're
+running SpamAssassin-3.3.1, so what we need is this SQL template file:
+http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_release_3_3_1/sql/bayes_mysql.sql.
+If you're running different version, please find the proper SQL file here:
+[http://svn.apache.org/repos/asf/spamassassin/tags/](http://svn.apache.org/repos/asf/spamassassin/tags/).
 <pre>
 # cd /root/
 # wget http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_release_3_3_1/sql/bayes_mysql.sql
 </pre>
 Create MySQL database and import SQL template file:
 <pre>
 # mysql -uroot -p
 mysql> CREATE DATABASE sa_bayes;
@@ -43,17 +55,19 @@ mysql> USE sa_bayes;
 mysql> SOURCE /root/bayes_mysql.sql;
 </pre>
-Create a new MySQL user (with password __sa\_user\_password__) and grant permissions:
-* Note: Please replace password __sa\_user\_password__ by your own.
+Create a new MySQL user (with password `sa_user_password`) and grant
+permissions. __IMPORTANT NOTE__: Please replace password `sa_user_password`
+by your own password.
 <pre>
 mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON sa_bayes.* TO sa_user@localhost IDENTIFIED BY 'sa_user_password';
 mysql> FLUSH PRIVILEGES;
 </pre>
-##Enable Bayes modules in SpamAssassin
-Edit /etc/mail/spamassassin/local.cf, add (or modify below settings):
+## Enable Bayes modules in SpamAssassin
+Edit `/etc/mail/spamassassin/local.cf`, add (or modify below settings):
 <pre>
 use_bayes 1
 bayes_auto_learn 1
@@ -78,6 +92,7 @@ bayes_sql_override_username vmail
 </pre>
 Make sure SpamAssassin will load bayes modules:
 <pre>
 # /etc/init.d/amavisd stop
 # amavisd -c /etc/amavisd/amavisd.conf debug 2>&1 | grep -i 'bayes'
@@ -85,14 +100,17 @@ May 16 09:59:33 ... SpamAssassin loaded plugins: ..., Bayes, ...
 May 16 10:27:38 ... extra modules loaded after daemonizing/chrooting:
 Mail/SpamAssassin/BayesStore/MySQL.pm, Mail/SpamAssassin/BayesStore/SQL.pm, ...
 </pre>
-Looks fine. Now press 'Ctrl-C' to terminate above command.
+Looks fine. Now press `Ctrl-C` to terminate above command.
 Start Amavisd service:
 <pre>
 # /etc/init.d/amavisd restart
 </pre>
-It is required we initialize the database by learning a message. We use the sample spam email shipped in the RPM package provided by CentOS 6:
+It is required we initialize the database by learning a message. We use the
+sample spam email shipped in the RPM package provided by CentOS 6:
 <pre>
 # rpm -ql spamassassin | grep 'sample-spam'
 /usr/share/doc/spamassassin-3.3.1/sample-spam.txt
@@ -101,19 +119,26 @@ It is required we initialize the database by learning a message. We use the samp
 Learned tokens from 1 message(s) (1 message(s) examined)
 </pre>
-##Enable Roundcube plugin: markasjunk2
+## Enable Roundcube plugin: markasjunk2
-* We need a third-party Roundcube plugin to allow webmail users to report spam: Mark as Junk 2. You can download it here: <https://github.com/JohnDoh/Roundcube-Plugin-Mark-as-Junk-2/releases>
+* We need a third-party Roundcube plugin to allow webmail users to report spam:
+`Mark as Junk 2`. You can download it here:
+[https://github.com/JohnDoh/Roundcube-Plugin-Mark-as-Junk-2/releases](https://github.com/JohnDoh/Roundcube-Plugin-Mark-as-Junk-2/releases)
-* After download, please uncompress it and copy it to roundcube plugins directory: /var/www/roundcubemail/plugins/. Then we get a new directory: /var/www/roundcubemail/plugins/markasjunk2/
+* After download, please uncompress it and copy it to roundcube plugins
+directory: `/var/www/roundcubemail/plugins/`. Then we get a new directory:
+`/var/www/roundcubemail/plugins/markasjunk2/`.
-* Enter directory /var/www/roundcubemail/plugins/markasjunk2/, generate config file by copying its sample config file:
+* Enter directory `/var/www/roundcubemail/plugins/markasjunk2/`, generate
+config file by copying its sample config file:
 <pre>
 # cd /var/www/roundcubemail/plugins/markasjunk2/
 # cp config.inc.php.dist config.inc.php
 </pre>
-* Edit roundcubemail/plugins/markasjunk2/config.inc.php, update below settings:
+* Edit `roundcubemail/plugins/markasjunk2/config.inc.php`, update below settings:
 <pre>
 $rcmail_config['markasjunk2_learning_driver'] = 'cmd_learn';
 $rcmail_config['markasjunk2_read_spam'] = true;
@@ -122,17 +147,21 @@ $rcmail_config['markasjunk2_move_spam'] = true;
 $rcmail_config['markasjunk2_move_ham'] = true;
 $rcmail_config['markasjunk2_mb_toolbar'] = true;
 $rcmail_config['markasjunk2_spam_cmd'] = 'sa-learn --spam --username=vmail %f';
 $rcmail_config['markasjunk2_ham_cmd'] = 'sa-learn --ham --username=vmail %f';
 </pre>
-* Enable this plugin in Roundcube config file(/var/www/roundcubemail/config/main.inc.php) by appending 'markasjunk2' in plugin list:
+* Enable this plugin in Roundcube config file
+`/var/www/roundcubemail/config/main.inc.php` by appending `markasjunk2`
+in plugin list:
 <pre>
-$rcmail_config['plugins'] = array("password", "managesieve", "markasjunk2");
+$rcmail_config['plugins'] = array(..., "markasjunk2");
 </pre>
-* Since learning driver __cmd\_learn__ requires PHP function __exec__, we have to enable it in /etc/php.ini:
+* Learning driver `cmd_learn` requires PHP function `exec`, so we have to
+remove it from PHP config file `/etc/php.ini`, parameter `disabled_functions`:
 <pre>
 # OLD SETTING
 # disable_functions =show_source,system,shell_exec,passthru,exec,phpinfo,proc_open ;
@@ -145,9 +174,10 @@ disable_functions =show_source,system,shell_exec,passthru,phpinfo,proc_open ;
 You will see a new toolbar button after logging into Roundcube webmail:
-![](images/Markasjunk2_toolbar_button.png "Markasjunk2_toolbar_button.png")
+![](../images/Markasjunk2_toolbar_button.png)
-Check SQL database __sa_bayes__ before we testing this plugin:
+Check SQL database `sa_bayes` before we testing this plugin:
 <pre>
 # mysql -uroot -p
 mysql> USE sa_bayes;
@@ -159,7 +189,10 @@ mysql> SELECT COUNT(*) FROM bayes_token;
 +----------+
 </pre>
-Back to Roundcube webmail, select a spam email (or a testing email), click "Mark as Junk" button, then this email will be scanned by command __sa-learn__. Check database __sa\_bayes__ again to make sure it's working:
+Back to Roundcube webmail, select a spam email (or a testing email), click
+`Mark as Junk` button, then this email will be scanned by command `sa-learn`.
+Check database `sa_bayes` again to make sure it's working:
 <pre>
 # mysql -uroot -p
 mysql> USE sa_bayes;
@@ -170,8 +203,12 @@ mysql> SELECT COUNT(*) FROM bayes_token;
 |      143 |
 +----------+
 </pre>
 Note: You may get different result number as shown above.
-<br/>So far so good. That's all we need to do.
-##References
-* [Bayes Introduction](http://wiki.apache.org/spamassassin/BayesInSpamAssassin). Please do read section __Things to remember__.
+So far so good. That's all we need to do.
+## References
+* [Bayes Introduction](http://wiki.apache.org/spamassassin/BayesInSpamAssassin). Please do read section `Things to remember`.
 * [SpamAssassin Bayes Frequently Asked Questions](http://wiki.apache.org/spamassassin/BayesFaq)


@@ -3,7 +3,7 @@ body{
 font-family: Georgia, Palatino, serif;
 line-height: 1;
 max-width: 70%;
-padding: 30px;
+padding: 30px 0px 100px 0px;
 }
 h1, h2, h3, h4, h5 {
 font-weight: 500;


@@ -0,0 +1,212 @@
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>How to store spamassassin bayes in SQL</title>
<link href="../css/markdown.css" rel="stylesheet">
</head>
<body>
<h1 id="how-to-store-spamassassin-bayes-in-sql">How to store spamassassin bayes in SQL</h1>
<p><strong>THIS ARTICLE IS STILL A DRAFT. DO NOT APPLY IT ON A PRODUCTION SERVER.</strong></p>
<h2 id="summary">Summary</h2>
<p>This article guides you through configuring the related components to store
SpamAssassin Bayes data in an SQL server and to allow webmail users to report
spam with one click.</p>
<p>Tested with:</p>
<ul>
<li>iRedMail-0.8.0, iRedMail-0.8.7</li>
<li>CentOS 6.2 (x86_64)</li>
<li>SpamAssassin-3.3.1</li>
<li>Amavisd-new-2.6.6</li>
<li>MySQL-5.1.61</li>
<li>Roundcubemail-0.7.2</li>
</ul>
<p>Notes:</p>
<ul>
<li>This article should work with all iRedMail releases. We use iRedMail-0.8.0 as the example.</li>
<li>This article should work with all backends: OpenLDAP, MySQL, MariaDB, PostgreSQL. We use the MySQL backend as the example.</li>
<li>This article should work with Amavisd-new-2.6.0 and later versions.</li>
</ul>
<p><strong>IMPORTANT NOTE</strong>:</p>
<ul>
<li>The Bayesian classifier only scores new messages after it has learned at
least 200 spam and 200 ham messages.</li>
<li>If SpamAssassin fails to identify a spam message, teach it so it does better
next time, e.g. mark it as spam in Roundcube webmail.</li>
<li>Please read the <code>References</code> section at the end of this article before asking questions.</li>
</ul>
<h2 id="create-required-sql-database-used-to-store-bayes-data">Create required SQL database used to store bayes data</h2>
<p>We need to create an SQL database and the necessary tables to store
SpamAssassin Bayes data. The SpamAssassin RPM package on CentOS 6 doesn't ship
an SQL template for the Bayes database, so we have to download it from the
Apache web site. We're running SpamAssassin-3.3.1, so we need this SQL template file:
<a href="http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_release_3_3_1/sql/bayes_mysql.sql">http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_release_3_3_1/sql/bayes_mysql.sql</a>.
If you're running a different version, find the matching SQL file here:
<a href="http://svn.apache.org/repos/asf/spamassassin/tags/">http://svn.apache.org/repos/asf/spamassassin/tags/</a>.</p>
<pre>
# cd /root/
# wget http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_release_3_3_1/sql/bayes_mysql.sql
</pre>
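<p>If you are running a different SpamAssassin version, the download URL can be derived from the version number. This is a minimal sketch, assuming every release tag follows the naming used above (<code>3.3.1</code> becomes <code>spamassassin_release_3_3_1</code>); <code>bayes_sql_url</code> is a hypothetical helper, and the <code>spamassassin --version</code> output format in the usage comment is an assumption too. If the download fails, browse the tags directory instead.</p>
<pre>
# Hypothetical helper: print the bayes_mysql.sql URL for a SpamAssassin
# version, ASSUMING tags follow the pattern used in this article
# (3.3.1 -> spamassassin_release_3_3_1).
bayes_sql_url() {
    ver=$(printf '%s' "$1" | tr '.' '_')
    printf 'http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_release_%s/sql/bayes_mysql.sql\n' "$ver"
}

# Usage sketch (assumes "spamassassin --version" prints "SpamAssassin version X.Y.Z"):
#   ver=$(spamassassin --version | awk '/^SpamAssassin version/ {print $3}')
#   cd /root/ && wget "$(bayes_sql_url "$ver")"
</pre>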
<p>Create the MySQL database and import the SQL template file:</p>
<pre>
# mysql -uroot -p
mysql> CREATE DATABASE sa_bayes;
mysql> USE sa_bayes;
mysql> SOURCE /root/bayes_mysql.sql;
</pre>
<p>Create a new MySQL user (with password <code>sa_user_password</code>) and grant
permissions. <strong>IMPORTANT NOTE</strong>: Please replace <code>sa_user_password</code>
with your own password, and use the same password for <code>bayes_sql_password</code> in the next step.</p>
<pre>
mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON sa_bayes.* TO sa_user@localhost IDENTIFIED BY 'sa_user_password';
mysql> FLUSH PRIVILEGES;
</pre>
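<p>If you prefer to script the database and user creation, the statements above can be generated by a small shell function. This is a minimal sketch: <code>bayes_db_sql</code> is a hypothetical helper that only prints SQL, refuses the placeholder password <code>sa_user_password</code>, and keeps the MySQL 5.x <code>GRANT ... IDENTIFIED BY</code> syntax used in this article (MySQL 8.0 and later require a separate <code>CREATE USER</code> statement). Generating the password with <code>openssl rand -hex 16</code> in the usage comment is just one option.</p>
<pre>
# Hypothetical helper: print the SQL used above for a given database, user and
# password. Uses MySQL 5.x "GRANT ... IDENTIFIED BY" syntax, as in this article.
bayes_db_sql() {
    db="$1" user="$2" pass="$3"
    # Refuse an empty password, the placeholder from this article, and
    # characters that would need SQL escaping.
    case "$pass" in
        ''|sa_user_password|*\'*|*\\*)
            echo "bayes_db_sql: choose a real password (no quotes or backslashes)" >&2
            return 1 ;;
    esac
    printf 'CREATE DATABASE %s;\n' "$db"
    printf "GRANT SELECT, INSERT, UPDATE, DELETE ON %s.* TO %s@localhost IDENTIFIED BY '%s';\n" "$db" "$user" "$pass"
    printf 'FLUSH PRIVILEGES;\n'
}

# Usage sketch (run as root; the mysql client prompts for the root password):
#   pass=$(openssl rand -hex 16); echo "$pass"
#   bayes_db_sql sa_bayes sa_user "$pass" | mysql -uroot -p
#   mysql -uroot -p sa_bayes < /root/bayes_mysql.sql
</pre>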
<h2 id="enable-bayes-modules-in-spamassassin">Enable Bayes modules in SpamAssassin</h2>
<p>Edit <code>/etc/mail/spamassassin/local.cf</code> and add (or modify) the settings below:</p>
<pre>
use_bayes 1
bayes_auto_learn 1
bayes_auto_expire 1
# Store bayesian data in MySQL
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:sa_bayes:127.0.0.1:3306
# Store bayesian data in PostgreSQL
#bayes_store_module Mail::SpamAssassin::BayesStore::PgSQL
#bayes_sql_dsn DBI:Pg:dbname=sa_bayes;host=127.0.0.1;port=5432
bayes_sql_username sa_user
bayes_sql_password sa_user_password
# Override the username used for storing
# data in the database. This could be used to group users together to
# share bayesian filter data. You can also use this config option to
# trick sa-learn to learn data as a specific user.
bayes_sql_override_username vmail
</pre>
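<p>SpamAssassin can only use the SQL Bayes store if <code>sa_user</code> can actually log in with these settings, so it is worth testing the credentials by hand. A minimal sketch, assuming the <code>DBI:mysql:DATABASE:HOST:PORT</code> form of <code>bayes_sql_dsn</code> shown above; <code>bayes_dsn_parts</code> is a hypothetical helper that prints the database, host and port so they can be passed to the <code>mysql</code> client.</p>
<pre>
# Hypothetical helper: print "database host port" from the bayes_sql_dsn line
# of local.cf, ASSUMING the DBI:mysql:DATABASE:HOST:PORT form used above.
# Commented lines such as "#bayes_sql_dsn ..." are ignored.
bayes_dsn_parts() {
    cf="${1:-/etc/mail/spamassassin/local.cf}"
    awk '$1 == "bayes_sql_dsn" { n = split($2, p, ":"); if (n >= 5) print p[3], p[4], p[5]; exit }' "$cf"
}

# Usage sketch: log in with the same credentials SpamAssassin will use
# (enter sa_user's password when prompted).
#   set -- $(bayes_dsn_parts)
#   mysql -usa_user -p -h"$2" -P"$3" "$1" -e 'SELECT COUNT(*) FROM bayes_vars;'
</pre>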
<p>Make sure SpamAssassin loads the Bayes modules:</p>
<pre>
# /etc/init.d/amavisd stop
# amavisd -c /etc/amavisd/amavisd.conf debug 2>&1 | grep -i 'bayes'
May 16 09:59:33 ... SpamAssassin loaded plugins: ..., Bayes, ...
May 16 10:27:38 ... extra modules loaded after daemonizing/chrooting:
Mail/SpamAssassin/BayesStore/MySQL.pm, Mail/SpamAssassin/BayesStore/SQL.pm, ...
</pre>
<p>Both the <code>Bayes</code> plugin and <code>BayesStore/MySQL.pm</code> are loaded, so it looks fine. Now press <code>Ctrl-C</code> to terminate the command above.</p>
<p>Start the Amavisd service:</p>
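<p>Instead of reading the debug output by eye, you can save it and check it with a small function. This is a sketch under assumptions: <code>check_bayes_modules</code> is a hypothetical helper, the log file path is arbitrary, and the two patterns it looks for are taken from the sample output above, so the wording may differ in other amavisd-new versions.</p>
<pre>
# Hypothetical helper: read amavisd debug output on stdin and look for the two
# lines shown above (Bayes plugin loaded, MySQL BayesStore module loaded).
check_bayes_modules() {
    log=$(cat)
    if ! printf '%s\n' "$log" | grep -q 'loaded plugins:.*Bayes'; then
        echo "Bayes plugin not loaded" >&2; return 1
    fi
    if ! printf '%s\n' "$log" | grep -q 'BayesStore/MySQL\.pm'; then
        echo "BayesStore/MySQL.pm not loaded" >&2; return 1
    fi
    echo "Bayes modules loaded"
}

# Usage sketch: save the debug output, press Ctrl-C once it settles, then check.
#   amavisd -c /etc/amavisd/amavisd.conf debug 2>&1 | tee /root/amavisd-debug.log
#   check_bayes_modules < /root/amavisd-debug.log
</pre>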
<pre>
# /etc/init.d/amavisd restart
</pre>
<p>We need to initialize the database by learning one message. We use the
sample spam email shipped in the SpamAssassin RPM package provided by CentOS 6:</p>
<pre>
# rpm -ql spamassassin | grep 'sample-spam'
/usr/share/doc/spamassassin-3.3.1/sample-spam.txt
# sa-learn --spam --username=vmail /usr/share/doc/spamassassin-3.3.1/sample-spam.txt
Learned tokens from 1 message(s) (1 message(s) examined)
</pre>
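<p>As noted at the top of this article, the classifier does not score messages until it has learned 200 spam and 200 ham messages (the defaults of SpamAssassin's <code>bayes_min_spam_num</code> and <code>bayes_min_ham_num</code>). <code>sa-learn --dump magic</code> reports the learned counts; the hypothetical helper <code>bayes_ready</code> below extracts them, assuming the usual layout in which the value is the third column of the <code>nspam</code> and <code>nham</code> lines.</p>
<pre>
# Hypothetical helper: read "sa-learn --dump magic" output on stdin, print the
# learned spam/ham counts, and succeed only when both reach 200.
# ASSUMES the value is the 3rd column of the "non-token data: nspam/nham" lines.
bayes_ready() {
    awk '
        / non-token data: nspam$/ { spam = $3 }
        / non-token data: nham$/  { ham = $3 }
        END {
            printf "nspam=%d nham=%d\n", spam, ham
            if (spam >= 200 && ham >= 200) exit 0
            exit 1
        }'
}

# Usage sketch:
#   sa-learn --dump magic --username=vmail | bayes_ready
</pre>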
<h2 id="enable-roundcube-plugin-markasjunk2">Enable Roundcube plugin: markasjunk2</h2>
<ul>
<li>
<p>We need a third-party Roundcube plugin, <code>Mark as Junk 2</code>, to allow
webmail users to report spam. You can download it here:
<a href="https://github.com/JohnDoh/Roundcube-Plugin-Mark-as-Junk-2/releases">https://github.com/JohnDoh/Roundcube-Plugin-Mark-as-Junk-2/releases</a></p>
</li>
<li>
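<p>After downloading, uncompress it and copy it to the Roundcube plugins
directory <code>/var/www/roundcubemail/plugins/</code>. You should then have a new directory
<code>/var/www/roundcubemail/plugins/markasjunk2/</code>. Roundcube expects the directory
name to match the plugin name, so rename the extracted directory if necessary.</p>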
</li>
<li>
<p>Enter the directory <code>/var/www/roundcubemail/plugins/markasjunk2/</code> and create the
config file by copying its sample config file:</p>
</li>
</ul>
<pre>
# cd /var/www/roundcubemail/plugins/markasjunk2/
# cp config.inc.php.dist config.inc.php
</pre>
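<p>The download, copy and config steps above can also be scripted. A minimal sketch, assuming a <code>.tar.gz</code> release archive that contains <code>markasjunk2.php</code> somewhere inside; <code>install_markasjunk2</code> is a hypothetical helper. It installs the plugin directory under the name <code>markasjunk2</code>, because Roundcube loads a plugin from <code>plugins/NAME/NAME.php</code>, and it refuses to overwrite an existing installation.</p>
<pre>
# Hypothetical helper: install markasjunk2 from a downloaded .tar.gz release
# into the Roundcube plugins directory, whatever the archive's top directory
# is called, then create config.inc.php from the sample if it is missing.
install_markasjunk2() {
    tarball="$1" plugins_dir="${2:-/var/www/roundcubemail/plugins}"
    tmp=$(mktemp -d) || return 1
    tar -xzf "$tarball" -C "$tmp" || { rm -rf "$tmp"; return 1; }
    # Locate the directory holding markasjunk2.php.
    src=$(find "$tmp" -name markasjunk2.php -print | head -n 1)
    if [ -z "$src" ]; then
        echo "markasjunk2.php not found in $tarball" >&2
        rm -rf "$tmp"; return 1
    fi
    dest="$plugins_dir/markasjunk2"
    if [ -e "$dest" ]; then
        echo "$dest already exists, not overwriting" >&2
        rm -rf "$tmp"; return 1
    fi
    mv "$(dirname "$src")" "$dest" || { rm -rf "$tmp"; return 1; }
    rm -rf "$tmp"
    # Keep an existing config.inc.php; otherwise start from the sample, as above.
    [ -f "$dest/config.inc.php" ] || cp "$dest/config.inc.php.dist" "$dest/config.inc.php"
}

# Usage sketch (archive name is a placeholder):
#   install_markasjunk2 /root/markasjunk2-release.tar.gz
</pre>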
<ul>
<li>Edit <code>/var/www/roundcubemail/plugins/markasjunk2/config.inc.php</code> and update the settings below:</li>
</ul>
<pre>
$rcmail_config['markasjunk2_learning_driver'] = 'cmd_learn';
$rcmail_config['markasjunk2_read_spam'] = true;
$rcmail_config['markasjunk2_unread_ham'] = false;
$rcmail_config['markasjunk2_move_spam'] = true;
$rcmail_config['markasjunk2_move_ham'] = true;
$rcmail_config['markasjunk2_mb_toolbar'] = true;
$rcmail_config['markasjunk2_spam_cmd'] = 'sa-learn --spam --username=vmail %f';
$rcmail_config['markasjunk2_ham_cmd'] = 'sa-learn --ham --username=vmail %f';
</pre>
<ul>
<li>Enable the plugin in the Roundcube config file
<code>/var/www/roundcubemail/config/main.inc.php</code> by appending <code>markasjunk2</code>
to the plugin list:</li>
</ul>
<pre>
$rcmail_config['plugins'] = array(..., "markasjunk2");
</pre>
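<p>To double-check that the plugin is really listed, a plain text search is usually enough. A minimal sketch: <code>rc_plugin_enabled</code> is a hypothetical helper that assumes the whole <code>plugins</code> array is written on one line, as in the example above. It also accepts <code>$config</code>, the variable name used by later Roundcube releases.</p>
<pre>
# Hypothetical helper: succeed if plugin NAME is listed on the uncommented
# plugins line of a Roundcube config file. Plain text check; ASSUMES the
# array is written on a single line.
rc_plugin_enabled() {
    cfg="${1:-/var/www/roundcubemail/config/main.inc.php}" name="${2:-markasjunk2}"
    grep -E '^[[:space:]]*\$(rcmail_)?config\[.plugins.][[:space:]]*=' "$cfg" |
        grep -q "[\"']$name[\"']"
}

# Usage sketch:
#   rc_plugin_enabled && echo "markasjunk2 is enabled"
</pre>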
<ul>
<li>The learning driver <code>cmd_learn</code> requires the PHP function <code>exec</code>, so we have to
remove <code>exec</code> from the <code>disable_functions</code> parameter in the PHP config file <code>/etc/php.ini</code>:</li>
</ul>
<pre>
; OLD SETTING
; disable_functions =show_source,system,shell_exec,passthru,exec,phpinfo,proc_open ;
; NEW SETTING. exec is removed.
disable_functions =show_source,system,shell_exec,passthru,phpinfo,proc_open ;
</pre>
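<p>The same change can be made with <code>sed</code>. This is a minimal sketch, assuming GNU sed (for <code>-i</code> and <code>-E</code>); <code>php_enable_exec</code> is a hypothetical helper that removes only the exact entry <code>exec</code> from uncommented <code>disable_functions</code> lines, leaves <code>shell_exec</code> alone, and keeps a backup as <code>php.ini.bak</code>.</p>
<pre>
# Hypothetical helper: drop the "exec" entry from disable_functions in a
# php.ini (GNU sed assumed). Handles exec at the start, middle or end of the
# list; commented lines and shell_exec are left untouched.
php_enable_exec() {
    ini="${1:-/etc/php.ini}"
    sed -i.bak -E '/^[[:space:]]*disable_functions[[:space:]]*=/{
        s/,exec,/,/
        s/=([[:space:]]*)exec,/=\1/
        s/,exec([[:space:]]*(;.*)?)$/\1/
        s/=([[:space:]]*)exec([[:space:]]*(;.*)?)$/=\1\2/
    }' "$ini"
}

# Usage sketch:
#   php_enable_exec /etc/php.ini
</pre>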
<ul>
<li>Restart the Apache web server, e.g. <code>/etc/init.d/httpd restart</code> on CentOS 6.</li>
</ul>
<p>You will see a new toolbar button after logging into Roundcube webmail:</p>
<p><img alt="Markasjunk2 toolbar button" src="../images/Markasjunk2_toolbar_button.png" /></p>
<p>Check the SQL database <code>sa_bayes</code> before testing this plugin:</p>
<pre>
# mysql -uroot -p
mysql> USE sa_bayes;
mysql> SELECT COUNT(*) FROM bayes_token;
+----------+
| count(*) |
+----------+
| 65 |
+----------+
</pre>
<p>Go back to Roundcube webmail, select a spam email (or a test email), and click the
<code>Mark as Junk</code> button. The plugin then passes this email to <code>sa-learn</code>.
Check the database <code>sa_bayes</code> again to make sure it's working:</p>
<pre>
# mysql -uroot -p
mysql> USE sa_bayes;
mysql> SELECT COUNT(*) FROM bayes_token;
+----------+
| count(*) |
+----------+
| 143 |
+----------+
</pre>
<p>Note: The numbers you get may differ from those shown above.</p>
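<p>The before/after checks can be scripted as well. A minimal sketch, assuming the MySQL 5.x client and a password supplied through the <code>MYSQL_PWD</code> environment variable (convenient, but it may be visible to other local users, so avoid it on shared hosts); <code>bayes_token_count</code> is a hypothetical helper. Note that the token count only grows when a message contains tokens the database has not seen yet, while the <code>nspam</code> counter reported by <code>sa-learn --dump magic</code> increases with every newly learned spam, so it is a more direct check.</p>
<pre>
# Hypothetical helper: print the number of rows in bayes_token using the
# sa_user account. ASSUMES MYSQL_PWD holds sa_user's password so the mysql
# client does not prompt.
bayes_token_count() {
    mysql -N -B -usa_user sa_bayes -e 'SELECT COUNT(*) FROM bayes_token;'
}

# Usage sketch:
#   export MYSQL_PWD='your sa_user password'
#   before=$(bayes_token_count)
#   # ... click "Mark as Junk" in Roundcube ...
#   after=$(bayes_token_count)
#   echo "bayes_token rows: $before -> $after"
</pre>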
<p>So far so good. That's all we need to do.</p>
<h2 id="references">References</h2>
<ul>
<li><a href="http://wiki.apache.org/spamassassin/BayesInSpamAssassin">Bayes Introduction</a>. Please do read section <code>Things to remember</code>.</li>
<li><a href="http://wiki.apache.org/spamassassin/BayesFaq">SpamAssassin Bayes Frequently Asked Questions</a></li>
</ul></body></html>


@@ -14,6 +14,7 @@
 <li><a href="faq-howto/howto.configure.thunderbird.for.iredmail.html"> Configure Thunderbird as mail client (IMAP, SMTP and global ldap address book)</a></li>
 <li><a href="faq-howto/howto.enable.smtps.service.html"> How to enable SMTPS service (SMTP over SSL, port 465)</a></li>
 <li><a href="faq-howto/pipe.incoming.email.for.certain.user.to.external.script.html"> How to pipe incoming email for certain user to external script </a></li>
+<li><a href="faq-howto/store.spamassassin.bayes.in.sql.html"> How to store spamassassin bayes in SQL</a></li>
 <li><a href="faq-howto/unattended.iredmail.installation.html"> How to perform silent/unattended iRedMail installation</a></li>
 <li><a href="faq-howto/use.or.migrate.password.hashes.html"> How to use or migrate password hashes</a></li>
 </ul>