Mail::SpamAssassin is a module to identify spam using several methods
including text analysis, internet-based realtime blacklists, statistical
analysis, and internet-based hashing algorithms.
Using its rule base, it applies a wide range of heuristic tests to mail headers
and body text to identify ``spam'', also known as unsolicited bulk email.
Once identified, the mail can then be tagged as spam for later filtering
using the user's own mail user-agent application or at the mail transfer
agent.
SpamAssassin also includes support for reporting spam messages to collaborative
filtering databases, such as Vipul's Razor ( http://razor.sourceforge.net/ ).
- $f = new Mail::SpamAssassin( [ { opt => val, ... } ] )
-
Constructs a new "Mail::SpamAssassin" object. You may pass the
following attribute-value pairs to the constructor.
-
- rules_filename
-
The filename to load spam-identifying rules from. (optional)
- site_rules_filename
-
The directory to load site-specific spam-identifying rules from. (optional)
- userprefs_filename
-
The filename to load preferences from. (optional)
- userstate_dir
-
The directory user state is stored in. (optional)
- config_text
-
The text of all rules and preferences. If you prefer not to load the rules
from files, read them in yourself and set this instead. If set, this overrides
the "rules_filename", "site_rules_filename", and "userprefs_filename"
settings.
- languages_filename
-
If you want to be able to use the language-guessing rule
"UNWANTED_LANGUAGE_BODY", and are using "config_text" instead of
"rules_filename", "site_rules_filename", and "userprefs_filename", you will
need to set this. It should be the path to the languages file normally
found in the SpamAssassin rules directory.
- local_tests_only
-
If set to 1, no tests that require internet access will be performed. (default:
0)
- dont_copy_prefs
-
If set to 1, the user preferences file will not be created if it doesn't
already exist. (default: 0)
- save_pattern_hits
-
If set to 1, the patterns hit can be retrieved from the
"Mail::SpamAssassin::PerMsgStatus" object. Used for debugging.
- home_dir_for_helpers
-
If set, the HOME environment variable will be set to this value
when running helper applications that read their configuration data
from the home directory, such as Razor, Pyzor and DCC.
- username
-
If set, the "username" attribute will use this as the current user's name.
Otherwise, the default is taken from the runtime environment (i.e. the user
corresponding to this process's effective UID under UNIX).
-
If none of "rules_filename", "site_rules_filename", "userprefs_filename", or
"config_text" is set, the "Mail::SpamAssassin" module will search for the
configuration files in the usual installed locations.
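A minimal construction sketch follows. It assumes SpamAssassin is installed and its rules have been fetched (for example with sa-update); the temporary state directory is an illustrative choice, not a requirement. The arrow form "Mail::SpamAssassin->new" is equivalent to the indirect-object form shown above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Mail::SpamAssassin;

# Throwaway per-user state directory (illustrative; the default is
# normally ~/.spamassassin).
my $state_dir = tempdir(CLEANUP => 1);

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,            # no DNSBL, Razor, Pyzor or DCC lookups
    dont_copy_prefs  => 1,            # do not create a user_prefs file
    userstate_dir    => $state_dir,   # keep Bayes/AWL data out of $HOME
});
```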
- parse($message, $parse_now)
-
Parse will return a Mail::SpamAssassin::Message object with just the
headers parsed. Both parameters are optional. $message is either undef
(which will use STDIN), a scalar containing the entire message, an array
reference holding the message with one line per array element, or a file
glob which holds the entire contents of the message. $parse_now specifies
whether to create the MIME tree at parse time or later, as necessary.
The $parse_now option, by default, is set to false (0).
This lets SpamAssassin avoid generating the tree of
Mail::SpamAssassin::Message::Node objects and their related data if the
tree is not going to be used. This is handy, for instance, when running
"spamassassin -d", which only needs the pristine header and body, which
are always parsed and stored by this function.
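The sketch below parses a small, made-up message both as a scalar and as an array reference of lines (the latter with $parse_now set to 1). "get_header()" is inherited from Mail::SpamAssassin::Message::Node and is not documented in this section, so treat its use here as an assumption about your installed version.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,
    dont_copy_prefs  => 1,
    userstate_dir    => tempdir(CLEANUP => 1),
});

# A tiny, made-up message used only for illustration.
my $raw = "From: alice\@example.com\nTo: bob\@example.org\n"
        . "Subject: hello\n\nJust a test.\n";

# 1. Whole message as a scalar; the MIME tree is built lazily.
my $msg = $spamtest->parse($raw);

# 2. Same message as an array reference (one line per element),
#    with $parse_now = 1 so the MIME tree is built immediately.
my $msg2 = $spamtest->parse([ split /^/m, $raw ], 1);

# Headers are available straight away in both cases.
my $subject = $msg->get_header('Subject');

# Release the parsed data once it is no longer needed.
$msg->finish;
$msg2->finish;
```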
- $status = $f->check ($mail)
-
Check a mail, encapsulated in a "Mail::SpamAssassin::Message" object,
to determine if it is spam or not.
Returns a "Mail::SpamAssassin::PerMsgStatus" object which can be
used to test or manipulate the mail message.
Note that the "Mail::SpamAssassin" object can be re-used for further messages
without affecting this check; in OO terminology, the "Mail::SpamAssassin"
object is a ``factory''. However, if you do this, be sure to call the
"finish()" method on the status objects when you're done with them.
- $status = $f->check_message_text ($mailtext)
-
Check a mail, encapsulated in a plain string $mailtext, to determine if it
is spam or not.
Otherwise identical to "check()" above.
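A sketch of checking the same made-up message with both "check()" and "check_message_text()". The verdict accessors used here ("is_spam", "get_score", "get_required_score", "get_names_of_tests_hit") belong to Mail::SpamAssassin::PerMsgStatus and are documented there, not in this section. Because the factory is reused, each status and message object is finished explicitly.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,
    dont_copy_prefs  => 1,
    userstate_dir    => tempdir(CLEANUP => 1),
});

# Made-up message for illustration.
my $raw = "From: alice\@example.com\nTo: bob\@example.org\n"
        . "Subject: hello\n\nJust a test.\n";

# check(): start from a parsed Mail::SpamAssassin::Message object.
my $mail   = $spamtest->parse($raw);
my $status = $spamtest->check($mail);

my $is_spam  = $status->is_spam ? 1 : 0;
my $score    = $status->get_score;
my $required = $status->get_required_score;
my $tests    = $status->get_names_of_tests_hit;   # comma-separated rule names
printf "spam=%d score=%s required=%s tests=%s\n",
    $is_spam, $score, $required, $tests;

# The factory is reused below, so release per-message objects now.
$status->finish;
$mail->finish;

# check_message_text(): same check, starting from a plain string.
my $status2 = $spamtest->check_message_text($raw);
my $score2  = $status2->get_score;
$status2->finish;
```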
- $status = $f->learn ($mail, $id, $isspam, $forget)
-
Learn from a mail, encapsulated in a "Mail::SpamAssassin::Message"
object.
If $isspam is set, the mail is assumed to be spam, otherwise it will
be learnt as non-spam.
If $forget is set, the attributes of the mail will be removed from
both the non-spam and spam learning databases.
$id is an optional message-identification string, used internally
to tag the message. If it is "undef", the Message-Id of the message
will be used. It should be unique to that message.
Returns a "Mail::SpamAssassin::PerMsgLearner" object which can be used to
manipulate the learning process for each mail.
Note that the "Mail::SpamAssassin" object can be re-used for further messages
without affecting this learning operation; in OO terminology, the
"Mail::SpamAssassin" object is a ``factory''. However, if you do this, be sure
to call the "finish()" method on the learner objects when you're done with them.
"learn()" and "check()" can be run using the same factory. "init_learner()"
must be called before using this method.
- $f->init_learner ( [ { opt => val, ... } ] )
-
Initialise learning. You may pass the following attribute-value pairs to this
method.
-
- caller_will_untie
-
Whether or not the code calling this method will take care of untying
from the Bayes databases (by calling "finish_learner()") (optional, default 0).
- force_expire
-
Should an expiration run be forced to occur immediately? (optional, default 0).
- learn_to_journal
-
Should learning data be written to the journal, instead of directly to the
databases? (optional, default 0).
- wait_for_lock
-
Whether or not to wait a long time for locks to become available (optional,
default 0).
-
- $f->rebuild_learner_caches ({ opt => val })
-
Rebuild any cache databases; should be called after the learning process.
Options include: "verbose", which will output diagnostics to "stdout"
if set to 1.
- $f->finish_learner ()
-
Finish learning.
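The learning methods above are typically used together. The following sketch, assuming a working Bayes storage back-end (the default DBM store relies on the DB_File module) and a throwaway state directory, learns one made-up message as spam. "did_learn()" is a Mail::SpamAssassin::PerMsgLearner method documented in that module, not here.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,
    dont_copy_prefs  => 1,
    userstate_dir    => tempdir(CLEANUP => 1),   # Bayes databases go here
});

# We promise to call finish_learner() ourselves.
$spamtest->init_learner({ caller_will_untie => 1 });

# Made-up message; its Message-ID is used because $id is undef below.
my $raw = "From: spammer\@example.com\nTo: bob\@example.org\n"
        . "Message-ID: <example-1\@example.com>\n"
        . "Subject: cheap offer\n\nBuy now!\n";

my $mail    = $spamtest->parse($raw);
my $learner = $spamtest->learn($mail, undef, 1, 0);   # $isspam = 1, $forget = 0
my $learner_class = ref $learner;
my $learned = $learner->did_learn;                    # true if tokens were stored
print 'learned: ', ($learned ? 'yes' : 'no'), "\n";

$learner->finish;
$mail->finish;

# Refresh any cache databases, then close (untie) the Bayes databases.
$spamtest->rebuild_learner_caches({ verbose => 0 });
$spamtest->finish_learner;
```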
- $f->dump_bayes_db()
-
Dump the contents of the Bayes DB.
- $f->signal_user_changed ( [ { opt => val, ... } ] )
-
Signals that the current user has changed (possibly using "setuid"), meaning
that SpamAssassin should close any per-user databases it has open, and re-open
using ones appropriate for the new user.
Note that this should be called after reading any per-user configuration, as
that data may override some paths opened in this method. You may pass the
following attribute-value pairs:
-
- username
-
The username of the user. This will be used for the "username" attribute.
- user_dir
-
A directory to use as a 'home directory' for the current user's data,
overriding the system default. This directory must be readable and writable by
the process. Note that the resulting "userstate_dir" will be the
".spamassassin" subdirectory of this dir.
- userstate_dir
-
A directory to use as a directory for the current user's data, overriding the
system default. This directory must be readable and writable by the process.
The default is "user_dir/.spamassassin".
-
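A sketch of the daemon-style flow described above: compile the site configuration once, then switch to a user. The username "alice" is hypothetical, and a temporary directory stands in for her home directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,
    dont_copy_prefs  => 1,
    userstate_dir    => tempdir(CLEANUP => 1),
});
$spamtest->compile_now(0);    # site configuration only, no user prefs

# Hypothetical user; a temp dir stands in for the home directory and
# must be readable and writable by this process.
my $user_dir = tempdir(CLEANUP => 1);
$spamtest->signal_user_changed({
    username => 'alice',
    user_dir => $user_dir,    # userstate_dir then defaults to $user_dir/.spamassassin
});

# Later checks use the new user's databases.
my $raw    = "From: carol\@example.net\nSubject: lunch\n\nNoon?\n";
my $mail   = $spamtest->parse($raw);
my $status = $spamtest->check($mail);
my $score  = $status->get_score;
$status->finish;
$mail->finish;
```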
- $f->report_as_spam ($mail, $options)
-
Report a mail, encapsulated in a "Mail::SpamAssassin::Message" object, as
human-verified spam. This will submit the mail message to live,
collaborative, spam-blocker databases, allowing other users to block this
message.
It will also submit the mail to SpamAssassin's Bayesian learner.
Options is an optional reference to a hash of options. Currently these
can be:
-
- dont_report_to_dcc
-
Inhibits reporting of the spam to DCC.
- dont_report_to_pyzor
-
Inhibits reporting of the spam to Pyzor.
- dont_report_to_razor
-
Inhibits reporting of the spam to Razor.
- dont_report_to_spamcop
-
Inhibits reporting of the spam to SpamCop.
-
- $f->revoke_as_spam ($mail, $options)
-
Revoke a mail, encapsulated in a "Mail::SpamAssassin::Message" object, as
human-verified ham (non-spam). This will revoke the mail message from live,
collaborative, spam-blocker databases, so that other users will no longer
block this message.
It will also submit the mail to SpamAssassin's Bayesian learner as non-spam.
Options is an optional reference to a hash of options. Currently these
can be:
-
- dont_report_to_razor
-
Inhibits revoking the message from Razor.
-
- $f->add_address_to_whitelist ($addr)
-
Given a string containing an email address, add it to the automatic
whitelist database.
- $f->add_all_addresses_to_whitelist ($mail)
-
Given a mail message, find as many addresses as possible in the usual headers
(To, Cc, From, etc.) and in the message body, and add them to the automatic
whitelist database.
- $f->remove_address_from_whitelist ($addr)
-
Given a string containing an email address, remove it from the automatic
whitelist database.
- $f->remove_all_addresses_from_whitelist ($mail)
-
Given a mail message, find as many addresses as possible in the usual headers
(To, Cc, From, etc.) and in the message body, and remove them from the
automatic whitelist database.
- $f->add_address_to_blacklist ($addr)
-
Given a string containing an email address, add it to the automatic
whitelist database with a high score, effectively blacklisting them.
- $f->add_all_addresses_to_blacklist ($mail)
-
Given a mail message, find addresses in the From headers and add them to the
automatic whitelist database with a high score, effectively blacklisting them.
Note that To and Cc addresses are not used.
- $text = $f->remove_spamassassin_markup ($mail)
-
Returns the text of the message, with any SpamAssassin-added text (such
as the report, or X-Spam-Status headers) stripped.
Note that the $mail object is not modified.
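To see the markup removal in action, the sketch below marks up a made-up message with "rewrite_mail()" (a Mail::SpamAssassin::PerMsgStatus method documented in that module), parses the result again, and strips the added X-Spam-* headers.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,
    dont_copy_prefs  => 1,
    userstate_dir    => tempdir(CLEANUP => 1),
});

my $raw = "From: alice\@example.com\nTo: bob\@example.org\n"
        . "Subject: hello\n\nJust a test.\n";

# Mark the message up: rewrite_mail() returns the text with X-Spam-* headers.
my $mail   = $spamtest->parse($raw);
my $status = $spamtest->check($mail);
my $marked = $status->rewrite_mail;
$status->finish;
$mail->finish;

# Strip the markup again; $marked_msg itself is not modified.
my $marked_msg = $spamtest->parse($marked);
my $clean      = $spamtest->remove_spamassassin_markup($marked_msg);
$marked_msg->finish;

print $clean;
```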
- $f->read_scoreonly_config ($filename)
-
Read a configuration file and parse user preferences from it.
User preferences are as defined in the "Mail::SpamAssassin::Conf" manual page.
In other words, they include scoring options, scores, whitelists and
blacklists, and so on, but do not include rule definitions, privileged
settings, etc. unless "allow_user_rules" is enabled; and they never include
the administrator settings.
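A sketch of loading per-user scores after the site configuration has been compiled, as a daemon might. The prefs file and its "required_score 3.0" line are made up for illustration; "get_required_score()" is a Mail::SpamAssassin::PerMsgStatus accessor.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,
    dont_copy_prefs  => 1,
    userstate_dir    => tempdir(CLEANUP => 1),
});
$spamtest->compile_now(0);    # site config now, user prefs later

# Hypothetical per-user preferences file.
my $prefs_file = File::Spec->catfile(tempdir(CLEANUP => 1), 'user_prefs');
open my $fh, '>', $prefs_file or die "cannot write $prefs_file: $!\n";
print {$fh} "required_score 3.0\n";
close $fh or die "cannot close $prefs_file: $!\n";

$spamtest->read_scoreonly_config($prefs_file);

my $raw      = "From: alice\@example.com\nSubject: hello\n\nJust a test.\n";
my $mail     = $spamtest->parse($raw);
my $status   = $spamtest->check($mail);
my $required = $status->get_required_score;   # should reflect the prefs file
$status->finish;
$mail->finish;
```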
- $f->load_scoreonly_sql ($username)
-
Read configuration parameters from an SQL database and parse scores from it. This
will only take effect if the perl "DBI" module is installed, and the
configuration parameters "user_scores_dsn", "user_scores_sql_username", and
"user_scores_sql_password" are set correctly.
The username in $username will also be used for the "username" attribute of
the Mail::SpamAssassin object.
- $f->load_scoreonly_ldap ($username)
-
Read configuration parameters from an LDAP server and parse scores from it.
This will only take effect if the perl "Net::LDAP" and "URI" modules are
installed, and the configuration parameters "user_scores_dsn",
"user_scores_ldap_username", and "user_scores_ldap_password" are set
correctly.
The username in $username will also be used for the "username" attribute of
the Mail::SpamAssassin object.
- $f->set_persistent_address_list_factory ($factoryobj)
-
Set the persistent address list factory, used to create objects for the
automatic whitelist algorithm's persistent-storage back-end. See
"Mail::SpamAssassin::PersistentAddrList" for the API these factory objects
must implement, and the API the objects they produce must implement.
- $f->compile_now ($use_user_prefs, $keep_userstate)
-
Compile all patterns, load all configuration files, and load all
possibly-required Perl modules.
Normally, Mail::SpamAssassin uses lazy evaluation where possible, but if you
plan to fork() or start a new perl interpreter thread to process a message,
this is suboptimal, as each process/thread will have to perform these actions.
Call this function in the master thread or process to perform the actions
straightaway, so that the sub-processes will not have to.
If $use_user_prefs is 0, this will initialise the SpamAssassin
configuration without reading the per-user configuration file and it will
assume that you will call "read_scoreonly_config" at a later point.
If $keep_userstate is true, compile_now() will, after init(), revert any
configuration options whose default contains __userstate__, and then
re-apply the changed values before returning. This lets you change
$ENV{'HOME'} to a temp directory and have compile_now() create any
files there as necessary (auto-whitelist, etc.) without disturbing the
actual files as changed by a configuration option. By default, this
is disabled.
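The fork pattern described above might look like the sketch below. For simplicity each child is waited for before the next is forked, so nothing actually runs in parallel; a real daemon would manage a pool of children. The child's exit code (1 for spam, 0 otherwise) is an illustrative convention, not part of the API.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,
    dont_copy_prefs  => 1,
    userstate_dir    => tempdir(CLEANUP => 1),
});

# Load config, compile rules and pull in modules once, in the parent.
$spamtest->compile_now(0);

# Made-up messages for illustration.
my @messages = (
    "From: alice\@example.com\nSubject: hello\n\nJust a test.\n",
    "From: carol\@example.net\nSubject: lunch\n\nNoon?\n",
);

my @exit_codes;
for my $text (@messages) {
    my $pid = fork;
    die "fork failed: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child: reuses the precompiled rules inherited from the parent.
        my $mail   = $spamtest->parse($text);
        my $status = $spamtest->check($mail);
        my $code   = $status->is_spam ? 1 : 0;
        $status->finish;
        $mail->finish;
        exit $code;
    }
    # Parent: wait for this child before forking the next one.
    waitpid($pid, 0);
    push @exit_codes, $? >> 8;
}
print "exit codes: @exit_codes\n";
```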
- $f->debug_diagnostics ()
-
Output some diagnostic information, useful for debugging SpamAssassin
problems.
- $failed = $f->lint_rules ()
-
Syntax-check the current set of rules. Returns the number of
syntax errors discovered, or 0 if the configuration is valid.
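A sketch of linting two configurations supplied through "config_text", which replaces the installed rule files so that only the given lines are checked. The rule name LOCAL_HELLO_SUBJ and the bogus directive are invented for the example; the second configuration is expected to report at least one error.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Mail::SpamAssassin;

# Count lint errors for a configuration given entirely as config_text.
sub count_lint_errors {
    my ($config_text) = @_;
    my $sa = Mail::SpamAssassin->new({
        config_text      => $config_text,   # replaces the installed rule files
        local_tests_only => 1,
        dont_copy_prefs  => 1,
        userstate_dir    => tempdir(CLEANUP => 1),
    });
    my $errors = $sa->lint_rules();
    $sa->finish();
    return $errors;
}

# A small, valid rule (name and pattern invented for this example).
my $good_config = join '',
    'header   LOCAL_HELLO_SUBJ Subject =~ /\bhello\b/i', "\n",
    'describe LOCAL_HELLO_SUBJ Subject mentions hello',   "\n",
    'score    LOCAL_HELLO_SUBJ 0.5',                      "\n";

# An unknown directive, which the parser should report as an error.
my $bad_config = "this_is_not_a_setting 1\n";

my $good = count_lint_errors($good_config);
my $bad  = count_lint_errors($bad_config);
printf "good: %d error(s), bad: %d error(s)\n", $good, $bad;
```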
- $f->finish()
-
Destroy this object, so that it will be garbage-collected once it
goes out of scope. The object will no longer be usable after this
method is called.
- $f->create_default_prefs ($filename, $username [ , $userdir ] )
-
Copy the default preferences file into the user's home directory for later
use and modification, if it does not already exist and "dont_copy_prefs"
is not set.
- $f->copy_config ( [ $source ], [ $dest ] )
-
Used for daemons to keep a persistent Mail::SpamAssassin object's
configuration correct if switching between users. Pass an associative
array reference as either $source or $dest, and set the other to 'undef'
so that the object will use its current configuration. For example:
use Mail::SpamAssassin;
# create object w/ configuration (pass constructor options as needed)
my $spamtest = Mail::SpamAssassin->new();
# back up configuration to %conf_backup
my %conf_backup = ();
$spamtest->copy_config(undef, \%conf_backup) ||
die "error returned from copy_config!\n";
# ... do stuff, perhaps modify the config, etc. ...
# reset the configuration back to the original
$spamtest->copy_config(\%conf_backup, undef) ||
die "error returned from copy_config!\n";
SpamAssassin is distributed under the Apache License, Version 2.0, as
described in the file
included with the distribution.