SpamBayes
{{Infobox software
| name = SpamBayes
| logo =
| screenshot =
| caption =
| collapsible =
| author = Tim Peters
| developer =
| released = September 2002
| latest_release_version = 1.0.4
| latest_release_date = March 2005
| latest_preview_version = 1.1a6
| latest_preview_date = {{start date|2008|12|06}}{{cite web | url=http://sourceforge.net/projects/spambayes/files/spambayes/1.1a6/CHANGELOG.txt/download | title=Download CHANGELOG.TXT (SpamBayes anti-spam) }}
| programming_language = Python
| operating_system =
| platform = Cross-platform
| size =
| language = English only
| genre = E-mail filtering
| license = PSFL
| website = [http://spambayes.sourceforge.net/ spambayes.sourceforge.net]
}}
SpamBayes is a Bayesian spam filter written in Python that uses techniques laid out by Paul Graham in his essay "A Plan for Spam". The approach was subsequently improved by Gary Robinson and Tim Peters, among others.{{cite magazine|last=Robinson|first=Gary|date=1 March 2003|title=A Statistical Approach to the Spam Problem|magazine=Linux Journal|issn=1075-3583|url=https://www.linuxjournal.com/article/6467}}
The most notable difference between a conventional Bayesian filter and the filter used by SpamBayes is that SpamBayes has three classifications rather than two: spam, non-spam (called ham in SpamBayes), and unsure. The user trains the filter by marking messages as either ham or spam; when filtering a message, the filter generates one score for ham and another for spam.
- If the spam score is high and the ham score is low, the message is classified as spam.
- If the spam score is low and the ham score is high, the message is classified as ham.
- If the scores are both high or both low, the message is classified as unsure.
This approach leads to a low number of false positives and false negatives, but it may leave some messages classified as unsure, and those require a human decision.
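The three-way decision described above can be sketched in Python as follows. This is a minimal illustration and not SpamBayes's actual code: the function name <code>classify</code>, the way the two scores are folded into one value, and the cutoff values <code>0.20</code> and <code>0.90</code> are all assumptions chosen for the example. Both scores are assumed to lie between 0 and 1.

```python
def classify(spam_score, ham_score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Return "spam", "ham", or "unsure" from two scores in [0, 1].

    Illustrative sketch only: the combining step and the default
    cutoffs are assumptions, not values taken from the article.
    """
    # Fold the two scores into one value in [0, 1].
    # High spam / low ham  -> close to 1.
    # Low spam / high ham  -> close to 0.
    # Both high or both low -> close to 0.5, i.e. the evidence conflicts
    # or is missing, so the message lands in the middle band.
    combined = (spam_score - ham_score + 1.0) / 2.0

    if combined >= spam_cutoff:
        return "spam"
    if combined <= ham_cutoff:
        return "ham"
    return "unsure"


if __name__ == "__main__":
    for spam, ham in [(0.99, 0.01), (0.01, 0.99), (0.95, 0.95), (0.05, 0.05)]:
        print(spam, ham, classify(spam, ham))
```

In this sketch, widening the gap between the two cutoffs sends more messages to unsure and fewer to either definite class, which mirrors the trade-off between misclassifications and messages needing a human decision.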
Web filtering
Some work has gone into applying SpamBayes to filter web content through a proxy server.{{Cite web |last=Montanaro |first=Skip |date=2003-12-07 |title=[spambayes-dev] Web filtering |url=https://mail.python.org/pipermail/spambayes-dev/2003-December/001804.html |access-date=2023-04-18}}{{Cite web|url=http://osdir.com/ml/mail.spam.spambayes.devel/2008-05/msg00004.html|title = OSDIR|date = 6 November 2020}}
References
{{reflist}}
External links
- {{official website|http://spambayes.sourceforge.net/}}
- [http://www.paulgraham.com/spam.html Paul Graham's essay "A Plan for Spam"]
- [http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html Essay discussing improvements on Graham's original idea] {{Webarchive|url=https://web.archive.org/web/20070703150601/http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html |date=2007-07-03 }}
- [http://spambayes.sourceforge.net/background.html Explaining how SpamBayes works]
- [http://ceas.cc/papers-2004/136.pdf Paper on SpamBayes for the Conference on E-mail and Anti-Spam]
- [http://home.dataparty.no/kristian/reviews/bayesian/ Winning the War on spam: Comparison of Bayesian spam filters]