<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general">
    <title>gmane.mail.bogofilter.general</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11308"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11307"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11306"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11305"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11304"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11303"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11302"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11301"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11300"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11299"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11298"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11297"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11296"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11295"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11294"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11293"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11292"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11291"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11290"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.bogofilter.general/11289"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11308">
    <title>Re: nan in bogofilter stats</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11308</link>
    <description>_______________________________________________
Bogofilter mailing list
Bogofilter&lt; at &gt;bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter
</description>
    <dc:creator>Anne Wilson</dc:creator>
    <dc:date>2008-11-19T11:55:08</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11307">
    <title>Re: nan in bogofilter stats</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11307</link>
    <description>Don't know for sure but I certainly get HEAPS of Russian and Japanese (and 
even some German) spam.
Many hundreds of messages per week plus all the usual English crap.

"Real" mail is in the tens per week.

Stephen

On Wednesday 19 November 2008 19:38:42 Matthias Andree wrote:



</description>
    <dc:creator>Stephen Davies</dc:creator>
    <dc:date>2008-11-19T10:49:05</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11306">
    <title>Re: nan in bogofilter stats</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11306</link>
    <description>

This looks rather unbalanced - does it roughly reflect your
junk/solicited mail ratio?

</description>
    <dc:creator>Matthias Andree</dc:creator>
    <dc:date>2008-11-19T09:08:42</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11305">
    <title>Re: nan in bogofilter stats</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11305</link>
    <description>On Wed, 19 Nov 2008 11:49:44 +1030
Stephen Davies wrote:


Glad you had a backup!  Due to occasional Berkeley DB problems in
bogofilter's early days (years ago), I have a cron job that dumps the
database daily and copies each day's wordlist to a day-of-week
directory.  No doubt it's overkill, but I'd rather be safe :-&gt;
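
A rough sketch of such a rotation in Python, for illustration only.  The
directory layout, the file name wordlist.txt and the helper names are
assumptions; only "bogoutil -d" comes from this thread.

```python
import datetime
import pathlib
import subprocess

# Hypothetical directory names, one per weekday (Monday first).
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def backup_path(base_dir, day):
    # One directory per weekday, so at most seven dumps are kept.
    return pathlib.Path(base_dir) / DAYS[day.weekday()] / "wordlist.txt"

def dump_wordlist(wordlist_db, base_dir, day=None):
    # Dump the wordlist as text into today's day-of-week directory.
    target = backup_path(base_dir, day or datetime.date.today())
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "w") as out:
        # "bogoutil -d" writes the wordlist as text to standard output.
        subprocess.run(["bogoutil", "-d", str(wordlist_db)],
                       stdout=out, check=True)
    return target
```

Calling dump_wordlist() once a day from cron overwrites the dump from the
same weekday of the previous week.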
</description>
    <dc:creator>David Relson</dc:creator>
    <dc:date>2008-11-19T04:06:11</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11304">
    <title>Re: nan in bogofilter stats</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11304</link>
    <description>OK. I brought back a backup wordlist and now get:

bogoutil -w wordlist.db .MSG_COUNT
                                 spam   good
.MSG_COUNT                     266713  12982

(Assuming that the previous spam count was correct, this implies nearly 20000 
spams per month. Bogofilter has been busy!!)

I'll look at the change to TRANSACTIONAL also.

Cheers,
Stephen

On Wednesday 19 November 2008 11:33:36 David Relson wrote:



</description>
    <dc:creator>Stephen Davies</dc:creator>
    <dc:date>2008-11-19T01:19:44</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11303">
    <title>Re: nan in bogofilter stats</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11303</link>
    <description>On Wed, 19 Nov 2008 10:47:25 +1030
Stephen Davies wrote:


To compute a token's spamicity, bogofilter needs to know how
many spam and ham messages have been registered (in the
wordlist).  .MSG_COUNT is the special token that provides this info.

The numbers 312870 and 1 indicate that 312870 spam messages and 1 ham
message have been registered.  The value 312870 is reasonable while the
value 1 seems unreasonably low.

FWIW, "bogoutil -d wordlist.db &gt; wordlist.txt" will dump your wordlist
as a text file.  Each line has a token, its spam and ham counts, and a
timestamp.  .MSG_COUNT's "good" value _should_ be greater than any ham
count.
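
That check can be sketched in Python roughly as follows.  It assumes each
dump line is whitespace-separated in the order token, spam count, ham
count, timestamp; the sample lines are made up, and the function name is
hypothetical.

```python
def tokens_exceeding_msg_count(dump_lines):
    """Return tokens whose ham count exceeds .MSG_COUNT's ham count."""
    rows = {}
    for line in dump_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed layout
        token, spam, ham, _stamp = parts
        rows[token] = (int(spam), int(ham))
    total_ham = rows[".MSG_COUNT"][1]
    return sorted(t for t, (_s, h) in rows.items() if h > total_ham)

# Made-up sample in the assumed layout.
sample = [
    ".MSG_COUNT 312870 1 20081118",
    "viagra 12385 3 20081101",
    "buy 13387 1 20081102",
]
print(tokens_exceeding_msg_count(sample))
```

A non-empty result suggests that .MSG_COUNT is out of step with the rest
of the wordlist.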

It might be time to start a new wordlist and register all the ham and
spam you have available.  I'd also recommend backing up your wordlist
periodically in case of future problems.  Lastly, switching from
NON-TRANSACTIONAL bogofilter to TRANSACTIONAL bogofilter will provide a
more secure database environment.

HTH,

David
</description>
    <dc:creator>David Relson</dc:creator>
    <dc:date>2008-11-19T01:03:36</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11302">
    <title>Re: nan in bogofilter stats</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11302</link>
    <description>Thanks for the feedback David.

I get:

bogoutil -w wordlist.db .MSG_COUNT
                                 spam   good
.MSG_COUNT                     312870      1

What does this actually mean?

Cheers,
Stephen


On Wednesday 19 November 2008 10:32:40 David Relson wrote:



</description>
    <dc:creator>Stephen Davies</dc:creator>
    <dc:date>2008-11-19T00:17:25</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11301">
    <title>Re: nan in bogofilter stats</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11301</link>
    <description>On Wed, 19 Nov 2008 09:38:25 +1030
Stephen Davies wrote:

...[snip]...

Hello Stephen,

It sounds like your wordlist is b0rked. "nan" shows up when .MSG_COUNT
has 0 for either spam or ham counts.  Run

"bogoutil -w wordlist.db .MSG_COUNT"

to test this.  If a zero shows up, then it's time to replace your
wordlist with a backup copy.
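
As a rough illustration of where a "nan" can come from (this is not
bogofilter's code, and the numbers are only examples): in IEEE floating
point, 0 divided by 0 is NaN, and NaN then propagates through every later
calculation.  Python raises an error on division by zero, so the helper
below imitates the C behaviour by hand.

```python
import math

def ratio(count, total):
    # Imitate IEEE 754 division as done in C: 0/0 yields NaN, x/0 yields
    # infinity.  Python's "/" operator would raise ZeroDivisionError.
    if total == 0:
        return math.nan if count == 0 else math.inf
    return count / total

pbad = ratio(2160, 312870)   # fraction of spam messages with the token
pgood = ratio(0, 0)          # fraction of ham messages, with a zero ham total
combined = pgood * 0.5       # any arithmetic on NaN stays NaN

print(pbad, pgood, combined)
```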

HTH,

David
</description>
    <dc:creator>David Relson</dc:creator>
    <dc:date>2008-11-19T00:02:40</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11300">
    <title>nan in bogofilter stats</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11300</link>
    <description>Recently I have been seeing large numbers of false Ham results from my well-trained 
bogofilter.

The following is the output of a bogofilter scan of an obvious spam mail.

The Ham result seems to come from the "nan" values.

Where do these come from and how do I fix it?

Cheers and thanks,
Stephen Davies


[scldad&lt; at &gt;mustang bogofilter]$ bogofilter --version
bogofilter version 1.1.6
    Database: Berkeley DB 4.6.21: (December 28, 2007) NON-TRANSACTIONAL

X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.1.6
                                        n    pgood     pbad      fw     U
  "$59.95"                           2160       nan  0.006905       nan -
  "Viagra"                          12385       nan  0.039592       nan -
  "buy"                             13387       nan  0.042795       nan -
  "childrencloud.com"                  11       nan  0.000035       nan -
  "from:Blackburn"                     23       nan  0.000074       nan -
  "from:Destin"                        11       nan  0.000</description>
    <dc:creator>Stephen Davies</dc:creator>
    <dc:date>2008-11-18T23:08:25</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11299">
    <title>Re Tuning Bogofilter</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11299</link>
    <description>Message: 1
Date: Mon, 20 Oct 2008 21:18:47 -0400
From: Tom Anderson &lt;tanderson&lt; at &gt;orderamidchaos.com&gt;
Subject: Re: Tuning bogofilter
To: bf-users &lt;bogofilter&lt; at &gt;bogofilter.org&gt;
Message-ID: &lt;48FD2DF7.9010303&lt; at &gt;orderamidchaos.com&gt;
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

David Relson wrote:

We recommend against training over and over with the same messages as
it biases the wordlist.  Training with ham and spam yields a wordlist
that indicates how often individual words (tokens) occur in ham and in
spam.  For a simplified example:  if "xyzw" occurs in 10% of your ham
and in 20% of your spam, then a message with "xyzw" is twice as likely
to be spam as it is to be ham.  If you keep training with the same spam
you skew the results -- which is not recommended.
I don't really see biasing a spam message as spam to be particularly
problematic.  If indeed a particular word ought to be hammier, then it
will become so in the course of training your hams.  My experience has
been that someti</description>
    <dc:creator>barsalou</dc:creator>
    <dc:date>2008-10-22T23:22:20</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11298">
    <title>Re: Tuning bogofilter</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11298</link>
    <description>
I don't really see biasing a spam message as spam to be particularly 
problematic.  If indeed a particular word ought to be hammier, then it 
will become so in the course of training your hams.  My experience has 
been that sometimes you don't receive enough spams to make some tokens 
spammy enough, and I therefore train these spams multiple times until 
bogofilter recognizes them appropriately as spams.  Otherwise, I will 
keep receiving them as false negatives.  For instance, if "xyzw" has 
only occurred twice, but it is absolutely 100% always spammy and you 
never ever want to see it again, then just keep training on that spam 
until bogofilter recognizes it as such.  This is as if you have received 
the spam many times, but without the inconvenience of actually having 
done so.  When you later train a ham message which contains another 
token "abcd" which may have appeared alongside "xyzw" and is now 
spammier than it should have been, "abcd" will become hammier while 
"xyzw" remains very spammy.  In th</description>
    <dc:creator>Tom Anderson</dc:creator>
    <dc:date>2008-10-21T01:18:47</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11297">
    <title>Re: Berkeley DB vs Sqlite3</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11297</link>
    <description/>
    <dc:creator>Gour</dc:creator>
    <dc:date>2008-10-21T06:25:01</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11296">
    <title>Re: Berkeley DB vs Sqlite3</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11296</link>
    <description>Gour schrieb:


bf_compact was fixed in release 1.1.7. Since then, it is supposed to
support Oracle/SleepyCat Berkeley DB, sqlite3, TokyoCabinet, and its
predecessor, QDBM.
We forgot to update the bf_compact documentation. Sorry for that.

Best regards
Matthias

</description>
    <dc:creator>Matthias Andree</dc:creator>
    <dc:date>2008-10-20T13:16:58</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11295">
    <title>Re: Tuning bogofilter</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11295</link>
    <description>On Sun, 19 Oct 2008 20:21:12 -0800
barsalou wrote:

...[snip]...
 

Goodness, you must have been using an _old_ version of bogofilter.  The
change in default wording from "Yes, No" to "Spam, Ham, Unsure" was
_years_ ago.

In any case, 'tis good you found the problem!


Yes.
</description>
    <dc:creator>David Relson</dc:creator>
    <dc:date>2008-10-20T11:37:01</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11294">
    <title>Re: Tuning bogofilter</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11294</link>
    <description>
Thanks David,

I cleared my wordlist and retrained it from my corpus.  Also found  
that the newer version I was using used the word 'Spam' instead of  
'Yes' in the X-Bogosity line.  So procmail wasn't doing the sorting  
properly.

So things are working more like I'd expect them to.

You said using the same spam corpus will skew the results...but it  
isn't clear to me in what way it will do that.  I assume you mean that  
it will mark words as being more spammy than they should be...is that  
right?

Thanks for your response.

Mike B.

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

</description>
    <dc:creator>barsalou</dc:creator>
    <dc:date>2008-10-20T04:21:12</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11293">
    <title>Re: Tuning bogofilter</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11293</link>
    <description>On Sat, 18 Oct 2008 12:09:24 -0800
barsalou wrote:


H'lo Mike,

We recommend against training over and over with the same messages as
it biases the wordlist.  Training with ham and spam yields a wordlist
that indicates how often individual words (tokens) occur in ham and in
spam.  For a simplified example:  if "xyzw" occurs in 10% of your ham
and in 20% of your spam, then a message with "xyzw" is twice as likely
to be spam as it is to be ham.  If you keep training with the same spam
you skew the results -- which is not recommended.
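
The arithmetic behind that simplified example, sketched in Python.  The
second function assumes equal prior odds of ham and spam, which is an
illustrative assumption and not how bogofilter combines its scores.

```python
def spam_odds(p_token_in_spam, p_token_in_ham):
    # Likelihood ratio: how many times more often the token occurs in spam.
    return p_token_in_spam / p_token_in_ham

def spam_probability(p_token_in_spam, p_token_in_ham):
    # Probability of spam given the token, assuming equal priors.
    return p_token_in_spam / (p_token_in_spam + p_token_in_ham)

# "xyzw" occurs in 20% of spam and 10% of ham.
print(spam_odds(0.20, 0.10))
print(spam_probability(0.20, 0.10))

# Registering the same spam again raises the spam-side fraction, which
# inflates the ratio without any new evidence.
print(spam_odds(0.40, 0.10))
```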

If you're seeing 0.52000 scores for both ham and spam, then there's
something wrong.  

Bogofilter has flags that will show you how/why it's scoring a message
as ham or spam.  Look in the FAQ for the writeup on "-vv" and "-vvv",
then give these flags a try with sample ham and spam messages to see
what you learn.  Also, bogoutil has a "-p" flag that will show the ham
and spam scores of tokens passed to it.  That is likely to be helpful.

HTH,

David
</description>
    <dc:creator>David Relson</dc:creator>
    <dc:date>2008-10-18T20:38:32</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11292">
    <title>Tuning bogofilter</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11292</link>
    <description>If I repeatedly use the same set of initial spam messages to train  
bogofilter, will that cause it to work less well?

I have a spam corpus to which I continually add messages.  Then, using  
bogominitrain.pl, I occasionally retrain.

I'm wondering if this could cause a problem.

My concern is born out of looking at the bogosity header and that both  
my "ham" messages and "spam" messages get a spamicity of .52000

Thanks for any guidance.

Mike B.

</description>
    <dc:creator>barsalou</dc:creator>
    <dc:date>2008-10-18T20:09:24</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11291">
    <title>Re: Berkeley DB vs Sqlite3</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11291</link>
    <description>
Matthias&gt; which one has given you problems?

None (so far).

Matthias&gt; Not all of the bf_* utilities are required for sqlite3
Matthias&gt; databases though, most of them came into existence to help
Matthias&gt; users handle Berkeley DB idiosyncrasies.

I see.

Matthias&gt; If there are contrib/* scripts that don't work with sqlite3,
Matthias&gt; then please tell us which one you wanted to use and how it
Matthias&gt; failed (error messages perhaps) so we can ask the contributor
Matthias&gt; if he wants to update/fix them.

For example, the man page for bf_compact says it is meant for
Berkeley DB, so it probably does not apply to sqlite3.

Matthias&gt; I do /not/ recommend the Berkeley DB version over an
Matthias&gt; established sqlite3. In your situation, I'd suggest sticking
Matthias&gt; with sqlite3.

Thank you for the hint.

Matthias&gt; The speed difference to SQLite3 does not, IMO, matter on
Matthias&gt; personal computers (Berkeley DB was faster when I
Matthias&gt; compared them long ago, but sqlite3 has been optimized since</description>
    <dc:creator>Gour</dc:creator>
    <dc:date>2008-10-18T13:34:25</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11290">
    <title>Re: Berkeley DB vs Sqlite3</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11290</link>
    <description>

Hi Gour,

which one has given you problems?

Not all of the bf_* utilities are required for sqlite3 databases though,
most of them came into existence to help users handle Berkeley DB
idiosyncrasies.

If there are contrib/* scripts that don't work with sqlite3, then please
tell us which one you wanted to use and how it failed (error messages
perhaps) so we can ask the contributor if he wants to update/fix them.

I do /not/ recommend the Berkeley DB version over an established
sqlite3. In your situation, I'd suggest sticking with sqlite3.

The speed difference to SQLite3 does not, IMO, matter on personal
computers (Berkeley DB was faster when I compared them long ago,
but sqlite3 has been optimized since).

sqlite3 is easier to handle (there are fewer points on the checklists to
take care of) - read through README.db and see for yourself.


Yes, it is. The procedure is:

- stop your mail system
- dump all your *.db files to text files with bogoutil -d.
- install the berkeleydb-based bogofilter
- load the tex</description>
    <dc:creator>Matthias Andree</dc:creator>
    <dc:date>2008-10-18T08:22:40</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11289">
    <title>Berkeley DB vs Sqlite3</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11289</link>
    <description/>
    <dc:creator>Gour</dc:creator>
    <dc:date>2008-10-18T07:05:15</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.bogofilter.general/11288">
    <title>Re: re-training with -Ns and -Sn</title>
    <link>http://permalink.gmane.org/gmane.mail.bogofilter.general/11288</link>
    <description>On Mon, 6 Oct 2008 12:26:01 -0500
Bill McClain wrote:


"-N" and "-S" are the options to undo the registration of an
incorrectly registered message.

"-Ns" is used when spam was incorrectly registered as ham.
Bogofilter's action for "-Ns" is to lower each token's ham count and
raise the spam count.  For "-Sn" the actions are lower spam count and
raise ham count.

To register an "Unsure" as ham, you should just use "-n" (to tell
bogofilter that the message is ham, _not_ spam).  To register an
"Unsure" as spam, use "-s" (to tell bogofilter that it _is_ spam).
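
The count changes described above can be modelled roughly like this in
Python.  It is a toy model and not bogofilter's implementation: the
dictionary layout, the function name and the rule of counting each token
once per message are assumptions.

```python
# (spam delta, ham delta) applied to each token of the message.
DELTAS = {
    "-s": (1, 0),     # register as spam
    "-n": (0, 1),     # register as ham
    "-Ns": (1, -1),   # was wrongly registered as ham: undo ham, add spam
    "-Sn": (-1, 1),   # was wrongly registered as spam: undo spam, add ham
}

def apply_option(counts, tokens, option):
    # counts maps a token to a (spam_count, ham_count) pair.
    d_spam, d_ham = DELTAS[option]
    for token in set(tokens):
        spam, ham = counts.get(token, (0, 0))
        # Counts are clamped at zero in this toy model.
        counts[token] = (max(spam + d_spam, 0), max(ham + d_ham, 0))
    return counts

counts = {"xyzw": (0, 1)}             # spam mistakenly registered as ham
apply_option(counts, ["xyzw"], "-Ns")
print(counts)
```

Using "-Ns" on a message that was never registered as ham would lower ham
counts that were never raised, which is why an "Unsure" message should be
registered with plain "-n" or "-s".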

HTH,

David
</description>
    <dc:creator>David Relson</dc:creator>
    <dc:date>2008-10-06T23:10:34</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.mail.bogofilter.general">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.mail.bogofilter.general</link>
  </textinput>
</rdf:RDF>
