<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.mail.spam.crm114">
    <title>gmane.mail.spam.crm114</title>
    <link>http://blog.gmane.org/gmane.mail.spam.crm114</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9613"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9609"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9605"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9604"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9601"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9600"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9594"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9593"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9592"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9588"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9587"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9578"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9577"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9576"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9574"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.mail.spam.crm114/9570"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9613">
    <title>Small bug found</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9613</link>
    <description>&lt;pre&gt;The Markovian classifier tries to update the timestamp of the css file, but by then it has lost track of the filename to use.

Here is a patch:


--- crm_markovian.c.orig        2013-05-13 12:51:52.000000000 +0300
+++ crm_markovian.c     2013-05-13 12:53:15.000000000 +0300
@@ -148,6 +148,8 @@
   //             filename starts at i,  ends at j. null terminate it.
   htext[j] = '\000';

+  int filenameidx = i;
+
   //             and stat it to get it's length
   k = stat (&amp;amp;htext[i], &amp;amp;statbuf);

@@ -724,7 +726,7 @@
   {
     int hfd;                  //  hashfile fd
     FEATURE_HEADER_STRUCT foo;
-    hfd = open (&amp;amp;(htext[i]), O_RDWR);
+    hfd = open (&amp;amp;(htext[filenameidx]), O_RDWR);
     dontcare = read (hfd, &amp;amp;foo, sizeof(foo));
     lseek (hfd, 0, SEEK_SET);
     dontcare = write (hfd, &amp;amp;foo, sizeof(foo));
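
To illustrate why saving the index matters, here is a toy Python model of the failure. It assumes, as the patch implies, that i is reused as a scratch variable somewhere between the stat() call and the later open(). The buffer contents and the helper filename_at are made up for illustration and are not taken from crm_markovian.c.

```python
# Toy model: `i` marks where the filename starts, but later code reuses `i`
# as a scratch counter, so the final open() would see the wrong offset.
htext = "spam.css trailing text"


def filename_at(buf, start):
    """Return the space-delimited token that begins at `start` (hypothetical helper)."""
    return buf[start:].split(" ", 1)[0]


i = 0                      # filename starts at i
filenameidx = i            # the patch saves the index before i is reused

for i in range(5):         # stand-in for later code that clobbers i
    pass

buggy = filename_at(htext, i)            # uses the clobbered index
fixed = filename_at(htext, filenameidx)  # uses the saved index
print("clobbered index gives:", buggy)
print("saved index gives:   ", fixed)
```

The same idea applies in the C code: once the index is copied into its own variable, later reuse of i cannot change which name is opened.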



--
Kai.Risku-gEqm1mq8M50&amp;lt; at &amp;gt;public.gmane.org     GSM  +358-40-767 8282
Oy Arrak Software Ab   http://www.arrak.fi






&lt;/pre&gt;</description>
    <dc:creator>Kai Risku</dc:creator>
    <dc:date>2013-05-14T05:58:28</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9609">
    <title>Tuning for throughput</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9609</link>
    <description>&lt;pre&gt;Greetings, denizens of the list.
I am investigating using libcrm114 for mass classification and I need to understand the run-time space and time characteristics. Has anyone else looked at this aspect?

I have been running test cases with different volumes of mail that have already been classified. What I see is that the time to classify a message increases in a generally linear fashion as the size of the classifier grows, until I get a "large" classifier (15,000 messages or more), where the time required kicks up sharply. I am guessing that this is due to memory requirements, with swapping starting to occur.

Other than the process memory requirements, are there any other known issues with libcrm114 that make its performance non-linear for larger volumes of mail? If it is using a hash lookup, then I expect to see linear performance on insertion and lookup.
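
One way to separate lookup cost from swapping is to time classification at several classifier sizes while watching the resident memory of the process. The following is a minimal Python sketch of such a harness. The function make_toy_classifier is a hypothetical stand-in (a set of known words), not libcrm114, and the corpus is synthetic; a real measurement would replace it with a call into the actual library.

```python
import time


def seconds_per_message(classify, messages):
    """Average wall-clock seconds that classify() spends on one message."""
    start = time.perf_counter()
    for message in messages:
        classify(message)
    return (time.perf_counter() - start) / len(messages)


def make_toy_classifier(trained_messages):
    """Hypothetical stand-in for a real classifier: a set of known words."""
    known = set()
    for text in trained_messages:
        known.update(text.split())
    # Score = how many words of the message were seen in training.
    return lambda message: sum(word in known for word in message.split())


probe = ["cheap pills now", "meeting moved to friday"] * 50
timings = []
for size in (1000, 5000, 15000):
    corpus = [f"word{n} filler{n % 97}" for n in range(size)]
    classify = make_toy_classifier(corpus)
    timings.append((size, seconds_per_message(classify, probe)))

for size, seconds in timings:
    print(f"{size:6d} trained messages: {seconds * 1e6:8.2f} us per message")
```

If the per-message time stays flat for the stand-in but jumps for the real classifier at the same sizes, memory pressure is the more likely cause than the lookup structure itself.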

Thanks,
Martin


Martin Thomas

Research Tools Software Architect
Risk and Compliance Security Research
McAfee, Inc.

972.963.7680 Direct

martin_thomas-pYQlhKXXLhvQT0dZR+AlfA&amp;lt; at &amp;gt;public.gmane.org

_______________________________________________
Crm114-general mailing list
Crm114-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f&amp;lt; at &amp;gt;public.gmane.org
https://lists.sourceforge.net/lists/listinfo/crm114-general
&lt;/pre&gt;</description>
    <dc:creator>Martin_Thomas-dDVk9ABzqyjQT0dZR+AlfA&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2013-04-02T14:32:50</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9605">
    <title>shrinking css files for quicker training?</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9605</link>
    <description>&lt;pre&gt;Hi, all,

I've mostly been letting my crm114 mailfilter installation "sit"
for a number of years, and it just keeps on working, of course.

However, along the way I think I may have expanded my css files too much,
perhaps too concerned about packing density.  I judge this from the fact
that the learning seems entirely too sluggish compared to what it used to
be.  

I assume so much has been trained by now that new training has less than the
desired impact.  This may have worsened due to the number of years the css
files have been in use, and just how much spam has changed over that time.
I would like some of the old training to "fall off the end" but I don't have
the time required to just start over right now.

I see nothing in the documentation to suggest that using cssmerge to move
back into a smaller file would not work, and I'm wondering what to expect
and how to optimize this.

I would rather avoid being forced into a bout of heavy training.  So I'm
thinking about reducing the size of the css files in relatively small steps,
(maybe 10% at a time?) and not worrying about power of 2 boundaries,
assuming that will work.  Or maybe I should just try going by a factor of 2
all at once?

I am currently at 1048577 buckets, and my current packing densities are 0.83
non-spam and 0.92 spam.  Maybe when I'm all done with this process I can
increase the size of my css files again to return to better densities.  But
if I were to reduce my css files by a factor of 8 in this process, I'd guess
off-hand I'd be running with the spam css at something like .99 density.  I
guess I'd find out along the way if this was a problem.
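
As a rough check on that guess, the arithmetic can be sketched in Python using the figures quoted above. It assumes that one bucket holds one feature, so a smaller file can keep at most as many features as it has buckets. How cssmerge and microgrooming actually decide what to drop is not modelled here.

```python
# Figures quoted in the message above.
buckets = 1048577
density = {"nonspam": 0.83, "spam": 0.92}
shrink_factor = 8

new_buckets = buckets // shrink_factor
report = {}
for name, packing in density.items():
    in_use = round(buckets * packing)
    # Assumption: one bucket holds one feature, so at most new_buckets survive.
    surviving = min(in_use, new_buckets)
    report[name] = (in_use, surviving, surviving / in_use)

for name, (in_use, surviving, kept) in report.items():
    print(f"{name}: {in_use} buckets in use, at most {surviving} fit ({kept:.0%} kept)")
```

Under that assumption a factor of 8 leaves room for only a small fraction of the current features, so most of the old training would be discarded in one step rather than the file merely ending up very dense.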

I suppose I could shrink and immediately expand the css in order to weed out
some of the old stuff.  Maybe I could then do this for the spam side only,
leaving the non-spam "experience" fully in place since non-spam is perhaps
under-trained anyway.

A few key settings from mailfilter.cf:

    :decision_length: /6000/
    :clf: /osb unique microgroom/
    :thick_threshold: /1.8/

And I'm using a quite old mailfilter, looks like a July 2006 build.  Some of
the newer developments were breaking my methodologies so I kind of had to
stick with mailfilter.

I'd appreciate any thoughts.

Of course I can just try some things, and see what happens, but for some of
these considerations it may be better not to be making random guesses.

-Kurt



&lt;/pre&gt;</description>
    <dc:creator>Kurt Bigler</dc:creator>
    <dc:date>2012-12-17T23:29:06</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9604">
    <title>crm not converging well on new input</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9604</link>
    <description>&lt;pre&gt;hi crash, and everyone --

i'm running crm from the ubuntu apt repos, version 20100106.

for some reason i tossed my old css files and started from scratch
sometime last year.  after the initial training period, things settled
down, and spam vs. non-spam were sorted very nicely, with a handful
of "unsure" messages on a daily basis.  (i also greylist, which takes
a lot of the load off of crm and the rest of my system.)

5 or 6 weeks ago i subscribed to the git (git-u79uwXL29TY76Z2rM5mHXA&amp;lt; at &amp;gt;public.gmane.org) mailing
list, and as expected, a lot of it was classed as spam at first.  but
i train messages consistently, and i'd expect that to go away quickly.

but it didn't.  i've been consistently getting many messages from that
list classed as UNSURE.  this morning, for instance, i had 18 git
messages classified as UNSURE, out of perhaps 80 that arrived
overnight.

i'll add that i do subscribe to some other vger.kernel.org lists,
and don't have these problems with those -- though they tend to be
lower volume; it's somewhat possible i haven't noticed the issue
with them.

when a good message goes into my "unsure" folder, i run this:

    /usr/bin/crm /home/pgf/ncrm/mailreaver.crm \
-u /home/pgf/ncrm/ --fileprefix=/home/pgf/ncrm/ --good |
    grep ^X-CRM114-

and the result always looks something like this:

    X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
    X-CRM114-CacheID: sfid-20121217_014442_483413_6F434E5A 
    X-CRM114-Notice: Please train this message. 
    X-CRM114-Action: LEARNED AND CACHED GOOD

why might crm be taking so long to train on this new input source?

i'll include my css stats, below.

any ideas?

thanks,
paul


$ cssdiff spam.css nonspam.css
Sparse spectra file spam.css has 1048577 bins total
Sparse spectra file nonspam.css has 1048577 bins total

 File 1 total features            :       611764
 File 2 total features            :       581195

 Similarities between files       :       145867
 Differences between files        :       494454

 File 1 dominates file 2          :       465897
 File 2 dominates file 1          :       435328


$ cssutil -rb spam.css

 Sparse spectra file spam.css statistics: 

 Total available buckets          :      1048577 
 Total buckets in use             :       471530  
 Total in-use zero-count buckets  :            0  
 Total buckets with value &amp;gt;= max  :            0  
 Total hashed datums in file      :       611764
 Documents learned                :         2249  
 Features learned                 :       611767  
 Average datums per bucket        :         1.30
 Maximum length of overflow chain :           44  
 Average length of overflow chain :         2.27 
 Average packing density          :         0.45

$ cssutil -rb nonspam.css

 Sparse spectra file nonspam.css statistics: 

 Total available buckets          :      1048577 
 Total buckets in use             :       470676  
 Total in-use zero-count buckets  :        18763  
 Total buckets with value &amp;gt;= max  :            0  
 Total hashed datums in file      :       581195
 Documents learned                :         2199  
 Features learned                 :       581199  
 Average datums per bucket        :         1.23
 Maximum length of overflow chain :           33  
 Average length of overflow chain :         2.26 
 Average packing density          :         0.45
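
For anyone comparing reports like these, here is a small Python sketch that parses the integer lines of a cssutil report and recomputes the two averages from them. It assumes that packing density is buckets in use divided by available buckets, and that datums per bucket is hashed datums divided by buckets in use; both agree with the spam.css numbers above. The sample is a three-line excerpt of that report.

```python
import re

SAMPLE = """\
 Total available buckets          :      1048577
 Total buckets in use             :       471530
 Total hashed datums in file      :       611764
"""


def parse_cssutil(report):
    """Map each 'label : integer' line of a cssutil report to {label: int}."""
    stats = {}
    for line in report.splitlines():
        match = re.match(r"\s*(.+?)\s*:\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


stats = parse_cssutil(SAMPLE)
packing = stats["Total buckets in use"] / stats["Total available buckets"]
datums_per_bucket = stats["Total hashed datums in file"] / stats["Total buckets in use"]
print(f"packing density {packing:.2f}, datums per bucket {datums_per_bucket:.2f}")
```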



=---------------------
 paul fox, pgf-YqHaedqsyMbgnLo7xJiJXVwrdKVBSruJ&amp;lt; at &amp;gt;public.gmane.org (arlington, ma, where it's 27.3 degrees)

&lt;/pre&gt;</description>
    <dc:creator>Paul Fox</dc:creator>
    <dc:date>2012-12-17T14:35:57</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9601">
    <title>Classifying general email</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9601</link>
    <description>&lt;pre&gt;Hello,

I have an email account that receives a fairly high volume of email (500-800 messages daily), and I would like to categorize/classify these emails automatically into about 100 categories/folders.
The last two months I have been trying out POPFile with some limited success. (http://getpopfile.org/)

After inspecting keywords and decision trees in POPFile, it seems to me that a classifier using phrases might classify this type of email better than the Naive Bayes implementation in POPFile.

As I'm not a programmer, but trying to learn, I have been searching for preexisting tools that might work for what I want to achieve.
Searching the web, I can see that leaves me with two options for classifiers: CRM114 or OSBF-lua. As I understand it, CRM114 now uses the OSBF classifier as the default.

Are there any implementations/scripts out there that will allow multiple classes for general email sorting using CRM114 or OSBF-lua as the classification engine?
It seems from what I read that this should be possible, but I'm unable to find any practical implementations to test with.
As I understand it, both mailfilter.crm and mailreaver.crm use only 3 classifications (spam, nonspam, and unsure), so I presume they would not be useful for me in this regard.

I could use some advice in how to go about this the right way.
Are there any scripts or tools out there that will do general email classification with CRM114 or OSBF-lua that could be implemented with maildrop or procmail on a Linux OS?
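
To make the idea of phrase-based, multi-folder classification concrete, here is a toy Python sketch. It builds OSB-style features (each token paired with the next few tokens, plus the size of the gap between them) and scores a message against any number of folders by counting shared features. The class ToyMultiClass and the folder names are hypothetical. Real CRM114 and OSBF-lua hash and weight their features, which this sketch does not attempt.

```python
from collections import Counter


def osb_features(text, window=5):
    """Pair each token with the next window-1 tokens, recording the gap size."""
    tokens = text.lower().split()
    features = []
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            features.append((left, j - i - 1, tokens[j]))
    return features


class ToyMultiClass:
    """Hypothetical N-folder classifier: counts shared features per folder."""

    def __init__(self):
        self.seen = {}

    def learn(self, folder, text):
        self.seen.setdefault(folder, Counter()).update(osb_features(text))

    def classify(self, text):
        scores = {
            folder: sum(counts[f] for f in osb_features(text))
            for folder, counts in self.seen.items()
        }
        return max(scores, key=scores.get), scores


clf = ToyMultiClass()
clf.learn("invoices", "please find attached the invoice for march")
clf.learn("travel", "your flight booking is confirmed for march")
best, scores = clf.classify("attached the invoice for april")
print(best, scores)
```

An "unsure" outcome could be added by requiring the best score to lead the runner-up by some margin, which is a design choice left open here.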

Any ideas or pointers would be greatly appreciated.

Best

Lars Sorensen





&lt;/pre&gt;</description>
    <dc:creator>Lars Bjærris</dc:creator>
    <dc:date>2012-12-12T13:17:29</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9600">
    <title>mailreaver: trained mail still classified 'UNSURE'</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9600</link>
    <description>&lt;pre&gt;Hello,

Given a sample mail in file '1':

MFOPTS="-u $HOME/mail/"
MAILFILTER="$HOME/mail/mailreaver.crm"
$MAILFILTER $MFOPTS &amp;lt; 1 &amp;gt; 2

2 now contains header

X-CRM114-Status: UNSURE (   6.88  )

The mail is ham, so I train it

$MAILFILTER $MFOPTS --good &amp;lt; 2 &amp;gt; 3

3 contains

X-CRM114-Action: LEARNED AND CACHED GOOD

Great. Now in my setup, I want to re-inject this trained message into
my mail. Historically (and now) I just save it to INBOX. But what
I would really like to do is have my procmail recipe file it
accordingly. Therefore I'd like to pipe it to e.g. sendmail to be
redelivered. I invoke mailreaver via procmail so this would involve
the trained mail being reclassified. Whenever I try this:

$MAILFILTER $MFOPTS &amp;lt; 3 &amp;gt; 4
grep -i ^x-crm 4
X-CRM114-Action: LEARNED AND CACHED GOOD
X-CRM114-Version: 20090807-BlameThorstenAndJenny ( TRE 0.8.0 (BSD) ) MR-B9CF6B05 
X-CRM114-CacheID: sfid-20121123_140912_163398_EE104312 
X-CRM114-Status: UNSURE (   6.75  )
X-CRM114-Notice: Please train this message. 

So despite being learned and cached good, it is classified as UNSURE
again. I've tried deleting the X-CRM114-* headers before injecting
back to procmail and I've tried leaving them in. I'm aware the 'cached'
bit is no use since the redelivered mail will have additional headers.
I don't want to rely on a cache hit, I'd like the classifier to have
trained itself such that the mail is considered good.

This is a short mail, about 40 lines including headers, which is not
atypical for a personal email to me from my friends.

Should I expect this to work?  Alternatively, I can resolve this by splitting
my procmailrc up into separate files and constructing a concatenation of them
such that I avoid running crm114 over the learned files again, but it will be a
bit of a pain and perhaps brittle.
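
The alternative of skipping the classifier for already-trained mail could be sketched as a header check like the one below. This is only an illustration in Python, using the X-CRM114-Action header shown above; the function name already_trained is made up, and how such a check would be wired into procmail is not shown. Since anyone can forge a header, incoming mail would need that header stripped before this check is trusted.

```python
from email import message_from_string


def already_trained(raw_message):
    """True if a previous training pass stamped the message as learned."""
    action = message_from_string(raw_message).get("X-CRM114-Action", "")
    return action.upper().startswith("LEARNED")


trained = "X-CRM114-Action: LEARNED AND CACHED GOOD\nSubject: hi\n\nbody\n"
fresh = "Subject: hi\n\nbody\n"
print(already_trained(trained), already_trained(fresh))
```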

As per the header above, I'm running version 20090807-6 (Debian package),
although for some reason my mailreaver.crm file differs from that in the
package. It's possible that it is from an earlier version:

$ md5sum /home/jon/mail/mailreaver.crm /usr/share/crm114/mailreaver.crm
09041a4b975f952432dfbcd1ba7152fe  /home/jon/mail/mailreaver.crm
c1b16573137df4d9a01f8162c971308a  /usr/share/crm114/mailreaver.crm

Looking at a diff, I see only whitespace changes.

here's my bucket data

$ cssutil -b -r spam.css

 Sparse spectra file spam.css statistics: 

 Total available buckets          :      1048577 
 Total buckets in use             :       760830  
 Total in-use zero-count buckets  :            0  
 Total buckets with value &amp;gt;= max  :            0  
 Total hashed datums in file      :      1048948
 Documents learned                :         3080  
 Features learned                 :       441932  
 Average datums per bucket        :         1.38
 Maximum length of overflow chain :          417  
 Average length of overflow chain :         6.33 
 Average packing density          :         0.73

$ cssutil -b -r nonspam.css

 Sparse spectra file nonspam.css statistics: 

 Total available buckets          :      1048577 
 Total buckets in use             :       897183  
 Total in-use zero-count buckets  :           20  
 Total buckets with value &amp;gt;= max  :            0  
 Total hashed datums in file      :      1125202
 Documents learned                :         2916  
 Features learned                 :       625412  
 Average datums per bucket        :         1.25
 Maximum length of overflow chain :          551  
 Average length of overflow chain :        11.70 
 Average packing density          :         0.86
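
The long overflow chains in both files are what one would expect from a hash table that is filling up. The Python simulation below assumes the css format resolves collisions by linear probing into the next free bucket, which is an assumption here and not confirmed in this thread. It uses a small synthetic table with random keys, so only the trend is meaningful, not the absolute numbers.

```python
import random


def average_chain_length(buckets, density, seed=0):
    """Fill an open-addressing table to `density`; return the mean run of used slots."""
    rng = random.Random(seed)
    table = [False] * buckets
    for _ in range(int(buckets * density)):
        slot = rng.randrange(buckets)
        while table[slot]:               # linear probe to the next free slot
            slot = (slot + 1) % buckets
        table[slot] = True
    # A chain is a maximal run of consecutive occupied slots.
    layout = "".join("x" if used else " " for used in table)
    runs = [len(run) for run in layout.split()]
    return sum(runs) / len(runs)


for density in (0.45, 0.73, 0.86):
    length = average_chain_length(20000, density)
    print(f"density {density:.2f}: average chain {length:.2f}")
```

If the assumption holds, chain length grows much faster than density does, which would make both lookups and learning slower as the files approach full.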

I've attached mailfilter.cf should it be of use.


Thanks for any advice!
#  mailfilter.cf  -- Config file for mailfilter, mailreaver, mailtrainer
#  
#    You MUST edit the fields for "Secret Password", "mime decoder", and
#    "cache_dupe_command".  Just those THREE things.  
#
#     Changes to all other values are optional.
#
#    Many of the options here have two or three alternatives; for your
#     convenience, we have put all of the reasonable alternatives 
#      on sequential lines.  Uncomment the one you want, and leave the
#       others commented out.  If you leave more than one uncommented, the
#       last one is the one that's used.  Don't do that; it's ugly.
#
#   After you edit this file, don't forget to edit 'rewrites.mfp' 
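
The "last uncommented one wins" rule described above can be sketched in a few lines of Python. This is a hypothetical reader for the ':name: /value/' lines, written only to illustrate the rule; CRM114's own scripts read the file differently. The sample deliberately leaves two mime_decoder lines uncommented to show the override.

```python
import re

SAMPLE_CF = r"""
#:do_base64: /no/
:do_base64: /yes/
:mime_decoder: /mimencode -d/
:mime_decoder: /openssl base64 -d/
:cache_dupe_command: /\/bin\/ln/
"""


def parse_mailfilter_cf(text):
    """Collect ':name: /value/' settings; a later line overrides an earlier one."""
    settings = {}
    for line in text.splitlines():
        # Comment lines start with '#', so they never match this pattern.
        match = re.match(r"\s*:([\w.]+):\s*/(.*)/\s*$", line)
        if match:
            settings[match.group(1)] = match.group(2).replace("\\/", "/")
    return settings


settings = parse_mailfilter_cf(SAMPLE_CF)
print(settings)
```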

#     ---------&amp;gt;&amp;gt;&amp;gt;  You MUST set the following correctly! &amp;lt;&amp;lt;&amp;lt;-------
#
#    If you leave it as "DEFAULT-PASSWORD", you will not be able to 
#    access the mail-to-myself commanding system, as "DEFAULT-PASSWORD"
#    is specifically _disabled_ as a legal password.  Just pick something, eh?
#
:spw: /DEFAULT_PASSWORD/

# ----- If you want a verbose startup, turn this on.  Note that this is
#  ----- intentionally _after_ the password is set, so a verbose startup
#   ----- will not reveal your password.  
#
#:verbose_startup: /SET/
:verbose_startup: //

#
#     ---------&amp;gt;&amp;gt;&amp;gt;  You MUST set the following correctly! &amp;lt;&amp;lt;&amp;lt;-------
#
#     --- Some mail systems do mime decoding with "mimencode -d" or "-u".
#     --- Others (such as Red Hat 8.0) use "mewdecode" .
#     --- Yet others (such as Fedora Core 3) use "openssl base64 -d" .
#     --- Yet Others (i.e. *BSDs) can use "base64" .
#     --- See which one is on your system and use that one- comment
#     --- the others out.  If you can't figure out what your base64 mime
#     --- decoder is, or don't want mime decoding, set :do_base64: to /no/
#     --- but expect a significant accuracy decrease if you do this.
#
#:do_base64: /no/
:do_base64: /yes/
#
#:mime_decoder: /mewdecode/
#:mime_decoder: /mimencode -d/
#:mime_decoder: /mimencode -u/
#:mime_decoder: /base64 -d/
:mime_decoder: /openssl base64 -d/
#:mime_decoder: /normalizemime/


#     ---------&amp;gt;&amp;gt;&amp;gt;  You MUST set the following correctly! &amp;lt;&amp;lt;&amp;lt;-------
#
#    --- Linux (and Unix) systems use "hardlinks" to make a file 
#    --- appear in more than one place, while not actually using up 
#    --- extra disk space.  Sadly, it is the case that most 
#    --- Windows systems have no such feature.  So, you must set the
#    --- following for what kind of system you are actually using.
#    --  Note to other developers: here's where to put other system-dependent
#    --  syscall commands.
#
#    --- Use the default /ln/ for LINUX and UNIX systems (does a hard-link,
#    --- does not use up disk space or inodes).  Change this to the /copy/
#    --- command for WINDOWS systems (95, 98, NT, XP)
#
#    --- Mild security issue: to avoid a theoretical exploit where a user
#    --- gets their commands re-aliased, make sure you use the fully qualified
#    --- commandname (that is, starting in the root directory).
#
:cache_dupe_command: /\/bin\/ln/
#:cache_dupe_command: /copy/



###########################################################################
#
#                END of things you absolutely MUST set.  Feel free
#            to keep reading though...
#
###########################################################################

###########################################################################
#
#             START of things you might likely want to set.  These 
#            are probably OK for you, but many users change these things.
#
##########################################################################
          
#  ----------- define an optional target for where to send spam (that is,
#   ------------ emails that we want to "fail", or reject to another
#    ------------ address.  Note that this is NOT a "program fault" address,
#     ------------ but where to send "bad" email to in the general case.
#      ------------ You can specify tightly controlled conditions too,
#       ------------ (see the next stanza)  
#   ----------- To NOT forward this to another account, just leave the
#    ----------- address as the empty string, which is '//'.
#     ----------- This works fine especially if your mail reader program
#      ----------- can sort based on the ADV and UNS (or whatever you choose
#       ----------- to use as flagging strings) in the "Subject:" field.
#     ------- CAUTION- some systems are buggy and _REQUIRE_ a user-qDqigq+NkynUF6G2QusZNg&amp;lt; at &amp;gt;public.gmane.org
#    ----- in the following to forward spammy mail correctly.  WTF??? :-(
#
#:general_fails_to: /somebody-oHC15RC7JGQqcZcGjlUOXw&amp;lt; at &amp;gt;public.gmane.org/
:general_fails_to: //


#   -------- If you would prefer to send specific kinds of spam to 
#    -------- different mailboxes, here's where to do it.
#     -------- (be sure to uncomment the line!).  Again, these are
#      --------- not "program fault" conditions, just different filter results.
#
# :fail_priority_mail_to:  /where_priority_fails_go/
# :fail_blacklist_mail_to:  /where_blacklist_fails_go/
# :fail_SSM_mail_to:  /where_Classifier_fails_go_for_mailFILTER/
# :fail_classify_mail_to: /where_classifier_fails_go_for_mailREAVER/


#  ---------  Do we give nonspam, spam, and unsure an exitcode of 0 
#  ---------   (for most standalone apps) or something else?
#  ---------     Usually we use an exit code of 1 for "program fault",
#  ---------      but change it if you need to use 0/1 for good/spam 
#  ---------       Don't use an exit code greater than 128 (it breaks BASH!)
#  ---------     If you use exit codes (procmail doesn't) change it here.
:rejected_mail_exit_code: /0/
:accepted_mail_exit_code: /0/
:unsure_mail_exit_code: /0/
:program_fault_exit_code: /1/

#######################################################################
#
#         END of things you are likely to want to change.  
#
#         Anything following is starting to approach true customization.
#        Feel free to explore and poke around.
######################################################################

# -----------Do we want to add the optional headers to the mail?
# -----------If turned on, will add X-CRM114-Whatever: headers on each
# -----------incoming email.  (note- this does NOT turn off the cache-id header
#
:add_headers: /yes/
#:add_headers: /no/


# ---------  do we add the statistics report?
#:add_verbose_stats: /yes/
:add_verbose_stats: /no/


# ---------  do we add the mailtrainer report to the top of the message body
# ---------  after training?
#:add_mailtrainer_report: /yes/
:add_mailtrainer_report: /no/


#  ---------  Do we enable long-form explains (with lots of text)?
#  -- you can have no extra stuff, add it as text, or add it as an attachment.
#  -- (only available in mailfilter, not mailreaver)
#
:add_extra_stuff: /no/
# :add_extra_stuff: /text/
# :add_extra_stuff: /attachment/


#  ---------  Do we want to insert a "flagging" string on the subject line, 
#  ---------  perhaps to insert an 'ADV:'  ?  Whatever string we put here
#  ---------  will be inserted at the front of the subject if we think the
#  ---------  mail is spam.
#
 :spam_flag_subject_string: //
#:spam_flag_subject_string: /ADV:/

#  ---------  Do we want to insert a "flagging" string on the subject line
#  ---------  for good email?  Usually we don't.... so we set this to the
#  ---------  null string - that is, //
:good_flag_subject_string: //

#  ------------Similarly, do we want to insert a "flagging" string on 
#  -------------the subject line of an "unsure" email?  This way we know
#  --------------we need to train it even if "headers" is turned off.
:unsure_flag_subject_string: /~/

# ------------- Do we want Training ConFirmation flags on the results of
# ------------- a message to be learned?  Default is "TCF:".
#:confirm_flag_subject_string: /TCF:/
:confirm_flag_subject_string: //


# ---------  Do we want to do any "rewrites" to increase generality and
#  ---------- (usually) accuracy?  IF 'yes', be sure to edit rewrites.mfp!
#    --------- NOTE: this option is somewhat slow.  If your mailserver is
#      --------- maxed out on CPU, you might want to turn this off.
#
:rewrites_enabled: /yes/
#:rewrites_enabled: /no/


#  ---------  Do we copy incoming text into allmail.txt ?  default is yes, but
#   ---------  experienced users will probably set this to 'no' after testing
#    ---------  their configuration for functionality.
#
#:log_to_allmail.txt:  /yes/
:log_to_allmail.txt: /no/

#   -------  Another logging option - log all mail to somewhere else
#    -------  entirely.  Whatever pathname is given here will be prefixed
#     -------  by :fileprefix: 
#      -------  To not use this, set it to the null string .. //   
#       -------  Remember to backslash-escape path slashes!
:log_all_mail_to_file: //
#:log_all_mail_to_file: /my_personal_mail_log_file_name.txt/

#     --------- When we log messages, should we use a mail separator?
#      --------- And, if so, what?
:mail_separator: /\n-=-=-=-=-=-=- cut here -=-=-=-=-=-=-\n/

#
#     ---------- Message Cacheing for retraining - do we keep a cache of
#    ---------- messages we've classified recently "in the wild" as retrain
#   ---------- texts?  This uses up some disk space, but means that we can
#  ---------- use mailtrainer.crm on these messages to autotune the classifier.
# ---------- Default is to cache into the directory reaver_cache ; 
#  ---------- if you don't want this, set it to // .  If you don't use this,
#   ---------- you can't really use mailtrainer.crm, and you must keep your
#    ---------- headers scrupulously clean in all train messages.  Recommended
#     ---------- to leave this unchanged unless you are VERY short of disk.
#
:text_cache: /reaver_cache/
# :text_cache: //


#   ----- How do we invoke the trainer (as in, just the invocation 
#   ------ of CRM114 on mailtrainer.crm.  Usually this is just obvious,
#   ------- but if you don't have CRM114 installed in the search path, here's
#   -------- where you can set trainer invocation to be via whatever path
#   --------- you want it (the second example is if you haven't installed
#   ---------- CRM114 at all, but are running the crm114_tre static binary
#   ----------- right out of the local directory.)
#      
#     -- use this next one if you have installed CRM114 with "make install"
#     -- (This is preferred and is the default)
:trainer_invoke_command: /.\/mailtrainer.crm/
#
#     -- use this one if you can't do a "make install" and so must run the
#     --- crm114_tre binary directly out of the current working directory.
# :trainer_invoke_command: /.\/crm114_tre mailtrainer.crm /


#    ------  If we're cacheing for retraining, we're probably using
#     ------  mailtrainer.crm or some other variant.  In that case,
#      ------  you will want a "randomizer" to present the training
#       ------  examples to the classifier in some random but balanced order.
#        ------  You have two choices - you can either use the "sort"
#         ------  command on some random character in the filename (this
#          ------  is NOT recommended) or use the "shuffle.crm" program.
#           ------  We _strongly_ recommend using shuffle.crm; the default
#            ------  options to shuffle.crm will work fine.  Alternatively,
#             ------  you can use the "sort --key 1.2" on date-named files to
#              -----   achieve chronological training
:trainer_randomizer_command: /.\/shuffle.crm/
#:trainer_randomizer_command: /.\/crm114 shuffle.crm/
#:trainer_randomizer_command: /sort --key 1.2/


#  ---------  Do we log rejected mail to a file?  default yes, but most
#   ---------  users should set this to no after testing their 
#    ---------  configuration to verify that rejected mail goes to the 
#     ---------  reject address.  Only works in mailfilter.crm
#
#:log_rejections: /yes/
:log_rejections: /no/

#  ------- alternate rejection logging - set this pathname to non-null
#   ------  to log rejections elsewhere.  Only for mailreaver.crm.
#    -----   Set to NULL ( // ) to turn this off.
:log_rejections_to_file: //
#:log_rejections_to_file: /this_is_my_rejected_email_log_file.txt/
 

#   ----------Do we want to enable "inoculation by email"?
#   --------(leave this off unless you want RFC inoculations)
#
:inoculations_enabled: /no/
#:inoculations_enabled: /yes/


#  --------- How many characters of the input do we really trust to be 
#  ---------- worthy of classification?  Usually the first few thousand
#  ----------- bytes of the message tell more than enough (though we've 
#  ------------ been "noticed" by spammers, who are now packing 4K of 
#  ------------- innocuous text into their headers.  No problemo... :) )
#
#:decision_length: /4096/
#:decision_length: /64000/
:decision_length: /16000/
#  -----  for entropy users ONLY - 4K is plenty!
#:decision_length: /4096/



#  ------------ Do we want to expand URLs (that is, fetching the contents
#  ------------- of a URL and inserting that after the url itself?)
#  -------------- By default this is off, but turn it on if you want
#  --------------- to experiment.  
:expand_urls: /no/
# :expand_urls: /yes/
#                 
#         WGET options - 30-second timeout, output to stdout. 
#         HACK - use the proper --user-agent="IEblahblah" for max effect!
:url_fetch_cmd:  /wget -T 30 -O -  /
#         and trim the URL text to not more than 16000 bytes of text.
:url_trim_cmd:  / head -c 16000 /


#######################################################################
#
#   -------------------  YOU REALLY SHOULD STOP HERE -------------------
#   ---------  values below this line are usually OK for almost all 
#   ---------  users to use unchanged - Gurus only beyond this point.
#
#######################################################################
#
#   If you want to change things here, go ahead, but realize you are
#   playing with things that can really hurt accuracy.  
#
#   This being open source, if you don't *think* about changing it,
#   what would be the use of it being open source?  That said, this
#   _is_ open source- you break it, you get to keep _both_ pieces!
#
#
#   ------------ CLF - The Classifier Flags ----------
#
#   ---------  Which classifier flags do we use?  Default for 20060101 has 
#   ---------  been changed to OSB UNIQUE MICROGROOM.  
#
#   ---------  A null setting gets you straight Markovian, without
#   ---------  microgrooming.   OSB uses less memory, is faster,
#   ---------  and is usually more accurate.  Correlative matching is 
#   ---------  100x - 1000x slower, but can match anything (binaries,
#   ---------  wide chars, unicode, viruses, _anything_).  Winnow is a
#   ---------  non-statistical learning classifier with very nice
#   ---------  accuracy (up to 2x SBPH).  Hyperspace is a pseudogaussian
#   ---------  KNN (K-nearest-neighbor) matcher.
#
#   ---------  This is also where we set whether to use microgrooming
#   ---------  or Arne optimization (they're currently mutually exclusive).
#   ---------  If you turn off microgrooming you get Arne optimization
#   ---------  automatically.
#
#   ---------  If you _change_ this, you _must_ empty out your .css or
#   ---------  .cow files and build fresh ones, because these
#   ---------  classifiers do NOT use compatible data storage formats!
#
#:clf: /microgroom/
#:clf: /osb/
#:clf: /osb microgroom/
:clf: /osb unique microgroom/
#:clf: /correlate/
#:clf: /winnow/
#:clf: /osbf/
#:clf: /osbf microgroom/
#:clf: /hyperspace/
#:clf: /hyperspace unique/
#
#
#
#     --------Thresholds for GOOD/UNSURE/SPAM thick-threshold training
#     -------
#     ------ A very small thick threshold (or zero!) works for Markovian.
#     ----- A thick threshold of 5 to 20 seems good for OSB, OSBF, 
#     ---- Hyperspace, Bit-Entropy, and Winnow.  If you want an asymmetric
#     --- threshold system, you can do that by having :good_threshold:
#     -- be different from :spam_threshold:.  The defaults are +/- 10.0
#
#
#   ---- Things rated equal to or better than this are GOOD email
#:good_threshold: /0.01/
#:good_threshold: /5.0/
:good_threshold: /10.0/
#:good_threshold: /20.0/
#
#   ---- Things rated less than or equal to this are SPAM
#:spam_threshold: /-0.01/
:spam_threshold: /-5.0/
#:spam_threshold: /-10.0/
#:spam_threshold: /-20.0/
  
#   ---- mailfilter uses a single threshold and operates symmetrically.
#   --- (this is only to provide backward compatibility)
:thick_threshold: /5.0/

#   ---- What regex do we use for LEARN/CLASSIFY?  the first is the
#   ---- "old standard".  Other ones are handy for different spam
#   ---- mixes.  The last one is for people who get a great deal of
#   ---- packed HTML spam, which is almost everybody in 2003, so it
#   ---- used to be the default.  But since spammers have shifted away
#   ---- from this, it isn't the default any longer.  IF you change
#   ---- this, you MUST rebuild your .css files with decent
#   ---- amounts of locally-grown spam and nonspam ( if you've been
#   ---- following instructions and using the "reaver" cache, this is
#   ---- easily done! )
#
:lcr: /[[:graph:]]+/
#:lcr: /[[:alnum:]]+/
#:lcr: /[-.,:[:alnum:]]+/
#:lcr: /[[:graph:]][-[:alnum:]]*[[:graph:]]?/
#:lcr: /[[:graph:]][-.,:[:alnum:]]*[[:graph:]]?/
#
#  this next one is pretty incomprehensible, and probably wrong...
#:lcr: /[[:print:]][/!?\#]?[-[[:alnum:]][[:punct:]]]*(?:[*'=;]|/?&amp;gt;|:/*)?
#
#
#     Expansions for antispamming.  You almost _always_ want these on,
#     unless you're debugging something really bizarre.

#  ---------  Do we enable spammus interruptus undo?  
:undo_interruptus: /no/
#:undo_interruptus: /yes/
#
#
#
# ------------ HIGHLY EXPERIMENTAL - automatic training!
#             enable this only if you really want to live VERY dangerously!
#              "Do you feel lucky today, punk?  Well, do ya?"
#
:automatic_training: /no/
#
#       ---- if you are living dangerously and have turned on autotraining,
#            you should also set the following to point to an address that
#            will get read on a quick basis, because this is where autotrain
#            verifications will go.
# 
#:autotrain_address: /root/
#
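
The threshold and decision-length settings in the configuration above
can be illustrated with a minimal Python sketch.  It is not code from
mailreaver.crm.  The function names classify and trim are hypothetical,
and the default values are simply the active settings shown above
(good 10.0, spam -5.0, decision length 16000).  The comparisons follow
the config comments: a score equal to or better than the good threshold
is GOOD, a score less than or equal to the spam threshold is SPAM, and
anything in between is UNSURE.

```python
# Illustrative only: classify() and trim() are hypothetical helpers,
# not part of CRM114.  Defaults mirror the config values quoted above.

def classify(pr, good_threshold=10.0, spam_threshold=-5.0):
    """Map a classifier score (pR) to a verdict using two thresholds."""
    if pr >= good_threshold:
        return "GOOD"
    if spam_threshold >= pr:
        return "SPAM"
    return "UNSURE"


def trim(text, decision_length=16000):
    """Keep only the leading part of a message, as :decision_length: does."""
    return text[:decision_length]


if __name__ == "__main__":
    for score in (12.3, 10.0, 0.0, -5.0, -40.0):
        print(score, classify(score))
    print(len(trim("x" * 20000)))
```

With the values above the unsure band is asymmetric: a message needs a
score of at least 10.0 to be delivered as good, but only -5.0 or lower
to be rejected as spam.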
_______________________________________________
Crm114-general mailing list
Crm114-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f&amp;lt; at &amp;gt;public.gmane.org
https://lists.sourceforge.net/lists/listinfo/crm114-general
&lt;/pre&gt;</description>
    <dc:creator>Jon Dowland</dc:creator>
    <dc:date>2012-11-24T10:56:13</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9594">
    <title>Twitter</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9594</link>
    <description>&lt;pre&gt;I'm writing a bot to find people to follow on Twitter who post
relevant material on a given topic. I need to be able to avoid
spammers, follow bots and Beliebers. What's the best classifier to use
on short text like a tweet or a Twitter bio? I'm trying to make this
bot behave more like a human, discovering content by crawling the
social graph rather than just following a fixed set of hash tags.

Chris
--
"Nothing says, 'I love you,' like 200 feet of parachute cord and a cargo net."
   - Fred, Scooby-Doo! Mystery Incorporated

&lt;/pre&gt;</description>
    <dc:creator>Chris Babcock</dc:creator>
    <dc:date>2012-09-16T18:44:37</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9593">
    <title>clean up libcrm114 release?</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9593</link>
    <description>&lt;pre&gt;Hello,
I would like to build a FreeBSD port for libcrm114 and have some
problems with the current .tar.gz file.
Is it possible to prepare a new libcrm114 release tarball with
- a Makefile for shared libraries,
- single crm114.h as a public interface,
- without compiled objects?

Several Makefile versions have been written already, e.g. on github:
https://github.com/pmundkur/libcrm114/blob/master/Makefile
https://github.com/jflatow/libcrm114/blob/master/Makefile
IMHO it would just be nice to have a canonical one to build upon.

&lt;/pre&gt;</description>
    <dc:creator>Martin Schütte</dc:creator>
    <dc:date>2012-06-04T11:39:05</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9592">
    <title>libcrm114 language bindings</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9592</link>
    <description>&lt;pre&gt;Hello,
I wrote a libcrm114 module for Perl and saw there are also modules for
other languages.

Maybe these could be mentioned in the Wiki:
- Erlang (https://github.com/jflatow/erlcrm114)
- Python (https://github.com/pmundkur/libcrm114)
- PHP (http://gymx.net/php-crm114/)
- Perl (https://metacpan.org/module/Text::AI::CRM114)

&lt;/pre&gt;</description>
    <dc:creator>Martin Schütte</dc:creator>
    <dc:date>2012-06-04T11:37:04</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9588">
    <title>priolist.mfp regex problem</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9588</link>
    <description>&lt;pre&gt;I have been successfully using priolist.mfp to match email addresses that I
want to black/whitelist for ages. Now I want to block on a particular phrase
such as:

yahoo messenger online now

which is from a particularly egregious sort of spam/scam my organization is
receiving. I have tried putting the following combinations, none of which have
worked:

-yahoo messenger online now
-yahoo\ messenger\ online\ now
-yahoo\smessenger\sonline\snow
-"yahoo messenger online now"

and probably various others which I cannot now reproduce. I really expected the
first one to work. I have finally stumbled upon a combination which does work:

-yahoo.messenger.online.now

Why would . work (I'm plenty familiar with regex and know it will match
anything) but \s to match the spaces or simply raw spaces not work?

Is there some unintentional parsing being done on the whitespace I put in which
is breaking things?

Thanks!
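
One way to narrow this down is to check the candidate patterns in a
generic regex engine, outside CRM114.  The sketch below uses Python's
re module, which is NOT the TRE engine CRM114 uses, so it is only a
sanity check; the sample text is made up, and the leading "-" from each
priolist line is dropped because it is the priority sign rather than
part of the pattern.  If a pattern matches here but not via
priolist.mfp, that is consistent with the file parsing or the engine
treating whitespace differently, though it does not prove it.

```python
import re

# Hypothetical sample text containing the phrase to be blocked.
SAMPLE = "status: yahoo messenger online now, click here"

# The five patterns from the message above, without the leading "-".
CANDIDATES = [
    r"yahoo messenger online now",
    r"yahoo\ messenger\ online\ now",
    r"yahoo\smessenger\sonline\snow",
    r'"yahoo messenger online now"',
    r"yahoo.messenger.online.now",
]


def check(patterns, text):
    """Return a dict mapping each pattern to whether it matches text."""
    return {pattern: re.search(pattern, text) is not None
            for pattern in patterns}


results = check(CANDIDATES, SAMPLE)

if __name__ == "__main__":
    for pattern, matched in results.items():
        print(repr(pattern), "match" if matched else "no match")
```

In this engine only the quoted form fails, because the sample text
contains no literal double quotes.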

&lt;/pre&gt;</description>
    <dc:creator>Tracy Reed</dc:creator>
    <dc:date>2012-04-12T00:14:20</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9587">
    <title>libcrm and svm</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9587</link>
    <description>&lt;pre&gt;Hi Bill,

It's been a long time since I last wrote to the list. I'm interested in playing around with libcrm and the SVM classifier, but am a bit confused.

1. Where might I find the "best" version of the library? The link on the wiki seems pretty old. For me, "best" means reliable enough for doing personal spam filtering.
2. The HowTo mentions CRM114_SVM, but I see that there is also CRM114_LIBSVM and that the two choices follow different code paths in crm114_base. Which should I use, or why might I choose one vs. the other?

Best Regards,
   Steve


&lt;/pre&gt;</description>
    <dc:creator>Steve Pellegrin</dc:creator>
    <dc:date>2012-03-29T01:55:14</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9578">
    <title>Confidential information scanning backup files</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9578</link>
    <description>&lt;pre&gt;I was discussing a security issue with my new employer today, about
scanning backups of servers that should not have confidential data on them
for precisely such data. In the short term, scanning the email would do,
especially if attachments can be scanned. And I thought of CRM114 for the
task, instead of the very slow and painful tools that are often used now.

Is there a toolkit for such scanning? I'd much prefer to avoid taking on the
full integration project, but if anyone's already got such a toolkit
assembled, even if it's a commercial toolkit, I'd love to review it for use
at my new workplace.
&lt;/pre&gt;</description>
    <dc:creator>Nico Kadel-Garcia</dc:creator>
    <dc:date>2012-02-28T03:22:22</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9577">
    <title>modest update to CLASSIFY_DETAILS.txt</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9577</link>
    <description>&lt;pre&gt;Hi, All.

I'm back to looking at CRM114, and I decided to update some of the
documentation, working from "the (possibly) slightly unstable latest
mainline version" -- thanks for the continued good work, and I hope the 
updates prove useful.  This is the first.

/Jskud

--- crm114-20120205-ORIG/CLASSIFY_DETAILS.txt	2009-09-11 11:25:57.000000000 -0700
+++ crm114-20120205/CLASSIFY_DETAILS.txt	2012-02-06 07:49:00.000000000 -0800
@@ -18,7 +18,8 @@
 The current distribution builds in this set of classifiers.  The
 classifiers are:
 
-1) SBPH Markovian (the default) This is an extension of Bayesian
+1) SBPH Markovian (the default) - This classifier uses
+   Sparse Binary Polynomial Hashing (SBPH), an extension of Bayesian
    classification, mapping features in the input text into a Markov
    Random Field.  This turns each token in the input into 2^(N-1)
    features, which gives high accuracy but at high computation
@@ -52,7 +53,7 @@
    other classifiers.  It _will_ work against binary files, though,
    which none of the other classifiers will.
 
-5) Hyperspatial classification - this experimental classifier
+5) Hyperspatial classification - This experimental classifier
    tokenizes, but does not use Bayes law at all, nor statistical
    "clumping".  During learning, each example document generates a
    single point in a 4 billion dimensional hyperspace.  The
@@ -719,7 +720,7 @@
 
 
 
-The format of a SBPH or OSB Markovian .css file (and, for winnow a
+The format of a SBPH or OSB Markovian .css file (and, for Winnow a
 .cow file) is a 64-bit hash of a feature (whether the feature is a
 single word, a bigram, or a full SBPH does not matter) and a 32-bit
 representation of the value.  In .css files, the 32 bits is an
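
The "2^(N-1) features per token" figure in the SBPH description quoted
in this patch can be illustrated with a small Python sketch.  This is
not CRM114's implementation: the function name sbph_features is
hypothetical, the window size N = 5 is an assumption, "*" is just a
placeholder for a skipped token, and real features are hashed rather
than kept as strings.  Within each window the newest token is always
kept, and every one of the N-1 older tokens is either kept or skipped,
which gives 2^(N-1) combinations.

```python
from itertools import product


def sbph_features(tokens, window=5):
    """Return SBPH-style sparse features as strings (hashing omitted)."""
    features = []
    for end in range(len(tokens)):
        start = max(0, end - window + 1)
        older = tokens[start:end]      # up to window-1 preceding tokens
        newest = tokens[end]           # always part of the feature
        # Each older token is independently kept or replaced by "*".
        for mask in product((False, True), repeat=len(older)):
            kept = [tok if keep else "*" for tok, keep in zip(older, mask)]
            features.append(" ".join(kept + [newest]))
    return features


if __name__ == "__main__":
    feats = sbph_features("buy cheap pills online now".split())
    # The last 16 entries belong to the final, full window.
    for feature in feats[-16:]:
        print(feature)
```

With a full window of five tokens the newest token contributes
2^4 = 16 features, which is why the patch text describes the method as
accurate but computationally expensive.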

&lt;/pre&gt;</description>
    <dc:creator>Jskud.CRM114-hFAK3oOPH3QAvxtiuMwx3w&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2012-02-07T06:27:59</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9576">
    <title>updates to CRM114_Mailfilter_HOWTO.txt</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9576</link>
    <description>&lt;pre&gt;Hi, All.

I'm back to looking at CRM114, and I decided to update some of the
documentation, working from "the (possibly) slightly unstable latest
mainline version" -- thanks for the continued good work, and I hope the 
updates prove useful.  This is the second of two.

I mostly reworked the formatting to make it consistent, and made the
step titles consistent as well, fixing a few obvious typos in the
process.

/Jskud

--- crm114-20120205-ORIG/CRM114_Mailfilter_HOWTO.txt	2009-09-11 11:25:57.000000000 -0700
+++ crm114-20120205/CRM114_Mailfilter_HOWTO.txt	2012-02-06 19:52:36.000000000 -0800
@@ -7,7 +7,7 @@
 The CRM114 &amp;amp; Mailfilter HOWTO
 
     -Bill Yerazunis, 2003-09-18
-(last update 2009-03-02)
+(last update 2012-02-06)
 
 
 This is the CRM114 Mailfilter HOWTO.  It describes how to set up CRM114
@@ -31,7 +31,7 @@
 
    ----------------------------------------------------------
 
-That said, I hope CRM114, Mailreaver, and Mailreaver is useful to you;
+That said, I hope CRM114, Mailfilter, and Mailreaver is useful to you;
 it's been very useful to me.  It's been keeping my mailbox clear of
 clutter for since 2002; I'm convinced it has better performance than
 I-the-human at killing spam without accidentally deleting important
@@ -64,12 +64,15 @@
 
      - Bill Yerazunis (wsy-MXLR4ItNKm8&amp;lt; at &amp;gt;public.gmane.org)
 
--------------------------------------------------------------------
 
-Step 0:  Scientes Inamicae  (Know Thy Enemy)
+------------------------------------------------------------------------
+------------------------------------------------------------------------
+
+
+      Step 0: Scientes Inamicae  (Know Thy Enemy)
 
-These are the major steps in using CRM114 Mailfilter.  The steps are
-pretty simple:
+These are the other major steps in using CRM114 Mailfilter.  The steps
+are pretty simple:
 
       1) Downloading what you need
 
@@ -85,7 +88,7 @@
          (editing one file, most likely change is ONE line, and we tell
  you which one)
 
-      3) Setting up the needed auxilliary files
+      4) Setting up other needed files
 
  (not more than 2 files to edit of no more than 5 lines each,
  plus typing one or two commands)
@@ -115,10 +118,11 @@
  don't need to know this, but you may find it useful.
 
 
+------------------------------------------------------------------------
+------------------------------------------------------------------------
 
--------------------------------------------------------------------------
 
-                  Step 1: Downloading.
+Step 1: Downloading What You Need
 
 Get yourself a copy of a CRM114 kit.  The kits can always be found by
 visiting the CRM114 homepage at:
@@ -168,16 +172,14 @@
 
 Download the kits you will need (at least one of .src.tar.gz or
 .i386.tar.gz or .i386.rpm) and then proceed to "Step 2: Setting Up the
-Executables"
-
-
-
---------------------------------------------------------------------------
+Executables".
 
 
+------------------------------------------------------------------------
+------------------------------------------------------------------------
 
 
-                       Step 2: Setting Up the Executables
+Step 2: Setting Up the Executables
 
 In this step, you will install four binaries into your system.
 The four binaries are:
@@ -262,8 +264,9 @@
 
 
 Congratulations!  You've now completed the installation of CRM114 and
-utilities from prebuilt binaries.  Proceed to "Step 3: Setting Up Needed
-Files.
+utilities from prebuilt binaries.  Proceed to "Step 3: Configuring
+Mailfilter or Mailreaver.
+
 
   -----
 
@@ -382,14 +385,14 @@
 
 
 Congratulations!  You've now completed the installation of CRM114 and
-utilities from source.  Move on to the next step - "Step 3: Setting Up
-Your .CSS Files" .
-
+utilities from source.  Move on to the next step - "Step 3: Configuring
+Mailfilter or Mailreaver".
 
 
 ------------------------------------------------------------------------
 ------------------------------------------------------------------------
 
+
 Step 3: Configuring Mailfilter or Mailreaver
 
 In this step you will tell Mailfilter or MailReaver what you want it
@@ -500,11 +503,11 @@
 Now, proceed to "Step 4: Setting Up Other Needed Files" .
 
 
---------------------------------------------------------------------
---------------------------------------------------------------------
+------------------------------------------------------------------------
+------------------------------------------------------------------------
 
 
-Step 4: Setting Up Other Needed Files
+Step 4: Setting Up Other Needed Files
 
 Now that the crm114 language is working, you need to set up your
 .css files,  your rewrites.mfp file, and your priolist.mfp file.
@@ -512,8 +515,8 @@
 All of these files need to exist (either by being there, or by
 being symlinked to) the directory where CRM114 will "run in"
 when an actual mail comes in.  Usually this is your per-user
-directory on the mail server (if your mail server is also your
-home directory, then it's there.).    If this is inconvenient,
+directory on the mail server (if your mail server also provides your
+home directory, then it's there.).  If this is inconvenient,
 you can use the --fileprefix option on the command line to
 tell CRM114 to "change over" to a different directory.  The files
 that need to be in the home (or --fileprefix) directory are:
@@ -582,9 +585,9 @@
 whitelists", you can now say "yes, and they're even _prioritized_
 blacklists and whitelists!".
 
+  -----
 
-
-       Step 4 Part 1 - Setting up the Rewrites file.
+  Step 4 Part 1 - Setting up the Rewrites file.
 
 To set up the rewrites.mfp file, edit the file "rewrites.mfp" and
 replace the placeholders (in this case, "wsy", "merl.com", and
@@ -628,23 +631,23 @@
 router, etc, add lines in rewrites.mfp for each email name, email
 address, server, router, and so forth.  This is something you really
 _should_ do, if you have more than one email path leading to the
-account that leads to an account that is being filtered by CRM114 (if
+account that leads to an account that is being filtered by CRM114.  (If
 you don't, a lot of learning will have to be repeated for each path,
 which will cost you accuracy and use up valuable feature slots in the
 .css files that you could use in more valuable ways otherwise.  On the
 other hand, if you have multiple email addresses that all channel
-through one CRM114 fileset, and the addresses recieve very different
+through one CRM114 fileset, and the addresses receive very different
 ratios of spam and nonspam (or, very differnt *types* of spam), then
 it _might_ be to your advantage to not use rewrites.mfp, (just replace
 it with an empty file), so that the extra statistical information of
-the incoming email address is not lost)
+the incoming email address is not lost.)
 
 If all this confuses you to no end, just make rewrites.mfp be an
-empty file and everything should decently well.
+empty file and everything should work decently well.
 
-       -----
+  -----
 
-       Step 4 Part 2 - Setting up the .CSS files
+  Step 4 Part 2 - Setting up the .CSS files
 
 
 You have a choice here.  You can either build your own files from your
@@ -662,7 +665,7 @@
 
 If your mail service runs on your local machine (say, you have just
 one machine - and I do hope you have a firewall in that case), then
-mailfilter will almost certainly "run" in your home directory- the
+mailfilter will almost certainly "run" in your home directory - the
 directory you're in when you log in.
 
 If your mail service runs on a mail server (not your local machine),
@@ -695,7 +698,7 @@
 Once you have these empty files you will have a high (50% or so)
 error rate for the first few hours, till you have 'taught' CRM114
 what your particular mix of spam and nonspam looks like.  Proceed
-below to "Step 4: Configuring Mailfilter".
+below to "Step 5: Engaging Mailfilter".
 
 Many people want to "preload" their spam collection into CRM114.  This
 used to be a bad idea.  CRM114 is optimized for TOE learning - "Train
@@ -741,8 +744,8 @@
 
   -----
 
-    Step 4 Part 2 Method C - BETA TEST - Using mailtrainer.crm to
-    Build .CSS Files
+  Step 4 Part 2 Method C - BETA TEST -
+       Using mailtrainer.crm to Build .CSS Files
 
 New in 20060101 is the "mailtrainer.crm" program.  This program
 accepts two directories of "archetype" good and spam email, and runs
@@ -767,7 +770,7 @@
 
   -----
 
-      Step 4 Part 2 Method D - ALPHA TEST -- MAKEFILE Build And
+   Step 4 Part 2 Method D - ALPHA TEST -- MAKEFILE Build And
         Preload .CSS Files From Fresh Spam and Nonspam
 
  CAUTION - this applies ONLY to kits 20060606 and later!!!  DO NOT DO
@@ -816,7 +819,7 @@
 installs post 20060606 .  Versions prior to that will hose you if
 you do this.
 
- --------
+  -----
 
   Step 4 Part 3 - Checking your installation
 
@@ -893,6 +896,7 @@
 Note: this works fine for the default classifiers like Markov, OSB,
 and OSB Unique, but _not_ for Winnow, Hyperspace, or Corellative
 classifiers; for OSBF classifiers use osbf-util instead of cssutil.
+See ./CLASSIFY_DETAILS.txt for a description of the classifiers.
 
 Type in:
 
@@ -959,10 +963,12 @@
 there are similarities.  That's pretty much typical- and it's a good sign
 that your filtering should be quite accurate.
 
-Now, move on to "Step 4: Configuring Mailfilter".
+Now, move on to "Step 5: Engaging Mailfilter".
+
+
+------------------------------------------------------------------------
+------------------------------------------------------------------------
 
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
 
 Step 5: Engaging Mailfilter
 
@@ -985,7 +991,7 @@
 
   -----
 
-      Step 5 Method A: For Procmail and Maildrop Users
+  Step 5 Method A: For Procmail and Maildrop Users
 
 For Procmail users just add a procmail recipe to .procmailrc to run
 CRM114 and mailfilter whenever your other procmail rules fail to
@@ -1011,7 +1017,8 @@
 To use mailreaver instead of mailfilter, just put "mailreaver.crm"
 in instead of "mailfilter.crm" .
 
-If you get the test message, proceed to "Step 6: Training CRM114".
+If you get the test message, proceed to "Step 6: Training CRM114 and
+Mailfilter".
 
 -----
 
@@ -1059,7 +1066,6 @@
 ----------------------------------------------------------------------------
 ----------------------------------------------------------------------------
 
-
 Advanced Topic: Huge Emails and Denial Of Service Avoidance
 
 CRM114 has a number of built-in anti-Denial-of-Service (anti-DoS)
@@ -1089,10 +1095,9 @@
    mail/crm-spam
 
 
-
   -----
 
-    Step 5 Method B: The .forward hook file
+  Step 5 Method B: The .forward hook file
 
 For .forward hook users you should be aware that you should NOT put a
 direct link to crm in /etc/smrsh; since crm can do arbitrary things,
@@ -1118,7 +1123,8 @@
   ----
 
 Once you have engaged CRM114 mailfilter, you now get to train it to
-recognize spam and nonspam.  Proceed to "Step 6: Training CRM114".
+recognize spam and nonspam.  Proceed to "Step 6: Training CRM114 and
+Mailfilter".
 
 Note: CRM114 contains a design decision that you may have to play
 with.  Instead of doing memory management games, which both consume
@@ -1138,7 +1144,9 @@
 a buffer-shuffling dance to minimize time spent reclaiming and
 compactifying memory.
 
----------------------------------------------------------------------------
+
+------------------------------------------------------------------------
+------------------------------------------------------------------------
 
 
 Step 6: Training CRM114 and Mailfilter
@@ -1162,16 +1170,15 @@
 interchangeably here; the instructions say "mailfilter.crm" but
 mailreaver.crm works exactly the same way from the user point of view.
 
-   * Mail-to-Myself with In-Line Commands to retrain  (Method A)
-   * shell commands to retrain  (Method B)
-   * Mutt direct interface    (Method C)
-   * Some Other Interface    (Method D)
-
+   * Method A: mail-to-myself with in-line commands to retrain
+   * Method B: shell commands to retrain
+   * Method C: Mutt direct interface
+   * Method D: some other interface
 
 
-Whatever Way You Train : try to train _approximately_ equal amounts of spam and
-nonspam.  If you are within 50% one way or the other, performance will
-be very good.
+Whatever Way You Train: try to train _approximately_ equal amounts of
+spam and nonspam.  If you are within 50% one way or the other,
+performance will be very good.
 
 If you are running mailfilter.crm:
 
@@ -1198,8 +1205,9 @@
 error per thousand).
 
 
+  -----
 
-     Step 6 Method A: Mail-to-Myself
+  Step 6 Method A: Mail-to-Myself
 
 The first way is to use the in-line command feature.  Just forward
 the mistake back to yourself, with full headers (except edit out any
@@ -1247,25 +1255,24 @@
 If you are a mailreaver user, you also have a priority system you can
 access, either by editing your priolist.mfp file directly or by
 sending youself email in the following forms (where mypwd is the
-command passworda_regex_pattern is what will be used for priority
+command password, and a_regex_pattern is what will be used for priority
 matching.  Priority matches can occur in both the headers and body of
 the text.)
 
     command mypwd maxprio +a_regex_pattern      - sets a maximum priority GOOD
     command mypwd maxprio -a_regex_pattern      - sets a maximum priority SPAM
-    command mypwd minprio +a_regex_pattern      - sets a maximum priority GOOD
-    command mypwd minprio -a_regex_pattern      - sets a maximum priority SPAM
+    command mypwd minprio +a_regex_pattern      - sets a minimum priority GOOD
+    command mypwd minprio -a_regex_pattern      - sets a minimum priority SPAM
     command mypwd delprio a_regex_pattern       - deletes the first priority
-                                                 list entry that fully matches
-                                                 the regex pattern
-
-
+                                                  list entry that fully matches
+                                                  the regex pattern
 
 
+  -----
 
-    Step 6 Method B: Shell commands to retrain
+  Step 6 Method B: Shell commands to retrain
 
-   &amp;gt;&amp;gt; For mailfilter users (mailreaver is different - skip to below! &amp;lt;&amp;lt;
+   &amp;gt;&amp;gt; For mailfilter users (mailreaver is different - skip to below! &amp;lt;&amp;lt;)
 
 The second way to train in spam and nonspam is to use mailfilter.crm's
 shell command line options.  When you find a spam that was mistakenly
@@ -1286,7 +1293,7 @@
  [[ If you are using mailreaver.crm instead of mailfilter.crm, and
  cacheing is enabled, you don't even need to pipe in the full text in,
  all that's needed is either the intact X-CRM114-CacheID: line or the
- Message-ID line containing an intact sfid.  That's another reason to
+ Message-ID line containing an intact SFID.  That's another reason to
  switch to mailreaver! :) ]]
 
               &amp;gt;&amp;gt; For mailreaver.crm users &amp;lt;&amp;lt;
@@ -1294,7 +1301,7 @@
 You're in luck, assuming you have taken the default and left cacheing
 turned on.  All you need to pipe into mailreaver for training is any
 text or text fragment containing an intact X-CRM114-CacheID: line or
-the Message-ID line containing an intact sfid; mailreaver will go get
+the Message-ID line containing an intact SFID; mailreaver will go get
 the exact incoming text of the message and train it, so you don't need
 to worry about munged headers.
 
@@ -1352,9 +1359,9 @@
                          file; instead use the file so noted.
 
 
+  -----
 
-
-     Part 6 Method C: For Mutt Users
+  Step 6 Method C: For Mutt Users
 
 (Contributed by Mathieu Doidy and Joost van Baal:)
 
@@ -1375,8 +1382,9 @@
    * esc-h will tag a message, falsely classified as spam, as ham.
 
 
+  -----
 
-    Part 6 Method D: Some Other Method
+  Step 6 Method D: Some Other Method
 
 
 There are at least five other ways to retrain CRM114.  Some interface
@@ -1458,12 +1466,11 @@
 of daily use and about a gigabyte of email).
 
 
-
------------------------------------------------------------------------
-
+------------------------------------------------------------------------
+------------------------------------------------------------------------
 
 
-     Step 7: Adding Priority Lists, Whitelists, and Blacklists
+Step 7: Adding Priority Lists, Whitelists, and Blacklists
 
 If you really want, you can add white, black, and priority lists
 to CRM114.  Most people don't need them, but there are always
@@ -1497,10 +1504,11 @@
 
 Lastly (well, actually firstly, because prio-listing happens before
 whitelisting or blacklisting) any mail that matches any regex in
-priolist.mfp .  The format of priolist.mfp is that the first character
-on the line is a + or a -, which indicates "whitelist" or "blacklist",
-and the rest of the line is a regex.  These regexes are tested
-in the order given in the file.  An empty file is perfectly acceptable.
+priolist.mfp is handled.  The format of priolist.mfp is that the first
+character on the line is a + or a -, which indicates "whitelist" or
+"blacklist", and the rest of the line is a regex.  These regexes are
+tested in the order given in the file.  An empty file is perfectly
+acceptable.
 
 For examples of how to set up the whitelist, blacklist, and priolist
 files, see the included "whitelist.mfp.example", "blacklist.mfp.example",
@@ -1513,10 +1521,11 @@
 add, otherwise you may get a rude surprise some day.
 
 
-----------------------------------------------------------------
+------------------------------------------------------------------------
+------------------------------------------------------------------------
 
 
-        Step 8: Useful Utilities
+Step 8: Useful Utilities
 
 You don't _need_ to know the stuff in this section to set up and use
 CRM114 and mailfilter or mailreaver, but it might be useful to you- or
@@ -1539,7 +1548,7 @@
 
 
                    The cssutil utility:
-
+                    -------------------
 
 Usage is
 
@@ -1560,9 +1569,7 @@
                 -s css-size  - if no cssfile found, create new
                                cssfile with this many buckets.
                 -S css-size  - same as -s, but round up to next
-                               2^n + 1 boundary.
-
-
+                               2^k + 1 boundary.
 
 
        The cssdiff utility
@@ -1572,8 +1579,7 @@
 
     ./cssdiff somefile.css anotherfile.css
 
-which writes out a summary of how two different .css files are.
-
+which writes out a summary of how different two .css files are.
 
 
                     The cssmerge utility
@@ -1600,19 +1606,16 @@
      -s NNNN      -new file length, if needed
 
 
-
-
-
-Enlarging a .css file
-                ---------------------
+    Enlarging a .css file
+                    ---------------------
 
 One of the advantages of CRM114 is that the .css files are relatively
 small and of fixed size; they don't grow out of control and never need
 trimming if you use &amp;lt;microgroom&amp;gt;, which is the default.
 
 The disadvantage of this is that if your spam/nonspam discrimination
-is too convoluted, it won't be able to sort them out ( in trek-speak
-this is a high-order nonlinearity in the discrimination function ).
+is too convoluted, it won't be able to sort them out (in trek-speak
+this is a high-order nonlinearity in the discrimination function).
 The fix in this situation is to increase the dimensionality of the
 feature space.  The number of dimensions is about 1/12 the number of
 bytes in the .css files; this works well at about a million dimensions
@@ -1640,7 +1643,7 @@
 
 You can even combine steps 1 and 2, because newer versions of cssmerge
 will create a new file if needed (the -s N flag sets the number of slots
-in the new file; -S N does the same thing but rounds up to a 2^N+1
+in the new file; -S N does the same thing but rounds up to a 2^k+1
 boundary, which is recommended ).
 
 For example, here's how to increase the size of the spam.css file
@@ -1657,6 +1660,7 @@
 --------------------------------------------------------------------
 
     APPENDIX 1
+
                Using mailtrainer.crm
 
 
@@ -1902,12 +1906,11 @@
 improve your accuracy still more.
 
 
+------------------------------------------------------------------------
 
----------------------------------------------------------------------
-
-That's all!  If you have errors or updates (or find bugs!) please
-let me know; the best way is to join the CRM114-general mailing list; it's
-on the webpage:
+That's all!  If you have errors or updates (or find bugs!) please let me
+know; the best way is to join the CRM114-general mailing list; it's on
+the webpage:
 
    http://crm114.sourceforge.net
 
@@ -1920,3 +1923,5 @@
 Enjoy, and good luck.
 
        -Bill Yerazunis
+
+[]
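
A minimal sketch of the priolist.mfp format described in the Step 7
text of the patch above: the first character of each line is a + or
a -, the rest of the line is a regex, and entries are tested in file
order.  This is an illustration in Python, not mailreaver's own code.
The function names are hypothetical, skipping blank lines and stopping
at the first match are assumptions, and Python's re syntax differs in
places from the TRE regexes that CRM114 uses.

```python
import re


def parse_priolist(text):
    """Parse priolist-style text into (sign, compiled_regex) pairs, in file order."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are skipped
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("line must start with + or -: " + repr(line))
        entries.append((sign, re.compile(pattern)))
    return entries


def first_match(entries, message):
    """Return 'whitelist' or 'blacklist' for the first matching regex, else None."""
    for sign, regex in entries:
        # Patterns may match anywhere in the headers or body, so search, not match.
        if regex.search(message):
            return "whitelist" if sign == "+" else "blacklist"
    return None


# Hypothetical sample entries; an empty file is also acceptable.
sample = "+friend@example\\.com\n-cheap pills\n"
entries = parse_priolist(sample)

print(first_match(entries, "From: friend@example.com\n\ncheap pills inside"))
print(first_match(entries, "Subject: cheap pills"))
print(first_match(parse_priolist(""), "anything"))
```

Because entries are checked in order, the first message is reported as
whitelisted even though a later blacklist pattern also matches it.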

&lt;/pre&gt;</description>
    <dc:creator>Jskud.CRM114-hFAK3oOPH3QAvxtiuMwx3w&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2012-02-07T06:31:46</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9574">
    <title>CRM114 php extension</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9574</link>
    <description>&lt;pre&gt;

Hello Khalid

Something is broken, either with my package for libcrm114.a or something 
else. When I set up pecl-crm114 I get this during the configure step:

checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for a sed that does not truncate output... /usr/bin/sed
checking for cc... cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether cc accepts -g... yes
checking for cc option to accept ISO C89... none needed
checking how to run the C preprocessor... cc -E
checking for icc... no
checking for suncc... no
checking whether cc understands -c and -o together... yes
checking for system library directory... lib
checking if compiler supports -R... no
checking if compiler supports -Wl,-rpath,... yes
checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking target system type... i686-pc-linux-gnu
checking for PHP prefix... /usr
checking for PHP includes... -I/usr/include/php5 -I/usr/include/php5/main 
-I/usr/include/php5/TSRM -I/usr/include/php5/Zend -I/usr/include/php5/ext 
-I/usr/include/php5/ext/date/lib
checking for PHP extension directory... /usr/lib/php5/extensions
checking for PHP installed headers prefix... /usr/include/php5
checking if debug is enabled... no
checking if zts is enabled... no
checking for re2c... re2c
checking for re2c version... 0.13.5 (ok)
checking for gawk... gawk
checking for crm114 support... yes, shared
checking for libcrm114.a directory... yes, shared
checking whether to enable binary/text data block... yes
checking for tre/regex.h... found in /usr
checking for regfree in -ltre... yes
checking for libcrm114.a... found in /usr/lib
checking for crm114_learn_text in -lcrm114... no
configure: error: wrong libcrm114 version or lib not found

This is a snippet from config.log:

configure:4468: checking for libcrm114.a
configure:4473: result: found in /usr/lib
configure:4599: checking for crm114_learn_text in -lcrm114
configure:4624: cc -o conftest -g -O2   -lcrm114 conftest.c -lcrm114   &amp;gt;&amp;amp;5
/usr/lib/gcc/i586-suse-linux/4.5/../../../libcrm114.a(crm114_bit_entropy.o): 
In function `stats_2_entropy':
/home/tsp/programs/libcrm/compile/libcrm114-20100726/crm114_bit_entropy.c:853: 
undefined reference to `logl'
/usr/lib/gcc/i586-suse-linux/4.5/../../../libcrm114.a(crm114_bit_entropy.o): 
In function `crm114__init_block_bit_entropy':
/home/tsp/programs/libcrm/compile/libcrm114-20100726/crm114_bit_entropy.c:1637:
undefined reference to `sqrt'
/usr/lib/gcc/i586-suse-linux/4.5/../../../libcrm114.a(crm114_bit_entropy.o): 
In function `crm114_classify_text_bit_entropy':
/home/tsp/programs/libcrm/compile/libcrm114-20100726/crm114_bit_entropy.c:2613: 
undefined reference to `pow'
/usr/lib/gcc/i586-suse-linux/4.5/../../../libcrm114.a(crm114_svm.o): In 
function `crm114_classify_features_svm':
/home/tsp/programs/libcrm/compile/libcrm114-20100726/crm114_svm.c:2593: 
undefined reference to `tanh'
/home/tsp/programs/libcrm/compile/libcrm114-20100726/crm114_svm.c:2594: 
undefined reference to `pow'

    ... &amp;lt;many more, all similar&amp;gt;

collect2: ld returned 1 exit status
configure:4624: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME ""
| #define PACKAGE_TARNAME ""
| #define PACKAGE_VERSION ""
| #define PACKAGE_STRING ""
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define HAVE_LIBTRE 1
| /* end confdefs.h.  */
|
| /* Override any GCC internal prototype to avoid an error.
|    Use char because int might match the return type of a GCC
|    builtin and then its argument prototype would still apply.  */
| #ifdef __cplusplus
| extern "C"
| #endif
| char crm114_learn_text ();
| int
| main ()
| {
| return crm114_learn_text ();
|   ;
|   return 0;
| }
configure:4633: result: no
configure:4745: error: wrong libcrm114 version or lib not found

When I remove /usr/lib/libcrm114.a and use the bundled code all works 
well. As a side note I can add that Bill's simple_demo.c compiles and 
links well with the packaged library. Any idea what's wrong here?

Regards,
Thomas

&lt;/pre&gt;</description>
    <dc:creator>Thomas Spahni</dc:creator>
    <dc:date>2012-01-08T18:36:37</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9570">
    <title>rewrites.mfp and IPv6 addresses ... seems to not blow up</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9570</link>
    <description>&lt;pre&gt;Hi all,

So, setting up crm114 for someone else led me to a seriously overdue
cleanup of my own rewrites.mfp file.  I decided that it's probably
just doing string replacements to normalize the data, so it should
deal just fine with a colon-separated IPv6 address.

So far, it seems to do so just fine, so, Bill, I think you can call
crm114 "IPv6 compliant".

Reto
&lt;/pre&gt;</description>
    <dc:creator>R A Lichtensteiger</dc:creator>
    <dc:date>2012-01-05T18:37:50</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.mail.spam.crm114/9563">
    <title>CRM114 php extension</title>
    <link>http://comments.gmane.org/gmane.mail.spam.crm114/9563</link>
    <description>&lt;pre&gt;Hello everybody,

Happy new year :-)

I just released a PHP extension for the C-callable library crm114.
I have been using it on a blog platform for the past 2 weeks to detect
spam articles.

Your library works fine.

Would you agree to let me maintain an official PHP extension of the crm114
system for web developers?

Here is the future project website: http://gymx.net/php-crm114/
libcrm114 is statically compiled into the PHP module.

Based on your documentation, I have implemented the major functions.

I'm open to discussing any improvements or modifications with you.

Regards,

Khalid Ahsein
_______________________________________________
Crm114-general mailing list
Crm114-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f&amp;lt; at &amp;gt;public.gmane.org
https://lists.sourceforge.net/lists/listinfo/crm114-general
&lt;/pre&gt;</description>
    <dc:creator>Khalid Ahsein</dc:creator>
    <dc:date>2012-01-03T18:25:14</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.mail.spam.crm114">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.mail.spam.crm114</link>
  </textinput>
</rdf:RDF>
