<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.mail.spam.crm114">
    <title>gmane.mail.spam.crm114</title>
    <link>http://blog.gmane.org/gmane.mail.spam.crm114</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9621"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9620"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9619"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9618"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9617"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9616"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9615"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9614"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9613"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9612"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9611"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9610"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9609"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9608"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9607"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9606"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9605"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9604"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9603"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.mail.spam.crm114/9602"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9621">
    <title>Re: [SPAM] Re: Small bug found</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9621</link>
    <description>&lt;pre&gt;
Morning, Crash. (My sleep cycle got all mucked up by a software deployment.)

It's the "Alienvault" ad stuck to the bottom of your emails that's
been generating the spam warnings.

Anyway, I recently got a note that Sourceforge is re-arranging their
git repositories. You and other developers may need to reset your
upstream git repositories if you haven't already. And worse, using
the "search" function on Sourceforge to look for
"CRM114" now produces pages and pages of other tools with "CRM" in the
name, none of which is CRM114. Looks like they've applied some kind of
fuzzy matching to the search functions, and it's so fuzzy it's
actually moldy.....

I think it may be time to migrate from Sourceforge over to GitHub for
your main public repository. The advertising at GitHub is much less
burdensome to your browsers and bandwidth, it's a *lot* faster to use,
the download URLs are easily discovered and published, and it has an
excellent model of other developers being able to make their own
forked copies in their own personal github repositories, with their
own branches, and requesting git "pulls" from their branches or tags
or master. I'm using it now and am quite pleased with it.

------------------------------------------------------------------------------
AlienVault Unified Security Management (USM) platform delivers complete
security visibility with the essential security capabilities. Easily and
efficiently configure, manage, and operate all of your security controls
from a single console and one unified framework. Download a free trial.
http://p.sf.net/sfu/alienvault_d2d
&lt;/pre&gt;</description>
    <dc:creator>Nico Kadel-Garcia</dc:creator>
    <dc:date>2013-05-15T06:34:49</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9620">
    <title>Re: [SPAM] Re:  Small bug found</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9620</link>
    <description>&lt;pre&gt;

Hmmm... not so fast.   Mainline's makefile is now unhappy on Fedora 17
(I got a new work machine, and didn't do a rebuild as the mail server
is yet a _different_ machine).

Time to actually figure things out!

  - Bill


&lt;/pre&gt;</description>
    <dc:creator>wsy-MXLR4ItNKm8&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2013-05-14T20:07:14</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9619">
    <title>Re: [SPAM] Re:  Small bug found</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9619</link>
    <description>&lt;pre&gt;I'd back the idea of a release, Bill, if anything to send out a (life) signal in fact.

BTW glad to see something is still alive out there :)

cheers

On May 14, 2013, at 21:00, wsy-MXLR4ItNKm8&amp;lt; at &amp;gt;public.gmane.org wrote:



&lt;/pre&gt;</description>
    <dc:creator>Corrado</dc:creator>
    <dc:date>2013-05-14T19:46:31</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9618">
    <title>Re: [SPAM] Re:  Small bug found</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9618</link>
    <description>&lt;pre&gt;
Ooops, you're right.  

#ifndef is "not defined".

Anyway, it's now in mainline.  Not much change from before; I 
wonder whether it's worth pushing it out simply to show that the
project is still alive.

Paolo, can you ping me on this?

   - Bill

&lt;/pre&gt;</description>
    <dc:creator>wsy-MXLR4ItNKm8&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2013-05-14T19:00:36</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9617">
    <title>Re: [SPAM] Re:  Small bug found</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9617</link>
    <description>&lt;pre&gt;Nope, this is *not* windows, I found the bug on a linux machine (#if_NOT_def CRM_WINDOWS). 

BTW, your emails are raising all kinds of spam-alerts because there seems to be a piece of spam attached to the footer of your last two emails. Check the mail archives for evidence as I removed the crap in order to be able to reply.. :)

--
Kai.Risku-gEqm1mq8M50&amp;lt; at &amp;gt;public.gmane.org     GSM  +358-40-767 8282
Oy Arrak Software Ab   http://www.arrak.fi

-----Original Message-----
From: wsy-MXLR4ItNKm8&amp;lt; at &amp;gt;public.gmane.org [mailto:wsy-MXLR4ItNKm8&amp;lt; at &amp;gt;public.gmane.org] 
Sent: Tuesday, May 14, 2013 19:54
To: Kai Risku
Cc: crm114-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f&amp;lt; at &amp;gt;public.gmane.org
Subject: [Crm114-general] [SPAM] Re: Small bug found

"Kai Risku" &amp;lt;Kai.Risku-gEqm1mq8M50&amp;lt; at &amp;gt;public.gmane.org&amp;gt; writes:

Got it.  Interesting... and I never saw the bug because the whole
problem only happens in Windows and I don't have a Windows machine.

But yes, patch accepted.  Thanks!

   - Bill

&amp;lt;&amp;lt;&amp;lt; crm_markovian.c:717  CODE STARTS HERE&amp;gt;&amp;gt;&amp;gt;

  //    Because mmap/munmap doesn't set atime, nor set the "modified"           
  //    flag, some network filesystems will fail to mark the file as            
  //    modified and so their caching will make a mistake.
  //
  //    The fix is to do a trivial read/write on the .css file, to force
  //    the filesystem to repropagate its caches.
  //                                                                            
#ifndef CRM_WINDOWS
  {
    int hfd;                  //  hashfile fd                                   
    FEATURE_HEADER_STRUCT foo;
    hfd = open (&amp;amp;(htext[css_filename_start]), O_RDWR);
    dontcare = read (hfd, &amp;amp;foo, sizeof(foo));
    lseek (hfd, 0, SEEK_SET);
    dontcare = write (hfd, &amp;amp;foo, sizeof(foo));
    close (hfd);
  }
#endif  // !CRM_WINDOWS      

&amp;lt;&amp;lt;&amp;lt;CODE ENDS HERE &amp;gt;&amp;gt;&amp;gt;




&lt;/pre&gt;</description>
    <dc:creator>Kai Risku</dc:creator>
    <dc:date>2013-05-14T17:31:00</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9616">
    <title>[SPAM] Re:  Small bug found</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9616</link>
    <description>&lt;pre&gt;
Got it.  Interesting... and I never saw the bug because the whole
problem only happens in Windows and I don't have a Windows machine.

But yes, patch accepted.  Thanks!

   - Bill

&amp;lt;&amp;lt;&amp;lt; crm_markovian.c:717  CODE STARTS HERE&amp;gt;&amp;gt;&amp;gt;

  //    Because mmap/munmap doesn't set atime, nor set the "modified"           
  //    flag, some network filesystems will fail to mark the file as            
  //    modified and so their caching will make a mistake.
  //
  //    The fix is to do a trivial read/write on the .css file, to force
  //    the filesystem to repropagate its caches.
  //                                                                            
#ifndef CRM_WINDOWS
  {
    int hfd;                  //  hashfile fd                                   
    FEATURE_HEADER_STRUCT foo;
    hfd = open (&amp;amp;(htext[css_filename_start]), O_RDWR);
    dontcare = read (hfd, &amp;amp;foo, sizeof(foo));
    lseek (hfd, 0, SEEK_SET);
    dontcare = write (hfd, &amp;amp;foo, sizeof(foo));
    close (hfd);
  }
#endif  // !CRM_WINDOWS      

&amp;lt;&amp;lt;&amp;lt;CODE ENDS HERE &amp;gt;&amp;gt;&amp;gt;
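
For readers who want to try the idea outside of CRM114, the read-and-rewrite trick in the code above can be sketched in Python. This is only an illustration, not the CRM114 implementation: the function name touch_by_rewrite and the 64-byte HEADER_SIZE are assumptions made up for the example.

```python
import os

# Hypothetical header length; the real FEATURE_HEADER_STRUCT size is not given here.
HEADER_SIZE = 64


def touch_by_rewrite(path, header_size=HEADER_SIZE):
    """Read the first bytes of a file and write them straight back.

    The data is unchanged, but the write() call asks the filesystem to
    record a modification, which mmap/munmap alone may not do.
    Returns the number of bytes rewritten.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        header = os.read(fd, header_size)   # trivial read of the header
        os.lseek(fd, 0, os.SEEK_SET)        # rewind to the start of the file
        written = os.write(fd, header)      # write the same bytes back
    finally:
        os.close(fd)                        # always release the descriptor
    return written
```

Calling touch_by_rewrite("spam.css") on an existing file should leave its contents byte-for-byte the same while bumping its modification time on an ordinary local filesystem.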


_______________________________________________
Crm114-general mailing list
Crm114-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f&amp;lt; at &amp;gt;public.gmane.org
https://lists.sourceforge.net/lists/listinfo/crm114-general
&lt;/pre&gt;</description>
    <dc:creator>wsy-MXLR4ItNKm8&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2013-05-14T16:53:55</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9615">
    <title>Re: [SPAM] Re:  Small bug found</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9615</link>
    <description>&lt;pre&gt;The version is 20100106-BlameMichelson

A quick grep seems to indicate that the other classifiers actually do a strdup to save the filename instead of indexing into the htext buffer. I don't know if there is a risk that the htext buffer gets overwritten, but my patch seems to work properly for me.

I also just noticed that the strdup'ed filenames in the other classifiers never get freed, so you actually have a small memory leak there.. :)

--
Kai.Risku-gEqm1mq8M50&amp;lt; at &amp;gt;public.gmane.org     GSM  +358-40-767 8282
Oy Arrak Software Ab   http://www.arrak.fi


-----Original Message-----
From: wsy-MXLR4ItNKm8&amp;lt; at &amp;gt;public.gmane.org [mailto:wsy-MXLR4ItNKm8&amp;lt; at &amp;gt;public.gmane.org] 
Sent: Tuesday, May 14, 2013 17:07
To: Kai Risku
Cc: crm114-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f&amp;lt; at &amp;gt;public.gmane.org
Subject: [Crm114-general] [SPAM] Re: Small bug found

"Kai Risku" &amp;lt;Kai.Risku-gEqm1mq8M50&amp;lt; at &amp;gt;public.gmane.org&amp;gt; writes:

Interesting... I haven't updated the mainline in a few years (work has sorta focussed over onto the callable library).

Let me pop a distro open.   Which version are you running?  

   - Bill





&lt;/pre&gt;</description>
    <dc:creator>Kai Risku</dc:creator>
    <dc:date>2013-05-14T15:59:51</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9614">
    <title>[SPAM] Re:  Small bug found</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9614</link>
    <description>&lt;pre&gt;
Interesting... I haven't updated the mainline in a few years (work
has sorta focussed over onto the callable library).

Let me pop a distro open.   Which version are you running?  

   - Bill



&lt;/pre&gt;</description>
    <dc:creator>wsy-MXLR4ItNKm8&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2013-05-14T14:06:59</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9613">
    <title>Small bug found</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9613</link>
    <description>&lt;pre&gt;Markovian classifier tries to update the timestamp of the css file, but unfortunately has lost the filename to use. 

Here is a patch:


--- crm_markovian.c.orig        2013-05-13 12:51:52.000000000 +0300
+++ crm_markovian.c     2013-05-13 12:53:15.000000000 +0300
&amp;lt; at &amp;gt;&amp;lt; at &amp;gt; -148,6 +148,8 &amp;lt; at &amp;gt;&amp;lt; at &amp;gt;
   //             filename starts at i,  ends at j. null terminate it.
   htext[j] = '\000';

+  int filenameidx = i;
+
   //             and stat it to get it's length
   k = stat (&amp;amp;htext[i], &amp;amp;statbuf);

&amp;lt; at &amp;gt;&amp;lt; at &amp;gt; -724,7 +726,7 &amp;lt; at &amp;gt;&amp;lt; at &amp;gt;
   {
     int hfd;                  //  hashfile fd
     FEATURE_HEADER_STRUCT foo;
-    hfd = open (&amp;amp;(htext[i]), O_RDWR);
+    hfd = open (&amp;amp;(htext[filenameidx]), O_RDWR);
     dontcare = read (hfd, &amp;amp;foo, sizeof(foo));
     lseek (hfd, 0, SEEK_SET);
     dontcare = write (hfd, &amp;amp;foo, sizeof(foo));



--
Kai.Risku-gEqm1mq8M50&amp;lt; at &amp;gt;public.gmane.org     GSM  +358-40-767 8282
Oy Arrak Software Ab   http://www.arrak.fi






&lt;/pre&gt;</description>
    <dc:creator>Kai Risku</dc:creator>
    <dc:date>2013-05-14T05:58:28</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9612">
    <title>Re: Tuning for throughput</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9612</link>
    <description>&lt;pre&gt;

Oh- one more question:

Did you turn on microgrooming?

    - Bill

------------------------------------------------------------------------------
Minimize network downtime and maximize team effectiveness.
Reduce network management and security costs.Learn how to hire 
the most talented Cisco Certified professionals. Visit the 
Employer Resources Portal
http://www.cisco.com/web/learning/employer_resources/index.html
&lt;/pre&gt;</description>
    <dc:creator>wsy-MXLR4ItNKm8&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2013-04-02T15:37:42</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9611">
    <title>Re: Tuning for throughput</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9611</link>
    <description>&lt;pre&gt;

Yes, we have.


It all depends on the classifier, and how much space is left in
various fixed-size hash-tables.  Now, speaking ONLY of classification
(not training!) here are my recollections, which may be wrong:

The Bayes, Markov, and Winnow classifiers use a fixed-size table with in-place
hash overflows; when you get too many features, you start spending
more and more time in chasing the overflow chains.  So yeah, you
can get a "hockey-stick" graph.  The fix is to use a bigger hash
table ("-S" on the command line if you're doing it that way; memory
is cheap).

The Hyperspace and Correlation classifiers take time linear in the
number of examples, times the length of the input stream.

The LZ compressive classifier takes time proportional to the length
of the input stream and (approximately) the log of the size of
the example set.

The Bit Entropy, SVM, and Neural Network classifiers take time
proportional to the number of bits in the input file.  Nothing else
matters during classification.



The trick is that, to get a fixed-size footprint in memory, the hash
system uses in-table overflows.  When the table fills up, lookups
become no better than sequential searches.

So, you DO have to pick your table size "wisely".   If it slows down
with your real data, double the table size.  RAM is cheap. :-)

    - Bill
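
A rough way to see the "hockey-stick" described above is to simulate a fixed-size table that resolves collisions inside the table itself. The sketch below assumes plain linear probing and made-up integer keys; the real CRM114 hashing and overflow scheme may differ, and the function names are hypothetical.

```python
import random


def build_table(num_slots, keys):
    """Insert keys into a fixed-size table, overflowing into the next slot."""
    if len(keys) not in range(num_slots + 1):
        raise ValueError("more keys than slots")
    table = [None] * num_slots
    for key in keys:
        slot = hash(key) % num_slots
        while table[slot] is not None:       # slot taken: follow the overflow chain
            slot = (slot + 1) % num_slots
        table[slot] = key
    return table


def probes_to_find(table, key):
    """Count how many slots are inspected before a stored key is found."""
    num_slots = len(table)
    slot = hash(key) % num_slots
    probes = 1
    while table[slot] != key:
        slot = (slot + 1) % num_slots
        probes += 1
    return probes


def average_probes(num_slots, load, seed=114):
    """Average lookup cost for a table filled to the given load factor."""
    rng = random.Random(seed)                # fixed seed keeps runs repeatable
    keys = rng.sample(range(10 ** 9), int(num_slots * load))
    table = build_table(num_slots, keys)
    return sum(probes_to_find(table, k) for k in keys) / len(keys)


if __name__ == "__main__":
    for load in (0.5, 0.8, 0.9, 0.99):
        print(f"load {load:.2f}: {average_probes(4096, load):.1f} probes per lookup")
```

The average stays close to one probe while the table is mostly empty and climbs steeply as the load approaches 1.0, which is why a bigger table is the usual cure.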

&lt;/pre&gt;</description>
    <dc:creator>wsy-MXLR4ItNKm8&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2013-04-02T15:32:01</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9610">
    <title>Re: Tuning for throughput</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9610</link>
    <description>&lt;pre&gt;Bill,
Thanks for the response and such great work!

I thought you might have :-)

I am using the libcrm114 c-callable library so I will have to add this
option to my command line handler.  I am seeing a hockey-stick effect
but I am seeing it earlier on a machine with less memory so I guessed it
was a swapping issue.

Also, when I follow the code, it seems that the Bayes classifier in
libcrm114 (don't know about CRM114 itself) uses Markov (the method for 
classify_features calls crm114_classify_features_markov after checking
if OSB_BAYES was selected).  The accuracy is great, so  now I am 
interested in tweaking/understanding the space/time aspect.

 

Thanks again,
Martin


&lt;/pre&gt;</description>
    <dc:creator>Martin_Thomas-dDVk9ABzqyjQT0dZR+AlfA&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2013-04-02T16:01:01</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9609">
    <title>Tuning for throughput</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9609</link>
    <description>&lt;pre&gt;Greetings, denizens of the list.
I am investigating using libcrm114 for mass classification and I need to understand the run-time space and time characteristics. Has anyone else looked at this aspect?

I have running test cases with different volumes of mail that has already been classified. What I see is that the time to classify a message increases in a generally linear fashion as the size of the classifier grows, until I get to a "large" classifier (15,000 messages or more), where the time required kicks up sharply.  I am guessing that this is due to memory requirements and that swapping starts to occur.

Other than the process memory requirements, are there any other known issues with libcrm114 that make its performance non-linear for larger volumes of mail?  If it is using a hash lookup then I expect to see linear performance on insertion and lookup.

Thanks,
Martin


Martin Thomas

Research Tools Software Architect
Risk and Compliance Security Research
McAfee, Inc.

972.963.7680 Direct

martin_thomas-pYQlhKXXLhvQT0dZR+AlfA&amp;lt; at &amp;gt;public.gmane.org&amp;lt;mailto:martin_thomas-pYQlhKXXLhvQT0dZR+AlfA&amp;lt; at &amp;gt;public.gmane.org&amp;gt;

------------------------------------------------------------------------------
Own the Future-Intel(R) Level Up Game Demo Contest 2013
Rise to greatness in Intel's independent game demo contest. Compete 
for recognition, cash, and the chance to get your game on Steam. 
$5K grand prize plus 10 genre and skill prizes. Submit your demo 
by 6/6/13. http://altfarm.mediaplex.com/ad/ck/12124-176961-30367-2
&lt;/pre&gt;</description>
    <dc:creator>Martin_Thomas-dDVk9ABzqyjQT0dZR+AlfA&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2013-04-02T14:32:50</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9608">
    <title>Re: shrinking css files for quicker training?</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9608</link>
    <description>&lt;pre&gt;I always thought that was part of the idea of it.  Unforch I never made it
to the reaver level of the software.  There was something that kept me stuck
in mailfilter land.  I will approach that again, and perhaps that should be
my next step, as I *also* take on retraining, whenever I can manage it, and
meanwhile I will limp along driven by old corpses [sick [sic]].

-Kurt


On 12/18/12 7:49 AM, "Eugene Crosser" &amp;lt;crosser-Sb1GGz3bhaFAfugRpC6u6w&amp;lt; at &amp;gt;public.gmane.org&amp;gt; wrote:




------------------------------------------------------------------------------
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
&lt;/pre&gt;</description>
    <dc:creator>Kurt Bigler</dc:creator>
    <dc:date>2012-12-25T06:34:57</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9607">
    <title>Re: shrinking css files for quicker training?</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9607</link>
    <description>&lt;pre&gt;Wouldn't it be relatively easy and "safe" to use the contents of the
reaver_cache as the corpus for initial training?

Eugene

On 12/18/2012 07:00 PM, wsy-MXLR4ItNKm8&amp;lt; at &amp;gt;public.gmane.org wrote:


&lt;/pre&gt;</description>
    <dc:creator>Eugene Crosser</dc:creator>
    <dc:date>2012-12-18T15:49:24</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9606">
    <title>Re: shrinking css files for quicker training?</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9606</link>
    <description>&lt;pre&gt;
Kurt Bigler &amp;lt;kkb-qsUyPSV3HvqUK90frp/yKA&amp;lt; at &amp;gt;public.gmane.org&amp;gt; writes:


Hmmm... try cssutil -r &amp;lt;cssfile&amp;gt;  and post the results here.

It may be the case that you do indeed have too much stuff in there and
the relay chains are just getting too long.


Ahhh- "aging" - the Castle Anthrax of text recognition.

That is a problem, which is *not* circumvented by any system I know of.

All of those wonderful examples of spam from five years ago are
completely obsolete, and possibly misleading.

My advice is (sadly) that you can't really "age" it - the easiest 
thing to do (which will at least give you more space and make too-long
chains shorter again) is to use cssmerge to create a bigger file
(disks are bigger now, dude!)

Other than that, you can start again.

Some (a few) versions of cssutil also had the ability to 
zeroize small bin values.  That might be useful at least for
speedup- however, accuracy will suffer as you are lobotomizing
out low-count samples.

Make a copy of your .css files before you do that!

Failing that, you're in for a retrain.  :-(

   - Bill

&lt;/pre&gt;</description>
    <dc:creator>wsy-MXLR4ItNKm8&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2012-12-18T15:00:27</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9605">
    <title>shrinking css files for quicker training?</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9605</link>
    <description>&lt;pre&gt;Hi, all,

I've mostly been letting my crm114 mailfilter installation "sit"
for a number of years and it just keeps on working, of course.

However, along the way I think I may have expanded my css files too much,
perhaps too concerned about packing density.  I judge this from the fact
that the learning seems entirely too sluggish compared to what it used to
be.  

I assume so much has been trained by now that new training has less than the
desired impact.  This may have worsened due to the number of years the css
files have been in use, and just how much spam has changed over that time.
I would like some of the old training to "fall off the end" but I don't have
the time required to just start over right now.

I see nothing in the documentation to suggest that using cssmerge to move
back into a smaller file would not work, and I'm wondering what to expect
and how to optimize this.

I would rather avoid being forced into a bout of heavy training.  So I'm
thinking about reducing the size of the css files in relatively small steps,
(maybe 10% at a time?) and not worrying about power of 2 boundaries,
assuming that will work.  Or maybe I should just try going by a factor of 2
all at once?

I am currently at 1048577 buckets, and my current packing densities are 0.83
non-spam and 0.92 spam.  Maybe when I'm all done with this process I can
increase the size of my css files again to return to better densities.  But
if I were to reduce my css files by a factor of 8 in this process, I'd guess
off-hand I'd be running with the spam css at something like .99 density.  I
guess I'd find out along the way if this was a problem.

I suppose I could shrink and immediately expand the css in order to weed out
some of the old stuff.  Maybe I could then do this for the spam side only,
leaving the non-spam "experience" fully in place since non-spam is perhaps
under-trained anyway.

A few key settings from mailfilter.cf:

    :decision_length: /6000/
    :clf: /osb unique microgroom/
    :thick_threshold: /1.8/

And I'm using a quite old mailfilter, looks like a July 2006 build.  Some of
the newer developments were breaking my methodologies so I kind of had to
stick with mailfilter.

I'd appreciate any thoughts.

Of course I can just try some things, and see what happens, but for some of
these considerations it may be better not to be making random guesses.

-Kurt
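
The bucket arithmetic behind the question above can be checked with a few lines of Python. The sketch assumes that "packing density" is simply the fraction of buckets in use; the helper names are invented for the example, and what cssmerge actually does when the target file is too small is not modelled here.

```python
def buckets_in_use(total_buckets, density):
    """Estimate occupied buckets from the reported packing density."""
    return round(total_buckets * density)


def density_after_shrink(total_buckets, density, factor):
    """Density the smaller file would need in order to keep every used bucket."""
    smaller = total_buckets // factor
    return buckets_in_use(total_buckets, density) / smaller


if __name__ == "__main__":
    total = 1048577                          # bucket count quoted in the message
    for name, density in (("non-spam", 0.83), ("spam", 0.92)):
        used = buckets_in_use(total, density)
        print(name, "uses about", used, "buckets")
        for factor in (2, 8):
            needed = density_after_shrink(total, density, factor)
            print("  shrink by", factor, "needs density", round(needed, 2))
```

Any result above 1.0 means the used buckets cannot all fit in the smaller file, so under this assumption some of the trained data cannot survive the move intact.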



&lt;/pre&gt;</description>
    <dc:creator>Kurt Bigler</dc:creator>
    <dc:date>2012-12-17T23:29:06</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9604">
    <title>crm not converging well on new input</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9604</link>
    <description>&lt;pre&gt;hi crash, and everyone --

i'm running crm from the ubuntu apt repos, version 20100106.

for some reason i tossed my old css files and started from scratch
sometime last year.  after the initial training period, things settled
down, and spam vs. non-spam were sorted very nicely, with a handful
of "unsure" messages on a daily basis.  (i also greylist, which takes
a lot of the load off of crm and the rest of my system.)

5 or 6 weeks ago i subscribed to the git (git-u79uwXL29TY76Z2rM5mHXA&amp;lt; at &amp;gt;public.gmane.org) mailing
list, and as expected, a lot of it was classed as spam at first.  but
i train messages consistently, and i'd expect that to go away quickly.

but it didn't.  i've been consistently getting many messages from that
list classed as UNSURE.  this morning, for instance, i had 18 git
messages classified as UNSURE, out of perhaps 80 that arrived
overnight.

i'll add that i do subscribe to some other vger.kernel.org lists,
and don't have these problems with those -- though they tend to be
lower volume; it's somewhat possible i haven't noticed the issue
with them.

when a good message goes into my "unsure" folder, i run this:

    /usr/bin/crm /home/pgf/ncrm/mailreaver.crm \
        -u /home/pgf/ncrm/ --fileprefix=/home/pgf/ncrm/ --good |
    grep ^X-CRM114-

and the result always looks something like this:

    X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
    X-CRM114-CacheID: sfid-20121217_014442_483413_6F434E5A 
    X-CRM114-Notice: Please train this message. 
    X-CRM114-Action: LEARNED AND CACHED GOOD

why might crm be taking so long to train on this new input source?

i'll include my css stats, below.

any ideas?

thanks,
paul


$ cssdiff spam.css nonspam.css
Sparse spectra file spam.css has 1048577 bins total
Sparse spectra file nonspam.css has 1048577 bins total

 File 1 total features            :       611764
 File 2 total features            :       581195

 Similarities between files       :       145867
 Differences between files        :       494454

 File 1 dominates file 2          :       465897
 File 2 dominates file 1          :       435328


$ cssutil -rb spam.css

 Sparse spectra file spam.css statistics: 

 Total available buckets          :      1048577 
 Total buckets in use             :       471530  
 Total in-use zero-count buckets  :            0  
 Total buckets with value &amp;gt;= max  :            0  
 Total hashed datums in file      :       611764
 Documents learned                :         2249  
 Features learned                 :       611767  
 Average datums per bucket        :         1.30
 Maximum length of overflow chain :           44  
 Average length of overflow chain :         2.27 
 Average packing density          :         0.45

$ cssutil -rb nonspam.css

 Sparse spectra file nonspam.css statistics: 

 Total available buckets          :      1048577 
 Total buckets in use             :       470676  
 Total in-use zero-count buckets  :        18763  
 Total buckets with value &amp;gt;= max  :            0  
 Total hashed datums in file      :       581195
 Documents learned                :         2199  
 Features learned                 :       581199  
 Average datums per bucket        :         1.23
 Maximum length of overflow chain :           33  
 Average length of overflow chain :         2.26 
 Average packing density          :         0.45
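
The two ratio lines in each cssutil report can be cross-checked from the raw
counts.  This is a minimal Python sketch, assuming "Average packing density"
is buckets in use divided by available buckets and "Average datums per bucket"
is hashed datums divided by buckets in use.  That reading is inferred from the
numbers above, not taken from the cssutil source, and the dictionary keys are
hypothetical names.

```python
# Figures copied from the cssutil -rb output above.
STATS = {
    "spam.css": {"available": 1048577, "in_use": 471530, "datums": 611764},
    "nonspam.css": {"available": 1048577, "in_use": 470676, "datums": 581195},
}


def derived(stats):
    """Recompute the two ratio lines, assuming they are plain quotients."""
    density = stats["in_use"] / stats["available"]
    datums_per_bucket = stats["datums"] / stats["in_use"]
    return round(density, 2), round(datums_per_bucket, 2)


if __name__ == "__main__":
    for name, stats in STATS.items():
        density, per_bucket = derived(stats)
        print(f"{name}: density {density}, datums per bucket {per_bucket}")
```

Both files come out at about 0.45, matching the reports.  With both files
less than half full, running out of buckets seems unlikely to be the cause,
though this check alone does not rule it out.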



=---------------------
 paul fox, pgf-YqHaedqsyMbgnLo7xJiJXVwrdKVBSruJ&amp;lt; at &amp;gt;public.gmane.org (arlington, ma, where it's 27.3 degrees)

&lt;/pre&gt;</description>
    <dc:creator>Paul Fox</dc:creator>
    <dc:date>2012-12-17T14:35:57</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9603">
    <title>Re: Classifying general email</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9603</link>
    <description>&lt;pre&gt;Hello Bill,

Thanks for your reply.

I'm not a programmer, but I can find my way around Linux and do basic bash scripting.
I looked at both mailfilter.crm and mailreaver.crm and, with my current knowledge of the crm language,
was a bit overwhelmed at the prospect of modifying either of them to suit my needs.

So I would much prefer any command-line scripts that I could modify to test this out.

Best
Lars Sorensen

On Dec 12, 2012, at 3:33 PM, wsy-MXLR4ItNKm8&amp;lt; at &amp;gt;public.gmane.org wrote:



&lt;/pre&gt;</description>
    <dc:creator>Lars Bjærris</dc:creator>
    <dc:date>2012-12-12T15:38:08</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9602">
    <title>Re: Classifying general email</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9602</link>
    <description>&lt;pre&gt;
Yes, CRM114 can do multi-class sorting; one of the test cases actually
does that (four classes, I believe).

Now, a question: do you want to do this from the command line, or are you a
C programmer?  The reason I ask is that we have two user-compatible but
NOT binary-compatible CRM114's now.

- There's the command-line version, which has its own language;

- There's the C-callable library (written in ANSI C), which you call from
  a program you write.  (Yes, there's example code, including, if I
  recall correctly, four-class examples.)

Which would you prefer?
  
   - Bill


&lt;/pre&gt;</description>
    <dc:creator>wsy-MXLR4ItNKm8&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2012-12-12T14:33:18</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.mail.spam.crm114/9601">
    <title>Classifying general email</title>
    <link>http://permalink.gmane.org/gmane.mail.spam.crm114/9601</link>
    <description>&lt;pre&gt;Hello,

I have an email account that receives a fairly high volume of email (500-800 messages a day), and I would like to sort these messages automatically into about 100 categories/folders.
For the last two months I have been trying out POPFile (http://getpopfile.org/) with limited success.

After inspecting the keywords and decision trees in POPFile, it seems to me that a classifier using phrases might classify this type of email better than the Naive Bayes implementation in POPFile.

As I'm not a programmer, but am trying to learn, I have been searching for preexisting tools that might work for what I want to achieve.
From searching the web, that seems to leave me with two options for the classifier: CRM114 or OSBF-lua. As I understand it, CRM114 now uses the OSBF classifier as the default.

Are there any implementations/scripts out there that will allow multiple classes for general email sorting using CRM114 or OSBF-lua as the classification engine?
It seems from what I read that this should be possible, but I'm unable to find any practical implementations to test with.
As I understand it, both mailfilter.crm and mailreaver.crm use only three classifications (spam, nonspam, and unsure), so I presume they would not be useful to me in this regard.

I could use some advice on how to go about this the right way.
Are there any scripts or tools out there that will do general email classification with CRM114 or OSBF-lua that could be implemented with maildrop or procmail on a Linux OS?

Any ideas or pointers would be greatly appreciated.

Best

Lars Sorensen





&lt;/pre&gt;</description>
    <dc:creator>Lars Bjærris</dc:creator>
    <dc:date>2012-12-12T13:17:29</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.mail.spam.crm114">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.mail.spam.crm114</link>
  </textinput>
</rdf:RDF>
