<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user">
    <title>gmane.comp.apache.mahout.user</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16794"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16793"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16792"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16791"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16790"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16789"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16788"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16787"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16786"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16785"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16784"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16783"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16782"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16781"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16780"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16779"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16778"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16777"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16776"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16775"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource="http://search.gmane.org/?group=$group=gmane.comp.apache.mahout.user"/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16794">
    <title>Which database should I use with Mahout</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16794</link>
    <description>&lt;pre&gt;Hi,
I would like to use Mahout to make recommendations on my web site. Since the data is going to be big, hopefully, I plan to use hadoop implementations of the recommender algorithms.

I'm currently storing the data in mysql. Should I continue with it or should I switch to a nosql database such as mongodb or something else?

Thanks
Ahmet&lt;/pre&gt;</description>
    <dc:creator>Ahmet Ylmaz</dc:creator>
    <dc:date>2013-05-19T16:52:07</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16793">
    <title>Re: More Cross-recommender thoughts</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16793</link>
    <description>&lt;pre&gt;Anonymizing the id's is a good start, especially if you have a relatively
small subset of the entire social graph and if the graph is publicly
visible in any case.  If you have a complete crawl of the graph, then many
id's will be recoverable by reference back to the public version of the graph.
 Since this would only be doable by referencing public information, that
doesn't sound like much of a problem.

On Fri, May 17, 2013 at 2:45 PM, Pat Ferrel &amp;lt;pat&amp;lt; at &amp;gt;occamsmachete.com&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>Ted Dunning</dc:creator>
    <dc:date>2013-05-17T23:08:12</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16792">
    <title>Set Random Forest Max Depth?</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16792</link>
    <description>&lt;pre&gt;Is there a way to set the maximum depth of a Random Forest?  Browsing the
code, it looks like the answer is no since the maxDepth() function is
merely an informative property of the Node class (and derived classes).
 Just wanted to confirm with more experienced foresters before giving up on
trying to reduce the depth level.

Thanks,
       Adam
&lt;/pre&gt;</description>
    <dc:creator>Adam Baron</dc:creator>
    <dc:date>2013-05-17T22:44:26</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16791">
    <title>More Cross-recommender thoughts</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16791</link>
    <description>&lt;pre&gt;I'm doing an experiment creating a recommender from a Pinterest crawl I have going. I have at least three actions that relate to recommendations:

Goal: recommend people you (a pinterest user) might want to follow

Actions mined by crawling: 
follows (user, user)
followed by (user, user)
repinned (user, user) here a user repinned something from another user

Training a recommender on one of the three actions and measuring the precision of recommendations leads to different measures of predictive power for each of the three actions. So, when follows are held out, each of the three actions predicts the held-out follows with different precision.

Since all three actions (and a few others I can think of) have user IDs of the primary user in common, they exist in the same user space and I can create a cross recommender ensemble (see previous cross recommender emails) so R_f + aR_fb + bR_rp = R. 

Now it is a matter of learning the weights a and b by finding the ones that create the highest ensemble precision score. Grad&lt;/pre&gt;</description>
    <dc:creator>Pat Ferrel</dc:creator>
    <dc:date>2013-05-17T21:45:04</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16790">
    <title>Issue with ClusterDumper (0.7) and custom dictionaries</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16790</link>
    <description>&lt;pre&gt;I'm not sure if I'm doing something wrong here, or if ClusterDumper does
not support my (fairly simple) use case

I had a repository of 500K documents, for which I generated the input
vectors and a dictionary using some custom code (not seq2sparse etc).

I hashed the features with max size 5M (because I didn't know how many
features were in the dataset and wanted to minimize collisions).

The kmeans ran fine and generated sensible-looking results, but when I tried
to run ClusterDumper I got the following error:

#bash&amp;gt; bin/mahout clusterdump -dt sequencefile -d
 completed/5159bba4e4b0718d03c8cf79_/EmailContentAnalytics_dict_5159bba4e4b0718d03c8cf79/part-*
-i test-kmeans/clusters-19 -b 10 -n 10 -sp 10 -o ~/test-kmeans-out
Running on hadoop, using /usr/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /opt/mahout-distribution-0.7/mahout-examples-0.7-job.jar
13/05/17 08:26:41 INFO common.AbstractJob: Command line arguments:
{--dictionary=[completed/5159bba4e4b0718d03c8cf79_/EmailContentAnalytics_dict_5159bba4e4b0718d0&lt;/pre&gt;</description>
    <dc:creator>Alex Piggott</dc:creator>
    <dc:date>2013-05-17T15:33:48</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16789">
    <title>Re: How to extend FileDataModel</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16789</link>
    <description>&lt;pre&gt;Yes Jia, that is correct.

If you want to save additional data (context) then you have to extend the current data models. The current algorithms won't use the additional data. If you want to be able to integrate I would recommend that you search for "contextaware recommender" e.g. http://recsyswiki.com/wiki/Context-aware_recommendation

/Manuel

Am 16.05.2013 um 20:42 schrieb huangjia:


&lt;/pre&gt;</description>
    <dc:creator>Manuel Blechschmidt</dc:creator>
    <dc:date>2013-05-17T08:20:38</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16788">
    <title>Re: How to extend FileDataModel</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16788</link>
    <description>&lt;pre&gt;Thanks to both of you!


Manuel.
It seems that your code directly uses GenericDataModel without including
extra fields such as category and time.
So I still need to extend the GenericDataModel if I want to include tags,
right?

Thanks again,

Jia


On Thu, May 16, 2013 at 12:56 AM, Manuel Blechschmidt &amp;lt;
Manuel.Blechschmidt&amp;lt; at &amp;gt;gmx.de&amp;gt; wrote:



&lt;/pre&gt;</description>
    <dc:creator>huangjia</dc:creator>
    <dc:date>2013-05-16T18:42:29</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16787">
    <title>KMeansDriver to hdfs</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16787</link>
    <description>&lt;pre&gt;Hi everyone,

I am running Hadoop 1.0.4 with Mahout 0.7

I am currently trying to write a program in Java that will use the Mahout
Jar library and connect it to the hdfs I have set up on the machine.

I am using a JobConf as the Configuration and writing the data that I
want processed by mahout to hdfs.

Here is the code I run in Eclipse with all necessary jars in the build path.

KMeansDriver km = new KMeansDriver();

km.setConf(conf);

Path output = new Path("output");

km.run(new String[] { "-i", "testdata/points/hiveDataPoints.txt", "-c",
                "testdata/clusters", "-dm",

"org.apache.mahout.common.distance.EuclideanDistanceMeasure",
                "-o", "output", "-x", "10", "-cl", "-cd", "0.001", "-ow" });

I get the following errors:

3/05/16 14:34:04 INFO common.AbstractJob: Command line arguments:
{--clustering=null, --clusters=[testdata/clusters],
--convergenceDelta=[0.001],
--distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure],
--endPhase=[2147483647], --input=[test&lt;/pre&gt;</description>
    <dc:creator>Cyril Bogus</dc:creator>
    <dc:date>2013-05-16T18:40:44</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16786">
    <title>Re: WELCOME to user&lt; at &gt;mahout.apache.org</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16786</link>
    <description>&lt;pre&gt;Hi !

We would like to pass customized distance metrics to Canopy and KMeans. We
would like to define weights, parameters, objects, so that when Canopy or
KMeans computes a distance on two given vectors it can access these extra
parameters. We are using Mahout 0.7.

Basically we are looking for the same features as
WeightedDistanceMeasure, apart from the fact that we do not know how to
specify the "weightsFile" and more importantly, we need to make it work
with customized Configuration objects, i.e., if Mahout executes internally
a simple "new Configuration()" with the hope of getting the appropriate
values, this solution will not work.

Our first attempt to pass instantiated DistanceMeasure objects to
CanopyDriver.run() or KMeansDriver.run() did not work, as it seems the
distance-measure objects are re-created with the no-argument constructor
on each node, so it is not possible to customize them, nor to use
getters/setters, nor to use static variables because their values will get
lost when the objec&lt;/pre&gt;</description>
    <dc:creator>H H</dc:creator>
    <dc:date>2013-05-16T16:55:29</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16785">
    <title>Re: How to extend FileDataModel</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16785</link>
    <description>&lt;pre&gt;Hi Jia,
check out the following example, which builds a data model from a dataset similar to yours:

https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/FacebookRecommender.java#L139

Dataset:

https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/resources/DemoFriendsLikes.csv

Hope that helps
    Manuel

Am 16.05.2013 um 03:19 schrieb huangjia:


&lt;/pre&gt;</description>
    <dc:creator>Manuel Blechschmidt</dc:creator>
    <dc:date>2013-05-16T07:56:46</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16784">
    <title>Re: How to extend FileDataModel</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16784</link>
    <description>&lt;pre&gt;Why not? It's just the object reference that is local to the function.
The Map itself is not; it lives on the heap like everything else in the
JVM.

On Thu, May 16, 2013 at 2:19 AM, huangjia &amp;lt;cucumberguagua&amp;lt; at &amp;gt;gmail.com&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>Sean Owen</dc:creator>
    <dc:date>2013-05-16T06:52:31</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16783">
    <title>How to extend FileDataModel</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16783</link>
    <description>&lt;pre&gt;Hi,

I want to build a recommendation model based on Mahout. My dataset is in
the format

userID, itemID, rating timestamp tag1 tag2 tag3. Thus, I think I need to
extend the FileDataModel.

I looked into *JesterDataModel* as an example. However, I have a problem
with the logic flow. In its *buildModel()* method, an empty map "data" is
first constructed. It is then passed into processFile. I assume that "data"
is modified in this method, since later it is used to construct the
GenericDataModel. However, "data" is a local variable instead of a class
variable, so how is it modified?

processFile(iterator, data, timestamps, false);
return new GenericDataModel(GenericDataModel.toDataMap(data, true));


&lt;/pre&gt;</description>
    <dc:creator>huangjia</dc:creator>
    <dc:date>2013-05-16T01:19:34</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16782">
    <title>Re: Problems with KMeans Clustering - Radius calculation returns incorrect ZERO value in some cases.</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16782</link>
    <description>&lt;pre&gt;Also, if only a single point is assigned to the cluster, the radius of the
cluster is, by definition, zero.

That isn't a bug.


On Wed, May 15, 2013 at 10:15 AM, Jeff Eastman &amp;lt;
jeastman&amp;lt; at &amp;gt;windwardsolutions.com&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>Ted Dunning</dc:creator>
    <dc:date>2013-05-16T01:05:43</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16781">
    <title>Re: Problems with KMeans Clustering - Radius calculation returns incorrect ZERO value in some cases.</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16781</link>
    <description>&lt;pre&gt;What you have observed is correct. During the final iteration, points 
are observed by each cluster and these observations are used to 
calculate the new cluster center and radius. As that center moves less 
than the convergence delta from the previous center, the iterations 
stop. During the subsequent classification phase, each point is assigned 
to its most likely cluster and this assignment may not always be to the 
same cluster due to this final cluster center movement.

Decreasing the convergence delta and thus running more iterations may 
help to resolve this problem; however, there are situations with KMeans 
where the end state can oscillate between two or more very similar 
clusterings. I think the only way to predictably use a 
post-classification radius is to recalculate it at the end.




On 5/14/13 2:19 PM, Erinn Schorsch wrote:

&lt;/pre&gt;</description>
    <dc:creator>Jeff Eastman</dc:creator>
    <dc:date>2013-05-15T17:15:13</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16780">
    <title>Re: Mahout Recommendation algorithm &amp; Hadoop</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16780</link>
    <description>&lt;pre&gt;Mahout Recommendation already uses Hadoop.
https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering
https://cwiki.apache.org/confluence/display/MAHOUT/Collaborative+Filtering+with+ALS-WR

-Nkechi


On Wed, May 15, 2013 at 5:26 AM, Oliversweb &amp;lt;b.b&amp;lt; at &amp;gt;ntlworld.com&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>Nkechi Nnadi</dc:creator>
    <dc:date>2013-05-15T17:08:23</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16779">
    <title>Re: Which text classification algo is best for the usecase?</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16779</link>
    <description>&lt;pre&gt;If you're interested in solutions based on R/Matlab/Octave, you may
find these resources interesting:
https://class.coursera.org/ml/lecture/preview - a whole course about machine
learning that uses mathematical tools to solve its problems,
http://www.stanford.edu/class/cs246/handouts.html - another very good
course about working with massive datasets

Have a nice day! :)

Jacek.


2013/5/15 Stuti Awasthi &amp;lt;stutiawasthi&amp;lt; at &amp;gt;hcl.com&amp;gt;

&lt;/pre&gt;</description>
    <dc:creator>Jacek Wasilewski</dc:creator>
    <dc:date>2013-05-15T10:23:14</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16778">
    <title>Mahout Recommendation algorithm &amp; Hadoop</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16778</link>
    <description>&lt;pre&gt;I have created a solution around the Mahout Recommendation algorithm (users
that buy x also buy y) which currently reads a flat file (10m+ rows), but I
am wondering if, or how, Hadoop could help improve this, or whether Hadoop is
meant for the other areas of Mahout?




--
View this message in context: http://lucene.472066.n3.nabble.com/Mahout-Recommendation-algorithm-Hadoop-tp4063437.html
Sent from the Mahout User List mailing list archive at Nabble.com.

&lt;/pre&gt;</description>
    <dc:creator>Oliversweb</dc:creator>
    <dc:date>2013-05-15T09:26:37</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16777">
    <title>Mahout Cluster attributes</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16777</link>
    <description>&lt;pre&gt;Hi,

After running a clustering process, is there a way to retrieve just the cluster attributes (cluster id, number of records, centroid and radius, e.g. CL-12124070{ n=4559664 c=[8.470, 68.606] r=[6.303, 49.436]}) without doing a full cluster dump? Using ‘clusterdump’ provides this but also adds the weights and points, which in my case are unwanted.


Regards,
Willem

________________________________
NOTE: This e-mail message is subject to the MTN Group disclaimer see http://www.mtn.co.za/SUPPORT/LEGAL/Pages/EmailDisclaimer.aspx&lt;/pre&gt;</description>
    <dc:creator>Willem Conradie  [ MTN – Innovation Centre ]</dc:creator>
    <dc:date>2013-05-15T05:57:21</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16776">
    <title>RE: Which text classification algo is best for the usecase?</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16776</link>
    <description>&lt;pre&gt;Sure. I will check this. 

Regards
Anand.C

-----Original Message-----
From: Stuti Awasthi [mailto:stutiawasthi&amp;lt; at &amp;gt;hcl.com] 
Sent: Wednesday, May 15, 2013 11:01 AM
To: user&amp;lt; at &amp;gt;mahout.apache.org
Subject: RE: Which text classification algo is best for the usecase?

Yes, there are scalability issues with R. Have you looked at RHadoop? I haven’t tried it, but you can look at it if you have already worked with R and Hadoop.

Thanks
Stuti

-----Original Message-----
From: Chandra Mohan, Ananda Vel Murugan [mailto:Ananda.Murugan&amp;lt; at &amp;gt;honeywell.com] 
Sent: Wednesday, May 15, 2013 10:57 AM
To: user&amp;lt; at &amp;gt;mahout.apache.org
Subject: RE: Which text classification algo is best for the usecase?

Hi, 

I used R for text classification. I tried SVM and Maximum entropy. They gave decent results. But when my dataset became huge, they were not scalable. 

Algorithms like Bagging, Boosting etc. need a lot of processing power. Most of the time, my R code would fail with a memory error. 

Regards,
Anand.C

-----Original Me&lt;/pre&gt;</description>
    <dc:creator>Chandra Mohan, Ananda Vel Murugan</dc:creator>
    <dc:date>2013-05-15T05:36:39</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16775">
    <title>RE: java.lang.IllegalArgumentException: Label not found: EMI whi running mahout testnb</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16775</link>
    <description>&lt;pre&gt;Hi Stuti, 

Yes they are in HDFS. 

I think I almost nailed down the problem. I checked my dataset. There is only one record with the "EMI" label, so when I split the dataset into test and training sets, I think it does not end up in the training set. I suspect this might be the problem. To confirm, I am removing this record from my dataset and I am going to run the mahout commands again. 

Regards,
Anand.C

-----Original Message-----
From: Stuti Awasthi [mailto:stutiawasthi&amp;lt; at &amp;gt;hcl.com] 
Sent: Wednesday, May 15, 2013 10:56 AM
To: user&amp;lt; at &amp;gt;mahout.apache.org
Subject: RE: java.lang.IllegalArgumentException: Label not found: EMI whi running mahout testnb

Hi ChandraMohan,

Yes, I am also looking at text classification using Mahout. I have also tried this link and it worked for me. Just a basic question: I hope you have your files in HDFS and not on the local filesystem.
This was the first mistake I made when running the 20 Newsgroups example.

Thanks
Stuti Awasthi

-----Original Message-----
From: Chandra Mohan, Ananda Vel Murugan [mailto:Ananda.Murugan&amp;lt; at &amp;gt;honeywell.c&lt;/pre&gt;</description>
    <dc:creator>Chandra Mohan, Ananda Vel Murugan</dc:creator>
    <dc:date>2013-05-15T05:32:23</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.apache.mahout.user/16774">
    <title>RE: Which text classification algo is best for the usecase?</title>
    <link>http://permalink.gmane.org/gmane.comp.apache.mahout.user/16774</link>
    <description>&lt;pre&gt;Yes, there are scalability issues with R. Have you looked at RHadoop? I haven’t tried it, but you can look at it if you have already worked with R and Hadoop.

Thanks
Stuti

-----Original Message-----
From: Chandra Mohan, Ananda Vel Murugan [mailto:Ananda.Murugan&amp;lt; at &amp;gt;honeywell.com] 
Sent: Wednesday, May 15, 2013 10:57 AM
To: user&amp;lt; at &amp;gt;mahout.apache.org
Subject: RE: Which text classification algo is best for the usecase?

Hi, 

I used R for text classification. I tried SVM and Maximum entropy. They gave decent results. But when my dataset became huge, they were not scalable. 

Algorithms like Bagging, Boosting etc. need a lot of processing power. Most of the time, my R code would fail with a memory error. 

Regards,
Anand.C

-----Original Message-----
From: Stuti Awasthi [mailto:stutiawasthi&amp;lt; at &amp;gt;hcl.com]
Sent: Wednesday, May 15, 2013 10:53 AM
To: user&amp;lt; at &amp;gt;mahout.apache.org
Subject: RE: Which text classification algo is best for the usecase?

Thanks Jacek,

I will try to look at these algorithms also.. Tha&lt;/pre&gt;</description>
    <dc:creator>Stuti Awasthi</dc:creator>
    <dc:date>2013-05-15T05:30:42</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.apache.mahout.user">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.apache.mahout.user</link>
  </textinput>
</rdf:RDF>
