<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel">
    <title>gmane.comp.jakarta.lucene.hadoop.devel</title>
    <link>http://blog.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50713"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50708"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50706"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50692"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50668"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50666"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50659"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50658"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50655"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50654"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50626"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50623"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50595"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50589"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50585"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50582"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50581"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50576"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50560"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50559"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50713">
    <title>[jira] Created: (HADOOP-4769) DataNode should validate the data integrity of a block before updating it</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50713</link>
    <description>DataNode should validate the data integrity of a block before updating it
-------------------------------------------------------------------------

                 Key: HADOOP-4769
                 URL: https://issues.apache.org/jira/browse/HADOOP-4769
             Project: Hadoop Core
          Issue Type: Bug
            Reporter: Tsz Wo (Nicholas), SZE


DataNode currently only validates the existence of the block and meta files before an update.  The data integrity of the block should also be validated.

</description>
    <dc:creator>Tsz Wo (Nicholas), SZE (JIRA)</dc:creator>
    <dc:date>2008-12-04T03:44:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50708">
    <title>[jira] Created: (HADOOP-4768) Dynamic Priority Scheduler that allows queue shares to be controlled dynamically by a currency</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50708</link>
    <description>Dynamic Priority Scheduler that allows queue shares to be controlled dynamically by a currency
----------------------------------------------------------------------------------------------

                 Key: HADOOP-4768
                 URL: https://issues.apache.org/jira/browse/HADOOP-4768
             Project: Hadoop Core
          Issue Type: New Feature
          Components: contrib/capacity-sched, contrib/fair-share
    Affects Versions: 0.20.0
            Reporter: Thomas Sandholm
             Fix For: 0.20.0


Contribution based on work presented at the Hadoop User Group meeting in Santa Clara in September and the HadoopCamp in New Orleans in November.

From README:
This package implements dynamic priority scheduling for MapReduce jobs.

Overview
--------
The purpose of this scheduler is to allow users to increase and decrease
their queue priorities continuously to meet the requirements of their
current workloads. The scheduler is aware of the current demand and makes
it more expensive to boost the priority during peak usage times. Thus
users who move their workload to low usage times are rewarded with
discounts. Priorities can only be boosted within a limited quota.
All users are given a quota or a budget which is deducted periodically
in configurable accounting intervals. How much of the budget is
deducted is determined by a per-user spending rate, which may
be modified at any time directly by the user. The share of cluster
slots allocated to a particular user is computed as that user's
spending rate over the sum of all spending rates in the same accounting
period.
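
The share computation described above could be sketched as follows. This
is a minimal Python illustration, not the scheduler's actual code; the
function name compute_shares and the queue names are hypothetical.

```python
def compute_shares(spending_rates):
    """Return each queue's share as its spending rate over the sum of all rates.

    spending_rates maps a (hypothetical) queue name to its spending rate
    for the current accounting period.
    """
    total = sum(spending_rates.values())
    if total == 0:
        # Nobody is spending in this period, so no shares are handed out.
        return {queue: 0.0 for queue in spending_rates}
    return {queue: rate / total for queue, rate in spending_rates.items()}


shares = compute_shares({"queue1": 4.5, "queue2": 5.5})
print(shares)
```

A queue that raises its spending rate gains share only relative to what
the other queues spend in the same period, which is why boosting is more
expensive when demand is high.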

Configuration
-------------
This scheduler has been designed as a meta-scheduler on top of
existing MapReduce schedulers, which are responsible for enforcing
shares computed by the dynamic scheduler in the cluster. The configuration
of the underlying MapReduce scheduler does not have to change when deploying
the dynamic scheduler.

Hadoop Configuration (e.g. hadoop-site.xml):
mapred.jobtracker.taskScheduler      This needs to be set to 
                                     org.apache.hadoop.mapred.DynamicPriorityScheduler
                                     to use the dynamic scheduler.
mapred.queue.names                   All queues managed by the dynamic scheduler must be listed
                                     here (comma-separated, no spaces)
Scheduler Configuration:
mapred.dynamic-scheduler.scheduler   The fully qualified Java class name of the MapReduce
                                     scheduler that should enforce the allocated shares.
                                     Has been tested with:
                                     org.apache.hadoop.mapred.FairScheduler
                                     and
                                     org.apache.hadoop.mapred.CapacityTaskScheduler

mapred.dynamic-scheduler.budgetfile  The full OS path of the file from which the
                                     budgets are read. The syntax of this file is:
                                     &lt;queueName&gt; &lt;budget&gt;
                                     one entry per line, where budget can be specified
                                     as a Java float

mapred.dynamic-scheduler.spendfile   The full OS path of the file from which the
                                     user/queue spending rate is read. It allows
                                     the queue name to be placed into the path
                                     at runtime, e.g.:
                                     /home/%QUEUE%/.spending
                                     Only the user(s) who submit jobs to the
                                     specified queue should have write access
                                     to this file. The syntax of the file is
                                     just:
                                     &lt;spending rate&gt;
                                     where the spending rate is specified as a
                                     Java float. If no spending rate is specified
                                     the rate defaults to budget/1000.
mapred.dynamic-scheduler.alloc       Allocation interval, when the scheduler rereads the
                                     spending rates and recalculates the cluster shares.
                                     Specified as seconds between allocations.
                                     Default is 20 seconds.
mapred.dynamic-scheduler.budgetset   Boolean which is true if the budget should be deducted 
                                     by the scheduler and the updated budget written to the
                                     budget file. Default is true. Setting this to false is
                                     useful if there is a tool that controls budgets and
                                     spending rates externally to the scheduler.
Runtime Configuration:
mapred.scheduler.shares              The shares that should be allocated to the specified queue.
                                     The configuration property is a comma-separated list of
                                     strings where the odd-positioned elements are the
                                     queue names and the even-positioned elements are the shares
                                     as Java floats of the preceding queue name. It is updated
                                     for all the queues atomically in each allocation pass. MapReduce
                                     schedulers such as the Fair and CapacityTask schedulers
                                     are expected to read from this property periodically.
                                     Example property value: "queue1,45.0,queue2,55.0"
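
To make the file and property formats above concrete, here is a minimal
Python sketch. It is an illustration only, not code from this package;
all function names are hypothetical, and treating an empty spend file as
"no spending rate specified" is an assumption.

```python
def spend_file_path(template, queue):
    """Substitute the queue name into a spendfile path template."""
    return template.replace("%QUEUE%", queue)


def spending_rate(file_text, budget):
    """Read the spending rate, defaulting to budget/1000 when none is given."""
    text = file_text.strip()
    if not text:
        # Assumption: an empty file means no spending rate was specified.
        return budget / 1000.0
    return float(text)


def format_shares(shares):
    """Serialize {queue: share} as 'name,share,name,share,...'."""
    parts = []
    for queue, share in shares.items():
        parts.append(queue)
        parts.append(str(float(share)))
    return ",".join(parts)


def parse_shares(value):
    """Parse 'name,share,...' back into a {queue: share} dict."""
    fields = value.split(",")
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating queue names and shares")
    names = fields[0::2]    # odd-positioned elements (1st, 3rd, ...)
    numbers = fields[1::2]  # even-positioned elements (2nd, 4th, ...)
    return {name: float(number) for name, number in zip(names, numbers)}


parsed = parse_shares("queue1,45.0,queue2,55.0")
print(parsed)
print(format_shares(parsed))
print(spend_file_path("/home/%QUEUE%/.spending", "queue1"))
```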

</description>
    <dc:creator>Thomas Sandholm (JIRA)</dc:creator>
    <dc:date>2008-12-04T01:59:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50706">
    <title>[jira] Created: (HADOOP-4767) Provide sample fair scheduler config file in conf/ and set config file property to point to this by default</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50706</link>
    <description>Provide sample fair scheduler config file in conf/ and set config file property to point to this by default
-----------------------------------------------------------------------------------------------------------

                 Key: HADOOP-4767
                 URL: https://issues.apache.org/jira/browse/HADOOP-4767
             Project: Hadoop Core
          Issue Type: New Feature
          Components: contrib/fair-share
            Reporter: Matei Zaharia
            Priority: Minor


The capacity scheduler includes a config file template in hadoop/conf, so it would make sense to create a similar one for the fair scheduler and mention it in the README.

</description>
    <dc:creator>Matei Zaharia (JIRA)</dc:creator>
    <dc:date>2008-12-04T01:39:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50692">
    <title>[jira] Created: (HADOOP-4766) Hadoop performance degrades significantly as more and more jobs complete</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50692</link>
    <description>Hadoop performance degrades significantly as more and more jobs complete
------------------------------------------------------------------------

                 Key: HADOOP-4766
                 URL: https://issues.apache.org/jira/browse/HADOOP-4766
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.19.0, 0.18.2
            Reporter: Runping Qi



When I ran the gridmix 2 benchmark load on a fresh cluster of 500 nodes with hadoop trunk,
the gridmix load, consisting of 202 map/reduce jobs of various sizes, completed in 32 minutes.
When I then ran the same set of jobs on the same cluster, they completed in 43 minutes.
When I ran them a third time, they took (almost) forever: the job tracker became non-responsive.

The job tracker's heap size was set to 2GB.
The cluster is configured to keep up to 500 jobs in memory.

The job tracker kept one CPU busy all the time. It looks like this was due to GC.

I believe releases 0.18 and 0.19 show similar behavior.

</description>
    <dc:creator>Runping Qi (JIRA)</dc:creator>
    <dc:date>2008-12-03T23:41:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50668">
    <title>[jira] Created: (HADOOP-4765) hadoop fsck should ignore lost+found</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50668</link>
    <description>hadoop fsck should ignore lost+found
------------------------------------

                 Key: HADOOP-4765
                 URL: https://issues.apache.org/jira/browse/HADOOP-4765
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.20.0
         Environment: All
            Reporter: Lohit Vijayarenu
            Priority: Minor


hadoop fsck / checks the state of the entire filesystem. It would be good to have an option to ignore the lost+found directory or, better yet, a way to specify an ignore list.
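
One possible meaning for such an ignore list, sketched in Python purely as an illustration (fsck has no such option today; the function names and the prefix-matching rule are assumptions):

```python
def is_ignored(path, ignore_list):
    """True if path is an ignored directory or lies beneath one."""
    for ignored in ignore_list:
        prefix = ignored.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


def paths_to_check(paths, ignore_list):
    """Keep only the paths that are not covered by the ignore list."""
    return [p for p in paths if not is_ignored(p, ignore_list)]


candidates = ["/lost+found/blk_1", "/user/a/file", "/lost+found-old/x"]
print(paths_to_check(candidates, ["/lost+found"]))
```

Matching on whole path components keeps a directory such as /lost+found-old from being skipped by accident.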

</description>
    <dc:creator>Lohit Vijayarenu (JIRA)</dc:creator>
    <dc:date>2008-12-03T18:21:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50666">
    <title>[jira] Created: (HADOOP-4764) add replication factor for hdfs directory</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50666</link>
    <description>add replication factor for hdfs directory
-----------------------------------------

                 Key: HADOOP-4764
                 URL: https://issues.apache.org/jira/browse/HADOOP-4764
             Project: Hadoop Core
          Issue Type: New Feature
          Components: dfs
            Reporter: Ruyue Ma


If we could set a replication factor on a directory, we could modify the DFSClient.create() method to pass 0 for the default block replication. When the NameNode checks a create request and the block replication is 0, it would assign the parent directory's replication factor to the file. This would simplify administration: for example, we could set the replication factor of /Test or /Tmp to 2 or 1, and all files and directories beneath them would then default to a replication factor of 2 or 1.
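
A minimal Python sketch of the proposed rule, for illustration only. It is not NameNode code: the names are hypothetical, the cluster default of 3 is an assumption, and walking up to the nearest ancestor that has a factor set is one possible reading of "parent dir".

```python
DEFAULT_REPLICATION = 3  # assumed cluster-wide default


def parent_dirs(path):
    """Yield the ancestors of path from the nearest one up to the root."""
    while path != "/":
        path = path.rsplit("/", 1)[0] or "/"
        yield path


def resolve_replication(path, requested, dir_replication):
    """Pick the replication factor for a new file.

    A requested value of 0 means "inherit": use the factor of the nearest
    ancestor directory that has one set, else the cluster default.
    """
    if requested != 0:
        return requested
    for parent in parent_dirs(path):
        if parent in dir_replication:
            return dir_replication[parent]
    return DEFAULT_REPLICATION


dir_replication = {"/Tmp": 1, "/Test": 2}
print(resolve_replication("/Tmp/job/part-0", 0, dir_replication))
```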
   

</description>
    <dc:creator>Ruyue Ma (JIRA)</dc:creator>
    <dc:date>2008-12-03T17:39:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50659">
    <title>[jira] Updated: (HADOOP-1811) Improve debugabillity</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50659</link>
    <description>
     [ https://issues.apache.org/jira/browse/HADOOP-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Xu updated HADOOP-1811:
---------------------------

    Component/s:     (was: mapred)


</description>
    <dc:creator>Lei Xu (JIRA)</dc:creator>
    <dc:date>2008-12-03T17:05:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50658">
    <title>[jira] Created: (HADOOP-4763) Support manually fsck in DataNode</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50658</link>
    <description>Support manually fsck in DataNode
---------------------------------

                 Key: HADOOP-4763
                 URL: https://issues.apache.org/jira/browse/HADOOP-4763
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs
            Reporter: Lei Xu


Currently the DataNode only supports scanning all blocks periodically.  Our site needs a tool to check specific blocks and files manually.

My current design is to add a parameter to DFSck to indicate a deep, manual fsck request, and then let the NameNode collect the proper block identifiers and send them to the associated DataNodes.
I'll let DataBlockScanner run in two modes: periodic (the original one) and manual.

Any suggestions on this are welcome. 

</description>
    <dc:creator>Lei Xu (JIRA)</dc:creator>
    <dc:date>2008-12-03T17:03:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50655">
    <title>[jira] Created: (HADOOP-4762) Fix Eclipse classpath following introduction of gridmix 2</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50655</link>
    <description>Fix Eclipse classpath following introduction of gridmix 2
---------------------------------------------------------

                 Key: HADOOP-4762
                 URL: https://issues.apache.org/jira/browse/HADOOP-4762
             Project: Hadoop Core
          Issue Type: Bug
          Components: build
            Reporter: Tom White
            Assignee: Tom White


Need to add src/benchmarks/gridmix2/src/java to classpath.

</description>
    <dc:creator>Tom White (JIRA)</dc:creator>
    <dc:date>2008-12-03T16:39:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50654">
    <title>[jira] Created: (HADOOP-4761) Tool to give the block location and to check if the block at a given datanode is corrupt or not</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50654</link>
    <description>Tool to give the block location and to check if the block at a given datanode is corrupt or not
-----------------------------------------------------------------------------------------------

                 Key: HADOOP-4761
                 URL: https://issues.apache.org/jira/browse/HADOOP-4761
             Project: Hadoop Core
          Issue Type: New Feature
          Components: dfs
            Reporter: Ramya R
            Priority: Minor
             Fix For: 0.20.0


It would be useful to have a command-line tool that lists the locations of all the replicas of a block, given a block ID or filename. A utility to check whether the block at a given datanode is corrupt would also be of great help in managing the cluster.

</description>
    <dc:creator>Ramya R (JIRA)</dc:creator>
    <dc:date>2008-12-03T15:35:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50626">
    <title>[jira] Created: (HADOOP-4760) Configuration addResource(InputStream) fails with HDFS streams</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50626</link>
    <description>Configuration addResource(InputStream) fails with HDFS streams
--------------------------------------------------------------

                 Key: HADOOP-4760
                 URL: https://issues.apache.org/jira/browse/HADOOP-4760
             Project: Hadoop Core
          Issue Type: Bug
          Components: conf
         Environment: all
            Reporter: Alejandro Abdelnur


When adding an {{InputStream}} via {{addResource(InputStream)}} to a {{Configuration}} instance, if the stream is an HDFS stream, the {{loadResource(..)}} method fails with an {{IOException}} indicating that the stream has already been closed.


</description>
    <dc:creator>Alejandro Abdelnur (JIRA)</dc:creator>
    <dc:date>2008-12-03T09:11:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50623">
    <title>[jira] Created: (HADOOP-4759) HADOOP-4654 to be fixed for branches &gt; 0.19</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50623</link>
    <description>HADOOP-4654 to be fixed for branches &gt; 0.19
-------------------------------------------

                 Key: HADOOP-4759
                 URL: https://issues.apache.org/jira/browse/HADOOP-4759
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.19.0
            Reporter: Amareshwari Sriramadasu
            Assignee: Amareshwari Sriramadasu
             Fix For: 0.19.1


HADOOP-4654 is fixed only for branch 0.18.3. This JIRA tracks the same issue for branches 0.19 and above.

</description>
    <dc:creator>Amareshwari Sriramadasu (JIRA)</dc:creator>
    <dc:date>2008-12-03T07:04:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50595">
    <title>[jira] Created: (HADOOP-4758) Add a 'tee' for metrics contexts</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50595</link>
    <description>Add a 'tee' for metrics contexts
--------------------------------

                 Key: HADOOP-4758
                 URL: https://issues.apache.org/jira/browse/HADOOP-4758
             Project: Hadoop Core
          Issue Type: Improvement
          Components: metrics
            Reporter: Chris Douglas
            Priority: Minor
             Fix For: 0.20.0


It is currently not possible to configure multiple metrics contexts. It would be helpful if one could support more than one type of collector.
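
The 'tee' idea could look roughly like this. It is a Python sketch of the pattern only, not the Hadoop metrics API; ListContext, TeeContext and emit are hypothetical names.

```python
class ListContext:
    """Toy collector that just remembers the records it is given."""

    def __init__(self):
        self.records = []

    def emit(self, name, value):
        self.records.append((name, value))


class TeeContext:
    """Forwards every record to each of the wrapped contexts."""

    def __init__(self, *contexts):
        self.contexts = list(contexts)

    def emit(self, name, value):
        for context in self.contexts:
            context.emit(name, value)


file_like = ListContext()
ganglia_like = ListContext()
tee = TeeContext(file_like, ganglia_like)
tee.emit("maps_running", 12)
print(file_like.records, ganglia_like.records)
```

Because the tee exposes the same interface as a single context, code that emits metrics would not need to know how many collectors are configured.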

</description>
    <dc:creator>Chris Douglas (JIRA)</dc:creator>
    <dc:date>2008-12-03T03:18:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50589">
    <title>[jira] Created: (HADOOP-4757) Replace log4j.properties with log4j.xml</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50589</link>
    <description>Replace log4j.properties with log4j.xml
---------------------------------------

                 Key: HADOOP-4757
                 URL: https://issues.apache.org/jira/browse/HADOOP-4757
             Project: Hadoop Core
          Issue Type: Improvement
            Reporter: Alex Loddengaard
            Assignee: Alex Loddengaard
            Priority: Minor


Certain log4j features are not configurable via the log4j.properties file, not to mention that the log4j.properties file is messy.  For this reason, we should transition to an XML file to configure log4j.  The .properties vs. .xml argument has only come up in HADOOP-211, where the XML configuration was vetoed for Tomcat integration issues.

</description>
    <dc:creator>Alex Loddengaard (JIRA)</dc:creator>
    <dc:date>2008-12-03T01:06:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50585">
    <title>[jira] Created: (HADOOP-4756) Create a command line tool to access JMX exported properties from a NameNode server</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50585</link>
    <description>Create a command line tool to access JMX exported properties from a NameNode server
-----------------------------------------------------------------------------------

                 Key: HADOOP-4756
                 URL: https://issues.apache.org/jira/browse/HADOOP-4756
             Project: Hadoop Core
          Issue Type: New Feature
          Components: dfs
            Reporter: Boris Shkolnik


Create a command-line tool that will ease scripted access to the JMX-exported properties of the NameNode.

</description>
    <dc:creator>Boris Shkolnik (JIRA)</dc:creator>
    <dc:date>2008-12-03T00:14:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50582">
    <title>[jira] Created: (HADOOP-4755) We should ignore serialization "bad practice" warnings for jsp pages for findbugs</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50582</link>
    <description>We should ignore serialization "bad practice" warnings for jsp pages for findbugs
---------------------------------------------------------------------------------

                 Key: HADOOP-4755
                 URL: https://issues.apache.org/jira/browse/HADOOP-4755
             Project: Hadoop Core
          Issue Type: Bug
            Reporter: Boris Shkolnik
            Priority: Minor


Some JSP pages get SE_BAD_FIELD warnings from FindBugs.
These warnings should be ignored. We currently get them for the dfshealth.jsp and dfsnodelist.jsp files in webapps.
To do this we need to add the class and bug pattern to findbugsExcludeFile.xml.

</description>
    <dc:creator>Boris Shkolnik (JIRA)</dc:creator>
    <dc:date>2008-12-03T00:00:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50581">
    <title>[jira] Created: (HADOOP-4754) Revert extra debugging</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50581</link>
    <description>Revert extra debugging
----------------------

                 Key: HADOOP-4754
                 URL: https://issues.apache.org/jira/browse/HADOOP-4754
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Chris Douglas
            Priority: Trivial
             Fix For: 0.20.0
         Attachments: 4754-0.patch

It appears that HADOOP-3647 is actually caused by a bad disk, unobserved given HADOOP-4277. The extra debugging can be removed.

</description>
    <dc:creator>Chris Douglas (JIRA)</dc:creator>
    <dc:date>2008-12-02T23:54:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50576">
    <title>[jira] Created: (HADOOP-4753) gridmix2 code can be condensed</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50576</link>
    <description>gridmix2 code can be condensed
------------------------------

                 Key: HADOOP-4753
                 URL: https://issues.apache.org/jira/browse/HADOOP-4753
             Project: Hadoop Core
          Issue Type: Improvement
          Components: benchmarks
    Affects Versions: 0.20.0
            Reporter: Chris Douglas
            Assignee: Chris Douglas
            Priority: Minor
             Fix For: 0.20.0


The gridmix2 benchmark code, particularly GridMixRunner, contains a lot of duplication that can be condensed into a more manageable state.

</description>
    <dc:creator>Chris Douglas (JIRA)</dc:creator>
    <dc:date>2008-12-02T23:30:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50560">
    <title>[jira] Created: (HADOOP-4752) Major performance drop on slower machines</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50560</link>
    <description>Major performance drop on slower machines
-----------------------------------------

                 Key: HADOOP-4752
                 URL: https://issues.apache.org/jira/browse/HADOOP-4752
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/fuse-dfs
    Affects Versions: 0.18.2
            Reporter: Marc-Olivier Fleury


When running fuse_dfs on machines that have different CPU characteristics, I noticed that the performance of fuse_dfs is very sensitive to the machine power. 

The command I used was simply a cat over a rather large amount of data stored on HDFS. Here are the comparative times for the different types of machines:

Intel(R) Pentium(R) 4 CPU 2.40GHz:              2 min 40 s
Intel(R) Pentium(R) 4 CPU 3.06GHz:              1 min 50 s
2 x Intel(R) Pentium(R) 4 CPU 3.00GHz:          0 min 40 s
2 x Intel(R) Xeon(TM) MP CPU 3.33GHz:           0 min 28 s
Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz:    0 min 15 s

I tried to find other explanations for the drop in performance, such as network configuration, or data locality, but the faster machines are the ones that are "further away" from the others considering the network configuration, and that don't run datanodes.

top shows that the CPU usage of fuse_dfs is between 80-90% on the slower machines, and about 40% on the fastest one.

This leads me to the conclusion that fuse_dfs consumes a lot of CPU resources, much more than expected.

Any help or insight concerning this issue will be greatly appreciated, since these differences actually add up to days of computation for a given job.

Thank you

</description>
    <dc:creator>Marc-Olivier Fleury (JIRA)</dc:creator>
    <dc:date>2008-12-02T21:40:45</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50559">
    <title>[jira] Created: (HADOOP-4751) FileSystem.listStatus should report the current length of a file if it has been sync'd</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50559</link>
    <description>FileSystem.listStatus should report the current length of a file if it has been sync'd
--------------------------------------------------------------------------------------

                 Key: HADOOP-4751
                 URL: https://issues.apache.org/jira/browse/HADOOP-4751
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.19.0
            Reporter: Jim Kellerman
             Fix For: 0.19.1, 0.20.0


If a writer calls sync(), it should be possible to get the current file size via FileSystem.listStatus, rather than listStatus returning 0 for the file size until the file lease times out or the file is closed.

</description>
    <dc:creator>Jim Kellerman (JIRA)</dc:creator>
    <dc:date>2008-12-02T21:10:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50542">
    <title>Shutdown hook complexities</title>
    <link>http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.devel/50542</link>
    <description>I've been experiencing some problems in implementing a graceful  
shutdown in a server app I'm writing. The app keeps some HDFS files  
open at all times and needs to close them before completely shutting  
down. However, what keeps happening is that I get some "Filesystem  
closed" message.

I've tracked this down to the fact that Filesystem registers its own  
shutdown hook handler for the purpose of closing all filesystems.  
Apparently, this shutdown hook gets run prior to  (or concurrently  
with) my server shutdown hook, and as a result, when I try to close  
my files I can't.

I've looked around, and I can't seem to find a Java Runtime method  
for controlling the order of shutdown hooks, and I don't think HDFS  
gives me any way to suppress this shutdown hook behavior. Does it  
make sense for this to be configurable? I'd be fine with the  
responsibility of having to close all the filesystems myself as long  
as I got control.
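
To illustrate the kind of control I mean, here is a rough sketch in
Python rather than Java. It is not an existing HDFS or JVM facility;
ShutdownManager and the priorities are hypothetical. The idea is to
install a single hook that runs the callbacks in an explicit order:

```python
class ShutdownManager:
    """Runs registered callbacks in ascending priority order from one hook."""

    def __init__(self):
        self._tasks = []

    def register(self, priority, callback):
        # The running count breaks ties so equal priorities keep insertion order.
        self._tasks.append((priority, len(self._tasks), callback))

    def run(self):
        for _, _, callback in sorted(self._tasks, key=lambda t: t[:2]):
            callback()


log = []
manager = ShutdownManager()
manager.register(20, lambda: log.append("close filesystems"))
manager.register(10, lambda: log.append("close application files"))
manager.run()
print(log)
```

In a real application, run would be installed as the one and only
shutdown hook (Runtime.addShutdownHook in Java, atexit.register in
Python), so the application files are always closed before the
filesystems.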

-Bryan

</description>
    <dc:creator>Bryan Duxbury</dc:creator>
    <dc:date>2008-12-02T19:00:46</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.jakarta.lucene.hadoop.devel">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.jakarta.lucene.hadoop.devel</link>
  </textinput>
</rdf:RDF>
