<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.comp.file-systems.ext4">
    <title>gmane.comp.file-systems.ext4</title>
    <link>http://blog.gmane.org/gmane.comp.file-systems.ext4</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10361"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10360"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10359"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10358"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10357"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10355"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10354"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10353"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10352"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10351"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10350"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10349"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10348"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10347"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10346"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10345"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10344"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10343"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10342"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10341"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10361">
    <title>Re: BUG: scheduling while atomic: kswapd0/273/0x00000002</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10361</link>
    <description>OK. This problem was caused by the patch which I made.
I will make a patch which fixes it immediately.
Thank you for your report.

Best Regards,
Toshiyuki Okajima

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo&lt; at &gt;vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

</description>
    <dc:creator>Toshiyuki Okajima</dc:creator>
    <dc:date>2008-12-02T00:29:40</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10360">
    <title>[patch 1/1] ext4: allocate -&gt;s_blockgroup_lock separately</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10360</link>
    <description>From: Pekka Enberg &lt;penberg&lt; at &gt;cs.helsinki.fi&gt;

As spotted by kmemtrace, struct ext4_sb_info is 17664 bytes on 64-bit,
which makes it a very bad fit for SLAB allocators.  The culprit of the
wasted memory is -&gt;s_blockgroup_lock, which can be as big as 16 KB when
NR_CPUS &gt;= 32.

To fix that, allocate -&gt;s_blockgroup_lock, which fits nicely in an order-2
page in the worst case, separately.  This shrinks struct ext4_sb_info
enough to fit a 2 KB slab cache, so now we allocate 16 KB + 2 KB instead of
32 KB, saving 14 KB of memory.
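
The size arithmetic above can be reproduced with a small userspace sketch. It is a simplification, not the kernel's allocator code: it assumes kmalloc() serves a request from the smallest power-of-two size class that fits, takes the lock at its 16384-byte worst case, assumes an 8-byte pointer on 64-bit, and ignores alignment padding. The helper names are hypothetical.

```c
/*
 * Simplified model (an assumption, not the kernel's allocator code):
 * a kmalloc() request of this size is served from the smallest
 * power-of-two size class that can hold it.
 */
static unsigned long size_class(unsigned long size)
{
	unsigned long cls = 32;   /* illustrative smallest class */

	while (size > cls)        /* double until the request fits */
		cls *= 2;
	return cls;
}

/*
 * Bytes saved by moving a lock_size-byte member out of an
 * sbi_size-byte struct and replacing it with a ptr_size-byte
 * pointer (alignment padding is ignored).
 */
static unsigned long bytes_saved(unsigned long sbi_size,
				 unsigned long lock_size,
				 unsigned long ptr_size)
{
	unsigned long before = size_class(sbi_size);
	unsigned long after = size_class(sbi_size - lock_size + ptr_size) +
			      size_class(lock_size);

	return before - after;
}
```

Under those assumptions, bytes_saved(17664, 16384, 8) evaluates to 32768 - (2048 + 16384) = 14336 bytes, which is the 14 KB quoted above.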

Acked-by: Andreas Dilger &lt;adilger&lt; at &gt;sun.com&gt;
Signed-off-by: Pekka Enberg &lt;penberg&lt; at &gt;cs.helsinki.fi&gt;
Cc: &lt;linux-ext4&lt; at &gt;vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm&lt; at &gt;linux-foundation.org&gt;
---

 fs/ext4/ext4_sb.h |    4 ++--
 fs/ext4/super.c   |   10 +++++++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff -puN fs/ext4/ext4_sb.h~ext4-allocate-s_blockgroup_lock-separately fs/ext4/ext4_sb.h
--- a/fs/ext4/ext4_sb.h~ext4-allocate-s_blockgroup_lock-separately
+++ a/fs/ext4/ext4_sb.h
@@ -62,7 +62,7 @@ struct ext4_sb_info {
 struct percpu_counter s_freeinodes_counter;
 struct percpu_counter s_dirs_counter;
 struct percpu_counter s_dirtyblocks_counter;
-struct blockgroup_lock s_blockgroup_lock;
+struct blockgroup_lock *s_blockgroup_lock;
 struct proc_dir_entry *s_proc;
 
 /* root of the per fs reservation window tree */
@@ -150,7 +150,7 @@ struct ext4_sb_info {
 static inline spinlock_t *
 sb_bgl_lock(struct ext4_sb_info *sbi, unsigned int block_group)
 {
-return bgl_lock_ptr(&amp;sbi-&gt;s_blockgroup_lock, block_group);
+return bgl_lock_ptr(sbi-&gt;s_blockgroup_lock, block_group);
 }
 
 #endif/* _EXT4_SB */
diff -puN fs/ext4/super.c~ext4-allocate-s_blockgroup_lock-separately fs/ext4/super.c
--- a/fs/ext4/super.c~ext4-allocate-s_blockgroup_lock-separately
+++ a/fs/ext4/super.c
@@ -497,6 +497,7 @@ static void ext4_put_super(struct super_
 ext4_blkdev_remove(sbi);
 }
 sb-&gt;s_fs_info = NULL;
+kfree(sbi-&gt;s_blockgroup_lock);
 kfree(sbi);
 return;
 }
@@ -1886,6 +1887,13 @@ static int ext4_fill_super(struct super_
 sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
 if (!sbi)
 return -ENOMEM;
+
+sbi-&gt;s_blockgroup_lock =
+kzalloc(sizeof(struct blockgroup_lock), GFP_KERNEL);
+if (!sbi-&gt;s_blockgroup_lock) {
+kfree(sbi);
+return -ENOMEM;
+}
 sb-&gt;s_fs_info = sbi;
 sbi-&gt;s_mount_opt = 0;
 sbi-&gt;s_resuid = EXT4_DEF_RESUID;
@@ -2194,7 +2202,7 @@ static int ext4_fill_super(struct super_
  &amp;sbi-&gt;s_inode_readahead_blks);
 #endif
 
-bgl_lock_init(&amp;sbi-&gt;s_blockgroup_lock);
+bgl_lock_init(sbi-&gt;s_blockgroup_lock);
 
 for (i = 0; i &lt; db_count; i++) {
 block = descriptor_loc(sb, logical_sb_block, i);
_

</description>
    <dc:creator>akpm&lt; at &gt;linux-foundation.org</dc:creator>
    <dc:date>2008-12-01T22:27:04</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10359">
    <title>Re: [ext4] Documentation patch</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10359</link>
    <description>
Well, the original text was written by David Kleikamp, so it might be
the case that jfs doesn't handle the stale data block case well.  I
haven't checked.  However, the sense of that paragraph got mangled
badly in commit 93e3270c, and what's there clearly doesn't make any
sense.

I agree the best thing to do is to nuke that whole paragraph.  The one
thing that's worth mentioning (as a replacement paragraph) is that
ext4 (and many other filesystems) has barriers on by default, while
ext3 has barriers off by default, so that's something that should be
taken into account when doing head-to-head comparisons.

- Ted

P.S.  Speaking of barriers, there was a rumor floating around that
someone was working on patches so that, at least in the case of RAID 0
and RAID 1, the LVM and MD stacks would actually pass barrier requests
down to the block device layer.  Whatever happened to that?  Is that
bug in the LVM layer going to get fixed any time soon?

</description>
    <dc:creator>Theodore Tso</dc:creator>
    <dc:date>2008-12-01T20:58:28</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10358">
    <title>Re: EXT4 ENOSPC Bug</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10358</link>
    <description>Hi Andreas, Hi all,

On Monday 01 December 2008 20:42:04 Andreas Dilger wrote:
Yes, no errors.

No. The problem sometimes occurred shortly after boot (once even before X was
up), and sometimes only after days of full usage.
Boot starts postgres; that's the only thing that probably always ran. But I
have seen it while postgres was idle, i.e. doing just some polls and stats.

It's not a single file; the whole filesystem has problems.

Sure, no problem.
At least, if the patch is not expensive at run time. The system produces
test data for testing our development software, and I can't take it down for
too long.
That's the reason why I can use something like ext4 on such a system at all ;-)

It sometimes takes quite a while to reproduce the problem, though. The system
has run stable for 4 days so far, on the same kernel as the last time the
error occurred.

Any specific debug options I should enable?

Andres
</description>
    <dc:creator>Andres Freund</dc:creator>
    <dc:date>2008-12-01T20:16:04</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10357">
    <title>Re: EXT4 ENOSPC Bug</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10357</link>
    <description>
Some information that would be useful:
- did you run "e2fsck -f" to see if there were any errors in the filesystem?
- do you run any specific applications that seem to trigger the problem
  (e.g. Vuze (formerly Azureus), as was reported by another user)
- do the applications writing to this file have any unusual IO pattern
  (e.g. mmap IO, lots of write+truncate+write on the same file, etc)

We discussed the creation of a debugging patch to help diagnose this
problem.  It looks like you have already compiled your own kernel, so
I assume it would be possible for you to run with an additional patch?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


</description>
    <dc:creator>Andreas Dilger</dc:creator>
    <dc:date>2008-12-01T19:42:04</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10355">
    <title>Re: [ext4] Documentation patch</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10355</link>
    <description>
Agreed; that whole bit which mentions other filesystem comparisons
should probably be stricken, unless it can be
proven/demonstrated/substantiated that ext3 really does "offer higher
data integrity guarantees than most" at this point.

data=ordered ensures that stale data won't be exposed on a crash; xfs
won't expose it either (it'd be a security bug), and I'd be surprised if
jfs or reiserfs do.  And it probably *should* be mentioned that
data=writeback bears this risk.
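
The ordering difference can be illustrated with a deliberately tiny userspace model. It is a sketch, not how ext3/ext4 or xfs implement anything: the types and functions are hypothetical, and for data=writeback it assumes the unlucky ordering in which the metadata commit reaches disk before the file data.

```c
/*
 * Hypothetical toy model, not ext3/ext4 code: one freshly allocated
 * data block that still holds someone else's stale contents, and one
 * inode whose committed metadata may or may not point at it yet.
 */
enum contents { EMPTY = 0, STALE_DATA = 1, NEW_DATA = 2 };
enum journal_mode { MODE_ORDERED, MODE_WRITEBACK };

struct toy_disk {
	int block;         /* what is physically in the data block */
	int inode_mapped;  /* 1 once the metadata commit reached disk */
};

static void write_data(struct toy_disk *d)      { d->block = NEW_DATA; }
static void commit_metadata(struct toy_disk *d) { d->inode_mapped = 1; }

/*
 * Do the two writes in the order the mode permits, "crash" after
 * steps_done of them (0, 1 or 2), then return what a reader of the
 * file would see after recovery.
 */
static int read_after_crash(enum journal_mode mode, int steps_done)
{
	struct toy_disk disk[1] = { { STALE_DATA, 0 } };
	void (*step[2])(struct toy_disk *);
	int i;

	if (mode == MODE_ORDERED) {
		step[0] = write_data;       /* data is forced out first... */
		step[1] = commit_metadata;  /* ...then the metadata commits */
	} else {
		/* writeback gives no ordering; model the bad interleaving */
		step[0] = commit_metadata;
		step[1] = write_data;
	}
	for (i = 0; steps_done > i; i++)
		step[i](disk);

	return disk->inode_mapped ? disk->block : EMPTY;
}
```

Crashing after the first step, read_after_crash(MODE_ORDERED, 1) returns EMPTY, whereas read_after_crash(MODE_WRITEBACK, 1) returns STALE_DATA: the inode already points at a block whose old contents were never overwritten.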

And until ext3 turns on barriers by default, I don't think it's fair to
talk too much about integrity guarantees.  :)

-Eric

</description>
    <dc:creator>Eric Sandeen</dc:creator>
    <dc:date>2008-12-01T17:58:52</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10354">
    <title>Re: BUG: scheduling while atomic: kswapd0/273/0x00000002</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10354</link>
    <description>* Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt; [2008-12-01 21:27:55]:


Yes, I should have seen the stack. The same problem seems to exist for
ext3 as well (with log_wait_commit). kswapd() passes GFP_KERNEL as
gfp_mask in scan_control, and that confuses the journalling layer,
since we hold the page lock in shrink_page_list().

</description>
    <dc:creator>Balbir Singh</dc:creator>
    <dc:date>2008-12-01T17:37:37</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10353">
    <title>Re: [RFC PATCH 1/1] Allow ext4 to run without a journal, take 2.</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10353</link>
    <description>
I have a new version of the patch that has several problems corrected
and one or two bugs fixed.  One of us ran into a memory leak on
Wednesday (after I had already left for the long weekend), though, so
I'll resubmit the patch after we get that fixed.  (The memory leak only
occurs when using ext4 without a journal; with a journal it works fine.
It looks like the no-journal case is missing a code path with a free
somewhere, but I haven't had a chance to look at it yet.)
</description>
    <dc:creator>Frank Mayhar</dc:creator>
    <dc:date>2008-12-01T17:33:38</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10352">
    <title>Re: [ext4] Documentation patch</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10352</link>
    <description>
data=ordered comes closest to what xfs has done for quite a long time.


</description>
    <dc:creator>Christoph Hellwig</dc:creator>
    <dc:date>2008-12-01T16:57:00</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10351">
    <title>[ext4] Documentation patch</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10351</link>
    <description>My first patch, hope it goes through ok! 
Applies to 2.6.28-rc6-00184-gd9d060a.

When comparing ext4's journalling with a metadata-only journalling
filesystem, use an example other than ext3: ext3 and ext4 use the same
journalling modes, while most other Linux filesystems do only metadata
journalling.

Signed-off-by: Harald Arnesen &lt;harald&lt; at &gt;skogtun.org&gt;

========================================================================
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 174eaff..5bbe79e 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -61,7 +61,7 @@ Note: More extensive information for getting started with ext4 can be
   - When comparing performance with other filesystems, remember that
     ext3/4 by default offers higher data integrity guarantees than most.
     So when comparing with a metadata-only journalling filesystem, such
-    as ext3, use `mount -o data=writeback'.  And you might as well use
+    as jfs or xfs, use `mount -o data=writeback'.  And you might as well use
     `mount -o nobh' too along with it.  Making the journal larger than
     the mke2fs default often helps performance with metadata-intensive
     workloads.
</description>
    <dc:creator>Harald Arnesen</dc:creator>
    <dc:date>2008-12-01T16:46:56</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10350">
    <title>Re: BUG: scheduling while atomic: kswapd0/273/0x00000002</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10350</link>
    <description>
No. The problem is that we cannot call jbd2_log_wait_commit
from blkdev_releasepage, because jbd2_log_wait_commit
does a wait_event, which can sleep:

	spin_unlock(&amp;journal-&gt;j_state_lock);
	wait_event(journal-&gt;j_wait_done_commit,
			!tid_gt(tid, journal-&gt;j_commit_sequence));
	spin_lock(&amp;journal-&gt;j_state_lock);
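
The rule being hit can be sketched with a small userspace model. All names are hypothetical; the only details taken from this thread are that jbd2_log_wait_commit drops j_state_lock around its wait_event, and that the trace reports client_lock held by blkdev_releasepage. Dropping its own lock is not enough when the caller still holds one.

```c
/*
 * Hypothetical toy model, not kernel code: a counter stands in for
 * "number of spinlocks held", and a sleeping wait is only legal
 * when that counter is zero.
 */
static int locks_held;

static void toy_spin_lock(void)   { locks_held++; }
static void toy_spin_unlock(void) { locks_held--; }

/* Stand-in for wait_event(): returns -1 where the kernel would
 * report "scheduling while atomic", 0 if sleeping is allowed. */
static int toy_wait_event(void)
{
	return locks_held != 0 ? -1 : 0;
}

/* Same shape as the quoted fragment: drop our own lock, wait,
 * then retake it. */
static int toy_log_wait_commit(void)
{
	int ret;

	toy_spin_lock();          /* j_state_lock */
	toy_spin_unlock();        /* dropped before the wait... */
	ret = toy_wait_event();
	toy_spin_lock();          /* ...and retaken afterwards */
	toy_spin_unlock();
	return ret;
}

/* Same shape as the trace: the caller already holds client_lock,
 * which the callee cannot drop on its behalf. */
static int toy_releasepage(void)
{
	int ret;

	toy_spin_lock();          /* client_lock */
	ret = toy_log_wait_commit();
	toy_spin_unlock();
	return ret;
}
```

Called on its own, toy_log_wait_commit() returns 0; called through toy_releasepage() it returns -1, which corresponds to the "scheduling while atomic" report.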

-aneesh

</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-01T15:57:55</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10349">
    <title>Re: BUG: scheduling while atomic: kswapd0/273/0x00000002</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10349</link>
    <description>* Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt; [2008-12-01 20:56:56]:


Did you have jbd_debug enabled by any chance? 

</description>
    <dc:creator>Balbir Singh</dc:creator>
    <dc:date>2008-12-01T15:53:41</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10348">
    <title>BUG: scheduling while atomic: kswapd0/273/0x00000002</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10348</link>
    <description>Hi,

With the latest patch queue I am getting the error below:

-aneesh


BUG: scheduling while atomic: kswapd0/273/0x00000002
1 lock held by kswapd0/273:
 #0:  (&amp;ei-&gt;client_lock){----}, at: [&lt;c04a014f&gt;] blkdev_releasepage+0x27/0x5a
Modules linked in: autofs4 hidp rfkill input_polldev sbs sbshc battery ac parport_pc lp parport i6300esb i2c_i801 i2c_core tg3 libphy e752x_edac edac_core qla2xxx scsi_transport_fc dm_multipath dm_mirror dm_region_hash dm_log dm_mod loop xt_tcpudp ip6t_REJECT ipv6 ipt_REJECT x_tables sunrpc rfcomm bnep l2cap bluetooth bridge stp sg rtc_cmos rtc_core rtc_lib pcspkr button ata_piix libata mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: ip_tables]
Pid: 273, comm: kswapd0 Not tainted 2.6.28-rc6-autotest #1
Call Trace:
 [&lt;c0426277&gt;] __schedule_bug+0x5e/0x65
 [&lt;c067d302&gt;] schedule+0x85/0x7db
 [&lt;c04454d6&gt;] ? mark_held_locks+0x49/0x64
 [&lt;c0445650&gt;] ? trace_hardirqs_on_caller+0xe0/0x101
 [&lt;c044567c&gt;] ? trace_hardirqs_on+0xb/0xd
 [&lt;c067fdc8&gt;] ? _spin_unlock_irqrestore+0x42/0x58
 [&lt;c043b548&gt;] ? prepare_to_wait+0x3f/0x45
 [&lt;c04e65dc&gt;] jbd2_log_wait_commit+0x8e/0xd8
 [&lt;c043b405&gt;] ? autoremove_wake_function+0x0/0x33
 [&lt;c04e2ca1&gt;] jbd2_journal_try_to_free_buffers+0x175/0x185
 [&lt;c04c71d2&gt;] ext4_release_metadata+0x54/0x66
 [&lt;c04c717e&gt;] ? ext4_release_metadata+0x0/0x66
 [&lt;c04a0166&gt;] blkdev_releasepage+0x3e/0x5a
 [&lt;c045e987&gt;] try_to_release_page+0x32/0x3f
 [&lt;c0467c5c&gt;] shrink_page_list+0x436/0x57e
 [&lt;c044349b&gt;] ? put_lock_stats+0xd/0x21
 [&lt;c044353c&gt;] ? lock_release_holdtime+0x8d/0x93
 [&lt;c067fe00&gt;] ? _spin_unlock_irq+0x22/0x42
 [&lt;c0445650&gt;] ? trace_hardirqs_on_caller+0xe0/0x101
 [&lt;c0467f8e&gt;] shrink_list+0x1ea/0x47c
 [&lt;c044349b&gt;] ? put_lock_stats+0xd/0x21
 [&lt;c0468432&gt;] shrink_zone+0x212/0x27f
 [&lt;c0468f5b&gt;] kswapd+0x341/0x4c6
 [&lt;c0466cc5&gt;] ? isolate_pages_global+0x0/0x18c
 [&lt;c043b405&gt;] ? autoremove_wake_function+0x0/0x33
 [&lt;c0468c1a&gt;] ? kswapd+0x0/0x4c6
 [&lt;c043b344&gt;] kthread+0x3b/0x63
 [&lt;c043b309&gt;] ? kthread+0x0/0x63
 [&lt;c04048f3&gt;] kernel_thread_helper+0x7/0x10
BUG: scheduling while atomic: kswapd0/273/0x0000000

</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-01T15:26:56</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10347">
    <title>Re: [PATCH -V2 1/4] ext4: Use new buffer_head flag to check uninitgroup bitmaps initialization</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10347</link>
    <description>Hi,

All four patches apply on top of akpm-jbd2-locking-fix
in the patch queue.

-aneesh

</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-01T13:54:13</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10346">
    <title>[PATCH -V5 RESEND] ext4: Use EXT4_GROUP_INFO_NEED_INIT_BIT during resize</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10346</link>
    <description>The new groups added during resize are flagged as
need_init group. Make sure we properly initialize these
groups. When we have block size &lt; page size and we are adding
new groups the page may still be marked uptodate even though
we haven't initialized the group. While forcing the init
of buddy cache we need to make sure other groups part of the
same page of buddy cache is not using the cache.
group_info-&gt;alloc_sem is added to ensure the same.

FILENAME: aneesh-3-use_EXT4_GROUP_INFO_NEED_INIT_BIT-during-resize
V5 changes:
The change in this version is to make sure that, while we are adding new
groups, no parallel ext4_mb_init_cache is running on other groups that
are part of the same page in the buddy cache. We do that by taking the
write lock on the group_info alloc_sem semaphore.
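
Which groups share a buddy cache page, and therefore which alloc_sem semaphores have to be taken, follows from the index arithmetic in ext4_mb_get_buddy_cache_lock() in the patch below. A userspace sketch of just that arithmetic (the helper name and its parameters are hypothetical):

```c
/*
 * Hypothetical helper mirroring the index arithmetic of
 * ext4_mb_get_buddy_cache_lock(): every group owns two consecutive
 * blocks (bitmap, then buddy) in the buddy cache inode. Returns how
 * many groups share the page that holds "group"'s bitmap and stores
 * the first of them in *first_group.
 */
static unsigned int groups_sharing_page(unsigned int group,
					unsigned int page_size,
					unsigned int block_size,
					unsigned int groups_count,
					unsigned int *first_group)
{
	unsigned int blocks_per_page = page_size / block_size;
	unsigned int pnum = (group * 2) / blocks_per_page;
	unsigned int groups_per_page = blocks_per_page / 2;
	unsigned int i;

	*first_group = pnum * blocks_per_page / 2;
	/* block size == page size: bitmap and buddy get a page each */
	if (groups_per_page == 0)
		groups_per_page = 1;

	/* stop at the last existing group, as the patch does */
	for (i = 0; groups_per_page > i; i++)
		if (*first_group + i >= groups_count)
			break;
	return i;
}
```

For example, with 1 KB blocks on 4 KB pages, groups 4 and 5 both live in page 2, so initializing either of them must write-lock both; with 4 KB blocks each page holds a single block, and only the requested group is locked.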

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso&lt; at &gt;mit.edu&gt;
---
 fs/ext4/balloc.c  |   21 +++--
 fs/ext4/ext4.h    |    7 +-
 fs/ext4/mballoc.c |  261 +++++++++++++++++++++++++++++++++++++++++------------
 fs/ext4/mballoc.h |    3 +
 fs/ext4/resize.c  |   49 ++---------
 5 files changed, 230 insertions(+), 111 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index fa69cb3..39df492 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -381,6 +381,7 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
 ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1);
 
 ext4_get_group_no_and_offset(sb, block, &amp;block_group, &amp;bit);
+grp = ext4_get_group_info(sb, block_group);
 /*
  * Check to see if we are freeing blocks across a group
  * boundary.
@@ -425,7 +426,11 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
 err = ext4_journal_get_write_access(handle, gd_bh);
 if (err)
 goto error_return;
-
+/*
+ * make sure we don't allow a parallel init on other groups in the
+ * same buddy cache
+ */
+down_write(&amp;grp-&gt;alloc_sem);
 for (i = 0, blocks_freed = 0; i &lt; count; i++) {
 BUFFER_TRACE(bitmap_bh, "clear bit");
 if (!ext4_clear_bit_atomic(sb_bgl_lock(sbi, block_group),
@@ -450,6 +455,13 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
 sbi-&gt;s_flex_groups[flex_group].free_blocks += blocks_freed;
 spin_unlock(sb_bgl_lock(sbi, flex_group));
 }
+/*
+ * request to reload the buddy with the
+ * new bitmap information
+ */
+set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &amp;(grp-&gt;bb_state));
+ext4_mb_update_group_info(grp, blocks_freed);
+up_write(&amp;grp-&gt;alloc_sem);
 
 /* We dirtied the bitmap block */
 BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
@@ -461,13 +473,6 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
 if (!err)
 err = ret;
 sb-&gt;s_dirt = 1;
-/*
- * request to reload the buddy with the
- * new bitmap information
- */
-grp = ext4_get_group_info(sb, block_group);
-set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &amp;(grp-&gt;bb_state));
-ext4_mb_update_group_info(grp, blocks_freed);
 
 error_return:
 brelse(bitmap_bh);
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0bbc052..a5fa26e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1065,12 +1065,13 @@ extern int __init init_ext4_mballoc(void);
 extern void exit_ext4_mballoc(void);
 extern void ext4_mb_free_blocks(handle_t *, struct inode *,
 unsigned long, unsigned long, int, unsigned long *);
-extern int ext4_mb_add_more_groupinfo(struct super_block *sb,
+extern int ext4_mb_add_groupinfo(struct super_block *sb,
 ext4_group_t i, struct ext4_group_desc *desc);
 extern void ext4_mb_update_group_info(struct ext4_group_info *grp,
 ext4_grpblk_t add);
-
-
+extern int ext4_mb_get_buddy_cache_lock(struct super_block *, ext4_group_t);
+extern void ext4_mb_put_buddy_cache_lock(struct super_block *,
+ext4_group_t, int);
 /* inode.c */
 int ext4_forget(handle_t *handle, int is_metadata, struct inode *inode,
 struct buffer_head *bh, ext4_fsblk_t blocknr);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 9cc93af..81b05cf 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -886,18 +886,20 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 struct ext4_buddy *e4b)
 {
-struct ext4_sb_info *sbi = EXT4_SB(sb);
-struct inode *inode = sbi-&gt;s_buddy_cache;
 int blocks_per_page;
 int block;
 int pnum;
 int poff;
 struct page *page;
 int ret;
+struct ext4_group_info *grp;
+struct ext4_sb_info *sbi = EXT4_SB(sb);
+struct inode *inode = sbi-&gt;s_buddy_cache;
 
 mb_debug("load group %u\n", group);
 
 blocks_per_page = PAGE_CACHE_SIZE / sb-&gt;s_blocksize;
+grp = ext4_get_group_info(sb, group);
 
 e4b-&gt;bd_blkbits = sb-&gt;s_blocksize_bits;
 e4b-&gt;bd_info = ext4_get_group_info(sb, group);
@@ -905,6 +907,15 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 e4b-&gt;bd_group = group;
 e4b-&gt;bd_buddy_page = NULL;
 e4b-&gt;bd_bitmap_page = NULL;
+e4b-&gt;alloc_semp = &amp;grp-&gt;alloc_sem;
+
+/* Take the read lock on the group alloc
+ * sem. This would make sure a parallel
+ * ext4_mb_init_group happening on other
+ * groups mapped by the page is blocked
+ * till we are done with allocation
+ */
+down_read(e4b-&gt;alloc_semp);
 
 /*
  * the buddy cache inode stores the block bitmap
@@ -920,6 +931,14 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 page = find_get_page(inode-&gt;i_mapping, pnum);
 if (page == NULL || !PageUptodate(page)) {
 if (page)
+/*
+ * drop the page reference and try
+ * to get the page with lock. If we
+ * are not uptodate that implies
+ * somebody just created the page but
+ * is yet to initialize the same. So
+ * wait for it to initialize.
+ */
 page_cache_release(page);
 page = find_or_create_page(inode-&gt;i_mapping, pnum, GFP_NOFS);
 if (page) {
@@ -985,6 +1004,9 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 page_cache_release(e4b-&gt;bd_buddy_page);
 e4b-&gt;bd_buddy = NULL;
 e4b-&gt;bd_bitmap = NULL;
+
+/* Done with the buddy cache */
+up_read(e4b-&gt;alloc_semp);
 return ret;
 }
 
@@ -994,6 +1016,8 @@ static void ext4_mb_release_desc(struct ext4_buddy *e4b)
 page_cache_release(e4b-&gt;bd_bitmap_page);
 if (e4b-&gt;bd_buddy_page)
 page_cache_release(e4b-&gt;bd_buddy_page);
+/* Done with the buddy cache */
+up_read(e4b-&gt;alloc_semp);
 }
 
 
@@ -1694,6 +1718,173 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
 return 0;
 }
 
+/*
+ * lock the group_info alloc_sem of all the groups
+ * belonging to the same buddy cache page. This
+ * makes sure other parallel operations on the buddy
+ * cache don't happen while holding the buddy cache
+ * lock
+ */
+int ext4_mb_get_buddy_cache_lock(struct super_block *sb, ext4_group_t group)
+{
+int i;
+int block, pnum;
+int blocks_per_page;
+int groups_per_page;
+ext4_group_t first_group;
+struct ext4_group_info *grp;
+
+blocks_per_page = PAGE_CACHE_SIZE / sb-&gt;s_blocksize;
+/*
+ * the buddy cache inode stores the block bitmap
+ * and buddy information in consecutive blocks.
+ * So for each group we need two blocks.
+ */
+block = group * 2;
+pnum = block / blocks_per_page;
+first_group = pnum * blocks_per_page / 2;
+
+groups_per_page = blocks_per_page &gt;&gt; 1;
+if (groups_per_page == 0)
+groups_per_page = 1;
+/* read all groups the page covers into the cache */
+for (i = 0; i &lt; groups_per_page; i++) {
+
+if ((first_group + i) &gt;= EXT4_SB(sb)-&gt;s_groups_count)
+break;
+grp = ext4_get_group_info(sb, first_group + i);
+/* take all groups' write allocation
+ * semaphore. This makes sure there is
+ * no block allocation going on in any
+ * of those groups
+ */
+down_write(&amp;grp-&gt;alloc_sem);
+}
+return i;
+}
+
+void ext4_mb_put_buddy_cache_lock(struct super_block *sb,
+ext4_group_t group, int locked_group)
+{
+int i;
+int block, pnum;
+int blocks_per_page;
+ext4_group_t first_group;
+struct ext4_group_info *grp;
+
+blocks_per_page = PAGE_CACHE_SIZE / sb-&gt;s_blocksize;
+/*
+ * the buddy cache inode stores the block bitmap
+ * and buddy information in consecutive blocks.
+ * So for each group we need two blocks.
+ */
+block = group * 2;
+pnum = block / blocks_per_page;
+first_group = pnum * blocks_per_page / 2;
+/* release locks on all the groups */
+for (i = 0; i &lt; locked_group; i++) {
+
+grp = ext4_get_group_info(sb, first_group + i);
+/* release all groups' write allocation
+ * semaphore taken in
+ * ext4_mb_get_buddy_cache_lock(), so
+ * block allocation can resume
+ */
+up_write(&amp;grp-&gt;alloc_sem);
+}
+
+}
+
+static int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
+{
+
+int ret;
+void *bitmap;
+int blocks_per_page;
+int block, pnum, poff;
+int num_grp_locked = 0;
+struct ext4_group_info *this_grp;
+struct ext4_sb_info *sbi = EXT4_SB(sb);
+struct inode *inode = sbi-&gt;s_buddy_cache;
+struct page *page = NULL, *bitmap_page = NULL;
+
+mb_debug("init group %lu\n", group);
+blocks_per_page = PAGE_CACHE_SIZE / sb-&gt;s_blocksize;
+this_grp = ext4_get_group_info(sb, group);
+/*
+ * This ensures we don't add group
+ * to this buddy cache via resize
+ */
+num_grp_locked =  ext4_mb_get_buddy_cache_lock(sb, group);
+if (!EXT4_MB_GRP_NEED_INIT(this_grp)) {
+/*
+ * somebody initialized the group
+ * return without doing anything
+ */
+ret = 0;
+goto err;
+}
+/*
+ * the buddy cache inode stores the block bitmap
+ * and buddy information in consecutive blocks.
+ * So for each group we need two blocks.
+ */
+block = group * 2;
+pnum = block / blocks_per_page;
+poff = block % blocks_per_page;
+page = find_or_create_page(inode-&gt;i_mapping, pnum, GFP_NOFS);
+if (page) {
+BUG_ON(page-&gt;mapping != inode-&gt;i_mapping);
+ret = ext4_mb_init_cache(page, NULL);
+if (ret) {
+unlock_page(page);
+goto err;
+}
+unlock_page(page);
+}
+if (page == NULL || !PageUptodate(page)) {
+ret = -EIO;
+goto err;
+}
+mark_page_accessed(page);
+bitmap_page = page;
+bitmap = page_address(page) + (poff * sb-&gt;s_blocksize);
+
+/* init buddy cache */
+block++;
+pnum = block / blocks_per_page;
+poff = block % blocks_per_page;
+page = find_or_create_page(inode-&gt;i_mapping, pnum, GFP_NOFS);
+if (page == bitmap_page) {
+/*
+ * If both the bitmap and buddy are in
+ * the same page we don't need to force
+ * init the buddy
+ */
+unlock_page(page);
+} else if (page) {
+BUG_ON(page-&gt;mapping != inode-&gt;i_mapping);
+ret = ext4_mb_init_cache(page, bitmap);
+if (ret) {
+unlock_page(page);
+goto err;
+}
+unlock_page(page);
+}
+if (page == NULL || !PageUptodate(page)) {
+ret = -EIO;
+goto err;
+}
+mark_page_accessed(page);
+err:
+ext4_mb_put_buddy_cache_lock(sb, group, num_grp_locked);
+if (bitmap_page)
+page_cache_release(bitmap_page);
+if (page)
+page_cache_release(page);
+return ret;
+}
+
 static noinline_for_stack int
 ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 {
@@ -1777,7 +1968,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 group = 0;
 
 /* quick check to skip empty groups */
-grp = ext4_get_group_info(ac-&gt;ac_sb, group);
+grp = ext4_get_group_info(sb, group);
 if (grp-&gt;bb_free == 0)
 continue;
 
@@ -1790,10 +1981,9 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
  * we need full data about the group
  * to make a good selection
  */
-err = ext4_mb_load_buddy(sb, group, &amp;e4b);
+err = ext4_mb_init_group(sb, group);
 if (err)
 goto out;
-ext4_mb_release_desc(&amp;e4b);
 }
 
 /*
@@ -2244,7 +2434,7 @@ ext4_mb_store_history(struct ext4_allocation_context *ac)
 
 
 /* Create and initialize ext4_group_info data for the given group. */
-static int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
+int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
   struct ext4_group_desc *desc)
 {
 int i, len;
@@ -2302,6 +2492,7 @@ static int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
 }
 
 INIT_LIST_HEAD(&amp;meta_group_info[i]-&gt;bb_prealloc_list);
+init_rwsem(&amp;meta_group_info[i]-&gt;alloc_sem);
 meta_group_info[i]-&gt;bb_free_root.rb_node = NULL;;
 
 #ifdef DOUBLE_CHECK
@@ -2329,54 +2520,6 @@ static int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
 } /* ext4_mb_add_groupinfo */
 
 /*
- * Add a group to the existing groups.
- * This function is used for online resize
- */
-int ext4_mb_add_more_groupinfo(struct super_block *sb, ext4_group_t group,
-       struct ext4_group_desc *desc)
-{
-struct ext4_sb_info *sbi = EXT4_SB(sb);
-struct inode *inode = sbi-&gt;s_buddy_cache;
-int blocks_per_page;
-int block;
-int pnum;
-struct page *page;
-int err;
-
-/* Add group based on group descriptor*/
-err = ext4_mb_add_groupinfo(sb, group, desc);
-if (err)
-return err;
-
-/*
- * Cache pages containing dynamic mb_alloc datas (buddy and bitmap
- * datas) are set not up to date so that they will be re-initilaized
- * during the next call to ext4_mb_load_buddy
- */
-
-/* Set buddy page as not up to date */
-blocks_per_page = PAGE_CACHE_SIZE / sb-&gt;s_blocksize;
-block = group * 2;
-pnum = block / blocks_per_page;
-page = find_get_page(inode-&gt;i_mapping, pnum);
-if (page != NULL) {
-ClearPageUptodate(page);
-page_cache_release(page);
-}
-
-/* Set bitmap page as not up to date */
-block++;
-pnum = block / blocks_per_page;
-page = find_get_page(inode-&gt;i_mapping, pnum);
-if (page != NULL) {
-ClearPageUptodate(page);
-page_cache_release(page);
-}
-
-return 0;
-}
-
-/*
  * Update an existing group.
  * This function is used for online resize
  */
@@ -4583,11 +4726,6 @@ void ext4_mb_free_blocks(handle_t *handle, struct inode *inode,
 err = ext4_journal_get_write_access(handle, gd_bh);
 if (err)
 goto error_return;
-
-err = ext4_mb_load_buddy(sb, block_group, &amp;e4b);
-if (err)
-goto error_return;
-
 #ifdef AGGRESSIVE_CHECK
 {
 int i;
@@ -4601,6 +4739,8 @@ void ext4_mb_free_blocks(handle_t *handle, struct inode *inode,
 /* We dirtied the bitmap block */
 BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
 err = ext4_journal_dirty_metadata(handle, bitmap_bh);
+if (err)
+goto error_return;
 
 if (ac) {
 ac-&gt;ac_b_ex.fe_group = block_group;
@@ -4609,6 +4749,9 @@ void ext4_mb_free_blocks(handle_t *handle, struct inode *inode,
 ext4_mb_store_history(ac);
 }
 
+err = ext4_mb_load_buddy(sb, block_group, &amp;e4b);
+if (err)
+goto error_return;
 if (metadata) {
 /* blocks being freed are metadata. these blocks shouldn't
  * be used until this transaction is committed */
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index b5dff1f..a931b6b 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -20,6 +20,7 @@
 #include &lt;linux/version.h&gt;
 #include &lt;linux/blkdev.h&gt;
 #include &lt;linux/marker.h&gt;
+#include &lt;linux/mutex.h&gt;
 #include "ext4_jbd2.h"
 #include "ext4.h"
 #include "group.h"
@@ -130,6 +131,7 @@ struct ext4_group_info {
 #ifdef DOUBLE_CHECK
 void *bb_bitmap;
 #endif
+struct rw_semaphore alloc_sem;
 unsigned short bb_counters[];
 };
 
@@ -250,6 +252,7 @@ struct ext4_buddy {
 struct super_block *bd_sb;
 __u16 bd_blkbits;
 ext4_group_t bd_group;
+struct rw_semaphore *alloc_semp;
 };
 #define EXT4_MB_BITMAP(e4b) ((e4b)-&gt;bd_bitmap)
 #define EXT4_MB_BUDDY(e4b) ((e4b)-&gt;bd_buddy)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index db03e05..90a0eaa 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -747,6 +747,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
 struct inode *inode = NULL;
 handle_t *handle;
 int gdb_off, gdb_num;
+int num_grp_locked = 0;
 int err, err2;
 
 gdb_num = input-&gt;group / EXT4_DESC_PER_BLOCK(sb);
@@ -787,6 +788,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
 }
 }
 
+
 if ((err = verify_group_input(sb, input)))
 goto exit_put;
 
@@ -855,6 +857,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
          * using the new disk blocks.
          */
 
+num_grp_locked = ext4_mb_get_buddy_cache_lock(sb, input-&gt;group);
 /* Update group descriptor block for new group */
 gdp = (struct ext4_group_desc *)((char *)primary-&gt;b_data +
  gdb_off * EXT4_DESC_SIZE(sb));
@@ -871,9 +874,11 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
  * We can allocate memory for mb_alloc based on the new group
  * descriptor
  */
-err = ext4_mb_add_more_groupinfo(sb, input-&gt;group, gdp);
-if (err)
+err = ext4_mb_add_groupinfo(sb, input-&gt;group, gdp);
+if (err) {
+ext4_mb_put_buddy_cache_lock(sb, input-&gt;group, num_grp_locked);
 goto exit_journal;
+}
 
 /*
  * Make the new blocks and inodes valid next.  We do this before
@@ -915,6 +920,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
 
 /* Update the global fs size fields */
 sbi-&gt;s_groups_count++;
+ext4_mb_put_buddy_cache_lock(sb, input-&gt;group, num_grp_locked);
 
 ext4_journal_dirty_metadata(handle, primary);
 
@@ -1082,45 +1088,6 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
 if ((err = ext4_journal_stop(handle)))
 goto exit_put;
 
-/*
- * Mark mballoc pages as not up to date so that they will be updated
- * next time they are loaded by ext4_mb_load_buddy.
- *
- * XXX Bad, Bad, BAD!!!  We should not be overloading the
- * Uptodate flag, particularly on thte bitmap bh, as way of
- * hinting to ext4_mb_load_buddy() that it needs to be
- * overloaded.  A user could take a LVM snapshot, then do an
- * on-line fsck, and clear the uptodate flag, and this would
- * not be a bug in userspace, but a bug in the kernel.  FIXME!!!
- */
-{
-struct ext4_sb_info *sbi = EXT4_SB(sb);
-struct inode *inode = sbi-&gt;s_buddy_cache;
-int blocks_per_page;
-int block;
-int pnum;
-struct page *page;
-
-/* Set buddy page as not up to date */
-blocks_per_page = PAGE_CACHE_SIZE / sb-&gt;s_blocksize;
-block = group * 2;
-pnum = block / blocks_per_page;
-page = find_get_page(inode-&gt;i_mapping, pnum);
-if (page != NULL) {
-ClearPageUptodate(page);
-page_cache_release(page);
-}
-
-/* Set bitmap page as not up to date */
-block++;
-pnum = block / blocks_per_page;
-page = find_get_page(inode-&gt;i_mapping, pnum);
-if (page != NULL) {
-ClearPageUptodate(page);
-page_cache_release(page);
-}
-}
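
For reference, ext4_mb_init_group() relies on a fixed layout in the buddy cache inode: each group owns two consecutive blocks, with the block bitmap first and the buddy second. The small user-space sketch below checks that layout. It assumes a 4 KiB page. SKETCH_PAGE_SIZE, struct buddy_location, locate(), same_page() and first_group_on_page() are hypothetical helpers that only reproduce the block, pnum and poff arithmetic from the patch. They are not kernel code.

<![CDATA[
```c
#include <assert.h>

/* Assumed page size for this sketch (PAGE_CACHE_SIZE in the patch). */
#define SKETCH_PAGE_SIZE 4096u

/* Where one block of the (hypothetical) buddy cache inode lives. */
struct buddy_location {
	unsigned long pnum;	/* page index in the buddy cache inode */
	unsigned int poff;	/* block offset inside that page */
};

/*
 * Locate block 'which' (0 = block bitmap, 1 = buddy) of 'group'.
 * Each group owns two consecutive blocks, so its bitmap is block
 * group * 2 and its buddy is the block right after it.
 */
static struct buddy_location locate(unsigned long group, unsigned int which,
				    unsigned int blocksize)
{
	unsigned int blocks_per_page;
	unsigned long block;
	struct buddy_location loc;

	assert(blocksize != 0 && SKETCH_PAGE_SIZE % blocksize == 0);
	assert(which <= 1);
	blocks_per_page = SKETCH_PAGE_SIZE / blocksize;
	block = group * 2 + which;
	loc.pnum = block / blocks_per_page;
	loc.poff = (unsigned int)(block % blocks_per_page);
	return loc;
}

/* Nonzero when the bitmap and buddy of 'group' share one page. */
static int same_page(unsigned long group, unsigned int blocksize)
{
	return locate(group, 0, blocksize).pnum ==
	       locate(group, 1, blocksize).pnum;
}

/* Lowest group that has a block on page 'pnum'. */
static unsigned long first_group_on_page(unsigned long pnum,
					 unsigned int blocksize)
{
	unsigned int blocks_per_page = SKETCH_PAGE_SIZE / blocksize;

	return pnum * blocks_per_page / 2;
}
```
]]>

With 1 KiB blocks (four per page), group 5's bitmap and buddy are blocks 10 and 11. Both sit on page 2, at offsets 2 and 3, so that page covers groups 4 and 5. With 4 KiB blocks the same two blocks land on pages 10 and 11. This is why ext4_mb_init_group() skips forcing the buddy init only when the buddy page turns out to be bitmap_page.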
</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-01T13:43:29</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10345">
    <title>[PATCH -V2 4/4] ext4: Init the complete page while building buddy cache</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10345</link>
    <description>We need to initialize the complete page during buddy cache
init by setting all of its bits to '1' (0xff). This ensures that
when resize increments s_groups_count, we don't hit errors
like the one below.

EXT4-fs error (device sdb1): ext4_mb_mark_diskspace_used:
Allocating block 1040385 in system zone of 127 group
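
The minimal user-space sketch below shows why the whole page is filled first. Every block in the page starts as all ones (in use). Only blocks that belong to groups below the current group count are then overwritten. SKETCH_PAGE_SIZE, SKETCH_BLOCK_SIZE and init_cache_page() are hypothetical. Zeroing a block stands in for copying the real bitmap or building the buddy data.

<![CDATA[
```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes: a 4 KiB page holding four 1 KiB blocks. */
#define SKETCH_PAGE_SIZE 4096
#define SKETCH_BLOCK_SIZE 1024
#define SKETCH_BLOCKS_PER_PAGE (SKETCH_PAGE_SIZE / SKETCH_BLOCK_SIZE)

/*
 * Fill one buddy-cache page starting at cache block 'first_block'.
 * Every block is first set to all ones ("in use"); only blocks that
 * belong to an existing group (group < groups_count) are then
 * overwritten.  Zeroing stands in for the real bitmap/buddy contents.
 */
static void init_cache_page(unsigned char *page, unsigned long first_block,
			    unsigned long groups_count)
{
	int i;

	assert(page != NULL);
	memset(page, 0xff, SKETCH_PAGE_SIZE);
	for (i = 0; i < SKETCH_BLOCKS_PER_PAGE; i++) {
		/* two cache blocks (bitmap, buddy) per group */
		unsigned long group = (first_block + i) / 2;

		if (group >= groups_count)
			break;	/* group does not exist yet: keep the ones */
		memset(page + i * SKETCH_BLOCK_SIZE, 0, SKETCH_BLOCK_SIZE);
	}
}
```
]]>

Take a filesystem with a single group. Page 0 ends up with blocks 0 and 1 (group 0's bitmap and buddy) cleared, and blocks 2 and 3 (the not-yet-existing group 1) all ones. Without the first memset(), those two blocks would keep whatever stale contents the page had. Once resize increments s_groups_count, those stale bits could read as free space.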

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
---
 fs/ext4/mballoc.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 8f33691..01c86f3 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -846,6 +846,8 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 
 err = 0;
 first_block = page-&gt;index * blocks_per_page;
+/* init the page  */
+memset(page_address(page), 0xff, PAGE_CACHE_SIZE);
 for (i = 0; i &lt; blocks_per_page; i++) {
 int group;
 struct ext4_group_info *grinfo;
</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-01T13:43:46</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10344">
    <title>[PATCH -V2 1/4] ext4: Use new buffer_head flag to check uninit group bitmaps initialization</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10344</link>
    <description>For an uninit block group, the on-disk bitmap is not initialized.
That means we cannot rely on the uptodate flag of the bitmap buffer_head
to tell whether the bitmap is valid. Use a new buffer_head flag that is
set only after we have properly initialized the bitmap. This also avoids
reinitializing the uninit group bitmap every time we call
ext4_read_block_bitmap().
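
The check, lock, re-check pattern that the patch adds is illustrated below with a small user-space sketch built on pthreads and C11 atomics. struct sketch_bh, sketch_bh_init(), sketch_bitmap_uptodate() and sketch_read_bitmap() are hypothetical stand-ins for the buffer_head flags and ext4_read_block_bitmap(). The disk read is only counted, not performed.

<![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>

/*
 * Hypothetical stand-in for a bitmap buffer_head: 'uptodate' plays the
 * role of buffer_uptodate(), 'bitmap_uptodate' the new BH_BITMAP_UPTODATE
 * bit, and 'lock' the buffer lock taken by lock_buffer().
 */
struct sketch_bh {
	pthread_mutex_t lock;
	atomic_int uptodate;		/* block contents came from disk */
	atomic_int bitmap_uptodate;	/* bitmap contents are valid */
	int uninit;			/* group still has BLOCK_UNINIT set */
	int init_calls;			/* times the bitmap was built in memory */
	int reads;			/* times a disk read would be submitted */
	unsigned char data[16];
};

static void sketch_bh_init(struct sketch_bh *bh, int uptodate, int uninit)
{
	pthread_mutex_init(&bh->lock, NULL);
	atomic_init(&bh->uptodate, uptodate);
	atomic_init(&bh->bitmap_uptodate, 0);
	bh->uninit = uninit;
	bh->init_calls = 0;
	bh->reads = 0;
	memset(bh->data, 0xaa, sizeof(bh->data));	/* stale contents */
}

static int sketch_bitmap_uptodate(struct sketch_bh *bh)
{
	return atomic_load(&bh->uptodate) && atomic_load(&bh->bitmap_uptodate);
}

static void sketch_read_bitmap(struct sketch_bh *bh)
{
	if (sketch_bitmap_uptodate(bh))
		return;				/* fast path, no lock */

	pthread_mutex_lock(&bh->lock);
	if (sketch_bitmap_uptodate(bh)) {	/* another caller finished first */
		pthread_mutex_unlock(&bh->lock);
		return;
	}
	if (bh->uninit) {
		/* on-disk block is not valid: build the bitmap in memory */
		memset(bh->data, 0, sizeof(bh->data));
		bh->init_calls++;
	} else if (!atomic_load(&bh->uptodate)) {
		/* real code: set the flag, then submit the read */
		bh->reads++;
	}
	/* both flags are published while the lock is held */
	atomic_store(&bh->bitmap_uptodate, 1);
	atomic_store(&bh->uptodate, 1);
	pthread_mutex_unlock(&bh->lock);
}
```
]]>

Start from a buffer that is already uptodate but belongs to an uninit group. Two calls build the bitmap only once. With only the old buffer_uptodate() test, every call would have rebuilt it.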

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
---
 fs/ext4/balloc.c  |   25 +++++++++++++++++++++++--
 fs/ext4/ext4.h    |   18 ++++++++++++++++++
 fs/ext4/ialloc.c  |   24 ++++++++++++++++++++++--
 fs/ext4/mballoc.c |   24 ++++++++++++++++++++++--
 4 files changed, 85 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 3cae4c6..c00023f 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -320,20 +320,41 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
     block_group, bitmap_blk);
 return NULL;
 }
-if (buffer_uptodate(bh) &amp;&amp;
-    !(desc-&gt;bg_flags &amp; cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))
+
+if (bitmap_uptodate(bh))
 return bh;
 
 lock_buffer(bh);
+if (bitmap_uptodate(bh)) {
+unlock_buffer(bh);
+return bh;
+}
 spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
 if (desc-&gt;bg_flags &amp; cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
 ext4_init_block_bitmap(sb, bh, block_group, desc);
+set_bitmap_uptodate(bh);
 set_buffer_uptodate(bh);
 spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
 unlock_buffer(bh);
 return bh;
 }
 spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
+if (buffer_uptodate(bh)) {
+/*
+ * if the group is not uninit and bh
+ * is uptodate, the bitmap is also uptodate
+ */
+set_bitmap_uptodate(bh);
+unlock_buffer(bh);
+return bh;
+}
+/*
+ * submit the buffer_head for read. We can
+ * safely mark the bitmap as uptodate now.
+ * We do it here so the bitmap uptodate bit
+ * gets set with the buffer lock held.
+ */
+set_bitmap_uptodate(bh);
 if (bh_submit_read(bh) &lt; 0) {
 put_bh(bh);
 ext4_error(sb, __func__,
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index fc4148d..9039b8a 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -19,6 +19,7 @@
 #include &lt;linux/types.h&gt;
 #include &lt;linux/blkdev.h&gt;
 #include &lt;linux/magic.h&gt;
+#include &lt;linux/jbd2.h&gt;
 #include "ext4_i.h"
 
 /*
@@ -1360,6 +1361,23 @@ extern int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode,
 extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 __u64 start, __u64 len);
 
+/*
+ * Add new method to test whether block and inode bitmaps are properly
+ * initialized. With uninit_bg, reading the block from disk is not enough
+ * to mark the bitmap uptodate. We need to also zero out the bitmap
+ */
+#define BH_BITMAP_UPTODATE BH_JBDPrivateStart
+
+static inline int bitmap_uptodate(struct buffer_head *bh)
+{
+return (buffer_uptodate(bh) &amp;&amp;
+test_bit(BH_BITMAP_UPTODATE, &amp;(bh)-&gt;b_state));
+}
+static inline void set_bitmap_uptodate(struct buffer_head *bh)
+{
+set_bit(BH_BITMAP_UPTODATE, &amp;(bh)-&gt;b_state);
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _EXT4_H */
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index d1ccae5..1627868 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -115,20 +115,40 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
     block_group, bitmap_blk);
 return NULL;
 }
-if (buffer_uptodate(bh) &amp;&amp;
-    !(desc-&gt;bg_flags &amp; cpu_to_le16(EXT4_BG_INODE_UNINIT)))
+if (bitmap_uptodate(bh))
 return bh;
 
 lock_buffer(bh);
+if (bitmap_uptodate(bh)) {
+unlock_buffer(bh);
+return bh;
+}
 spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
 if (desc-&gt;bg_flags &amp; cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
 ext4_init_inode_bitmap(sb, bh, block_group, desc);
+set_bitmap_uptodate(bh);
 set_buffer_uptodate(bh);
 spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
 unlock_buffer(bh);
 return bh;
 }
 spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
+if (buffer_uptodate(bh)) {
+/*
+ * if the group is not uninit and bh
+ * is uptodate, the bitmap is also uptodate
+ */
+set_bitmap_uptodate(bh);
+unlock_buffer(bh);
+return bh;
+}
+/*
+ * submit the buffer_head for read. We can
+ * safely mark the bitmap as uptodate now.
+ * We do it here so the bitmap uptodate bit
+ * gets set with the buffer lock held.
+ */
+set_bitmap_uptodate(bh);
 if (bh_submit_read(bh) &lt; 0) {
 put_bh(bh);
 ext4_error(sb, __func__,
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 259eef7..5d4105d 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -794,22 +794,42 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 if (bh[i] == NULL)
 goto out;
 
-if (buffer_uptodate(bh[i]) &amp;&amp;
-    !(desc-&gt;bg_flags &amp; cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))
+if (bitmap_uptodate(bh[i]))
 continue;
 
 lock_buffer(bh[i]);
+if (bitmap_uptodate(bh[i])) {
+unlock_buffer(bh[i]);
+continue;
+}
 spin_lock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
 if (desc-&gt;bg_flags &amp; cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
 ext4_init_block_bitmap(sb, bh[i],
 first_group + i, desc);
+set_bitmap_uptodate(bh[i]);
 set_buffer_uptodate(bh[i]);
 spin_unlock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
 unlock_buffer(bh[i]);
 continue;
 }
 spin_unlock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
+if (buffer_uptodate(bh[i])) {
+/*
+ * if the group is not uninit and bh
+ * is uptodate, the bitmap is also uptodate
+ */
+set_bitmap_uptodate(bh[i]);
+unlock_buffer(bh[i]);
+continue;
+}
 get_bh(bh[i]);
+/*
+ * submit the buffer_head for read. We can
+ * safely mark the bitmap as uptodate now.
+ * We do it here so the bitmap uptodate bit
+ * gets set with the buffer lock held.
+ */
+set_bitmap_uptodate(bh[i]);
 bh[i]-&gt;b_end_io = end_buffer_read_sync;
 submit_bh(READ, bh[i]);
 mb_debug("read bitmap for group %u\n", first_group + i);
</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-01T13:43:43</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10343">
    <title>[PATCH -V2 3/4] ext4: Don't allow new groups to be added during block allocation</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10343</link>
    <description>After we mark the blocks in the buddy cache as allocated,
we should ensure that we don't reinit the buddy cache until
the block bitmap is updated. This patch achieves that by holding
the group_info alloc_sem semaphore until ext4_mb_release_context().

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
---
 fs/ext4/mballoc.c |   10 ++++++++--
 fs/ext4/mballoc.h |    5 +++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 24fc3ab..8f33691 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1055,7 +1055,8 @@ __releases(e4b-&gt;alloc_semp)
 if (e4b-&gt;bd_buddy_page)
 page_cache_release(e4b-&gt;bd_buddy_page);
 /* Done with the buddy cache */
-up_read(e4b-&gt;alloc_semp);
+if (e4b-&gt;alloc_semp)
+up_read(e4b-&gt;alloc_semp);
 __release(e4b-&gt;alloc_semp);
 }
 
@@ -1375,7 +1376,9 @@ static void ext4_mb_use_best_found(struct ext4_allocation_context *ac,
 get_page(ac-&gt;ac_bitmap_page);
 ac-&gt;ac_buddy_page = e4b-&gt;bd_buddy_page;
 get_page(ac-&gt;ac_buddy_page);
-
+/* on allocation we use ac to track the held semaphore */
+ac-&gt;alloc_semp = e4b-&gt;alloc_semp;
+e4b-&gt;alloc_semp = NULL;
 /* store last allocated for subsequent stream allocation */
 if ((ac-&gt;ac_flags &amp; EXT4_MB_HINT_DATA)) {
 spin_lock(&amp;sbi-&gt;s_md_lock);
@@ -4297,6 +4300,7 @@ ext4_mb_initialize_context(struct ext4_allocation_context *ac,
 ac-&gt;ac_pa = NULL;
 ac-&gt;ac_bitmap_page = NULL;
 ac-&gt;ac_buddy_page = NULL;
+ac-&gt;alloc_semp = NULL;
 ac-&gt;ac_lg = NULL;
 
 /* we have to define context: we'll we work with a file or
@@ -4478,6 +4482,8 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
 }
 ext4_mb_put_pa(ac, ac-&gt;ac_sb, pa);
 }
+if (ac-&gt;alloc_semp)
+up_read(ac-&gt;alloc_semp);
 if (ac-&gt;ac_bitmap_page)
 page_cache_release(ac-&gt;ac_bitmap_page);
 if (ac-&gt;ac_buddy_page)
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 95d4c7f..3b598c7 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -195,6 +195,11 @@ struct ext4_allocation_context {
 __u8 ac_op; /* operation, for history only */
 struct page *ac_bitmap_page;
 struct page *ac_buddy_page;
+/*
+ * pointer to the held semaphore upon successful
+ * block allocation
+ */
+struct rw_semaphore *alloc_semp;
 struct ext4_prealloc_space *ac_pa;
 struct ext4_locality_group *ac_lg;
 };
</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-01T13:43:45</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10342">
    <title>[PATCH -V2 2/4] ext4: mark the blocks/inode bitmap beyond end of group as used.</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10342</link>
    <description>We need to mark the bits of the block/inode bitmap beyond
the end of the group with '1', up to the end of the bitmap block.
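
For reference, here is a user-space sketch of what marking the bitmap end does. sketch_set_bit(), sketch_test_bit() and sketch_mark_bitmap_end() are hypothetical. They are only modelled on ext4_set_bit() and mark_bitmap_end(). The sketch assumes end_bit is a multiple of 8, which blocksize * 8 always is.

<![CDATA[
```c
#include <assert.h>
#include <string.h>

/* Set bit 'nr' in a little-endian bitmap (modelled on ext4_set_bit()). */
static void sketch_set_bit(int nr, unsigned char *bitmap)
{
	bitmap[nr >> 3] |= (unsigned char)(1u << (nr & 7));
}

static int sketch_test_bit(int nr, const unsigned char *bitmap)
{
	return (bitmap[nr >> 3] >> (nr & 7)) & 1;
}

/*
 * Mark bits [start_bit, end_bit) as in use: bit by bit up to the next
 * byte boundary, then whole bytes with memset().  end_bit is assumed
 * to be a multiple of 8.
 */
static void sketch_mark_bitmap_end(int start_bit, int end_bit,
				   unsigned char *bitmap)
{
	int i;

	if (start_bit >= end_bit)
		return;
	for (i = start_bit; i < ((start_bit + 7) & ~7) && i < end_bit; i++)
		sketch_set_bit(i, bitmap);
	if (i < end_bit)
		memset(bitmap + (i >> 3), 0xff, (size_t)(end_bit - i) >> 3);
}
```
]]>

Take a 1 KiB bitmap block (8192 bits) with 100 valid entries. Passing 8192 as the end marks bits 100 to 8191. Passing a smaller per-group count such as 4096 would leave bits 4096 to 8191 clear, so they would look free.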

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
---
 fs/ext4/ialloc.c  |    2 +-
 fs/ext4/mballoc.c |    4 ++--
 fs/ext4/resize.c  |    6 ++----
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 1627868..cd5759d 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -84,7 +84,7 @@ unsigned ext4_init_inode_bitmap(struct super_block *sb, struct buffer_head *bh,
 }
 
 memset(bh-&gt;b_data, 0, (EXT4_INODES_PER_GROUP(sb) + 7) / 8);
-mark_bitmap_end(EXT4_INODES_PER_GROUP(sb), EXT4_BLOCKS_PER_GROUP(sb),
+mark_bitmap_end(EXT4_INODES_PER_GROUP(sb), sb-&gt;s_blocksize * 8,
 bh-&gt;b_data);
 
 return EXT4_INODES_PER_GROUP(sb);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5d4105d..24fc3ab 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3044,8 +3044,8 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
     in_range(block + len - 1, ext4_inode_table(sb, gdp),
      EXT4_SB(sb)-&gt;s_itb_per_group)) {
 ext4_error(sb, __func__,
-   "Allocating block in system zone - block = %llu",
-   block);
+   "Allocating block %llu in system zone of %d group\n",
+   block, ac-&gt;ac_b_ex.fe_group);
 /* File system mounted not to panic on error
  * Fix the bitmap and repeat the block allocation
  * We leak some of the blocks here.
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 6a62360..ef143f4 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -284,11 +284,9 @@ static int setup_new_group_blocks(struct super_block *sb,
 if ((err = extend_or_restart_transaction(handle, 2, bh)))
 goto exit_bh;
 
-mark_bitmap_end(input-&gt;blocks_count, EXT4_BLOCKS_PER_GROUP(sb),
-bh-&gt;b_data);
+mark_bitmap_end(input-&gt;blocks_count, sb-&gt;s_blocksize * 8, bh-&gt;b_data);
 ext4_journal_dirty_metadata(handle, bh);
 brelse(bh);
</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-01T13:43:44</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10341">
    <title>Re: EXT4 ENOSPC Bug</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10341</link>
    <description>Hi Ted, hi all,

Is there any additional information you could use?
The filesystem is unfortunately a bit big for you to download (and the test
data it contains a bit too sensitive).

Andres
</description>
    <dc:creator>Andres Freund</dc:creator>
    <dc:date>2008-12-01T12:34:59</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.file-systems.ext4/10340">
    <title>Re: [PATCH]ext4: fix s_dirty_blocks_counter if block allocation failed with nodelalloc</title>
    <link>http://permalink.gmane.org/gmane.comp.file-systems.ext4/10340</link>
    <description>
Why did the block allocation fail? With delayed allocation, ENOSPC
should not happen during block allocation; if it does, we did
something wrong in block reservation.
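
A toy model makes the reasoning concrete. With delayed allocation, space is reserved when the data is written, so the later allocation only converts a reservation and has no reason to fail. struct sketch_fs, reserve_blocks() and claim_blocks() are hypothetical. They are only loosely modelled on the free and dirty block counters named in the patch subject.

<![CDATA[
```c
#include <assert.h>
#include <errno.h>

/* Hypothetical counters, loosely modelled on free/dirty block counters. */
struct sketch_fs {
	long free_blocks;	/* blocks not yet allocated on disk */
	long dirty_blocks;	/* blocks promised to delayed-allocation writes */
};

/* At write() time: reserve space, or fail with ENOSPC right away. */
static int reserve_blocks(struct sketch_fs *fs, long n)
{
	assert(n >= 0);
	if (fs->free_blocks - fs->dirty_blocks < n)
		return -ENOSPC;
	fs->dirty_blocks += n;
	return 0;
}

/* At writeback time: turn an existing reservation into an allocation. */
static int claim_blocks(struct sketch_fs *fs, long n)
{
	/* Failing here means the reservation accounting was wrong. */
	if (fs->dirty_blocks < n || fs->free_blocks < n)
		return -ENOSPC;
	fs->dirty_blocks -= n;
	fs->free_blocks -= n;
	return 0;
}
```
]]>

If claim_blocks() ever returns -ENOSPC, the reservation step let too much through. That is the kind of accounting bug the question above is probing.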


-aneesh
</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-01T10:36:41</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.file-systems.ext4">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.file-systems.ext4</link>
  </textinput>
</rdf:RDF>
