<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.comp.file-systems.ext4">
    <title>gmane.comp.file-systems.ext4</title>
    <link>http://blog.gmane.org/gmane.comp.file-systems.ext4</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10453"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10443"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10404"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10394"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10391"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10389"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10368"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10367"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10366"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10365"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10363"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10362"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10360"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10351"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10348"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10346"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10344"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10339"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10338"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.file-systems.ext4/10333"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10453">
    <title>[Bug 12151] New: Unexplained fsck errors on a ext4 filesystem</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10453</link>
    <description>http://bugzilla.kernel.org/show_bug.cgi?id=12151

           Summary: Unexplained fsck errors on a ext4 filesystem
           Product: File System
           Version: 2.5
     KernelVersion: kernel-2.6.27.5-117.fc10.x86_64
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext4
        AssignedTo: fs_ext4&lt; at &gt;kernel-bugs.osdl.org
        ReportedBy: kernel-bugzilla&lt; at &gt;cygnusx-1.org


Distribution: Fedora 10
Hardware Environment: 

Processor:
Q6600 2.4 GHz

Memory:
4 GB

dmidecode:
Manufacturer: ASUSTeK Computer INC.
Product Name: P5B-Deluxe

lspci:
00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82P965/G965 PCI Express Root Port (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.2 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 3 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
03:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
03:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
04:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
06:01.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
06:01.1 Input device controller: Creative Labs SB Audigy Game Port (rev 04)
06:01.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04)
06:02.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 20)
06:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
06:04.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 14)


Software Environment:
e2fsprogs-1.41.3-2.fc10.x86_64
rsync-3.0.4-0.fc10.x86_64

Problem Description:
I rebooted and received the errors below from fsck. The nature of the errors
suggests to me a race condition or an off-by-one bug. In all but one case the
problem is a count that is off by one. In the one exception the count is off by
two.

I didn't receive complaints from fsck on previous reboots after the creation of
the filesystem. The shutdown before the startup that resulted in these errors
seemed to have gone normally.

I do remember rsync complaining about at least mpc1211 during the rsync that
copied the data across the network from another system, but I don't remember
what the complaint was.

Most of the files on the filesystem are video files in the 100 MB+ range. There
are also other large files like ISOs, virtualization images, etc. All the files
fsck complained about are really small files.

The underlying layer is Linux software RAID 5 running on six 1 TB hard drives.
Other arrays using the same drives are RAID 1 and RAID 10.

mkfs command used to make the filesystem:
mkfs.ext4 -j -b 4096 -i 524288 -m 0 -E stride=256 -O extents /dev/md3
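The mkfs options above are enough to work out the block-group geometry that the group and inode numbers in the fsck output refer to. The Python sketch below is illustrative only and rests on assumptions the report does not state: the standard ext2/3/4 layout of 8 * block size blocks per group, a first data block of 0 for 4 KiB blocks, and 256 inodes per group derived from -i 524288 with no rounding by mke2fs. It maps an inode number to its group with (inode - 1) // inodes_per_group.

```python
# Sketch: ext4 block-group geometry implied by this report's mkfs options.
# ASSUMPTIONS: 8 * block_size blocks per group, first data block 0 (4 KiB
# blocks), and no rounding of the inodes-per-group value by mke2fs.

BLOCK_SIZE = 4096          # from -b 4096
BYTES_PER_INODE = 524288   # from -i 524288

# A one-block bitmap has 8 bits per byte, one bit per block in the group.
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE
# One inode per BYTES_PER_INODE bytes of group space.
INODES_PER_GROUP = BLOCKS_PER_GROUP * BLOCK_SIZE // BYTES_PER_INODE


def inode_group(ino):
    """Return (group, index within group) for a 1-based inode number."""
    return divmod(ino - 1, INODES_PER_GROUP)


def block_group(block):
    """Return the group holding a block, given a first data block of 0."""
    return block // BLOCKS_PER_GROUP


if __name__ == "__main__":
    print(BLOCKS_PER_GROUP, INODES_PER_GROUP)  # 32768 256
    # A selection of Pass 4 inodes whose ref count "should be 2".
    for ino in (95745, 96001, 150529, 240641, 314881,
                380417, 730625, 784897, 821761, 1191937):
        print(ino, inode_group(ino))
```

Under these assumptions every inode in that list is the first inode (index 0) of its group, and each of those groups also appears in the "invalid unused inodes count" lines. This is an observation from the numbers, not a diagnosis of the bug.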

Other messages that may relate:

EXT4-fs: barriers enabled
EXT4-fs: barriers enabled
EXT4-fs: barriers enabled
JBD: barrier-based sync failed on md1:8 - disabling barriers
JBD: barrier-based sync failed on md2:8 - disabling barriers
JBD: barrier-based sync failed on md3:8 - disabling barriers

df -h output:
/dev/md1               32G  5.3G   25G  18% /
/dev/md0              198M   14M  174M   8% /boot
/dev/md2              288G   60G  229G  21% /home
/dev/md3              4.1T  2.3T  1.8T  56% /home/data



An automatic fsck on boot started and reported errors.

Group descriptor 374 has invalid unused inodes count 1
Group descriptor 375 has invalid unused inodes count 1
Group descriptor 588 has invalid unused inodes count 1
Group descriptor 940 has invalid unused inodes count 1
Group descriptor 1230 has invalid unused inodes count 1
Group descriptor 1486 has invalid unused inodes count 1
Group descriptor 1834 has invalid unused inodes count 1
Group descriptor 2444 has invalid unused inodes count 1
Group descriptor 2854 has invalid unused inodes count 1
Group descriptor 3066 has invalid unused inodes count 1
Group descriptor 3210 has invalid unused inodes count 1
Group descriptor 3933 has invalid unused inodes count 1
Group descriptor 4656 has invalid unused inodes count 1
Extended attribute block 12255232 has reference count 3 should be 1

Pass 1
Extended attribute block 12255232 has reference count 3 should be 1

Pass 2
Entry '..' in ??? (314882) has incorrect filetype (was 2, should be 1).
Entry 'txdps.tex' in
/backup/home/backup/11-26-2006/home/backup/12-3-2005/usr/share/texmf/tex/generic/texdraw
(453383) has incorrect filetype (was 1, should be 2).
Entry 'mpc1211' in
/backup/home/backup/11-26-2006/home/backup/12-3-2005/usr/src/kernels/2.6.14-1.1637_FC4-x86_64/arch/sh/boards
(469489) is a link to directory
/home/backup/home/backup/11-26-2006/home/backup/12-3-2005/usr/share/texmf/tex/generic/texdraw/txdps.tex (469505).
Entry 'gencfg.c' in /backup/home/builder/mozilla/nsprpub/pr/include (995609)
has an incorrect filetype (was 1, should be 2).
Entry 'CVS' in
/backup/home/builder/mozilla/toolkit/themes/pinstripe/mozapps/extensions
(1006847) is a link to directory
/backup/home/builder/mozilla/nsprpub/pr/include/gencfg.c (1006849).
Entry 'lost+found' in /video/movies (181) has incorrect filetype (was 2, should
be 1).
Entry 'text_italic.png' in
/backup/home/backup/11-26-2006/usr/share/icons/crystalsvg/16x16/actions
(613940) has incorrect filetype (was 1, should be 2).
Entry 'ko' in /backup/home/backup/11-26-2006/usr/share/locale (621673) is a link
to directory
/backup/home/backup/11-26-2006/usr/share/icons/crystalsvg/16x16/actions/text_italic.png
(625665).

Pass 3
Unconnected directory inode 314882 (???)
'..' in
/backup/home/backup/11-26-2006/home/backup/12-3-2005/usr/share/texmf/tex/generic/texdraw/txdps.tex
(469505) is
/backup/home/backup/11-26-2006/home/backup/12-3-2005/usr/src/kernels/2.6.14-1.1637_FC4-x86_64/arch/sh/boards (469489), should be
/backup/home/backup/11-26-2006/home/backup/12-3-2005/usr/share/texmf/tex/generic/texdraw
(453383).
'..' in
/backup/home/backup/11-26-2006/usr/share/icons/crystalsvg/16x16/actions/text_italic.png
(625665) is /backup/home/backup/11-26-2006/usr/share/locale (621673), should be
/backup/home/backup/11-26-2006/usr/share/icons/crystalsvg/16x16/actions (613940).
'..' in /backup/home/builder/mozilla/nsprpub/pr/include/gencfg.c (1006849) is
/backup/home/builder/mozilla/toolkit/themes/pinstripe/mozapps/extensions
(1006847), should be
/backup/home/builder/mozilla/nsprpub/pr/include (995609).

Pass 4
Inode 181 ref count is 12, should be 11.
Inode 82141 ref count is 1, should be 2.
Inode 95745 ref count is 1, should be 2.
Inode 96001 ref count is 1, should be 2.
Inode 150529 ref count is 1, should be 2.
Inode 240641 ref count is 1, should be 2.
Inode 314016 ref count is 347, should be 346.
Inode 314881 ref count is 0, should be 2.
Inode 314882 ref count is 3, should be 2.
Inode 380416 ref count is 4, should be 3.
Inode 380417 ref count is 1, should be 2.
Inode 730596 ref count is 14, should be 13.
Inode 730625 ref count is 1, should be 2.
Inode 730723 ref count is 284, should be 283.
Inode 784897 ref count is 1, should be 2.
Inode 821761 ref count is 1, should be 2.
Inode 1191927 ref count is 7, should be 6.
Inode 1191937 ref count is 1, should be 2.

Pass 5
Block bitmap differences: -40304640 -48693248 -59540393 -79796520 -93519872
-105185280 -(128714059--128714061) -152568096
Free blocks count wrong for group #1230 (1845, counted=1846).
Free blocks count wrong for group #1486 (1851, counted=1852).
Free blocks count wrong for group #1817 (910, counted=911).
Free blocks count wrong for group #2454 (1149, counted=1150).
Free blocks count wrong for group #3210 (1845, counted=1846).
Free blocks count wrong for group #3928 (31706, counted=31709).
Free blocks count wrong for group #4656 (1540, counted=1541).
Free blocks count wrong (494216298, counted=494216308).
Free inodes count wrong for group #374 (0, counted=1).
Free inodes count wrong for group #375 (0, counted=1).
Free inodes count wrong for group #588 (0, counted=1).
Free inodes count wrong for group #940 (0, counted=1).
Free inodes count wrong for group #1230 (0, counted=1).
Directories count wrong for group #1230 (194, counted=193).
Free inodes count wrong for group #1486 (0, counted=1).
Directories count wrong for group #1486 (197, counted=196).
Free inodes count wrong for group #1834 (0, counted=1).
Free inodes count wrong for group #2444 (0, counted=1).
Free inodes count wrong for group #2854 (0, counted=1).
Directories count wrong for group #2854 (67, counted=66).
Free inodes count wrong for group #3066 (0, counted=1).
Free inodes count wrong for group #3210 (0, counted=1).
Directories count wrong for group #3210 (201, counted=200).
Free inodes count wrong for group #3933 (0, counted=1).
Free inodes count wrong for group #4656 (0, counted=1).
Directories count wrong for group #4656 (104, counted=103).
Free inodes count wrong (6899829, counted=6899842).
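The Pass 5 totals can be cross-checked against the itemized lines. This sketch copies the numbers from the output above and, as an assumption, uses 32768 blocks per group with a first data block of 0 to place each bitmap difference in a group.

```python
# Sketch: cross-check the Pass 5 totals against the itemized lines above.
# Numbers are copied from this report. ASSUMPTION: 32768 blocks per group
# and a first data block of 0 (4 KiB blocks).

BLOCKS_PER_GROUP = 32768

# "Block bitmap differences"; -(128714059--128714061) is a 3-block range.
bitmap_diffs = [40304640, 48693248, 59540393, 79796520, 93519872,
                105185280, *range(128714059, 128714062), 152568096]

# Groups reported as "Free inodes count wrong ... (0, counted=1)".
inode_groups = [374, 375, 588, 940, 1230, 1486, 1834,
                2444, 2854, 3066, 3210, 3933, 4656]


def summarize():
    """Compare itemized counts with the filesystem-wide deltas."""
    blocks_by_group = {}
    for blk in bitmap_diffs:
        grp = blk // BLOCKS_PER_GROUP
        blocks_by_group[grp] = blocks_by_group.get(grp, 0) + 1
    return {
        "itemized_blocks": len(bitmap_diffs),
        "free_blocks_total_delta": 494216308 - 494216298,
        "itemized_inodes": len(inode_groups),
        "free_inodes_total_delta": 6899842 - 6899829,
        "blocks_by_group": blocks_by_group,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

Both totals agree with the itemized lines: 10 blocks and 13 inodes. Under the stated assumption, the bitmap differences fall in groups 1230, 1486, 1817, 2435, 2854, 3210, 3928 (three blocks) and 4656. The quoted "Free blocks count wrong" lines instead name #2454 rather than #2435, omit #2854, and add up to 9 rather than 10. Some of the hand-copied numbers may therefore be off.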


</description>
    <dc:creator>bugme-daemon&lt; at &gt;bugzilla.kernel.org</dc:creator>
    <dc:date>2008-12-03T21:38:32</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10443">
    <title>[PATCH 0/6][REPOST] ext{2,3,4}: tighten inheritance and setting of inode flags</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10443</link>
    <description>This patch series prevents the inheritance and setting of flags that are
inappropriate for specific inode types.

Flags which should be inherited are listed explicitly so as to prevent
future flags being overlooked and inherited by accident.

It introduces a function to mask flags based on the inode type and uses
it in inode creation and the SETFLAGS ioctl to help prevent future
inconsistency.
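To illustrate masking flags by inode type, here is a small Python model in which flags are sets of names. It is not the patches' C code. The particular groupings are assumptions made for the example: an explicit inherit list, a set of directory-only flags, and a short allow-list for device nodes and other special files. They were chosen so that the mknod sequence in this message behaves as the series intends.

```python
# Hypothetical model of masking inode flags by inode type. Flags are sets of
# names rather than C bitmasks, and the groupings are assumptions for the
# example, not the masks defined by these patches.
import stat

# Flags a new inode may take from its parent directory (explicit list).
INHERITED = frozenset({"SECRM", "UNRM", "COMPR", "SYNC", "IMMUTABLE", "APPEND",
                       "NODUMP", "NOATIME", "NOCOMPR", "JOURNAL_DATA",
                       "NOTAIL", "DIRSYNC"})
# Flags that only make sense on directories.
DIR_ONLY = frozenset({"DIRSYNC", "TOPDIR"})
# The few flags kept on device nodes, FIFOs, sockets and symlinks.
OTHER_ALLOWED = frozenset({"NODUMP", "NOATIME"})


def mask_flags(mode, flags):
    """Drop the flags that are inappropriate for the inode type in mode."""
    flags = frozenset(flags)
    if stat.S_ISDIR(mode):
        return flags
    if stat.S_ISREG(mode):
        return flags - DIR_ONLY
    return flags.intersection(OTHER_ALLOWED)


def inherit_flags(parent_flags, child_mode):
    """Flags for a new inode: the explicit inherit list, then the type mask."""
    return mask_flags(child_mode, frozenset(parent_flags).intersection(INHERITED))


if __name__ == "__main__":
    parent = {"APPEND", "TOPDIR"}  # parent directory flags in this example
    print(sorted(inherit_flags(parent, stat.S_IFREG | 0o644)))  # ['APPEND']
    print(sorted(inherit_flags(parent, stat.S_IFCHR | 0o644)))  # []
```

In this model, a regular file or directory created under such a parent keeps APPEND. A character device gets no flags, so it cannot become undeletable in the way the example describes. TOPDIR is never inherited.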

Patches 1-3 fix the TOPDIR flag inheritance bug reported at
http://bugzilla.kernel.org/show_bug.cgi?id=9866.

Patches 4-6 fix a related problem with non-regular file/dir inodes
inheriting inappropriate flags, discovered while testing. For example,
on an unpatched system, the following sequence will create an
un(re)movable device node:

mkdir a
chattr +a a
touch a/a
mknod a/b c 1 3
chattr -a a a/a

All attempts to delete, move or modify a/b will fail. Fsck will report that
there is a problem but will not fix it.

Diffstat from linux-next:
 fs/ext2/ialloc.c        |    8 ++------
 fs/ext2/ioctl.c         |    3 +--
 fs/ext3/ialloc.c        |    8 ++------
 fs/ext3/ioctl.c         |    3 +--
 fs/ext4/ext4.h          |   25 +++++++++++++++++++++++++
 fs/ext4/ialloc.c        |   14 +++++---------
 fs/ext4/ioctl.c         |    3 +--
 include/linux/ext2_fs.h |   24 ++++++++++++++++++++++++
 include/linux/ext3_fs.h |   24 ++++++++++++++++++++++++
 9 files changed, 85 insertions(+), 27 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo&lt; at &gt;vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

</description>
    <dc:creator>Duane Griffin</dc:creator>
    <dc:date>2008-12-03T19:54:57</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10404">
    <title>patch ext3-don-t-try-to-resize-if-there-are-no-reserved-gdt-blocks-left.patch added to 2.6.27-stable tree</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10404</link>
    <description>
This is a note to let you know that we have just queued up the patch titled

    Subject: ext3: don't try to resize if there are no reserved gdt blocks left

to the 2.6.27-stable tree.  Its filename is

    ext3-don-t-try-to-resize-if-there-are-no-reserved-gdt-blocks-left.patch

A git repo of this tree can be found at 
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary


From 972fbf779832e5ad15effa7712789aeff9224c37 Mon Sep 17 00:00:00 2001
From: Josef Bacik &lt;jbacik&lt; at &gt;redhat.com&gt;
Date: Sat, 18 Oct 2008 20:27:55 -0700
Subject: ext3: don't try to resize if there are no reserved gdt blocks left

From: Josef Bacik &lt;jbacik&lt; at &gt;redhat.com&gt;

commit 972fbf779832e5ad15effa7712789aeff9224c37 upstream.

When you try to resize an ext3 fs and run out of reserved GDT blocks,
you get an error that doesn't actually tell you what went wrong: it just
says that the gdb it picked is not correct, which is true since you
don't have any reserved GDT blocks left.  This patch adds a check to make
sure you have reserved GDT blocks to use and, if not, prints a more
relevant error.
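The new condition can be restated as a small Python model. It is hypothetical: the Superblock class and the function name are inventions for illustration, with fields named after the C code in the diff, and -errno.EPERM stands in for the kernel's -EPERM.

```python
# Hypothetical model of the check added to ext3_group_add(); names mirror the
# C fields in the diff, but this is an illustration, not kernel code.
import errno
from dataclasses import dataclass


@dataclass
class Superblock:
    has_resize_inode: bool      # EXT3_FEATURE_COMPAT_RESIZE_INODE set?
    s_reserved_gdt_blocks: int  # reserved GDT blocks recorded in the superblock


def check_can_add_group(sb, reserved_gdb, gdb_off):
    """Return 0 if the group may be added, else -EPERM as the patched code does."""
    # Outer condition taken from the diff context.
    if reserved_gdb or gdb_off == 0:
        # The patch adds the second half: no reserved GDT blocks left.
        if not sb.has_resize_inode or not sb.s_reserved_gdt_blocks:
            print("No reserved GDT blocks, can't resize")
            return -errno.EPERM
    return 0


if __name__ == "__main__":
    exhausted = Superblock(has_resize_inode=True, s_reserved_gdt_blocks=0)
    print(check_can_add_group(exhausted, reserved_gdb=0, gdb_off=0))
```

Before the patch, the exhausted case above passed the feature check and failed later with the less helpful error the message describes.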

Signed-off-by: Josef Bacik &lt;jbacik&lt; at &gt;redhat.com&gt;
Cc: &lt;linux-ext4&lt; at &gt;vger.kernel.org&gt;
Cc: Andreas Dilger &lt;adilger&lt; at &gt;sun.com&gt;
Signed-off-by: Andrew Morton &lt;akpm&lt; at &gt;linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds&lt; at &gt;linux-foundation.org&gt;
Cc: Willy Tarreau &lt;w&lt; at &gt;1wt.eu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh&lt; at &gt;suse.de&gt;

---
 fs/ext3/resize.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/ext3/resize.c
+++ b/fs/ext3/resize.c
@@ -790,7 +790,8 @@ int ext3_group_add(struct super_block *s
 
 if (reserved_gdb || gdb_off == 0) {
 if (!EXT3_HAS_COMPAT_FEATURE(sb,
-     EXT3_FEATURE_COMPAT_RESIZE_INODE)){
+     EXT3_FEATURE_COMPAT_RESIZE_INODE)
+    || !le16_to_cpu(es-&gt;s_reserved_gdt_blocks)) {
 ext3_warning(sb, __func__,
      "No reserved GDT blocks, can't resize");
 return -EPERM;


Patches currently in stable-queue which might be from jbacik&lt; at &gt;redhat.com are

queue-2.6.27/ext3-don-t-try-to-resize-if-there-are-no-reserved-gdt-blocks-left.patch

</description>
    <dc:creator>gregkh&lt; at &gt;suse.de</dc:creator>
    <dc:date>2008-12-03T18:55:41</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10394">
    <title>[PATCH -V4] ext4: Don't allow new groups to be added during block allocation</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10394</link>
    <description>After we mark the blocks in the buddy cache as allocated,
we should ensure that we don't reinit the buddy cache until
the block bitmap is updated. The patch achieves this by holding
the group_info alloc_semaphore till ext4_mb_release_context.
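The hand-off of alloc_semp can be modelled in a few lines of Python. This is a hypothetical sketch, not ext4 code. ReadSem is a plain counter standing in for the group's rw_semaphore. Buddy, AllocContext, use_best_found, release_buddy and release_context are simplified stand-ins for the structures and functions the patch touches.

```python
# Hypothetical model of the lock hand-off in this patch. A counter stands in
# for the group's rw_semaphore (readers only); the names are simplified
# stand-ins for the ext4 ones, not real APIs.


class ReadSem:
    """Counts read holders so the example can detect an unmatched up_read."""

    def __init__(self):
        self.readers = 0

    def down_read(self):
        self.readers += 1

    def up_read(self):
        if self.readers == 0:
            raise RuntimeError("up_read without matching down_read")
        self.readers -= 1


class Buddy:  # stands in for struct ext4_buddy
    def __init__(self, sem):
        sem.down_read()          # taken when the buddy cache is loaded
        self.alloc_semp = sem


class AllocContext:  # stands in for struct ext4_allocation_context
    def __init__(self):
        self.alloc_semp = None   # initialised to NULL, as in the patch


def use_best_found(ac, e4b):
    # On allocation, ac takes over the held semaphore and e4b forgets it.
    ac.alloc_semp, e4b.alloc_semp = e4b.alloc_semp, None


def release_buddy(e4b):
    if e4b.alloc_semp:           # skipped once ownership moved to ac
        e4b.alloc_semp.up_read()


def release_context(ac):
    if ac.alloc_semp:            # dropped only here, after the bitmap update
        ac.alloc_semp.up_read()


if __name__ == "__main__":
    sem = ReadSem()
    e4b, ac = Buddy(sem), AllocContext()
    use_best_found(ac, e4b)
    release_buddy(e4b)
    print(sem.readers)  # 1: still held while the block bitmap is updated
    release_context(ac)
    print(sem.readers)  # 0
```

After the hand-off, releasing the buddy no longer drops the semaphore. It stays held until release_context runs, which is the ordering the description asks for.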

v4 changes:
Use ext4_mb_release_context to drop the reference. That way
we make sure the group pa is also updated.

v3 changes:
drop page reference and alloc_sem if we are retrying allocation

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
---
 fs/ext4/mballoc.c |   16 +++++++++++++---
 fs/ext4/mballoc.h |    5 +++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 24fc3ab..d1ba97d 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1055,7 +1055,8 @@ __releases(e4b-&gt;alloc_semp)
 if (e4b-&gt;bd_buddy_page)
 page_cache_release(e4b-&gt;bd_buddy_page);
 /* Done with the buddy cache */
-up_read(e4b-&gt;alloc_semp);
+if (e4b-&gt;alloc_semp)
+up_read(e4b-&gt;alloc_semp);
 __release(e4b-&gt;alloc_semp);
 }
 
@@ -1375,7 +1376,9 @@ static void ext4_mb_use_best_found(struct ext4_allocation_context *ac,
 get_page(ac-&gt;ac_bitmap_page);
 ac-&gt;ac_buddy_page = e4b-&gt;bd_buddy_page;
 get_page(ac-&gt;ac_buddy_page);
-
+/* on allocation we use ac to track the held semaphore */
+ac-&gt;alloc_semp =  e4b-&gt;alloc_semp;
+e4b-&gt;alloc_semp = NULL;
 /* store last allocated for subsequent stream allocation */
 if ((ac-&gt;ac_flags &amp; EXT4_MB_HINT_DATA)) {
 spin_lock(&amp;sbi-&gt;s_md_lock);
@@ -4297,6 +4300,7 @@ ext4_mb_initialize_context(struct ext4_allocation_context *ac,
 ac-&gt;ac_pa = NULL;
 ac-&gt;ac_bitmap_page = NULL;
 ac-&gt;ac_buddy_page = NULL;
+ac-&gt;alloc_semp = NULL;
 ac-&gt;ac_lg = NULL;
 
 /* we have to define context: we'll we work with a file or
@@ -4478,6 +4482,8 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
 }
 ext4_mb_put_pa(ac, ac-&gt;ac_sb, pa);
 }
+if (ac-&gt;alloc_semp)
+up_read(ac-&gt;alloc_semp);
 if (ac-&gt;ac_bitmap_page)
 page_cache_release(ac-&gt;ac_bitmap_page);
 if (ac-&gt;ac_buddy_page)
@@ -4578,10 +4584,14 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
 ac-&gt;ac_o_ex.fe_len &lt; ac-&gt;ac_b_ex.fe_len)
 ext4_mb_new_preallocation(ac);
 }
-
 if (likely(ac-&gt;ac_status == AC_STATUS_FOUND)) {
 *errp = ext4_mb_mark_diskspace_used(ac, handle, reserv_blks);
 if (*errp ==  -EAGAIN) {
+/*
+ * drop the reference that we took
+ * in ext4_mb_use_best_found
+ */
+ext4_mb_release_context(ac);
 ac-&gt;ac_b_ex.fe_group = 0;
 ac-&gt;ac_b_ex.fe_start = 0;
 ac-&gt;ac_b_ex.fe_len = 0;
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 95d4c7f..3b598c7 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -195,6 +195,11 @@ struct ext4_allocation_context {
 __u8 ac_op;/* operation, for history only */
 struct page *ac_bitmap_page;
 struct page *ac_buddy_page;
+/*
+ * pointer to the held semaphore upon successful
+ * block allocation
+ */
+struct rw_semaphore *alloc_semp;
 struct ext4_prealloc_space *ac_pa;
 struct ext4_locality_group *ac_lg;
 };
</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-03T15:19:49</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10391">
    <title>Problems with checking corrupted large ext3 file system</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10391</link>
    <description>Hi,

I'm having some trouble checking a corrupted 9 TB ext3 file system which resides
on a logical volume. The underlying physical volumes are three hardware
RAID systems, one of which started to crash frequently. I was able
to pvmove away the data from the buggy system, so everything is fine
now on the hardware side.

However, the crashes left me with a seriously corrupted file system
from which I'm trying to recover as much as possible. First step was
to unmount the file system after users reported I/O errors when trying
to open files. The system log contained many messages like

[102445.420125] EXT3-fs error (device dm-2): ext3_free_blocks_sb: bit already cleared for block 544108393

and some of the form

[160301.277477] EXT3-fs error (device dm-2): htree_dirblock_to_tree: bad entry in directory #153542738: rec_len % 4 != 0 - offset=0, inode=1381653864, rec_len=26709, name_len=79

So I compiled the master branch of the e2fsprogs git repo as of
Dec 1 (tip: 8680b4) and executed

./e2fsck -y -C0 /dev/mapper/abel-abt6_projects

This ran for a while and then started to output a couple of these:

Inode table for group 68217 is not in group.  (block 825373744)
WARNING: SEVERE DATA LOSS POSSIBLE.

along with many lines of the form

Illegal block #3036172 (4233778405) in inode 115335438.
        CLEARED.

But then it continued just fine without printing further
messages. After about 4 hours it completed but decided to re-run from
the beginning and this is where the real trouble seems to start. The
next day I found thousands of lines like this on the console:

        /backup/data/solexa_analysis/ATH/MA/MA-30-29/run_30/4/length_42/reads_0.fl (inode #145326082, mod time Tue Jan 22 05:09:36 2008)

followed by

Clone multiply-claimed blocks? yes

At this point the fsck seems to hang. No further messages, no progress
bar for at least 17 hours. The lights on the raid system aren't
flashing but there seems to be a bit of I/O going on as stracing the
e2fsck process yields

lseek(3, 6206310776832, SEEK_SET)       = 6206310776832
read(3, "002107740635\tD\t2\t169\t35\t0\thhhhhh"..., 4096) = 4096
lseek(3, 1263113973760, SEEK_SET)       = 1263113973760
write(3, "B9K@=?4C=L-F77F4:CGGK\n3\t14221118"..., 4096) = 4096
lseek(3, 5861641846784, SEEK_SET)       = 5861641846784
read(3, "hhhhhh\tIIIIIIIIIIIIIIIIIIIIIIIII"..., 4096) = 4096
lseek(3, 1263113977856, SEEK_SET)       = 1263113977856
write(3, "\t1.00\t0.46\t19\t4\t2\t0\t1\tA\t33\t31\t0\t"..., 4096) = 4096
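Assuming 4 KiB filesystem blocks (the post does not give the block size), the lseek() offsets in that strace convert to block numbers as follows. This is a small illustrative sketch that uses only the four offsets quoted above.

```python
# Sketch: turn the lseek() offsets from the strace above into block numbers.
# ASSUMPTION: 4 KiB filesystem blocks; the post does not state the block size.
BLOCK_SIZE = 4096

trace = [
    ("read", 6206310776832),
    ("write", 1263113973760),
    ("read", 5861641846784),
    ("write", 1263113977856),
]


def to_blocks(entries, block_size=BLOCK_SIZE):
    """Return (op, block number) pairs; raise if an offset is not aligned."""
    result = []
    for op, offset in entries:
        block, rem = divmod(offset, block_size)
        if rem:
            raise ValueError(f"offset {offset} is not block aligned")
        result.append((op, block))
    return result


if __name__ == "__main__":
    for op, block in to_blocks(trace):
        print(op, block)
```

All four offsets are block aligned. The two writes go to consecutive blocks, 308377435 and 308377436, while the reads land far apart. That pattern could fit e2fsck copying scattered blocks one at a time while cloning, but four lines of trace are not enough to be sure.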

There's only about one read per second, so the fsck might take rather
long if it continues to run at this speed ;)

It has been running for 34 hours now and I don't know what to do, so here are
a couple of questions for you ext3 gurus:

Is there any hope this will ever complete?

Should I abort the fsck and restart?

Do things get even worse if I abort it and mount the file
system r/o so that I can see whether important files are
still there?

Are there any magic e2fsck command line options I should try?

The box is a dual quad-core Intel machine with 32 GB of RAM and is running
a vanilla 2.6.25.20 kernel. Any help is greatly appreciated.

Thanks
Andre
</description>
    <dc:creator>Andre Noll</dc:creator>
    <dc:date>2008-12-03T10:11:00</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10389">
    <title>[PATCH -V3] ext4: Don't allow new groups to be added during block allocation</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10389</link>
    <description>After we mark the blocks in the buddy cache as allocated,
we should ensure that we don't reinit the buddy cache until
the block bitmap is updated. The patch achieves this by holding
the group_info alloc_semaphore till ext4_mb_release_context.

v3 changes:
drop page reference and alloc_sem if we are retrying allocation

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
---
 fs/ext4/mballoc.c |   21 ++++++++++++++++++---
 fs/ext4/mballoc.h |    5 +++++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 24fc3ab..3270204 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1055,7 +1055,8 @@ __releases(e4b-&gt;alloc_semp)
 if (e4b-&gt;bd_buddy_page)
 page_cache_release(e4b-&gt;bd_buddy_page);
 /* Done with the buddy cache */
-up_read(e4b-&gt;alloc_semp);
+if (e4b-&gt;alloc_semp)
+up_read(e4b-&gt;alloc_semp);
 __release(e4b-&gt;alloc_semp);
 }
 
@@ -1375,7 +1376,9 @@ static void ext4_mb_use_best_found(struct ext4_allocation_context *ac,
 get_page(ac-&gt;ac_bitmap_page);
 ac-&gt;ac_buddy_page = e4b-&gt;bd_buddy_page;
 get_page(ac-&gt;ac_buddy_page);
-
+/* on allocation we use ac to track the held semaphore */
+ac-&gt;alloc_semp =  e4b-&gt;alloc_semp;
+e4b-&gt;alloc_semp = NULL;
 /* store last allocated for subsequent stream allocation */
 if ((ac-&gt;ac_flags &amp; EXT4_MB_HINT_DATA)) {
 spin_lock(&amp;sbi-&gt;s_md_lock);
@@ -4297,6 +4300,7 @@ ext4_mb_initialize_context(struct ext4_allocation_context *ac,
 ac-&gt;ac_pa = NULL;
 ac-&gt;ac_bitmap_page = NULL;
 ac-&gt;ac_buddy_page = NULL;
+ac-&gt;alloc_semp = NULL;
 ac-&gt;ac_lg = NULL;
 
 /* we have to define context: we'll we work with a file or
@@ -4478,6 +4482,8 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
 }
 ext4_mb_put_pa(ac, ac-&gt;ac_sb, pa);
 }
+if (ac-&gt;alloc_semp)
+up_read(ac-&gt;alloc_semp);
 if (ac-&gt;ac_bitmap_page)
 page_cache_release(ac-&gt;ac_bitmap_page);
 if (ac-&gt;ac_buddy_page)
@@ -4578,7 +4584,6 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
 ac-&gt;ac_o_ex.fe_len &lt; ac-&gt;ac_b_ex.fe_len)
 ext4_mb_new_preallocation(ac);
 }
-
 if (likely(ac-&gt;ac_status == AC_STATUS_FOUND)) {
 *errp = ext4_mb_mark_diskspace_used(ac, handle, reserv_blks);
 if (*errp ==  -EAGAIN) {
@@ -4586,6 +4591,16 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
 ac-&gt;ac_b_ex.fe_start = 0;
 ac-&gt;ac_b_ex.fe_len = 0;
 ac-&gt;ac_status = AC_STATUS_CONTINUE;
+/*
+ * drop the reference that we took
+ * in ext4_mb_use_best_found
+ */
+if (ac-&gt;alloc_semp)
+up_read(ac-&gt;alloc_semp);
+if (ac-&gt;ac_bitmap_page)
+page_cache_release(ac-&gt;ac_bitmap_page);
+if (ac-&gt;ac_buddy_page)
+page_cache_release(ac-&gt;ac_buddy_page);
 goto repeat;
 } else if (*errp) {
 ac-&gt;ac_b_ex.fe_len = 0;
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 95d4c7f..3b598c7 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -195,6 +195,11 @@ struct ext4_allocation_context {
 __u8 ac_op;/* operation, for history only */
 struct page *ac_bitmap_page;
 struct page *ac_buddy_page;
+/*
+ * pointer to the held semaphore upon successful
+ * block allocation
+ */
+struct rw_semaphore *alloc_semp;
 struct ext4_prealloc_space *ac_pa;
 struct ext4_locality_group *ac_lg;
 };
</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-03T09:06:03</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10368">
    <title>[BUG][PATCH 4/4] jbd2: fix a cause of __schedule_bug via blkdev_releasepage</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10368</link>
    <description>jbd2: fix a cause of __schedule_bug via blkdev_releasepage
From: Toshiyuki Okajima &lt;toshi.okajima&lt; at &gt;jp.fujitsu.com&gt;

One cause of this problem is that jbd2_journal_try_to_free_buffers() calls
jbd2_log_wait_commit() while a read lock is held, when it is reached via
blkdev_releasepage(). That wait exists for uncommitted data buffers, and a
client that uses blkdev_releasepage() has to take a read/write lock.

However, ext4_release_metadata() only needs to release metadata buffers,
because a page bound to the block device is used by ext4 for metadata.

Therefore we don't need to wait for a commit in
jbd2_journal_try_to_free_buffers() when it is called via ext4_release_metadata().
As a result, this patch adds jbd2_journal_try_to_free_metadata_buffers(), which
is almost the same as jbd2_journal_try_to_free_buffers() except that it does
not call jbd2_log_wait_commit().
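The refactoring described above can be sketched as a Python model: one internal helper with an is_metadata switch, plus two public wrappers. Everything in it is a stand-in. GFP flags are a set of names, buffers are dicts, and wait_for_commit is a callback rather than jbd2_log_wait_commit().

```python
# Hypothetical model of the refactoring: one internal helper with an
# is_metadata switch and two thin public wrappers. GFP flags are a set of
# names and buffers are dicts; none of this is the jbd2 API.

NEEDS_COMMIT = frozenset({"WAIT", "FS"})  # stand-in for __GFP_WAIT plus __GFP_FS


def _try_to_free_buffers(buffers, gfp, is_metadata, wait_for_commit):
    """Return 1 if every buffer on the page can be freed, else 0."""
    busy = [b for b in buffers if b["in_transaction"]]
    if busy and not is_metadata and NEEDS_COMMIT.issubset(gfp):
        # Data path only: block until the owning transaction commits.
        wait_for_commit(busy)
        busy = [b for b in buffers if b["in_transaction"]]
    return 0 if busy else 1


def try_to_free_buffers(buffers, gfp, wait_for_commit):
    """Data-buffer variant: may wait for a commit."""
    return _try_to_free_buffers(buffers, gfp, False, wait_for_commit)


def try_to_free_metadata_buffers(buffers, gfp, wait_for_commit):
    """Metadata variant: never waits for a commit."""
    return _try_to_free_buffers(buffers, gfp, True, wait_for_commit)


if __name__ == "__main__":
    def commit(busy):  # pretend the commit completes immediately
        for b in busy:
            b["in_transaction"] = False

    page = [{"in_transaction": True}, {"in_transaction": False}]
    print(try_to_free_metadata_buffers(page, {"WAIT", "FS"}, commit))  # 0
    print(try_to_free_buffers(page, {"WAIT", "FS"}, commit))           # 1
```

In this model only the data wrapper ever calls the commit callback. The metadata wrapper simply reports failure while a buffer is still part of a transaction.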

This issue was reported by Aneesh Kumar K.V.
http://marc.info/?l=linux-ext4&amp;m=122814568309893&amp;w=2

Reported-by: "Aneesh Kumar K.V" &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
Signed-off-by: Toshiyuki Okajima &lt;toshi.okajima&lt; at &gt;jp.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir&lt; at &gt;linux.vnet.ibm.com&gt; 
Cc: "Theodore Ts'o" &lt;tytso&lt; at &gt;mit.edu&gt;
--
 fs/jbd2/journal.c     |    1 +
 fs/jbd2/transaction.c |   34 ++++++++++++++++++++++++++++++----
 include/linux/jbd2.h  |    1 +
 3 files changed, 32 insertions(+), 4 deletions(-)

diff -Nurp linux-2.6.28-rc6/fs/jbd2/journal.c linux-2.6.28-rc6.2/fs/jbd2/journal.c
--- linux-2.6.28-rc6/fs/jbd2/journal.c	2008-11-21 08:19:22.000000000 +0900
+++ linux-2.6.28-rc6.2/fs/jbd2/journal.c	2008-12-02 09:54:42.000000000 +0900
@@ -79,6 +79,7 @@ EXPORT_SYMBOL(jbd2_journal_wipe);
 EXPORT_SYMBOL(jbd2_journal_blocks_per_page);
 EXPORT_SYMBOL(jbd2_journal_invalidatepage);
 EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
+EXPORT_SYMBOL(jbd2_journal_try_to_free_metadata_buffers);
 EXPORT_SYMBOL(jbd2_journal_force_commit);
 EXPORT_SYMBOL(jbd2_journal_file_inode);
 EXPORT_SYMBOL(jbd2_journal_init_jbd_inode);
diff -Nurp linux-2.6.28-rc6/fs/jbd2/transaction.c linux-2.6.28-rc6.2/fs/jbd2/transaction.c
--- linux-2.6.28-rc6/fs/jbd2/transaction.c	2008-11-21 08:19:22.000000000 +0900
+++ linux-2.6.28-rc6.2/fs/jbd2/transaction.c	2008-12-02 10:21:53.000000000 +0900
@@ -1497,12 +1497,14 @@ static void jbd2_journal_wait_for_transa
 }
 
 /**
- * int jbd2_journal_try_to_free_buffers() - try to free page buffers.
+ * int __journal_try_to_free_buffers() - try to free page buffers.
  * @journal: journal for operation
  * @page: to try and free
  * @gfp_mask: we use the mask to detect how hard should we try to release
  * buffers. If __GFP_WAIT and __GFP_FS is set, we wait for commit code to
  * release the buffers.
+ * @is_metadata: If true, don't wait for a commit even if __GFP_WAIT
+ *               and __GFP_FS are set.
  *
  *
  * For all the buffers on this page,
@@ -1528,14 +1530,14 @@ static void jbd2_journal_wait_for_transa
  *
  * Who else is affected by this?  hmm...  Really the only contender
  * is do_get_write_access() - it could be looking at the buffer while
- * journal_try_to_free_buffer() is changing its state.  But that
+ * __journal_try_to_free_buffer() is changing its state.  But that
  * cannot happen because we never reallocate freed data as metadata
  * while the data is part of a transaction.  Yes?
  *
  * Return 0 on failure, 1 on success
  */
-int jbd2_journal_try_to_free_buffers(journal_t *journal,
-struct page *page, gfp_t gfp_mask)
+static int __journal_try_to_free_buffers(journal_t *journal,
+struct page *page, gfp_t gfp_mask, bool is_metadata)
 {
 struct buffer_head *head;
 struct buffer_head *bh;
@@ -1567,6 +1569,8 @@ int jbd2_journal_try_to_free_buffers(jou
 } while ((bh = bh-&gt;b_this_page) != head);
 
 ret = try_to_free_buffers(page);
+if (is_metadata)
+return ret;
 
 /*
  * There are a number of places where jbd2_journal_try_to_free_buffers()
@@ -1592,6 +1596,28 @@ busy:
 }
 
 /*
+ * jbd2_journal_try_to_free_buffers:
+ * This is a wrapper function for __journal_try_to_free_buffers to try to 
+ * release data.
+ */
+int jbd2_journal_try_to_free_buffers(journal_t *journal,
+struct page *page, gfp_t gfp_mask)
+{
+return __journal_try_to_free_buffers(journal, page, gfp_mask, false);
+}
+
+/*
+ * jbd2_journal_try_to_free_metadata_buffers:
+ * This is a wrapper function for __journal_try_to_free_buffers to try to 
+ * release metadata.
+ */
+int jbd2_journal_try_to_free_metadata_buffers(journal_t *journal,
+struct page *page, gfp_t gfp_mask)
+{
+return __journal_try_to_free_buffers(journal, page, gfp_mask, true);
+}
+
+/*
  * This buffer is no longer needed.  If it is on an older transaction's
  * checkpoint list we need to record it on this transaction's forget list
  * to pin this buffer (and hence its checkpointing transaction) down until
diff -Nurp linux-2.6.28-rc6/include/linux/jbd2.h linux-2.6.28-rc6.2/include/linux/jbd2.h
--- linux-2.6.28-rc6/include/linux/jbd2.h	2008-11-21 08:19:22.000000000 +0900
+++ linux-2.6.28-rc6.2/include/linux/jbd2.h	2008-12-02 09:59:18.000000000 +0900
@@ -1052,6 +1052,7 @@ extern void journal_sync_buffer (struct
 extern void jbd2_journal_invalidatepage(journal_t *,
 struct page *, unsigned long);
 extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
+extern int jbd2_journal_try_to_free_metadata_buffers(journal_t *, struct page *, gfp_t);
 extern int jbd2_journal_stop(handle_t *);
 extern int jbd2_journal_flush (journal_t *);
 extern void jbd2_journal_lock_updates (journal_t *);
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo&lt; at &gt;vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

</description>
    <dc:creator>Toshiyuki Okajima</dc:creator>
    <dc:date>2008-12-02T11:18:03</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10367">
    <title>[BUG][PATCH 3/4] jbd: fix a cause of __schedule_bug via blkdev_releasepage</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10367</link>
    <description>jbd: fix a cause of __schedule_bug via blkdev_releasepage
From: Toshiyuki Okajima &lt;toshi.okajima&lt; at &gt;jp.fujitsu.com&gt;

This problem is caused by journal_try_to_free_buffers() calling
log_wait_commit() while blkdev_releasepage() holds a read-lock. That commit
wait exists only for uncommitted data buffers, and blkdev_releasepage() must
hold the read/write lock while it uses a client.

ext3_release_metadata() only needs to release metadata buffers, because a
page bound to the block device is used for ext3 metadata.

Therefore there is no need to wait for a commit in
journal_try_to_free_buffers() when it is called via ext3_release_metadata().
This patch adds journal_try_to_free_metadata_buffers(), which is the same as
journal_try_to_free_buffers() except that it never calls log_wait_commit().

This issue was reported by Aneesh Kumar K.V.
http://marc.info/?l=linux-ext4&amp;m=122814568309893&amp;w=2

Reported-by: "Aneesh Kumar K.V" &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
Signed-off-by: Toshiyuki Okajima &lt;toshi.okajima&lt; at &gt;jp.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir&lt; at &gt;linux.vnet.ibm.com&gt; 
Cc: "Theodore Ts'o" &lt;tytso&lt; at &gt;mit.edu&gt;
--
 fs/jbd/journal.c     |    1 +
 fs/jbd/transaction.c |   34 ++++++++++++++++++++++++++++++----
 include/linux/jbd.h  |    1 +
 3 files changed, 32 insertions(+), 4 deletions(-)

diff -Nurp linux-2.6.28-rc6/fs/jbd/journal.c linux-2.6.28-rc6.2/fs/jbd/journal.c
--- linux-2.6.28-rc6/fs/jbd/journal.c	2008-11-21 08:19:22.000000000 +0900
+++ linux-2.6.28-rc6.2/fs/jbd/journal.c	2008-12-02 09:54:26.000000000 +0900
@@ -79,6 +79,7 @@ EXPORT_SYMBOL(journal_wipe);
 EXPORT_SYMBOL(journal_blocks_per_page);
 EXPORT_SYMBOL(journal_invalidatepage);
 EXPORT_SYMBOL(journal_try_to_free_buffers);
+EXPORT_SYMBOL(journal_try_to_free_metadata_buffers);
 EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
diff -Nurp linux-2.6.28-rc6/fs/jbd/transaction.c linux-2.6.28-rc6.2/fs/jbd/transaction.c
--- linux-2.6.28-rc6/fs/jbd/transaction.c	2008-11-21 08:19:22.000000000 +0900
+++ linux-2.6.28-rc6.2/fs/jbd/transaction.c	2008-12-02 10:21:45.000000000 +0900
@@ -1687,12 +1687,14 @@ static void journal_wait_for_transaction
 }
 
 /**
- * int journal_try_to_free_buffers() - try to free page buffers.
+ * int __journal_try_to_free_buffers() - try to free page buffers.
  * @journal: journal for operation
  * @page: to try and free
  * @gfp_mask: we use the mask to detect how hard should we try to release
  * buffers. If __GFP_WAIT and __GFP_FS is set, we wait for commit code to
  * release the buffers.
+ * @is_metadata: If true, don't wait for a commit even if __GFP_WAIT
+ *               and __GFP_FS are set.
  *
  *
  * For all the buffers on this page,
@@ -1718,14 +1720,14 @@ static void journal_wait_for_transaction
  *
  * Who else is affected by this?  hmm...  Really the only contender
  * is do_get_write_access() - it could be looking at the buffer while
- * journal_try_to_free_buffer() is changing its state.  But that
+ * __journal_try_to_free_buffer() is changing its state.  But that
  * cannot happen because we never reallocate freed data as metadata
  * while the data is part of a transaction.  Yes?
  *
  * Return 0 on failure, 1 on success
  */
-int journal_try_to_free_buffers(journal_t *journal,
-struct page *page, gfp_t gfp_mask)
+static int __journal_try_to_free_buffers(journal_t *journal,
+struct page *page, gfp_t gfp_mask, bool is_metadata)
 {
 struct buffer_head *head;
 struct buffer_head *bh;
@@ -1756,6 +1758,8 @@ int journal_try_to_free_buffers(journal_
 } while ((bh = bh-&gt;b_this_page) != head);
 
 ret = try_to_free_buffers(page);
+if (is_metadata)
+return ret;
 
 /*
  * There are a number of places where journal_try_to_free_buffers()
@@ -1781,6 +1785,28 @@ busy:
 }
 
 /*
+ * journal_try_to_free_buffers:
+ * This is a wrapper function for __journal_try_to_free_buffers to try to 
+ * release data.
+ */
+int journal_try_to_free_buffers(journal_t *journal,
+struct page *page, gfp_t gfp_mask)
+{
+return __journal_try_to_free_buffers(journal, page, gfp_mask, false);
+}
+
+/*
+ * journal_try_to_free_metadata_buffers:
+ * This is a wrapper function for __journal_try_to_free_buffers to try to 
+ * release metadata.
+ */
+int journal_try_to_free_metadata_buffers(journal_t *journal,
+struct page *page, gfp_t gfp_mask)
+{
+return __journal_try_to_free_buffers(journal, page, gfp_mask, true);
+}
+
+/*
  * This buffer is no longer needed.  If it is on an older transaction's
  * checkpoint list we need to record it on this transaction's forget list
  * to pin this buffer (and hence its checkpointing transaction) down until
diff -Nurp linux-2.6.28-rc6/include/linux/jbd.h linux-2.6.28-rc6.2/include/linux/jbd.h
--- linux-2.6.28-rc6/include/linux/jbd.h	2008-11-21 08:19:22.000000000 +0900
+++ linux-2.6.28-rc6.2/include/linux/jbd.h	2008-12-02 09:58:59.000000000 +0900
@@ -893,6 +893,7 @@ extern void journal_sync_buffer (struct
 extern void journal_invalidatepage(journal_t *,
 struct page *, unsigned long);
 extern int journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
+extern int journal_try_to_free_metadata_buffers(journal_t *, struct page *, gfp_t);
 extern int journal_stop(handle_t *);
 extern int journal_flush (journal_t *);
 extern void journal_lock_updates (journal_t *);
--

</description>
    <dc:creator>Toshiyuki Okajima</dc:creator>
    <dc:date>2008-12-02T11:16:18</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10366">
    <title>[BUG][PATCH 2/4] ext4: fix a cause of __schedule_bug via blkdev_releasepage</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10366</link>
    <description>ext4: fix a cause of __schedule_bug via blkdev_releasepage
From: Toshiyuki Okajima &lt;toshi.okajima&lt; at &gt;jp.fujitsu.com&gt;

This problem is caused by jbd2_journal_try_to_free_buffers() calling
jbd2_log_wait_commit() while blkdev_releasepage() holds a read-lock. That
commit wait exists only for uncommitted data buffers, and blkdev_releasepage()
must hold the read/write lock while it uses a client.

ext4_release_metadata() only needs to release metadata buffers, because a
page bound to the block device is used for ext4 metadata.

Therefore there is no need to wait for a commit in
jbd2_journal_try_to_free_buffers() when it is called via
ext4_release_metadata(). This series adds
jbd2_journal_try_to_free_metadata_buffers(), which is the same as
jbd2_journal_try_to_free_buffers() except that it never calls
jbd2_log_wait_commit(). So ext4_release_metadata() now calls
jbd2_journal_try_to_free_metadata_buffers() instead of
jbd2_journal_try_to_free_buffers().

This issue was reported by Aneesh Kumar K.V.
http://marc.info/?l=linux-ext4&amp;m=122814568309893&amp;w=2

Reported-by: "Aneesh Kumar K.V" &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
Signed-off-by: Toshiyuki Okajima &lt;toshi.okajima&lt; at &gt;jp.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir&lt; at &gt;linux.vnet.ibm.com&gt; 
Cc: "Theodore Ts'o" &lt;tytso&lt; at &gt;mit.edu&gt;
--
 fs/ext4/inode.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -Nurp linux-2.6.28-rc6/fs/ext4/inode.c linux-2.6.28-rc6.2/fs/ext4/inode.c
--- linux-2.6.28-rc6/fs/ext4/inode.c	2008-12-02 09:32:27.000000000 +0900
+++ linux-2.6.28-rc6.2/fs/ext4/inode.c	2008-12-02 09:52:46.000000000 +0900
@@ -3018,7 +3018,7 @@ int ext4_release_metadata(void *client, 
 BUG_ON(EXT4_SB(sb) == NULL);
 journal = EXT4_SB(sb)-&gt;s_journal;
 if (journal != NULL)
-return jbd2_journal_try_to_free_buffers(journal, page, wait);
+return jbd2_journal_try_to_free_metadata_buffers(journal, page, wait);
 else
 return try_to_free_buffers(page);
 }
--

</description>
    <dc:creator>Toshiyuki Okajima</dc:creator>
    <dc:date>2008-12-02T11:09:12</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10365">
    <title>[BUG][PATCH 1/4] ext3: fix a cause of __schedule_bug via blkdev_releasepage</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10365</link>
    <description>ext3: fix a cause of __schedule_bug via blkdev_releasepage
From: Toshiyuki Okajima &lt;toshi.okajima&lt; at &gt;jp.fujitsu.com&gt;

This problem is caused by journal_try_to_free_buffers() calling
log_wait_commit() while blkdev_releasepage() holds a read-lock. That commit
wait exists only for uncommitted data buffers, and blkdev_releasepage() must
hold the read/write lock while it uses a client.

ext3_release_metadata() only needs to release metadata buffers, because a
page bound to the block device is used for ext3 metadata.

Therefore there is no need to wait for a commit in
journal_try_to_free_buffers() when it is called via ext3_release_metadata().
This series adds journal_try_to_free_metadata_buffers(), which is the same as
journal_try_to_free_buffers() except that it never calls log_wait_commit().
So ext3_release_metadata() now calls journal_try_to_free_metadata_buffers()
instead of journal_try_to_free_buffers().

This issue was reported by Aneesh Kumar K.V.
http://marc.info/?l=linux-ext4&amp;m=122814568309893&amp;w=2

Reported-by: "Aneesh Kumar K.V" &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
Signed-off-by: Toshiyuki Okajima &lt;toshi.okajima&lt; at &gt;jp.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir&lt; at &gt;linux.vnet.ibm.com&gt; 
Cc: "Theodore Ts'o" &lt;tytso&lt; at &gt;mit.edu&gt;
--
 fs/ext3/inode.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -Nurp linux-2.6.28-rc6/fs/ext3/inode.c linux-2.6.28-rc6.2/fs/ext3/inode.c
--- linux-2.6.28-rc6/fs/ext3/inode.c	2008-12-02 09:32:27.000000000 +0900
+++ linux-2.6.28-rc6.2/fs/ext3/inode.c	2008-12-02 10:19:53.000000000 +0900
@@ -1696,7 +1696,7 @@ int ext3_release_metadata(void *client, 
 BUG_ON(EXT3_SB(sb) == NULL);
 journal = EXT3_SB(sb)-&gt;s_journal;
 if (journal != NULL)
-return journal_try_to_free_buffers(journal, page, wait);
+return journal_try_to_free_metadata_buffers(journal, page, wait);
 else
 return try_to_free_buffers(page);
 }
--

</description>
    <dc:creator>Toshiyuki Okajima</dc:creator>
    <dc:date>2008-12-02T11:06:47</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10363">
    <title>[PATCH -V3] ext4: Init the complete page while building buddy cache</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10363</link>
    <description>We need to init the complete page during buddy cache init
by setting the contents to '1'. This ensures that during
resize when we increment the s_groups_count we don't hit
the erros as below.

EXT4-fs error (device sdb1): ext4_mb_mark_diskspace_used:
Allocating block 1040385 in system zone of 127 group

V3 changes:
The per-block memset when updating the buddy cache is no longer
needed, since the whole page is now initialized up front.
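
A userspace sketch of why initializing the whole page helps. It assumes, as with ext4's on-disk block bitmaps, that a set bit means "in use". It is also simplified so that each block of the page holds the data for exactly one group. Blocks for groups that do not exist yet keep the 0xff fill, so a group added later by resize presumably cannot look free before its buddy data is built. The sketch_* names and sizes are hypothetical.

<![CDATA[
```c
#include <string.h>

/* Hypothetical geometry, simplified so that each block of the page holds
 * the buddy data of exactly one group. */
#define SKETCH_PAGE_SIZE	4096
#define SKETCH_BLOCKS_PER_PAGE	4
#define SKETCH_BLOCK_SIZE	(SKETCH_PAGE_SIZE / SKETCH_BLOCKS_PER_PAGE)

/* Placeholder for building one group's buddy data: first half marked in
 * use (ones), second half free (zeroes). */
static void sketch_fill_group(unsigned char *block)
{
	memset(block, 0xff, SKETCH_BLOCK_SIZE / 2);
	memset(block + SKETCH_BLOCK_SIZE / 2, 0, SKETCH_BLOCK_SIZE / 2);
}

/* V3 approach: set the whole page to ones first, then fill only the groups
 * that exist now. Blocks for groups >= groups_count stay all ones. */
void sketch_init_cache(unsigned char *page, int groups_count)
{
	int i;

	memset(page, 0xff, SKETCH_PAGE_SIZE);
	for (i = 0; i < SKETCH_BLOCKS_PER_PAGE && i < groups_count; i++)
		sketch_fill_group(page + i * SKETCH_BLOCK_SIZE);
}
```
]]>

Without the page-wide memset, the tail blocks would keep whatever the page held before. In this model that is zeroes, meaning "free".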

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
---
 fs/ext4/mballoc.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 8f33691..0ecc7c7 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -846,6 +846,8 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 
 err = 0;
 first_block = page-&gt;index * blocks_per_page;
+/* init the page  */
+memset(page_address(page), 0xff, PAGE_CACHE_SIZE);
 for (i = 0; i &lt; blocks_per_page; i++) {
 int group;
 struct ext4_group_info *grinfo;
@@ -872,7 +874,6 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 BUG_ON(incore == NULL);
 mb_debug("put buddy for group %u in page %lu/%x\n",
 group, page-&gt;index, i * blocksize);
-memset(data, 0xff, blocksize);
 grinfo = ext4_get_group_info(sb, group);
 grinfo-&gt;bb_fragments = 0;
 memset(grinfo-&gt;bb_counters, 0,
</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-02T06:11:02</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10362">
    <title>[PATCH -V5] ext4: fix BUG when calling ext4_error with locked block group</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10362</link>
    <description>The mballoc code likes to call ext4_error while it is holding locked
block groups.  This can causes a scheduling in atomic context BUG.  We
can't just unlock the block group and relock it after/if ext4_error
returns since that might result in race conditions in the case where
the filesystem is set to continue after finding errors.
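
A userspace sketch of the locking pattern that ext4_grp_locked_error() in this patch follows. It assumes a bool can stand in for the group's bit spinlock and that the error handler may sleep. With errors=continue, the error is recorded without releasing the lock. Otherwise the lock is dropped around the handler and retaken, so the caller's later unlock stays balanced. The sketch_* names are hypothetical.

<![CDATA[
```c
#include <stdbool.h>
#include <stdio.h>

enum sketch_errors_mode { SKETCH_ERRORS_CONT, SKETCH_ERRORS_RO };

struct sketch_group {
	bool locked;			/* stands in for EXT4_GROUP_INFO_LOCKED_BIT */
};

int sketch_handler_calls;
int sketch_handler_calls_locked;	/* must stay 0 */

static void sketch_lock(struct sketch_group *g)   { g->locked = true; }
static void sketch_unlock(struct sketch_group *g) { g->locked = false; }

/* Stand-in for ext4_handle_error(): assumed to be able to sleep, so it
 * must not run with the group lock held. */
static void sketch_handle_error(const struct sketch_group *g)
{
	sketch_handler_calls++;
	if (g->locked)
		sketch_handler_calls_locked++;
}

/* Called with g locked; always returns with g locked. */
void sketch_grp_locked_error(struct sketch_group *g,
			     enum sketch_errors_mode mode, const char *msg)
{
	/* Printing does not sleep, so it is done under the lock. */
	fprintf(stderr, "EXT4-fs error: %s\n", msg);
	if (mode == SKETCH_ERRORS_CONT)
		return;			/* record the error, keep the lock */
	sketch_unlock(g);
	sketch_handle_error(g);
	sketch_lock(g);			/* caller still expects the lock held */
}
```
]]>

The continue path never drops the lock, which avoids the race described above. The read-only path accepts relocking because, by then, the filesystem has already been marked in error.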

-V5 changes:
Update ext4_commit_super() to use the per-CPU free blocks and free
inodes counter values.
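
A userspace model of the per-CPU counters that ext4_commit_super() now reads. It assumes the usual design: a shared count plus per-CPU deltas that have not yet been folded into it. An exact sum walks every CPU's delta, and the _positive variant is assumed here to clamp a transiently negative total to zero. The sketch_* names and SKETCH_NR_CPUS are hypothetical; this is not the kernel's percpu_counter code.

<![CDATA[
```c
#include <stdint.h>

#define SKETCH_NR_CPUS 4

struct sketch_counter {
	int64_t count;			/* shared total, folded in periodically */
	int32_t pcpu[SKETCH_NR_CPUS];	/* per-CPU deltas not yet folded in */
};

/* Cheap, approximate read: ignores the per-CPU deltas. */
int64_t sketch_read(const struct sketch_counter *c)
{
	return c->count;
}

/* Exact read: add every CPU's pending delta and never report below 0. */
int64_t sketch_sum_positive(const struct sketch_counter *c)
{
	int64_t ret = c->count;
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		ret += c->pcpu[cpu];
	return ret < 0 ? 0 : ret;
}
```
]]>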

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso&lt; at &gt;mit.edu&gt;
---
 fs/ext4/ext4.h    |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/mballoc.c |   30 +++++++++++++++---------------
 fs/ext4/mballoc.h |   47 -----------------------------------------------
 fs/ext4/super.c   |   45 +++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 105 insertions(+), 64 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a5fa26e..44cd21c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1131,6 +1131,9 @@ extern void ext4_abort(struct super_block *, const char *, const char *, ...)
 __attribute__ ((format (printf, 3, 4)));
 extern void ext4_warning(struct super_block *, const char *, const char *, ...)
 __attribute__ ((format (printf, 3, 4)));
+extern void ext4_grp_locked_error(struct super_block *, ext4_group_t,
+const char *, const char *, ...)
+__attribute__ ((format (printf, 4, 5)));
 extern void ext4_update_dynamic_rev(struct super_block *sb);
 extern int ext4_update_compat_feature(handle_t *handle, struct super_block *sb,
 __u32 compat);
@@ -1254,6 +1257,50 @@ static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
 return ;
 }
 
+struct ext4_group_info {
+unsigned long   bb_state;
+struct rb_root  bb_free_root;
+unsigned short  bb_first_free;
+unsigned short  bb_free;
+unsigned short  bb_fragments;
+struct          list_head bb_prealloc_list;
+#ifdef DOUBLE_CHECK
+void            *bb_bitmap;
+#endif
+struct rw_semaphore alloc_sem;
+unsigned short  bb_counters[];
+};
+
+#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
+#define EXT4_GROUP_INFO_LOCKED_BIT	1
+
+#define EXT4_MB_GRP_NEED_INIT(grp)\
+(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &amp;((grp)-&gt;bb_state)))
+
+static inline void ext4_lock_group(struct super_block *sb, ext4_group_t group)
+{
+struct ext4_group_info *grinfo = ext4_get_group_info(sb, group);
+
+bit_spin_lock(EXT4_GROUP_INFO_LOCKED_BIT, &amp;(grinfo-&gt;bb_state));
+}
+
+static inline void ext4_unlock_group(struct super_block *sb,
+ext4_group_t group)
+{
+struct ext4_group_info *grinfo = ext4_get_group_info(sb, group);
+
+bit_spin_unlock(EXT4_GROUP_INFO_LOCKED_BIT, &amp;(grinfo-&gt;bb_state));
+}
+
+static inline int ext4_is_group_locked(struct super_block *sb,
+ext4_group_t group)
+{
+struct ext4_group_info *grinfo = ext4_get_group_info(sb, group);
+
+return bit_spin_is_locked(EXT4_GROUP_INFO_LOCKED_BIT,
+&amp;(grinfo-&gt;bb_state));
+}
+
 /*
  * Inodes and files operations
  */
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index abb66a8..b1c216c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -457,8 +457,8 @@ static void mb_free_blocks_double(struct inode *inode, struct ext4_buddy *e4b,
 blocknr += first + i;
 blocknr +=
     le32_to_cpu(EXT4_SB(sb)-&gt;s_es-&gt;s_first_data_block);
-
-ext4_error(sb, __func__, "double-free of inode"
+ext4_grp_locked_error(sb, e4b-&gt;bd_group,
+   __func__, "double-free of inode"
    " %lu's block %llu(bit %u in group %u)\n",
    inode ? inode-&gt;i_ino : 0, blocknr,
    first + i, e4b-&gt;bd_group);
@@ -702,7 +702,7 @@ static void ext4_mb_generate_buddy(struct super_block *sb,
 grp-&gt;bb_fragments = fragments;
 
 if (free != grp-&gt;bb_free) {
-ext4_error(sb, __func__,
+ext4_grp_locked_error(sb, group,  __func__,
 "EXT4-fs: group %u: %u blocks in bitmap, %u in gd\n",
 group, free, grp-&gt;bb_free);
 /*
@@ -1099,8 +1099,6 @@ static void mb_set_bits(spinlock_t *lock, void *bm, int cur, int len)
 
 static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
   int first, int count)
-__releases(bitlock)
-__acquires(bitlock)
 {
 int block = 0;
 int max = 0;
@@ -1139,12 +1137,11 @@ __acquires(bitlock)
 blocknr += block;
 blocknr +=
     le32_to_cpu(EXT4_SB(sb)-&gt;s_es-&gt;s_first_data_block);
-ext4_unlock_group(sb, e4b-&gt;bd_group);
-ext4_error(sb, __func__, "double-free of inode"
+ext4_grp_locked_error(sb, e4b-&gt;bd_group,
+   __func__, "double-free of inode"
    " %lu's block %llu(bit %u in group %u)\n",
    inode ? inode-&gt;i_ino : 0, blocknr, block,
    e4b-&gt;bd_group);
-ext4_lock_group(sb, e4b-&gt;bd_group);
 }
 mb_clear_bit(block, EXT4_MB_BITMAP(e4b));
 e4b-&gt;bd_info-&gt;bb_counters[order]++;
@@ -1629,7 +1626,8 @@ static void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
  * free blocks even though group info says we
  * we have free blocks
  */
-ext4_error(sb, __func__, "%d free blocks as per "
+ext4_grp_locked_error(sb, e4b-&gt;bd_group,
+__func__, "%d free blocks as per "
 "group info. But bitmap says 0\n",
 free);
 break;
@@ -1638,7 +1636,8 @@ static void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
 mb_find_extent(e4b, 0, i, ac-&gt;ac_g_ex.fe_len, &amp;ex);
 BUG_ON(ex.fe_len &lt;= 0);
 if (free &lt; ex.fe_len) {
-ext4_error(sb, __func__, "%d free blocks as per "
+ext4_grp_locked_error(sb, e4b-&gt;bd_group,
+__func__, "%d free blocks as per "
 "group info. But got %d blocks\n",
 free, ex.fe_len);
 /*
@@ -3828,8 +3827,9 @@ ext4_mb_release_inode_pa(struct ext4_buddy *e4b, struct buffer_head *bitmap_bh,
 pa, (unsigned long) pa-&gt;pa_lstart,
 (unsigned long) pa-&gt;pa_pstart,
 (unsigned long) pa-&gt;pa_len);
-ext4_error(sb, __func__, "free %u, pa_free %u\n",
-free, pa-&gt;pa_free);
+ext4_grp_locked_error(sb, group,
+__func__, "free %u, pa_free %u\n",
+free, pa-&gt;pa_free);
 /*
  * pa is already deleted so we use the value obtained
  * from the bitmap and continue.
@@ -4641,9 +4641,9 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
 else if (block &gt;= (entry-&gt;start_blk + entry-&gt;count))
 n = &amp;(*n)-&gt;rb_right;
 else {
-ext4_error(sb, __func__,
-    "Double free of blocks %d (%d %d)\n",
-    block, entry-&gt;start_blk, entry-&gt;count);
+ext4_grp_locked_error(sb, e4b-&gt;bd_group, __func__,
+"Double free of blocks %d (%d %d)\n",
+block, entry-&gt;start_blk, entry-&gt;count);
 return 0;
 }
 }
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 997f78f..95d4c7f 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -118,27 +118,6 @@ struct ext4_free_data {
 tid_t	t_tid;
 };
 
-struct ext4_group_info {
-unsigned long	bb_state;
-struct rb_root  bb_free_root;
-unsigned short	bb_first_free;
-unsigned short	bb_free;
-unsigned short	bb_fragments;
-struct	list_head bb_prealloc_list;
-#ifdef DOUBLE_CHECK
-void	*bb_bitmap;
-#endif
-struct rw_semaphore alloc_sem;
-unsigned short	bb_counters[];
-};
-
-#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
-#define EXT4_GROUP_INFO_LOCKED_BIT	1
-
-#define EXT4_MB_GRP_NEED_INIT(grp)\
-(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &amp;((grp)-&gt;bb_state)))
-
-
 struct ext4_prealloc_space {
 struct list_head	pa_inode_list;
 struct list_head	pa_group_list;
@@ -264,32 +243,6 @@ static inline void ext4_mb_store_history(struct ext4_allocation_context *ac)
 #define in_range(b, first, len)((b) &gt;= (first) &amp;&amp; (b) &lt;= (first) + (len) - 1)
 
 struct buffer_head *read_block_bitmap(struct super_block *, ext4_group_t);
-
-
-static inline void ext4_lock_group(struct super_block *sb, ext4_group_t group)
-{
-struct ext4_group_info *grinfo = ext4_get_group_info(sb, group);
-
-bit_spin_lock(EXT4_GROUP_INFO_LOCKED_BIT, &amp;(grinfo-&gt;bb_state));
-}
-
-static inline void ext4_unlock_group(struct super_block *sb,
-ext4_group_t group)
-{
-struct ext4_group_info *grinfo = ext4_get_group_info(sb, group);
-
-bit_spin_unlock(EXT4_GROUP_INFO_LOCKED_BIT, &amp;(grinfo-&gt;bb_state));
-}
-
-static inline int ext4_is_group_locked(struct super_block *sb,
-ext4_group_t group)
-{
-struct ext4_group_info *grinfo = ext4_get_group_info(sb, group);
-
-return bit_spin_is_locked(EXT4_GROUP_INFO_LOCKED_BIT,
-&amp;(grinfo-&gt;bb_state));
-}
-
 static inline ext4_fsblk_t ext4_grp_offs_to_block(struct super_block *sb,
 struct ext4_free_extent *fex)
 {
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7f188c1..8aef386 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -350,6 +350,44 @@ void ext4_warning(struct super_block *sb, const char *function,
 va_end(args);
 }
 
+void ext4_grp_locked_error(struct super_block *sb, ext4_group_t grp,
+const char *function, const char *fmt, ...)
+__releases(bitlock)
+__acquires(bitlock)
+{
+va_list args;
+struct ext4_super_block *es = EXT4_SB(sb)-&gt;s_es;
+
+va_start(args, fmt);
+printk(KERN_CRIT "EXT4-fs error (device %s): %s: ", sb-&gt;s_id, function);
+vprintk(fmt, args);
+printk("\n");
+va_end(args);
+
+if (test_opt(sb, ERRORS_CONT)) {
+EXT4_SB(sb)-&gt;s_mount_state |= EXT4_ERROR_FS;
+es-&gt;s_state |= cpu_to_le16(EXT4_ERROR_FS);
+ext4_commit_super(sb, es, 0);
+return;
+}
+ext4_unlock_group(sb, grp);
+ext4_handle_error(sb);
+/*
+ * We only get here in the ERRORS_RO case; relocking the group
+ * may be dangerous, but nothing bad will happen since the
+ * filesystem will have already been marked read-only and the
+ * journal has been aborted.  We return 1 as a hint to callers
+ * who might want to use the return value from
+ * ext4_grp_locked_error() to distinguish between the
+ * ERRORS_CONT and ERRORS_RO case, and perhaps return more
+ * aggressively from the ext4 function in question, with a
+ * more appropriate error code.
+ */
+ext4_lock_group(sb, grp);
+return;
+}
+
+
 void ext4_update_dynamic_rev(struct super_block *sb)
 {
 struct ext4_super_block *es = EXT4_SB(sb)-&gt;s_es;
@@ -2820,8 +2858,11 @@ static void ext4_commit_super(struct super_block *sb,
 set_buffer_uptodate(sbh);
 }
 es-&gt;s_wtime = cpu_to_le32(get_seconds());
-ext4_free_blocks_count_set(es, ext4_count_free_blocks(sb));
-es-&gt;s_free_inodes_count = cpu_to_le32(ext4_count_free_inodes(sb));
+ext4_free_blocks_count_set(es, percpu_counter_sum_positive(
+&amp;EXT4_SB(sb)-&gt;s_freeblocks_counter));
+es-&gt;s_free_inodes_count = cpu_to_le32(percpu_counter_sum_positive(
+&amp;EXT4_SB(sb)-&gt;s_freeinodes_counter));
+
 BUFFER_TRACE(sbh, "marking dirty");
 mark_buffer_dirty(sbh);
 if (sync) {
</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-02T06:09:57</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10360">
    <title>[patch 1/1] ext4: allocate -&gt;s_blockgroup_lock separately</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10360</link>
    <description>From: Pekka Enberg &lt;penberg&lt; at &gt;cs.helsinki.fi&gt;

As spotted by kmemtrace, struct ext4_sb_info is 17664 bytes on 64-bit,
which makes it a very bad fit for SLAB allocators.  The culprit of the
wasted memory is -&gt;s_blockgroup_lock, which can be as big as 16 KB when
NR_CPUS &gt;= 32.

To fix that, allocate -&gt;s_blockgroup_lock separately; in the worst case it
fits nicely in an order-2 page.  This shrinks struct ext4_sb_info enough to
fit a 2 KB slab cache, so we now allocate 16 KB + 2 KB instead of 32 KB,
saving 14 KB of memory.
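
The arithmetic above, checked under two stated assumptions. First, the general kmalloc caches at these sizes are powers of two. Second, struct blockgroup_lock takes exactly 16 KB, the worst case quoted above. Replacing the embedded lock with an 8-byte pointer is counted, but any alignment padding is ignored. The sketch_* names are hypothetical.

<![CDATA[
```c
#include <stddef.h>

/* Assumed kmalloc size class: smallest power of two >= size. */
size_t sketch_kmalloc_class(size_t size)
{
	size_t cls = 1;

	while (cls < size)
		cls <<= 1;
	return cls;
}

enum {
	SKETCH_SB_INFO_BEFORE = 17664,	/* kmemtrace figure, 64-bit */
	SKETCH_BGL_SIZE       = 16384,	/* assumed worst case, NR_CPUS >= 32 */
	/* embedded lock replaced by an 8-byte pointer, padding ignored */
	SKETCH_SB_INFO_AFTER  = SKETCH_SB_INFO_BEFORE - SKETCH_BGL_SIZE + 8,
};

/* Bytes allocated before the change: one object holding everything. */
size_t sketch_before(void)
{
	return sketch_kmalloc_class(SKETCH_SB_INFO_BEFORE);
}

/* Bytes allocated after the change: sb_info and the lock separately. */
size_t sketch_after(void)
{
	return sketch_kmalloc_class(SKETCH_SB_INFO_AFTER) +
	       sketch_kmalloc_class(SKETCH_BGL_SIZE);
}
```
]]>

Under these assumptions, 17664 bytes rounds up to 32 KB. The remaining 1288 bytes fit the 2 KB class, and the 16 KB lock fits exactly, which matches the 14 KB saving claimed above.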

Acked-by: Andreas Dilger &lt;adilger&lt; at &gt;sun.com&gt;
Signed-off-by: Pekka Enberg &lt;penberg&lt; at &gt;cs.helsinki.fi&gt;
Cc: &lt;linux-ext4&lt; at &gt;vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm&lt; at &gt;linux-foundation.org&gt;
---

 fs/ext4/ext4_sb.h |    4 ++--
 fs/ext4/super.c   |   10 +++++++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff -puN fs/ext4/ext4_sb.h~ext4-allocate-s_blockgroup_lock-separately fs/ext4/ext4_sb.h
--- a/fs/ext4/ext4_sb.h~ext4-allocate-s_blockgroup_lock-separately
+++ a/fs/ext4/ext4_sb.h
@@ -62,7 +62,7 @@ struct ext4_sb_info {
 struct percpu_counter s_freeinodes_counter;
 struct percpu_counter s_dirs_counter;
 struct percpu_counter s_dirtyblocks_counter;
-struct blockgroup_lock s_blockgroup_lock;
+struct blockgroup_lock *s_blockgroup_lock;
 struct proc_dir_entry *s_proc;
 
 /* root of the per fs reservation window tree */
@@ -150,7 +150,7 @@ struct ext4_sb_info {
 static inline spinlock_t *
 sb_bgl_lock(struct ext4_sb_info *sbi, unsigned int block_group)
 {
-return bgl_lock_ptr(&amp;sbi-&gt;s_blockgroup_lock, block_group);
+return bgl_lock_ptr(sbi-&gt;s_blockgroup_lock, block_group);
 }
 
 #endif/* _EXT4_SB */
diff -puN fs/ext4/super.c~ext4-allocate-s_blockgroup_lock-separately fs/ext4/super.c
--- a/fs/ext4/super.c~ext4-allocate-s_blockgroup_lock-separately
+++ a/fs/ext4/super.c
@@ -497,6 +497,7 @@ static void ext4_put_super(struct super_
 ext4_blkdev_remove(sbi);
 }
 sb-&gt;s_fs_info = NULL;
+kfree(sbi-&gt;s_blockgroup_lock);
 kfree(sbi);
 return;
 }
@@ -1886,6 +1887,13 @@ static int ext4_fill_super(struct super_
 sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
 if (!sbi)
 return -ENOMEM;
+
+sbi-&gt;s_blockgroup_lock =
+kzalloc(sizeof(struct blockgroup_lock), GFP_KERNEL);
+if (!sbi-&gt;s_blockgroup_lock) {
+kfree(sbi);
+return -ENOMEM;
+}
 sb-&gt;s_fs_info = sbi;
 sbi-&gt;s_mount_opt = 0;
 sbi-&gt;s_resuid = EXT4_DEF_RESUID;
@@ -2194,7 +2202,7 @@ static int ext4_fill_super(struct super_
  &amp;sbi-&gt;s_inode_readahead_blks);
 #endif
 
-bgl_lock_init(&amp;sbi-&gt;s_blockgroup_lock);
+bgl_lock_init(sbi-&gt;s_blockgroup_lock);
 
 for (i = 0; i &lt; db_count; i++) {
 block = descriptor_loc(sb, logical_sb_block, i);
_
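
The following is a minimal userspace sketch of the pattern this patch applies, for readers unfamiliar with it. A large member moves out of the parent structure behind a pointer and is allocated right after the parent. If that second allocation fails, the parent is freed as well. On teardown the member is freed before the parent. All names here (struct big_locks, struct sb_info_sketch, sb_info_alloc(), sb_info_free(), NR_LOCKS) are hypothetical stand-ins for struct blockgroup_lock and struct ext4_sb_info, and calloc()/free() stand in for kzalloc()/kfree().

<![CDATA[
```c
#include <stdlib.h>

/* Hypothetical stand-in for struct blockgroup_lock: an array large
 * enough that embedding it would bloat the parent allocation. */
#define NR_LOCKS 128
struct big_locks {
	int lock[NR_LOCKS];
};

/* Hypothetical stand-in for struct ext4_sb_info. */
struct sb_info_sketch {
	int mount_opt;
	struct big_locks *locks;	/* was embedded, now a pointer */
};

/* Allocate the parent, then the separately allocated member. If the
 * second allocation fails, free the first and report failure. */
static struct sb_info_sketch *sb_info_alloc(void)
{
	struct sb_info_sketch *sbi = calloc(1, sizeof(*sbi));

	if (!sbi)
		return NULL;
	sbi->locks = calloc(1, sizeof(*sbi->locks));
	if (!sbi->locks) {
		free(sbi);
		return NULL;
	}
	return sbi;
}

/* Free the member first, while the parent pointer is still valid. */
static void sb_info_free(struct sb_info_sketch *sbi)
{
	if (!sbi)
		return;
	free(sbi->locks);
	free(sbi);
}
```
]]>

In this sketch the parent stays a small allocation while the table gets its own. The cost is one extra pointer dereference per lookup, which matches the change to sb_bgl_lock() above.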
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo&lt; at &gt;vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

</description>
    <dc:creator>akpm&lt; at &gt;linux-foundation.org</dc:creator>
    <dc:date>2008-12-01T22:27:04</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10351">
    <title>[ext4] Documentation patch</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10351</link>
    <description>My first patch, hope it goes through ok! 
Applies to 2.6.28-rc6-00184-gd9d060a.

The example of a metadata-only journalling filesystem should be anything
but ext3: ext3 and ext4 use the same journalling modes, while most other
Linux filesystems journal only metadata.

Signed-off-by: Harald Arnesen &lt;harald&lt; at &gt;skogtun.org&gt;

========================================================================
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 174eaff..5bbe79e 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -61,7 +61,7 @@ Note: More extensive information for getting started with ext4 can be
   - When comparing performance with other filesystems, remember that
     ext3/4 by default offers higher data integrity guarantees than most.
     So when comparing with a metadata-only journalling filesystem, such
-    as ext3, use `mount -o data=writeback'.  And you might as well use
+    as jfs or xfs, use `mount -o data=writeback'.  And you might as well use
     `mount -o nobh' too along with it.  Making the journal larger than
     the mke2fs default often helps performance with metadata-intensive
     workloads.
</description>
    <dc:creator>Harald Arnesen</dc:creator>
    <dc:date>2008-12-01T16:46:56</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10348">
    <title>BUG: scheduling while atomic: kswapd0/273/0x00000002</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10348</link>
    <description>Hi,

With the latest patch queue I am getting the error below:

-aneesh


BUG: scheduling while atomic: kswapd0/273/0x00000002
1 lock held by kswapd0/273:
 #0:  (&amp;ei-&gt;client_lock){----}, at: [&lt;c04a014f&gt;] blkdev_releasepage+0x27/0x5a
Modules linked in: autofs4 hidp rfkill input_polldev sbs sbshc battery ac parport_pc lp parport i6300esb i2c_i801 i2c_core tg3 libphy e752x_edac edac_core qla2xxx scsi_transport_fc dm_multipath dm_mirror dm_region_hash dm_log dm_mod loop xt_tcpudp ip6t_REJECT ipv6 ipt_REJECT x_tables sunrpc rfcomm bnep l2cap bluetooth bridge stp sg rtc_cmos rtc_core rtc_lib pcspkr button ata_piix libata mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: ip_tables]
Pid: 273, comm: kswapd0 Not tainted 2.6.28-rc6-autotest #1
Call Trace:
 [&lt;c0426277&gt;] __schedule_bug+0x5e/0x65
 [&lt;c067d302&gt;] schedule+0x85/0x7db
 [&lt;c04454d6&gt;] ? mark_held_locks+0x49/0x64
 [&lt;c0445650&gt;] ? trace_hardirqs_on_caller+0xe0/0x101
 [&lt;c044567c&gt;] ? trace_hardirqs_on+0xb/0xd
 [&lt;c067fdc8&gt;] ? _spin_unlock_irqrestore+0x42/0x58
 [&lt;c043b548&gt;] ? prepare_to_wait+0x3f/0x45
 [&lt;c04e65dc&gt;] jbd2_log_wait_commit+0x8e/0xd8
 [&lt;c043b405&gt;] ? autoremove_wake_function+0x0/0x33
 [&lt;c04e2ca1&gt;] jbd2_journal_try_to_free_buffers+0x175/0x185
 [&lt;c04c71d2&gt;] ext4_release_metadata+0x54/0x66
 [&lt;c04c717e&gt;] ? ext4_release_metadata+0x0/0x66
 [&lt;c04a0166&gt;] blkdev_releasepage+0x3e/0x5a
 [&lt;c045e987&gt;] try_to_release_page+0x32/0x3f
 [&lt;c0467c5c&gt;] shrink_page_list+0x436/0x57e
 [&lt;c044349b&gt;] ? put_lock_stats+0xd/0x21
 [&lt;c044353c&gt;] ? lock_release_holdtime+0x8d/0x93
 [&lt;c067fe00&gt;] ? _spin_unlock_irq+0x22/0x42
 [&lt;c0445650&gt;] ? trace_hardirqs_on_caller+0xe0/0x101
 [&lt;c0467f8e&gt;] shrink_list+0x1ea/0x47c
 [&lt;c044349b&gt;] ? put_lock_stats+0xd/0x21
 [&lt;c0468432&gt;] shrink_zone+0x212/0x27f
 [&lt;c0468f5b&gt;] kswapd+0x341/0x4c6
 [&lt;c0466cc5&gt;] ? isolate_pages_global+0x0/0x18c
 [&lt;c043b405&gt;] ? autoremove_wake_function+0x0/0x33
 [&lt;c0468c1a&gt;] ? kswapd+0x0/0x4c6
 [&lt;c043b344&gt;] kthread+0x3b/0x63
 [&lt;c043b309&gt;] ? kthread+0x0/0x63
 [&lt;c04048f3&gt;] kernel_thread_helper+0x7/0x10
BUG: scheduling while atomic: kswapd0/273/0x0000000

</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-01T15:26:56</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10346">
    <title>[PATCH -V5 RESEND] ext4: Use EXT4_GROUP_INFO_NEED_INIT_BIT during resize</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10346</link>
<description>The new groups added during resize are flagged as
need_init groups. Make sure we properly initialize these
groups. When the block size is smaller than the page size and we
are adding new groups, the page may still be marked uptodate even
though we haven't initialized the group. While forcing the init
of the buddy cache, we need to make sure other groups that are part
of the same buddy cache page are not using the cache.
group_info-&gt;alloc_sem is added to ensure this.

FILENAME: aneesh-3-use_EXT4_GROUP_INFO_NEED_INIT_BIT-during-resize
V5 changes:
The change in this version is to make sure that when we are adding new
groups, we don't have a parallel ext4_mb_init_cache happening on other
groups which are part of the same page in the buddy cache. We do that
by taking the write lock on the group_info alloc_sem semaphore.

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso&lt; at &gt;mit.edu&gt;
---
 fs/ext4/balloc.c  |   21 +++--
 fs/ext4/ext4.h    |    7 +-
 fs/ext4/mballoc.c |  261 +++++++++++++++++++++++++++++++++++++++++------------
 fs/ext4/mballoc.h |    3 +
 fs/ext4/resize.c  |   49 ++---------
 5 files changed, 230 insertions(+), 111 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index fa69cb3..39df492 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -381,6 +381,7 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
 ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1);
 
 ext4_get_group_no_and_offset(sb, block, &amp;block_group, &amp;bit);
+grp = ext4_get_group_info(sb, block_group);
 /*
  * Check to see if we are freeing blocks across a group
  * boundary.
@@ -425,7 +426,11 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
 err = ext4_journal_get_write_access(handle, gd_bh);
 if (err)
 goto error_return;
-
+/*
+ * make sure we don't allow a parallel init on other groups in the
+ * same buddy cache
+ */
+down_write(&amp;grp-&gt;alloc_sem);
 for (i = 0, blocks_freed = 0; i &lt; count; i++) {
 BUFFER_TRACE(bitmap_bh, "clear bit");
 if (!ext4_clear_bit_atomic(sb_bgl_lock(sbi, block_group),
@@ -450,6 +455,13 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
 sbi-&gt;s_flex_groups[flex_group].free_blocks += blocks_freed;
 spin_unlock(sb_bgl_lock(sbi, flex_group));
 }
+/*
+ * request to reload the buddy with the
+ * new bitmap information
+ */
+set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &amp;(grp-&gt;bb_state));
+ext4_mb_update_group_info(grp, blocks_freed);
+up_write(&amp;grp-&gt;alloc_sem);
 
 /* We dirtied the bitmap block */
 BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
@@ -461,13 +473,6 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
 if (!err)
 err = ret;
 sb-&gt;s_dirt = 1;
-/*
- * request to reload the buddy with the
- * new bitmap information
- */
-grp = ext4_get_group_info(sb, block_group);
-set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &amp;(grp-&gt;bb_state));
-ext4_mb_update_group_info(grp, blocks_freed);
 
 error_return:
 brelse(bitmap_bh);
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0bbc052..a5fa26e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1065,12 +1065,13 @@ extern int __init init_ext4_mballoc(void);
 extern void exit_ext4_mballoc(void);
 extern void ext4_mb_free_blocks(handle_t *, struct inode *,
 unsigned long, unsigned long, int, unsigned long *);
-extern int ext4_mb_add_more_groupinfo(struct super_block *sb,
+extern int ext4_mb_add_groupinfo(struct super_block *sb,
 ext4_group_t i, struct ext4_group_desc *desc);
 extern void ext4_mb_update_group_info(struct ext4_group_info *grp,
 ext4_grpblk_t add);
-
-
+extern int ext4_mb_get_buddy_cache_lock(struct super_block *, ext4_group_t);
+extern void ext4_mb_put_buddy_cache_lock(struct super_block *,
+ext4_group_t, int);
 /* inode.c */
 int ext4_forget(handle_t *handle, int is_metadata, struct inode *inode,
 struct buffer_head *bh, ext4_fsblk_t blocknr);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 9cc93af..81b05cf 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -886,18 +886,20 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 struct ext4_buddy *e4b)
 {
-struct ext4_sb_info *sbi = EXT4_SB(sb);
-struct inode *inode = sbi-&gt;s_buddy_cache;
 int blocks_per_page;
 int block;
 int pnum;
 int poff;
 struct page *page;
 int ret;
+struct ext4_group_info *grp;
+struct ext4_sb_info *sbi = EXT4_SB(sb);
+struct inode *inode = sbi-&gt;s_buddy_cache;
 
 mb_debug("load group %u\n", group);
 
 blocks_per_page = PAGE_CACHE_SIZE / sb-&gt;s_blocksize;
+grp = ext4_get_group_info(sb, group);
 
 e4b-&gt;bd_blkbits = sb-&gt;s_blocksize_bits;
 e4b-&gt;bd_info = ext4_get_group_info(sb, group);
@@ -905,6 +907,15 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 e4b-&gt;bd_group = group;
 e4b-&gt;bd_buddy_page = NULL;
 e4b-&gt;bd_bitmap_page = NULL;
+e4b-&gt;alloc_semp = &amp;grp-&gt;alloc_sem;
+
+/* Take the read lock on the group alloc
+ * sem. This would make sure a parallel
+ * ext4_mb_init_group happening on other
+ * groups mapped by the page is blocked
+ * till we are done with allocation
+ */
+down_read(e4b-&gt;alloc_semp);
 
 /*
  * the buddy cache inode stores the block bitmap
@@ -920,6 +931,14 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 page = find_get_page(inode-&gt;i_mapping, pnum);
 if (page == NULL || !PageUptodate(page)) {
 if (page)
+/*
+ * drop the page reference and try
+ * to get the page with lock. If we
+ * are not uptodate that implies
+ * somebody just created the page but
+ * is yet to initialize the same. So
+ * wait for it to initialize.
+ */
 page_cache_release(page);
 page = find_or_create_page(inode-&gt;i_mapping, pnum, GFP_NOFS);
 if (page) {
@@ -985,6 +1004,9 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 page_cache_release(e4b-&gt;bd_buddy_page);
 e4b-&gt;bd_buddy = NULL;
 e4b-&gt;bd_bitmap = NULL;
+
+/* Done with the buddy cache */
+up_read(e4b-&gt;alloc_semp);
 return ret;
 }
 
@@ -994,6 +1016,8 @@ static void ext4_mb_release_desc(struct ext4_buddy *e4b)
 page_cache_release(e4b-&gt;bd_bitmap_page);
 if (e4b-&gt;bd_buddy_page)
 page_cache_release(e4b-&gt;bd_buddy_page);
+/* Done with the buddy cache */
+up_read(e4b-&gt;alloc_semp);
 }
 
 
@@ -1694,6 +1718,173 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
 return 0;
 }
 
+/*
+ * lock the group_info alloc_sem of all the groups
+ * belonging to the same buddy cache page. This
+ * makes sure other parallel operations on the buddy
+ * cache don't happen while holding the buddy cache
+ * lock
+ */
+int ext4_mb_get_buddy_cache_lock(struct super_block *sb, ext4_group_t group)
+{
+int i;
+int block, pnum;
+int blocks_per_page;
+int groups_per_page;
+ext4_group_t first_group;
+struct ext4_group_info *grp;
+
+blocks_per_page = PAGE_CACHE_SIZE / sb-&gt;s_blocksize;
+/*
+ * the buddy cache inode stores the block bitmap
+ * and buddy information in consecutive blocks.
+ * So for each group we need two blocks.
+ */
+block = group * 2;
+pnum = block / blocks_per_page;
+first_group = pnum * blocks_per_page / 2;
+
+groups_per_page = blocks_per_page &gt;&gt; 1;
+if (groups_per_page == 0)
+groups_per_page = 1;
+/* read all groups the page covers into the cache */
+for (i = 0; i &lt; groups_per_page; i++) {
+
+if ((first_group + i) &gt;= EXT4_SB(sb)-&gt;s_groups_count)
+break;
+grp = ext4_get_group_info(sb, first_group + i);
+/* take all groups write allocation
+ * semaphore. This makes sure there is
+ * no block allocation going on in any
+ * of those groups
+ */
+down_write(&amp;grp-&gt;alloc_sem);
+}
+return i;
+}
+
+void ext4_mb_put_buddy_cache_lock(struct super_block *sb,
+ext4_group_t group, int locked_group)
+{
+int i;
+int block, pnum;
+int blocks_per_page;
+ext4_group_t first_group;
+struct ext4_group_info *grp;
+
+blocks_per_page = PAGE_CACHE_SIZE / sb-&gt;s_blocksize;
+/*
+ * the buddy cache inode stores the block bitmap
+ * and buddy information in consecutive blocks.
+ * So for each group we need two blocks.
+ */
+block = group * 2;
+pnum = block / blocks_per_page;
+first_group = pnum * blocks_per_page / 2;
+/* release locks on all the groups */
+for (i = 0; i &lt; locked_group; i++) {
+
+grp = ext4_get_group_info(sb, first_group + i);
+/* release all groups write allocation
+ * semaphores taken by
+ * ext4_mb_get_buddy_cache_lock so that
+ * block allocation can resume
+ */
+up_write(&amp;grp-&gt;alloc_sem);
+}
+
+}
+
+static int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
+{
+
+int ret;
+void *bitmap;
+int blocks_per_page;
+int block, pnum, poff;
+int num_grp_locked = 0;
+struct ext4_group_info *this_grp;
+struct ext4_sb_info *sbi = EXT4_SB(sb);
+struct inode *inode = sbi-&gt;s_buddy_cache;
+struct page *page = NULL, *bitmap_page = NULL;
+
+mb_debug("init group %u\n", group);
+blocks_per_page = PAGE_CACHE_SIZE / sb-&gt;s_blocksize;
+this_grp = ext4_get_group_info(sb, group);
+/*
+ * This ensures we don't add group
+ * to this buddy cache via resize
+ */
+num_grp_locked =  ext4_mb_get_buddy_cache_lock(sb, group);
+if (!EXT4_MB_GRP_NEED_INIT(this_grp)) {
+/*
+ * somebody initialized the group
+ * return without doing anything
+ */
+ret = 0;
+goto err;
+}
+/*
+ * the buddy cache inode stores the block bitmap
+ * and buddy information in consecutive blocks.
+ * So for each group we need two blocks.
+ */
+block = group * 2;
+pnum = block / blocks_per_page;
+poff = block % blocks_per_page;
+page = find_or_create_page(inode-&gt;i_mapping, pnum, GFP_NOFS);
+if (page) {
+BUG_ON(page-&gt;mapping != inode-&gt;i_mapping);
+ret = ext4_mb_init_cache(page, NULL);
+if (ret) {
+unlock_page(page);
+goto err;
+}
+unlock_page(page);
+}
+if (page == NULL || !PageUptodate(page)) {
+ret = -EIO;
+goto err;
+}
+mark_page_accessed(page);
+bitmap_page = page;
+bitmap = page_address(page) + (poff * sb-&gt;s_blocksize);
+
+/* init buddy cache */
+block++;
+pnum = block / blocks_per_page;
+poff = block % blocks_per_page;
+page = find_or_create_page(inode-&gt;i_mapping, pnum, GFP_NOFS);
+if (page == bitmap_page) {
+/*
+ * If both the bitmap and buddy are in
+ * the same page we don't need to force
+ * init the buddy
+ */
+unlock_page(page);
+} else if (page) {
+BUG_ON(page-&gt;mapping != inode-&gt;i_mapping);
+ret = ext4_mb_init_cache(page, bitmap);
+if (ret) {
+unlock_page(page);
+goto err;
+}
+unlock_page(page);
+}
+if (page == NULL || !PageUptodate(page)) {
+ret = -EIO;
+goto err;
+}
+mark_page_accessed(page);
+err:
+ext4_mb_put_buddy_cache_lock(sb, group, num_grp_locked);
+if (bitmap_page)
+page_cache_release(bitmap_page);
+if (page)
+page_cache_release(page);
+return ret;
+}
+
 static noinline_for_stack int
 ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 {
@@ -1777,7 +1968,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 group = 0;
 
 /* quick check to skip empty groups */
-grp = ext4_get_group_info(ac-&gt;ac_sb, group);
+grp = ext4_get_group_info(sb, group);
 if (grp-&gt;bb_free == 0)
 continue;
 
@@ -1790,10 +1981,9 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
  * we need full data about the group
  * to make a good selection
  */
-err = ext4_mb_load_buddy(sb, group, &amp;e4b);
+err = ext4_mb_init_group(sb, group);
 if (err)
 goto out;
-ext4_mb_release_desc(&amp;e4b);
 }
 
 /*
@@ -2244,7 +2434,7 @@ ext4_mb_store_history(struct ext4_allocation_context *ac)
 
 
 /* Create and initialize ext4_group_info data for the given group. */
-static int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
+int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
   struct ext4_group_desc *desc)
 {
 int i, len;
@@ -2302,6 +2492,7 @@ static int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
 }
 
 INIT_LIST_HEAD(&amp;meta_group_info[i]-&gt;bb_prealloc_list);
+init_rwsem(&amp;meta_group_info[i]-&gt;alloc_sem);
 meta_group_info[i]-&gt;bb_free_root.rb_node = NULL;;
 
 #ifdef DOUBLE_CHECK
@@ -2329,54 +2520,6 @@ static int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
 } /* ext4_mb_add_groupinfo */
 
 /*
- * Add a group to the existing groups.
- * This function is used for online resize
- */
-int ext4_mb_add_more_groupinfo(struct super_block *sb, ext4_group_t group,
-       struct ext4_group_desc *desc)
-{
-struct ext4_sb_info *sbi = EXT4_SB(sb);
-struct inode *inode = sbi-&gt;s_buddy_cache;
-int blocks_per_page;
-int block;
-int pnum;
-struct page *page;
-int err;
-
-/* Add group based on group descriptor*/
-err = ext4_mb_add_groupinfo(sb, group, desc);
-if (err)
-return err;
-
-/*
- * Cache pages containing dynamic mb_alloc datas (buddy and bitmap
- * datas) are set not up to date so that they will be re-initilaized
- * during the next call to ext4_mb_load_buddy
- */
-
-/* Set buddy page as not up to date */
-blocks_per_page = PAGE_CACHE_SIZE / sb-&gt;s_blocksize;
-block = group * 2;
-pnum = block / blocks_per_page;
-page = find_get_page(inode-&gt;i_mapping, pnum);
-if (page != NULL) {
-ClearPageUptodate(page);
-page_cache_release(page);
-}
-
-/* Set bitmap page as not up to date */
-block++;
-pnum = block / blocks_per_page;
-page = find_get_page(inode-&gt;i_mapping, pnum);
-if (page != NULL) {
-ClearPageUptodate(page);
-page_cache_release(page);
-}
-
-return 0;
-}
-
-/*
  * Update an existing group.
  * This function is used for online resize
  */
@@ -4583,11 +4726,6 @@ void ext4_mb_free_blocks(handle_t *handle, struct inode *inode,
 err = ext4_journal_get_write_access(handle, gd_bh);
 if (err)
 goto error_return;
-
-err = ext4_mb_load_buddy(sb, block_group, &amp;e4b);
-if (err)
-goto error_return;
-
 #ifdef AGGRESSIVE_CHECK
 {
 int i;
@@ -4601,6 +4739,8 @@ void ext4_mb_free_blocks(handle_t *handle, struct inode *inode,
 /* We dirtied the bitmap block */
 BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
 err = ext4_journal_dirty_metadata(handle, bitmap_bh);
+if (err)
+goto error_return;
 
 if (ac) {
 ac-&gt;ac_b_ex.fe_group = block_group;
@@ -4609,6 +4749,9 @@ void ext4_mb_free_blocks(handle_t *handle, struct inode *inode,
 ext4_mb_store_history(ac);
 }
 
+err = ext4_mb_load_buddy(sb, block_group, &amp;e4b);
+if (err)
+goto error_return;
 if (metadata) {
 /* blocks being freed are metadata. these blocks shouldn't
  * be used until this transaction is committed */
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index b5dff1f..a931b6b 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -20,6 +20,7 @@
 #include &lt;linux/version.h&gt;
 #include &lt;linux/blkdev.h&gt;
 #include &lt;linux/marker.h&gt;
+#include &lt;linux/mutex.h&gt;
 #include "ext4_jbd2.h"
 #include "ext4.h"
 #include "group.h"
@@ -130,6 +131,7 @@ struct ext4_group_info {
 #ifdef DOUBLE_CHECK
 void	*bb_bitmap;
 #endif
+struct rw_semaphore alloc_sem;
 unsigned short	bb_counters[];
 };
 
@@ -250,6 +252,7 @@ struct ext4_buddy {
 struct super_block *bd_sb;
 __u16 bd_blkbits;
 ext4_group_t bd_group;
+struct rw_semaphore *alloc_semp;
 };
 #define EXT4_MB_BITMAP(e4b)	((e4b)-&gt;bd_bitmap)
 #define EXT4_MB_BUDDY(e4b)	((e4b)-&gt;bd_buddy)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index db03e05..90a0eaa 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -747,6 +747,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
 struct inode *inode = NULL;
 handle_t *handle;
 int gdb_off, gdb_num;
+int num_grp_locked = 0;
 int err, err2;
 
 gdb_num = input-&gt;group / EXT4_DESC_PER_BLOCK(sb);
@@ -787,6 +788,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
 }
 }
 
+
 if ((err = verify_group_input(sb, input)))
 goto exit_put;
 
@@ -855,6 +857,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
          * using the new disk blocks.
          */
 
+num_grp_locked = ext4_mb_get_buddy_cache_lock(sb, input-&gt;group);
 /* Update group descriptor block for new group */
 gdp = (struct ext4_group_desc *)((char *)primary-&gt;b_data +
  gdb_off * EXT4_DESC_SIZE(sb));
@@ -871,9 +874,11 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
  * We can allocate memory for mb_alloc based on the new group
  * descriptor
  */
-err = ext4_mb_add_more_groupinfo(sb, input-&gt;group, gdp);
-if (err)
+err = ext4_mb_add_groupinfo(sb, input-&gt;group, gdp);
+if (err) {
+ext4_mb_put_buddy_cache_lock(sb, input-&gt;group, num_grp_locked);
 goto exit_journal;
+}
 
 /*
  * Make the new blocks and inodes valid next.  We do this before
@@ -915,6 +920,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
 
 /* Update the global fs size fields */
 sbi-&gt;s_groups_count++;
+ext4_mb_put_buddy_cache_lock(sb, input-&gt;group, num_grp_locked);
 
 ext4_journal_dirty_metadata(handle, primary);
 
@@ -1082,45 +1088,6 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
 if ((err = ext4_journal_stop(handle)))
 goto exit_put;
 
-/*
- * Mark mballoc pages as not up to date so that they will be updated
- * next time they are loaded by ext4_mb_load_buddy.
- *
- * XXX Bad, Bad, BAD!!!  We should not be overloading the
- * Uptodate flag, particularly on thte bitmap bh, as way of
- * hinting to ext4_mb_load_buddy() that it needs to be
- * overloaded.  A user could take a LVM snapshot, then do an
- * on-line fsck, and clear the uptodate flag, and this would
- * not be a bug in userspace, but a bug in the kernel.  FIXME!!!
- */
-{
-struct ext4_sb_info *sbi = EXT4_SB(sb);
-struct inode *inode = sbi-&gt;s_buddy_cache;
-int blocks_per_page;
-int block;
-int pnum;
-struct page *page;
-
-/* Set buddy page as not up to date */
-blocks_per_page = PAGE_CACHE_SIZE / sb-&gt;s_blocksize;
-block = group * 2;
-pnum = block / blocks_per_page;
-page = find_get_page(inode-&gt;i_mapping, pnum);
-if (page != NULL) {
-ClearPageUptodate(page);
-page_cache_release(page);
-}
-
-/* Set bitmap page as not up to date */
-block++;
-pnum = block / blocks_per_page;
-page = find_get_page(inode-&gt;i_mapping, pnum);
-if (page != NULL) {
-ClearPageUptodate(page);
-page_cache_release(page);
-}
-}
</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-01T13:43:29</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10344">
    <title>[PATCH -V2 1/4] ext4: Use new buffer_head flag to check uninit group bitmaps initialization</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10344</link>
<description>For an uninit block group, the on-disk bitmap is not initialized. That implies
we cannot depend on the uptodate flag on the bitmap buffer_head to
determine bitmap validity. Use a new buffer_head flag which is set after
we properly initialize the bitmap. This also prevents re-initializing
the uninit group bitmap every time we call
ext4_read_block_bitmap.
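
The following is a minimal userspace sketch of the check, lock, re-check pattern this patch introduces. It assumes a pthread mutex in place of lock_buffer() and C11 atomics in place of bits in b_state. struct bitmap_buf and read_bitmap() are hypothetical. The memset() calls are placeholders for ext4_init_block_bitmap() and a real disk read. The point is that a separate "valid as a bitmap" flag lets an uninit group be initialized once instead of on every call.

<![CDATA[
```c
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>

/*
 * Hypothetical model of a bitmap buffer_head. "uptodate" means the
 * bytes are valid as a disk block (buffer_uptodate()); "bitmap_uptodate"
 * means they are valid as a bitmap (the new BH_BITMAP_UPTODATE bit).
 */
struct bitmap_buf {
	pthread_mutex_t lock;		/* stands in for lock_buffer() */
	atomic_int uptodate;
	atomic_int bitmap_uptodate;
	int uninit;			/* stands in for EXT4_BG_BLOCK_UNINIT */
	unsigned char data[16];
	int init_calls;			/* counts (re)initializations */
};

/* Return a valid bitmap, building or reading it at most once. */
static unsigned char *read_bitmap(struct bitmap_buf *b)
{
	/* Fast path: already valid as a bitmap, no lock needed. */
	if (atomic_load(&b->bitmap_uptodate))
		return b->data;

	pthread_mutex_lock(&b->lock);
	/* Re-check under the lock: another caller may have finished. */
	if (!atomic_load(&b->bitmap_uptodate)) {
		if (b->uninit) {
			/* Placeholder for ext4_init_block_bitmap(): the
			 * on-disk block is garbage even if "uptodate". */
			memset(b->data, 0, sizeof(b->data));
			b->init_calls++;
			atomic_store(&b->uptodate, 1);
		} else if (!atomic_load(&b->uptodate)) {
			/* Placeholder for a synchronous disk read. */
			memset(b->data, 0xff, sizeof(b->data));
			b->init_calls++;
			atomic_store(&b->uptodate, 1);
		}
		/* Otherwise the block was read earlier and is valid. */
		atomic_store(&b->bitmap_uptodate, 1);
	}
	pthread_mutex_unlock(&b->lock);
	return b->data;
}
```
]]>

The unlocked first test keeps the common case cheap. The second test under the lock is what makes concurrent callers safe.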

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar&lt; at &gt;linux.vnet.ibm.com&gt;
---
 fs/ext4/balloc.c  |   25 +++++++++++++++++++++++--
 fs/ext4/ext4.h    |   18 ++++++++++++++++++
 fs/ext4/ialloc.c  |   24 ++++++++++++++++++++++--
 fs/ext4/mballoc.c |   24 ++++++++++++++++++++++--
 4 files changed, 85 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 3cae4c6..c00023f 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -320,20 +320,41 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
     block_group, bitmap_blk);
 return NULL;
 }
-if (buffer_uptodate(bh) &amp;&amp;
-    !(desc-&gt;bg_flags &amp; cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))
+
+if (bitmap_uptodate(bh))
 return bh;
 
 lock_buffer(bh);
+if (bitmap_uptodate(bh)) {
+unlock_buffer(bh);
+return bh;
+}
 spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
 if (desc-&gt;bg_flags &amp; cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
 ext4_init_block_bitmap(sb, bh, block_group, desc);
+set_bitmap_uptodate(bh);
 set_buffer_uptodate(bh);
 spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
 unlock_buffer(bh);
 return bh;
 }
 spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
+if (buffer_uptodate(bh)) {
+/*
+ * if not uninit and bh is uptodate,
+ * bitmap is also uptodate
+ */
+set_bitmap_uptodate(bh);
+unlock_buffer(bh);
+return bh;
+}
+/*
+ * submit the buffer_head for read. We can
+ * safely mark the bitmap as uptodate now.
+ * We do it here so the bitmap uptodate bit
+ * gets set with buffer lock held.
+ */
+set_bitmap_uptodate(bh);
 if (bh_submit_read(bh) &lt; 0) {
 put_bh(bh);
 ext4_error(sb, __func__,
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index fc4148d..9039b8a 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -19,6 +19,7 @@
 #include &lt;linux/types.h&gt;
 #include &lt;linux/blkdev.h&gt;
 #include &lt;linux/magic.h&gt;
+#include &lt;linux/jbd2.h&gt;
 #include "ext4_i.h"
 
 /*
@@ -1360,6 +1361,23 @@ extern int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode,
 extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 __u64 start, __u64 len);
 
+/*
+ * Add new method to test whether block and inode bitmaps are properly
+ * initialized. With uninit_bg reading the block from disk is not enough
+ * to mark the bitmap uptodate. We also need to zero out the bitmap.
+ */
+#define BH_BITMAP_UPTODATE BH_JBDPrivateStart
+
+static inline int bitmap_uptodate(struct buffer_head *bh)
+{
+return (buffer_uptodate(bh) &amp;&amp;
+test_bit(BH_BITMAP_UPTODATE, &amp;(bh)-&gt;b_state));
+}
+static inline void set_bitmap_uptodate(struct buffer_head *bh)
+{
+set_bit(BH_BITMAP_UPTODATE, &amp;(bh)-&gt;b_state);
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _EXT4_H */
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index d1ccae5..1627868 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -115,20 +115,40 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
     block_group, bitmap_blk);
 return NULL;
 }
-if (buffer_uptodate(bh) &amp;&amp;
-    !(desc-&gt;bg_flags &amp; cpu_to_le16(EXT4_BG_INODE_UNINIT)))
+if (bitmap_uptodate(bh))
 return bh;
 
 lock_buffer(bh);
+if (bitmap_uptodate(bh)) {
+unlock_buffer(bh);
+return bh;
+}
 spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
 if (desc-&gt;bg_flags &amp; cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
 ext4_init_inode_bitmap(sb, bh, block_group, desc);
+set_bitmap_uptodate(bh);
 set_buffer_uptodate(bh);
 spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
 unlock_buffer(bh);
 return bh;
 }
 spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
+if (buffer_uptodate(bh)) {
+/*
+ * if not uninit and bh is uptodate,
+ * bitmap is also uptodate
+ */
+set_bitmap_uptodate(bh);
+unlock_buffer(bh);
+return bh;
+}
+/*
+ * submit the buffer_head for read. We can
+ * safely mark the bitmap as uptodate now.
+ * We do it here so the bitmap uptodate bit
+ * gets set with buffer lock held.
+ */
+set_bitmap_uptodate(bh);
 if (bh_submit_read(bh) &lt; 0) {
 put_bh(bh);
 ext4_error(sb, __func__,
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 259eef7..5d4105d 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -794,22 +794,42 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 if (bh[i] == NULL)
 goto out;
 
-if (buffer_uptodate(bh[i]) &amp;&amp;
-    !(desc-&gt;bg_flags &amp; cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))
+if (bitmap_uptodate(bh[i]))
 continue;
 
 lock_buffer(bh[i]);
+if (bitmap_uptodate(bh[i])) {
+unlock_buffer(bh[i]);
+continue;
+}
 spin_lock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
 if (desc-&gt;bg_flags &amp; cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
 ext4_init_block_bitmap(sb, bh[i],
 first_group + i, desc);
+set_bitmap_uptodate(bh[i]);
 set_buffer_uptodate(bh[i]);
 spin_unlock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
 unlock_buffer(bh[i]);
 continue;
 }
 spin_unlock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
+if (buffer_uptodate(bh[i])) {
+/*
+ * if not uninit and bh is uptodate,
+ * bitmap is also uptodate
+ */
+set_bitmap_uptodate(bh[i]);
+unlock_buffer(bh[i]);
+continue;
+}
 get_bh(bh[i]);
+/*
+ * submit the buffer_head for read. We can
+ * safely mark the bitmap as uptodate now.
+ * We do it here so the bitmap uptodate bit
+ * gets set with buffer lock held.
+ */
+set_bitmap_uptodate(bh[i]);
 bh[i]-&gt;b_end_io = end_buffer_read_sync;
 submit_bh(READ, bh[i]);
 mb_debug("read bitmap for group %u\n", first_group + i);
</description>
    <dc:creator>Aneesh Kumar K.V</dc:creator>
    <dc:date>2008-12-01T13:43:43</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10339">
    <title>[PATCH]ext4: fix s_dirty_blocks_counter if block allocation failed with nodelalloc</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10339</link>
    <description>ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc

From: Akira Fujita &lt;a-fujita&lt; at &gt;rs.jp.nec.com&gt;

If block allocation fails after the claimed blocks have been marked
as dirty blocks with nodelalloc, we have to subtract these blocks from
s_dirtyblocks_counter in the error handling path.
Otherwise s_dirtyblocks_counter becomes wrong and the
filesystem's free block count decreases incorrectly.
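
The following is a simplified model of the accounting being fixed. It assumes that available space is free_blocks minus dirty_blocks. The counter names and new_blocks() are hypothetical stand-ins for the real percpu counters and ext4_mb_new_blocks(). This sketch routes every failure through one error label. That is one way to structure it; the patch below instead repeats the subtraction at each failure site.

<![CDATA[
```c
#include <stdbool.h>

/* Hypothetical counters modeling the superblock's free and dirty
 * (claimed but not yet allocated) block counts. */
static long free_blocks = 100;
static long dirty_blocks;

/*
 * Simplified model of the allocation accounting. Without delalloc,
 * blocks are claimed here; with delalloc (the EXT4_MB_DELALLOC_RESERVED
 * case) the caller claimed them earlier. simulate_failure is a
 * hypothetical switch standing in for -ENOMEM or -ENOSPC.
 */
static int new_blocks(long n, bool delalloc_reserved, bool simulate_failure)
{
	if (!delalloc_reserved)
		dirty_blocks += n;		/* claim */

	if (simulate_failure)
		goto err;

	/* In this model, success turns the claim into used blocks. */
	free_blocks -= n;
	dirty_blocks -= n;
	return 0;

err:
	/* The fix: drop a claim made here, or the counter leaks. */
	if (!delalloc_reserved)
		dirty_blocks -= n;
	return -1;
}
```
]]>

A delalloc reservation is left in place on failure, since it belongs to the caller and is released elsewhere.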

This issue was reported by Li Zefan as an ext4 online defrag bug.
http://marc.info/?l=linux-ext4&amp;m=122697235715170&amp;w=2

Reported-by: Li Zefan &lt;lizf&lt; at &gt;cn.fujitsu.com&gt;
Signed-off-by: Akira Fujita &lt;a-fujita&lt; at &gt;rs.jp.nec.com&gt;
---
 mballoc.c |    9 +++++++++
 1 file changed, 9 insertions(+)
diff -X linux-2.6.28-rc6-ext4/Documentation/dontdiff -upNr linux-2.6.28-rc6-ext4/fs/ext4/mballoc.c linux-2.6.28-rc6-mballoc-fix/fs/ext4/mballoc.c
--- linux-2.6.28-rc6-ext4/fs/ext4/mballoc.c	2008-12-01 11:44:28.000000000 +0900
+++ linux-2.6.28-rc6-mballoc-fix/fs/ext4/mballoc.c	2008-12-01 12:04:06.000000000 +0900
@@ -4495,12 +4495,18 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t
 if (!ac) {
 ar-&gt;len = 0;
 *errp = -ENOMEM;
+if (!(ac-&gt;ac_flags &amp; EXT4_MB_DELALLOC_RESERVED))
+percpu_counter_sub(&amp;sbi-&gt;s_dirtyblocks_counter,
+reserv_blks);
 goto out1;
 }

 *errp = ext4_mb_initialize_context(ac, ar);
 if (*errp) {
 ar-&gt;len = 0;
+if (!(ac-&gt;ac_flags &amp; EXT4_MB_DELALLOC_RESERVED))
+percpu_counter_sub(&amp;sbi-&gt;s_dirtyblocks_counter,
+reserv_blks);
 goto out2;
 }

@@ -4541,6 +4547,9 @@ repeat:
 if (freed)
 goto repeat;
 *errp = -ENOSPC;
+if (!(ac-&gt;ac_flags &amp; EXT4_MB_DELALLOC_RESERVED))
+percpu_counter_sub(&amp;sbi-&gt;s_dirtyblocks_counter,
+reserv_blks);
 ac-&gt;ac_b_ex.fe_len = 0;
 ar-&gt;len = 0;
 ext4_mb_show_ac(ac);
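
The accounting this patch restores can be sketched as a small userspace model. The names model_sb, model_available() and model_new_blocks() are hypothetical. The two counters stand in for s_freeblocks_counter and s_dirtyblocks_counter, and a single fail flag replaces the three error paths (-ENOMEM, context setup failure and -ENOSPC). The sketch assumes, as the diff suggests, that a delalloc reservation stays in place after a failed allocation while a nodelalloc claim must be handed back.

```c
/*
 * Userspace model of the s_dirtyblocks_counter accounting (not kernel
 * code). Struct and function names are hypothetical.
 */
struct model_sb {
	long free_blocks;	/* stands in for s_freeblocks_counter */
	long dirty_blocks;	/* stands in for s_dirtyblocks_counter */
};

/* Blocks a new allocation may still use: free minus claimed. */
long model_available(const struct model_sb *sb)
{
	return sb->free_blocks - sb->dirty_blocks;
}

/* Returns the number of blocks allocated, or 0 on failure. */
long model_new_blocks(struct model_sb *sb, long len, int delalloc_reserved,
		      int fail)
{
	long reserv_blks = 0;

	if (!delalloc_reserved) {
		/* nodelalloc: claim the blocks before allocating */
		sb->dirty_blocks += len;
		reserv_blks = len;
	}

	if (fail) {
		/*
		 * Any error path: hand back the nodelalloc claim. A
		 * delalloc reservation stays, since the data still
		 * needs blocks later.
		 */
		sb->dirty_blocks -= reserv_blks;
		return 0;
	}

	/* Success: the blocks become used and the reservation is released. */
	sb->free_blocks -= len;
	sb->dirty_blocks -= len;
	return len;
}
```

The model decides from the caller's delalloc flag rather than from an allocation context. In the first hunk above, the added test reads ac->ac_flags inside the if (!ac) branch, where ac is NULL, so a check keyed on the request instead of on ac avoids that dereference.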

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo&lt; at &gt;vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

</description>
    <dc:creator>Akira Fujita</dc:creator>
    <dc:date>2008-12-01T10:21:54</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10338">
    <title>kernel BUG at fs/ext4/extents.c:ext4_ext_search_right()</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10338</link>
    <description>My test kernel is: 2.6.27
Hardware Environment: x86 or EM64T

I stressed an ext4 device using the script below. After some time,
the kernel reports an I/O error:

RHEL5 kernel: EXT4-fs error (device hda3): ext4_ext_search_right: bad header in inode #575041: unexpected eh_depth - magic f30a, entries 15, max 84(0), depth 1(2)
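
The fields in this message come from the extent header check. A simplified sketch of that check follows. The check order is assumed to follow __ext4_ext_check_header() in 2.6.27-era kernels, while struct model_extent_header and check_extent_header() are illustrative names, and endianness conversion is omitted. In each pair the first number is the on-disk value and the second the expected one, so "depth 1(2)" means a block expected at depth 2 claims depth 1, and "max 84(0)" shows that the check failed before the maximum was computed.

```c
/*
 * Simplified model of the ext4 extent header check (not kernel code):
 * host-endian fields, and the per-depth maximum is passed in instead of
 * being computed from the inode.
 */
#define EXT4_EXT_MAGIC 0xf30a

struct model_extent_header {
	unsigned short eh_magic;	/* must be EXT4_EXT_MAGIC */
	unsigned short eh_entries;	/* number of valid entries */
	unsigned short eh_max;		/* capacity of this node */
	unsigned short eh_depth;	/* 0 for a leaf */
	unsigned int eh_generation;
};

/*
 * Returns 0 if the header looks sane, otherwise the error text.
 * *max_out is the value printed in parentheses after "max"; it stays 0
 * when a check fails before the maximum is computed.
 */
const char *check_extent_header(const struct model_extent_header *eh,
				int depth, int max_entries, int *max_out)
{
	*max_out = 0;
	if (eh->eh_magic != EXT4_EXT_MAGIC)
		return "invalid magic";
	if (eh->eh_depth != depth)
		return "unexpected eh_depth";
	if (eh->eh_max == 0)
		return "invalid eh_max";
	*max_out = max_entries;
	if (eh->eh_max > max_entries)
		return "too large eh_max";
	if (eh->eh_entries > eh->eh_max)
		return "invalid eh_entries";
	return 0;
}
```

With 1 KiB blocks (mkfs.ext3 -b 1024 in the script below), a tree block holds (1024 - 12) / 12 = 84 entries with integer division, since the header and each entry are 12 bytes. That matches eh_max 84, so only the depth field looks inconsistent.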

If I run the "sync" command at that point, it never completes, and the
kernel logs the following:

INFO: task sync:16199 blocked for more than 120 seconds.
"echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        d5bb2dc0 00000046 00000001 00000000 c04363d5 d5a6a360 d5a6a5b8 c2498080
        00000001 c223c8e0 c223c8d0 00000246 ffffffff 00000046 00000000 00000000
        00000000 f6598f10 00000000 f6598f18 c223c8d0 c048c021 c06544e8 c048c01c
Call Trace:
  [&lt;c04363d5&gt;] prepare_to_wait+0x12/0x42
  [&lt;c048c021&gt;] inode_wait+0x5/0x8
  [&lt;c06544e8&gt;] __wait_on_bit+0x33/0x58
  [&lt;c048c01c&gt;] inode_wait+0x0/0x8
  [&lt;c0494ae6&gt;] __writeback_single_inode+0xd2/0x248
  [&lt;c0494e38&gt;] generic_sync_sb_inodes+0x1d/0x243
  [&lt;c04362fd&gt;] wake_bit_function+0x0/0x3c
  [&lt;c0494f9d&gt;] generic_sync_sb_inodes+0x182/0x243
  [&lt;c04950db&gt;] sync_inodes_sb+0x78/0x80
  [&lt;c0495136&gt;] __sync_inodes+0x53/0x97
  [&lt;c04975bc&gt;] do_sync+0x35/0x55
  [&lt;c04975e6&gt;] sys_sync+0xa/0x10
  [&lt;c0403839&gt;] sysenter_do_call+0x12/0x31
  =======================
INFO: lockdep is turned off.


The script is as follows:
#!/bin/bash

echo "Input the name of the block device (e.g. /dev/sdb1); it will be formatted:"
echo -n "&gt;&gt;&gt; Device name = "
read -r ext4stress_dev

mkfs.ext3 -b 1024 "$ext4stress_dev"
tune2fs -E test_fs -O extents "$ext4stress_dev"
mount -t ext4dev -o nomballoc,delalloc,data=writeback "$ext4stress_dev" /mnt

tar xvzf bonnie++-1.03d.gz
cd bonnie++-1.03d
./configure
make
make install
cd ..
cp -rf bonnie++-1.03d /mnt
cd /mnt/bonnie++-1.03d/
./bonnie++ -u root
cd ..


The bonnie++-1.03d.gz tarball can be downloaded from
http://www.coker.com.au/bonnie++/

</description>
    <dc:creator>Zhang Xiliang</dc:creator>
    <dc:date>2008-12-01T02:10:40</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10333">
    <title>EXT4 ENOSPC Bug</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10333</link>
    <description>Hi Ted, Hi Andreas, hi all,

On a test system (a spare laptop) with ext4 as the root filesystem, the
system sometimes starts returning ENOSPC for all write/create syscalls.
Sometimes it runs without problems; at other times the problems start
soon after boot. A reboot resolves the problem temporarily.

I can't see any specific usage pattern that triggers the problem.

Deleting some files sometimes allows a few files to be created again (just
touch $unused_filename), but not many.

Is there anything I can do to help debug the problem?

Andres

Kernel is: 2.6.28-rc6-andres-00007-ged31348

# df /
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/root_crypt
                     302855628 110032848 192822780  37% /

# df -i /
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/root_crypt
                     19234816  786691 18448125    5% /


# tune2fs -l /dev/mapper/root_crypt 
tune2fs 1.41.3 (12-Oct-2008)
Filesystem volume name:   root
Last mounted on:          &lt;not available&gt;
Filesystem UUID:          b81f5bab-9544-45a9-8f15-aa7012ae2522
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              19234816
Block count:              76921099
Reserved block count:     0
Free blocks:              48205695
Free inodes:              18448125
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1005
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Tue Sep 30 11:57:33 2008
Last mount time:          Thu Nov 27 14:30:58 2008
Last write time:          Thu Nov 27 14:30:58 2008
Mount count:              116
Maximum mount count:      200
Last checked:             Mon Oct  6 13:54:09 2008
Check interval:           31104000 (12 months)
Next check after:         Thu Oct  1 13:54:09 2009
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       288694
Default directory hash:   tea
Directory Hash Seed:      d0501622-2c99-4b0d-9a0c-d6b6705ff4f2
Journal backup:           inode blocks
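
The figures above are consistent with each other, which a little arithmetic confirms. tune2fs counts 4096-byte blocks while df reports 1 KiB blocks, and df's Use% is used / (used + available) rounded up. The helper names below are made up for illustration, and the inputs are copied from the output above.

```c
/*
 * Arithmetic check of the df and tune2fs figures quoted above.
 * The helper names are made up for illustration.
 */

/* Convert block_size-byte blocks to the 1 KiB units df reports.
 * Assumes block_size is a multiple of 1024. */
long long blocks_to_kib(long long blocks, long long block_size)
{
	return blocks * (block_size / 1024);
}

/* df's Use%: used / (used + available), rounded up to a whole percent. */
int df_use_percent(long long used, long long avail)
{
	long long total = used + avail;

	return (int)((used * 100 + total - 1) / total);
}
```

48205695 free blocks times 4 gives 192822780 KiB, exactly df's Available value. The filesystem is about 36% full (shown as 37%), and 5% of inodes are in use, so the ENOSPC errors are not explained by the free block or inode counts reported here.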

</description>
    <dc:creator>Andres Freund</dc:creator>
    <dc:date>2008-11-29T13:18:19</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.file-systems.ext4/10332">
    <title>[RFC][PATCH] "tune2fs -I 256" speedup</title>
    <link>http://comments.gmane.org/gmane.comp.file-systems.ext4/10332</link>
    <description>Hi,

After some on- and off-list discussion about 'tune2fs -I'
performance, Theodore Tso pointed out that the blk_move_list used by
translate_block() was less than optimal.

This is my attempt at fixing that.

Have fun
Andreas
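
One common way to avoid walking the whole move list for every block is to sort the moves once and binary-search them. The sketch below illustrates that idea only. It is not necessarily what the attached patch does, and model_blk_t, struct blk_move_entry and translate_block_sorted() are hypothetical. Like a linear translate_block(), it is assumed to return 0 for a block that was not moved.

```c
/*
 * Sorted-array block translation (hypothetical names, not e2fsprogs
 * code). moves[] must be sorted by old_loc.
 */
typedef unsigned int model_blk_t;	/* stands in for blk_t */

struct blk_move_entry {
	model_blk_t old_loc;
	model_blk_t new_loc;
};

model_blk_t translate_block_sorted(const struct blk_move_entry *moves,
				   unsigned long n, model_blk_t blk)
{
	unsigned long lo = 0, hi = n;	/* search the half-open range [lo, hi) */

	while (hi > lo) {
		unsigned long mid = lo + (hi - lo) / 2;

		if (blk > moves[mid].old_loc)
			lo = mid + 1;
		else if (moves[mid].old_loc > blk)
			hi = mid;
		else
			return moves[mid].new_loc;
	}
	return 0;	/* block was not moved */
}
```

Sorting the array once (for example with qsort() on old_loc) after all moves are recorded makes each lookup O(log n) instead of O(n), which matters when many blocks are translated.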
</description>
    <dc:creator>Andreas Schultz</dc:creator>
    <dc:date>2008-11-29T10:54:19</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.file-systems.ext4">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.file-systems.ext4</link>
  </textinput>
</rdf:RDF>
