<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel about="http://blog.gmane.org/gmane.linux.kernel.device-mapper.devel">
    <title>gmane.linux.kernel.device-mapper.devel</title>
    <link>http://blog.gmane.org/gmane.linux.kernel.device-mapper.devel</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6889"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6887"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6879"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6871"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6870"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6869"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6868"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6867"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6860"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6857"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6855"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6846"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6845"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6843"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6840"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6831"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6830"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6829"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6828"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6827"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6889">
    <title>path_checker within multipath {}</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6889</link>
    <description>Am I missing something obvious here?

/usr/share/doc/device-mapper-multipath-0.4.7/multipath.conf.annotated
says this should work, and I can't find a RHEL 5 issue for it:

My config says:

multipaths {
        multipath {
                wwid "22263000155ada9e9"
                path_grouping_policy failover
                path_checker tur
        }
        multipath {
                wwid "36090a018108277e8ed92e4e9cee1733e"
                path_grouping_policy multibus
                path_checker readsector0
        }
}

but multipathd -k"show conf" says:

multipaths {
        multipath {
                wwid 22263000155ada9e9
                path_grouping_policy failover
        }
        multipath {
                wwid 36090a018108277e8ed92e4e9cee1733e
                path_grouping_policy multibus
        }

and multipath -v3 shows them all

sdb: path checker = readsector0 (config file default)

while the grouping policy seems to work, 

pgpolicy = multibus (LUN setting)

I'm sure I can use 'tur' on both multipaths via the defaults {} section
and be fine, but from the docs (even checked the RHEL 5.3 beta SRPM)
this should work.
</description>
    <dc:creator>Matthew Kent</dc:creator>
    <dc:date>2008-11-27T19:47:54</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6887">
    <title>Announce: unlimited number of shared snapshots</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6887</link>
    <description>Hi

I made the first release of my snapshot store, which can hold an unlimited
number of snapshots and share data between them:

http://people.redhat.com/mpatocka/patches/kernel/new-snapshots/

Usage instructions are at the beginning of dm-multisnapshot.patch

If you see some errors, report them.

This store uses a B-tree keyed by (chunk number, snapshot range). It
uses log-structured storage, so it is safe with respect to crashes.
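
As a rough illustration of what a (chunk number, snapshot range) key
allows, here is a small Python sketch. It is not taken from the patch: a
sorted list stands in for the B-tree, and the class name, the method names
and the "location" value are all hypothetical. The only thing assumed from
the announcement is the shape of the key.

```python
import bisect


class RangeKeyedIndex:
    """Toy stand-in for a B-tree keyed by (chunk number, snapshot range)."""

    def __init__(self):
        self._keys = []    # sorted tuples: (chunk, first_snap, last_snap)
        self._values = []  # parallel list: hypothetical storage location

    def insert(self, chunk, first_snap, last_snap, location):
        key = (chunk, first_snap, last_snap)
        pos = bisect.bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._values.insert(pos, location)

    def lookup(self, chunk, snap):
        # (chunk,) sorts before every (chunk, a, b), so this finds the
        # first entry recorded for the chunk.
        pos = bisect.bisect_left(self._keys, (chunk,))
        for key, location in zip(self._keys[pos:], self._values[pos:]):
            if key[0] != chunk:
                break
            if snap in range(key[1], key[2] + 1):
                return location
        return None


index = RangeKeyedIndex()
index.insert(7, 0, 3, 100)   # chunk 7, shared by snapshots 0..3
index.insert(7, 4, 9, 200)   # chunk 7, shared by snapshots 4..9
print(index.lookup(7, 5))
```

One entry covers a whole range of snapshots, which is how a single stored
copy of a chunk can be shared between them.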

TODO:
- reclaim allocated space, allow deleting snapshots
- writable snapshots

Mikulas

</description>
    <dc:creator>Mikulas Patocka</dc:creator>
    <dc:date>2008-11-27T05:41:36</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6879">
    <title>[PATCH 0/9] dm snapshot: shared exception store (v4)</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6879</link>
    <description>This is the fourth version of the shared exception store (based on
Zumastor code). You can find the previous submissions:

https://www.redhat.com/archives/dm-devel/2008-August/msg00003.html

https://www.redhat.com/archives/dm-devel/2008-August/msg00087.html

https://www.redhat.com/archives/dm-devel/2008-October/msg00159.html

Apart from bug fixes, the major change since the previous version is
a fixed on-disk endianness (Zumastor simply uses the CPU endianness for
the disk format). The current code uses little endian for the disk format
(I have only lightly tested it on ppc64; it needs more testing, of
course). I also modified the snapshot disk initialization code to
reserve chunks for the journaling mechanism. I have not implemented the
journaling mechanism yet, but the disk format will not need to change
for it later.

There are 9 patches, and I modified only the 7th patch (Zumastor shared
snapshot code) since the last submission. IOW, I have not changed the
way the existing code is refactored. I'll think again about how to
integrate this with the existing code more cleanly, but I think that the
user interface is far more important than how this is integrated with
the existing in-kernel code. We don't want to change the
user interface after merging into mainline, while we can evolve (or clean
up) the in-kernel structures any time (and it's not hard at all).

For example, does the following way of handling snapshots look OK?

- create one snapshot (the id is 0):

vine:/home/fujita# dmsetup message work 0 snapshot create 0

- get the list of snapshots:

vine:/home/fujita# dmsetup status
work: 0 268430022 s-snap-origin 2 0 5

(this means we have two snapshots, their IDs are 0 and 5).
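
To make the format concrete, a status line of this shape could be parsed
as in the following Python sketch. The helper name
parse_snap_origin_status is hypothetical and is not part of dmsetup or of
the patches; it assumes that the fields after the target name are a count
followed by that many snapshot IDs, as in the example above.

```python
def parse_snap_origin_status(line):
    """Return (device name, list of snapshot IDs) for one status line."""
    name, rest = line.split(":", 1)
    fields = rest.split()
    target = fields[2]  # fields[0] and fields[1] are start and length
    if target != "s-snap-origin":
        raise ValueError("not an s-snap-origin line: " + line)
    params = fields[3:]
    # A line such as "... s-snap-origin : no snapshot" carries no count.
    if not params or not params[0].isdigit():
        return name.strip(), []
    count = int(params[0])
    ids = [int(p) for p in params[1:]]
    if len(ids) != count:
        raise ValueError("snapshot count does not match the ID list")
    return name.strip(), ids


print(parse_snap_origin_status("work: 0 268430022 s-snap-origin 2 0 5"))
```

Because every ID is listed on one line, the cost of producing and parsing
the status grows with the number of snapshots, which is the concern raised
below.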

With the Zumastor code, the maximum number of snapshots is 64, so the above
interface for getting the list would work, but it might not work well for
a shared-exception-snapshot implementation that supports tons of
snapshots (which someone might implement in the future).

Any opinions?


The patches are available (against the latest Linus git tree):

http://www.kernel.org/pub/linux/kernel/people/tomo/dm-snap/2008-11-26/

If you like a git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git dm-snap


The following instructions are the same as in the previous submission.

=
In this example, /dev/sda1 is used as an origin device and
/dev/sdb1 is used as a cow device.

- first, set up an origin and a cow:

vine:/home/fujita# echo |dmsetup create --notable work
vine:/home/fujita# echo 0 `blockdev --getsize /dev/sda1` s-snap-origin /dev/sda1 /dev/sdb1 S 16|dmsetup load work
vine:/home/fujita# dmsetup resume work


- confirm the current status:

vine:/home/fujita# dmsetup status
work: 0 268430022 s-snap-origin : no snapshot


- create one snapshot (the id is 0):

vine:/home/fujita# dmsetup message work 0 snapshot create 0


- see if the snapshot is created:

vine:/home/fujita# dmsetup status
work: 0 268430022 s-snap-origin 1 0


- create one more snapshot (the id is 5) and see the result:

vine:/home/fujita# dmsetup message work 0 snapshot create 5
vine:/home/fujita# dmsetup status
work: 0 268430022 s-snap-origin 2 0 5

Note that creating a snapshot just means that you store some
information on the cow device. You don't create a device for it yet.


- create a device to access snapshot #5:

vine:/home/fujita# echo |dmsetup create --notable snap5
vine:/home/fujita# echo 0 `blockdev --getsize /dev/sda1` s-snap /dev/sda1 5|dmsetup load snap5
vine:/home/fujita# dmsetup resume snap5

- confirm the current status:

vine:/home/fujita# dmsetup status
snap5: 0 268430022 s-snap Unknown
work: 0 268430022 s-snap-origin 2 0 5


- remove the devices after you play with them:

vine:/home/fujita# dmsetup remove snap5
vine:/home/fujita# dmsetup remove work
vine:/home/fujita# dmsetup info
No devices found


- again, set up the origin and the cow:

vine:/home/fujita# echo |dmsetup create --notable work
vine:/home/fujita# echo 0 `blockdev --getsize /dev/sda1` s-snap-origin /dev/sda1 /dev/sdb1 S 16|dmsetup load work
vine:/home/fujita# dmsetup resume work
vine:/home/fujita# dmsetup status
work: 0 268430022 s-snap-origin 2 0 5

Note that you have two snapshots that you created. You can play with
them again after creating devices for them.


- delete snapshot #5:

vine:/home/fujita# dmsetup message work 0 snapshot delete 5
vine:/home/fujita# dmsetup status
work: 0 268430022 s-snap-origin 1 0


*) By default, dmsetup performs multiple tasks per command. I use dmsetup
commands that each issue a single ioctl to make things clear.



</description>
    <dc:creator>FUJITA Tomonori</dc:creator>
    <dc:date>2008-11-26T16:17:30</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6871">
    <title>[RFC][PATCH 1/4] dm-log: fix io_client_destroy leak</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6871</link>
    <description>In create_log_context function, dm_io_client_destroy function needs
to be called when memory allocation of disk_header, sync_bits, or
recovering_bits fails, but currently it is not called.
This patch fixes the leak.


Signed-off-by: Takahiro Yasui &lt;tyasui&lt; at &gt;redhat.com&gt;
---
 drivers/md/dm-log.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: linux-2.6.28-rc4/drivers/md/dm-log.c
===================================================================
--- linux-2.6.28-rc4.orig/drivers/md/dm-log.c
+++ linux-2.6.28-rc4/drivers/md/dm-log.c
@@ -467,6 +467,7 @@ static int create_log_context(struct dm_
 lc-&gt;disk_header = vmalloc(buf_size);
 if (!lc-&gt;disk_header) {
 DMWARN("couldn't allocate disk log buffer");
+dm_io_client_destroy(lc-&gt;io_req.client);
 kfree(lc);
 return -ENOMEM;
 }
@@ -483,6 +484,8 @@ static int create_log_context(struct dm_
 if (!dev)
 vfree(lc-&gt;clean_bits);
 vfree(lc-&gt;disk_header);
+if (dev)
+dm_io_client_destroy(lc-&gt;io_req.client);
 kfree(lc);
 return -ENOMEM;
 }
@@ -496,6 +499,8 @@ static int create_log_context(struct dm_
 if (!dev)
 vfree(lc-&gt;clean_bits);
 vfree(lc-&gt;disk_header);
+if (dev)
+dm_io_client_destroy(lc-&gt;io_req.client);
 kfree(lc);
 return -ENOMEM;
 }

</description>
    <dc:creator>Takahiro Yasui</dc:creator>
    <dc:date>2008-11-26T00:01:37</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6870">
    <title>[RFC][PATCH 3/4] dm-log: support multiple log devices</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6870</link>
    <description>This patch introduces multiple log devices feature.

 * A "log" member is added to the log context to keep each device's
   status and device info.

 * read_headers reads log data from each log device and checks that
   they contain the same header values.

 * write_headers issues write I/O to all active log devices at the same
   time and checks each result when all I/Os complete.

 * parse_params is added to parse the region size from the parameter list.
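
The consistency rule applied by read_headers can be illustrated outside
the kernel with a small Python sketch. The function name
agreed_region_count is hypothetical; the sketch assumes, as the patch
does, that a header with nr_regions == 0 belongs to a fresh log and is
skipped when the headers are compared.

```python
def agreed_region_count(region_counts):
    """region_counts holds nr_regions from each readable log header."""
    # Fresh logs report 0 regions and do not take part in the comparison.
    nonzero = sorted({n for n in region_counts if n != 0})
    if not nonzero:
        return 0
    if len(nonzero) != 1:
        # The patch returns -EINVAL and fails all devices in this case.
        raise ValueError("log headers disagree: " + str(nonzero))
    return nonzero[0]


print(agreed_region_count([0, 4096, 4096]))
```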


Signed-off-by: Takahiro Yasui &lt;tyasui&lt; at &gt;redhat.com&gt;
---
 drivers/md/dm-log.c |  579 +++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 416 insertions(+), 163 deletions(-)

Index: linux-2.6.28-rc4/drivers/md/dm-log.c
===================================================================
--- linux-2.6.28-rc4.orig/drivers/md/dm-log.c
+++ linux-2.6.28-rc4/drivers/md/dm-log.c
@@ -249,6 +249,15 @@ struct log_header {
 sector_t nr_regions;
 };
 
+struct log {
+struct log_c *lc;
+int failed;
+
+struct dm_dev *dev;
+struct log_header header;
+struct dm_io_region header_location;
+};
+
 struct log_c {
 struct dm_target *ti;
 int touched;
@@ -270,17 +279,19 @@ struct log_c {
 FORCESYNC,/* Force a sync to happen */
 } sync;
 
-struct dm_io_request io_req;
-
 /*
  * Disk log fields
  */
-int log_dev_failed;
-struct dm_dev *log_dev;
-struct log_header header;
+unsigned int nr_logs;
 
-struct dm_io_region header_location;
 struct log_header *disk_header;
+struct dm_io_request io_req;
+
+unsigned int nr_active_logs;
+struct dm_io_region *io_regions;
+struct log **io_logs;/* index log array of io_regions */
+
+struct log log[0];
 };
 
 /*
@@ -323,72 +334,236 @@ static void header_from_disk(struct log_
 core-&gt;nr_regions = le64_to_cpu(disk-&gt;nr_regions);
 }
 
-static int read_header(struct log_c *log)
+static void update_io_regions(struct log_c *lc)
+{
+struct log *l;
+int count = 0;
+
+for (l = lc-&gt;log; l &lt; lc-&gt;log + lc-&gt;nr_logs; l++) {
+if (l-&gt;failed)
+continue;
+
+lc-&gt;io_regions[count] = l-&gt;header_location;
+lc-&gt;io_logs[count] = l;
+count++;
+}
+
+lc-&gt;nr_active_logs = count;
+}
+
+static void fail_log_device(struct log *l)
+{
+if (l-&gt;failed)
+return;
+
+l-&gt;failed = 1;
+dm_table_event(l-&gt;lc-&gt;ti-&gt;table);
+}
+
+static void fail_all_devices(struct log_c *lc)
 {
+struct log *l;
+
+for (l = lc-&gt;log; l &lt; lc-&gt;log + lc-&gt;nr_logs; l++)
+l-&gt;failed = 1;
+
+lc-&gt;nr_active_logs = 0;
+dm_table_event(lc-&gt;ti-&gt;table);
+}
+
+static int read_header(struct log *l)
+{
+struct log_c *lc = l-&gt;lc;
 int r;
 
-log-&gt;io_req.bi_rw = READ;
+lc-&gt;io_req.bi_rw = READ;
 
-r = dm_io(&amp;log-&gt;io_req, 1, &amp;log-&gt;header_location, NULL);
-if (r)
+r = dm_io(&amp;lc-&gt;io_req, 1, &amp;l-&gt;header_location, NULL);
+if (r) {
+DMWARN("Failed to read header on dirty "
+       "region log device, %s", l-&gt;dev-&gt;name);
+fail_log_device(l);
 return r;
+}
 
-header_from_disk(&amp;log-&gt;header, log-&gt;disk_header);
+header_from_disk(&amp;l-&gt;header, lc-&gt;disk_header);
 
 /* New log required? */
-if (log-&gt;sync != DEFAULTSYNC || log-&gt;header.magic != MIRROR_MAGIC) {
-log-&gt;header.magic = MIRROR_MAGIC;
-log-&gt;header.version = MIRROR_DISK_VERSION;
-log-&gt;header.nr_regions = 0;
+if (lc-&gt;sync != DEFAULTSYNC || l-&gt;header.magic != MIRROR_MAGIC) {
+l-&gt;header.magic = MIRROR_MAGIC;
+l-&gt;header.version = MIRROR_DISK_VERSION;
+l-&gt;header.nr_regions = 0;
 }
 
 #ifdef __LITTLE_ENDIAN
-if (log-&gt;header.version == 1)
-log-&gt;header.version = 2;
+if (l-&gt;header.version == 1)
+l-&gt;header.version = 2;
 #endif
 
-if (log-&gt;header.version != MIRROR_DISK_VERSION) {
+if (l-&gt;header.version != MIRROR_DISK_VERSION) {
 DMWARN("incompatible disk log version");
+fail_log_device(l);
 return -EINVAL;
 }
 
 return 0;
 }
 
-static inline int write_header(struct log_c *log)
+/*
+ * read_headers
+ *
+ *   Issue read I/Os sequentially and check their contents.
+ *
+ *   return value:
+ *     nr_regions ... the number of region stored on log disks
+ *     -EIO       ... all read I/Os failed and no active log exists
+ *     -EINVAL    ... header data are not consistent among logs
+ */
+static int read_headers(struct log_c *lc)
 {
-log-&gt;io_req.bi_rw = WRITE;
-return dm_io(&amp;log-&gt;io_req, 1, &amp;log-&gt;header_location, NULL);
+struct log *l;
+sector_t nr_regions = 0;
+int active_logs = 0;
+
+/*
+ * read all log headers
+ *
+ * Read should be done sequentially, since one buffer is
+ * shared by all logs.
+ */
+for (l = lc-&gt;log; l &lt; lc-&gt;log + lc-&gt;nr_logs; l++)
+if (!l-&gt;failed &amp;&amp; !read_header(l))
+active_logs++;
+
+if (!active_logs) {
+DMWARN("All read I/Os to log disks failed.\n");
+fail_all_devices(lc);
+return -EIO;
+}
+
+/*
+ * check consistency of log headers
+ */
+for (l = lc-&gt;log; l &lt; lc-&gt;log + lc-&gt;nr_logs; l++) {
+if (l-&gt;failed || !l-&gt;header.nr_regions)
+continue;
+
+if (!nr_regions) {
+nr_regions = l-&gt;header.nr_regions;
+continue;
+}
+
+if (l-&gt;header.nr_regions != nr_regions) {
+DMWARN("log %s has inconsistent region counts %ld"
+       " (expected %ld)", l-&gt;dev-&gt;name,
+       l-&gt;header.nr_regions, nr_regions);
+fail_all_devices(lc);
+return -EINVAL;
+}
+}
+
+/*
+ * Refresh log contents, since current data might contain
+ * data on a new log disk which does not have valid log data.
+ */
+if (active_logs &gt; 1) {
+for (l = lc-&gt;log; l &lt; lc-&gt;log + lc-&gt;nr_logs; l++) {
+if (l-&gt;failed || !l-&gt;header.nr_regions)
+continue;
+if (!read_header(l))
+break;
+}
+
+if (unlikely(l == lc-&gt;log + lc-&gt;nr_logs))
+nr_regions = 0;
+
+/* initialize new log headers */
+for (l = lc-&gt;log; l &lt; lc-&gt;log + lc-&gt;nr_logs; l++)
+if (!l-&gt;failed)
+l-&gt;header.nr_regions = nr_regions;
+}
+
+update_io_regions(lc);
+
+return nr_regions;
 }
 
-/*----------------------------------------------------------------
- * core log constructor/destructor
+/*
+ * write_headers
  *
- * argv contains region_size followed optionally by [no]sync
- *--------------------------------------------------------------*/
+ *   Issue write I/Os to all active logs. Return 0 if at least
+ *   one log completed its I/O; otherwise (no active logs remain)
+ *   return the value returned by the dm_io function.
+ */
+static int write_headers(struct log_c *lc)
+{
+unsigned long error;
+int i, r;
+
+lc-&gt;io_req.bi_rw = WRITE;
+
+r = dm_io(&amp;lc-&gt;io_req, lc-&gt;nr_active_logs, lc-&gt;io_regions,
+  &amp;error);
+if (r) {
+/* check error devices and disable them */
+for (i = 0; i &lt; lc-&gt;nr_active_logs; i++)
+if (test_bit(i, &amp;error))
+fail_log_device(lc-&gt;io_logs[i]);
+
+update_io_regions(lc);
+
+if (!lc-&gt;nr_active_logs)
+return r;
+}
+
+return 0;
+}
+
 #define BYTE_SHIFT 3
-static int create_log_context(struct dm_dirty_log *log, struct dm_target *ti,
-      unsigned int argc, char **argv,
-      struct dm_dev *dev)
+static inline size_t log_bitset_size(struct log_c *lc)
 {
-enum sync sync = DEFAULTSYNC;
+return dm_round_up(lc-&gt;region_count,
+   sizeof(*lc-&gt;clean_bits) &lt;&lt; BYTE_SHIFT)
+&gt;&gt; BYTE_SHIFT;
+}
 
-struct log_c *lc;
-uint32_t region_size;
-unsigned int region_count;
-size_t bitset_size, buf_size;
-int r;
+static size_t log_buffer_size(struct log_c *lc)
+{
+/* Buffer holds both header and bitset. */
+return dm_round_up((LOG_OFFSET &lt;&lt; SECTOR_SHIFT) +
+   log_bitset_size(lc),
+   lc-&gt;ti-&gt;limits.hardsect_size);
+}
 
+static int parse_params(unsigned int argc, char **argv,
+uint32_t *region_size, enum sync *sync)
+{
+/*
+ * check number of parameters
+ */
 if (argc &lt; 1 || argc &gt; 2) {
 DMWARN("wrong number of arguments to dirty region log");
 return -EINVAL;
 }
 
+/*
+ * get region size
+ */
+if (sscanf(argv[0], "%u", region_size) != 1) {
+DMWARN("invalid region size string to dirty region log");
+return -EINVAL;
+}
+
+/*
+ * get sync option
+ */
+*sync = DEFAULTSYNC;
+
 if (argc &gt; 1) {
 if (!strcmp(argv[1], "sync"))
-sync = FORCESYNC;
+*sync = FORCESYNC;
 else if (!strcmp(argv[1], "nosync"))
-sync = NOSYNC;
+*sync = NOSYNC;
 else {
 DMWARN("unrecognised sync argument to "
        "dirty region log: %s", argv[1]);
@@ -396,113 +571,180 @@ static int create_log_context(struct dm_
 }
 }
 
-if (sscanf(argv[0], "%u", &amp;region_size) != 1) {
-DMWARN("invalid region size string");
-return -EINVAL;
-}
+return 0;
+}
 
-region_count = dm_sector_div_up(ti-&gt;len, region_size);
+static struct log_c *create_log_context(struct dm_target *ti,
+unsigned int nr_logs,
+uint32_t region_size,
+enum sync sync)
+{
+struct log_c *lc;
+size_t len;
+
+len = sizeof(*lc) + sizeof(lc-&gt;log[0]) * nr_logs;
 
-lc = kmalloc(sizeof(*lc), GFP_KERNEL);
+lc = kzalloc(len, GFP_KERNEL);
 if (!lc) {
-DMWARN("couldn't allocate core log");
-return -ENOMEM;
+DMWARN("couldn't allocate log context");
+return NULL;
 }
 
 lc-&gt;ti = ti;
 lc-&gt;touched = 0;
 lc-&gt;region_size = region_size;
-lc-&gt;region_count = region_count;
+lc-&gt;region_count = dm_sector_div_up(ti-&gt;len, region_size);
+
 lc-&gt;sync = sync;
+lc-&gt;nr_logs = nr_logs;
+
+return lc;
+}
+
+static void destroy_log_context(struct log_c *lc)
+{
+vfree(lc-&gt;recovering_bits);
+vfree(lc-&gt;sync_bits);
+vfree(lc-&gt;clean_bits);
+kfree(lc);
+}
+
+static void destroy_log_devices(struct log_c *lc)
+{
+struct log *l;
+
+kfree(lc-&gt;io_logs);
+kfree(lc-&gt;io_regions);
+
+if (lc-&gt;io_req.client)
+dm_io_client_destroy(lc-&gt;io_req.client);
+
+vfree(lc-&gt;disk_header);
+lc-&gt;clean_bits = NULL;
+
+for (l = lc-&gt;log; l &lt; lc-&gt;log + lc-&gt;nr_logs; l++)
+dm_put_device(l-&gt;lc-&gt;ti, l-&gt;dev);
+}
+
+static int create_log_devices(struct log_c *lc, char **dev)
+{
+struct log *l;
+size_t buf_size = 0;
+int r;
 
 /*
- * Work out how many "unsigned long"s we need to hold the bitset.
+ * setup each log device
+ */
+for (l = lc-&gt;log; l &lt; lc-&gt;log + lc-&gt;nr_logs; l++, dev++) {
+r = dm_get_device(lc-&gt;ti, dev[0], 0, 0 /* FIXME */,
+  FMODE_READ | FMODE_WRITE, &amp;l-&gt;dev);
+if (r) {
+lc-&gt;ti-&gt;error = "Device lookup failure";
+
+while (--l &gt;= lc-&gt;log)
+dm_put_device(l-&gt;lc-&gt;ti, l-&gt;dev);
+
+return r;
+}
+
+if (!buf_size)
+buf_size = log_buffer_size(lc);
+
+l-&gt;lc = lc;
+l-&gt;failed = 0;
+
+l-&gt;header.magic = 0;
+l-&gt;header.version = 0;
+l-&gt;header.nr_regions = 0;
+
+l-&gt;header_location.bdev = l-&gt;dev-&gt;bdev;
+l-&gt;header_location.sector = 0;
+l-&gt;header_location.count = buf_size &gt;&gt; SECTOR_SHIFT;
+}
+
+/*
+ * setup common info
  */
-bitset_size = dm_round_up(region_count,
-  sizeof(*lc-&gt;clean_bits) &lt;&lt; BYTE_SHIFT);
-bitset_size &gt;&gt;= BYTE_SHIFT;
+lc-&gt;nr_active_logs = lc-&gt;nr_logs;
+
+lc-&gt;disk_header = vmalloc(buf_size);
+if (!lc-&gt;disk_header) {
+DMWARN("couldn't allocate disk log buffer");
+destroy_log_devices(lc);
+return -ENOMEM;
+}
+
+lc-&gt;io_req.mem.type = DM_IO_VMA;
+lc-&gt;io_req.mem.ptr.vma = lc-&gt;disk_header;
+lc-&gt;io_req.notify.fn = NULL;
+lc-&gt;io_req.client = dm_io_client_create(dm_div_up(buf_size,
+  PAGE_SIZE));
+if (IS_ERR(lc-&gt;io_req.client)) {
+r = PTR_ERR(lc-&gt;io_req.client);
+DMWARN("couldn't allocate disk io client");
+destroy_log_devices(lc);
+return -ENOMEM;
+}
+
+lc-&gt;io_regions = kmalloc(sizeof(*lc-&gt;io_regions) * lc-&gt;nr_logs,
+ GFP_KERNEL);
+if (!lc-&gt;io_regions) {
+DMWARN("couldn't allocate I/O regions");
+destroy_log_devices(lc);
+return -ENOMEM;
+}
 
+lc-&gt;io_logs = kmalloc(sizeof(*lc-&gt;io_logs) * lc-&gt;nr_logs,
+      GFP_KERNEL);
+if (!lc-&gt;io_logs) {
+DMWARN("couldn't allocate I/O region index log array");
+destroy_log_devices(lc);
+return -ENOMEM;
+}
+
+return 0;
+}
+
+static int setup_log_bitmaps(struct log_c *lc)
+{
+size_t bitset_size;
+
+/*
+ * Work out how many "unsigned long"s we need to hold the bitset.
+ */
+bitset_size = log_bitset_size(lc);
 lc-&gt;bitset_uint32_count = bitset_size / sizeof(*lc-&gt;clean_bits);
 
 /*
  * Disk log?
  */
-if (!dev) {
+if (!lc-&gt;nr_logs) {
 lc-&gt;clean_bits = vmalloc(bitset_size);
 if (!lc-&gt;clean_bits) {
 DMWARN("couldn't allocate clean bitset");
-kfree(lc);
-return -ENOMEM;
-}
-lc-&gt;disk_header = NULL;
-} else {
-lc-&gt;log_dev = dev;
-lc-&gt;log_dev_failed = 0;
-lc-&gt;header_location.bdev = lc-&gt;log_dev-&gt;bdev;
-lc-&gt;header_location.sector = 0;
-
-/*
- * Buffer holds both header and bitset.
- */
-buf_size = dm_round_up((LOG_OFFSET &lt;&lt; SECTOR_SHIFT) +
-       bitset_size, ti-&gt;limits.hardsect_size);
-lc-&gt;header_location.count = buf_size &gt;&gt; SECTOR_SHIFT;
-
-lc-&gt;io_req.mem.type = DM_IO_VMA;
-lc-&gt;io_req.mem.ptr.vma = lc-&gt;disk_header;
-lc-&gt;io_req.notify.fn = NULL;
-lc-&gt;io_req.client = dm_io_client_create(dm_div_up(buf_size,
-   PAGE_SIZE));
-if (IS_ERR(lc-&gt;io_req.client)) {
-r = PTR_ERR(lc-&gt;io_req.client);
-DMWARN("couldn't allocate disk io client");
-kfree(lc);
 return -ENOMEM;
 }
-
-lc-&gt;disk_header = vmalloc(buf_size);
-if (!lc-&gt;disk_header) {
-DMWARN("couldn't allocate disk log buffer");
-dm_io_client_destroy(lc-&gt;io_req.client);
-kfree(lc);
-return -ENOMEM;
-}
-
+} else
 lc-&gt;clean_bits = (void *)lc-&gt;disk_header +
  (LOG_OFFSET &lt;&lt; SECTOR_SHIFT);
-}
 
 memset(lc-&gt;clean_bits, -1, bitset_size);
 
 lc-&gt;sync_bits = vmalloc(bitset_size);
 if (!lc-&gt;sync_bits) {
 DMWARN("couldn't allocate sync bitset");
-if (!dev)
-vfree(lc-&gt;clean_bits);
-vfree(lc-&gt;disk_header);
-if (dev)
-dm_io_client_destroy(lc-&gt;io_req.client);
-kfree(lc);
 return -ENOMEM;
 }
-memset(lc-&gt;sync_bits, (sync == NOSYNC) ? -1 : 0, bitset_size);
-lc-&gt;sync_count = (sync == NOSYNC) ? region_count : 0;
+memset(lc-&gt;sync_bits, (lc-&gt;sync == NOSYNC) ? -1 : 0, bitset_size);
+lc-&gt;sync_count = (lc-&gt;sync == NOSYNC) ? lc-&gt;region_count : 0;
 
 lc-&gt;recovering_bits = vmalloc(bitset_size);
 if (!lc-&gt;recovering_bits) {
 DMWARN("couldn't allocate sync bitset");
-vfree(lc-&gt;sync_bits);
-if (!dev)
-vfree(lc-&gt;clean_bits);
-vfree(lc-&gt;disk_header);
-if (dev)
-dm_io_client_destroy(lc-&gt;io_req.client);
-kfree(lc);
 return -ENOMEM;
 }
 memset(lc-&gt;recovering_bits, 0, bitset_size);
 lc-&gt;sync_search = 0;
-log-&gt;context = lc;
 
 return 0;
 }
@@ -510,51 +752,78 @@ static int create_log_context(struct dm_
 static int core_ctr(struct dm_dirty_log *log, struct dm_target *ti,
     unsigned int argc, char **argv)
 {
-return create_log_context(log, ti, argc, argv, NULL);
-}
+struct log_c *lc;
+uint32_t region_size;
+enum sync sync;
+int r;
 
-static void destroy_log_context(struct log_c *lc)
-{
-vfree(lc-&gt;sync_bits);
-vfree(lc-&gt;recovering_bits);
-kfree(lc);
+r = parse_params(argc, argv, &amp;region_size, &amp;sync);
+if (r)
+return r;
+
+lc = create_log_context(ti, 0, region_size, sync);
+if (!lc)
+return -ENOMEM;
+
+r = setup_log_bitmaps(lc);
+if (r) {
+destroy_log_context(lc);
+return r;
+}
+
+log-&gt;context = lc;
+
+return 0;
 }
 
 static void core_dtr(struct dm_dirty_log *log)
 {
 struct log_c *lc = (struct log_c *) log-&gt;context;
 
-vfree(lc-&gt;clean_bits);
 destroy_log_context(lc);
 }
 
 /*----------------------------------------------------------------
- * disk log constructor/destructor
+ * disks log constructor/destructor
  *
  * argv contains log_device region_size followed optionally by [no]sync
  *--------------------------------------------------------------*/
 static int disk_ctr(struct dm_dirty_log *log, struct dm_target *ti,
     unsigned int argc, char **argv)
 {
+struct log_c *lc;
+uint32_t region_size;
+enum sync sync;
 int r;
-struct dm_dev *dev;
 
-if (argc &lt; 2 || argc &gt; 3) {
-DMWARN("wrong number of arguments to disk dirty region log");
+if (!argc) {
+DMWARN("wrong number of arguments to dirty region log");
 return -EINVAL;
 }
 
-r = dm_get_device(ti, argv[0], 0, 0 /* FIXME */,
-  FMODE_READ | FMODE_WRITE, &amp;dev);
+r = parse_params(argc-1, argv+1, &amp;region_size, &amp;sync);
 if (r)
 return r;
 
-r = create_log_context(log, ti, argc - 1, argv + 1, dev);
+lc = create_log_context(ti, 1, region_size, sync);
+if (!lc)
+return -ENOMEM;
+
+r = create_log_devices(lc, argv);
+if (r) {
+destroy_log_context(lc);
+return r;
+}
+
+r = setup_log_bitmaps(lc);
 if (r) {
-dm_put_device(ti, dev);
+destroy_log_devices(lc);
+destroy_log_context(lc);
 return r;
 }
 
+log-&gt;context = lc;
+
 return 0;
 }
 
@@ -562,9 +831,7 @@ static void disk_dtr(struct dm_dirty_log
 {
 struct log_c *lc = (struct log_c *) log-&gt;context;
 
-dm_put_device(lc-&gt;ti, lc-&gt;log_dev);
-vfree(lc-&gt;disk_header);
-dm_io_client_destroy(lc-&gt;io_req.client);
+destroy_log_devices(lc);
 destroy_log_context(lc);
 }
 
@@ -578,45 +845,31 @@ static int count_bits32(uint32_t *addr, 
 return count;
 }
 
-static void fail_log_device(struct log_c *lc)
-{
-if (lc-&gt;log_dev_failed)
-return;
-
-lc-&gt;log_dev_failed = 1;
-dm_table_event(lc-&gt;ti-&gt;table);
-}
-
 static int disk_resume(struct dm_dirty_log *log)
 {
-int r;
+int r = 0;
 unsigned i;
 struct log_c *lc = (struct log_c *) log-&gt;context;
+struct log *l;
 size_t size = lc-&gt;bitset_uint32_count * sizeof(uint32_t);
+unsigned int nr_regions = 0;
 
-/* read the disk header */
-r = read_header(lc);
-if (r) {
-DMWARN("%s: Failed to read header on dirty region log device",
-       lc-&gt;log_dev-&gt;name);
-fail_log_device(lc);
-/*
- * If the log device cannot be read, we must assume
- * all regions are out-of-sync.  If we simply return
- * here, the state will be uninitialized and could
- * lead us to return 'in-sync' status for regions
- * that are actually 'out-of-sync'.
- */
-lc-&gt;header.nr_regions = 0;
+if (lc-&gt;nr_active_logs) {
+r = read_headers(lc);
+if (r &lt; 0) {
+DMWARN("Failed to read dirty region log");
+nr_regions = 0;
+} else
+nr_regions = r;
 }
 
 /* set or clear any new bits -- device has grown */
 if (lc-&gt;sync == NOSYNC)
-for (i = lc-&gt;header.nr_regions; i &lt; lc-&gt;region_count; i++)
+for (i = nr_regions; i &lt; lc-&gt;region_count; i++)
 /* FIXME: amazingly inefficient */
 log_set_bit(lc, lc-&gt;clean_bits, i);
 else
-for (i = lc-&gt;header.nr_regions; i &lt; lc-&gt;region_count; i++)
+for (i = nr_regions; i &lt; lc-&gt;region_count; i++)
 /* FIXME: amazingly inefficient */
 log_clear_bit(lc, lc-&gt;clean_bits, i);
 
@@ -630,17 +883,17 @@ static int disk_resume(struct dm_dirty_l
 lc-&gt;sync_search = 0;
 
 /* set the correct number of regions in the header */
-lc-&gt;header.nr_regions = lc-&gt;region_count;
-
-/* update disk headers */
-header_to_disk(&amp;lc-&gt;header, lc-&gt;disk_header);
+for (l = lc-&gt;log; l &lt; lc-&gt;log + lc-&gt;nr_logs; l++)
+l-&gt;header.nr_regions = lc-&gt;region_count;
 
-/* write the new header */
-r = write_header(lc);
-if (r) {
-DMWARN("%s: Failed to write header on dirty region log device",
-       lc-&gt;log_dev-&gt;name);
-fail_log_device(lc);
+if (lc-&gt;nr_active_logs) {
+/* update disk headers */
+header_to_disk(&amp;lc-&gt;io_logs[0]-&gt;header, lc-&gt;disk_header);
+
+/* write the new header */
+r = write_headers(lc);
+if (r)
+DMWARN("Failed to write dirty region log");
 }
 
 return r;
@@ -683,12 +936,12 @@ static int disk_flush(struct dm_dirty_lo
 struct log_c *lc = (struct log_c *) log-&gt;context;
 
 /* only write if the log has changed */
-if (!lc-&gt;touched)
+if (!lc-&gt;touched || !lc-&gt;nr_active_logs)
 return 0;
 
-r = write_header(lc);
+r = write_headers(lc);
 if (r)
-fail_log_device(lc);
+DMWARN("Failed to write dirty region log");
 else
 lc-&gt;touched = 0;
 
@@ -784,13 +1037,13 @@ static int disk_status(struct dm_dirty_l
 
 switch(status) {
 case STATUSTYPE_INFO:
-DMEMIT("3 %s %s %c", log-&gt;type-&gt;name, lc-&gt;log_dev-&gt;name,
-       lc-&gt;log_dev_failed ? 'D' : 'A');
+DMEMIT("3 %s %s %c", log-&gt;type-&gt;name, lc-&gt;log[0].dev-&gt;name,
+       lc-&gt;log[0].failed ? 'D' : 'A');
 break;
 
 case STATUSTYPE_TABLE:
 DMEMIT("%s %u %s %u ", log-&gt;type-&gt;name,
-       lc-&gt;sync == DEFAULTSYNC ? 2 : 3, lc-&gt;log_dev-&gt;name,
+       lc-&gt;sync == DEFAULTSYNC ? 2 : 3, lc-&gt;log[0].dev-&gt;name,
        lc-&gt;region_size);
 DMEMIT_SYNC;
 }

</description>
    <dc:creator>Takahiro Yasui</dc:creator>
    <dc:date>2008-11-26T00:01:41</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6869">
    <title>[RFC][PATCH 4/4] dm-log: interface update for multiple log devices</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6869</link>
    <description>This patch updates the disk log interface to handle multiple log
devices. The disk log has two interfaces, constructor and status, so
disk_ctr() and disk_status() are extended to accept and report multiple
log device paths.

 - Current interface

    disk_ctr():
        log_path region_size [[no]sync]

    disk_status() - STATUSTYPE_INFO:
        nr_params disk log_path log_status:"A" or "D"

    disk_status() - STATUSTYPE_TABLE:
        disk nr_params log_path region_size [[no]sync]


 - Proposed interface

    disk_ctr():
        [log_path]{1,} region_size [[no]sync]

    disk_status() - STATUSTYPE_INFO:
        nr_params disk [log_path]{1,} [log_status:"A" or "D"]{1,}

    disk_status() - STATUSTYPE_TABLE:
        disk nr_params [log_path]{1,} region_size [[no]sync]
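
The constructor argument split can be sketched in plain C as follows.
This is only an illustration of the scan-from-the-end idea, assuming a
userspace helper named split_disk_log_args() and a limit MAX_LOG_DEVS
of nine. Neither name comes from the patch, which uses
strict_strtoul() and DM_KCOPYD_MAX_REGIONS + 1 inside parse_params().

<![CDATA[
```c
#include <errno.h>
#include <stdlib.h>

#define MAX_LOG_DEVS 9 /* assumption: nine, as in the cover letter */

/* Return 1 if s is a non-empty string of decimal digits only. */
static int is_decimal(const char *s)
{
	if (*s == '\0')
		return 0;
	for (; *s != '\0'; s++)
		if (*s < '0' || *s > '9')
			return 0;
	return 1;
}

/*
 * Split "[log_path]{1,} region_size [[no]sync]".
 * Scans from the end for the last all-digit argument, treats it as
 * region_size, and counts everything before it as log paths.
 * Returns 0 on success or -EINVAL on a malformed argument list.
 */
static int split_disk_log_args(int argc, char **argv,
			       unsigned int *nr_logs,
			       unsigned long *region_size)
{
	int i;

	for (i = argc - 1; i >= 0; i--)
		if (is_decimal(argv[i]))
			break;

	/* need at least one log path and at most MAX_LOG_DEVS */
	if (i < 1 || i > MAX_LOG_DEVS)
		return -EINVAL;

	/* at most one optional [no]sync argument may follow */
	if (argc - i - 1 > 1)
		return -EINVAL;

	*nr_logs = (unsigned int)i;
	*region_size = strtoul(argv[i], NULL, 10);
	return 0;
}
```
]]>

For the arguments "253:0 253:1 1024 nosync" the sketch reports two log
devices and a region size of 1024. A log path made up only of digits
would be mistaken for the region size, which is a limitation of
scanning for a number.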


Signed-off-by: Takahiro Yasui &lt;tyasui@redhat.com&gt;
---
 drivers/md/dm-log.c   |   75 ++++++++++++++++++++++++++++++++++++++------------
 drivers/md/dm-raid1.c |    2 -
 2 files changed, 59 insertions(+), 18 deletions(-)

Index: linux-2.6.28-rc4/drivers/md/dm-log.c
===================================================================
--- linux-2.6.28-rc4.orig/drivers/md/dm-log.c
+++ linux-2.6.28-rc4/drivers/md/dm-log.c
@@ -11,7 +11,7 @@
 #include &lt;linux/vmalloc.h&gt;
 #include &lt;linux/dm-io.h&gt;
 #include &lt;linux/dm-dirty-log.h&gt;
-
+#include &lt;linux/dm-kcopyd.h&gt;
 #include &lt;linux/device-mapper.h&gt;
 
 #define DM_MSG_PREFIX "dirty region log"
@@ -536,8 +536,40 @@ static size_t log_buffer_size(struct log
 }
 
 static int parse_params(unsigned int argc, char **argv,
-uint32_t *region_size, enum sync *sync)
+unsigned int *nr_logs, uint32_t *region_size,
+enum sync *sync)
 {
+unsigned long tmp_region;
+int i;
+
+if (argc &amp;&amp; nr_logs) {
+/*
+ * case of disk log
+ *
+ * find region_size param from the end as a number string
+ * and get the number of log devices
+ */
+for (i = argc - 1; i &gt;= 0; i--)
+if (!strict_strtoul(argv[i], 10, &amp;tmp_region))
+break;
+
+if (i &lt; 0) {
+DMWARN("invalid region size to dirty region log");
+return -EINVAL;
+}
+
+/* limit the same number as mirror disk */
+if (i &lt; 1 || i &gt; DM_KCOPYD_MAX_REGIONS + 1) {
+DMWARN("Invalid number of dirty region log devices");
+return -EINVAL;
+}
+
+*nr_logs = i;
+
+argc -= i;
+argv += i;
+}
+
 /*
  * check number of parameters
  */
@@ -757,7 +789,7 @@ static int core_ctr(struct dm_dirty_log
 enum sync sync;
 int r;
 
-r = parse_params(argc, argv, &amp;region_size, &amp;sync);
+r = parse_params(argc, argv, NULL, &amp;region_size, &amp;sync);
 if (r)
 return r;
 
@@ -786,26 +818,22 @@ static void core_dtr(struct dm_dirty_log
 /*----------------------------------------------------------------
  * disks log constructor/destructor
  *
- * argv contains log_device region_size followed optionally by [no]sync
+ * argv contains [log_path]{1,} region_size [[no]sync]
  *--------------------------------------------------------------*/
 static int disk_ctr(struct dm_dirty_log *log, struct dm_target *ti,
     unsigned int argc, char **argv)
 {
 struct log_c *lc;
+unsigned int nr_logs;
 uint32_t region_size;
 enum sync sync;
 int r;
 
-if (!argc) {
-DMWARN("wrong number of arguments to dirty region log");
-return -EINVAL;
-}
-
-r = parse_params(argc-1, argv+1, &amp;region_size, &amp;sync);
+r = parse_params(argc, argv, &amp;nr_logs, &amp;region_size, &amp;sync);
 if (r)
 return r;
 
-lc = create_log_context(ti, 1, region_size, sync);
+lc = create_log_context(ti, nr_logs, region_size, sync);
 if (!lc)
 return -ENOMEM;
 
@@ -1032,19 +1060,32 @@ static int core_status(struct dm_dirty_l
 static int disk_status(struct dm_dirty_log *log, status_type_t status,
        char *result, unsigned int maxlen)
 {
-int sz = 0;
 struct log_c *lc = log-&gt;context;
+char buffer[lc-&gt;nr_logs + 1];
+unsigned int i;
+int sz = 0;
+int params;
 
 switch(status) {
 case STATUSTYPE_INFO:
-DMEMIT("3 %s %s %c", log-&gt;type-&gt;name, lc-&gt;log[0].dev-&gt;name,
-       lc-&gt;log[0].failed ? 'D' : 'A');
+DMEMIT("%d %s ", lc-&gt;nr_logs + 2, log-&gt;type-&gt;name);
+for (i = 0; i &lt; lc-&gt;nr_logs; i++) {
+DMEMIT("%s ", lc-&gt;log[i].dev-&gt;name);
+buffer[i] = lc-&gt;log[i].failed ? 'D' : 'A';
+}
+buffer[i] = '\0';
+DMEMIT("%s", buffer);
 break;
 
 case STATUSTYPE_TABLE:
-DMEMIT("%s %u %s %u ", log-&gt;type-&gt;name,
-       lc-&gt;sync == DEFAULTSYNC ? 2 : 3, lc-&gt;log[0].dev-&gt;name,
-       lc-&gt;region_size);
+params = lc-&gt;nr_logs + 1;
+params += (lc-&gt;sync == DEFAULTSYNC) ? 0 : 1;
+DMEMIT("%s %d ", log-&gt;type-&gt;name, params);
+
+for (i = 0; i &lt; lc-&gt;nr_logs; i++)
+DMEMIT("%s ", lc-&gt;log[i].dev-&gt;name);
+
+DMEMIT("%u ", lc-&gt;region_size);
 DMEMIT_SYNC;
 }
 
Index: linux-2.6.28-rc4/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.28-rc4.orig/drivers/md/dm-raid1.c
+++ linux-2.6.28-rc4/drivers/md/dm-raid1.c
@@ -926,7 +926,7 @@ static int parse_features(struct mirror_
  * [#features &lt;features&gt;]
  *
  * log_type is "core" or "disk"
- * #log_params is between 1 and 3
+ * #log_params is more than 0
  *
  * If present, features must be "handle_errors".
  */

</description>
    <dc:creator>Takahiro Yasui</dc:creator>
    <dc:date>2008-11-26T00:01:43</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6868">
    <title>[RFC][PATCH 2/4] dm-log: unify rw_header to read/write_header</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6868</link>
    <description>rw_header() updates three members of io_req every time I/O is
issued. mem.ptr.vma and notify.fn never change once they are
initialized, so they can be set in advance in create_log_context().
This patch removes the unnecessary updates and folds rw_header() into
read_header() and write_header().


Signed-off-by: Takahiro Yasui &lt;tyasui@redhat.com&gt;
---
 drivers/md/dm-log.c |   23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

Index: linux-2.6.28-rc4/drivers/md/dm-log.c
===================================================================
--- linux-2.6.28-rc4.orig/drivers/md/dm-log.c
+++ linux-2.6.28-rc4/drivers/md/dm-log.c
@@ -323,20 +323,13 @@ static void header_from_disk(struct log_
 core-&gt;nr_regions = le64_to_cpu(disk-&gt;nr_regions);
 }
 
-static int rw_header(struct log_c *lc, int rw)
-{
-lc-&gt;io_req.bi_rw = rw;
-lc-&gt;io_req.mem.ptr.vma = lc-&gt;disk_header;
-lc-&gt;io_req.notify.fn = NULL;
-
-return dm_io(&amp;lc-&gt;io_req, 1, &amp;lc-&gt;header_location, NULL);
-}
-
 static int read_header(struct log_c *log)
 {
 int r;
 
-r = rw_header(log, READ);
+log-&gt;io_req.bi_rw = READ;
+
+r = dm_io(&amp;log-&gt;io_req, 1, &amp;log-&gt;header_location, NULL);
 if (r)
 return r;
 
@@ -364,8 +357,8 @@ static int read_header(struct log_c *log
 
 static inline int write_header(struct log_c *log)
 {
-header_to_disk(&amp;log-&gt;header, log-&gt;disk_header);
-return rw_header(log, WRITE);
+log-&gt;io_req.bi_rw = WRITE;
+return dm_io(&amp;log-&gt;io_req, 1, &amp;log-&gt;header_location, NULL);
 }
 
 /*----------------------------------------------------------------
@@ -454,7 +447,10 @@ static int create_log_context(struct dm_
 buf_size = dm_round_up((LOG_OFFSET &lt;&lt; SECTOR_SHIFT) +
        bitset_size, ti-&gt;limits.hardsect_size);
 lc-&gt;header_location.count = buf_size &gt;&gt; SECTOR_SHIFT;
+
 lc-&gt;io_req.mem.type = DM_IO_VMA;
+lc-&gt;io_req.mem.ptr.vma = lc-&gt;disk_header;
+lc-&gt;io_req.notify.fn = NULL;
 lc-&gt;io_req.client = dm_io_client_create(dm_div_up(buf_size,
    PAGE_SIZE));
 if (IS_ERR(lc-&gt;io_req.client)) {
@@ -636,6 +632,9 @@ static int disk_resume(struct dm_dirty_l
 /* set the correct number of regions in the header */
 lc-&gt;header.nr_regions = lc-&gt;region_count;
 
+/* update disk headers */
+header_to_disk(&amp;lc-&gt;header, lc-&gt;disk_header);
+
 /* write the new header */
 r = write_header(lc);
 if (r) {

</description>
    <dc:creator>Takahiro Yasui</dc:creator>
    <dc:date>2008-11-26T00:01:39</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6867">
    <title>[RFC][PATCH 0/4] dm-log: support multi-log devices</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6867</link>
    <description>Hi All,

This is my first post to this mailing list. I would like to introduce
a patch set that implements multi-log devices for dm-mirror.
I would appreciate your comments and suggestions on this patch set.


PATCH SET
=========
  1/4: dm-log: fix no io_client_destroy
  2/4: dm-log: remove unnecessary updates of io_req members
  3/4: dm-log: introduce multi-log devices
  4/4: dm-log: update interface to control multi-log devices


BACKGROUND
==========

Device mapper mirroring (dm-mirror) uses a "log" to track whether the
mirror devices are consistent with each other. There are two types of
log, "core" and "disk". The former keeps its data only in memory, and
the latter keeps its data on a device. When the system boots and sets
up the disks managed by device mapper, device mapper checks the log
data to see whether each region is in sync among the mirror devices.
If a region is not in sync, it replicates the data from the default
mirror device to the others.

However, once the log disk breaks down, the whole of each data disk
must be replicated. If the data disks are large, replication takes a
long time and consumes significant system resources, such as I/O
bandwidth, CPU, and memory. That may degrade system performance until
replication completes.

This patch set introduces multi-log devices for the mirror target. Log
data is stored on multiple log devices, which lowers the probability
of a full replication when one log disk breaks down.


DESIGN OVERVIEW
===============

  * maximum number of log devices

    Up to nine log devices can be used, the same maximum as for mirror
    devices.

  * error handling during setting up a mirror device

    All log devices must be detected by the Linux kernel. If any of
    them is not found, mirror construction fails.

    The log headers are also checked to confirm that all log devices
    record the same number of regions. New log devices are excluded
    from this check. (They are usually initialized with zeros.)

  * error handling during I/O operation

    If an error is detected on a log device after the mirror device is
    constructed, the log device is marked as failed. I/O is issued only
    to valid log devices, never to failed ones. If all log devices
    fail, the disk log works much like a "core" log.

  * ioctl interface

    This patch set does not affect the "core" log interface, but it
    changes the "disk" log interface. To keep backward compatibility,
    the log device path names are simply listed, without a separate
    count of log devices.

    &lt;current interface&gt;

        disk_ctr():
            log_path region_size [[no]sync]

        disk_status() - STATUSTYPE_INFO:
            nr_params disk log_path log_status:"A" or "D"

        disk_status() - STATUSTYPE_TABLE:
            disk nr_params log_path region_size [[no]sync]

    &lt;proposed interface&gt;

        disk_ctr():
            [log_path]{1,} region_size [[no]sync]

        disk_status() - STATUSTYPE_INFO:
            nr_params disk [log_path]{1,} [log_status:"A" or "D"]{1,}

        disk_status() - STATUSTYPE_TABLE:
            disk nr_params [log_path]{1,} region_size [[no]sync]
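
The error handling during I/O described above can be sketched in plain
C as follows. This is a minimal illustration, not the patch's code:
struct log_dev, write_active_logs() and the two stub writers are
hypothetical names chosen for the example.

<![CDATA[
```c
#include <stddef.h>

/* Hypothetical stand-in for one log device; not the patch's structure. */
struct log_dev {
	int failed;                      /* set once an I/O error is seen */
	int (*write)(struct log_dev *);  /* returns 0 on success */
};

/* Stub writers used only to exercise the sketch. */
static int write_ok(struct log_dev *d)  { (void)d; return 0; }
static int write_err(struct log_dev *d) { (void)d; return -1; }

/*
 * Write to every log device that has not failed yet and mark the
 * ones whose write fails. Returns the number of devices still valid;
 * 0 means the log now lives only in memory, much like a "core" log.
 */
static unsigned int write_active_logs(struct log_dev *logs, size_t nr_logs)
{
	unsigned int active = 0;
	size_t i;

	for (i = 0; i < nr_logs; i++) {
		if (logs[i].failed)
			continue;        /* never issue I/O to a failed device */
		if (logs[i].write(&logs[i]) != 0)
			logs[i].failed = 1;
		else
			active++;
	}
	return active;
}
```
]]>

A caller would treat a return value of zero as "all log devices
failed" and keep running from the in-memory bitmap.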


EXAMPLES
========

  * create mirror with one log device (the same operation as usual)

    # dmsetup create mirror --table \
      "0 2097152 mirror disk 2 253:0 1024 2 253:1 0 253:2 0"
    # dmsetup status mirror
    0 2097152 mirror 2 253:1 253:2 960/2048 1 AA 3 disk 253:0 A

  * create mirror with two log devices [NEW]

    # dmsetup create mirror --table \
      "0 2097152 mirror disk 3 253:0 253:1 1024 2 253:2 0 253:3 0"
    # dmsetup status mirror
    0 2097152 mirror 2 253:2 253:3 2048/2048 1 AA 4 disk 253:0 253:1 AA

  * create mirror with four log devices [NEW]

    # dmsetup create mirror --table \
      "0 2097152 mirror disk 5 253:0 253:1 253:2 253:3 1024 \
       2 253:4 0 253:5 0"
    # dmsetup status mirror
      0 2097152 mirror 2 253:4 253:5 316/2048 1 AA 6 disk \
      253:0 253:1 253:2 253:3 AAAA
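
A tool that reads the status output could pick the log part apart
roughly as follows. This is a sketch under assumptions: the helper
parse_disk_log_status() is hypothetical, and it expects only the log
part of the line, for example "4 disk 253:0 253:1 AA", in the proposed
format where nr_params is the number of log devices plus two.

<![CDATA[
```c
#include <stdio.h>
#include <string.h>

/*
 * Parse the log part of a status line, e.g. "4 disk 253:0 253:1 AA".
 * nr_params counts the type name, the paths and the status string,
 * so the number of log devices is nr_params - 2.
 * Returns the number of log devices and stores how many are dead
 * ('D') in *dead, or returns -1 if the text does not match.
 */
static int parse_disk_log_status(const char *s, int *dead)
{
	int nr_params, pos = 0, n, i;
	char tok[64];

	if (sscanf(s, "%d %63s%n", &nr_params, tok, &pos) != 2)
		return -1;
	if (strcmp(tok, "disk") != 0 || nr_params < 3)
		return -1;
	n = nr_params - 2;
	s += pos;

	for (i = 0; i < n; i++) {        /* skip the log paths */
		if (sscanf(s, "%63s%n", tok, &pos) != 1)
			return -1;
		s += pos;
	}

	/* one status character per log device */
	if (sscanf(s, "%63s", tok) != 1 || (int)strlen(tok) != n)
		return -1;

	*dead = 0;
	for (i = 0; i < n; i++) {
		if (tok[i] == 'D')
			(*dead)++;
		else if (tok[i] != 'A')
			return -1;
	}
	return n;
}
```
]]>

The old single-log form "3 disk 253:0 A" parses the same way and
yields one log device, which is the backward compatibility the
interface aims for.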


FUTURE WORK
===========

  * Independent header and bitmap I/O

    Currently disk_header holds both the header and the bitmap, and
    they are written together in a single I/O. But the header is
    modified only during the construction and resume procedures, so
    the header I/O could be removed from ordinary bitmap updates.

  * Partial bitmap update

    The bitmap can be large if the mirror devices are large or the
    region size is small. In the current implementation, the disk log
    handles the bitmap as one buffer and writes the whole bitmap on
    every update. Therefore, the amount of I/O issued to a log device
    can be larger than the amount issued to the mirror devices. A
    partial bitmap update would issue I/O only for the updated sectors.
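
One possible shape for the partial update is sketched below in plain C.
It is not part of this patch set. It assumes 512-byte sectors, one bit
per region, and a bitmap that starts on a sector boundary (the header
offset is ignored); the names region_to_sector() and
dirty_sector_range() are hypothetical.

<![CDATA[
```c
#include <stddef.h>

#define SECTOR_SIZE 512u                    /* bytes per sector */
#define BITS_PER_SECTOR (SECTOR_SIZE * 8u)  /* 4096 regions per sector */

/* Index of the bitmap sector that holds the bit for a given region. */
static size_t region_to_sector(size_t region)
{
	return region / BITS_PER_SECTOR;
}

/*
 * Find the smallest run of bitmap sectors covering all changed regions.
 * Stores the first sector and the sector count; returns 0 when nothing
 * changed and 1 otherwise.
 */
static int dirty_sector_range(const size_t *changed, size_t n,
			      size_t *first, size_t *count)
{
	size_t lo, hi, i;

	if (n == 0)
		return 0;

	lo = hi = region_to_sector(changed[0]);
	for (i = 1; i < n; i++) {
		size_t s = region_to_sector(changed[i]);

		if (s < lo)
			lo = s;
		if (s > hi)
			hi = s;
	}
	*first = lo;
	*count = hi - lo + 1;
	return 1;
}
```
]]>

With 2048 regions, as in the examples above, the whole bitmap fits in
one sector, so the saving would only show up on larger mirrors.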

---
Takahiro Yasui
Hitachi Computer Products (America) Inc.

</description>
    <dc:creator>Takahiro Yasui</dc:creator>
    <dc:date>2008-11-26T00:01:35</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6860">
    <title>[PATCH] Read the verbosity level from multipath.conf configuration file</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6860</link>
    <description>With this patch, multipathd can read the verbosity level from
the multipath.conf configuration file.
For example:
verbosity       5

Signed-off-by: Ritesh Raj Sarraf &lt;rsarraf@netapp.com&gt;
---
 libmultipath/config.c      |    2 +-
 libmultipath/defaults.h    |    1 +
 libmultipath/dict.c        |   20 ++++++++++++++++++++
 multipath/multipath.conf.5 |    5 +++++
 4 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/libmultipath/config.c b/libmultipath/config.c
index 6d731d2..b49cc90 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -420,7 +420,7 @@ load_config (char * file)
  * internal defaults
  */
 if (!conf-&gt;verbosity)
-conf-&gt;verbosity = 2;
+conf-&gt;verbosity = DEFAULT_VERBOSITY;
 
 conf-&gt;dev_type = DEV_NONE;
 conf-&gt;minio = 1000;
diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
index 87b155e..a943673 100644
--- a/libmultipath/defaults.h
+++ b/libmultipath/defaults.h
@@ -11,6 +11,7 @@
 #define DEFAULT_NO_PATH_RETRY  NO_PATH_RETRY_UNDEF
 #define DEFAULT_PGTIMEOUT      -PGTIMEOUT_NONE
 #define DEFAULT_USER_FRIENDLY_NAMES    0
+#define DEFAULT_VERBOSITY 2
 
 #define DEFAULT_CHECKINT 5
 #define MAX_CHECKINT(a) (a &lt;&lt; 2)
diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index 2429a93..a6ec494 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -33,6 +33,17 @@ polling_interval_handler(vector strvec)
 }
 
 static int
+verbosity_handler(vector strvec)
+{
+char * buff;
+
+buff = VECTOR_SLOT(strvec, 1);
+conf-&gt;verbosity = atoi(buff);
+
+return 0;
+}
+
+static int
 udev_dir_handler(vector strvec)
 {
 conf-&gt;udev_dir = set_value(strvec);
@@ -1316,6 +1327,14 @@ snprint_def_polling_interval (char * buff, int len, void * data)
 }
 
 static int
+snprint_def_verbosity (char * buff, int len, void * data)
+{
+if (conf-&gt;verbosity == DEFAULT_VERBOSITY)
+return 0;
+return snprintf(buff, len, "%i", conf-&gt;verbosity);
+}
+
+static int
 snprint_def_udev_dir (char * buff, int len, void * data)
 {
 if (!conf-&gt;udev_dir)
@@ -1549,6 +1568,7 @@ void
 init_keywords(void)
 {
 install_keyword_root("defaults", NULL);
+install_keyword("verbosity", &amp;verbosity_handler, &amp;snprint_def_verbosity);
 install_keyword("polling_interval", &amp;polling_interval_handler, &amp;snprint_def_polling_interval);
 install_keyword("udev_dir", &amp;udev_dir_handler, &amp;snprint_def_udev_dir);
 install_keyword("multipath_dir", &amp;multipath_dir_handler, &amp;snprint_def_multipath_dir);
diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
index 39639dd..6c53d9d 100644
--- a/multipath/multipath.conf.5
+++ b/multipath/multipath.conf.5
@@ -74,6 +74,11 @@ interval between two path checks in seconds; default is
 directory where udev creates its device nodes; default is
 .I /dev
 .TP
+.B verbosity
+default verbosity. Higher values increase the verbosity level. Valid
+levels are between 0 and 6; default is 2.
+.I /dev
+.TP
 .B selector
 The default path selector algorithm to use; they are offered by the
 kernel multipath target. The only currently implemented is
</description>
    <dc:creator>Ritesh Raj Sarraf</dc:creator>
    <dc:date>2008-11-21T10:54:30</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6857">
    <title>[PATCH] Read the verbosity level from multipath.conf configuration file</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6857</link>
    <description>From: Ritesh Raj Sarraf &lt;rrs@researchut.com&gt;

With this patch, multipathd can read the verbosity level from
the multipath.conf configuration file.
For example:
verbosity       5

Signed-off-by: Ritesh Raj Sarraf &lt;rrs@researchut.com&gt;
---
 libmultipath/config.c      |    2 +-
 libmultipath/defaults.h    |    1 +
 libmultipath/dict.c        |   20 ++++++++++++++++++++
 multipath/multipath.conf.5 |    5 +++++
 4 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/libmultipath/config.c b/libmultipath/config.c
index 6d731d2..b49cc90 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -420,7 +420,7 @@ load_config (char * file)
  * internal defaults
  */
 if (!conf-&gt;verbosity)
-conf-&gt;verbosity = 2;
+conf-&gt;verbosity = DEFAULT_VERBOSITY;
 
 conf-&gt;dev_type = DEV_NONE;
 conf-&gt;minio = 1000;
diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
index 87b155e..a943673 100644
--- a/libmultipath/defaults.h
+++ b/libmultipath/defaults.h
@@ -11,6 +11,7 @@
 #define DEFAULT_NO_PATH_RETRY  NO_PATH_RETRY_UNDEF
 #define DEFAULT_PGTIMEOUT      -PGTIMEOUT_NONE
 #define DEFAULT_USER_FRIENDLY_NAMES    0
+#define DEFAULT_VERBOSITY 2
 
 #define DEFAULT_CHECKINT 5
 #define MAX_CHECKINT(a) (a &lt;&lt; 2)
diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index 2429a93..9d26e9e 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -33,6 +33,17 @@ polling_interval_handler(vector strvec)
 }
 
 static int
+verbosity_handler(vector strvec)
+{
+char * buff;
+
+buff = VECTOR_SLOT(strvec, 1);
+conf-&gt;verbosity = atoi(buff);
+
+return 0;
+}
+
+static int
 udev_dir_handler(vector strvec)
 {
 conf-&gt;udev_dir = set_value(strvec);
@@ -1316,6 +1327,14 @@ snprint_def_polling_interval (char * buff, int len, void * data)
 }
 
 static int
+snprint_def_verbosity (char * buff, int len, void * data)
+{
+if (conf-&gt;checkint == DEFAULT_VERBOSITY)
+return 0;
+return snprintf(buff, len, "%i", conf-&gt;verbosity);
+}
+
+static int
 snprint_def_udev_dir (char * buff, int len, void * data)
 {
 if (!conf-&gt;udev_dir)
@@ -1549,6 +1568,7 @@ void
 init_keywords(void)
 {
 install_keyword_root("defaults", NULL);
+install_keyword("verbosity", &amp;verbosity_handler, &amp;snprint_def_verbosity);
 install_keyword("polling_interval", &amp;polling_interval_handler, &amp;snprint_def_polling_interval);
 install_keyword("udev_dir", &amp;udev_dir_handler, &amp;snprint_def_udev_dir);
 install_keyword("multipath_dir", &amp;multipath_dir_handler, &amp;snprint_def_multipath_dir);
diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
index 39639dd..6c53d9d 100644
--- a/multipath/multipath.conf.5
+++ b/multipath/multipath.conf.5
@@ -74,6 +74,11 @@ interval between two path checks in seconds; default is
 directory where udev creates its device nodes; default is
 .I /dev
 .TP
+.B verbosity
+default verbosity. Higher values increase the verbosity level. Valid
+levels are between 0 and 6; default is 2.
+.I /dev
+.TP
 .B selector
 The default path selector algorithm to use; they are offered by the
 kernel multipath target. The only currently implemented is
</description>
    <dc:creator>rrs@researchut.com</dc:creator>
    <dc:date>2008-11-20T12:41:15</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6855">
    <title>[PATCH] Change occurrences of netapp to ontap.</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6855</link>
    <description>From: Ritesh Raj Sarraf &lt;rrs@researchut.com&gt;

Change all occurrences of netapp to ontap to be consistent.
This was proposed earlier, before the libprio change, but was deferred for
compatibility reasons.

Also update maintainers.
---
 libmultipath/prioritizers/netapp.c |  243 ------------------------------------
 libmultipath/prioritizers/netapp.h |    7 -
 libmultipath/prioritizers/ontap.c  |  243 ++++++++++++++++++++++++++++++++++++
 libmultipath/prioritizers/ontap.h  |    7 +
 4 files changed, 250 insertions(+), 250 deletions(-)
 delete mode 100644 libmultipath/prioritizers/netapp.c
 delete mode 100644 libmultipath/prioritizers/netapp.h
 create mode 100644 libmultipath/prioritizers/ontap.c
 create mode 100644 libmultipath/prioritizers/ontap.h

diff --git a/libmultipath/prioritizers/netapp.c b/libmultipath/prioritizers/netapp.c
deleted file mode 100644
index 812ecf6..0000000
--- a/libmultipath/prioritizers/netapp.c
+++ /dev/null
@@ -1,243 +0,0 @@
-/* 
- * Copyright 2005 Network Appliance, Inc., All Rights Reserved
- * Author:  David Wysochanski available at davidw@netapp.com
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of 
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License v2 for more details.
- */
-
-#include &lt;stdio.h&gt;
-#include &lt;stdlib.h&gt;
-#include &lt;string.h&gt;
-#include &lt;sys/ioctl.h&gt;
-#include &lt;errno.h&gt;
-#include &lt;assert.h&gt;
-
-#include &lt;sg_include.h&gt;
-#include &lt;debug.h&gt;
-#include &lt;prio.h&gt;
-
-#define INQUIRY_CMD 0x12
-#define INQUIRY_CMDLEN 6
-#define DEFAULT_PRIOVAL 10
-#define RESULTS_MAX 256
-#define SG_TIMEOUT 30000
-
-#define pp_netapp_log(prio, fmt, args...) \
-        condlog(prio, "%s: netapp prio: " fmt, dev, ##args)
-
-static void dump_cdb(unsigned char *cdb, int size)
-{
-int i;
-char buf[10*5+1];
-char * p = &amp;buf[0];
-
-condlog(0, "- SCSI CDB: ");
-for (i=0; i&lt;size; i++) {
-p += snprintf(p, 10*(size-i), "0x%02x ", cdb[i]);
-}
-condlog(0, "%s", buf);
-}
-
-static void process_sg_error(struct sg_io_hdr *io_hdr)
-{
-int i;
-char buf[128*5+1];
-char * p = &amp;buf[0];
-
-condlog(0, "- masked_status=0x%02x, host_status=0x%02x, "
-"driver_status=0x%02x", io_hdr-&gt;masked_status,
-io_hdr-&gt;host_status, io_hdr-&gt;driver_status);
-if (io_hdr-&gt;sb_len_wr &gt; 0) {
-condlog(0, "- SCSI sense data: ");
-for (i=0; i&lt;io_hdr-&gt;sb_len_wr; i++) {
-p += snprintf(p, 128*(io_hdr-&gt;sb_len_wr-i), "0x%02x ",
-      io_hdr-&gt;sbp[i]);
-}
-condlog(0, "%s", buf);
-}
-}
-
-/*
- * Returns:
- * -1: error, errno set
- *  0: success
- */
-static int send_gva(const char *dev, int fd, unsigned char pg,
-    unsigned char *results, int *results_size)
-{
-unsigned char sb[128];
-unsigned char cdb[10] = {0xc0, 0, 0x1, 0xa, 0x98, 0xa,
- pg, sizeof(sb), 0, 0};
-struct sg_io_hdr io_hdr;
-int ret = -1;
-
-memset(&amp;io_hdr, 0, sizeof (struct sg_io_hdr));
-io_hdr.interface_id = 'S';
-io_hdr.cmd_len = sizeof (cdb);
-io_hdr.mx_sb_len = sizeof (sb);
-io_hdr.dxfer_direction = SG_DXFER_FROM_DEV;
-io_hdr.dxfer_len = *results_size;
-io_hdr.dxferp = results;
-io_hdr.cmdp = cdb;
-io_hdr.sbp = sb;
-io_hdr.timeout = SG_TIMEOUT;
-io_hdr.pack_id = 0;
-if (ioctl(fd, SG_IO, &amp;io_hdr) &lt; 0) {
-pp_netapp_log(0, "SG_IO ioctl failed, errno=%d", errno);
-dump_cdb(cdb, sizeof(cdb));
-goto out;
-}
-if (io_hdr.info &amp; SG_INFO_OK_MASK) {
-pp_netapp_log(0, "SCSI error");
-dump_cdb(cdb, sizeof(cdb));
-process_sg_error(&amp;io_hdr);
-goto out;
-}
-
-if (results[4] != 0x0a || results[5] != 0x98 ||
-    results[6] != 0x0a ||results[7] != 0x01) {
-dump_cdb(cdb, sizeof(cdb));
-pp_netapp_log(0, "GVA return wrong format ");
-pp_netapp_log(0, "results[4-7] = 0x%02x 0x%02x 0x%02x 0x%02x",
-results[4], results[5], results[6], results[7]);
-goto out;
-}
-ret = 0;
- out:
-return(ret);
-}
-
-/*
- * Retuns:
- * -1: Unable to obtain proxy info
- *  0: Device _not_ proxy path
- *  1: Device _is_ proxy path
- */
-static int get_proxy(const char *dev, int fd)
-{
-unsigned char results[256];
-unsigned char sb[128];
-unsigned char cdb[INQUIRY_CMDLEN] = {INQUIRY_CMD, 1, 0xc1, 0,
-   sizeof(sb), 0};
-struct sg_io_hdr io_hdr;
-int ret = -1;
-
-memset(&amp;results, 0, sizeof (results));
-memset(&amp;io_hdr, 0, sizeof (struct sg_io_hdr));
-io_hdr.interface_id = 'S';
-io_hdr.cmd_len = sizeof (cdb);
-io_hdr.mx_sb_len = sizeof (sb);
-io_hdr.dxfer_direction = SG_DXFER_FROM_DEV;
-io_hdr.dxfer_len = sizeof (results);
-io_hdr.dxferp = results;
-io_hdr.cmdp = cdb;
-io_hdr.sbp = sb;
-io_hdr.timeout = SG_TIMEOUT;
-io_hdr.pack_id = 0;
-if (ioctl(fd, SG_IO, &amp;io_hdr) &lt; 0) {
-pp_netapp_log(0, "ioctl sending inquiry command failed, "
-"errno=%d", errno);
-dump_cdb(cdb, sizeof(cdb));
-goto out;
-}
-if (io_hdr.info &amp; SG_INFO_OK_MASK) {
-pp_netapp_log(0, "SCSI error");
-dump_cdb(cdb, sizeof(cdb));
-process_sg_error(&amp;io_hdr);
-goto out;
-}
-
-if (results[1] != 0xc1 || results[8] != 0x0a ||
-    results[9] != 0x98 || results[10] != 0x0a ||
-    results[11] != 0x0 || results[12] != 0xc1 ||
-    results[13] != 0x0) {
-pp_netapp_log(0,"proxy info page in unknown format - ");
-pp_netapp_log(0,"results[8-13]=0x%02x 0x%02x 0x%02x 0x%02x "
-"0x%02x 0x%02x",
-results[8], results[9], results[10],
-results[11], results[12], results[13]);
-dump_cdb(cdb, sizeof(cdb));
-goto out;
-}
-ret = (results[19] &amp; 0x02) &gt;&gt; 1;
-
- out:
-return(ret);
-}
-
-/*
- * Returns priority of device based on device info.
- *
- * 4: FCP non-proxy, FCP proxy unknown, or unable to determine protocol
- * 3: iSCSI HBA
- * 2: iSCSI software
- * 1: FCP proxy
- */
-static int netapp_prio(const char *dev, int fd)
-{
-unsigned char results[RESULTS_MAX];
-int results_size=RESULTS_MAX;
-int rc;
-int is_proxy;
-int is_iscsi_software;
-int is_iscsi_hardware;
-int tot_len;
-
-is_iscsi_software = is_iscsi_hardware = is_proxy = 0;
-
-memset(&amp;results, 0, sizeof (results));
-rc = send_gva(dev, fd, 0x41, results, &amp;results_size);
-if (rc == 0) {
-tot_len = results[0] &lt;&lt; 24 | results[1] &lt;&lt; 16 |
-  results[2] &lt;&lt; 8 | results[3];
-if (tot_len &lt;= 8) {
-goto try_fcp_proxy;
-}
-if (results[8] != 0x41) {
-pp_netapp_log(0, "GVA page 0x41 error - "
-"results[8] = 0x%x", results[8]);
-goto try_fcp_proxy;
-}
-if ((strncmp((char *)&amp;results[12], "ism_sw", 6) == 0) ||
-    (strncmp((char *)&amp;results[12], "iswt", 4) == 0)) {
-is_iscsi_software = 1;
-goto prio_select;
-}
-else if (strncmp((char *)&amp;results[12], "ism_sn", 6) == 0) {
-is_iscsi_hardware = 1;
-goto prio_select;
-}
-}
-
- try_fcp_proxy:
-rc = get_proxy(dev, fd);
-if (rc &gt;= 0) {
-is_proxy = rc;
-}
-
- prio_select:
-if (is_iscsi_hardware) {
-return 3;
-} else if (is_iscsi_software) {
-return 2;
-} else {
-if (is_proxy) {
-return 1;
-} else {
-/* Either non-proxy, or couldn't get proxy info */
-return 4;
-}
-}
-}
-
-int getprio (struct path * pp)
-{
-return netapp_prio(pp-&gt;dev, pp-&gt;fd);
-}
diff --git a/libmultipath/prioritizers/netapp.h b/libmultipath/prioritizers/netapp.h
deleted file mode 100644
index ec38821..0000000
--- a/libmultipath/prioritizers/netapp.h
+++ /dev/null
@@ -1,7 +0,0 @@
-#ifndef _NETAPP_H
-#define _NETAPP_H
-
-#define PRIO_NETAPP "netapp"
-int prio_netapp(struct path * pp);
-
-#endif
diff --git a/libmultipath/prioritizers/ontap.c b/libmultipath/prioritizers/ontap.c
new file mode 100644
index 0000000..812ecf6
--- /dev/null
+++ b/libmultipath/prioritizers/ontap.c
@@ -0,0 +1,243 @@
+/* 
+ * Copyright 2005 Network Appliance, Inc., All Rights Reserved
+ * Author:  David Wysochanski available at davidw@netapp.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of 
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License v2 for more details.
+ */
+
+#include &lt;stdio.h&gt;
+#include &lt;stdlib.h&gt;
+#include &lt;string.h&gt;
+#include &lt;sys/ioctl.h&gt;
+#include &lt;errno.h&gt;
+#include &lt;assert.h&gt;
+
+#include &lt;sg_include.h&gt;
+#include &lt;debug.h&gt;
+#include &lt;prio.h&gt;
+
+#define INQUIRY_CMD 0x12
+#define INQUIRY_CMDLEN 6
+#define DEFAULT_PRIOVAL 10
+#define RESULTS_MAX 256
+#define SG_TIMEOUT 30000
+
+#define pp_netapp_log(prio, fmt, args...) \
+        condlog(prio, "%s: netapp prio: " fmt, dev, ##args)
+
+static void dump_cdb(unsigned char *cdb, int size)
+{
+int i;
+char buf[10*5+1];
+char * p = &amp;buf[0];
+
+condlog(0, "- SCSI CDB: ");
+for (i=0; i&lt;size; i++) {
+p += snprintf(p, 10*(size-i), "0x%02x ", cdb[i]);
+}
+condlog(0, "%s", buf);
+}
+
+static void process_sg_error(struct sg_io_hdr *io_hdr)
+{
+int i;
+char buf[128*5+1];
+char * p = &amp;buf[0];
+
+condlog(0, "- masked_status=0x%02x, host_status=0x%02x, "
+"driver_status=0x%02x", io_hdr-&gt;masked_status,
+io_hdr-&gt;host_status, io_hdr-&gt;driver_status);
+if (io_hdr-&gt;sb_len_wr &gt; 0) {
+condlog(0, "- SCSI sense data: ");
+for (i=0; i&lt;io_hdr-&gt;sb_len_wr; i++) {
+p += snprintf(p, sizeof(buf) - (p - buf), "0x%02x ",
+      io_hdr-&gt;sbp[i]);
+}
+condlog(0, "%s", buf);
+}
+}
+
+/*
+ * Returns:
+ * -1: error, errno set
+ *  0: success
+ */
+static int send_gva(const char *dev, int fd, unsigned char pg,
+    unsigned char *results, int *results_size)
+{
+unsigned char sb[128];
+unsigned char cdb[10] = {0xc0, 0, 0x1, 0xa, 0x98, 0xa,
+ pg, sizeof(sb), 0, 0};
+struct sg_io_hdr io_hdr;
+int ret = -1;
+
+memset(&amp;io_hdr, 0, sizeof (struct sg_io_hdr));
+io_hdr.interface_id = 'S';
+io_hdr.cmd_len = sizeof (cdb);
+io_hdr.mx_sb_len = sizeof (sb);
+io_hdr.dxfer_direction = SG_DXFER_FROM_DEV;
+io_hdr.dxfer_len = *results_size;
+io_hdr.dxferp = results;
+io_hdr.cmdp = cdb;
+io_hdr.sbp = sb;
+io_hdr.timeout = SG_TIMEOUT;
+io_hdr.pack_id = 0;
+if (ioctl(fd, SG_IO, &amp;io_hdr) &lt; 0) {
+pp_netapp_log(0, "SG_IO ioctl failed, errno=%d", errno);
+dump_cdb(cdb, sizeof(cdb));
+goto out;
+}
+if (io_hdr.info &amp; SG_INFO_OK_MASK) {
+pp_netapp_log(0, "SCSI error");
+dump_cdb(cdb, sizeof(cdb));
+process_sg_error(&amp;io_hdr);
+goto out;
+}
+
+if (results[4] != 0x0a || results[5] != 0x98 ||
+    results[6] != 0x0a ||results[7] != 0x01) {
+dump_cdb(cdb, sizeof(cdb));
+pp_netapp_log(0, "GVA return wrong format ");
+pp_netapp_log(0, "results[4-7] = 0x%02x 0x%02x 0x%02x 0x%02x",
+results[4], results[5], results[6], results[7]);
+goto out;
+}
+ret = 0;
+ out:
+return(ret);
+}
+
+/*
+ * Returns:
+ * -1: Unable to obtain proxy info
+ *  0: Device _not_ proxy path
+ *  1: Device _is_ proxy path
+ */
+static int get_proxy(const char *dev, int fd)
+{
+unsigned char results[256];
+unsigned char sb[128];
+unsigned char cdb[INQUIRY_CMDLEN] = {INQUIRY_CMD, 1, 0xc1, 0,
+   sizeof(sb), 0};
+struct sg_io_hdr io_hdr;
+int ret = -1;
+
+memset(&amp;results, 0, sizeof (results));
+memset(&amp;io_hdr, 0, sizeof (struct sg_io_hdr));
+io_hdr.interface_id = 'S';
+io_hdr.cmd_len = sizeof (cdb);
+io_hdr.mx_sb_len = sizeof (sb);
+io_hdr.dxfer_direction = SG_DXFER_FROM_DEV;
+io_hdr.dxfer_len = sizeof (results);
+io_hdr.dxferp = results;
+io_hdr.cmdp = cdb;
+io_hdr.sbp = sb;
+io_hdr.timeout = SG_TIMEOUT;
+io_hdr.pack_id = 0;
+if (ioctl(fd, SG_IO, &amp;io_hdr) &lt; 0) {
+pp_netapp_log(0, "ioctl sending inquiry command failed, "
+"errno=%d", errno);
+dump_cdb(cdb, sizeof(cdb));
+goto out;
+}
+if (io_hdr.info &amp; SG_INFO_OK_MASK) {
+pp_netapp_log(0, "SCSI error");
+dump_cdb(cdb, sizeof(cdb));
+process_sg_error(&amp;io_hdr);
+goto out;
+}
+
+if (results[1] != 0xc1 || results[8] != 0x0a ||
+    results[9] != 0x98 || results[10] != 0x0a ||
+    results[11] != 0x0 || results[12] != 0xc1 ||
+    results[13] != 0x0) {
+pp_netapp_log(0,"proxy info page in unknown format - ");
+pp_netapp_log(0,"results[8-13]=0x%02x 0x%02x 0x%02x 0x%02x "
+"0x%02x 0x%02x",
+results[8], results[9], results[10],
+results[11], results[12], results[13]);
+dump_cdb(cdb, sizeof(cdb));
+goto out;
+}
+ret = (results[19] &amp; 0x02) &gt;&gt; 1;
+
+ out:
+return(ret);
+}
+
+/*
+ * Returns priority of device based on device info.
+ *
+ * 4: FCP non-proxy, FCP proxy unknown, or unable to determine protocol
+ * 3: iSCSI HBA
+ * 2: iSCSI software
+ * 1: FCP proxy
+ */
+static int netapp_prio(const char *dev, int fd)
+{
+unsigned char results[RESULTS_MAX];
+int results_size=RESULTS_MAX;
+int rc;
+int is_proxy;
+int is_iscsi_software;
+int is_iscsi_hardware;
+int tot_len;
+
+is_iscsi_software = is_iscsi_hardware = is_proxy = 0;
+
+memset(&amp;results, 0, sizeof (results));
+rc = send_gva(dev, fd, 0x41, results, &amp;results_size);
+if (rc == 0) {
+tot_len = results[0] &lt;&lt; 24 | results[1] &lt;&lt; 16 |
+  results[2] &lt;&lt; 8 | results[3];
+if (tot_len &lt;= 8) {
+goto try_fcp_proxy;
+}
+if (results[8] != 0x41) {
+pp_netapp_log(0, "GVA page 0x41 error - "
+"results[8] = 0x%x", results[8]);
+goto try_fcp_proxy;
+}
+if ((strncmp((char *)&amp;results[12], "ism_sw", 6) == 0) ||
+    (strncmp((char *)&amp;results[12], "iswt", 4) == 0)) {
+is_iscsi_software = 1;
+goto prio_select;
+}
+else if (strncmp((char *)&amp;results[12], "ism_sn", 6) == 0) {
+is_iscsi_hardware = 1;
+goto prio_select;
+}
+}
+
+ try_fcp_proxy:
+rc = get_proxy(dev, fd);
+if (rc &gt;= 0) {
+is_proxy = rc;
+}
+
+ prio_select:
+if (is_iscsi_hardware) {
+return 3;
+} else if (is_iscsi_software) {
+return 2;
+} else {
+if (is_proxy) {
+return 1;
+} else {
+/* Either non-proxy, or couldn't get proxy info */
+return 4;
+}
+}
+}
+
+int getprio (struct path * pp)
+{
+return netapp_prio(pp-&gt;dev, pp-&gt;fd);
+}
diff --git a/libmultipath/prioritizers/ontap.h b/libmultipath/prioritizers/ontap.h
new file mode 100644
index 0000000..ec38821
--- /dev/null
+++ b/libmultipath/prioritizers/ontap.h
@@ -0,0 +1,7 @@
+#ifndef _NETAPP_H
+#define _NETAPP_H
+
+#define PRIO_NETAPP "netapp"
+int prio_netapp(struct path * pp);
+
+#endif
</description>
    <dc:creator>Ritesh Raj Sarraf</dc:creator>
    <dc:date>2008-11-20T09:59:16</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6846">
    <title>[Patch 2 of 2]: Cluster-aware dirty log - log module</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6846</link>
    <description>Please note that I have not yet tested this module (outside of
compiling) with the latest kernel.  The only concern I have is with the
newly refactored region hash code.  I'll probably get to testing this
next week.

brassow

This patch contains a cluster-aware log module.  When used
by dm-raid1, device-mapper mirroring can be cluster-aware.

There is a kernel component (provided in this patch) and a
user space component.  The kernel component implements the
logging interface and passes all requests to userspace via
'connector' (a netlink wrapper).  The userspace daemon is
built upon OpenAIS for cluster communication and is fault
tolerant.

The logging interface code (found in dm-clog.c) is fairly
simple and should be close to completion.  I would really
enjoy feedback on:
1) Connector code (found in dm-clog-tfr.c)
2) Suggestions on possible break-ups of dm-clog-tfr.h,
   since some of that code is meant to be used by both
   userspace and kernel space, and some is meant only
   for kernel space...  (I think the code meant for both
   should go in include/linux, while the code just meant
   for kernel space should go in drivers/md... yes?)

Still-needs-testing-by: Jonathan Brassow &lt;jbrassow@redhat.com&gt;
Index: linux-2.6/drivers/md/dm-clog-tfr.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/md/dm-clog-tfr.c
@@ -0,0 +1,272 @@
+/*
+ * Copyright (C) 2006-2008 Red Hat, Inc.
+ *
+ * This file is released under the LGPL.
+ */
+
+#include &lt;linux/kernel.h&gt;
+#include &lt;linux/module.h&gt;
+#include &lt;net/sock.h&gt;
+#include &lt;linux/workqueue.h&gt;
+#include &lt;linux/connector.h&gt;
+#include &lt;linux/device-mapper.h&gt;
+
+#include &lt;linux/dm-cluster-log.h&gt;
+#include "dm-clog-tfr.h"
+
+#include &lt;asm/div64.h&gt; /* Unnecessary */
+
+#define SHORT_UUID(x) ((strlen(x) &gt; 8) ? ((x) + (strlen(x) - 8)) : (x))
+
+static uint32_t seq = 0;
+
+/*
+ * Pre-allocated space for speed
+ */
+#define DM_CLOG_PREALLOCED_SIZE 512
+static struct cn_msg *prealloced_cn_msg = NULL;
+static struct clog_tfr *prealloced_clog_tfr = NULL;
+
+static struct cb_id cn_clog_id = { 0x4, 0x1 };
+static DEFINE_MUTEX(_lock);
+
+struct receiving_pkg {
+struct list_head list;
+struct completion complete;
+
+uint32_t seq;
+
+int error;
+int *data_size;
+char *data;
+};
+
+static DEFINE_SPINLOCK(receiving_list_lock);
+static struct list_head receiving_list;
+
+static int dm_clog_sendto_server(struct clog_tfr *tfr)
+{
+int r;
+int size;
+struct cn_msg *msg = prealloced_cn_msg;
+
+if (tfr != prealloced_clog_tfr) {
+size = sizeof(struct cn_msg) +
+sizeof(struct clog_tfr) + tfr-&gt;data_size;
+msg = kmalloc(size, GFP_NOIO);
+if (!msg)
+return -ENOMEM;
+memcpy((msg + 1), tfr,
+       sizeof(struct clog_tfr) + tfr-&gt;data_size);
+}
+
+memset(msg, 0, sizeof(struct cn_msg));
+
+msg-&gt;id.idx = cn_clog_id.idx;
+msg-&gt;id.val = cn_clog_id.val;
+msg-&gt;ack = 0;
+msg-&gt;seq = tfr-&gt;seq;
+msg-&gt;len = sizeof(struct clog_tfr) + tfr-&gt;data_size;
+
+r = cn_netlink_send(msg, 0, gfp_any());
+
+if (msg != prealloced_cn_msg)
+kfree(msg);
+
+return r;
+}
+
+/*
+ * fill_pkg
+ * @msg
+ * @tfr
+ *
+ * Parameters can be either msg or tfr, but not both.  This
+ * function fills in the reply for a waiting request.  If just
+ * msg is given, then the reply is simply an ACK from userspace
+ * that the request was received.
+ *
+ * Returns: 0 on success, -ENOENT on failure
+ */
+static int fill_pkg(struct cn_msg *msg, struct clog_tfr *tfr)
+{
+uint32_t rtn_seq = (msg) ? msg-&gt;seq : (tfr) ? tfr-&gt;seq : 0;
+struct receiving_pkg *pkg;
+
+list_for_each_entry(pkg, &amp;receiving_list, list) {
+if (rtn_seq != pkg-&gt;seq)
+continue;
+
+if (msg) {
+pkg-&gt;error = -msg-&gt;ack;
+/*
+ * If we are trying again, we will need to know our
+ * storage capacity.  Otherwise, along with the
+ * error code, we make explicit that we have no data.
+ */
+if (pkg-&gt;error != -EAGAIN)
+*(pkg-&gt;data_size) = 0;
+} else if (tfr-&gt;data_size &gt; *(pkg-&gt;data_size)) {
+DMERR("Insufficient space to receive package [%s]::",
+      RQ_TYPE(tfr-&gt;request_type));
+DMERR("  tfr-&gt;data_size    = %u", tfr-&gt;data_size);
+DMERR("  *(pkg-&gt;data_size) = %u", *(pkg-&gt;data_size));
+
+*(pkg-&gt;data_size) = 0;
+pkg-&gt;error = -ENOSPC;
+} else {
+pkg-&gt;error = tfr-&gt;error;
+memcpy(pkg-&gt;data, tfr-&gt;data, tfr-&gt;data_size);
+*(pkg-&gt;data_size) = tfr-&gt;data_size;
+}
+complete(&amp;pkg-&gt;complete);
+return 0;
+}
+
+return -ENOENT;
+}
+
+/*
+ * cn_clog_callback
+ * @data
+ *
+ * This is the connector callback that delivers data
+ * that was sent from userspace.
+ */
+static void cn_clog_callback(void *data)
+{
+struct cn_msg *msg = (struct cn_msg *)data;
+struct clog_tfr *tfr = (struct clog_tfr *)(msg + 1);
+
+spin_lock(&amp;receiving_list_lock);
+if (msg-&gt;len == 0)
+fill_pkg(msg, NULL);
+else if (msg-&gt;len &lt; sizeof(*tfr))
+DMERR("Incomplete message received: [%u]", msg-&gt;seq);
+else
+fill_pkg(NULL, tfr);
+spin_unlock(&amp;receiving_list_lock);
+}
+
+/*
+ * dm_clog_consult_server
+ * @uuid: log's uuid (must be DM_UUID_LEN in size)
+ * @request_type:
+ * @data: data to tx to the server
+ * @data_size: size of data in bytes
+ * @rdata: place to put return data from server
+ * @rdata_size: value-result (amount of space given/amount of space used)
+ *
+ * Only one process at a time can communicate with the server.
+ * rdata_size is undefined on failure.
+ *
+ * Returns: 0 on success, -EXXX on failure
+ */
+int dm_clog_consult_server(const char *uuid, int request_type,
+   char *data, int data_size,
+   char *rdata, int *rdata_size)
+{
+int r = 0;
+int dummy = 0;
+int overhead_size = sizeof(struct clog_tfr) + sizeof(struct cn_msg);
+struct clog_tfr *tfr = prealloced_clog_tfr;
+struct receiving_pkg pkg;
+
+if (data_size &gt; (DM_CLOG_PREALLOCED_SIZE - overhead_size)) {
+DMINFO("Size of tfr exceeds preallocated size");
+/* FIXME: is kmalloc sufficient if we need this much space? */
+tfr = kzalloc(data_size + sizeof(*tfr), GFP_NOIO);
+}
+
+if (!tfr)
+return -ENOMEM;
+
+if (!rdata_size)
+rdata_size = &amp;dummy;
+resend:
+/*
+ * We serialize the sending of requests so we can
+ * use the preallocated space.
+ */
+mutex_lock(&amp;_lock);
+
+memset(tfr, 0, DM_CLOG_PREALLOCED_SIZE - sizeof(struct cn_msg));
+memcpy(tfr-&gt;uuid, uuid, DM_UUID_LEN);
+tfr-&gt;seq = seq++;
+tfr-&gt;request_type = request_type;
+tfr-&gt;data_size = data_size;
+if (data &amp;&amp; data_size)
+memcpy(tfr-&gt;data, data, data_size);
+
+memset(&amp;pkg, 0, sizeof(pkg));
+init_completion(&amp;pkg.complete);
+pkg.seq = tfr-&gt;seq;
+pkg.data_size = rdata_size;
+pkg.data = rdata;
+spin_lock(&amp;receiving_list_lock);
+list_add(&amp;(pkg.list), &amp;receiving_list);
+spin_unlock(&amp;receiving_list_lock);
+
+r = dm_clog_sendto_server(tfr);
+
+mutex_unlock(&amp;_lock);
+
+if (r) {
+DMERR("Unable to send cluster log request [%s] to server: %d",
+      RQ_TYPE(request_type), r);
+spin_lock(&amp;receiving_list_lock);
+list_del_init(&amp;(pkg.list));
+spin_unlock(&amp;receiving_list_lock);
+
+goto out;
+}
+
+r = wait_for_completion_timeout(&amp;(pkg.complete), 15 * HZ);
+spin_lock(&amp;receiving_list_lock);
+list_del_init(&amp;(pkg.list));
+spin_unlock(&amp;receiving_list_lock);
+if (!r) {
+DMWARN("[%s] Request timed out: [%s/%u] - retrying",
+       SHORT_UUID(uuid), RQ_TYPE(request_type), pkg.seq);
+goto resend;
+}
+
+r = pkg.error;
+if (r == -EAGAIN)
+goto resend;
+
+out:
+if (tfr != (struct clog_tfr *)prealloced_clog_tfr)
+kfree(tfr);
+
+return r;
+}
+
+int dm_clog_tfr_init(void)
+{
+int r;
+void *prealloced;
+
+INIT_LIST_HEAD(&amp;receiving_list);
+
+prealloced = kmalloc(DM_CLOG_PREALLOCED_SIZE, GFP_KERNEL);
+if (!prealloced)
+return -ENOMEM;
+
+prealloced_cn_msg = prealloced;
+prealloced_clog_tfr = prealloced + sizeof(struct cn_msg);
+
+r = cn_add_callback(&amp;cn_clog_id, "clulog", cn_clog_callback);
+if (r) {
+kfree(prealloced_cn_msg);
+return r;
+}
+
+return 0;
+}
+
+void dm_clog_tfr_exit(void)
+{
+cn_del_callback(&amp;cn_clog_id);
+kfree(prealloced_cn_msg);
+}
Index: linux-2.6/drivers/md/dm-clog-tfr.h
===================================================================
--- /dev/null
+++ linux-2.6/drivers/md/dm-clog-tfr.h
@@ -0,0 +1,18 @@
+/*
+ * Copyright (C) 2006-2008 Red Hat, Inc.
+ *
+ * This file is released under the LGPL.
+ */
+
+#ifndef __DM_CLOG_TFR_H__
+#define __DM_CLOG_TFR_H__
+
+#define DM_MSG_PREFIX "dm-log-clustered"
+
+int dm_clog_tfr_init(void);
+void dm_clog_tfr_exit(void);
+int dm_clog_consult_server(const char *uuid, int request_type,
+   char *data, int data_size,
+   char *rdata, int *rdata_size);
+
+#endif /* __DM_CLOG_TFR_H__ */
Index: linux-2.6/drivers/md/dm-clog.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/md/dm-clog.c
@@ -0,0 +1,786 @@
+/*
+ * Copyright (C) 2006-2008 Red Hat, Inc.
+ *
+ * This file is released under the LGPL.
+ */
+
+#include &lt;linux/blkdev.h&gt; /* for sector_div, which is used in dm-dirty-log.h */
+#include &lt;linux/bio.h&gt;
+#include &lt;linux/dm-dirty-log.h&gt;
+#include &lt;linux/device-mapper.h&gt;
+
+#include &lt;linux/dm-cluster-log.h&gt;
+#include "dm-clog-tfr.h"
+
+struct flush_entry {
+int type;
+region_t region;
+struct list_head list;
+};
+
+struct log_c {
+struct dm_target *ti;
+uint32_t region_size;
+region_t region_count;
+char uuid[DM_UUID_LEN];
+
+char *ctr_str; /* Gives ability to restart if userspace dies */
+uint32_t ctr_size;
+
+/*
+ * in_sync_hint gets set when doing is_remote_recovering.  It
+ * represents the first region that needs recovery.  IOW, the
+ * first zero bit of sync_bits.  This can be useful to limit
+ * traffic for calls like is_remote_recovering and get_resync_work,
+ * but take care in its use for anything else.
+ */
+uint64_t in_sync_hint;
+
+spinlock_t flush_lock;
+struct list_head flush_list;  /* only for clear and mark requests */
+
+struct dm_dev *disk_log;
+};
+
+static mempool_t *flush_entry_pool = NULL;
+
+static void *flush_entry_alloc(gfp_t gfp_mask, void *pool_data)
+{
+return kmalloc(sizeof(struct flush_entry), gfp_mask);
+}
+
+static void flush_entry_free(void *element, void *pool_data)
+{
+kfree(element);
+}
+
+int cluster_do_request(struct log_c *lc, const char *uuid, int request_type,
+       char *data, int data_size, char *rdata, int *rdata_size)
+{
+int r;
+
+/*
+ * If the server isn't there, -ESRCH is returned,
+ * and we must keep trying until the server is
+ * restored.
+ */
+retry:
+r = dm_clog_consult_server(uuid, request_type, data,
+   data_size, rdata, rdata_size);
+
+if (r != -ESRCH)
+return r;
+
+DMERR(" Userspace cluster log server not found.");
+while (1) {
+set_current_state(TASK_INTERRUPTIBLE);
+schedule_timeout(2*HZ);
+DMWARN("Attempting to contact cluster log server...");
+r = dm_clog_consult_server(uuid, DM_CLOG_CTR, lc-&gt;ctr_str,
+   lc-&gt;ctr_size, NULL, NULL);
+if (!r)
+break;
+}
+DMINFO("Reconnected to cluster log server... CTR complete");
+r = dm_clog_consult_server(uuid, DM_CLOG_RESUME, NULL,
+   0, NULL, NULL);
+if (!r)
+goto retry;
+
+DMERR("Error trying to resume cluster log: %d", r);
+
+return -ESRCH;
+}
+
+static int cluster_ctr(struct dm_dirty_log *log, struct dm_target *ti,
+       unsigned int argc, char **argv,
+       struct dm_dev *disk_log)
+{
+int i;
+int r = 0;
+int str_size;
+int offset = (disk_log) ? 1 : 0;
+char *ctr_str = NULL;
+struct log_c *lc = NULL;
+uint32_t region_size;
+region_t region_count;
+
+/* Already checked argument count */
+
+if (sscanf(argv[offset], "%u", &amp;region_size) != 1) {
+DMWARN("Invalid region size string");
+return -EINVAL;
+}
+
+region_count = dm_sector_div_up(ti-&gt;len, region_size);
+
+lc = kmalloc(sizeof(*lc), GFP_KERNEL);
+if (!lc) {
+DMWARN("Unable to allocate cluster log context.");
+return -ENOMEM;
+}
+
+lc-&gt;ti = ti;
+lc-&gt;region_size = region_size;
+lc-&gt;region_count = region_count;
+lc-&gt;disk_log = disk_log;
+
+/* FIXME: Need to reject uuid args of DM_UUID_LEN or more characters */
+strncpy(lc-&gt;uuid, argv[1 + offset], DM_UUID_LEN);
+spin_lock_init(&amp;lc-&gt;flush_lock);
+INIT_LIST_HEAD(&amp;lc-&gt;flush_list);
+
+for (i = 0, str_size = 0; i &lt; argc; i++)
+str_size += strlen(argv[i]) + 1; /* +1 for space between args */
+
+str_size += 20; /* Max number of chars in a printed u64 number */
+
+ctr_str = kzalloc(str_size, GFP_KERNEL);
+if (!ctr_str) {
+DMWARN("Unable to allocate memory for constructor string");
+kfree(lc);
+return -ENOMEM;
+}
+
+for (i = 0, str_size = 0; i &lt; argc; i++)
+str_size += sprintf(ctr_str + str_size, "%s ", argv[i]);
+str_size += sprintf(ctr_str + str_size, "%llu",
+    (unsigned long long)ti-&gt;len);
+
+/* Send table string */
+r = dm_clog_consult_server(lc-&gt;uuid, DM_CLOG_CTR,
+   ctr_str, str_size, NULL, NULL);
+
+if (r == -ESRCH)
+DMERR(" Userspace cluster log server not found");
+
+if (r) {
+kfree(lc);
+kfree(ctr_str);
+} else {
+lc-&gt;ctr_str = ctr_str;
+lc-&gt;ctr_size = str_size;
+log-&gt;context = lc;
+}
+
+return r;
+}
+
+/*
+ * cluster_core_ctr
+ * @log
+ * @ti
+ * @argc
+ * @argv
+ *
+ * argv contains:
+ *   &lt;region_size&gt; &lt;uuid&gt; [[no]sync]
+ *
+ * Returns: 0 on success, -XXX on failure
+ */
+static int cluster_core_ctr(struct dm_dirty_log *log, struct dm_target *ti,
+    unsigned int argc, char **argv)
+{
+int i, r;
+if ((argc &lt; 2) || (argc &gt; 3)) {
+DMERR("Too %s arguments to clustered-core mirror log type.",
+      (argc &lt; 2) ? "few" : "many");
+DMERR("  %d arguments supplied:", argc);
+for (i = 0; i &lt; argc; i++)
+DMERR("    %s", argv[i]);
+return -EINVAL;
+}
+
+r = cluster_ctr(log, ti, argc, argv, NULL);
+
+return r;
+}
+
+
+/*
+ * cluster_disk_ctr
+ * @log
+ * @ti
+ * @argc
+ * @argv
+ *
+ * argv contains:
+ *   &lt;disk&gt; &lt;region_size&gt; &lt;uuid&gt; [[no]sync]
+ *
+ * Returns: 0 on success, -XXX on failure
+ */
+static int cluster_disk_ctr(struct dm_dirty_log *log, struct dm_target *ti,
+    unsigned int argc, char **argv)
+{
+int r, i;
+struct dm_dev *dev;
+
+if ((argc &lt; 3) || (argc &gt; 4)) {
+DMERR("Too %s arguments to clustered-disk mirror log type.",
+      (argc &lt; 3) ? "few" : "many");
+DMERR("  %d arguments supplied:", argc);
+for (i = 0; i &lt; argc; i++)
+DMERR("    %s", argv[i]);
+return -EINVAL;
+}
+
+r = dm_get_device(ti, argv[0], 0, 0, FMODE_READ | FMODE_WRITE, &amp;dev);
+if (r)
+return r;
+
+r = cluster_ctr(log, ti, argc, argv, dev);
+if (r)
+dm_put_device(ti, dev);
+
+return r;
+}
+
+/*
+ * cluster_dtr
+ * @log
+ */
+static void cluster_dtr(struct dm_dirty_log *log)
+{
+int r;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+
+r = dm_clog_consult_server(lc-&gt;uuid, DM_CLOG_DTR,
+   NULL, 0,
+   NULL, NULL);
+
+if (lc-&gt;disk_log)
+dm_put_device(lc-&gt;ti, lc-&gt;disk_log);
+kfree(lc-&gt;ctr_str);
+kfree(lc);
+
+return;
+}
+
+/*
+ * cluster_presuspend
+ * @log
+ */
+static int cluster_presuspend(struct dm_dirty_log *log)
+{
+int r;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+
+r = dm_clog_consult_server(lc-&gt;uuid, DM_CLOG_PRESUSPEND,
+   NULL, 0,
+   NULL, NULL);
+
+return r;
+}
+
+/*
+ * cluster_postsuspend
+ * @log
+ */
+static int cluster_postsuspend(struct dm_dirty_log *log)
+{
+int r;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+
+r = dm_clog_consult_server(lc-&gt;uuid, DM_CLOG_POSTSUSPEND,
+   NULL, 0,
+   NULL, NULL);
+
+return r;
+}
+
+/*
+ * cluster_resume
+ * @log
+ */
+static int cluster_resume(struct dm_dirty_log *log)
+{
+int r;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+
+lc-&gt;in_sync_hint = 0;
+r = dm_clog_consult_server(lc-&gt;uuid, DM_CLOG_RESUME,
+   NULL, 0,
+   NULL, NULL);
+
+return r;
+}
+
+/*
+ * cluster_get_region_size
+ * @log
+ *
+ * Only called during mirror construction, ok to block.
+ *
+ * Returns: region size (doesn't fail)
+ */
+static uint32_t cluster_get_region_size(struct dm_dirty_log *log)
+{
+struct log_c *lc = (struct log_c *)log-&gt;context;
+
+return lc-&gt;region_size;
+}
+
+/*
+ * cluster_is_clean
+ * @log
+ * @region
+ *
+ * Check whether a region is clean.  If there is any sort of
+ * failure when consulting the server, we return not clean.
+ *
+ * Returns: 1 if clean, 0 otherwise
+ */
+static int cluster_is_clean(struct dm_dirty_log *log, region_t region)
+{
+int r;
+int is_clean;
+int rdata_size;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+
+rdata_size = sizeof(is_clean);
+r = cluster_do_request(lc, lc-&gt;uuid, DM_CLOG_IS_CLEAN,
+       (char *)&amp;region, sizeof(region),
+       (char *)&amp;is_clean, &amp;rdata_size);
+
+return (r) ? 0 : is_clean;
+}
+
+/*
+ * cluster_in_sync
+ * @log
+ * @region
+ * @can_block: if not set, return -EWOULDBLOCK immediately
+ *
+ * Check if the region is in-sync.  If there is any sort
+ * of failure when consulting the server, we assume that
+ * the region is not in sync.
+ *
+ * Returns: 1 if in-sync, 0 if not-in-sync, -EWOULDBLOCK
+ */
+static int cluster_in_sync(struct dm_dirty_log *log, region_t region, int can_block)
+{
+int r;
+int in_sync;
+int rdata_size;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+
+/*
+ * We can never respond directly - even if in_sync_hint is
+ * set.  This is because another machine could see a device
+ * failure and mark the region out-of-sync.  If we don't go
+ * to userspace to ask, we might think the region is in-sync
+ * and allow a read to pick up data that is stale.  (This is
+ * very unlikely if a device actually fails; but it is very
+ * likely if a connection to one device from one machine fails.)
+ *
+ * There still might be a problem if the mirror caches the region
+ * state as in-sync... but then this call would not be made.  So,
+ * that is a mirror problem.
+ */
+if (!can_block)
+return -EWOULDBLOCK;
+
+rdata_size = sizeof(in_sync);
+r = cluster_do_request(lc, lc-&gt;uuid, DM_CLOG_IN_SYNC,
+       (char *)&amp;region, sizeof(region),
+       (char *)&amp;in_sync, &amp;rdata_size);
+return (r) ? 0 : in_sync;
+}
+
+/*
+ * cluster_flush
+ * @log
+ *
+ * This function is ok to block.
+ * The flush happens in two stages.  First, it sends all
+ * clear/mark requests that are on the list.  Then it
+ * tells the server to commit them.  This gives the
+ * server a chance to optimise the commit to the cluster
+ * and/or disk, instead of doing it for every request.
+ *
+ * Additionally, we could implement another thread that
+ * sends the requests up to the server - reducing the
+ * load on flush.  Then the flush would have less in
+ * the list and be responsible for the finishing commit.
+ *
+ * Returns: 0 on success, &lt; 0 on failure
+ */
+static int cluster_flush(struct dm_dirty_log *log)
+{
+int r = 0;
+unsigned long flags;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+LIST_HEAD(flush_list);
+struct flush_entry *fe, *tmp_fe;
+
+spin_lock_irqsave(&amp;lc-&gt;flush_lock, flags);
+list_splice_init(&amp;lc-&gt;flush_list, &amp;flush_list);
+spin_unlock_irqrestore(&amp;lc-&gt;flush_lock, flags);
+
+if (list_empty(&amp;flush_list))
+return 0;
+
+/*
+ * FIXME: Count up requests, group request types,
+ * allocate memory to stick all requests in and
+ * send to server in one go.  Failing the allocation,
+ * do it one by one.
+ */
+
+list_for_each_entry(fe, &amp;flush_list, list) {
+r = cluster_do_request(lc, lc-&gt;uuid, fe-&gt;type,
+       (char *)&amp;fe-&gt;region,
+       sizeof(fe-&gt;region),
+       NULL, NULL);
+if (r)
+goto fail;
+}
+
+r = cluster_do_request(lc, lc-&gt;uuid, DM_CLOG_FLUSH,
+       NULL, 0, NULL, NULL);
+
+fail:
+/*
+ * We can safely remove these entries, even if failure.
+ * Calling code will receive an error and will know that
+ * the log facility has failed.
+ */
+list_for_each_entry_safe(fe, tmp_fe, &amp;flush_list, list) {
+list_del(&amp;fe-&gt;list);
+mempool_free(fe, flush_entry_pool);
+}
+
+if (r)
+dm_table_event(lc-&gt;ti-&gt;table);
+
+return r;
+}
+
+/*
+ * cluster_mark_region
+ * @log
+ * @region
+ *
+ * This function should avoid blocking unless absolutely required.
+ * (Memory allocation is valid for blocking.)
+ */
+static void cluster_mark_region(struct dm_dirty_log *log, region_t region)
+{
+unsigned long flags;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+struct flush_entry *fe;
+
+/* Wait for an allocation, but _never_ fail */
+fe = mempool_alloc(flush_entry_pool, GFP_NOIO);
+BUG_ON(!fe);
+
+spin_lock_irqsave(&amp;lc-&gt;flush_lock, flags);
+fe-&gt;type = DM_CLOG_MARK_REGION;
+fe-&gt;region = region;
+list_add(&amp;fe-&gt;list, &amp;lc-&gt;flush_list);
+spin_unlock_irqrestore(&amp;lc-&gt;flush_lock, flags);
+
+return;
+}
+
+/*
+ * cluster_clear_region
+ * @log
+ * @region
+ *
+ * This function must not block.
+ * So, the alloc can't block.  In the worst case, it is ok to
+ * fail.  It would simply mean we can't clear the region.
+ * Does nothing to current sync context, but does mean
+ * the region will be re-sync'ed on a reload of the mirror
+ * even though it is in-sync.
+ */
+static void cluster_clear_region(struct dm_dirty_log *log, region_t region)
+{
+unsigned long flags;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+struct flush_entry *fe;
+
+/*
+ * If we fail to allocate, we skip the clearing of
+ * the region.  This doesn't hurt us in any way, except
+ * to cause the region to be resync'ed when the
+ * device is activated next time.
+ */
+fe = mempool_alloc(flush_entry_pool, GFP_ATOMIC);
+if (!fe) {
+DMERR("Failed to allocate memory to clear region.");
+return;
+}
+
+spin_lock_irqsave(&amp;lc-&gt;flush_lock, flags);
+fe-&gt;type = DM_CLOG_CLEAR_REGION;
+fe-&gt;region = region;
+list_add(&amp;fe-&gt;list, &amp;lc-&gt;flush_list);
+spin_unlock_irqrestore(&amp;lc-&gt;flush_lock, flags);
+
+return;
+}
+
+/*
+ * cluster_get_resync_work
+ * @log
+ * @region
+ *
+ * Get a region that needs recovery.  It is valid to return
+ * an error for this function.
+ *
+ * Returns: 1 if region filled, 0 if no work, &lt;0 on error
+ */
+static int cluster_get_resync_work(struct dm_dirty_log *log, region_t *region)
+{
+int r;
+int rdata_size;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+struct { int i; region_t r; } pkg;
+
+if (lc-&gt;in_sync_hint &gt;= lc-&gt;region_count)
+return 0;
+
+rdata_size = sizeof(pkg);
+r = cluster_do_request(lc, lc-&gt;uuid, DM_CLOG_GET_RESYNC_WORK,
+       NULL, 0,
+       (char *)&amp;pkg, &amp;rdata_size);
+
+*region = pkg.r;
+return (r) ? r : pkg.i;
+}
+
+/*
+ * cluster_set_region_sync
+ * @log
+ * @region
+ * @in_sync
+ *
+ * Set the sync status of a given region.  This function
+ * must not fail.
+ */
+static void cluster_set_region_sync(struct dm_dirty_log *log,
+    region_t region, int in_sync)
+{
+int r;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+struct { region_t r; int i; } pkg;
+
+pkg.r = region;
+pkg.i = in_sync;
+
+r = cluster_do_request(lc, lc-&gt;uuid, DM_CLOG_SET_REGION_SYNC,
+       (char *)&amp;pkg, sizeof(pkg),
+       NULL, NULL);
+
+/*
+ * It would be nice to be able to report failures.
+ * However, it is easy enough to detect and resolve.
+ */
+return;
+}
+
+/*
+ * cluster_get_sync_count
+ * @log
+ *
+ * If there is any sort of failure when consulting the server,
+ * we assume that the sync count is zero.
+ *
+ * Returns: sync count on success, 0 on failure
+ */
+static region_t cluster_get_sync_count(struct dm_dirty_log *log)
+{
+int r;
+int rdata_size;
+region_t sync_count;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+
+rdata_size = sizeof(sync_count);
+r = cluster_do_request(lc, lc-&gt;uuid, DM_CLOG_GET_SYNC_COUNT,
+       NULL, 0,
+       (char *)&amp;sync_count, &amp;rdata_size);
+
+if (r)
+return 0;
+
+if (sync_count &gt;= lc-&gt;region_count)
+lc-&gt;in_sync_hint = lc-&gt;region_count;
+
+return sync_count;
+}
+
+/*
+ * cluster_status
+ * @log
+ * @status_type
+ * @result
+ * @maxlen
+ *
+ * Returns: amount of space consumed
+ */
+static int cluster_status(struct dm_dirty_log *log, status_type_t status_type,
+  char *result, unsigned int maxlen)
+{
+int r = 0;
+unsigned int sz = maxlen;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+
+switch(status_type) {
+case STATUSTYPE_INFO:
+r = cluster_do_request(lc, lc-&gt;uuid, DM_CLOG_STATUS_INFO,
+       NULL, 0,
+       result, &amp;sz);
+/*
+ * FIXME: If we fail to contact server, we should still
+ * populate this with parsable results
+ */
+break;
+case STATUSTYPE_TABLE:
+r = cluster_do_request(lc, lc-&gt;uuid, DM_CLOG_STATUS_TABLE,
+       NULL, 0,
+       result, &amp;sz);
+break;
+}
+return (r) ? 0: sz;
+}
+
+/*
+ * cluster_is_remote_recovering
+ * @log
+ * @region
+ *
+ * Returns: 1 if region recovering, 0 otherwise
+ */
+static int cluster_is_remote_recovering(struct dm_dirty_log *log,
+region_t region)
+{
+int r;
+struct log_c *lc = (struct log_c *)log-&gt;context;
+static unsigned long long limit = 0;
+struct { int is_recovering; uint64_t in_sync_hint; } pkg;
+int rdata_size = sizeof(pkg);
+
+/*
+ * Once the mirror has been reported to be in-sync,
+ * it will never again ask for recovery work.  So,
+ * we can safely say there is not a remote machine
+ * recovering if the device is in-sync.  (in_sync_hint
+ * must be reset at resume time.)
+ */
+if (region &lt; lc-&gt;in_sync_hint)
+return 0;
+else if (jiffies &lt; limit)
+return 1;
+
+limit = jiffies + (HZ / 4);
+r = cluster_do_request(lc, lc-&gt;uuid, DM_CLOG_IS_REMOTE_RECOVERING,
+       (char *)&amp;region, sizeof(region),
+       (char *)&amp;pkg, &amp;rdata_size);
+if (r)
+return 1;
+
+lc-&gt;in_sync_hint = pkg.in_sync_hint;
+
+return pkg.is_recovering;
+}
+
+static struct dm_dirty_log_type _clustered_core_type = {
+.name = "clustered-core",
+.module = THIS_MODULE,
+.ctr = cluster_core_ctr,
+.dtr = cluster_dtr,
+.presuspend = cluster_presuspend,
+.postsuspend = cluster_postsuspend,
+.resume = cluster_resume,
+.get_region_size = cluster_get_region_size,
+.is_clean = cluster_is_clean,
+.in_sync = cluster_in_sync,
+.flush = cluster_flush,
+.mark_region = cluster_mark_region,
+.clear_region = cluster_clear_region,
+.get_resync_work = cluster_get_resync_work,
+.set_region_sync = cluster_set_region_sync,
+.get_sync_count = cluster_get_sync_count,
+.status = cluster_status,
+.is_remote_recovering = cluster_is_remote_recovering,
+};
+
+static struct dm_dirty_log_type _clustered_disk_type = {
+.name = "clustered-disk",
+.module = THIS_MODULE,
+.ctr = cluster_disk_ctr,
+.dtr = cluster_dtr,
+.presuspend = cluster_presuspend,
+.postsuspend = cluster_postsuspend,
+.resume = cluster_resume,
+.get_region_size = cluster_get_region_size,
+.is_clean = cluster_is_clean,
+.in_sync = cluster_in_sync,
+.flush = cluster_flush,
+.mark_region = cluster_mark_region,
+.clear_region = cluster_clear_region,
+.get_resync_work = cluster_get_resync_work,
+.set_region_sync = cluster_set_region_sync,
+.get_sync_count = cluster_get_sync_count,
+.status = cluster_status,
+.is_remote_recovering = cluster_is_remote_recovering,
+};
+
+static int __init cluster_dirty_log_init(void)
+{
+int r = 0;
+
+flush_entry_pool = mempool_create(100, flush_entry_alloc,
+  flush_entry_free, NULL);
+
+if (!flush_entry_pool) {
+DMWARN("Unable to create flush_entry_pool:  No memory.");
+return -ENOMEM;
+}
+
+r = dm_clog_tfr_init();
+if (r) {
+DMWARN("Unable to initialize cluster log communications");
+mempool_destroy(flush_entry_pool);
+return r;
+}
+
+r = dm_dirty_log_type_register(&amp;_clustered_core_type);
+if (r) {
+DMWARN("Couldn't register clustered-core dirty log type");
+dm_clog_tfr_exit();
+mempool_destroy(flush_entry_pool);
+return r;
+}
+
+r = dm_dirty_log_type_register(&amp;_clustered_disk_type);
+if (r) {
+DMWARN("Couldn't register clustered-disk dirty log type");
+dm_dirty_log_type_unregister(&amp;_clustered_core_type);
+dm_clog_tfr_exit();
+mempool_destroy(flush_entry_pool);
+return r;
+}
+
+DMINFO("(built %s %s) installed", __DATE__, __TIME__);
+return 0;
+}
+
+static void __exit cluster_dirty_log_exit(void)
+{
+dm_dirty_log_type_unregister(&amp;_clustered_disk_type);
+dm_dirty_log_type_unregister(&amp;_clustered_core_type);
+dm_clog_tfr_exit();
+mempool_destroy(flush_entry_pool);
+DMINFO("(built %s %s) removed", __DATE__, __TIME__);
+return;
+}
+
+module_init(cluster_dirty_log_init);
+module_exit(cluster_dirty_log_exit);
+
+MODULE_DESCRIPTION(DM_NAME " cluster-aware dirty log");
+MODULE_AUTHOR("Jonathan Brassow &lt;dm-devel@redhat.com&gt;");
+MODULE_LICENSE("GPL");
Index: linux-2.6/drivers/md/Kconfig
===================================================================
--- linux-2.6.orig/drivers/md/Kconfig
+++ linux-2.6/drivers/md/Kconfig
@@ -264,6 +264,15 @@ config DM_MIRROR
          Allow volume managers to mirror logical volumes, also
          needed for live data migration tools such as 'pvmove'.
 
+config DM_CLOG
+       tristate "Mirror cluster logging (EXPERIMENTAL)"
+       depends on DM_MIRROR &amp;&amp; EXPERIMENTAL
+       ---help---
+         Cluster logging allows device-mapper mirroring to be
+         cluster-aware.  Mirror devices can be used by multiple
+         machines at the same time.  Note: this will not make
+         your applications cluster-aware.
+
 config DM_ZERO
 tristate "Zero target"
 depends on BLK_DEV_DM
Index: linux-2.6/drivers/md/Makefile
===================================================================
--- linux-2.6.orig/drivers/md/Makefile
+++ linux-2.6/drivers/md/Makefile
@@ -8,6 +8,7 @@ dm-multipath-objs := dm-path-selector.o 
 dm-snapshot-objs := dm-snap.o dm-exception-store.o
 dm-exstore-shared-objs := dm-ex-store-shared.o
 dm-mirror-objs:= dm-raid1.o
+dm-log-clustered-objs  := dm-clog.o dm-clog-tfr.o
 md-mod-objs     := md.o bitmap.o
 raid456-objs:= raid5.o raid6algos.o raid6recov.o raid6tables.o \
    raid6int1.o raid6int2.o raid6int4.o \
@@ -37,6 +38,7 @@ obj-$(CONFIG_DM_MULTIPATH)+= dm-multipa
 obj-$(CONFIG_DM_SNAPSHOT)+= dm-snapshot.o
 obj-$(CONFIG_DM_EXSTORE_SHARED)+= dm-exstore-shared.o
 obj-$(CONFIG_DM_MIRROR)+= dm-mirror.o dm-log.o dm-region-hash.o
+obj-$(CONFIG_DM_CLOG)+= dm-log-clustered.o
 obj-$(CONFIG_DM_ZERO)+= dm-zero.o
 
 quiet_cmd_unroll = UNROLL  $@
Index: linux-2.6/include/linux/dm-cluster-log.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/dm-cluster-log.h
@@ -0,0 +1,66 @@
+/*
+ * Copyright (C) 2006-2008 Red Hat, Inc.
+ *
+ * This file is released under the LGPL.
+ */
+
+#ifndef __DM_CLUSTER_LOG_H__
+#define __DM_CLUSTER_LOG_H__
+
+#include &lt;linux/dm-ioctl.h&gt; /* For DM_UUID_LEN */
+
+#define DM_CLOG_TFR_SIZE 1024
+
+#define DM_CLOG_CTR                    1
+#define DM_CLOG_DTR                    2
+#define DM_CLOG_PRESUSPEND             3
+#define DM_CLOG_POSTSUSPEND            4
+#define DM_CLOG_RESUME                 5
+#define DM_CLOG_GET_REGION_SIZE        6
+#define DM_CLOG_IS_CLEAN               7
+#define DM_CLOG_IN_SYNC                8
+#define DM_CLOG_FLUSH                  9
+#define DM_CLOG_MARK_REGION           10
+#define DM_CLOG_CLEAR_REGION          11
+#define DM_CLOG_GET_RESYNC_WORK       12
+#define DM_CLOG_SET_REGION_SYNC       13
+#define DM_CLOG_GET_SYNC_COUNT        14
+#define DM_CLOG_STATUS_INFO           15
+#define DM_CLOG_STATUS_TABLE          16
+#define DM_CLOG_IS_REMOTE_RECOVERING  17
+
+#define RQ_TYPE(x) \
+((x) == DM_CLOG_CTR) ? "DM_CLOG_CTR" : \
+((x) == DM_CLOG_DTR) ? "DM_CLOG_DTR" : \
+((x) == DM_CLOG_PRESUSPEND) ? "DM_CLOG_PRESUSPEND" : \
+((x) == DM_CLOG_POSTSUSPEND) ? "DM_CLOG_POSTSUSPEND" : \
+((x) == DM_CLOG_RESUME) ? "DM_CLOG_RESUME" : \
+((x) == DM_CLOG_GET_REGION_SIZE) ? "DM_CLOG_GET_REGION_SIZE" : \
+((x) == DM_CLOG_IS_CLEAN) ? "DM_CLOG_IS_CLEAN" : \
+((x) == DM_CLOG_IN_SYNC) ? "DM_CLOG_IN_SYNC" : \
+((x) == DM_CLOG_FLUSH) ? "DM_CLOG_FLUSH" : \
+((x) == DM_CLOG_MARK_REGION) ? "DM_CLOG_MARK_REGION" : \
+((x) == DM_CLOG_CLEAR_REGION) ? "DM_CLOG_CLEAR_REGION" : \
+((x) == DM_CLOG_GET_RESYNC_WORK) ? "DM_CLOG_GET_RESYNC_WORK" : \
+((x) == DM_CLOG_SET_REGION_SYNC) ? "DM_CLOG_SET_REGION_SYNC" : \
+((x) == DM_CLOG_GET_SYNC_COUNT) ? "DM_CLOG_GET_SYNC_COUNT" : \
+((x) == DM_CLOG_STATUS_INFO) ? "DM_CLOG_STATUS_INFO" : \
+((x) == DM_CLOG_STATUS_TABLE) ? "DM_CLOG_STATUS_TABLE" : \
+((x) == DM_CLOG_IS_REMOTE_RECOVERING) ? \
+"DM_CLOG_IS_REMOTE_RECOVERING" : NULL
+
+struct clog_tfr {
+uint64_t private[2];
+char uuid[DM_UUID_LEN]; /* Ties a request to a specific mirror log */
+
+int error;              /* Used by server to inform of errors */
+uint32_t originator;    /* Cluster ID of this machine */
+
+uint32_t seq;           /* Sequence number for request */
+uint32_t request_type;  /* DM_CLOG_* */
+uint32_t data_size;     /* How much data (not including this struct) */
+
+char data[0];
+};
+
+#endif /* __DM_CLUSTER_LOG_H__ */


</description>
    <dc:creator>Jonathan Brassow</dc:creator>
    <dc:date>2008-11-14T17:44:27</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6845">
    <title>[Patch 1 of 2]: Cluster-aware dirty log - Log API addition</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6845</link>
    <description>This patch can be considered by itself.  The second patch I am sending
will be the actual cluster log module, but these 2 patches don't
necessarily need to go together.

 brassow

The logging API needs an extra function to make cluster mirroring
possible.  This new function allows us to check whether a mirror
region is being recovered on another machine in the cluster.  This
helps us prevent simultaneous recovery I/O and process I/O to the
same locations on disk.

Cluster-aware log modules will implement this function.  Single
machine log modules will not.  So, there is no performance
penalty for single machine mirrors.
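
The optional-callback pattern described above can be sketched in
user space as follows.  This is only an illustration under stated
assumptions: struct dirty_log, struct dirty_log_type, the
recovering_region field and must_requeue() are hypothetical,
simplified stand-ins, not the kernel types touched by the patch.

```c
/* Hypothetical, simplified stand-ins for the kernel types. */
typedef unsigned long region_t;

struct dirty_log;

struct dirty_log_type {
	const char *name;
	/* Optional callback: single-machine logs leave it unset (NULL). */
	int (*is_remote_recovering)(struct dirty_log *log, region_t region);
};

struct dirty_log {
	const struct dirty_log_type *type;
	region_t recovering_region;	/* assumed: region a remote node recovers */
};

/* Cluster-aware implementation: reports one region as being recovered. */
static int cluster_is_remote_recovering(struct dirty_log *log, region_t region)
{
	return region == log->recovering_region;
}

/* Single-machine log type: the callback is simply not provided. */
static const struct dirty_log_type core_type = {
	.name = "core",
};

/* Cluster log type: provides the callback. */
static const struct dirty_log_type clustered_type = {
	.name = "clustered-core",
	.is_remote_recovering = cluster_is_remote_recovering,
};

/* Returns 1 if a write to the region has to be requeued, 0 otherwise. */
static int must_requeue(struct dirty_log *log, region_t region)
{
	if (!log->type->is_remote_recovering)
		return 0;	/* single-machine log: no call is made at all */
	return log->type->is_remote_recovering(log, region) ? 1 : 0;
}
```

With a log of type core_type, must_requeue() returns without making
any call, which is why single-machine mirrors pay nothing beyond one
pointer test.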

Signed-off-by: Jonathan Brassow &lt;jbrassow@redhat.com&gt;

Index: linux-2.6/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-raid1.c
+++ linux-2.6/drivers/md/dm-raid1.c
@@ -588,6 +588,9 @@ static void do_writes(struct mirror_set 
 int state;
 struct bio *bio;
 struct bio_list sync, nosync, recover, *this_list = NULL;
+struct bio_list requeue;
+struct dm_dirty_log *log = dm_rh_dirty_log(ms-&gt;rh);
+region_t region;
 
 if (!writes-&gt;head)
 return;
@@ -598,10 +601,18 @@ static void do_writes(struct mirror_set 
 bio_list_init(&amp;sync);
 bio_list_init(&amp;nosync);
 bio_list_init(&amp;recover);
+bio_list_init(&amp;requeue);
 
 while ((bio = bio_list_pop(writes))) {
-state = dm_rh_get_state(ms-&gt;rh,
-dm_rh_bio_to_region(ms-&gt;rh, bio), 1);
+region = dm_rh_bio_to_region(ms-&gt;rh, bio);
+
+if (log-&gt;type-&gt;is_remote_recovering &amp;&amp;
+    log-&gt;type-&gt;is_remote_recovering(log, region)) {
+bio_list_add(&amp;requeue, bio);
+continue;
+}
+
+state = dm_rh_get_state(ms-&gt;rh, region, 1);
 switch (state) {
 case DM_RH_CLEAN:
 case DM_RH_DIRTY:
@@ -621,6 +632,16 @@ static void do_writes(struct mirror_set 
 }
 
 /*
+ * Add bios that are delayed due to remote recovery
+ * back on to the write queue
+ */
+if (unlikely(requeue.head)) {
+spin_lock_irq(&amp;ms-&gt;lock);
+bio_list_merge(&amp;ms-&gt;writes, &amp;requeue);
+spin_unlock_irq(&amp;ms-&gt;lock);
+}
+
+/*
  * Increment the pending counts for any regions that will
  * be written to (writes to recover regions are going to
  * be delayed).
Index: linux-2.6/include/linux/dm-dirty-log.h
===================================================================
--- linux-2.6.orig/include/linux/dm-dirty-log.h
+++ linux-2.6/include/linux/dm-dirty-log.h
@@ -113,6 +113,16 @@ struct dm_dirty_log_type {
  */
 int (*status)(struct dm_dirty_log *log, status_type_t status_type,
       char *result, unsigned maxlen);
+
+/*
+ * is_remote_recovering is necessary for cluster mirroring. It provides
+ * a way to detect recovery on another node, so we aren't writing
+ * concurrently.  This function is likely to block (when a cluster log
+ * is used).
+ *
+ * Returns: 0, 1
+ */
+int (*is_remote_recovering)(struct dm_dirty_log *log, region_t region);
 };
 
 int dm_dirty_log_type_register(struct dm_dirty_log_type *type);


</description>
    <dc:creator>Jonathan Brassow</dc:creator>
    <dc:date>2008-11-14T17:44:21</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6843">
    <title>[PATCHES] new solution for dm_any_congested crash</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6843</link>
    <description>Hi

Chandra's patch was correct, but the problem is more serious (the same 
crash could happen in dm_merge_bvec, dm_unplug_all or in other dm 
code), so I had to rework reference counting.

There are three patches:
1. reverts Chandra's changes
2. just a little swap of two calls, to prepare for the third
3. the reference counting rework

Chandra, please test the patches on your system (without your original 
patch) and verify that they avoid the crashes as well as your patch does.

Mikulas

Revert dm-avoid-destroying-table-in-dm_any_congested.patch

The patch is correct; however, the problem is more serious.

The same lazy destructor calls can happen in dm_merge_bvec,
dm_unplug_all and in many other device mapper places. So, a more
generalized solution needs to be developed.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;

---
 drivers/md/dm.c |   24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

Index: linux-2.6.28-rc4-devel/drivers/md/dm.c
===================================================================
--- linux-2.6.28-rc4-devel.orig/drivers/md/dm.c	2008-11-14 02:39:36.000000000 +0100
+++ linux-2.6.28-rc4-devel/drivers/md/dm.c	2008-11-14 02:39:09.000000000 +0100
@@ -941,24 +941,16 @@ static void dm_unplug_all(struct request
 
 static int dm_any_congested(void *congested_data, int bdi_bits)
 {
-int r = bdi_bits;
-struct mapped_device *md = congested_data;
-struct dm_table *map;
-
-atomic_inc(&amp;md-&gt;pending);
-
-if (!test_bit(DMF_BLOCK_IO, &amp;md-&gt;flags)) {
-map = dm_get_table(md);
-if (map) {
-r = dm_table_any_congested(map, bdi_bits);
-dm_table_put(map);
-}
-}
+int r;
+struct mapped_device *md = (struct mapped_device *) congested_data;
+struct dm_table *map = dm_get_table(md);
 
-if (!atomic_dec_return(&amp;md-&gt;pending))
-/* nudge anyone waiting on suspend queue */
-wake_up(&amp;md-&gt;wait);
+if (!map || test_bit(DMF_BLOCK_IO, &amp;md-&gt;flags))
+r = bdi_bits;
+else
+r = dm_table_any_congested(map, bdi_bits);
 
+dm_table_put(map);
 return r;
 }
 
Swap these two reference decrements ... so that the last reference to
the table is dropped in __unbind.

This is required for the following patch,
dm-table-rework-reference-counting.patch, which will change the logic in
such a way that table destructor is called only at specific points in
the code.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;

---
 drivers/md/dm.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.28-rc4-devel/drivers/md/dm.c
===================================================================
--- linux-2.6.28-rc4-devel.orig/drivers/md/dm.c	2008-11-14 02:39:09.000000000 +0100
+++ linux-2.6.28-rc4-devel/drivers/md/dm.c	2008-11-14 02:43:10.000000000 +0100
@@ -1322,8 +1322,8 @@ void dm_put(struct mapped_device *md)
 dm_table_presuspend_targets(map);
 dm_table_postsuspend_targets(map);
 }
-__unbind(md);
 dm_table_put(map);
+__unbind(md);
 free_dev(md);
 }
 }
Rework table reference counting.

The code before the patch uses a reference counter on each table. When the
last reference is dropped and the counter reaches zero, the table
destructor is called.
Table reference counters are acquired/released from upcalls from other
kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all).
If the reference counter reaches zero in one of the upcalls, the table
destructor is called from almost random kernel code.

This leads to various problems:
* dm_any_congested being called under a spinlock, which calls the
  destructor, which calls some sleeping function.
* the destructor attempting to take a lock that is already taken by the
  same process.
* a stale reference from some other kernel code keeps the table
  constructed, which keeps some devices open, even after successful
  return from "dmsetup remove". This can confuse lvm and prevent closing
  of underlying devices or reusing device minor numbers.

The patch changes reference counting so that the table destructor can be
called only at predetermined places.

The table always has exactly one reference from either mapped_device-&gt;map
or hash_cell-&gt;new_map. This reference is not counted in table-&gt;holders.
A pair of dm_table_create/dm_table_destroy functions is used for table
creation/destruction.

Temporary references from the other code increase table-&gt;holders. A pair
of dm_table_get/dm_table_put functions is used to manipulate it.

When the table is about to be destroyed, we wait for table-&gt;holders to
reach 0. Then, we call the table destructor.

This way, the destructor is called only at a specific point (the
dm_table_destroy function), and the above problems associated with lazy
destruction can't happen.
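
The rules above can be modelled with a minimal single-threaded sketch.
It is an illustration only: struct table, table_get(), table_put() and
table_try_destroy() are hypothetical names, a plain int stands in for
the atomic counter, and the sketch reports failure where the patch
instead waits for the holders to drain.

```c
/* Hypothetical user-space model; not the device-mapper API. */
struct table {
	int holders;	/* temporary references only; the owner is not counted */
	int destroyed;	/* set once the destructor has run */
};

/* Take a temporary reference (e.g. from an upcall). */
static void table_get(struct table *t)
{
	t->holders++;
}

/* Drop a temporary reference; this never runs the destructor. */
static void table_put(struct table *t)
{
	t->holders--;
}

/*
 * Called only by the owner.  Returns 1 if the table was destroyed and
 * 0 if temporary holders remain.
 */
static int table_try_destroy(struct table *t)
{
	if (t->holders)
		return 0;
	t->destroyed = 1;
	return 1;
}
```

The point of the design is visible in table_put(): dropping the last
temporary reference has no side effects, so it is safe from any
calling context, and destruction happens only where the owner asks
for it.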

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;

---
 drivers/md/dm-ioctl.c |   10 ++++------
 drivers/md/dm-table.c |   27 +++++++++++++++++++++++----
 drivers/md/dm.c       |    7 ++++---
 drivers/md/dm.h       |    1 +
 4 files changed, 32 insertions(+), 13 deletions(-)

Index: linux-2.6.28-rc4-devel/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.28-rc4-devel.orig/drivers/md/dm-ioctl.c	2008-11-14 02:36:45.000000000 +0100
+++ linux-2.6.28-rc4-devel/drivers/md/dm-ioctl.c	2008-11-14 02:43:14.000000000 +0100
@@ -233,7 +233,7 @@ static void __hash_remove(struct hash_ce
 }
 
 if (hc-&gt;new_map)
-dm_table_put(hc-&gt;new_map);
+dm_table_destroy(hc-&gt;new_map);
 dm_put(hc-&gt;md);
 free_cell(hc);
 }
@@ -827,8 +827,8 @@ static int do_resume(struct dm_ioctl *pa
 
 r = dm_swap_table(md, new_map);
 if (r) {
+dm_table_destroy(new_map);
 dm_put(md);
-dm_table_put(new_map);
 return r;
 }
 
@@ -836,8 +836,6 @@ static int do_resume(struct dm_ioctl *pa
 set_disk_ro(dm_disk(md), 0);
 else
 set_disk_ro(dm_disk(md), 1);
-
-dm_table_put(new_map);
 }
 
 if (dm_suspended(md))
@@ -1080,7 +1078,7 @@ static int table_load(struct dm_ioctl *p
 }
 
 if (hc-&gt;new_map)
-dm_table_put(hc-&gt;new_map);
+dm_table_destroy(hc-&gt;new_map);
 hc-&gt;new_map = t;
 up_write(&amp;_hash_lock);
 
@@ -1109,7 +1107,7 @@ static int table_clear(struct dm_ioctl *
 }
 
 if (hc-&gt;new_map) {
-dm_table_put(hc-&gt;new_map);
+dm_table_destroy(hc-&gt;new_map);
 hc-&gt;new_map = NULL;
 }
 
Index: linux-2.6.28-rc4-devel/drivers/md/dm-table.c
===================================================================
--- linux-2.6.28-rc4-devel.orig/drivers/md/dm-table.c	2008-11-14 02:36:45.000000000 +0100
+++ linux-2.6.28-rc4-devel/drivers/md/dm-table.c	2008-11-14 02:43:14.000000000 +0100
@@ -60,6 +60,21 @@ struct dm_table {
 };
 
 /*
+ * New table reference rules:
+ *
+ * The table always has exactly one reference from either mapped_device-&gt;map
+ * or hash_cell-&gt;new_map. This reference is not counted in table-&gt;holders.
+ * A pair of dm_table_create/dm_table_destroy functions is used for table
+ * creation/destruction.
+ *
+ * Temporary references from the other code increase table-&gt;holders. A pair
+ * of dm_table_get/dm_table_put functions is used to manipulate it.
+ *
+ * When the table is about to be destroyed, we wait for table-&gt;holders to
+ * reach 0.
+ */
+
+/*
  * Similar to ceiling(log_size(n))
  */
 static unsigned int int_log(unsigned int n, unsigned int base)
@@ -223,7 +238,7 @@ int dm_table_create(struct dm_table **re
 return -ENOMEM;
 
 INIT_LIST_HEAD(&amp;t-&gt;devices);
-atomic_set(&amp;t-&gt;holders, 1);
+atomic_set(&amp;t-&gt;holders, 0);
 
 if (!num_targets)
 num_targets = KEYS_PER_NODE;
@@ -253,10 +268,14 @@ static void free_devices(struct list_hea
 }
 }
 
-static void table_destroy(struct dm_table *t)
+void dm_table_destroy(struct dm_table *t)
 {
 unsigned int i;
 
+while (atomic_read(&amp;t-&gt;holders))
+yield();
+smp_mb();
+
 /* free the indexes (see dm_table_complete) */
 if (t-&gt;depth &gt;= 2)
 vfree(t-&gt;index[t-&gt;depth - 2]);
@@ -294,8 +313,8 @@ void dm_table_put(struct dm_table *t)
 if (!t)
 return;
 
-if (atomic_dec_and_test(&amp;t-&gt;holders))
-table_destroy(t);
+smp_mb__before_atomic_dec();
+atomic_dec(&amp;t-&gt;holders);
 }
 
 /*
Index: linux-2.6.28-rc4-devel/drivers/md/dm.c
===================================================================
--- linux-2.6.28-rc4-devel.orig/drivers/md/dm.c	2008-11-14 02:43:10.000000000 +0100
+++ linux-2.6.28-rc4-devel/drivers/md/dm.c	2008-11-14 02:43:14.000000000 +0100
@@ -1208,10 +1208,11 @@ static int __bind(struct mapped_device *
 
 __set_size(md, size);
 
-if (size == 0)
+if (size == 0) {
+dm_table_destroy(t);
 return 0;
+}
 
-dm_table_get(t);
 dm_table_event_callback(t, event_callback, md);
 
 write_lock(&amp;md-&gt;map_lock);
@@ -1233,7 +1234,7 @@ static void __unbind(struct mapped_devic
 write_lock(&amp;md-&gt;map_lock);
 md-&gt;map = NULL;
 write_unlock(&amp;md-&gt;map_lock);
-dm_table_put(map);
+dm_table_destroy(map);
 }
 
 /*
Index: linux-2.6.28-rc4-devel/drivers/md/dm.h
===================================================================
--- linux-2.6.28-rc4-devel.orig/drivers/md/dm.h	2008-11-14 02:36:45.000000000 +0100
+++ linux-2.6.28-rc4-devel/drivers/md/dm.h	2008-11-14 02:43:14.000000000 +0100
@@ -36,6 +36,7 @@ struct dm_table;
 /*-----------------------------------------------------------------
  * Internal table functions.
  *---------------------------------------------------------------*/
+void dm_table_destroy(struct dm_table *t);
 void dm_table_event_callback(struct dm_table *t,
      void (*fn)(void *), void *context);
 struct dm_target *dm_table_get_target(struct dm_table *t, unsigned int index);
</description>
    <dc:creator>Mikulas Patocka</dc:creator>
    <dc:date>2008-11-14T01:55:27</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6840">
    <title>[git pull] 2.6.28-rc5 device-mapper fixes</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6840</link>
    <description>Please pull the following device-mapper fixes from:

  master.kernel.org:/pub/scm/linux/kernel/git/agk/linux-2.6-dm.git

Chandra Seetharaman (3):
      dm mpath: avoid attempting to activate null path
      dm mpath: warn if args ignored
      dm: avoid destroying table in dm_any_congested

Heinz Mauelshagen (1):
      dm stripe: fix init failure

Mikulas Patocka (2):
      dm raid1: flush workqueue before destruction
      dm: move pending queue wake_up end_io_acct

 drivers/md/dm-mpath.c  |    8 ++++++--
 drivers/md/dm-raid1.c  |    1 +
 drivers/md/dm-stripe.c |    4 +++-
 drivers/md/dm.c        |   34 +++++++++++++++++++++-------------
 4 files changed, 31 insertions(+), 16 deletions(-)

Thanks,
Alasdair
</description>
    <dc:creator>Alasdair G Kergon</dc:creator>
    <dc:date>2008-11-13T23:49:33</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6831">
    <title>[PATCH 5 of 5] dm-exception-store API</title>
    <link>http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/6831</link>
    <description> brassow

I took the liberty of refactoring the shared exception store
code.  It now aligns nicely with the interface instead of altering
all sorts of things that it should not have touched.  The 'message'
interface is gone, there is no need to pass around snapids, it's now
a module, etc.

I didn't spend huge amounts of time on this.  It compiles, but there
are probably some bugs there now.  I would also like to see a
'UUID -&gt; snapid' translation added to the code.  This would eliminate
the need to go around asking which indices are open for use.  The
translation table could be stored in the on-disk header.

get_[shared|unique]_info functions need to be macros... function args
should be cleaned up... etc.

- brassow

Index: linux-2.6/drivers/md/Kconfig
===================================================================
--- linux-2.6.orig/drivers/md/Kconfig
+++ linux-2.6/drivers/md/Kconfig
@@ -249,6 +249,14 @@ config DM_SNAPSHOT
        ---help---
          Allow volume managers to take writable snapshots of a device.
 
+config DM_EXSTORE_SHARED
+tristate "Shared exception store (EXPERIMENTAL)"
+depends on DM_SNAPSHOT &amp;&amp; EXPERIMENTAL
+---help---
+  Exception stores are used primarily by snapshots to track
+  COW data.  A shared exception store allows multiple
+  snapshots to utilize the same exception store device.
+
 config DM_MIRROR
        tristate "Mirror target"
        depends on BLK_DEV_DM
Index: linux-2.6/drivers/md/Makefile
===================================================================
--- linux-2.6.orig/drivers/md/Makefile
+++ linux-2.6/drivers/md/Makefile
@@ -6,6 +6,7 @@ dm-mod-objs:= dm.o dm-table.o dm-target
    dm-ioctl.o dm-io.o dm-kcopyd.o
 dm-multipath-objs := dm-path-selector.o dm-mpath.o
 dm-snapshot-objs := dm-snap.o dm-exception-store.o
+dm-exstore-shared-objs := dm-ex-store-shared.o
 dm-mirror-objs:= dm-raid1.o
 md-mod-objs     := md.o bitmap.o
 raid456-objs:= raid5.o raid6algos.o raid6recov.o raid6tables.o \
@@ -34,6 +35,7 @@ obj-$(CONFIG_DM_CRYPT)+= dm-crypt.o
 obj-$(CONFIG_DM_DELAY)+= dm-delay.o
 obj-$(CONFIG_DM_MULTIPATH)+= dm-multipath.o dm-round-robin.o
 obj-$(CONFIG_DM_SNAPSHOT)+= dm-snapshot.o
+obj-$(CONFIG_DM_EXSTORE_SHARED)+= dm-exstore-shared.o
 obj-$(CONFIG_DM_MIRROR)+= dm-mirror.o dm-log.o dm-region-hash.o
 obj-$(CONFIG_DM_ZERO)+= dm-zero.o
 
Index: linux-2.6/drivers/md/dm-ex-store-shared.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/md/dm-ex-store-shared.c
@@ -0,0 +1,1870 @@
+#include "dm-exception-store.h"
+
+#include &lt;linux/mm.h&gt;
+#include &lt;linux/pagemap.h&gt;
+#include &lt;linux/vmalloc.h&gt;
+#include &lt;linux/slab.h&gt;
+#include &lt;linux/dm-io.h&gt;
+#include &lt;linux/dm-kcopyd.h&gt;
+
+#define DM_MSG_PREFIX "dm-exstore-shared"
+
+#define SHARED_EXCEPTION_MAGIC 0x10161974
+#define SHARED_EXCEPTION_DISK_VERSION 1
+#define FIRST_BITMAP_CHUNK 1
+
+#define MAX_SNAPSHOTS 64
+#define ORIGIN_SNAPID ULLONG_MAX
+
+struct disk_header {
+uint32_t magic;
+
+/*
+ * Is this snapshot valid.  There is no way of recovering
+ * an invalid snapshot.
+ */
+uint32_t valid;
+
+/*
+ * Simple, incrementing version. no backward
+ * compatibility.
+ */
+uint32_t version;
+
+/* In sectors */
+uint32_t chunk_size;
+
+/*
+ * for shared exception
+ */
+uint64_t root_tree_chunk;
+uint64_t snapmask;
+uint32_t tree_level;
+};
+
+struct disk_exception {
+uint64_t old_chunk;
+uint64_t new_chunk;
+};
+
+struct commit_callback {
+void (*callback)(void *, int success);
+void *context;
+};
+
+static struct list_head shared_store_list;
+static spinlock_t shared_store_list_lock;
+
+/*
+ * The top level structure for a shared exception store.
+ */
+struct shared_store {
+struct list_head list; /* link into shared_store_list */
+int version;
+int valid;
+
+dev_t key; /* the exception store device defines this store */
+
+uint32_t exceptions_per_area;
+
+/*
+ * Now that we have an asynchronous kcopyd there is no
+ * need for large chunk sizes, so it won't hurt to have a
+ * whole chunk's worth of metadata in memory at once.
+ */
+void *area;
+
+/*
+ * An area of zeros used to clear the next area.
+ */
+void *zero_area;
+
+/*
+ * Used to keep track of which metadata area the data in
+ * 'chunk' refers to.
+ */
+chunk_t current_area;
+
+/*
+ * The next free chunk for an exception.
+ */
+chunk_t next_free;
+
+/*
+ * The index of next free exception in the current
+ * metadata area.
+ */
+uint32_t current_committed;
+
+struct rw_semaphore lock;
+
+atomic_t pending_count;
+uint32_t callback_count;
+struct commit_callback *callbacks;
+struct dm_io_client *io_client;
+
+struct workqueue_struct *metadata_wq;
+
+/*
+ * for shared exception
+ */
+u64 root_tree_chunk;
+u64 snapmask;
+u32 tree_level;
+
+u32 ref_count;  /* number of active users of this share */
+
+unsigned long nr_chunks;
+unsigned long nr_bitmap_chunks;
+unsigned long *bitmap;
+unsigned long cur_bitmap_chunk;
+unsigned long cur_bitmap_index;
+
+struct list_head chunk_buffer_list;
+struct list_head chunk_buffer_dirty_list;
+
+int header_dirty;
+int nr_chunk_buffers;
+};
+
+/*
+ * defines the unique portions of a shared_store, like the snapid.
+ */
+struct unique_store {
+struct dm_exception_store *store;
+struct shared_store *ss;
+
+uint64_t id;
+};
+
+struct chunk_buffer {
+struct list_head list;
+struct list_head dirty_list;
+u64 chunk;
+void *data;
+};
+
+struct node {
+__le32 count;
+__le32 unused;
+struct index_entry {
+__le64 key; /* note: entries[0].key never accessed */
+__le64 chunk; /* node sector address goes here */
+} entries[];
+};
+
+struct leaf {
+__le16 magic;
+__le16 version;
+__le32 count;
+/* !!! FIXME the code doesn't use the base_chunk properly */
+__le64 base_chunk;
+__le64 using_mask;
+
+struct tree_map{
+__le32 offset;
+__le32 rchunk;
+} map[];
+};
+
+struct exception {
+__le64 share;
+__le64 chunk;
+};
+
+static unsigned sectors_to_pages(unsigned sectors)
+{
+return DIV_ROUND_UP(sectors, PAGE_SIZE &gt;&gt; 9);
+}
+
+static int alloc_area(struct shared_store *ss, chunk_t chunk_size)
+{
+int r = -ENOMEM;
+size_t len;
+
+len = chunk_size &lt;&lt; SECTOR_SHIFT;
+
+/*
+ * Allocate the chunk_size block of memory that will hold
+ * a single metadata area.
+ */
+ss-&gt;area = vmalloc(len);
+if (!ss-&gt;area)
+return r;
+
+ss-&gt;zero_area = vmalloc(len);
+if (!ss-&gt;zero_area) {
+vfree(ss-&gt;area);
+return r;
+}
+memset(ss-&gt;zero_area, 0, len);
+
+return 0;
+}
+
+static void free_area(struct shared_store *ss)
+{
+vfree(ss-&gt;area);
+ss-&gt;area = NULL;
+vfree(ss-&gt;zero_area);
+ss-&gt;zero_area = NULL;
+}
+
+struct mdata_req {
+struct dm_io_region *where;
+struct dm_io_request *io_req;
+struct work_struct work;
+int result;
+};
+
+static void do_metadata(struct work_struct *work)
+{
+struct mdata_req *req = container_of(work, struct mdata_req, work);
+
+req-&gt;result = dm_io(req-&gt;io_req, 1, req-&gt;where, NULL);
+}
+
+static struct unique_store *get_unique_info(struct dm_exception_store *store)
+{
+return (struct unique_store *) store-&gt;context;
+}
+
+static struct shared_store *get_shared_info(struct dm_exception_store *store)
+{
+struct unique_store *us = store-&gt;context;
+
+return us-&gt;ss;
+}
+
+/*
+ * Read or write a chunk aligned and sized block of data from a device.
+ */
+static int chunk_io(struct dm_exception_store *store,
+    chunk_t chunk, int rw, int metadata,
+    void *data)
+{
+struct shared_store *ss = get_shared_info(store);
+struct dm_io_region where = {
+.bdev = store-&gt;cow-&gt;bdev,
+.sector = store-&gt;chunk_size * chunk,
+.count = store-&gt;chunk_size,
+};
+struct dm_io_request io_req = {
+.bi_rw = rw,
+.mem.type = DM_IO_VMA,
+.mem.ptr.vma = data,
+.client = ss-&gt;io_client,
+.notify.fn = NULL,
+};
+struct mdata_req req;
+
+if (!metadata)
+return dm_io(&amp;io_req, 1, &amp;where, NULL);
+
+req.where = &amp;where;
+req.io_req = &amp;io_req;
+
+/*
+ * Issue the synchronous I/O from a different thread
+ * to avoid generic_make_request recursion.
+ */
+INIT_WORK(&amp;req.work, do_metadata);
+queue_work(ss-&gt;metadata_wq, &amp;req.work);
+flush_workqueue(ss-&gt;metadata_wq);
+
+return req.result;
+}
+
+static int write_header(struct dm_exception_store *store)
+{
+struct shared_store *ss = get_shared_info(store);
+struct disk_header *dh;
+
+memset(ss-&gt;area, 0, store-&gt;chunk_size &lt;&lt; SECTOR_SHIFT);
+
+dh = (struct disk_header *) ss-&gt;area;
+dh-&gt;magic = cpu_to_le32(SHARED_EXCEPTION_MAGIC);
+dh-&gt;valid = cpu_to_le32(ss-&gt;valid);
+dh-&gt;version = cpu_to_le32(ss-&gt;version);
+dh-&gt;chunk_size = cpu_to_le32(store-&gt;chunk_size);
+
+dh-&gt;root_tree_chunk = cpu_to_le64(ss-&gt;root_tree_chunk);
+dh-&gt;snapmask = cpu_to_le64(ss-&gt;snapmask);
+dh-&gt;tree_level = cpu_to_le32(ss-&gt;tree_level);
+
+ss-&gt;header_dirty = 0;
+
+return chunk_io(store, 0, WRITE, 1, ss-&gt;area);
+}
+
+static inline struct node *buffer2node(struct chunk_buffer *buffer)
+{
+return (struct node *)buffer-&gt;data;
+}
+
+static inline struct leaf *buffer2leaf(struct chunk_buffer *buffer)
+{
+return (struct leaf *)buffer-&gt;data;
+}
+
+static struct chunk_buffer *alloc_chunk_buffer(struct dm_exception_store *store)
+{
+struct shared_store *ss = get_shared_info(store);
+struct chunk_buffer *b;
+
+b = kzalloc(sizeof(*b), GFP_NOFS);
+if (!b) {
+DMERR("%s %d: out of memory", __func__, __LINE__);
+return NULL;
+}
+
+b-&gt;data = vmalloc(store-&gt;chunk_size &lt;&lt; SECTOR_SHIFT);
+if (!b-&gt;data) {
+DMERR("%s %d: out of memory", __func__, __LINE__);
+kfree(b);
+return NULL;
+}
+
+memset(b-&gt;data, 0, store-&gt;chunk_size &lt;&lt; SECTOR_SHIFT);
+
+list_add(&amp;b-&gt;list, &amp;ss-&gt;chunk_buffer_list);
+INIT_LIST_HEAD(&amp;b-&gt;dirty_list);
+
+ss-&gt;nr_chunk_buffers++;
+
+return b;
+}
+
+static void free_chunk_buffer(struct shared_store *ss, struct chunk_buffer *b)
+{
+list_del(&amp;b-&gt;list);
+vfree(b-&gt;data);
+kfree(b);
+
+ss-&gt;nr_chunk_buffers--;
+}
+
+static void shared_drop(struct dm_exception_store *store)
+{
+struct shared_store *ss = get_shared_info(store);
+
+chunk_io(store, ss-&gt;cur_bitmap_chunk, WRITE, 1, ss-&gt;bitmap);
+write_header(store);
+
+ss-&gt;valid = 0;
+}
+
+static struct shared_store *__shared_store_lookup(dev_t key)
+{
+struct shared_store *ss;
+
+list_for_each_entry(ss, &amp;shared_store_list, list)
+if (ss-&gt;key == key)
+return ss;
+return NULL;
+}
+
+/*
+ * __shared_store_alloc
+ * @key
+ *
+ * Allocate a new 'struct shared_store' and add it to the
+ * global list.
+ *
+ * Returns: ss on success, NULL on failure
+ */
+static struct shared_store *__shared_store_alloc(dev_t key)
+{
+struct shared_store *ss, *old_ss;
+
+/* allocate the shared store */
+ss = kzalloc(sizeof(*ss), GFP_KERNEL);
+if (!ss)
+return NULL;
+
+INIT_LIST_HEAD(&amp;ss-&gt;list);
+ss-&gt;version = SHARED_EXCEPTION_DISK_VERSION;
+ss-&gt;valid = 1;
+ss-&gt;key = key;
+
+ss-&gt;area = NULL;
+ss-&gt;next_free = 2;/* skipping the header and first area */
+ss-&gt;current_committed = 0;
+init_rwsem(&amp;ss-&gt;lock);
+
+
+
+
+
+INIT_LIST_HEAD(&amp;ss-&gt;chunk_buffer_list);
+INIT_LIST_HEAD(&amp;ss-&gt;chunk_buffer_dirty_list);
+
+
+
+
+
+ss-&gt;callback_count = 0;
+atomic_set(&amp;ss-&gt;pending_count, 0);
+ss-&gt;callbacks = NULL;
+
+ss-&gt;metadata_wq = create_singlethread_workqueue("ksnaphd");
+if (!ss-&gt;metadata_wq) {
+kfree(ss);
+DMERR("couldn't start header metadata update thread");
+return NULL;
+}
+
+spin_lock(&amp;shared_store_list_lock);
+old_ss = __shared_store_lookup(key);
+if (old_ss) {
+/* We lost the race */
+destroy_workqueue(ss-&gt;metadata_wq);
+kfree(ss);
+ss = old_ss;
+} else
+list_add(&amp;ss-&gt;list, &amp;shared_store_list);
+spin_unlock(&amp;shared_store_list_lock);
+
+return ss;
+}
+
+static struct shared_store *__shared_store_find(dev_t key)
+{
+struct shared_store *ss;
+
+ss = __shared_store_lookup(key);
+if (!ss) {
+spin_unlock(&amp;shared_store_list_lock);
+ss = __shared_store_alloc(key);
+spin_lock(&amp;shared_store_list_lock);
+}
+
+return ss;
+}
+
+/*
+ * get_shared_store
+ * @key
+ *
+ * Find a 'struct shared_store' in the global list, and
+ * increment the reference count.
+ *
+ * Returns: ss on success, NULL on error
+ */
+static struct shared