<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.comp.linux.drbd">
    <title>gmane.comp.linux.drbd</title>
    <link>http://blog.gmane.org/gmane.comp.linux.drbd</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24145"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24144"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24143"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24142"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24141"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24140"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24139"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24138"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24137"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24136"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24135"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24134"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24133"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24132"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24131"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24130"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24129"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24128"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24127"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.linux.drbd/24126"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24145">
    <title>Re: Recovering from erroneous sync state</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24145</link>
    <description>&lt;pre&gt;
On May 23, 2012, at 4:48 PM, Lars Ellenberg wrote:


One would think, yes...but that's all it shows.  (No idea why.)


Ah, hadn't thought to reject outgoing packets as well -- and doing so seems to have finally had the intended effect (got both sides into StandAlone, then able to force one back to primary and reattach).  All back to normal now.

Thanks!

Zev
&lt;/pre&gt;</description>
    <dc:creator>Zev Weiss</dc:creator>
    <dc:date>2012-05-24T20:03:03</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24144">
    <title>Resource Primary but inconsistent</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24144</link>
    <description>&lt;pre&gt;
I was creating a new resource on our servers and ended up in an interesting
state and I'm wondering if DRBD (8.3.12) just handles this correctly, or if
this is bad.

I created a stacked resource on server1, using steps like this:
1) created an LV to back the resource: lvcreate -n store1 -L 2048g backingvg
2) added the resource configuration for both the lower and upper resources to
the /etc/drbd.d directory and copied the config to the other nodes.
3) ran drbdadm create-md on the lower resource: drbdadm create-md store1_lower
4) brought up the lower device: drbdadm up store1_lower
5) made the resource primary: drbdsetup /dev/drbd1 primary -o
6) ran drbdadm create-md on the upper resource: drbdadm --stacked create-md store1
7) brought up the upper resource: drbdadm up --stacked store1
8) made the upper resource primary: drbdsetup /dev/drbd11 primary -o
9) formatted the upper resource: mkfs.ext4 /dev/drbd11

It should be noted that all my other resources are primary on the other node
(server2); I thought it would be less risky to work on the non-active node
first. While configuring the resource on server1, I used a different IP for
the stacked resource than the one my active nodes use. After getting the
resource formatted, I stopped all the resources, edited the config to use the
same stacked IP as the rest of my resources, and then brought up only the
lower resource and left it secondary.

Next, on server2, I did steps 1, 3 and 4 to bring up the resource and let it
start syncing.

So at this point I have the resource Secondary/UpToDate on server1. Then,
just to see whether it would work, I tried to make the resource primary on
server2. To my surprise, DRBD happily made the resource primary on server2!
I didn't issue the drbdsetup command that forces it primary, as above; I only
issued the standard command:
drbdadm primary storage1_lower

Intrigued, I went ahead and brought up the upper resource and made it
primary also.  I then brought up the resource on the remote node (storage3)
and let it start syncing.

So now I have the following:
server1:     1: cs:SyncSource ro:Secondary/Primary ds:UpToDate/Inconsistent C r-----
server2:     1: cs:SyncTarget ro:Primary/Secondary ds:Inconsistent/UpToDate C r-----
server2:    11: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent A r-----
server3:    11: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate A r-----
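
Each status line above packs several fields into one row: the connection state (cs:), the local/peer roles (ro:), the local/peer disk states (ds:), the replication protocol, and the I/O flags. Below is a minimal Python sketch that splits such a line into fields. It assumes the DRBD 8.3 /proc/drbd layout shown here, with the leading "serverN:" host label stripped off first. parse_status is a hypothetical helper, not part of any DRBD tool.

```python
def parse_status(line):
    """Split one /proc/drbd-style status line (DRBD 8.3 layout assumed) into a dict."""
    tokens = line.split()
    status = {"minor": int(tokens[0].rstrip(":"))}
    extras = []  # tokens without a "key:" prefix: protocol letter, then I/O flags
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if not sep:
            extras.append(token)
        elif key in ("ro", "ds"):
            # ro: and ds: are written as local/peer
            local, peer = value.split("/", 1)
            status[key] = {"local": local, "peer": peer}
        else:
            status[key] = value
    status["protocol"], status["flags"] = (extras + [None, None])[:2]
    return status


row = parse_status("1: cs:SyncTarget ro:Primary/Secondary ds:Inconsistent/UpToDate C r-----")
# server2 in the listing above: Primary although its own disk is still Inconsistent
print(row["ro"]["local"], row["ds"]["local"], row["cs"])
```

Reading the rows this way makes the situation explicit: server2 holds the Primary role on minor 1 while its local disk is still Inconsistent and its peer's disk is UpToDate.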

So my question is: did I just bork the resource totally, and should I start
over? Or is DRBD smart enough to handle this situation by fetching data not
yet synced to server2 from server1 when it is accessed, and updating data on
server1 when it is written to the primary on server2?

I did make the upper resource primary and mounted it on server2, just to see
if I could, and that worked too. Since the drive is empty, all I could see
was the lost+found directory.

&lt;/pre&gt;</description>
    <dc:creator>envisionrx</dc:creator>
    <dc:date>2012-05-24T17:05:57</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24143">
    <title>harddisk problems</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24143</link>
    <description>&lt;pre&gt;I have many problems with my (virtual) disks.
I have seen articles about VirtualBox:

do ext4 and KVM run into errors?
Do I need to set Host I/O?

But I don't understand it.
Is there a solution?

2 x quad-core Supermicro
32 GB
RAID5 6 TB (4 TB)



Buffer I/O error on device sdc1, logical block 65273728
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 65273729
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 65273730
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 65273731
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 65273732
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 65273733
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 65273734
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 65273735
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 65273736
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 65273737
lost page write due to I/O error on sdc1
JBD2: Detected IO errors while flushing file data on sdc1-8
JBD2: Detected IO errors while flushing file data on sdc1-8
JBD2: Detected IO errors while flushing file data on sdc1-8
JBD2: Detected IO errors while flushing file data on sdc1-8
JBD2: Detected IO errors while flushing file data on sdc1-8
JBD2: Detected IO errors while flushing file data on sdc1-8
JBD2: Detected IO errors while flushing file data on sdc1-8
JBD2: Detected IO errors while flushing file data on sdc1-8
JBD2: Detected IO errors while flushing file data on sdc1-8
JBD2: Detected IO errors while flushing file data on sdc1-8
JBD2: Detected IO errors while flushing file data on sdc1-8
JBD2: Detected IO errors while flushing file data on sdc1-8
ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.01: failed command: WRITE DMA EXT
ata2.01: cmd 35/00:00:3f:fc:cf/00:04:1f:00:00/f0 tag 0 dma 524288 out
         res 40/00:01:00:00:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
ata2.01: status: { DRDY }
ata2: soft resetting link
ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.01: failed command: READ DMA
ata1.01: cmd c8/00:08:c7:10:68/00:00:00:00:00/f4 tag 0 dma 4096 in
         res 40/00:01:00:00:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
ata1.01: status: { DRDY }
ata1: soft resetting link
ata2.01: configured for MWDMA2
ata2.01: device reported invalid CHS sector 0
ata2: EH complete
ata1.00: configured for MWDMA2
ata1.01: configured for MWDMA2
ata1.01: device reported invalid CHS sector 0
ata1: EH complete
sd 0:0:1:0: [sdb] Unhandled error code
sd 0:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
sd 0:0:1:0: [sdb] CDB: Read(10): 28 00 04 95 55 3f 00 01 00 00
ata2: lost interrupt (Status 0x50)
ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
sd 1:0:1:0: [sdc] Unhandled error code
sd 1:0:1:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
sd 1:0:1:0: [sdc] CDB: Write(10): 2a 00 00 00 00 87 00 00 08 00
__ratelimit: 118 callbacks suppressed
Buffer I/O error on device sdc1, logical block 9
ata1.01: failed command: READ DMA EXT
ata1.01: cmd 25/00:00:67:e7:89/00:01:27:00:00/f0 tag 0 dma 131072 in
         res 40/00:01:00:00:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
ata1.01: status: { DRDY }
ata1: soft resetting link
lost page write due to I/O error on sdc1
ata1.00: configured for MWDMA2
ata1.01: configured for MWDMA2
ata1.01: device reported invalid CHS sector 0
&lt;/pre&gt;</description>
    <dc:creator>Marcel Kraan</dc:creator>
    <dc:date>2012-05-24T16:22:59</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24142">
    <title>Re: DRBD initial settings for two disks</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24142</link>
    <description>&lt;pre&gt;
On 24.5.2012 15:23, Florian Haas wrote:
in the example titled "Multi-volume DRBD resource configuration".
So, if I don't want to (or can't) install DRBD 8.4, could I use two
resources like this to get around the multi-volume limitation in 8.3?

/etc/drbd.conf:
resource r0 {
        protocol C;
        startup {
                wfc-timeout  15;
                degr-wfc-timeout 60;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "secret";
        }
        on drbd01 {
                device /dev/drbd0;
                disk /dev/sdb1;
                address 192.168.0.1:7788;
                meta-disk internal;
        }
        on drbd02 {
                device /dev/drbd0;
                disk /dev/sdb1;
                address 192.168.0.2:7788;
                meta-disk internal;
        }
}

resource r1 {
        protocol C;
        startup {
                wfc-timeout  15;
                degr-wfc-timeout 60;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "secret";
        }
        on drbd01 {
                device /dev/drbd1;
                disk /dev/sdb2;
                address 192.168.0.1:7789;
                meta-disk internal;
        }
        on drbd02 {
                device /dev/drbd1;
                disk /dev/sdb2;
                address 192.168.0.2:7789;
                meta-disk internal;
        }
}


Greetings,
Tero Mäntyvaara
_______________________________________________
drbd-user mailing list
drbd-user&amp;lt; at &amp;gt;lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
&lt;/pre&gt;</description>
    <dc:creator>Tero Mäntyvaara</dc:creator>
    <dc:date>2012-05-24T13:32:19</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24141">
    <title>Re: "PingAck not received" messages</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24141</link>
    <description>&lt;pre&gt;
I've seen the "I/O frozen for no reason" problem which seems to be an
upstream XFS issue, which Debian is hardly to blame for. The others I
personally haven't encountered. Just for clarification, what seemed
odd to me was not that you would update off the Debian stock squeeze
kernel, but that you'd consider pulling a CentOS kernel, of ostensibly
the same kernel version, into a Debian system. I'd just go to the
current Debian backports kernel. But we're going off topic. :)


Indeed.


It would still be interesting to find out whether you ever saw these
random disconnects with protocol C, or whether this appears to be a
B-only issue.

Cheers,
Florian

&lt;/pre&gt;</description>
    <dc:creator>Florian Haas</dc:creator>
    <dc:date>2012-05-24T13:32:46</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24140">
    <title>Re: "PingAck not received" messages</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24140</link>
    <description>&lt;pre&gt;
No indeed, nothing external that we could detect after hours of layer 2 
tracing, and no messages that would indicate a malfunction on either of 
the hosts.  But this network problem was only visible via DRBD's 
messages, and if it's gone it's hard to reason about it any further (not 
that I miss it).  As I said I couldn't see any symptoms via ICMP or 
TCP-based tests between the hosts.


Then you've not hit the "scheduler divide by zero" bug or the "I/O 
frozen for 120s for no reason" bug or the "CPU#x stuck for 9999999s" 
bug?  These are all things that are filed vaguely on the Red Hat bug 
trackers, as far as I know, and usually closed a few kernel versions 
later with "well I haven't seen it for a few kernel versions so it's 
probably OK"!

These are relatively rare bugs, except for some of our customers, for whom 
they're not at all rare, and we haul them up to e.g. whatever wheezy has. 
Except in this case they broke the bridging code in 3.2.0, which is 
going to cause a virtualising customer some problems :-)


Sure, but neither do we pay a penalty for doing it externally.  It's all 
on LVM and proper battery-backed RAID.


10-60 minutes per system.  Long enough that the I/O sensitive VMs 
notice.  And the customer has customers who are up 24 hours a day, so 
there is no reliable "quiet time" when we can reduce their I/O bandwidth 
and not have it commented on.


The DRBDs don't really get hammered at any one time - the backups happen 
direct from LVs on the host, and go over the main (not replication) 
interface.  So the host system's I/O is stressed, sure.

Previously the disconnects happened several times a day, not just when 
the backups ran - this is a separate issue from the one I asked about 
while still being relevant to the list.

Arguably a customer running a heavily interactive system to very remote 
destinations shouldn't be using such a complex I/O stack and should use 
dedicated hardware.  This is a pragmatic, expensive, unambitious 
argument :-)  But drbd+LVM has worked very well for them for 18 months, 
and the peace of mind of being able to start their customers' VMs in one 
of two places makes diagnosing this properly worth the effort.

&lt;/pre&gt;</description>
    <dc:creator>Matthew Bloch</dc:creator>
    <dc:date>2012-05-24T13:09:34</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24139">
    <title>Re: "PingAck not received" messages</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24139</link>
    <description>&lt;pre&gt;On Thu, May 24, 2012 at 1:53 PM, Matthew Bloch &amp;lt;matthew-WkKPyC9aYE69FHfhHBbuYA&amp;lt; at &amp;gt;public.gmane.org&amp;gt;
wrote:

Sure, if you have kernel-induced network problems on one of your
nodes, that would definitely explain the issues you're seeing. But you
insisted from the start that there were no network issues. :)


Might as well not be a DRBD bug at all, just DRBD trying to do the
right thing in the face of a flaky network stack.


That would seem like an odd thing to do. FWIW, we've been running
happily on squeeze kernels for months.


You can always do that from a device with metadata as well. kpartx is
your friend.


That's a fair point, but realistically, how long does it take you to
take the backup off your snapshot? And does this normally coincide
with the DRBD device getting hammered, which is pretty much the only
situation in which a downstream client would likely feel any
disruption?

Just my two cents. Or pence. :)

Cheers,
Florian

&lt;/pre&gt;</description>
    <dc:creator>Florian Haas</dc:creator>
    <dc:date>2012-05-24T12:55:10</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24138">
    <title>Re: DRBD initial settings for two disks</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24138</link>
    <description>&lt;pre&gt;
That looks fine; take a look at
http://www.drbd.org/users-guide/s-configure-resource.html#s-drbdconf-example
in the example titled "Multi-volume DRBD resource configuration".

Do recall, however, that multiple volumes per resource are a DRBD 8.4
feature. So if you're actually trying to do this on Ubuntu lucid,
you'll have to make sure you install 8.4 before attempting this.

Cheers,
Florian

&lt;/pre&gt;</description>
    <dc:creator>Florian Haas</dc:creator>
    <dc:date>2012-05-24T12:23:32</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24137">
    <title>Re: "PingAck not received" messages</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24137</link>
    <description>&lt;pre&gt;
Hmm, thanks.  Unrelated to any of this, the v3a kernel (Debian 2.6.32-4) 
crashed pretty badly 48 hours ago.  Since it was rebooted, there have 
been no "PingAck not received" messages.

So assuming we get a week free of these messages, I'm guessing there was 
a drbd bug of some kind but the reboot cleared it up.

We are preparing to jump to a 2.6.32 sourced from CentOS because this 
Debian kernel seems to crash with one bug or another every few months.

The reason we're using external meta-devices is for backup: without the 
metadata at the end, the underlying disk image represents exactly what 
the VMs see.  We can then snapshot this and take a reasonably consistent 
backup without bothering DRBD.  We later verify this backup by booting 
it back up, disconnected, and taking a snapshot of the VNC console!

The reason I picked protocol B is that LVM snapshots kill the local 
DRBD performance if we snapshot the LVM device underlying the DRBD 
Primary.  My working theory was that if we snapshotted the Secondary 
and used protocol B, where we aren't dependent on its local write 
speeds, the performance hit wouldn't be as noticeable, and the customer 
seemed to concur (previously we were using C).

&lt;/pre&gt;</description>
    <dc:creator>Matthew Bloch</dc:creator>
    <dc:date>2012-05-24T11:53:04</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24136">
    <title>DRBD initial settings for two disks</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24136</link>
    <description>&lt;pre&gt;
Hi!

I was reading the document
https://help.ubuntu.com/10.04/serverguide/drbd.html. Would the following
configuration be correct with two disks?

/etc/drbd.conf:
resource r0 {
  volume 0 {
    device    /dev/drbd1;
    disk      /dev/sdb1;
    meta-disk internal;
  }
  volume 1 {
    device    /dev/drbd2;
    disk      /dev/sdb2;
    meta-disk internal;
  }

  on drbd01 {
    address   192.168.0.1:7788;
  }
  on drbd02 {
    address   192.168.0.2:7788;
  }
}


Tero Mäntyvaara
&lt;/pre&gt;</description>
    <dc:creator>Tero Mäntyvaara</dc:creator>
    <dc:date>2012-05-24T11:25:42</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24135">
    <title>Changing the name of a resource</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24135</link>
    <description>&lt;pre&gt;All:

I wanted to change the name of the DRBD resource to better reflect its
functional use.

In a test setup, I tried changing the file and resource name on the
secondary side, then adjusting and reconnecting, plus a few slightly
different sequences of steps, to see what would work.

The result is that when reconnecting after changing the file name, the
primary side seems to think that the device is out of sync and becomes the
sync source, but the secondary side will have none of it and goes into
'WFConnection' state.

Find below the log extract from when the secondary side is connected after
the resource is renamed. (From the log it seems that it does become the sync
target and actually syncs, but then does not like something.)

Is there a known sequence of steps that can work in this case ? Need I
experiment any further ? What else can I try ?

-JA

block drbd0: conn( StandAlone -&amp;gt; Unconnected )
block drbd0: Starting receiver thread (from drbd0_worker [32028])
block drbd0: receiver (re)started
block drbd0: conn( Unconnected -&amp;gt; WFConnection )
block drbd0: Handshake successful: Agreed network protocol version 96
block drbd0: Peer authenticated using 16 bytes of 'md5' HMAC
block drbd0: conn( WFConnection -&amp;gt; WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [14379])
block drbd0: data-integrity-alg: md5
block drbd0: drbd_sync_handshake:
block drbd0: self A3DA23FAA568B544:0000000000000000:CC820BAF75D3008A:CC810BAF75D3008B bits:0 flags:0
block drbd0: peer 518B27BC03329A91:A3DB23FAA568B545:A3DA23FAA568B545:CC820BAF75D3008B bits:721870 flags:0
block drbd0: Did not got last syncUUID packet, corrected:
block drbd0: peer 518B27BC03329A91:A3DA23FAA568B545:CC820BAF75D3008B:CC820BAF75D3008B bits:721870 flags:0
block drbd0: uuid_compare()=-1 by rule 51
block drbd0: peer( Unknown -&amp;gt; Primary ) conn( WFReportParams -&amp;gt; WFBitMapT ) disk( UpToDate -&amp;gt; Outdated ) pdsk( DUnknown -&amp;gt; UpToDate )
block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 239(1), total 239; compression: 99.9%
block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 239(1), total 239; compression: 99.9%
block drbd0: conn( WFBitMapT -&amp;gt; WFSyncUUID )
block drbd0: updated sync uuid A3DB23FAA568B544:0000000000000000:CC820BAF75D3008A:CC810BAF75D3008B
block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
block drbd0: conn( WFSyncUUID -&amp;gt; SyncTarget ) disk( Outdated -&amp;gt; Inconsistent )
block drbd0: Began resync as SyncTarget (will sync 2887480 KB [721870 bits set]).
block drbd0: Resync done (total 9 sec; paused 0 sec; 320828 K/sec)
block drbd0: 100 % had equal check sums, eliminated: 2887480K; transferred 0K total 2887480K
block drbd0: updated UUIDs 518B27BC03329A90:0000000000000000:A3DB23FAA568B544:A3DA23FAA568B545
block drbd0: conn( SyncTarget -&amp;gt; Connected ) disk( Inconsistent -&amp;gt; UpToDate )
block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
block drbd0: bitmap WRITE of 0 pages took 0 jiffies
block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
block drbd0: peer( Primary -&amp;gt; Unknown ) conn( Connected -&amp;gt; TearDown ) pdsk( UpToDate -&amp;gt; DUnknown )
block drbd0: asender terminated
block drbd0: Terminating asender thread
block drbd0: Connection closed
block drbd0: conn( TearDown -&amp;gt; Unconnected )
block drbd0: receiver terminated
block drbd0: Restarting receiver thread
block drbd0: receiver (re)started
block drbd0: conn( Unconnected -&amp;gt; WFConnection )
block drbd0: conn( WFConnection -&amp;gt; Disconnecting )
block drbd0: Discarding network configuration.
block drbd0: Connection closed
block drbd0: conn( Disconnecting -&amp;gt; StandAlone )
block drbd0: receiver terminated
block drbd0: Terminating receiver thread
block drbd0: conn( StandAlone -&amp;gt; Unconnected )
block drbd0: Starting receiver thread (from drbd0_worker [32028])
block drbd0: receiver (re)started
block drbd0: conn( Unconnected -&amp;gt; WFConnection )
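
Each drbd_sync_handshake "self"/"peer" line in the log above prints four generation identifiers, followed by the out-of-sync bit count and flags. The four UUIDs are assumed here to be, in order, the current, bitmap and two history UUIDs, as in DRBD 8.3. Below is a small Python sketch for splitting and comparing them. parse_uuid_line and same_generation are hypothetical helpers. Ignoring the lowest bit when comparing is also an assumption, based on the corrected peer UUIDs above differing from the local ones only in that bit.

```python
UUID_FIELDS = ("current", "bitmap", "history1", "history2")  # assumed order


def parse_uuid_line(line):
    """Parse 'UUID:UUID:UUID:UUID bits:N flags:N' as logged by drbd_sync_handshake."""
    uuid_part, *rest = line.split()
    uuids = dict(zip(UUID_FIELDS, uuid_part.split(":")))
    extra = dict(item.split(":", 1) for item in rest)
    return {"uuids": uuids, "bits": int(extra.get("bits", "0")), "flags": extra.get("flags")}


def same_generation(a, b):
    """Compare two hex UUIDs while ignoring the lowest bit (assumed to be a flag bit)."""
    return (int(a, 16) | 1) == (int(b, 16) | 1)


me = parse_uuid_line("A3DA23FAA568B544:0000000000000000:CC820BAF75D3008A:CC810BAF75D3008B bits:0 flags:0")
peer = parse_uuid_line("518B27BC03329A91:A3DA23FAA568B545:CC820BAF75D3008B:CC820BAF75D3008B bits:721870 flags:0")
# The peer's bitmap UUID matches our current UUID and the peer tracks 721870 dirty bits,
# which is consistent with this node becoming SyncTarget in the log.
print(same_generation(peer["uuids"]["bitmap"], me["uuids"]["current"]), peer["bits"])
```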
&lt;/pre&gt;</description>
    <dc:creator>John Anthony</dc:creator>
    <dc:date>2012-05-23T21:50:44</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24134">
    <title>Re: Recovering from erroneous sync state</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24134</link>
    <description>&lt;pre&gt;
hu? should be two as well?

anyways:
port=7789
for chain in INPUT OUTPUT ; do
    for direction in dport sport ; do
        iptables -I $chain -p tcp --$direction $port -j REJECT --reject-with tcp-reset
    done
done

should do it usually.
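
The rules inserted by the loop above stay in place until they are removed again. Below is a minimal Python sketch that builds the same four rule specifications as argument lists, so that the matching "iptables -D" (delete) commands can be generated once the connection has been torn down. It only prints the commands and does not run iptables itself. The port 7789 is taken from the loop; reject_rules is a hypothetical helper.

```python
PORT = 7789  # replication port used in the shell loop above


def reject_rules(port, action="-I"):
    """Build iptables argument lists for the REJECT rules from the loop above.

    action="-I" inserts the rules; action="-D" deletes the same rules again.
    Nothing is executed here; each list could be passed to subprocess.run().
    """
    rules = []
    for chain in ("INPUT", "OUTPUT"):
        for direction in ("--dport", "--sport"):
            rules.append(["iptables", action, chain, "-p", "tcp", direction, str(port),
                          "-j", "REJECT", "--reject-with", "tcp-reset"])
    return rules


# Print the cleanup commands that undo the loop once DRBD has dropped the connection.
for argv in reject_rules(PORT, "-D"):
    print(" ".join(argv))
```

Building the delete commands from the same specification as the inserts keeps both sides in sync. "iptables -D" needs the rule spelled exactly as it was inserted.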




&lt;/pre&gt;</description>
    <dc:creator>Lars Ellenberg</dc:creator>
    <dc:date>2012-05-23T21:48:08</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24133">
    <title>Re: Recovering from erroneous sync state</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24133</link>
    <description>&lt;pre&gt;
On May 23, 2012, at 4:34 PM, Zev Weiss wrote:


"(at which point I plan on updating to 8.3.13 anyway)", was what that unfinished parenthetical was supposed to say there.

Zev

&lt;/pre&gt;</description>
    <dc:creator>Zev Weiss</dc:creator>
    <dc:date>2012-05-23T21:39:42</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24132">
    <title>Re: Recovering from erroneous sync state</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24132</link>
    <description>&lt;pre&gt;
On May 23, 2012, at 4:18 PM, Lars Ellenberg wrote:


Sorry, should have worded that more carefully -- it wasn't strictly a DROP, but a REJECT, with icmp-port-unreachable.  I've since tweaked it to reject with tcp-reset instead.  No changes in DRBD state on either side as far as I can see, though both nodes still seem to have (or at least think they have) a tcp connection or two involving that port, despite my efforts with iptables:

[root@node1 ~]# netstat -tn | fgrep 7789
tcp      156      0 192.168.1.2:37324           192.168.1.1:7789            ESTABLISHED
tcp        0      0 192.168.1.2:7789            192.168.1.1:47548           ESTABLISHED

[root@node2 ~]# netstat -tn | fgrep 7789
tcp        0      0 192.168.1.1:47548           192.168.1.2:7789            ESTABLISHED


For what it's worth, node1 is the one that thinks the resource is Secondary/Secondary, node2 is the one that shows it as Secondary/Primary.


Right, but I don't really want to disrupt replication on all the other (production) resources, so just leaving it as-is until my next scheduled maintenance reboot (at which point I plan on updating to 8.3.13 anyway) would be preferable.


Thanks,
Zev
&lt;/pre&gt;</description>
    <dc:creator>Zev Weiss</dc:creator>
    <dc:date>2012-05-23T21:34:28</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24131">
    <title>Re: Recovering from erroneous sync state</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24131</link>
    <description>&lt;pre&gt;
You would not need to "block" as in DROP, but to REJECT with tcp reset.

You could also try "ifdown", sleep a while, and "ifup" again
(which obviously will impact the other resources, and everything going
via that interface).



&lt;/pre&gt;</description>
    <dc:creator>Lars Ellenberg</dc:creator>
    <dc:date>2012-05-23T21:18:37</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24130">
    <title>Re: Recovering from erroneous sync state</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24130</link>
    <description>&lt;pre&gt;
On May 23, 2012, at 3:45 PM, Lars Ellenberg wrote:


I think I had tried a non-forced disconnect previously (and perhaps also implicitly as part of a 'down' attempt, though I'm not sure whether it would have gotten to that step if the disconnect operation didn't complete), but 'drbdsetup 9 disconnect --force' also just hangs.


As mentioned in another message in response to Florian, blocking the replication port via iptables doesn't seem to have had any effect.


Thanks,
Zev

&lt;/pre&gt;</description>
    <dc:creator>Zev Weiss</dc:creator>
    <dc:date>2012-05-23T21:12:23</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24129">
    <title>Re: Recovering from erroneous sync state</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24129</link>
    <description>&lt;pre&gt;
On May 23, 2012, at 3:47 PM, Florian Haas wrote:


I've now inserted iptables rules on both sides for the relevant replication port (reject-with icmp-port-unreachable), but no transition to WFConnection -- it's still stuck in the same state (and unsurprisingly, 'down' still just hangs).


Zev
&lt;/pre&gt;</description>
    <dc:creator>Zev Weiss</dc:creator>
    <dc:date>2012-05-23T21:07:28</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24128">
    <title>Re: need description and help</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24128</link>
    <description>&lt;pre&gt;Hi,

On 23.05.2012 17:12, Mahmoud Alshinhab wrote:

The full error message already tells you everything you need. I have
quoted the relevant parts below and will interpret them for you:

Your disk already contains a filesystem, and that filesystem uses the
space up to the very last bit.
If you don't need to preserve the filesystem, overwrite the first couple
of MB of the partition with zeros or random data.
If you want to keep the data and just add redundancy, shrink the
filesystem a bit to make room for DRBD's internal meta-data, then run
create-md again. Alternatively, read up on external meta-data and use
that instead.
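
A minimal sketch of the shrink step, in Python, assuming the internal
meta-data estimate given in the DRBD 8.3 User's Guide:
Ms = ceil(Cs / 2^18) * 8 + 72, with Cs the backing device size and Ms the
meta-data size, both in 512-byte sectors (blockdev --getsz reports Cs).
The function names and the 10 GiB example are hypothetical, and the
result ignores filesystem block rounding, so leave a little extra slack
when shrinking.

```python
# Sketch only: estimates DRBD 8.3 internal meta-data size using the
# formula from the DRBD User's Guide (assumption: 512-byte sectors).
SECTOR_BYTES = 512


def internal_md_sectors(device_sectors):
    """Estimated internal meta-data size, in sectors, for a backing device."""
    # Integer ceiling of device_sectors / 2**18, avoiding floating point.
    chunks = -(-device_sectors // 2**18)
    return chunks * 8 + 72


def max_fs_bytes(device_sectors):
    """Largest filesystem size in bytes that leaves room for the meta-data."""
    usable = device_sectors - internal_md_sectors(device_sectors)
    return usable * SECTOR_BYTES


if __name__ == "__main__":
    # Hypothetical 10 GiB partition: 10 * 2**30 / 512 = 20971520 sectors.
    sectors = 20971520
    print(internal_md_sectors(sectors))   # 712
    print(max_fs_bytes(sectors) // 1024)  # 10485404 (KiB)
```

The filesystem would then be shrunk to at most that size before running
create-md.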

Hope that helps,

Arnold

&amp;lt;snip&amp;gt;
&lt;/pre&gt;</description>
    <dc:creator>Arnold Krille</dc:creator>
    <dc:date>2012-05-23T20:55:04</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24127">
    <title>Re: Recovering from erroneous sync state</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24127</link>
    <description>&lt;pre&gt;
Ugh. Can you force the device into the WFConnection state by injecting
a couple of iptables rules blocking the replication port, and then
"down" the resource?

Also, Lars, can you shed a little more light on the bug and its
8.3.13 fix? I had thought the fix was in commit 305dce2c, but that
commit apparently fixes c19050f4 (which, as per git describe, was some
thirty commits after 8.3.12, so it shouldn't affect an 8.3.12 user).

Cheers,
Florian

&lt;/pre&gt;</description>
    <dc:creator>Florian Haas</dc:creator>
    <dc:date>2012-05-23T20:47:39</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24126">
    <title>Re: Recovering from erroneous sync state</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24126</link>
    <description>&lt;pre&gt;
"drbdsetup 9 disconnect --force" may work,
if you have not tried a non-forced disconnect or similar before,
that is to say, if the DRBD worker thread is not blocked yet.

You can always cut the TCP connection using iptables,
which should at least get the worker into a responsive state again.
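
A hedged Python sketch of cutting the replication connection with
iptables: it builds REJECT rules for the resource's TCP port and, by
default, only prints them. The port number 7789 and the helper names are
assumptions for illustration; take the real port from the resource's
address lines, apply the rules on both nodes, and remove them afterwards
by running the same rules with -D.

```python
import subprocess

# Assumption: both nodes use the same replication port (see the
# "address" lines of the resource in drbd.conf).
REPLICATION_PORT = 7789  # hypothetical example value


def reject_rules(port, action="-I"):
    """iptables argv lists that reject inbound DRBD replication packets.

    Matching --dport catches the connection the peer opened to us;
    matching --sport catches replies on the connection we opened to the peer.
    """
    rules = []
    for match in ("--dport", "--sport"):
        rules.append([
            "iptables", action, "INPUT", "-p", "tcp", match, str(port),
            "-j", "REJECT", "--reject-with", "icmp-port-unreachable",
        ])
    return rules


def run_rules(rules, dry_run=True):
    """Print each command, and execute it (needs root) if dry_run is False."""
    for argv in rules:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_rules(reject_rules(REPLICATION_PORT))          # insert blocking rules
    # run_rules(reject_rules(REPLICATION_PORT, "-D"))  # later: remove them
```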


&lt;/pre&gt;</description>
    <dc:creator>Lars Ellenberg</dc:creator>
    <dc:date>2012-05-23T20:45:19</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.linux.drbd/24125">
    <title>Re: Recovering from erroneous sync state</title>
    <link>http://permalink.gmane.org/gmane.comp.linux.drbd/24125</link>
    <description>&lt;pre&gt;
On May 23, 2012, at 3:22 PM, Florian Haas wrote:


Sure -- here's one node:

version: 8.3.12 (api:88/proto:86-96)
GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by zweiss&amp;lt; at &amp;gt;mydomain, 2012-03-14 19:52:38

&amp;lt;snip other resources&amp;gt;
 9: cs:SyncSource ro:Secondary/Primary ds:UpToDate/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:65536
        [&amp;gt;...................] sync'ed:  5.9% (65536/65536)K
        finish: 19046:04:53 speed: 0 (0 -- 0) K/sec (stalled)
          0% sector pos: 0/10698352
        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0


And here's the other:

version: 8.3.12 (api:88/proto:86-96)
GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by zweiss-VeTH3JB28IWuorB+Ufeg+tADJCGDAxhI&amp;lt; at &amp;gt;public.gmane.org, 2012-03-14 19:52:38

&amp;lt;snip other resources&amp;gt;
 9: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:65536
        [&amp;gt;...................] sync'ed:  5.9% (65536/65536)K
        finish: 18987:55:05 speed: 0 (0 -- 0) K/sec (stalled)
          0% sector pos: 0/10698352
        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0



That just hangs, unfortunately (on both nodes).


Thanks,
Zev
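
A small Python sketch that parses the per-resource status line from
/proc/drbd (the " 9: cs:... ro:... ds:..." lines above) and flags the
state seen here, where both nodes claim cs:SyncSource; during a normal
resync the peer would be SyncTarget. It assumes the DRBD 8.3 line layout
shown above, and the function names are hypothetical.

```python
import re

# Matches lines such as
#  9: cs:SyncSource ro:Secondary/Primary ds:UpToDate/Inconsistent C r-----
# Assumption: DRBD 8.3-style /proc/drbd output.
STATUS_RE = re.compile(r"^\s*(\d+): cs:(\S+) ro:(\S+) ds:(\S+)")


def parse_status(line):
    """Return minor, connection state, roles and disk states, or None."""
    m = STATUS_RE.match(line)
    if m is None:
        return None
    minor, cs, ro, ds = m.groups()
    local_role, peer_role = ro.split("/")
    local_disk, peer_disk = ds.split("/")
    return {
        "minor": int(minor),
        "cs": cs,
        "roles": (local_role, peer_role),
        "disks": (local_disk, peer_disk),
    }


def both_sync_source(node_a, node_b):
    """True if both ends claim to be the resync source."""
    return node_a["cs"] == "SyncSource" and node_b["cs"] == "SyncSource"


if __name__ == "__main__":
    a = parse_status(" 9: cs:SyncSource ro:Secondary/Primary ds:UpToDate/Inconsistent C r-----")
    b = parse_status(" 9: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----")
    print(a["roles"], b["roles"])
    print(both_sync_source(a, b))  # True
```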
&lt;/pre&gt;</description>
    <dc:creator>Zev Weiss</dc:creator>
    <dc:date>2012-05-23T20:34:27</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.linux.drbd">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.linux.drbd</link>
  </textinput>
</rdf:RDF>

