<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.linux.redhat.cluster">
    <title>gmane.linux.redhat.cluster</title>
    <link>http://blog.gmane.org/gmane.linux.redhat.cluster</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21111"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21110"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21109"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21108"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21107"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21106"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21105"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21104"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21103"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21102"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21101"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21100"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21099"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21098"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21097"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21096"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21095"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21094"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21093"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21092"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.linux.redhat.cluster/21091"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource="http://search.gmane.org/?group=$group=gmane.linux.redhat.cluster"/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21111">
    <title>Is it possible to use quorum for CTDB to prevent split-brain and removing lockfile in the cluster file system</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21111</link>
    <description>&lt;pre&gt;Hello list,

We know that CTDB uses a lockfile in the cluster file system to prevent
split-brain.
It is a really good design when all nodes in the cluster can mount the
cluster file system (e.g. GPFS/GFS/GlusterFS), and CTDB works happily under
this assumption.
However, when split-brain happens, the disconnected private network usually
violates this assumption.
For example, we have four nodes (A, B, C, D) in the cluster and GlusterFS
is the backend.
GlusterFS and CTDB on all nodes communicate with each other via the private
network, and CTDB manages the public network.
If node A is disconnected from the private network, there will be group (A)
and group (B,C,D) in our cluster.
The election of a recovery master is triggered once CTDB determines that a
node is disconnected, i.e. CTDB elects a new recovery master for each
group after 26 (KeepaliveInterval*KeepaliveLimit+1 by default) seconds.
Then node A will be the recovery master of group (A) and some node (e.g. B)
will be the recovery master of group (B,C,D).
Now A and B will both try to lock the lockfile, but the GlusterFS nodes also
communicate with each other via the private network.
A big problem arises because whether the lockfile can be locked depends on
the lock implementation and the disconnect detection of GlusterFS (or other
cluster file system). To my knowledge, GlusterFS determines that a node
is disconnected after 42 seconds and then releases its lock. In this
configuration, nodes A and B will ban themselves, and the newly elected
recovery master will ban itself. That is a really bad thing, and it means we
cannot treat the cluster file system as a black box with the lockfile design.

Hence, I have an idea for building CTDB with split-brain prevention but
without the lockfile.
Using a quorum to ban a node might be an option, so I made a small
modification to the CTDB source code.
The modification checks whether more than (nodemap-&amp;gt;num)/2 nodes are
connected, in main_loop of server/ctdb_recoverd.c.
If not, it bans the node itself and logs the error "Node %u in the group
without quorum".

In server/ctdb_recoverd.c:
static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
TALLOC_CTX *mem_ctx)
...
        /* count how many active nodes there are */
        rec-&amp;gt;num_active    = 0;
        rec-&amp;gt;num_connected = 0;
        for (i=0; i&amp;lt;nodemap-&amp;gt;num; i++) {
                if (!(nodemap-&amp;gt;nodes[i].flags &amp;amp; NODE_FLAGS_INACTIVE)) {
                        rec-&amp;gt;num_active++;
                }
                if (!(nodemap-&amp;gt;nodes[i].flags &amp;amp; NODE_FLAGS_DISCONNECTED)) {
                        rec-&amp;gt;num_connected++;
                }
        }

+       if (rec-&amp;gt;num_connected &amp;lt; ((nodemap-&amp;gt;num)/2+1)){
+               DEBUG(DEBUG_ERR, ("Node %u in the group without quorum\n", pnn));
+               ctdb_ban_node(rec, pnn, ctdb-&amp;gt;tunable.recovery_ban_period);
+       }
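
A minimal sketch of the majority rule behind this check, written in Python
for illustration only and not taken from CTDB. The function name has_quorum
and the sample partitions are assumptions; the threshold mirrors the one in
the patch above.

```python
def has_quorum(connected_nodes, total_nodes):
    """Return True when this partition holds a strict majority of the nodes."""
    # Same threshold as the patch: at least total/2 + 1, with integer division.
    return connected_nodes >= total_nodes // 2 + 1


# Four-node cluster (A, B, C, D) split into (A) and (B, C, D).
print("group (A):", has_quorum(1, 4))
print("group (B,C,D):", has_quorum(3, 4))
# An even 2/2 split: the same rule is applied to each half.
print("2/2 split:", has_quorum(2, 4))
```

Under this rule an even split leaves neither half with a majority, so every
node would ban itself in that case.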

This modification seems to provide split-brain prevention without the
lockfile in my tests (more than 3 nodes).
Does this modification cause any side effects, or is it a stupid design?
Please let me know; I would appreciate input from smart people like you
guys.

Thanks,
Az
&lt;/pre&gt;</description>
    <dc:creator>黃曉偉</dc:creator>
    <dc:date>2012-05-24T04:22:20</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21110">
    <title>Re: RHEL/CentOS-6 HA NFS Configuration Question</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21110</link>
    <description>&lt;pre&gt;
My apologies, this second location is the same as the first, and should be
http://marc.info/?l=linux-nfs&amp;amp;m=131914563520472&amp;amp;w=2

Ben

&lt;/pre&gt;</description>
    <dc:creator>Benjamin Coddington</dc:creator>
    <dc:date>2012-05-22T13:26:21</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21109">
    <title>Re: RHEL/CentOS-6 HA NFS Configuration Question</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21109</link>
    <description>&lt;pre&gt;Those running HA NFS should be aware of the following two NFSD open leaks.

The first is the nfs4_open_downgrade leak:
http://marc.info/?l=linux-nfs&amp;amp;m=131077202109185&amp;amp;w=2
https://bugzilla.redhat.com/show_bug.cgi?id=714153

Red Hat supposedly fixed this, but I never saw the erratum go by. While we
waited for them to fix it, we went to an upstream kernel and got bitten
by this one:

http://marc.info/?l=linux-nfs&amp;amp;m=131077202109185&amp;amp;w=2

NFSD open leaks will cause your filesystems to fail to umount, even after
waiting through your lease time.  You'll see the device's open count
will be non-zero (dmsetup info &amp;lt;device&amp;gt;), even though the filesystem
is unexported, and kernel nfsds are stopped.

We've been running our NFS4 HA cluster for a few months now on
a 3.2.5 kernel, and failover/recovery works well.

Ben

On May 16, 2012, at 2:19 PM, Colin Simpson wrote:



&lt;/pre&gt;</description>
    <dc:creator>Benjamin Coddington</dc:creator>
    <dc:date>2012-05-22T13:05:32</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21108">
    <title>Re: RHEL/CentOS-6 HA NFS Configuration Question</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21108</link>
    <description>&lt;pre&gt;Hi Colin,

On 5/17/2012 11:47 AM, Colin Simpson wrote:

Understood. I can't really say if userland or kernel would make any
difference in this specific unmount issue, but for "safety reasons" I
need to assume their design is the same and that they behave the same way.
When/if there is a switch, we will need to look more deeply into it. With
the current kernel implementation we (cluster guys) need to use this approach.


Yes, one for rhel5 and one for rhel6, but they are both private at the
moment because they have customer data in them.

I expect that the workaround/fix (whatever you want to label it) will be
available via RHN in 2/3 weeks.

Fabio


--
Linux-cluster mailing list
Linux-cluster&amp;lt; at &amp;gt;redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster&lt;/pre&gt;</description>
    <dc:creator>Fabio M. Di Nitto</dc:creator>
    <dc:date>2012-05-17T09:57:22</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21107">
    <title>Re: RHEL/CentOS-6 HA NFS Configuration Question</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21107</link>
    <description>&lt;pre&gt;Thanks for all the useful information on this.

I realise the bz is not for this issue, I just included it as it has the
suggestion that nfsd should actually live in user space (which seems
sensible).

Out of interest is there a bz # for this issue?

Colin


On Thu, 2012-05-17 at 10:26 +0200, Fabio M. Di Nitto wrote:


________________________________


This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.


&lt;/pre&gt;</description>
    <dc:creator>Colin Simpson</dc:creator>
    <dc:date>2012-05-17T09:47:29</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21106">
    <title>Re: RHEL/CentOS-6 HA NFS Configuration Question</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21106</link>
    <description>&lt;pre&gt;Sadly that doesn't work (usually). With nfsd the filesystem will still
refuse to umount (despite the force) as it's locked in kernel space
(where nfsd lives). And the service will just go to failed.

The best you can do is probably self_fence="1", which is pretty
brutal and halts the node with the stuck fs mount, I believe.

Colin

On Thu, 2012-05-17 at 00:45 +0200, emmanuel segura wrote:


&lt;/pre&gt;</description>
    <dc:creator>Colin Simpson</dc:creator>
    <dc:date>2012-05-17T09:47:00</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21105">
    <title>Re: Linux-cluster Digest, Vol 97, Issue 5</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21105</link>
    <description>&lt;pre&gt;Emmanuel,

On 5/17/2012 10:38 AM, emmanuel segura wrote:

Wow, that's good planning of yours... so instead of relying on an explicit
order in cluster.conf, you add an extra layer of obfuscation by assuming
service.sh will always be the same.

Explicit config is more readable and makes the service clear for the user,
vs expecting a user to dig into some file where priorities are described
and that can change.


I seriously hope this is sarcasm.

Fabio

&lt;/pre&gt;</description>
    <dc:creator>Fabio M. Di Nitto</dc:creator>
    <dc:date>2012-05-17T09:38:04</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21104">
    <title>Re: Linux-cluster Digest, Vol 97, Issue 5</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21104</link>
    <description>&lt;pre&gt;Fabio

The IP is the last to start; as I said before, run vim
/usr/share/cluster/service.sh

I have a cluster configured like that and I can tell you I never hit the
problem
===========================================================

Look at this: the start order of resources isn't based on their order in the
XML of /etc/cluster.conf. I see you work at Red Hat and you don't know this.
Mamma mia
===========================================================
 &amp;lt;special tag="rgmanager"&amp;gt;
        &amp;lt;attributes root="1" maxinstances="1"/&amp;gt;
        &amp;lt;child type="lvm" start="1" stop="9"/&amp;gt;
        &amp;lt;child type="fs" start="2" stop="8"/&amp;gt;
        &amp;lt;child type="clusterfs" start="3" stop="7"/&amp;gt;
        &amp;lt;child type="netfs" start="4" stop="6"/&amp;gt;
        &amp;lt;child type="nfsexport" start="5" stop="5"/&amp;gt;

        &amp;lt;child type="nfsclient" start="6" stop="4"/&amp;gt;

        &amp;lt;child type="ip" start="7" stop="2"/&amp;gt;
        &amp;lt;child type="smb" start="8" stop="3"/&amp;gt;
        &amp;lt;child type="script" start="9" stop="1"/&amp;gt;
    &amp;lt;/special&amp;gt;
==========================================================
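
The start and stop levels in the listing above can be sorted to see the
order they imply. A small Python sketch using only the values quoted there;
the variable names are illustrative, and it assumes that lower levels run
first.

```python
# (type, start level, stop level) copied from the rgmanager listing above.
children = [
    ("lvm", 1, 9),
    ("fs", 2, 8),
    ("clusterfs", 3, 7),
    ("netfs", 4, 6),
    ("nfsexport", 5, 5),
    ("nfsclient", 6, 4),
    ("ip", 7, 2),
    ("smb", 8, 3),
    ("script", 9, 1),
]

# Assumption: a lower level means earlier, for both start and stop.
start_order = [name for name, start, stop in sorted(children, key=lambda c: c[1])]
stop_order = [name for name, start, stop in sorted(children, key=lambda c: c[2])]

print("start:", " ".join(start_order))
print("stop: ", " ".join(stop_order))
```

By these levels ip starts after the nfs resources but before smb and script,
and on stop it goes second, right after script.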


2012/5/17 Fabio M. Di Nitto &amp;lt;fdinitto&amp;lt; at &amp;gt;redhat.com&amp;gt;




&lt;/pre&gt;</description>
    <dc:creator>emmanuel segura</dc:creator>
    <dc:date>2012-05-17T08:38:21</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21103">
    <title>Re: Linux-cluster Digest, Vol 97, Issue 5</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21103</link>
    <description>&lt;pre&gt;Fabio

The IP is the last to start; as I said before, run vim
/usr/share/cluster/service.sh

I have a cluster configured like that and I can tell you I never hit the
problem
===========================================================

Look at this: the start order of resources isn't based on their order in the
XML of /etc/cluster.conf. I see you work at Red Hat and you don't know this.
Mamma mia
===========================================================
 &amp;lt;special tag="rgmanager"&amp;gt;
        &amp;lt;attributes root="1" maxinstances="1"/&amp;gt;
        &amp;lt;child type="lvm" start="1" stop="9"/&amp;gt;
        &amp;lt;child type="fs" start="2" stop="8"/&amp;gt;
        &amp;lt;child type="clusterfs" start="3" stop="7"/&amp;gt;
        &amp;lt;child type="netfs" start="4" stop="6"/&amp;gt;
        &amp;lt;child type="nfsexport" start="5" stop="5"/&amp;gt;

        &amp;lt;child type="nfsclient" start="6" stop="4"/&amp;gt;

        &amp;lt;child type="ip" start="7" stop="2"/&amp;gt;
        &amp;lt;child type="smb" start="8" stop="3"/&amp;gt;
        &amp;lt;child type="script" start="9" stop="1"/&amp;gt;
    &amp;lt;/special&amp;gt;
==========================================================

2012/5/17 Fabio M. Di Nitto &amp;lt;fdinitto&amp;lt; at &amp;gt;redhat.com&amp;gt;




&lt;/pre&gt;</description>
    <dc:creator>emmanuel segura</dc:creator>
    <dc:date>2012-05-17T08:37:24</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21102">
    <title>Re: RHEL/CentOS-6 HA NFS Configuration Question</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21102</link>
    <description>&lt;pre&gt;
Yes, I am aware of the issue, since I have been investigating it in detail
for the past couple of weeks.


First, the bz you mention below is unrelated to the unmount problem we
are discussing. clustered nfsd locks are a slightly different story.

There are two issues here:

1) cluster users' expectations
2) nfsd internal design

(and note I am not blaming either cluster or nfsd here)

Generally cluster users expect to be able to do things like (fake meta
config):

&amp;lt;service1..
 &amp;lt;fs1..
  &amp;lt;nfsexport1..
   &amp;lt;nfsclient1..
    &amp;lt;ip1..
....
&amp;lt;service2
 &amp;lt;fs2..
  &amp;lt;nfsexport2..
   &amp;lt;nfsclient2..
    &amp;lt;ip2..

and be able to move services around cluster nodes without problems. Note
that the fs used is irrelevant: it can be clustered or not.

This setup unfortunately clashes with the nfsd design.

When a service is shut down (whether due to a stop or a relocation makes
no difference):

ip is removed
exportfs -u .....
(and that's where we hit the nfsd design limitation)
umount fs..

By design (though I can't say exactly why it is done this way without
speculating), nfsd will continue to serve open sessions via rpc.
exportfs -u will only stop new incoming requests.

If nfsd is serving a client, it will continue to hold a lock on the
filesystem (in kernel) that prevents the fs from being unmounted.

The only ways to effectively close the sessions are:

- drop the VIP and wait for connections timeout (nfsd would effectively
  also drop the lock on the fs) but it is slow and not always consistent
  on how long it would take

- restart nfsd.


The "real fix" here would be to wait for nfsd containers, which support
exactly this scenario: allowing unexport of a single fs, lock drops,
etc. This work is still in very early stages upstream, so it is not yet
suitable for production.

The patch I am working on is basically a way to handle the clash as well
as possible.

A new nfsrestart="" option will be added to both fs and clusterfs. If the
filesystem cannot be unmounted and force_unmount is set, it will
perform an extremely fast restart of nfslock and nfsd.
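
The decision described above could be sketched roughly as follows. This is a
Python illustration only, not the resource agent code: stop_filesystem and
the two stand-in callables are hypothetical, and retrying the unmount after
the restart is an assumption.

```python
def stop_filesystem(try_umount, restart_nfs, force_unmount=False, nfsrestart=False):
    """Return True if the filesystem ends up unmounted, False if the stop fails."""
    if try_umount():
        return True
    # Assumed behaviour: only with both options set, restart nfslock and nfsd,
    # then try the unmount once more.
    if force_unmount and nfsrestart:
        restart_nfs()
        return try_umount()
    return False


# Toy model: nfsd keeps the filesystem busy until it is restarted.
state = {"nfsd_holds_fs": True}


def fake_umount():
    return not state["nfsd_holds_fs"]


def fake_restart_nfs():
    state["nfsd_holds_fs"] = False


# First call lacks the new option, second call has both options set.
print(stop_filesystem(fake_umount, fake_restart_nfs, force_unmount=True))
print(stop_filesystem(fake_umount, fake_restart_nfs, force_unmount=True, nfsrestart=True))
```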

We can argue that it is not the final solution, and I think we can agree
that it is more of a workaround, but:

1) it will allow service migration instead of service failure
2) it will match cluster users' expectations (allowing different exports
to live peacefully together).

The only negative impact that we have been able to identify so far (the
patch is still under heavy testing), besides having to add a config
option to enable it, is that there will be a small window in which all
clients connected to a certain node, for all nfs services, will not be
served because nfsd is restarting.

So if you are migrating export1 and there are clients using export2,
export2 will also be affected for those few ms required to restart nfsd.
(assuming export1 and 2 are running on the same node of course).

Placing things in perspective for a cluster, I think that it is a lot
better to be able to unmount a fs and relocate services as necessary vs
a service failing completely and maybe node being fenced.





No that's a totally different issue.


&lt;/pre&gt;</description>
    <dc:creator>Fabio M. Di Nitto</dc:creator>
    <dc:date>2012-05-17T08:26:46</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21101">
    <title>Re: Linux-cluster Digest, Vol 97, Issue 5</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21101</link>
    <description>&lt;pre&gt;

That is also wrong.

&amp;lt;ip has to be the last one to start or you will hit race conditions at
service startup.

Fabio

&lt;/pre&gt;</description>
    <dc:creator>Fabio M. Di Nitto</dc:creator>
    <dc:date>2012-05-17T08:02:54</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21100">
    <title>Re: RHEL/CentOS-6 HA NFS Configuration Question</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21100</link>
    <description>&lt;pre&gt;Colin

Use the force_unmount option

2012/5/16 Colin Simpson &amp;lt;Colin.Simpson&amp;lt; at &amp;gt;iongeo.com&amp;gt;




&lt;/pre&gt;</description>
    <dc:creator>emmanuel segura</dc:creator>
    <dc:date>2012-05-16T22:45:16</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21099">
    <title>Re: RHEL/CentOS-6 HA NFS Configuration Question</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21099</link>
    <description>&lt;pre&gt;Quick reply: I'll send you the full details Monday. I am heading off for
a couple of days of vacation + weekend. I am not ignoring the question :)

Fabio

On 5/16/2012 8:19 PM, Colin Simpson wrote:

&lt;/pre&gt;</description>
    <dc:creator>Fabio M. Di Nitto</dc:creator>
    <dc:date>2012-05-16T20:00:03</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21098">
    <title>Re: Linux-cluster Digest, Vol 97, Issue 5</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21098</link>
    <description>&lt;pre&gt;Yes Randy

I remember that in my job I found this problem in a cluster

This is wrong
==============================================
&amp;lt;service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1"
nfslock="1" recovery="relocate"&amp;gt;
                            &amp;lt;ip ref="192.168.1.1"&amp;gt;
                                    &amp;lt;fs __independent_subtree="1"
ref="volume01"&amp;gt;
                                            &amp;lt;nfsexport name="nfs-volume01"&amp;gt;
                                                    &amp;lt;nfsclient name=" "
ref="local-subnet"/&amp;gt;
                                            &amp;lt;/nfsexport&amp;gt;
                                    &amp;lt;/fs&amp;gt;
                            &amp;lt;/ip&amp;gt;
================================================
It must be
================================================
&amp;lt;service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1"
nfslock="1" recovery="relocate"&amp;gt;
                            &amp;lt;ip ref="192.168.1.1"/&amp;gt;
                                    &amp;lt;fs __independent_subtree="1"
ref="volume01"&amp;gt;
                                            &amp;lt;nfsexport name="nfs-volume01"&amp;gt;
                                                    &amp;lt;nfsclient name=" "
ref="local-subnet"/&amp;gt;
                                            &amp;lt;/nfsexport&amp;gt;
                                    &amp;lt;/fs&amp;gt;
================================================

A little explanation: Red Hat has an internal order and knows in which
sequence to start the resources.
For more information, read the script /usr/share/cluster/service.sh, under
the metadata section



2012/5/16 Randy Zagar &amp;lt;zagar&amp;lt; at &amp;gt;arlut.utexas.edu&amp;gt;




&lt;/pre&gt;</description>
    <dc:creator>emmanuel segura</dc:creator>
    <dc:date>2012-05-16T18:20:08</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21097">
    <title>Re: RHEL/CentOS-6 HA NFS Configuration Question</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21097</link>
    <description>&lt;pre&gt;This is interesting.

We very often see the filesystems fail to umount on busy clustered NFS
servers.

What is the nature of the "real fix"?

I like the idea of NFSD fully being in user space, so killing it would
definitely free the fs.

Alan Brown (who's on this list) recently posted to a RH BZ that he was
one of the people who moved it into kernel space for performance reasons
in the past (that are no longer relevant):

https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9

, but I doubt this is the fix you have in mind.

Colin

On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:


&lt;/pre&gt;</description>
    <dc:creator>Colin Simpson</dc:creator>
    <dc:date>2012-05-16T18:19:04</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21096">
    <title>Re: Linux-cluster Digest, Vol 97, Issue 5</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21096</link>
    <description>&lt;pre&gt;
Click the "Preferences" right at the top right corner, and tick the
'Enable "expert" mode' checkbox there. If you have that enabled, it'll
show you the nfslock option for fs resources.


Increment the config_version in cluster.conf and run cman_tool version -r


Ryan

&lt;/pre&gt;</description>
    <dc:creator>Ryan McCabe</dc:creator>
    <dc:date>2012-05-16T18:16:13</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21095">
    <title>Re: Linux-cluster Digest, Vol 97, Issue 5</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21095</link>
    <description>&lt;pre&gt;
Yes.


I don't know how to do that in Luci. All my work is done via manual editing.

Fabio


&lt;/pre&gt;</description>
    <dc:creator>Fabio M. Di Nitto</dc:creator>
    <dc:date>2012-05-16T17:25:12</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21094">
    <title>Re: Linux-cluster Digest, Vol 97, Issue 5</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21094</link>
    <description>&lt;pre&gt;Also, it looks like the resource manager tries to disable the IP address 
when it's a child of the nfsclient resource.  Is that going to be a 
problem when I have 16 NFS exports hosted on a single IP?

-RZ

On 05/16/2012 11:00 AM, fdinitto&amp;lt; at &amp;gt;redhat.com wrote:

&lt;/pre&gt;</description>
    <dc:creator>Randy Zagar</dc:creator>
    <dc:date>2012-05-16T17:18:11</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21093">
    <title>Re: Linux-cluster Digest, Vol 97, Issue 5</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21093</link>
    <description>&lt;pre&gt;Are you sure that nfslock="1" is a valid option for "&amp;lt;fs ...&amp;gt;"?

There doesn't appear to be a way to add that through LUCI, which means 
I'll have to make and propagate those changes manually.  I used to do 
this in EL5

    /sbin/ccs_tool update /etc/cluster/cluster.conf

but it looks like it's handled differently now.

How?

-RZ

On 05/16/2012 11:00 AM, fdinitto&amp;lt; at &amp;gt;redhat.com wrote:

&lt;/pre&gt;</description>
    <dc:creator>Randy Zagar</dc:creator>
    <dc:date>2012-05-16T17:02:19</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21092">
    <title>Re: RHEL/CentOS-6 HA NFS Configuration Question</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21092</link>
    <description>&lt;pre&gt;

I have an open ticket with support because our cluster fails every time
on failover, but I caught this thread.  I set up my fs resources according
to the docs as such:

&amp;lt;clusterfs device="/dev/vg1/zcb" fsid="7275" fstype="gfs2" 
mountpoint="/mnt/research/zcb" name="gfs22_zcb" 
options="defaults,noatime,localflocks"/&amp;gt;

Can you please verify my options look ok?  I can't find anything in the 
official documentation on setting up NFS on the cluster services with 
the nfslock="1" option.  Thanks!

&lt;/pre&gt;</description>
    <dc:creator>Doug Tucker</dc:creator>
    <dc:date>2012-05-16T14:36:47</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.linux.redhat.cluster/21091">
    <title>Re: RHEL/CentOS-6 HA NFS Configuration Question</title>
    <link>http://permalink.gmane.org/gmane.linux.redhat.cluster/21091</link>
    <description>&lt;pre&gt;

For the &amp;lt;fs resources you want the nfslock="1" option too.


For all services you need to change the order.

&amp;lt;fs..
 &amp;lt;nfsexport..
  &amp;lt;nfsclient..
   &amp;lt;ip..
  &amp;lt;/nfsclient..
 &amp;lt;/nfsexport..
&amp;lt;/fs

This solves different issues at startup, relocation and recovery

Also note that there is a known limitation in nfsd (both rhel5/6) that
could cause some problems in some conditions in your current
configuration. A permanent fix is being worked on atm.

Without going into extreme detail, you might have 2 of those services
running on the same node, and attempting to relocate one of them can fail
because the fs cannot be unmounted. This is due to nfsd holding a lock (at
kernel level) on the FS. Changing the config to the suggested one masks the
problem pretty well, but more testing for a real fix is in progress.

Fabio

&lt;/pre&gt;</description>
    <dc:creator>Fabio M. Di Nitto</dc:creator>
    <dc:date>2012-05-15T18:21:28</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.linux.redhat.cluster">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.linux.redhat.cluster</link>
  </textinput>
</rdf:RDF>

