<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs">
    <title>gmane.os.solaris.opensolaris.zfs</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49239"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49238"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49237"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49236"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49235"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49234"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49233"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49232"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49231"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49230"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49229"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49228"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49227"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49226"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49225"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49224"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49223"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49222"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49221"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49220"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49219"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource="http://search.gmane.org/?group=$group=gmane.os.solaris.opensolaris.zfs"/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49239">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49239</link>
    <description>&lt;pre&gt;
Kind of, it was - the motivation for feeling insecure and
ultimately for clearing the CKSUM errors every minute (that
there is a nonzero error count), at least - the script you
said should never be used in "normal" practice, and I agree
to that conclusion. (Manual) DDing is not the normal practice
sanely covered by the degradation/hotsparing mechanism.

As I wrote, the first time I saw the message, the pool did not
have an assigned hotspare, but it got marked degraded. Just in
case, I came up with that "cksum-mismatch-cleansing" script and
restarted the scrub, since I knew the errors on-disk were due
to an unfinished "proper" resilver onto it. I was not convinced
whether the new disk still fully operates in the pool when it
is marked as degraded, and I did not want the scrub to continue
just to find out whether the disk won't be actively used and
"repaired".

To say that in other words, I know that sometimes docs can lag
behind or hop ahead of implemented features, and the latter can
also be buggy or incomp&lt;/pre&gt;</description>
    <dc:creator>Jim Klimov</dc:creator>
    <dc:date>2012-05-25T22:01:34</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49238">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49238</link>
    <description>&lt;pre&gt;

By the time you could read such a message, the hot spare would have already
kicked in. Obviously, this was not the OP's issue.
 -- richard


--
ZFS Performance and Training
Richard.Elling&amp;lt; at &amp;gt;RichardElling.com
+1-760-896-4422







_______________________________________________
zfs-discuss mailing list
zfs-discuss&amp;lt; at &amp;gt;opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
&lt;/pre&gt;</description>
    <dc:creator>Richard Elling</dc:creator>
    <dc:date>2012-05-25T21:07:55</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49237">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49237</link>
    <description>&lt;pre&gt;
The man page seems to not mention the critical part of the FMA msg that OP is 
worried about.
OP said that his motivation for clearing the errors and fearing the degraded 
state was because he feared this:

 &amp;gt;&amp;gt; AUTO-RESPONSE: The device has been marked as degraded.  An attempt
 &amp;gt;&amp;gt; will be made to activate a hot spare if available.

He doesn't want his dd'd new device kicked out of the vdev and replaced by a 
hot spare (if available) due to the number of errors and the scarlet letter 
of "degraded" at the device level - I don't think he cares about the pool 
level degraded status since it doesn't "do" anything.

&lt;/pre&gt;</description>
    <dc:creator>zfs user</dc:creator>
    <dc:date>2012-05-25T20:53:42</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49236">
    <title>Re: MPxIO n00b question</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49236</link>
    <description>&lt;pre&gt;
Good Lord, that was it! It never occurred to me that the drives had a
say in this. Thanks a billion!

Cheers,
--
Saso
&lt;/pre&gt;</description>
    <dc:creator>Sašo Kiselkov</dc:creator>
    <dc:date>2012-05-25T18:54:20</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49235">
    <title>Re: MPxIO n00b question</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49235</link>
    <description>&lt;pre&gt;See the solution at https://www.illumos.org/issues/644
 -- richard

On May 25, 2012, at 10:07 AM, Sašo Kiselkov wrote:


&lt;/pre&gt;</description>
    <dc:creator>Richard Elling</dc:creator>
    <dc:date>2012-05-25T18:40:35</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49234">
    <title>Re: MPxIO n00b question</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49234</link>
    <description>&lt;pre&gt;
Yes, I know - I just don't have hands-on experience with that
in Solaris (and limited in Linux), not so many double-link
boxes around here :)

 &amp;gt; The data on that array is not very important, the primary design
 &amp;gt; parameter is low cost per MB.

Why not just stripe it all then? That would give good speeds ;)
Arguably, mirroring would indeed cost about twice as much per MB,
but as a tradeoff which may be useful to you - it can also give
a lot more IOPS due to more TLVDEVs being available for striping,
and doubling read speeds due to mirroring.


Does your array include SSD L2ARC caches?

I guess (and want to be corrected if wrong) - since ZFS can tolerate
loss of L2ARCs so much that mirroring them is not even supported,
you may get away with several single-link SSDs connected to one or
another controller (or likely a dedicated one other than those
driving the disk arrays - since IOPS on the few SSDs will be higher
than on tens of disks). Likely you shouldn't connect those single
link (SATA) SSDs to the dual-l&lt;/pre&gt;</description>
    <dc:creator>Jim Klimov</dc:creator>
    <dc:date>2012-05-25T18:13:52</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49233">
    <title>Re: MPxIO n00b question</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49233</link>
    <description>&lt;pre&gt;
Unfortunately, it isn't the same thing. MPxIO provides redundant
signaling to the drives, independent of the storage/RAID layer above
it, so it does have its place (besides simply increasing throughput).


I'd use lower protection if it were available :) The data on that
array is not very important, the primary design parameter is low cost
per MB. We're in a very demanding IO environment, we need large
quantities of high-throughput, high-IOPS storage, but we don't need
stellar reliability. If the pool gets corrupted due to unfortunate
double-drive failure, well, that's tough, but not unbearable (the pool
stores customer channel recordings for nPVR, so nothing critical really).

--
Saso
&lt;/pre&gt;</description>
    <dc:creator>Sašo Kiselkov</dc:creator>
    <dc:date>2012-05-25T17:45:33</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49232">
    <title>Re: MPxIO n00b question</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49232</link>
    <description>&lt;pre&gt;Sorry I can't comment on MPxIO, except that I thought zfs
could by itself discern two paths to the same drive, if
only to protect against double-importing the disk into pool.

2012-05-25 21:07, Sašo Kiselkov wrote:
 &amp;gt; I'd like to create a 5x 9-drive raidz's on the JBOD, but

I am not sure it is a good idea to use such low protection
(raidz1) with large drives. At least, I was led to believe
that after 2TB in size raidz2 is preferable, and raidz3 is
optimal due to long scrub/resilver times leading to large
timeframes that a pool with an error is exposed to possible
fatal errors (due to double-failures with single-protection).

//Jim
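
For a sense of what the extra parity costs in capacity, here is a
small back-of-the-envelope sketch. It assumes the 45 x 2TB drives
from the original question split into 5 vdevs of 9 drives, and it
ignores metadata and allocation overhead. The helper name usable_tb
is made up for this example.

```python
def usable_tb(vdevs, drives_per_vdev, parity_per_vdev, drive_tb):
    """Usable capacity in TB, ignoring ZFS metadata and padding overhead."""
    return vdevs * (drives_per_vdev - parity_per_vdev) * drive_tb


DRIVE_TB = 2   # assumed: the 2TB drives mentioned in the thread
VDEVS = 5      # assumed: 5 top-level vdevs
WIDTH = 9      # assumed: 9 drives per vdev

layouts = {
    "5 x 9-drive raidz1": usable_tb(VDEVS, WIDTH, 1, DRIVE_TB),
    "5 x 9-drive raidz2": usable_tb(VDEVS, WIDTH, 2, DRIVE_TB),
    "5 x 9-drive raidz3": usable_tb(VDEVS, WIDTH, 3, DRIVE_TB),
}

raw_tb = VDEVS * WIDTH * DRIVE_TB
for name, tb in layouts.items():
    print(f"{name}: {tb} TB usable of {raw_tb} TB raw")
```

Under these assumptions the layouts come out at 80, 70 and 60 TB
usable out of 90 TB raw, so each extra parity level costs roughly
11% of the raw capacity.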
&lt;/pre&gt;</description>
    <dc:creator>Jim Klimov</dc:creator>
    <dc:date>2012-05-25T17:35:51</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49231">
    <title>MPxIO n00b question</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49231</link>
    <description>&lt;pre&gt;I'm currently trying to get a SuperMicro JBOD with dual SAS expander
chips running in MPxIO, but I'm a total amateur to this and would like
to ask about how to detect whether MPxIO is working (or not).

My SAS topology is:

 *) One LSI SAS2008-equipped HBA (running the latest IT firmware from
    LSI) with two external ports.
 *) Two SAS cables running from the HBA to the SuperMicro JBOD, where
    they enter the JBOD's rear backplane (which is equipped with two
    LSI SAS expander chips).
 *) From the rear backplane, via two internal SAS cables to the front
    backplane (also with two SAS expanders on it)
 *) The JBOD is populated with 45 2TB Toshiba SAS 7200rpm drives

The machine also has a PERC H700 for the boot media, configured into a
hardware RAID-1 (on which rpool resides).

Here is the relevant part from cfgadm -al for the MPxIO bits:

c5                             scsi-sas     connected    configured
unknown
c5::dsk/c5t50000393D8CB4452d0  disk         connected    configured
unknown
c5::dsk/c5t5&lt;/pre&gt;</description>
    <dc:creator>Sašo Kiselkov</dc:creator>
    <dc:date>2012-05-25T17:07:15</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49230">
    <title>Anybody running S10 or 11 on AMD bulldozer 8 core?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49230</link>
    <description>&lt;pre&gt;I have a chance to pick up a system at a reasonable price built with an AMD
FX8120 8 core 3.1 GHz on a Gigabyte motherboard. Is anybody running with this
combo? Looking for info from an actual user. Thanks.
&lt;/pre&gt;</description>
    <dc:creator>Nomen Nescio</dc:creator>
    <dc:date>2012-05-25T16:01:30</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49229">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49229</link>
    <description>&lt;pre&gt;
Indeed it is, and I've covered this in the thread earlier -
the bulk copying phase ("DD-phase") should monitor its real
progress, and if it detects lags in comparison to the average
or expected speeds (expected = some tuning variable, e.g. 50MB/s),
the process should skip over some (arbitrary) range of sectors
and go on from another location (such skipped sectors are in
danger indeed, until the scrub-phase detects and reconstructs
them) or fall back to the original resilver method completely.
That was already described in some detail I thought of at the
time of the posting, and I can't add much to that yet.
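
As a rough illustration of the skip-on-lag idea above (a sketch
only, not code from this thread): the function below copies blocks
in order, times each one, and when a block takes longer than a
threshold it jumps over the next range and records it for the later
scrub phase. The names copy_with_skips, read_block, write_block,
max_seconds and skip_blocks are all made up for the example, and a
simple per-block time budget stands in for the comparison against
average or expected speed.

```python
import time


def copy_with_skips(read_block, write_block, total_blocks,
                    max_seconds=1.0, skip_blocks=8, clock=time.monotonic):
    """Copy blocks 0..total_blocks-1, jumping ahead after a slow block.

    Returns the (start, end) block ranges that were skipped and still
    need to be repaired by a later scrub or an ordinary resilver.
    """
    skipped = []
    pos = 0
    while total_blocks > pos:
        started = clock()
        write_block(pos, read_block(pos))
        elapsed = clock() - started
        pos += 1
        if elapsed > max_seconds and total_blocks > pos:
            # Lagging: leave the next range for the scrub phase.
            end = min(pos + skip_blocks, total_blocks)
            skipped.append((pos, end))
            pos = end
    return skipped
```

The returned ranges are what the scrub-phase would still have to
reconstruct. Note that this sketch only notices a slow block after
the read has returned, so a real implementation would also need a
timeout on the read itself.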

From what I've seen, faulty sectors are usually
either single errors or a "scratched" range which can be worked
around with e.g. partitioning for legacy FSes (if the SMART
relocation doesn't deal with them properly for any reason),
while most of the rest of the disk is okay. Retries may be
lengthy, ranging from several seconds up to a minute, but
they are often constrained in a few loca&lt;/pre&gt;</description>
    <dc:creator>Jim Klimov</dc:creator>
    <dc:date>2012-05-24T15:53:11</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49228">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49228</link>
    <description>&lt;pre&gt;big assumption below...

On May 24, 2012, at 6:06 AM, Jim Klimov wrote:


This is a big assumption -- that the disk will operate normally, even
for data it cannot read. In my experience, this assumption is not valid
for the majority of HDD failure modes. Also, in the case of consumer-grade
disks, a single sector media error could take a very long time to retry/fail.


np 
 -- richard

&lt;/pre&gt;</description>
    <dc:creator>Richard Elling</dc:creator>
    <dc:date>2012-05-24T14:55:35</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49227">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49227</link>
    <description>&lt;pre&gt;Let me try to formulate my idea again... You called a similar
process "pushing the rope" some time ago, I think.

I feel like I'm passing some exam and am trying to pick answers
for a discipline like philosophy and I have no idea about the
examiner's preferences - is he an ex-Communism teacher or an
eager new religion fanatic? The same answer can lead to an A
or to an F on a state exam. Ah, that was some fun experience :)

Well, what we know is what remains after we forget everything
that we were taught, while the exams are our last chance to
learn something at all =)

2012-05-24 10:28, Richard Elling wrote:

Bigger-better-faster? ;)

The original proposal in this thread was about understanding
how resilvers and scrubs work, why they are so dog slow on
HDDs in comparison to sequential reads, and thinking aloud
what can be improved in this area.

One of the later posts was about improving the disk replacement
(where the original is still responsive, but may be imperfect)
for filled-up fragmented pools by in&lt;/pre&gt;</description>
    <dc:creator>Jim Klimov</dc:creator>
    <dc:date>2012-05-24T13:06:48</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49226">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49226</link>
    <description>&lt;pre&gt;
On May 23, 2012, at 2:56 PM, Jim Klimov wrote:


The FMA message is consistent with the man page.


Efficiency allows use of denominators other than time. Speed is restricted
to a denominator of time. There is no flame war here, look elsewhere.


Operationally, your method loses every time.



You have not made a case for why this hybrid and failure-prone 
procedure is required. What problem are you trying to solve?


Why not follow the well-designed existing procedure?


The failure data does not support your hypothesis.
 -- richard


&lt;/pre&gt;</description>
    <dc:creator>Richard Elling</dc:creator>
    <dc:date>2012-05-24T06:28:14</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49225">
    <title>Re: Migration of a Thumper to bigger HDDs</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49225</link>
    <description>&lt;pre&gt;
There were some pretty major fixes and new features added between
snv_117 and snv_134 (the last OpenSolaris release). It might be worth
updating to snv_134 at the very least.

-B

&lt;/pre&gt;</description>
    <dc:creator>Brandon High</dc:creator>
    <dc:date>2012-05-24T06:12:24</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49224">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49224</link>
    <description>&lt;pre&gt;Thanks again,

2012-05-24 1:01, Richard Elling wrote:

Indeed, even in snv_117 the zpool man page says that. But the
console/dmesg message was also quite clear, so go figure whom
to trust (or fear) more ;)

fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, 
VER: 1, SEVERITY: Major
EVENT-TIME: Wed May 16 03:27:31 MSK 2012
PLATFORM: Sun Fire X4500, CSN: 0804AMT023            , HOSTNAME: thumper
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: cc25a316-4018-4f13-c675-d1d84c6325c3
DESC: The number of checksum errors associated with a ZFS device
exceeded acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-GH for 
more information.
AUTO-RESPONSE: The device has been marked as degraded.  An attempt
will be made to activate a hot spare if available.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.




Ummm... this is likely to start a flame war with other posters,
and you did not say what efficiency is to you? How can we compare
ap&lt;/pre&gt;</description>
    <dc:creator>Jim Klimov</dc:creator>
    <dc:date>2012-05-23T21:56:16</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49223">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49223</link>
    <description>&lt;pre&gt;

no


The man page is clear on this topic, IMHO

     DEGRADED
                 One or more top-level vdevs is in  the  degraded
                 state  because one or more component devices are
                 offline. Sufficient replicas exist  to  continue
                 functioning.

                 One or more component devices is in the degraded
                 or  faulted state, but sufficient replicas exist
                 to continue functioning. The  underlying  condi-
                 tions are as follows:

                     o    The number of checksum  errors  exceeds
                          acceptable  levels  and  the  device is
                          degraded as an  indication  that  some-
                          thing  may  be  wrong. ZFS continues to
                          use the device as necessary.



speed != efficiency


IMHO, this is too operationally complex for most folks. KISS wins.



What is it about error counters that frightens you enough to want to clear 
th&lt;/pre&gt;</description>
    <dc:creator>Richard Elling</dc:creator>
    <dc:date>2012-05-23T21:01:29</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49222">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49222</link>
    <description>&lt;pre&gt;
Thank you Richard for taking notice of this thread and the
definitive answers I needed not quote below, for further
questions ;)


Doesn't this status preclude the device with many CKSUM errors
from participating in the pool (TLVDEV) and the remainder of
the scrub in particular?

At least the textual error message implies that if a hotspare
were available for the pool, it would kick in and invalidate
the device I am scrubbing to update into the pool after the
DD-phase (well, it was not DD but a hung-up resilver in this
case, but that is not substantial).

Such automatic replacement is definitely not what I needed
in this particular case, so if it were to happen - it would
be a problem indeed, in and of itself.

 &amp;gt; dd, or similar dumb block copiers, should work fine.
 &amp;gt; However, they are inefficient...

Define efficient? In terms of transferring the 900GB payload
of a 1TB HDD used for ZFS for a year - DD would beat resilver
anytime, in terms of getting most or (less likely) all of the
valid bits with data ont&lt;/pre&gt;</description>
    <dc:creator>Jim Klimov</dc:creator>
    <dc:date>2012-05-23T20:32:35</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49221">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49221</link>
    <description>&lt;pre&gt;comments far below...

On May 22, 2012, at 1:42 AM, Jim Klimov wrote:


dd, or similar dumb block copiers, should work fine. However, they 
are inefficient and operationally difficult to manage, which is why they
tend to fall in the prefer-to-use-something-else category.


It shouldn't unless you did something to confuse it, such as having both
the original and the dd copy online at the same time. In that case, you
will have two different copies of the same identified device that are
independent. This is an operational mistake, hence my comment above.


Not needed.


DEGRADED is the status. You clear degraded states by fixing the problem
and running zpool clear. DEGRADED, in and of itself, is not a problem.


I would never allow such scripts in my site. It is important to track the 
progress and state changes. This script resets those counters for no
good reason.

I post this comment in the hope that future searches will not encourage 
people to try such things.
 -- richard

--
ZFS Performance and Training
R&lt;/pre&gt;</description>
    <dc:creator>Richard Elling</dc:creator>
    <dc:date>2012-05-23T16:54:14</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49220">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49220</link>
    <description>&lt;pre&gt;
I think this is at least in part an issue with older code.  There have
been various fixes for hangs/restarts/incomplete replaces and sparings
over the time since.  


I've done it a couple of times at least:

 * a failed disk in a raidz1, where I didn't trust that the other
   disks didn't also have errors.  Basically did a ddrescue from one
   disk to the new. I think these days, a 'replace' where the
   original disk is still online will use that content, like a
   hotspare replace, rather than assume it has gone away and must be
   recreated, but that wasn't the case at the time.

 * Where I had an iscsi mirror of a laptop hard disk, but it was out
   of date and had been detached when the laptop iscsi initiator
   refused to start.  Later, the disk developed a few bad sectors.  I
   made a new submirror, let it sync (with the error still), then
   blatted bits of the old image over the new in the areas where the
   bad sectors were being reported.  Scrub again, and they were fixed
   (as well as some b&lt;/pre&gt;</description>
    <dc:creator>Daniel Carosone</dc:creator>
    <dc:date>2012-05-23T00:00:31</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49219">
    <title>Re: How does resilver/scrub work?</title>
    <link>http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/49219</link>
    <description>&lt;pre&gt;
I got into a similar situation last night on that Thumper -
it is now migrating a flaky source disk in the array from
an original old 250GB disk into a same-sized partition on
the new 3TB drive (as I outlined as IDEA7 in another thread).
The source disk itself had about 300 CKSUM errors during
the process, and for reasons beyond my current understanding,
the resilver never completed.

In zpool status it said that the process was done several
hours before the time I looked at it, but the TLVDEV still
had a "spare" component device comprised of the old disk
and new partition, and the (same) hotspare device in the
pool was "INUSE".

After a while we just detached the old disk from the pool
and ran scrub, which first found some 178 CKSUM errors on
the new partition right away, and degraded the TLVDEV and
pool.

We cleared the errors, and ran the script below to log
the detected errors and clear them, so the disk is fixed
and not kicked out of the pool due to mismatches.
Overall 1277 errors were logged and apparen&lt;/pre&gt;</description>
    <dc:creator>Jim Klimov</dc:creator>
    <dc:date>2012-05-22T08:42:02</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.os.solaris.opensolaris.zfs">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.os.solaris.opensolaris.zfs</link>
  </textinput>
</rdf:RDF>

