<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel">
    <title>gmane.os.netbsd.devel.kernel</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42149"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42148"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42147"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42146"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42145"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42144"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42143"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42142"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42141"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42140"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42139"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42138"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42137"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42136"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42135"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42134"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42133"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42132"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42131"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42130"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42149">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42149</link>
    <description>&lt;pre&gt;

No, for small block sizes the overhead of the copyin() is more than offset 
by the larger buffercache block size.  And the I/O operation is 
asynchronous with respect to the write() system call.

With the character device the I/O operation must complete before the 
write() returns.  So the I/O operations cannot be combined and you suffer 
the overhead of each one.
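
A toy model shows the synchronous cost.  Assume every raw write() pays a
fixed setup overhead before its data moves at the media rate, and nothing
overlaps.  Both constants below are invented for illustration and were not
measured on this hardware.

```python
# Hypothetical model of a synchronous raw-device write loop (like dd):
# each write() pays a fixed setup cost, then the data moves at the media
# rate, and the disk idles while the next request is being prepared.
OVERHEAD_S = 0.004   # assumed per-request overhead in seconds (not measured)
MEDIA_RATE = 120e6   # assumed sustained media rate in bytes per second


def sync_throughput(block_size):
    """Bytes/s when each write must complete before the next is issued."""
    return block_size / (OVERHEAD_S + block_size / MEDIA_RATE)


for kib in (16, 64, 256, 1024):
    mb_per_s = sync_throughput(kib * 1024) / 1e6
    print(f"{kib:5d}k {mb_per_s:7.1f} MB/s")
```

In this model, throughput rises with block size only because the fixed
per-request overhead is spread over more data.  The block device avoids
that overhead because write() returns once the data is in the buffer cache.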

Eduardo&lt;/pre&gt;</description>
    <dc:creator>Eduardo Horvath</dc:creator>
    <dc:date>2012-05-25T16:46:39</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42148">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42148</link>
    <description>&lt;pre&gt;Thanks for the most insightful explanation!

Yes, sure. That's why I would have expected the raw device to outperform even at lower block sizes.
&lt;/pre&gt;</description>
    <dc:creator>Edgar Fuß</dc:creator>
    <dc:date>2012-05-25T09:53:03</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42147">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42147</link>
    <description>&lt;pre&gt;But can't we safely assume that reading /dev/zero takes essentially zero time?
I've checked that I do indeed get about 10GB/s from /dev/zero.

&lt;/pre&gt;</description>
    <dc:creator>Edgar Fuß</dc:creator>
    <dc:date>2012-05-25T09:50:02</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42146">
    <title>Re: lwp resource limit</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42146</link>
    <description>&lt;pre&gt;
I'm not sure that the kern.maxproc shouldn't just be deleted.
Your maxprocperuid is just the default value for 'ulimit -H -p'
(or perhaps the soft limit...)

David

&lt;/pre&gt;</description>
    <dc:creator>David Laight</dc:creator>
    <dc:date>2012-05-24T20:49:14</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42145">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42145</link>
    <description>&lt;pre&gt;

Depends... in this case it's true.  physio() breaks the iov into chunks and
allocates a buf for each chunk and calls the strategy() routine on each 
buf without waiting for completion.  So on a controller that does tagged 
queuing they run in parallel.
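
The chunking can be modelled in Python.  This is not the kernel's code.
The model splits the request into pieces of at most MAXPHYS bytes, hands
every piece to the driver, and only then waits.  MAXPHYS = 64k and the
fake_strategy() helper are assumptions, and a thread pool stands in for
the controller's tagged queue.

```python
from concurrent.futures import ThreadPoolExecutor, wait

MAXPHYS = 64 * 1024  # assumed value; the real constant is port-dependent


def split_into_chunks(offset, length, maxphys=MAXPHYS):
    """Split [offset, offset + length) into pieces of at most maxphys bytes."""
    chunks = []
    end = offset + length
    while end - offset > 0:
        n = min(maxphys, end - offset)
        chunks.append((offset, n))
        offset += n
    return chunks


def fake_strategy(chunk):
    """Hypothetical stand-in for a driver's strategy() routine."""
    _offset, n = chunk
    return n  # pretend the whole chunk was transferred


def physio_like(offset, length, workers=8):
    """Queue every chunk first, then wait for all of them together."""
    chunks = split_into_chunks(offset, length)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fake_strategy, c) for c in chunks]
        wait(futures)  # one wait, after all chunks are in flight
    return sum(f.result() for f in futures)
```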

Eduardo

&lt;/pre&gt;</description>
    <dc:creator>Eduardo Horvath</dc:creator>
    <dc:date>2012-05-24T18:18:48</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42144">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42144</link>
    <description>&lt;pre&gt;
Is this actually true?  For requests from userspace via the raw device,
does physio actually issue the smaller chunks in parallel?

Thor

&lt;/pre&gt;</description>
    <dc:creator>Thor Lancelot Simon</dc:creator>
    <dc:date>2012-05-24T18:10:31</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42143">
    <title>MPT inefficiency (was: raw/block device disc throughput)</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42143</link>
    <description>&lt;pre&gt;Unfortunately, when I ordered these servers, NetBSD lacked SAS 2008
support. Now that it has even been pulled up to 6.0_BETA, I could try a SAS2
controller.

&lt;/pre&gt;</description>
    <dc:creator>Edgar Fuß</dc:creator>
    <dc:date>2012-05-24T17:45:49</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42142">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42142</link>
    <description>&lt;pre&gt;But I'm writing, not reading!

But would you expect an eightfold increase in throughput just from that?

I will test that. I can't right now because I have no physical access atm.

On an identical machine but with the RAID's SectorsPerSU=16, I get 99MByte/s
from the raw device at 16k blocks, 191 at 64k blocks, roughly the same at
1M blocks.
On the block device, I get 19MB/s independent of block size.
All with dd.

&lt;/pre&gt;</description>
    <dc:creator>Edgar Fuß</dc:creator>
    <dc:date>2012-05-24T17:38:35</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42141">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42141</link>
    <description>&lt;pre&gt;

The protocol used to communicate between the CPU and the adapter is 
inefficient.  Not well designed.  They redesigned it for SAS2.


dd will send the kernel individual write operations.  sd and physio() will 
break them up into MAXPHYS chunks.  Each chunk will be queued at the 
HBA.  The HBA will dispatch them all as fast as it can.  Tagged queuing 
will overlap them.  

With smaller transfers, the setup overhead becomes significant and you see 
poor performance.

With large transfers (larger than MAXPHYS) the writes are split up into 
MAXPHYS chunks and the disk handles them in parallel, hence the 
performance increase even beyond MAXPHYS.
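
A quick count illustrates the claim.  It assumes MAXPHYS is 64k and that
dd writes a hypothetical 1 GiB.  Above MAXPHYS the number of driver-level
chunks stays the same, but the number of synchronous write() round trips
falls.

```python
MAXPHYS = 64 * 1024         # assumed value; port-dependent
TOTAL = 1024 * 1024 * 1024  # hypothetical: 1 GiB written by dd


def dd_cost(bs, total=TOTAL, maxphys=MAXPHYS):
    """Return (write() calls, driver-level chunks) for one dd run.

    Each write() is a synchronous round trip on the raw device.  The
    chunks inside a single write() can overlap on the HBA.
    """
    writes = -(-total // bs)               # ceiling division
    chunks_per_write = -(-bs // maxphys)   # ceiling division
    return writes, writes * chunks_per_write


for kib in (64, 256, 1024):
    writes, chunks = dd_cost(kib * 1024)
    print(f"bs={kib}k: {writes} write() calls, {chunks} chunks")
```

For bs=64k, 256k and 1M this gives 16384, 4096 and 1024 write() calls.
The chunk count stays at 16384 in every case.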

Also keep in mind:

When using the block device the data is copied from the process buffer 
into the buffer cache and the I/O happens from the buffer cache pages.

When using the raw device the I/O happens directly from process memory, no 
copying involved.

Eduardo&lt;/pre&gt;</description>
    <dc:creator>Eduardo Horvath</dc:creator>
    <dc:date>2012-05-24T17:31:43</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42140">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42140</link>
    <description>&lt;pre&gt;Fat fingers. of=, of course.

Ah, good.

You mean the protocol the main CPU uses to communicate with an MPT adapter is
inefficient? Or do you mean SAS is inefficient?

I'm sorry, I don't see how this explains my results.

Sometimes, probably.

Sorry, RAID 1 is mirroring.
So everything needs to be sent to two discs in parallel. Can the CPU-HBA
communication really be the bottleneck?

&lt;/pre&gt;</description>
    <dc:creator>Edgar Fuß</dc:creator>
    <dc:date>2012-05-24T17:21:12</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42139">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42139</link>
    <description>&lt;pre&gt;
(I've been assuming od= should be of=)


I thought of that too, but didn't mention it, because it's not
relevant.  dd isn't reading from the disk; it's writing to it.


I suspect this is a significant effect.  I was once using dd to copy
from one disk to another, and both drives happened to have activity
lights on them.  Watching each drive wait for the other convinced me dd
is an inefficient way to do that.  I built a program that uses two
processes, one reading and one writing, with a large chunk of memory
shared between them for buffer space.  Disk-to-disk copies (not on the
same spindle) got significantly faster. :)

In this case, dd has to block after each disk write to wait for its
buffer to be (unnecessarily, as it happens, though it can't know that)
zeroed for the next write.  This both imposes additional delay and
enforces a lack of overlap between each write and the next.
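
Here is a minimal sketch of that reader/writer arrangement.  It uses two
threads and a bounded queue instead of two processes sharing memory, which
is a substitution, since the original program is not shown.  The function
name and default sizes are hypothetical, and error handling is omitted.

```python
import queue
import threading


def overlapped_copy(src, dst, bufsize=1024 * 1024, depth=4):
    """Copy binary file object src to dst using a reader and a writer.

    A bounded queue holds up to `depth` buffers, so the next read can
    run while the previous write is still in progress.
    """
    q = queue.Queue(maxsize=depth)

    def reader():
        while True:
            data = src.read(bufsize)
            q.put(data)  # an empty bytes object marks end of input
            if not data:
                return

    t = threading.Thread(target=reader)
    t.start()
    total = 0
    while True:
        data = q.get()
        if not data:
            break
        dst.write(data)
        total += len(data)
    t.join()
    return total
```

With several buffers in flight, reading and writing overlap.  Plain dd
never gets this overlap because it alternates between the two.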

I speculate that the cooked device helps because it means that dd's
write finishes when the bits are in the buffer cache, rath&lt;/pre&gt;</description>
    <dc:creator>Mouse</dc:creator>
    <dc:date>2012-05-24T17:16:43</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42138">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42138</link>
    <description>&lt;pre&gt;Ah, sorry: amd64.

&lt;/pre&gt;</description>
    <dc:creator>Edgar Fuß</dc:creator>
    <dc:date>2012-05-24T17:13:39</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42137">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42137</link>
    <description>&lt;pre&gt;
The block device will cause readahead at the OS layer.  Since you are
accessing the disk sequentially, this will have a significant effect --
evidently greater than the overhead caused by memory allocation in the
cache layer under the block device.

I suspect that if you double-buffered at the client application layer
this effect might disappear, since the drive itself will already read ahead,
and if we can present it with enough requests at once, it should return
the results simultaneously.  Plain dd on the raw device will not do that
since it waits for every read to complete before issuing another, thus
increasing latency and reducing the number of transactions the drive can
effectively overlap.


The increase is again tied to less latency in the synchronous dd read-write
loop.  The kernel breaks the large request down to many MAXPHYS sized ones
and dispatches each in turn.  I can't remember whether it is really
asynchronous or whether it waits for each request to complete before issuing
the next; if the f&lt;/pre&gt;</description>
    <dc:creator>Thor Lancelot Simon</dc:creator>
    <dc:date>2012-05-24T17:00:45</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42136">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42136</link>
    <description>&lt;pre&gt;

What's "od="?


Not awfully surprising given your setup.  Keep in mind mpt uses a rather
inefficient communication protocol and does tagged queuing.  The former
means the per-command overhead is high, but the latter means
it can keep lots of commands in the air at the same time.


Now you're just complicating things 8^).

Let's see, RAID 1 is striping.  That means all operations are broken at 
64K boundaries so they can be sent to different disks.  And split 
operations need to wait for all the devices to complete before the master 
operation can be completed.  I expect you would probably get some rather 
unusual non-linear behavior in this sort of setup.  

Eduardo&lt;/pre&gt;</description>
    <dc:creator>Eduardo Horvath</dc:creator>
    <dc:date>2012-05-24T16:59:15</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42135">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42135</link>
    <description>&lt;pre&gt;
Mostly I have nothing useful to say here.  But...


There is at least one aspect of performance that will not be cut off by
MAXPHYS, that being syscall overhead.  I don't know your system (you
don't say which port you're running on, for example), but if syscall
overhead for your hardware is not ignorably small compared to the costs
of doing the disk transfer, then doing one syscall per 256K will be
four times as costly in syscall overhead as doing one syscall per 1M,
even if the disk transfers themselves cost the same.
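
The arithmetic, for a hypothetical 1 GiB transfer:

```python
def syscalls_needed(total_bytes, bs):
    """Number of write() calls needed to move total_bytes in bs-sized writes."""
    return -(-total_bytes // bs)  # ceiling division


GIB = 1024 ** 3                              # hypothetical transfer size
per_256k = syscalls_needed(GIB, 256 * 1024)  # 4096 calls
per_1m = syscalls_needed(GIB, 1024 * 1024)   # 1024 calls
print(per_256k // per_1m)                    # prints 4
```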

/~\ The ASCII  Mouse
\ / Ribbon Campaign
 X  Against HTML        mouse&amp;lt; at &amp;gt;rodents-montreal.org
/ \ Email!     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

&lt;/pre&gt;</description>
    <dc:creator>Mouse</dc:creator>
    <dc:date>2012-05-24T16:48:19</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42134">
    <title>Re: raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42134</link>
    <description>&lt;pre&gt;

What is the partition alignment on the raid?

There's a discussion of the alignment impact on my raidframe how-to on 
the NetBSD Wiki.


-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------&lt;/pre&gt;</description>
    <dc:creator>Paul Goyette</dc:creator>
    <dc:date>2012-05-24T16:42:26</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42133">
    <title>raw/block device disc throughput</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42133</link>
    <description>&lt;pre&gt;It seems that I have to update my understanding of raw and block devices
for discs.

Using a (non-recent) 6.0_BETA INSTALL kernel and an ST9146853SS 15k SAS disc
behind an LSI SAS 1068E (i.e. mpt(4)), I did a
dd if=/dev/zero od=/dev/[r]sd0b bs=nn, count=xxx.
For the raw device, the troughput dramatically increased with the block size:
Block size             16k   64k  256k    1M
Throughput (MByte/s)     4    15    49   112
For the block device, throughput was around 81MByte/s independent of block size.

This surprised me in two ways:
1. I would have expected the raw device to outperform the block device
   at all but very small block sizes.
2. I would have expected increasing the block size above MAXPHYS not
   to improve performance.

So obviously, my understanding is wrong.


I then built a RAID 1 with SectorsPerSU=128 (i.e. a 64k stripe size) on two
of these discs, and, after the parity initialisation was complete, wrote
to [r]raid0b.
On the raw device, throughput ranged from 4MByte/s to 97MByte/s depending on bs.
On the block devic&lt;/pre&gt;</description>
    <dc:creator>Edgar Fuß</dc:creator>
    <dc:date>2012-05-24T16:26:45</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42132">
    <title>Re: mlockall() and small memory systems</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42132</link>
    <description>&lt;pre&gt;

Not necessarily.  If the code does mlockall() with the MCL_FUTURE flag 
then any *new* pages are also supposed to be locked into memory.  So if 
the process address space at the time the mlockall() is done is smaller 
than the locked memory limit, the mlockall() should succeed.  

But if a later mmap() would make the total process address space exceed the
locked memory limit, the mmap() call should return an error instead of succeeding.  I think there
may be a bug in mmap().  You should check to see if the mmap() causes the 
process to cross the locked memory limit.
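
The condition suggested above can be written as a small model.  This
sketches only the check and is not NetBSD's kernel code.  The function
name is hypothetical.  When no limit is passed, the model reads
RLIMIT_MEMLOCK through Python's resource module.

```python
import resource


def mmap_would_exceed(locked_bytes, new_mapping, limit=None, mcl_future=True):
    """Hypothetical check: should this mmap() fail?

    With MCL_FUTURE in effect, a new mapping is locked as soon as it is
    created, so it has to fit under the locked-memory limit together
    with what is already locked.
    """
    if limit is None:
        limit, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if not mcl_future or limit == resource.RLIM_INFINITY:
        return False
    return locked_bytes + new_mapping > limit
```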

Eduardo

&lt;/pre&gt;</description>
    <dc:creator>Eduardo Horvath</dc:creator>
    <dc:date>2012-05-24T15:31:49</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42131">
    <title>Re: lwp resource limit</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42131</link>
    <description>&lt;pre&gt;In article &amp;lt;20120524090652.GC8483&amp;lt; at &amp;gt;britannica.bec.de&amp;gt;,
Joerg Sonnenberger  &amp;lt;joerg&amp;lt; at &amp;gt;britannica.bec.de&amp;gt; wrote:

Aside from the facts that a separate limit is more flexible and that the
current maxproc is too small, no. Currently most processes have just one
thread, so what you are saying is true. It is simple to do either;
what do others think?

As a tangent, I also like what macosx did splitting maxproc and maxprocperuid.

christos


&lt;/pre&gt;</description>
    <dc:creator>Christos Zoulas</dc:creator>
    <dc:date>2012-05-24T14:14:27</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42130">
    <title>Re: lwp resource limit</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42130</link>
    <description>&lt;pre&gt;&lt;/pre&gt;</description>
    <dc:creator>Christos Zoulas</dc:creator>
    <dc:date>2012-05-24T12:52:56</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42129">
    <title>Re: lwp resource limit</title>
    <link>http://permalink.gmane.org/gmane.os.netbsd.devel.kernel/42129</link>
    <description>&lt;pre&gt;
Any reason for not using kern.maxproc as the (hard) default limit?
It doesn't make sense to allow fewer threads than processes, and I'm not
sure it is necessary to allow more threads, except that ulimit -p is
kind of small.

Joerg

&lt;/pre&gt;</description>
    <dc:creator>Joerg Sonnenberger</dc:creator>
    <dc:date>2012-05-24T09:06:52</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.os.netbsd.devel.kernel">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.os.netbsd.devel.kernel</link>
  </textinput>
</rdf:RDF>

