<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.culture.people.kragen.thinking">
    <title>gmane.culture.people.kragen.thinking</title>
    <link>http://blog.gmane.org/gmane.culture.people.kragen.thinking</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/210"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/209"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/208"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/207"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/206"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/205"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/204"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/203"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/202"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/201"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/200"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/200"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/199"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/198"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/197"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/196"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/195"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/192"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/190"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.culture.people.kragen.thinking/189"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/210">
    <title>a possibly novel error-correction code with a relationship to theHadamard transform</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/210</link>
    <description>&lt;pre&gt;Last night, I encoded some messages in ASCII by hand, including
error-correction coding: like the printer dots Seth Schoen decoded,
matrices of 8 columns of 8 bits each, with an extra column and row for
parity.  (I used even parity, but odd would have been a better
choice.)  Matrix parity lets you correct single-bit errors: a
single-bit error will show up as a parity error in both the row parity
and the column parity.  And it's simple enough and spatial enough that
you can both detect and correct errors by hand.

This got me to thinking about what kind of more powerful
error-correction coding might be feasible to perform by hand, and I
think I came up with one, which at first glance appears to be related
to the Hadamard transform, but in the end seems to be something
simpler.

Existing Hadamard-code work is quite different
----------------------------------------------

For many years, Hadamard matrices have been used for error-correction
coding by the simple expedient of using a row of the matrix for each
codeword; since the rows are perfectly uncorrelated, they have a large
Hamming distance, and the Fast Hadamard Transform of the received
(possibly corrupted) codeword gives you the Hamming distance from all
the N codewords in 2 N lg N additions and subtractions.  (This is
now apparently obsoleted by better codes.)

A problem with this for human use is that it's not just some checksum
symbols you glom onto your data; it makes your data totally
unrecognizable.

I want to discuss a different application of Hadamard matrices for
error-correction codes.  As far as I can tell, this is not related to
"Hadamard-Craigen error correcting codes" (Craigen 2002), and in the
unlikely event that these codes become popular, I hope that nobody
decides to call them "Hadamard-Kragen codes".

The Hadamard transform and its relevance briefly explained
----------------------------------------------------------

The Hadamard transform of a vector is a "holographic representation"
of the vector in the sense that the vector can be recovered from it
(by repeating the Hadamard transform and dividing by the determinant)
and that every value in the Hadamard transform is affected by every
element in the original vector.  This means that any local change in
the original vector results in a global change in the
Hadamard-transformed vector --- in the case of the Hadamard transform,
*every* element changes.

(For simplicity, I'm going to consider only dimension-`2**n` Hadamard
matrices produced by the Sylvester method here, although others
exist.)

The Hadamard transform is particularly interesting among such
holographic representations because the Fast Hadamard Transform
enables computing the HT very efficiently.

The fast Hadamard transform of a vector or characters, illustrated
------------------------------------------------------------------

So suppose you have a vector of characters, say, 'Hadamard'.  You
compute the Fast Hadamard Transform of this vector:

    def hadamard(vec):
        if len(vec) == 1: return vec
        pairs = zip(hadamard(vec[:len(vec)/2]),
                    hadamard(vec[len(vec)/2:]))
        return ([lv + rv for lv, rv in pairs] +
                [lv - rv for lv, rv in pairs])

    &amp;gt;&amp;gt;&amp;gt; hadamard(map(ord, 'Hadamard'))
    [786, 4, -36, -30, -54, -48, -20, -26]

Note that a local change in the input produces a linear global change
in the output:

    &amp;gt;&amp;gt;&amp;gt; hadamard(map(ord, 'Hadamarc'))
    [785, 5, -35, -31, -53, -49, -21, -25]
    &amp;gt;&amp;gt;&amp;gt; hadamard(map(ord, 'Hacamarc'))
    [784, 4, -34, -30, -54, -50, -20, -24]

And you can reconstruct the original input vector from the
Hadamard-transformed version:

    &amp;gt;&amp;gt;&amp;gt; ''.join(chr(c/8) for c in hadamard(hadamard(map(ord, 'Hadamard'))))
    'Hadamard'

And, naturally, local errors in the Hadamard-transformed version
result in linear global errors in the result:

    &amp;gt;&amp;gt;&amp;gt; ''.join(chr(c/8) for c in hadamard([786,4,-36,-30,-54,-48,-20,-26]))
    'Hadamard'
    &amp;gt;&amp;gt;&amp;gt; ''.join(chr(c/8) for c in hadamard([778,4,-36,-38,-54,-48,-20,-26]))
    'Fad_karb'
    &amp;gt;&amp;gt;&amp;gt; ''.join(chr(c/8) for c in hadamard([700,4,-36,-38,-54,-48,-20,-26]))
    '&amp;lt;WZUaWhX'

How to use this for a human-friendly error-correction code
----------------------------------------------------------

So suppose you want to be able to correct errors.  You can pick some
subset of the Hadamard transform to transmit along with your raw data.
If there's a single-character error, it will show up in every
transformed value you transmit, but with varying signs; each sign
after the first gives you one bit of information about where the error
is --- *exactly* one bit, I thought, since the rows of the Hadamard
matrix are *perfectly* uncorrelated; so you should be able to localize
the error with lg N bits.  (That turns out to be false, as
demonstrated below; it matters a lot which rows you pick.)  In fact,
since the whole transform is linear, you should be able to just run
the inverse transform on your error signal and get the location of the
error.

Using the inverse transform on arbitrary outputs to localize errors works poorly
--------------------------------------------------------------------------------

Let's try:

    &amp;gt;&amp;gt;&amp;gt; orig = hadamard(map(ord, 'Hadamard'))[:3]
    &amp;gt;&amp;gt;&amp;gt; computed = hadamard(map(ord, 'Hagamard'))[:3]
    &amp;gt;&amp;gt;&amp;gt; [computed_i - orig_i for computed_i, orig_i in zip(computed, orig)]
    [3, 3, -3]
    &amp;gt;&amp;gt;&amp;gt; hadamard([computed_i - orig_i
                  for computed_i, orig_i
                  in zip(computed, orig)] + [0] * 5)
    [3, -3, 9, 3, 3, -3, 9, 3]

Not quite.  We got a nice peak in the transformed error signal at the
location of the actual error, it's true, but we got an equally big one
elsewhere.  It doesn't work with four values either:

    &amp;gt;&amp;gt;&amp;gt; orig = hadamard(map(ord, 'Hadamard'))[:4]
    &amp;gt;&amp;gt;&amp;gt; computed = hadamard(map(ord, 'Hagamard'))[:4]
    &amp;gt;&amp;gt;&amp;gt; hadamard([computed_i - orig_i
                  for computed_i, orig_i
                  in zip(computed, orig)] + [0] * 4)
    [0, 0, 12, 0, 0, 0, 12, 0]

It does work with five:

    &amp;gt;&amp;gt;&amp;gt; computed = hadamard(map(ord, 'Hagamard'))[:5]
    &amp;gt;&amp;gt;&amp;gt; orig = hadamard(map(ord, 'Hadamard'))[:5]
    &amp;gt;&amp;gt;&amp;gt; hadamard([computed_i - orig_i 
                  for computed_i, orig_i
                  in zip(computed, orig)] + [0] * 3)
    [3, 3, 15, 3, -3, -3, 9, -3]

And of course the first value tells us the sign and magnitude of the
actual error:

    &amp;gt;&amp;gt;&amp;gt; computed[0] - orig[0]
    3

What if we pick different values, like the ones from the end?  We
still need five of them to locate the error:

    &amp;gt;&amp;gt;&amp;gt; orig = hadamard(map(ord, 'Hadamard'))[-3:]
    &amp;gt;&amp;gt;&amp;gt; computed = hadamard(map(ord, 'Hagamard'))[-3:]
    &amp;gt;&amp;gt;&amp;gt; computed
    [-45, -23, -29]
    &amp;gt;&amp;gt;&amp;gt; hadamard([0] * 5 + [computed_i - orig_i 
                            for computed_i, orig_i
                            in zip(computed, orig)])
    [-3, -3, 9, -3, 3, 3, -9, 3]
    &amp;gt;&amp;gt;&amp;gt; orig = hadamard(map(ord, 'Hadamard'))[-4:]
    &amp;gt;&amp;gt;&amp;gt; computed = hadamard(map(ord, 'Hagamard'))[-4:]
    &amp;gt;&amp;gt;&amp;gt; hadamard([0] * 4 + [computed_i - orig_i
                            for computed_i, orig_i
                            in zip(computed, orig)])
    [0, 0, 12, 0, 0, 0, -12, 0]
    &amp;gt;&amp;gt;&amp;gt; orig = hadamard(map(ord, 'Hadamard'))[-5:]
    &amp;gt;&amp;gt;&amp;gt; computed = hadamard(map(ord, 'Hagamard'))[-5:]
    &amp;gt;&amp;gt;&amp;gt; hadamard([0] * 3 + [computed_i - orig_i
                            for computed_i, orig_i
                            in zip(computed, orig)])
    [-3, 3, 15, -3, -3, 3, -9, -3]

How about a different error?

    &amp;gt;&amp;gt;&amp;gt; computed = hadamard(map(ord, 'Hadamerd'))[-5:]
    &amp;gt;&amp;gt;&amp;gt; hadamard([0] * 3 + [computed_i - orig_i 
                            for computed_i, orig_i 
                            in zip(computed, orig)])
    [-4, -12, 4, -4, -4, 20, 4, -4]

How about an error in the transformed values?  That produces a
distinctly different pattern:

    &amp;gt;&amp;gt;&amp;gt; computed = [0] + orig[1:]
    &amp;gt;&amp;gt;&amp;gt; hadamard([0] * 3 + [computed_i - orig_i
                            for computed_i, orig_i
                            in zip(computed, orig)])
    [30, -30, -30, 30, 30, -30, -30, 30]

It seems like it ought to be possible to correct multiple errors.

How about a longer text?  Correcting one error in eight characters by
appending five error-check symbols is not at all impressive.  Can we
correct an error in a 64-character text with 11 Hadamard results?
(Not the way I've been doing it, but see later sections for
successfully doing it with 7.)

    &amp;gt;&amp;gt;&amp;gt; text='correct an error in a 64-character text with 11 Hadamard results'
    &amp;gt;&amp;gt;&amp;gt; len(text)
    64
    &amp;gt;&amp;gt;&amp;gt; orig = hadamard(map(ord, text))[:11]
    &amp;gt;&amp;gt;&amp;gt; errtext = text[:20] + 'x' + text[21:]
    &amp;gt;&amp;gt;&amp;gt; errtext
    'correct an error in x 64-character text with 11 Hadamard results'
    &amp;gt;&amp;gt;&amp;gt; computed = hadamard(map(ord, errtext))[:11]
    &amp;gt;&amp;gt;&amp;gt; orig
    [5806, -74, 100, -304, 68, -576, -166, -78, -170, 310, 212]
    &amp;gt;&amp;gt;&amp;gt; computed
    [5829, -51, 123, -281, 45, -599, -189, -101, -147, 333, 235]
    &amp;gt;&amp;gt;&amp;gt; hadamard([computed_i - orig_i
                  for computed_i, orig_i
                  in zip(computed, orig)] + [0] * 53)
    [69, 23, 23, -23, 253, 23, 23, -23, -69, -23, -23, 23, 115, -23,
    -23, 23, 69, 23, 23, -23, 253, 23, 23, -23, -69, -23, -23, 23,
    115, -23, -23, 23, 69, 23, 23, -23, 253, 23, 23, -23, -69, -23,
    -23, 23, 115, -23, -23, 23, 69, 23, 23, -23, 253, 23, 23, -23,
    -69, -23, -23, 23, 115, -23, -23, 23]

No, it identifies four equally likely locations for the error.  13?

    &amp;gt;&amp;gt;&amp;gt; orig = hadamard(map(ord, text))[:13]
    &amp;gt;&amp;gt;&amp;gt; computed = hadamard(map(ord, errtext))[:13]
    &amp;gt;&amp;gt;&amp;gt; hadamard([computed_i - orig_i
                  for computed_i, orig_i
                  in zip(computed, orig)] + [0] * 51)
    [69, -23, -23, -23, 299, 23, 23, 23, -69, 23, 23, 23, 69, -23,
    -23, -23, 69, -23, -23, -23, 299, 23, 23, 23, -69, 23, 23, 23, 69,
    -23, -23, -23, 69, -23, -23, -23, 299, 23, 23, 23, -69, 23, 23,
    23, 69, -23, -23, -23, 69, -23, -23, -23, 299, 23, 23, 23, -69,
    23, 23, 23, 69, -23, -23, -23]

No, mysteriously the new Hadamard results didn't give us any new
information about the location of the error; we still have four
equally likely locations.  It's as if the relevant Hadamard rows are
somehow correlated with the basis comprised of the rows we have
already, though not any of those rows individually.

It turns out that better choices are available:

    &amp;gt;&amp;gt;&amp;gt; orig_full = hadamard(map(ord, text))
    &amp;gt;&amp;gt;&amp;gt; computed_full = hadamard(map(ord, errtext))
    &amp;gt;&amp;gt;&amp;gt; orig = orig_full[:11] + [0] * 51 + orig_full[-2:]
    &amp;gt;&amp;gt;&amp;gt; computed = computed_full[:11] + [0] * 51 + computed_full[-2:]
    &amp;gt;&amp;gt;&amp;gt; hadamard([computed_i - orig_i 
                  for computed_i, orig_i
                  in zip(computed, orig)])
    [115, 23, -23, -23, 207, 23, 69, -23, -115, -23, 23, 23, 161, -23,
    -69, 23, 23, 23, 69, -23, 299, 23, -23, -23, -23, -23, -69, 23,
    69, -23, 23, 23, 23, 23, 69, -23, 299, 23, -23, -23, -23, -23,
    -69, 23, 69, -23, 23, 23, 115, 23, -23, -23, 207, 23, 69, -23,
    -115, -23, 23, 23, 161, -23, -69, 23]

That reduces the choices to two.  What if we choose by a different
approach?

    &amp;gt;&amp;gt;&amp;gt; orig = orig_full[::5]
    &amp;gt;&amp;gt;&amp;gt; len(orig)
    13
    &amp;gt;&amp;gt;&amp;gt; computed = computed_full[::5]
    &amp;gt;&amp;gt;&amp;gt; orig_expanded = [0] * 64
    &amp;gt;&amp;gt;&amp;gt; orig_expanded[::5] = orig
    &amp;gt;&amp;gt;&amp;gt; orig_expanded
    [5806, 0, 0, 0, 0, -576, 0, 0, 0, 0, 212, 0, 0, 0, 0, 86, 0, 0, 0,
    0, 438, 0, 0, 0, 0, -108, 0, 0, 0, 0, -220, 0, 0, 0, 0, 4, 0, 0,
    0, 0, -358, 0, 0, 0, 0, 96, 0, 0, 0, 0, -102, 0, 0, 0, 0, 120, 0,
    0, 0, 0, 406, 0, 0, 0]
    &amp;gt;&amp;gt;&amp;gt; computed_expanded = [0] * 64
    &amp;gt;&amp;gt;&amp;gt; computed_expanded[::5] = computed
    &amp;gt;&amp;gt;&amp;gt; hadamard([computed_i - orig_i 
                  for computed_i, orig_i
                  in zip(computed_expanded, orig_expanded)])
    [69, 161, -23, 161, 23, -69, 23, 23, 23, -161, 23, 115, -23, -23,
    69, -23, -23, 69, -23, -23, 299, 23, 23, 23, 23, 23, -69, 23, -23,
    -23, -23, 69, -23, 161, -23, -115, 23, 23, -69, 23, 23, 115, 115,
    115, 69, -23, 69, -115, -23, -23, 69, -23, 23, 23, 23, -69, -69,
    23, -69, 115, -23, 69, 253, 69]

That has a max, 299, at the right spot, with "only" 13
Hadamard-transformed values to give us six bits of error-location
information.

So, although this can be made to work, it doesn't perform as well as I
had hoped, and I didn't really understand why.  I did find out how to
stop it; see later sections.

Using the assumption that there's just one error doesn't help
-------------------------------------------------------------

But that approach is, in essence, attempting to find an estimate of an
*arbitrary* difference between the two vectors.  But we're assuming
that the difference here comes from a transmission error corrupting a
single letter.

What if we just use the signs of the differences, and try to figure
out which single input position could have caused that pattern of
signs?  That doesn't work either with poor choices of rows:

    sgn = lambda x: 1 if x &amp;gt; 0 else 0 if x == 0 else -1
    matrix = lambda n: [hadamard([i==j for i in range(n)]) for j in range(n)]
    def find(prefix, items):
        for ii, item in enumerate(items):
            if item[:len(prefix)] == prefix: yield ii

    for i in range(64):
         print i, list(find([sgn(computed_i - orig_i) 
                             for computed_i, orig_i 
                             in zip(computed_full[:i], orig_full[:i])], 
                            matrix(64)))

    0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
       50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
    ...
    2 [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
       34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62]
    3 [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60]
    4 [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60]
    5 [4, 12, 20, 28, 36, 44, 52, 60]
    ...
    9 [4, 20, 36, 52]
    ...
    17 [20, 52]
    ...
    33 [20]
    ...

Picking particularly helpful rows of the matrix *does* help
-----------------------------------------------------------

Somehow, almost all of the additional rows we get are adding no new
information.  Rows 0, 1, 2, 4, 8, 16, and 32 apparently added a bit
each.  (I say row 0 because we need the information about which
direction the change perturbed the input in order to interpret the
others.)  What if we use just those rows?

    select = lambda x: [x[i] for i in [0, 1, 2, 4, 8, 16, 32]]
    &amp;gt;&amp;gt;&amp;gt; len(set(tuple(select(row)) for row in matrix(64)))
    64

This looks promising!  Compare to:

    &amp;gt;&amp;gt;&amp;gt; len(set(tuple(row[:5]) for row in matrix(64)))
    8

And it works, see?

    &amp;gt;&amp;gt;&amp;gt; list(find([sgn(computed_i - orig_i)
                   for computed_i, orig_i 
                   in zip(select(computed_full), select(orig_full))],
                  [select(row) for row in matrix(64)]))
    [20]

So there we have seven Hadamard values that successfully localized the
error using only their signs.  (If the change were negative, we'd've
had to try `-sgn` as well as `sgn`; you could presumably avoid that by
using the pattern of sign *changes* instead of signs).  What if we use
the reverse-transform approach with these values instead of the
arbitrary ones we were using before?

    def expand(vals):
        rv = [0] * 64
        for val, pos in zip(vals, [0, 1, 2, 4, 8, 16, 32]): rv[pos] = val
        return rv

    hadamard(expand(computed_i - orig_i
                    for computed_i, orig_i 
                    in zip(select(computed_full), select(orig_full))))

    [69, 23, 23, -23, 115, 69, 69, 23, 23, -23, -23, -69, 69, 23, 23,
    -23, 115, 69, 69, 23, 161, 115, 115, 69, 69, 23, 23, -23, 115, 69,
    69, 23, 23, -23, -23, -69, 69, 23, 23, -23, -23, -69, -69, -115,
    23, -23, -23, -69, 69, 23, 23, -23, 115, 69, 69, 23, 23, -23, -23,
    -69, 69, 23, 23, -23]

And that does indeed have a clear peak of 161 at position 20, which is
where the error is.

What if we have a different error?  That seems to work too:

    &amp;gt;&amp;gt;&amp;gt; errtext = text[:35] + '!' + text[36:]
    &amp;gt;&amp;gt;&amp;gt; errtext
    'correct an error in a 64-character !ext with 11 Hadamard results'
    &amp;gt;&amp;gt;&amp;gt; computed_full = hadamard(map(ord, errtext))
    &amp;gt;&amp;gt;&amp;gt; select(computed_full)
    [5723, 9, 183, -15, -253, -55, 5]
    &amp;gt;&amp;gt;&amp;gt; select(orig_full)
    [5806, -74, 100, 68, -170, 28, -78]

    ...     hadamard(expand(computed_i - orig_i
    ...                     for computed_i, orig_i 
    ...                     in zip(select(computed_full), select(orig_full))))

    [-83, -249, -249, -415, 83, -83, -83, -249, 83, -83, -83, -249,
    249, 83, 83, -83, 83, -83, -83, -249, 249, 83, 83, -83, 249, 83,
    83, -83, 415, 249, 249, 83, -249, -415, -415, -581, -83, -249,
    -249, -415, -83, -249, -249, -415, 83, -83, -83, -249, -83, -249,
    -249, -415, 83, -83, -83, -249, 83, -83, -83, -249, 249, 83, 83,
    -83]

And indeed there's a negative peak at position 35, where all seven
differences of 83 ('t' - '!') line up with the same sign, producing
-581.  And using the approach of just looking at the signs?  That
works too, but since it's a negative change, we have to negate the
signs:

    &amp;gt;&amp;gt;&amp;gt; list(find([sgn(computed_i - orig_i)
    ...                    for computed_i, orig_i 
    ...                    in zip(select(computed_full), select(orig_full))],
    ...                   [select(row) for row in matrix(64)]))
    []
    &amp;gt;&amp;gt;&amp;gt; list(find([-sgn(computed_i - orig_i)
    ...                    for computed_i, orig_i 
    ...                    in zip(select(computed_full), select(orig_full))],
    ...                   [select(row) for row in matrix(64)]))
    [35]

This is very far from a proof that the method works, but it's
suggestive.  Here are the rows of `select(matrix(64))`:

    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

    [1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1,
    -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1,
    1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1,
    -1, 1, -1, 1, -1, 1, -1]

    [1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1,
    -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1,
    -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1,
    1, -1, -1, 1, 1, -1, -1]

    [1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1,
    1, -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, -1,
    -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1,
    1, 1, 1, -1, -1, -1, -1]

    [1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1,
    1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1,
    1, 1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, -1,
    -1, -1, -1, -1, -1, -1, -1]

    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1]

    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1]

These appear to be precisely the most regular rows, the ones with
sequencies 0, 63, 31, 15, 7, 3, and 1.  You certainly don't need a
Hadamard matrix to compute those!  It's closely related to the matrix
parity code I started with --- but instead of XOR, you're using
addition, and instead of two dimensions, you're in 6, and instead of
separately recording the totals modulo whatever of two 5-dimensional
hyperplanes, you're recording only the difference between them.

So maybe this is already known under some other name!

Using finite fields instead of the integers?
--------------------------------------------

I did all the above in the ring Z, the integers, which has the problem
that you need a potentially unbounded number of bits to represent its
members.

All of this also works over finite fields such as Z/7, the integers
mod 7, as well as the integers themselves.  In fact, it isn't even
necessary to have a full field, which is why it's possible to do it in
Z at all; you only need to be able to divide by the power of two that
is your vector length.  Unfortunately, this eliminates the most
desirable group for computation, Z/256 or in general `Z/2**n`, since
it will lose the upper bits of each character.

One option would be to use Z/257, the integers mod 257.  Then you can
represent the Hadamard transform results in a byte each, plus a
usually-empty list of which transform results are equal to 256; or in
9 bits each.

Another somewhat more computationally expensive option, which might
avoid that additional unnecessary message expansion, is to represent
the entire message in some other finite field: start by interpreting
it as a large number in base 256, then transform it into a sequence of
digits in e.g. Z/7 or Z/257, then do the Hadamard computation in that
field.

Using a field with fewer members makes the message longer, which
improves the ratio lg N/N, which I strongly suspect is proportional to
the number of Hadamard results you need to include to localize and
correct a single error.

How do you find the peaks in the transformed error signal in Z/p,
where big numbers can wrap around to be small ones again, and division
can easily produce big numbers?  I have no idea.  But the
sign-sequence-lookup approach should work, even though "sign" itself
is not a well-defined concept in Z/p.
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2013-03-03T22:18:28</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/209">
    <title>a logarithmic-time alternative to summed-area tables for reducingarbitrary semigroup operations over arbitrary ranges (a generalizationof RMQ segment trees)</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/209</link>
    <description>&lt;pre&gt;# Summary #

There's an alternative to summed-area tables with a small, linear
space cost and linear construction time and space, providing worst-case
logarithmic-time reduction over arbitrary intervals under arbitrary
semigroup operations, and which supports updates efficiently, unlike
summed-area tables.  The algorithm is like twenty fricking lines of code
if you leave out the "update" and "small" parts.

I am surely not the original discoverer of any of this.

# Explanation #

## Introduction to abstract algebra ##

Abstract algebra is the study of what you can deduce from minimal sets
of axioms about some set of things and operations on them.  A lot of it
seems to be taxonomic, assigning names to particular sets of axioms.
This is cool because once you establish that, say, 32-bit bitstrings
form a semilattice under the bitwise-OR operation, you can apply every
theorem that anybody's ever proven about semilattices to 32-bit
bitstrings and OR.

In particular, as I think Stepanov first realized, the correctness of an
algorithm depends on these algebraic properties of the data it's
manipulating.  (Typically it also depends on operations being
computable, and its efficiency depends on the complexity of that
computation, which are perhaps unfortunately not within the purview of
abstract algebra.)

A magma is a set associated with a binary operation that's closed over
that set.

A semigroup is an associative magma.  An example is the set of nonempty
finite strings over some alphabet, with concatenation as the binary
operation.

A semilattice is a semigroup whose operation is commutative and
idempotent; in general a semilattice is a partially ordered set of some
kind, with a unique smallest element, where the binary operation is the
operation of finding the largest upper bound of the elements.  Aside
from the obvious examples of totally ordered sets like integers, things
like 32-bit bitstrings under bitwise OR or AND form semilattices.

A monoid is a semigroup with identity.  String concatenation is the
usual example; the empty string is the identity element.

A group is a monoid where every element has inverse for every element.
It's sufficient to have a left inverse for every element; from that you
can get identity (I think!) and right inverse.

## Summed-area tables ##

Franklin Crow's 1984 paper, "Summed-area tables for texture mapping"
calls them "summed-area tables", and Graphics Gems called them "sum
tables".  More recently, they're known as "integral images".  In the
one-dimensional case, they allow you to calculate the sum of values in
an arbitrary interval in constant time by subtracting the values from
the summed-area table at the ends of the interval: sum(f[m:n]) =
-sat(f)[m] + sat(f)[n], where sat(f)[i] = sum(f[0:i]), assuming f's
indexes start at 0.

### N-dimensional case ###

You can compute an N-dimensional sum table; sat(f)[i0, i1, ... in] is
sum(f[0:i0, 0:i1, ... 0:in]).  In some interesting sense, more
dimensions makes it more powerful: the set of queries that can be
answered in constant time grows exponentially with the number of
dimensions, while the constant-time factor only grows linearly with the
number of dimensions.

### Decimation ###

As an extension, you can use a decimated summed-area table, with
values only present every (e.g.) 16th or 32nd index, without losing
the constant-time property.  You may have to consult the original
array, but only up to 2*(16-1) or 2*(32-1) values of it, which is
constant.  If you needed to keep the original array around anyway, this
dramatically reduces the space cost of the technique without slowing it
down too much, which (I speculate) might actually make it faster.

N-dimensional decimation is nontrivial because you can't just store the
values at the lattice points; to keep the constant-time guarantee, you
have to store values for every point *at least one* of whose coordinates
is a round number.  This means decimation basically only saves you a
linear factor of 16 or 32 or whatever, and you have to use a
sparse-array representation to get any good out of it.

### Generalization over operations ###

This problem, sum(f[m:n]), is a specific case of the general idea of
"[range queries][]".

[range queries]: http://en.wikipedia.org/wiki/Range_Queries

Sum tables generalize beyond integer addition.  Clearly they work fine
for mod-N integer addition, vector addition, and the combination of the
two (e.g. XOR).  In fact, you can use summed-area tables over arbitrary
groups, as long as the group operation and inverse are computable.  (The
range query is constant time only as long as those computations are
constant time.)

(For the N-dimensional case, I think you may also need commutativity,
but I'm not sure.)

## A logarithmic-time alternative to sum tables for semigroups ##

But what do you do if you're interested in an operation that doesn't
have a left inverse?  For example, the "minimum" operation (or in
general the meet operation of a meet-semilattice) can't have inverses
of elements, because it's idempotent, so you can't compute it with a
sum table.

But you *can* compute it in logarithmic time with a tree.  Let

    mint(f, m, n) = nil if m == n
                  = (m, n, min(f[m:n]), mint(f, m, floor((m+n)/2)),
                                        mint(f, floor((m+n)/2), n)) otherwise

You can compute this in linear time, assuming a constant-time binary min
operation, as follows:

    mint(f, m, n) = nil if m == n else
                    (m, n, f[m], nil, nil) if m == n - 1 else
                    (m, n, a, b, c) where
                        k = floor((m+n)/2) and
                        (_, _, a0, _, _) = b = mint(f, m, k) and
                        (_, _, a1, _, _) = c = mint(f, k, n) and
                        a = min(a0, a1)

Now if you precompute mint(f, 0, f.length), which is a balanced binary
tree with 2*f.length - 1 nodes, not counting the nils, and which can
be computed in linear time, you can compute min(f[m:n]) for arbitrary
m, n in logarithmic time given that tree.  That algorithm is
straightforward:

    tmin((a, b, c, d, e), m, n) =
        c   if a &amp;gt;= m and b &amp;lt;= n
        nil if a &amp;gt;= n or b &amp;lt;= m
        nmin(tmin(d, m, n), tmin(e, m, n)) otherwise

    where nmin(a, b) = b if a == nil
                       a if b == nil
                       min(a, b) otherwise

This algorithm applies to any semigroup over the elements; it can be
used to calculate sums as easily as it can be used to calculate minima,
although less efficiently than a sum table.

### Space reduction: decimation ###

Analogously to sum tables, if your leaf nodes represent spans of some
16 or 32 elements instead of 1, you get a dramatic space reduction
without losing the logarithmic-time asymptotic performance.

### Space reduction: array storage ###

The contents of the tree produced by the mint() function depends only
on m and n, except for the min(f[m:n]); and if f.length is a power of
2, it is a full binary tree.  A full binary tree can be stored, as in
the classic binary heap, in an array a such that the children of the
element at a[i] are at a[2i+1] and a[2i+2] (zero-based).  So you can
store the minima for the tree in an array (without decimation, of
2*f.length - 1 elements) rather than allocating numerous nodes on the
heap.

This requires a slight enhancement to the lookup algorithm to
recompute the same (m, n) as the construction algorithm, rather than
looking them up in the tree.

### Constant-space bottom-up construction ###

If you construct the tree recursively, in addition to the O(N) space
for the results, you need O(log N) stack space.  But that is not
necessary.  If you're using the array storage suggested in the
previous section, you can fill the array starting from the end, so
that the only auxiliary storage you need for the construction process
is a simple counter.

### Enhancement: updates in logarithmic time ###

If you update an element of the original array, you can update the tree
nodes going back up to the root to reflect your update in worst-case
O(log N) time.  Appending or removing elements at the end of the array
can be handled similarly, although sometimes appending an element will
involve creating a new root node, which (in the array representation of
the tree) is worst-case O(N), but amortized constant time.

This is a reason you might actually want to use these trees to handle
range-sum queries rather than using sum tables: updating this tree takes
O(log N) time, while updating a sum table takes O(N) time.

### Enhancement: indices ###

In the case where the semigroup operation is exactly minimum or
maximum over a totally ordered set, the value stored in each treenode
will be the value of one of the items in the original array.  In this
case it is strictly more powerful to store the index of that item
rather than its value.  This may be useful if you have some other data
that are indexed the same way.

This allows the algorithm to solve the "range minimum query" or RMQ
problem, for which it is known as the "segment tree" algorithm.  Danielp
wrote a [really awesome tutorial on RMQ][] on Topcoder.

[really awesome tutorial on RMQ]: http://community.topcoder.com/tc?module=Static&amp;amp;d1=tutorials&amp;amp;d2=lowestCommonAncestor

The constant-time "sparse table" algorithm given in that article
unfortunately only works for semilattices rather than general semigroups
including arbitrary monoids.

### N-dimensional case ###

This generalizes easily to quadtrees, octrees, etc., although the
efficiency guarantees are not as good.

## A constant-time alternative to sum tables for semigroups ##

A.C. Yao published one in 1982, "Space-Time Tradeoff for Answering Range
Queries", but I don't know it.  I think it's explained in the
aforementioned [really awesome tutorial on RMQ][], involving a reduction
to the least-common-ancestor problem, but I don't understand it yet.

# Thanks #

To John Cowan, Gian Perrone, and Seth David Schoen for discussion.
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-12-06T08:37:01</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/208">
    <title>decentralized chat using CouchDB</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/208</link>
    <description>&lt;pre&gt;I just sent this message to the CouchDB User mailing list instead of sleeping
like I ought to be doing.

----- Forwarded message from Kragen Javier Sitaker &amp;lt;kragen&amp;lt; at &amp;gt;canonical.org&amp;gt; -----

Date: Fri, 23 Nov 2012 22:14:18 -0500
From: Kragen Javier Sitaker &amp;lt;kragen&amp;lt; at &amp;gt;canonical.org&amp;gt;
To: user&amp;lt; at &amp;gt;couchdb.apache.org
Subject: decentralized chat
User-Agent: Mutt/1.5.20 (2009-06-14)

Hi.  I'm new to CouchDB, but I was just chatting with Noah on IRC about
how IRC sucks and we need to replace it and whether we could do that
using CouchDB.  He mentioned this thing Chris did four years ago called
Toast:

&amp;lt;http://jchrisa.net/drl/_design/sofa/_list/post/post-page?startkey=%5B%22Simple-Wins%22%5De&amp;gt;

Right now Skype chat is basically the best thing out there that I've
seen, but it's proprietary.  It has the following features I like:

* Real-time: people normally see your lines within less than a second
  after you send them.
* Peer-to-peer: you don't have to install or manage software on a server
  to make it work, and it knows how to traverse NATs.  (Skype, Inc.,
  does maintain a centralized authentication service, which I regard as
  a significant drawback.)
* Replicated: you can look at chat history and add chat lines when
  you're offline.  When you reconnect to the people you're talking to,
  all the lines get sent.  This is important not mostly because I'm
  trying to chat while I'm on an airplane but because I don't want to
  lose messages when my network gets disconnected, and because I switch
  between devices and want to be able to see messages I typed on my
  laptop on my cellphone and vice versa.  (As far as I can tell, the
  POP-not-IMAP nature of XMPP makes this impossible for XMPP chats.)
* Unified-UI: you can chat with many different people or groups of
  people at once using the same app, and see e.g. a list of chat
  channels where you have unread messages.
* Encrypted: your ISP can't read your chat messages.
* Secure: the people you're chatting with can't subvert your chat
  software to e.g. snoop on your other chats.
* Correct: unlike IRC, doesn't truncate your lines at arbitrary places
  if they get too long.
* File transfers: you can send screenshots and stuff to other people,
  including everybody in a group chat.
* Cross-platform: versions for both GNU/Linux and Android.

So I'm trying to get a sense of whether there's something out there that
would make this feature set easy to replicate.  And maybe CouchDB is it?
Or do I need to build something from scratch?

[Noah suggested](http://swhack.com/logs/2012-11-24#T02-25-22) making
each chat line a new document, using continuous replication, creating a
mapreduce view using datetime as a key and pulling the most recent 40
messages, and using the _changes feed to get Comet updates to the
browser asynchronously.  That sounds pretty reasonable to me, but the
following questions occur to me:

* I'm behind NAT.  If you're behind NAT too, how can we set up
  continuous replication between our CouchDB instances?  Is there STUN
  support for CouchDB replication yet?
* Do I need to create a new CouchDB database for every chat room?  Is
  there any problem with having 20 or 30 databases on my netbook talking
  at once?
* How about if I want to have a single Comet connection from my browser
  to all of them at once?  (Browsers won't let you have 30 Comet
  connections to localhost.)
* Are there security concerns I need to think about?  Like, how do I
  make it so that I can update my DHTML UI, and maybe even automatically
  get updates from someone else, but not *everybody* I chat with can
  update my DHTML UI to a version that spies on my chats?  What are the
  security properties of the replication protocol?  What if I need to
  kick someone out of a chat channel because they're spamming it?

I think the ability to automatically update application code is
necessary for decentralized applications (like email) to start to
successfully compete with centralized ones (like Facebook) again, but
although I have some ideas about how to do that securely, I don't feel
like I've solved the problem because I haven't tried them in practice
yet.

Anyway, I don't know if anybody else thinks it's an interesting project,
but it seems to me like every chat system out there today is basically
unusably bad, and decentralized database replication with automatically
incrementally updating views seems like the right substrate to build it
on.  

----- End forwarded message -----
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-11-24T03:21:04</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/207">
    <title>impermanence of life</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/207</link>
    <description>&lt;pre&gt;I wrote this to a far-away acquaintance of mine who's been struggling
with her fears since coming abruptly face to face with her mortality
during a recent acute illness.

Sending you hugs; I'm reading this rather late. I will try to say
something that has not already been said.

The recognition of the impermanence of life and the recognition of the
inevitability of death are necessary foundations for truly beginning to
live.  We can't understand the preciousness of every moment of life
until we realize how very scarce those moments are.

Knowing you will die, you can begin to abstain from fear of death.
Suppose something might kill you; so what? You'll be dead in the long
run anyway. Death cannot harm you. What matters is not whether you die
but how you live. Don't waste your life by living it in fear. You may
not get another one.

Many people know they will die but still have faith that all will be
well. They are not foolish; they just have a definition of all being
well that is bigger than their own individual lives, that can continue
after their death. Some of them are religious, and some of them answer
to some other cause. This is what gives warriors the strength to fight
to the death: their allegiance to a higher cause than their own
survival.

It is wonderful to be alive and healthy. Be glad in each day that you
still live, and do not let that precious life go lightly. Do not forget
to honor your wild nature, and seize each day in your teeth, letting its
sweet nectar run down your chin.
&amp;lt;http://www.poemhunter.com/poem/do-not-go-gentle-into-that-good-night/&amp;gt;

Happiness is within your grasp in each day you still live.

I don't know you well enough to love you as so many other commenters
here do. I hope to get the chance to know you some day, because I've
been impressed with what I know of you so far.
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-09-13T07:37:01</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/206">
    <title>boycotting Chick-Fil-A is authoritarian and evil</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/206</link>
    <description>&lt;pre&gt;Although I'm not actually gay --- I can't remember having had so much as
a dick in my mouth --- I'd certainly have an uphill battle proving that
if I were being indicted for gayness.  And if I were attracted
exclusively to men, I'd definitely have sex with men instead of living
celibate.  Finally, a lot of people close to me are gay, bi, or
transsexual.  Nevertheless, I think the current calls to boycott
Chick-Fil-A because of the anti-gay views of its president are
anti-liberal and authoritarian.

The liberal ideal of freedom of speech goes beyond a norm constraining
government actions; it's a norm about how public debate ought to be
carried out: with everyone free to speak their mind without fear of
suffering a backlash, and nobody intimidated into silence.  The liberal
ideal is that ideas stand or fall on their own merits, not on the
personal connections of their proponents. That's why we don't want the
government telling us what we can and can't say — there's no guarantee
that the ideas that the government endorses are true rather than false.
If you let the government suppress speech because of the viewpoint it
expresses, you end up suppressing the truth in many cases.  Lynch mobs
and picket lines, if used to respond merely to speech, can suppress
speech just as effectively as police action.

In this case, Chick-Fil-A is not discriminating against gay people,
except insofar as following the broadly established social and legal
norms surrounding employment benefits for spouses and the like
constitutes discriminating against gay people. An effective boycott, or
police action, against a company for actually discriminating against gay
people would be entirely justifiable. But what we have in this case is
something quite different: Chick-Fil-A, or at least its president, is
*advocating* discriminating against gay people. That's speech, and ought
to be respected and not punished, even if we find its contents odious.

What we're seeing here is simply the left-wing authoritarian equivalent
of the right-wing calls to boycott Oreo over their rainbow cookie a few
weeks ago.  My sympathy for the ends to which this boycott campaign 
is directed does not justify its means of intimidating people into
silence.

If campaigns like this succeed, people must choose between expressing
unpopular political viewpoints and making a living. Society is healthier
when people feel free to air every viewpoint, not just the viewpoints
you like. Guaranteed commercial ruin for anyone who advocates anarchism,
or atheism, or polygamy, or legalization of recreational drugs, or
pacifism, or whatever viewpoint the majority finds odious, would make
society much less free, as surely as government censorship would.

The more likely case, to me, is that these calls to boycott are
ineffective and merely a distraction, because effective commercial
boycotts are currently few and far between.  But my claim is that this
kind of boycott, a boycott to punish odious speech, is not merely
ineffective; if it were effective, it would be poisonous to society.
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-08-02T07:37:01</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/205">
    <title>language-model dimensionality reduction with Markov-chaindistribution medians</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/205</link>
    <description>&lt;pre&gt;In my kragen-tol post earlier this month about [predictive text input
methods][0], I wrote:


[0]: http://lists.canonical.org/pipermail/kragen-tol/2012-July/000961.html

But part-of-speech tagging is sort of hard, especially on incomplete and
possibly erroneous sentences containing words that aren't in the dictionary.
But it seems like you ought to be able to do a good enough job with simple
statistics to be useful.

Now, the rest of this post may be a case of "a few hours of hacking can save
you several minutes in the library", so take that under advisement.  I haven't
checked out the literature enough to know if this idea is already known, let
alone an improvement over what already exists.

One simple approach is to consider a Markov-chain model of text, and cluster
together words whose transition probabilities are similar, either for
in-transitions (what they tend to follow) or out-transitions (what they tend to
precede).

However, this runs into a practical problem immediately: even with a
first-order Markov model, you're trying to perform unsupervised classification
on thousands of vectors in a space with thousands of dimensions.  This is, as
far as I know, a totally intractable problem, because in high-dimensional
spaces, almost all points are closer to boundaries of the space than they are
to any other point.  So you need to reduce the dimensionality of the term space
before you can reduce the dimensionality of the term space.

Rather than eat my tail through that rathole, I thought maybe I'd pick some
kind of arbitrary characteristic of those vectors to cluster them.  The vectors
are finite discrete probability distributions of preceding or following words,
so given a mapping from the words onto the real line, you can use the usual set
of characteristics of finite discrete probability distributions, such as mean,
median, mode, standard deviation, skewness, kurtosis, and so on.  Of these,
median is dependent only on the ordering of the words, and mode isn't even
dependent on the ordering.

So if we cluster these distributions by their median, an idea due to a friend
of mine who should get credit if he wants it, we should be able to easily find
words that are used in similar contexts.

So, grouping the words into clusters that tend to precede the same word, using
the KJV bible for my example text, I got reasonable-looking clusters like the
following.  First, a cluster of mostly verbs, mostly naughty ones, that precede
"not", although interestingly "Is", "am", and "ye" made it in there too:

    not: Accuse, Be, Boast, Cast, Cease, Could, Count, Curse, Deceive,
        Defile, Desire, Despise, Destroy, Devise, Did, Didst, Distress,
        Do, Doth, Dread, Eat, Enter, Fear, Forget, Fret, Give, Grudge,
        Harden, Hide, Hurt, Is, Judge, Labour, Lay, Lodge, Lust, Marvel,
        Meddle, Mind, Murmur, Neglect, Quench, Regard, Rejoice, Rejoiceth,
        Remove, Reprove, Respect, Rob, Slack, Think, Told, Touch, Trust,
        Uncover, Was, Weep, Went, Withhold, agreed, altereth, am,
        appertaineth, approveth, are, attained, backbiteth, bearest,
        believed, believeth, bewray, brawler, bridleth, buildedst,
        burdened, canst, casteth, ceased, circumspectly, confesseth,
        consent, considerest, consisteth, continueth, could, couldest,
        defer, deferred, defile, defileth, delightest, descendeth, dieth,
        diggedst, dishonesty, displease, do, does, doeth, doubletongued,
        doubt, durst, edify, envieth, epistle, extendeth, falleth, feared,
        filledst, followedst, forbid, fret, gathereth, gnaw, grieve,
        harden, heareth, hearkenedst, hide, housetop, kept, knew, knewest,
        knowest, lacketh, lady, laid, layedst, layeth, let, lingereth,
        needed, needest, needeth, obey, obeyed, obeyedst, obeyeth,
        observest, oppress, patient, payeth, perceivest, perfection,
        prefer, profane, purifieth, rained, receive, regard, regarded,
        regardest, regardeth, remained, repented, respected, respecteth,
        return, returneth, reviled, sacrificeth, savourest, saw, seek,
        servedst, shalt, shone, slack, slumbereth, sobriety, sowest,
        spared, spin, staggered, striker, strive, stumbleth, suffereth,
        swelled, tarrieth, tarry, they, toil, tortured, touch, travailest,
        trow, understand, understood, upbraideth, vaunteth, wasted, wax,
        wipe, wist, withdraweth, withheldest, wot, wotteth, wouldest,
        wrestle, ye

And second, a cluster of mostly proper names that precede capitalized "And":

    And: Abez, Abiezrites, Abihud, Abishalom, Achshaph, Adadah, Ader,
        Ahlai, Ahoah, Alamoth, Allonbachuth, Ammonitess, Anaharath, Aniam,
        Anim, Antothijah, Aphekah, Ara, Ardon, Aridatha, Armageddon,
        Asareel, Ashbea, Ashima, Asiel, Aspatha, Athach, Athlai, Attalia,
        Avith, Azem, Azzan, Bealoth, Beera, Berith, Bethbaalmeon,
        Bethdiblathaim, Bethgader, Bethmeon, Bethpalet, Bethpazzez,
        Bethphelet, Birzavith, Boscath, Camon, Canaanitess, Caphthorim,
        Caphtorim, Chelubai, Chislon, Chronicles, Dalmanutha, Diklah,
        Dinhabah, Dothan, EARTH, Edar, Eker, EleloheIsrael, Elpalet,
        Enhazor, Ephesdammim, Ephod, Eshbaal, Eshean, Ethnan, Euroclydon,
        Ezel, Gaash, Gabbatha, Galeed, Galilaean, Gazer, Gennesaret,
        Gilboa, Girgashite, Girgasite, Goath, Hamongog, Hamul, Hararite,
        Harum, Hasenuah, Hathath, Hazarsusah, Hazelelponi, Hazezontamar,
        Helekites, Hepherites, Hormah, Huphamites, Ibnijah, Imla,
        Irshemesh, Ishmeelite, Ivah, Jaasau, Jagur, Japhia, Japho,
        ...
        Tyrannus, Urias, Uzzensherah, Zacher, Zaham, Zarthan, Zered,
        Zithri, Zorathites, Zorites, abolish, alloweth, amen, amethyst,
        badness, bakers, bat, begging, casement, chamois, clearness,
        conquer, consist, crew, cupbearer, cymbal, delicacies, dropsy,
        dryshod, effected, excused, fishhooks, flatteries, foals,
        foreheads, hungered, inclosings, inheriteth, manger, marketplaces,
        meadow, meant, mightiest, mouldy, mowings, murrain, occurrent,
        offenders, ospray, overspread, perfectness, perpetually,
        pollution, pounds, pricks, recorder, repenting, rigour, rushes,
        shrubs, size, socket, sorrowing, stanched, target, tens,
        tentmakers, therefrom, traitor, vails, woods, wretchedness

Unsurprisingly, the "begat" cluster is entirely proper names:

    begat: Abia, Abishua, Abiud, Achaz, Achim, Amariah, Aminadab,
        Amminadab, Attai, Azor, Bukki, Canaan, Coz, Eliud, Ephlal, Eshton,
        Esrom, Ezekias, Hezron, Irad, Jahath, Jarah, Jechonias, Jehoadah,
        Jekamiah, Joatham, Joiada, Josaphat, Josias, Kish, Manasses,
        Matthan, Mehujael, Meonothai, Meraioth, Meribbaal, Methusael,
        Mikloth, Mizraim, Ner, Obed, Ozias, Phares, Pharez, Ram, Roboam,
        Sadoc, Salma, Salmon, Shaharaim, Shema, Shobal, Sisamai, Zabad,
        Zerahiah, Zorobabel

All in all, the 14000 or so distinct words get clustered by this approach into
1793 clusters, of which 1185 have only one word.  Many of the smaller clusters
very clearly express some kind of affinity between words:

    support: feebleminded, financial, volunteer
    form: hypertext, proprietary, readable
    falsely: science, swearing, you
    Where: Puteoli, Telassar, Thelasar
    brother: weak, younger, youngest
    appear: flowers, menchildren, plainly
    honour: husbands, retaineth, without
    hither: Hasten, Reach, description
    Duke: Jetheth, Mibzar, Pinon
    branches: Took, fruitful, natural
    She: Tirhanah, distaff, tributary
    besides: Stephanas, loins, self
    knoweth: Man, certainly, unjust
    Till: intermission, purity, stocks
    slept: Jehoiakim, Menahem, Omri
    died: Achbor, Chilion, Jether, Pirathonite, Seled, Terah
    When: Hareth, discretion, guile, patrimony, thereat, unequal
    F: DAMAGE, PARAGRAPH, PURPOSE, below, problem, provisions
    such: Defects, lettest, marvels, perhaps, pigeons, saveth
    year: eighteenth, eightieth, fiftieth, fortieth, nineteenth, thirtieth
    cities: Berothai, Chun, defenced, fenced, ruined, thirteen
    children: Little, Zebedees, backsliding, beget, impudent, sottish
    two: Asuppim, Telaim, Teresh, calamus, containing, spearmen
    So: corpses, debt, eater, freewoman, peacocks, saying
    none: So, burned, finding, put, quarter, smiths
    might: I, mortality, mother, purpose, scripture, whereto
    bare: Adah, Aramitess, Bashemath, Hammoleketh, Jehudijah, Milcah
    OR: DISTRIBUTE, EXPRESS, MERCHANTIBILITY, PUNITIVE, REPLACEMENT,
        WARRANTY
    some: Arm, oppressed, save, sixtyfold, sowed, thirtyfold

It seems like this kind of clustering might be a useful dimensionality-
reduction technique for natural-language processing, in general; and perhaps it
could be iterated to reduce the numbers further.  For example, when applying
Katz smoothing to a language model, as an alternative to backing off to shorter
N-grams when you don't have enough N-grams for a given value of N, you could
back off to N-grams computed over a vocabulary of these clusters.

Efficient computation
---------------------

An interesting thing about this follower-median characteristic -- it can be
computed efficiently from a suffix array.  Although I'm not familiar with the
performance characteristics of modern suffix-array construction algorithms (I
only know that they're linear-time), I suspect that computing the suffix array
is, in practice, more expensive than computing the pdfs of the contexts using a
hash table, then computing their cdfs in order to find the median, which I seem
to recall is what you were doing; but if you're already paying the cost of
building a suffix array for some reason, this may be useful.  For example, I'm
thinking about using a suffix array for Katz smoothing of a language model.

To find the distribution median of followers for a single context:

1. Find the beginning B and end E of that context's span in the suffix array
   --- that is, the first index B such that text[SA[B]:].startswith(context),
   and the last index E meeting the same criterion.  This is O(lg len(text)).
2. Find the median index M = (B+E)/2, or under some circumstances, B+(E-B)/2.
3. Find text[SA[M]+len(context)], and that's your median follower character.

Step 3 is slightly more complicated if you want something more than a single
character, but not much; this is an advantage of the suffix-array structure.

Finding the distribution median for all M contexts this way is 
O(M lg len(text)), which might be less than O(len(text)).  It's very likely
faster than computing the cdfs, which is 256*M work, albeit sequential.

There's a much simpler linear-time algorithm to get them all at once, of
course: read through the suffix array in order, remembering the last boundary
between contexts.

Simple implementation
---------------------

This is the code I used to get the results above.

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import sys
    import re
    import textwrap

    def words(file, pattern='[a-zA-Z]+'):
        return (word for line in file for word in re.findall(pattern, line))

    def ngrams(n, words):
        queue = []
        for word in words:
            queue.append(word)
            if len(queue) == n:
                yield tuple(queue)          # Lists aren’t hashable.
                queue.pop(0)

    def multimap(kvpairs):
        rv = {}
        for key, value in kvpairs:
            if key not in rv:
                rv[key] = []
            rv[key].append(value)

        return rv

    following = multimap(ngrams(2, words(open(sys.argv[1]))))
    clusters = multimap((sorted(next_words)[len(next_words)//2], prev_word)
                        for prev_word, next_words in following.iteritems())
    items = sorted(clusters.iteritems(), key=lambda (name, contents): len(contents))

    for next_word, prev_words in items:
        for line in textwrap.wrap(', '.join(sorted(prev_words)),
                                  initial_indent=next_word+': ',
                                  subsequent_indent='    '):
            print line

&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-07-30T23:00:33</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/204">
    <title>database query optimizers for simplifying software systems</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/204</link>
    <description>&lt;pre&gt;Another item that didn't make it into [my post on Monday][0]: database
queries are probably another powerful primitive which can simplify a
whole computing system.  

[0]: http://lists.canonical.org/pipermail/kragen-tol/2012-July/000962.html "choosing powerful primitives for a simplified computing system"

It’s a commonplace observation that complex SQL queries are
dramatically shorter, clearer, and sometimes more efficient than
imperative programs to evaluate the same queries — especially on a
database that wasn’t originally designed for those particular queries.
It’s less well known that reformulating those queries in Datalog or
Prolog makes them shorter and clearer still, while removing some of
the restrictions on SQL (at least ANSI SQL-92, which doesn’t support
any form of transitive closure), and providing dramatically superior
abstraction capabilities.  I have an [unfinished proposal tentatively
called Binate][Binate] for a database query language that is even
terser than Datalog and provides even lower-overhead abstraction; but
Binate queries, so far, are harder to read than Datalog queries.

[Binate]: http://canonical.org/~kragen/binary-relations.html

And then there’s stuff like SPARQL and XQuery.

Anyway, whatever the query language, it’s likely that the proper
application of database query languages could simplify a great deal of
software.

However, this is not as open-and-shut as the things that actually did
make it into that post, for two reasons.

Existing query languages have minimal adoption
----------------------------------------------

First, query languages are far from new, and current applications of
SQL are very limited indeed.  I speculate that this is in part because
SQL itself is horrible crap, which could be remedied by the adoption
of one of the alternatives mentioned above; in part because current
implementations of SQL can typically only query data that has been
stored in a persistent on-disk database; and in part because there are
a lot of programmers who don’t even know SQL.

One current inspiring example of the use of query languages is CSS and
jQuery, where very short queries (usually under 32 characters, often
under 16) are used to separate concerns in DHTML software systems.
The underlying query optimizers for CSS and jQuery selectors can be
surprisingly complex.

Existing query evaluators are very complex
------------------------------------------

Second, for a primitive to simplify the system overall, implementing
it must add less complexity than using it removes.  But SQL query
optimizers, like compilers in general, are notorious for being large
chunks of code.  Like other compilers, they must find acceptable
approximations to NP-complete or actually uncomputable problems, and
they have very little time to do it in.  Whether you can do an
acceptable job of this under the kinds of constraints adopted by STEPS
— 20kloc for the entire system, from bootloader and device drivers up
through networked hypertext — is an open question.

&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-07-23T07:37:01</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/203">
    <title>a unified caching system for avoiding recomputation</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/203</link>
    <description>&lt;pre&gt;There’s an old saying that there are only two hard problems in computer
science: naming, caching, and off-by-one errors.  Caching of
computations is basically a classic time-space tradeoff, but caching is
also used for other purposes: for fault-tolerance, to trade off network
bandwidth against storage, for privacy, and to improve scalability by
minimizing the work done at a potential hotspot in a distributed system,
among other things.  

I propose that delegating almost all caching of computations to a
dedicated computation-caching subsystem would have substantial
benefits for computer system design, enabling transparent parallelism,
network transparency, and fault tolerance, while improving performance
and substantially simplifying application code.


Examples of caching of computation
----------------------------------

Current computer systems use many layers of caching, implemented in
many different ways, many of them just to cache computations.

### MVC ###

One broad generalization is that views in MVC applications contain
cached representations of the state of the models.  That’s why they
get notified when the model changes: so they can update their
representation.  (Rails programmers are probably confused here,
because what Rails calls “MVC” doesn’t work this way.)

### In a terminal text editor ###

For example, I’m editing this text file in the vim editor, running
inside a terminal emulator displaying in X-Windows.

#### Buffer contents are a cached computation on command history ####

As I edit, the vim editor executes a computation on the file’s initial
contents and my sequence of commands to get the buffer’s new contents,
and stores the result of this computation in the file every time I
type :w.  You could think of the file’s contents as being a cached
result of executing the entire editing history.

#### Text screen contents are a cached computation on buffer contents ####

Also, my terminal emulator’s memory contains a cached representation of
the visible part of the buffer’s contents.  When the buffer’s contents
change, this cache becomes invalid and needs to be updated; vim, using
curses, carefully calculates a minimal sequence of edit operations to be
applied to its local image of my screen contents to make them match the
file, and encodes that sequence as a string and sends it to the terminal
emulator.  If the two replicas of this cache get out of sync, for
example because some other program wrote some text onto my terminal, the
desynchronized state can last quite some time; vim provides an explicit
command, ^L, to request resynchronization by resending the entire screen
contents instead of an incremental update.

#### Framebuffer contents are a cached computation on text and font ####

The framebuffer *also* contains a cached representation of those same
screen contents, and the same kind of crazy synchronization dance ensues
between the terminal emulator and the X-Windows server, which manages
the framebuffer contents.  However, the X server is a lot more competent
at preventing desynchronization, because it speaks a more featureful
protocol to its applications, such as the terminal emulator, than the
terminal emulator does to its applications, such as vim.

#### X-Windows apps cache information about the X server ####

At a more detailed level of the X11 protocol, when the terminal
emulator app starts up, it gets information about the X server’s
screen configuration (available visuals, root window ID, and so on)
which it then caches for the rest of its run, using for all future
drawing commands.  There is in general no way to notify the app that
these cached values have become invalid and need to be refetched.
This is one reason that it’s relatively straightforward to detach an
ANSI-character-terminal app from one ANSI terminal emulator and
reattach it to another one, but essentially impossible to do the same
thing for an X-Windows application.

#### Font bitmaps are cached renderings of the font ####

When the terminal emulator sends update commands to the X server, it
requests the drawing of strings of characters in a specified font at
specified locations.  The process of rendering a character from a font
can be sufficiently time-consuming to be a performance bottleneck,
especially when a screen frame can have tens of thousands of them at
once; so the X server has a special-purpose cache for rendered glyphs
from fonts.  But wait!  The operations that use that special-purpose
cache don’t support alpha-blending.  And the terminal emulator I’m using
at the moment uses alpha-blending to get antialiased rendering of the
fonts it uses, so it uses the X Render Extension, driven by a library
called Xft, to manage its very own cache of font glyphs in the X server.
(I don’t know if this cache is shared between X applications.)

#### Compiled programs are cached results of a computation on source code ####

Also, the code for the terminal emulator, the X server, Xft, and vim is
all written in C.  Compiling this amount of C code into machine code, so
you can execute it, used to be time-consuming in the 1980s, and you can
maybe still amp up your compiler optimizations to the point where it’s
time-consuming today, so the executable code for those programs is
cached.  In fact, this particular caching system is so complicated that
it has a world-wide organization consisting of thousands of people
largely to manage it, called “Debian”.  It includes components such as
`make`, object and executable file formats, binary packages for different
architectures, the linker (as distinct from the compiler), the dynamic
linker, the configuration for the dynamic linker, and sometimes, the
“derived-object wink-in” feature of ClearCase, and the closely analogous
features of the Vesta version control system.

(Debian actually spend a lot of time on things other than compilation
and managing compilation output.  They also archive and version source
code, track and fix bugs, test software, establish interoperability
standards, document undocumented software, and so on.  But having to
struggle with the build infrastructure costs them a lot of time that
could be spent doing valuable work like the above.)

#### X11 apps also cache Compose key bindings ####

Another thing the terminal emulator caches is my [Compose key
bindings][xcompose].  The X input method linked into it reads the
`.XCompose` file at startup, apparently in order to build a table of
compose-key sequences, and then never rereads it.  This means that I
have to kill and restart my terminals to find out if a new binding I’ve
added to the file works properly, or start a different terminal process,
such as xterm; and I can’t take advantage of new bindings until I
restart the terminal.  On the other hand, I can also test the new
bindings without worrying that I’ll break my keyboard setup.

[xcompose]: https://github.com/kragen/xcompose

#### `gnome-terminal` caches GConf settings ####

The terminal emulator application, `gnome-terminal`, chooses the font
based on a “profile” stored in GConf, which is managed by `gconfd`,
but naturally cached within the terminal emulator process in a
`TerminalProfile` object.  (You can have several different profiles
for different kinds of terminals.)  This means that the font-glyph
cache mentioned earlier is a cache of, among other things,
configuration information stored in GConf.  GConf supports
asynchronous event notification, so that if you change, say,
`/apps/gnome-terminal/profiles/Default/font` using `gconf-editor,` a
callback fires to update the `TerminalProfile` in each of your open
terminals using the Default profile, and they will immediately redraw
with the new font.  However, this cache invalidation mechanism is
limited; as explained in the comments:

     * It's somewhat tricky to manage the profiles/ dir since we need to track
     * the list of profiles, but gconf doesn't have a concept of notifying that
     * a directory has appeared or disappeared.

#### `gnome-terminal` shares cache by running many windows in a process ####

All of the computation and I/O that `gnome-terminal` does at startup,
including reading profiles from GConf, reading my Compose key
bindings, rendering fonts to images, and many other things, adds up to
about two seconds of CPU time on my netbook.  In order to cache all of
this for future windows, `gnome-terminal` (like `dtterm`, the
forgotten CDE terminal program) runs all terminal windows from a
single process; at startup it uses D-BUS to check to see if there’s an
existing terminal emulator process running to which it can pass its
command-line arguments, which takes less than 200ms.  So D-BUS is here
being used as part of a system for searching for existing cached
computation results, in the form of a running process.

#### And that’s only the beginning ####

There are several other layers here that I’m not getting into at all:
GDK, GTK+, Pango, Cairo, and D-BUS, among others.  Each of these has
its own state, most of which is cached results of previous
computations that could be recomputed if necessary — either in
principle or in fact.


Consequences of caching of computation
--------------------------------------

As illustrated above, cached computation results are pervasive in
software at all different scales, from single bytes to multi-gigabyte
software distributions; this is essential to make software acceptably
efficient; they are managed by a mélange of different mechanisms;
deficiencies in these caching mechanisms introduce a huge number of
bugs and inflexibilities into the software; and the caching mechanisms
themselves are a huge amount of code, with its attendant maintenance
cost.

Something that may not be obvious is that this diversity of caching
mechanisms inevitably leads to irrational caching tradeoffs and poor
systemwide performance.  This is because these time-space tradeoffs
are being made in isolation from one another, and the time savings
being bought by a megabyte of RAM varies wildly from one part of the
system to another.  Consequently, we have some parts of the system
that are unnecessarily slow because they don’t cache sufficiently, and
other parts of the system that are unnecessarily memory-hogging
because they cache too much — causing everything else to run more
slowly.

Worse, the value of each cached item — the amount of computation it
saves, in the end — can’t be evaluated in isolation.  It depends on
how often that item is needed, which is much more common if the stuff
that is computed from it isn’t cached effectively; and it depends on
how much computation is needed to compute it, which depends on how
effectively the things it’s computed from are cached.  So picking the
ideal set of computations to cache, even given perfect knowledge of
future demands, is a global optimization problem which can’t be solved
by composing local solutions to subproblems.


The alternative: a unified cache system
---------------------------------------

So suppose that, instead of a mélange of different caching mechanisms,
we have a single systemwide caching mechanism.  All stored data is
classified either as “source data” — which came from outside the
computer system and thus cannot be regenerated if it is erased — and
“derived data” or “cached computation”, which is anything that could
be regenerated by deterministic computation from the source data.

The systemwide caching mechanism knows every cached computational
result in the system, what the inputs to the computation were (both
source inputs and derived inputs), and how long it took.  And it can
use an intelligent systemwide policy to figure out which cache results
are likely to save the most computation time in the future, taking
into account how long they took to compute and what other
previously-cached results would need to be recomputed.  Like `make`,
it can also automatically invalidate cached results when they change —
whether they are compiled object code, font glyphs, or screen images.

### The caching system interface replaces the filesystem interface ###

To work reliably, such a caching system must mediate all access to
persistently stored data on the computer system, or at least the part
of the system it controls, whether it’s cached computation or received
through I/O.  All of the “application code” in the system is in effect
side-effect-free, or pure functional, even if it uses side effects in
its internal implementation.  It becomes a filesystem, like Vesta, but
more than a filesystem, since it’s also responsible for I/O.  And by
virtue of mediating all persistent data access, like MapReduce, it can
achieve transparent parallelism, transparent network distribution, and
transparent fault-tolerance, by running the computations whose results
are to be cached on whatever computer it wants, and possibly on
multiple computers in case one of them crashes or runs slowly.

(The caching system looks a lot like a software transactional memory
system.)

### But the first step is to boil the ocean. ###

The biggest disadvantage is that it requires you to rewrite all of
your software with a different fundamental design to get these
benefits.  (There are other disadvantages as well: worst-case
performance would be under the control of the caching system, rather
than user code.  Whether such a system’s policies could be tuned for
competitive performance across a wide range of domains remains to be
seen.  My intuition says yes, but we’d need real systems to prove it.)

However, lots of existing program code could still be used inside such
a system, just as it you can use existing compilers in Vesta.

### There’s a mountain of related work. ###

I described a hypothetical [concrete example of such a system in a
specific domain][compositing] in 2005.  That post contains references
to a lot of related ideas, which I won’t repeat here.

[compositing]: http://lists.canonical.org/pipermail/kragen-tol/2005-November/000810.html "Make for URLs, or dependency-directed compositing"

Another example of such a system in a specific domain is the
polynomial greedy algorithm for OLAP materialized view selection.
“Materialized views” of OLAP “data cubes” are cached computation
results, and while they can be computed from the underlying data cube,
they can be computed much more efficiently from other materialized
views of the same cube.  Like other database query results, though,
there are a number of different possible cached values that could be
taken advantage of if they existed.

`make` of course is such a dependency-driven system operating at large
scale — computations that take seconds to hours, specifically
source-code compilation.  Various successor systems to `make`, like
DSEE and Vesta, have taken advantage of the side-effect-free nature of
compilation that `make` is built on to get some of the other benefits
I’m touting for this approach: transparent network distribution and
parallelism. `distcc` is probably the currently most popular such
system.  `ccache` is a C-compiler wrapper that uses a different
management policy for compilation output than `make` does, one that
can use more space and be more efficient.

### There's a range of choices where the imperative nature of input ends. ###

Maybe, in such a system, my text editor would still interpret my key
commands to imperatively update the state of a persistent buffer
object and a persistent file.  But all of the redisplay logic — going
from the buffer contents, cursor position, scrollbar position, and
terminal preferences, to the pixels on the screen, including font
rendering — would be mediated by the caching subsystem; so it could be
transparently distributed in a fault-tolerant manner, and it would
automatically update when any of the dependencies changed.

Or maybe, as in [rumor-oriented programming][rop], even the current
file contents could be merely a cached result of running a computation
on the previous file contents and an editing command.  However, making
that practical probably requires sharing structure among different
versions of the file contents, thus using something like a “rope” made
from immutable nodes, and so might have a significant performance
penalty.

[rop]: http://lists.canonical.org/pipermail/kragen-tol/2004-January/000749.html

### Unified caching dramatically simplifies eventual consistency. ###

As in rumor-oriented programming, another benefit of unified caching
is that it eliminates most of the complexity of eventually-consistent
database replication.  The difficulty with eventually-consistent
replication is that almost any state transition is possible; if your
calendaring system is built on top of an eventually-consistent store,
you can ensure that you don’t book a new meeting in the same time slot
as an existing meeting, but you might end up with two meetings booked
into the same time slot anyway, because you booked them on different
computers that later synchronized with each other.  This kind of
unexpected transition tends to introduce subtle bugs into application
code that expects its underlying data store to change, if at all, only
in sensible ways.

### There are many different possible strategies for managing the cache. ###

The simplest strategy is probably the `make` approach: recompute
chunks of cached data from scratch when any of their dependencies
change, but only when they’re being demanded (“lazy”).

Or you could use a “push” or “eager” version of that approach, where
the recomputation happens as soon as the dependency changes, so that
the cached result is instantly available when it’s demanded, but the
computation may be wasted.

You could save some previous cached values in case inputs change back
to a previous state.

You could use a delta-application approach instead of recomputing from
scratch, where the user code has access to the old input state, the
difference between the old and new input state, and the old cached
output, and can then use those three things to compute the new output.

You could support multiple possible ways to compute the same requested
result, so that in cases like database queries or OLAP materialized
views, it can choose the computational plans that require the least
computation, given the currently-available cached results.  

You could compare results of a recomputation with new inputs against
the previous results in order to see whether recomputation of
downstream results can be skipped.

Any such strategy should probably strive to evict the things from the
cache with the lowest predicted future time-saved-to-space-used ratio.
That is, any cache entry can be rated according to how much CPU time
it’s expected to save in the future (the benefit), and how much space
it has to use for how long in order to do so (the cost), and the cache
entries with the lowest benefit-to-cost ratio should be the ones that
next get replaced with new cache entries — but only those with a
higher benefit-to-cost ratio.  In effect, memory is being auctioned
off to the different cache entries.  This is complicated by the fact
that, as mentioned earlier, the benefit of any given cache entry is
contingent on the existence or nonexistence of other cache entries.

(This applies to disk usage as well as RAM.)

A tricky aspect here is that there is no limit to the amount of
computation that could be applied, in principle, to reducing cache
miss rates.  However, at some point, the cache manager starts
consuming more compute time than it’s avoiding.  The same is true of
code complexity.

### The system should cache things at as fine a granularity as practical. ###

Practically speaking, a system like this can only be reasonably
efficient down to some granularity — the granularity where the
overhead of dependency tracking and cache eviction decisions becomes
significant compared to the time spent actually computing something
from the data in application code.  This is crucial, because caches
that are invalidated at a too-large granularity suffer from “false
sharing”, where spurious recomputations happen because something
changed in an input, but it’s something that a particular computation
isn’t actually using.

Although this critical granularity depends on the computational
intensity of the task, my intuition is that the right granularity for
this is typically closer to the database record, line of text, or HTML
element, rather than the text file or video frame.  Coincidentally,
this is close to the cache line size of modern CPUs.

This suggests that an interface considerably more efficient than the
standard filesystem interface — a library interface, which doesn’t
require a context switch, rather than a system-call interface — is
needed to reap the full benefits.

Acknowledgments
---------------

In addition to the people thanked in my 2005 post, I’d like to thank
Michael P. Frank and Shae Erisson for helping develop these ideas;
some of the above is theirs.
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-07-19T07:37:01</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/202">
    <title>choosing powerful primitives for a simplified computing system</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/202</link>
    <description>&lt;pre&gt;One of the basic principles of the [FONC/STEPS system] [STEPS] is that
you can reduce the overall complexity of software by choosing powerful
primitives that simplify many parts of a system.  OMeta is sort of their
poster child: a generic rewriting system powerful enough to write
simple, efficient parsers with, but also to write compiler optimizers
and code generators with.

[STEPS]: http://vpri.org/fonc_wiki/index.php/Glossary

Here are some candidates for other primitives which, in practice,
simplify other parts of a computing system by more than the complexity they
add.

LZ77 obsoletes curses and other things
--------------------------------------

Using zlib to compress your data streams and files greatly reduces the
performance penalty for not optimizing your file formats and protocols
to reduce file and message size.  Here are some exmaples.


### curses ###

At the Barracas [hacklab][] tonight, I was talking about why curses, the
programming library, is useless today.  Curses goes to a lot of effort
to minimize the number of bytes sent to your terminal, because at 300 or
2400 or 9600 baud, redrawing an entire 80×24 screen would take 66 or 8
or 2 seconds.  So it sought the minimum amount of change to your screen
to get it to match its internal screen image.

[hacklab]: http://lab.hackcoop.com.ar/

Fast-forwarding to 2012, if I’m receiving those bytes over a
connection at all, they’re being sent over an SSH connection
compressed with zlib.  If you send the same screen image twice with
small differences, zlib will notice that you’re sending a blob of
bytes you’ve sent before, and just transmit a reference to the earlier
index in the bytestream where you sent those bytes before.  What’s
more, it will also compress runs of whitespace, words that were
mentioned before, things that were on previous screens, and so on.  It
actually does a much better job than curses at reducing the bandwidth
needed.

So basically all of the cleverness that went into the special-purpose
screen-update optimization algorithms in curses (and Emacs, and other
things) has been replaced by a more general-purpose algorithm that
actually works better.  It requires a bit more CPU, and can’t be
executed by a nonprogrammable terminal, but that’s okay.

(There’s a separate question of how important it is to conserve the
bandwidth of text login sessions when we have megabits instead of
kilobits.)


### SNMP and BER in general are ad-hoc compression that doesn’t work ###

Similarly, it turns out that if you represent the contents of a
typical SNMP packet as simple ASCII text with explicit name-value
pairs (with short names), and then compress it with either zlib LZ77
or the LZW implementation in `compress`, it ends up about the same
length as the BER-represented format that SNMP actually uses.

The following [sample packet][] is 53 bytes in BER. 

[sample packet]: http://www.pccl.demon.co.uk/papers/snmpdecode.html

    302B            SEQUENCE (LEN = 43)
    0201 00                 Version 1
    0404 55544D43           Community = UTMC
    A320                    SetRequestPDU (context constructed 3) (Len = 32)
    0204 023712FB                   REQID = 0X023712FB
    0201 00                         Error = NONE
    0201 00                         ErrIndex = 0
    3012                            SEQUENCE (LEN = 18)
    3010                                    SEQUENCE (LEN = 16)
    060B 512C010401010401010100                     OID = 2.1.44.1.4.1.1.4.1.1.1.0
    0201 01                                         Integer = 1

The following version is 45 bytes in ASCII; `compress` makes it 44 bytes,
`gzip -9c` makes it 54 bytes; `bzip2` makes it 79 bytes.

    UTMC
    023712FB
    set 2.1.44.1.4.1.1.4.1.1.1.0=1

If we run it through a stupid expansion phase (“binary dump”) where we write
out two bytes per bit, it becomes 721 bytes, which gzip down to 106 bytes,
bzip2 down to 99 bytes, or LZW compress down to 138 bytes.  So even fairly
extreme inefficiencies in representational choice only cost a few dozen bytes
of overhead.

This is nearly a best case.  SNMP packets in the wild commonly contain
multiple similar OIDs, long sequences of zero bytes, and similar
highly-compressible nonsense.  (To say nothing of endianness errors,
signedness errors, and integer truncation errors.)

The ad-hoc compression in SNMP is similar to the ad-hoc compression in
DNS, which would also be well-served by an ASCII text protocol with
optional compression.


### LBX was a waste of effort because LZ77 works better ###

Similarly, X11R6.3, one of the versions of X-Windows in the 1990s,
added a protocol extension called LBX, Low-Bandwidth X.  A great deal
of ingenuity went into reducing the size of messages in the X11
protocol, and an extension was added to the X server.  It met the same
fate; mostly quoting Wikipedia here:


Unfortunately, I don’t think ASN.1 BER or curses are likely to meet
the same fate as LBX any time soon.  But if you were designing a
system today, with the knowledge of LZW and LZ77, you could avoid them
entirely.


### Git uses compression instead of storing file diffs ###

Obsolete version-control systems like RCS and Subversion use the
worst-case O(N²) longest-common-subsequence algorithm to determine the
smallest possible edit to get from the current version of a file to its
previous version, or vice versa, so that storing all the past versions
of a file takes only slightly more space than storing the current
version of it.  This complicates file renaming and fails to save any
space when you're using the wrong newline convention.  However, it means
that the diffs are already computed and, in theory, available for use in
merging branches.

Git, instead, concatenates many different versions of many different
files into a “packfile”, sorted by a crude measure of similarity, and
compresses it with LZ77.  This turns out to produce a smaller total
filesize, and it eliminates the coupling between the storage format and
merge algorithms that has plagued many obsolete version control systems.


### Complexity and adaptability to pathological cases ###

LZW is only about 50 lines of code, which is a lot less than LZ77, and
it has less overhead than `gzip`, but neither it nor Burrows-Wheeler do
as well as LZ77 compression in extreme cases; the Gnumeric file
mentioned above is 9kB compressed with `compress`.  Some screen updates
I tested --- concatenating slightly modified copies of a 2.5K screen
image --- were an average of 140 bytes each with `gzip`, but 1200 bytes
each with `compress` and 270 bytes each with `bzip2`.


### Summary ###

So, to a great extent, you can forget about the space-efficiency of your file
formats and wire formats if you run them through a generic compression
algorithm as a last step, and optimize them entirely for readability,
extensibility, and simplicity.  Gnumeric takes this approach.  A particular
Gnumeric spreadsheet I have handy here, representing a screenful of text, is
2.7kB on disk; `gzip -d` on it produces 32kB of XML.


SAT solvers and theorem provers
-------------------------------

I assert without proof that a substantial chunk of compiler
optimizations could be solved with a general-purpose SAT solver, and
even more with a general-purpose theorem prover.


Suffix arrays provide searching, sorting, indexing, and compression
-------------------------------------------------------------------

Computer systems have a lot of code in them to do search.  LZ77
implementations must search for previous occurrences of the current
input string within their window.  Compilers and linkers must search
for symbols in symbol tables.  Databases must search by various keys,
including occasionally finding keys within ranges.  Full-text search
engines must search for terms that occur anywhere within documents.
Filesystems must search for filenames within a directory.


### Full-text indices subsume single-column RDBMS indices ###

There’s a small software company in Florida called askSam, which sells
a piece of software also called “askSam”, billed as a “free-form
database”.  Basically you divide your data into records.  Your records
are basically strings of text, but there’s a convention for “fields”:
a field named FOO is the text following the string “FOO[” up to the
next “]”.  This sounds kind of hokey, but with a little bit of
full-text indexing, an awful lot of queries get performance comparable
to the performance they’d get in a traditional relational database
with, you know, columns and shit.  (There are some queries that don’t,
of course, like multicolumn queries.)

So, in a sense, full-text indices subsume the more normal kinds of
database indices as a special case.  (Except for multicolumn indices,
and even for single-column queries, they’re probably worse by constant
factors.)


### Suffix arrays ###

One particularly interesting kind of full-text index is the suffix
array.  In a suffix array, you consider your entire corpus (database)
as one long string, and the index is an array of integers, one integer
per index point, and that integer is the offset into the string of
that index point.  (In the usual case of indexing a document full of
words, you might have an index point at the beginning of each word; if
you want to support arbitrary substring search, you might instead have
an index point at every character.)  The tricky part is that the
integers are sorted in the *lexicographical order of the suffixes of
the corpus starting at the points they point to*.  Which means that
all the index points followed by, say‚ “walnuts” are in a single run
in the array.  So you can binary-search in the array to find all the
occurrences of a word, and the first thing alphabetically following
it.  (There’s a trick called an LCP-LR array that stores some extra
information so that you can find arbitrarily long strings in
essentially linear time.)

It turns out that there’s a linear-time algorithm for constructing a
suffix array in about 50 lines of code, called the “skew algorithm”,
discovered by Kärkkäinen and Sanders.  (This might sound like it’s
asymptotically better than the O(N lg N) needed to construct a regular
database column index, but that’s sort of an illusion.  I’d be very
surprised if it turned out to be faster in practice.  But I don’t
expect it to be many times slower or asymptotically slower, either.)

Another feature of suffix arrays is that they give you the suffixes
beginning with a given string in *sorted* order.  That means that if
you have a big text file, you can find the range of its suffix array
that begins with '\n' and iterate over it, and you have the lines of
the text file in sorted order.  (Except for the first.)  And if you
want your emails sorted by subject, you can get most of the way there
by looking at the suffixes of your email file that begin with
'\nSubject:' --- then you just need to search from there to the
beginning of each email, which can be done relatively efficiently.

So what would it mean if you could replace most sorting and searching
code with the use of a single suffix-array primitive?  Things would be
a constant factor slower, but how much slower?


### Burrows-Wheeler compression is easy once you have a suffix array ###

Also, revisiting the subject of generic compression algorithms, there's
Burrows-Wheeler compression, which is commonly substantially more
efficient than LZW or LZ77, the hard part of which is generating a sort
of suffix array.  Perhaps if you already have an efficient suffix array
generation algorithm, implementing Burrows-Wheeler compression and
decompression might be easier than implementing LZ77 and almost as easy
as implementing LZW.  (You still need the other algorithms, though,
because Burrows-Wheeler isn't suitable for compressing streams such as
remote desktop sessions, and it does much worse than LZ77 at compressing
extremely repetitive things like Git packfiles or sequences of screen
images.)

There are “compressed full-text indices” or “self-indexes” based on the
Burrows-Wheeler transform that simultaneously provide the search functionality
of suffix arrays and the compressed-text-retrieval functionality of BWT
compression — in effect, the overhead of indexing your text is negative 60%.
So far I’ve been too intimidated by Navarro and Mäkinen's highly-cited 2006
[survey paper on the subject][Navarro] (67 pages long, including a five-page
bibliography) to actually read it, so I don’t know if they have the kind of
power-to-complexity ratio that the other things in here have.

[Navarro]: ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/survcompr.ps.gz


### Replacing hash tables ###

It may be a little hard to see how you could replace hash tables in
general with suffix arrays, since hash tables often contain references
to garbage-collected objects in memory, not just text strings.  You
could generalize your strings to contain object references as well as
characters, which doesn’t cause any problems for the skew algorithm as
long as there’s a well-defined ordering for object references; you could
end up building strings like this:

    main=*
    usage=#
    strerror=&amp;amp;

where the punctuation characters are actually references to objects.
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-07-16T07:37:01</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/201">
    <title>how to do a better predictive text input method for Android</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/201</link>
    <description>&lt;pre&gt;The SwiftKey X trial just expired on me on the Android phone loaner
I’m using from my work.  I still have the keyboard layout, but the
predictive text no longer works.  This makes it a real bummer to use
the machine.

I guess it serves me right for depending on proprietary software.  I
should have known better.

So it occurred to me to think about what would be needed to replace
it with free software.

The first thing to note is that SwiftKey X can result in major
disclosures of confidential information.  If you borrow someone else’s
Android phone to check your Facebook account or whatever, and you
start accepting the default suggestion over and over again, it will
frequently reproduce substantial chunks of text from their history.
If they used SwiftKey to enter confidential information into their
phone, you may end up seeing bits of it purely by accident if you use
some of the same words.

So, given that this is apparently acceptable, a quick-and-dirty input
method that works more or less the same way could be done as follows:

1. Use one of the n-gram datasets published by Google (e.g. the 2009
   books dataset from &amp;lt;http://books.google.com/ngrams/datasets&amp;gt;; in
   English, 2 gigabytes raw compressed for 1-grams, 26 gigabytes raw
   for 2-grams, 88 gigabytes raw for 3-grams; other languages
   available are French, Chinese, German, Hebrew, Russian, and
   Spanish, but not Arabic or Hindi; there’s also the smaller 2006 web
   dataset,
   &amp;lt;http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13&amp;gt;
   linked from
   &amp;lt;http://googleresearch.blogspot.com.ar/2006/08/all-our-n-gram-are-belong-to-you.html&amp;gt;
   for US$150) as a baseline.  A reasonable 1-gram dataset for
   English, like the British National Corpus frequency file I
   frequently use, might be half a megabyte compressed; the most
   common 2-grams might be another megabyte or two.  SwiftKey has
   apparently shown that you can get by with very few or no 3-grams.

2. Store the history of everything the user types --- maybe up to some
   limit, like 16 megabytes uncompressed.  (At 60wpm, that would be
   about 800 hours of typing.)  Maintain a prefix array --- a suffix
   array of the reversed text --- of this history.  The suffix array
   allows you to find the longest suffix of the text before the cursor
   that has previously occurred, and what followed it, in logarithmic
   time: 24 fetches from flash, say, or 0.5 ms.  An initial prototype
   could skip the suffix array and just do linear search of the text,
   which should scale to a few hundred kilobytes, while a version that
   does the search using a hash table indexed by three-character
   contexts (as described in the DEFLATE RFC) should scale to a
   megabyte or more.

From the n-grams, you can use e.g. Katz smoothing to derive enough of
a probability distribution to do Viterbi decoding of an observed
keystroke sequence.  SwiftKey does a little more than this, because it
can handle insertion and deletion errors as well, but that can be
handled with a simple dynamic-programming algorithm that’s very
similar to Viterbi in spirit.  Making the jump from there to
best-first search should give you the three possibilities displayed by
SwiftKey.  SwiftKey doesn’t seem to take into account how far from the
center of the key your finger is, but it does seem to take into
account how accurate your hitting of a particular key usually is.

The second item can be used not merely to update 3-gram-based language
models, but also to find the longest context immediately preceding the
cursor that has ever occurred before, then all of the places that
context has occurred, and what followed.  Probably the most recent
occurrence should be used for the #1 prediction, rather than the most
frequent; in general it will not be practical to retrieve all of the
occurrences, but perhaps enough could be retrieved to adjust the
language model substantially.

Kärkkäinen and Sanders’s “skew algorithm”, although not the first
linear-time suffix-array construction algorithm, nor the most
efficient, is probably the simplest.  The C++ implementation in their
paper is only 50 lines of code.

Another outcome of all of the above, apparently not used by SwiftKey
(but, I suspect, used by iOS) is a probability distribution for your
next keystroke.  This could be used to expand and contract the
hotspots for each key on the keyboard, so that key taps on the border
between two keys will be interpreted as the correct key.

You could do *much better* than SwiftKey with dimensionality reduction
techniques, which could allow much more context to be brought to bear
on the prediction problem using a much smaller model.  SwiftKey often
makes predictions that yield clearly ungrammatical 3-grams, which even
a simple 3-gram model over part-of-speech tags ought to be able to
reject.  There’s a lot of dimensionality-reduction work in the last
decade of statistical computational linguistics that could be brought
to bear.

&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-07-15T05:15:03</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/200">
    <title>washing machines don't have to use energy to heat water</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/200</link>
    <description>&lt;pre&gt;I recently saw an article that claims that heating water to 40° for washing
laundry consumes around 5–10kWh per load.

However, it turns out that heating up water doesn't consume energy.  You need
energy to do it, to be sure — but that energy is still in the water after you
heat it up.  The Carnot limit prevents you from recycling most of that heat
into exergy to, say, drive the washing machine motor, but nothing is preventing
you from transferring that heat from dirty water into clean water, or into a
heat reservoir that holds it until your next wash.

First, is such a reservoir possible?  Most definitely.  You can buy
fractionated paraffin wax that melts at more or less whatever temperature you
want, to a few degrees of precision, with a heat of fusion near that of water
ice.  A big Thermos inside the washing machine, full of paraffin or another
phase-change material, could hold a laundry load's full of heat for several
days, if not a week.  But how to get the heat into it?

The key is a clever little machine called a "countercurrent heat exchanger": in
its simplest form, two parallel long pipes that have been welded together, with
water, or some other fluid, flowing in opposite directions through them.  As
the hot water flows in one direction, it loses heat to the cold water flowing
in the other direction — and the cold water, you might say, loses its cold to
the hot water.  When the formerly hot water exits, it's just a little warmer
than the cold water going in, and similarly, the formerly cold water exits just
a little cooler than the hot water was originally.

The countercurrent heat exchanger is part of many animals (a nose is a variant
that needs only one tube, which acts as a heat reservoir, and the reason you
don't get hypothermia just from breathing) and its use in human engineering
goes back decades, if not centuries.  And indeed devices like cement kilns and
Passivhausen use CCHE to reduce heat loss to manageable levels.  

So why doesn't your washing machine use one?  Maybe because 5-10kWh per load is
maybe a dollar at residential electricity prices, and saving US$30 a year isn't
worth the extra machinery, fragile glass, and extra space.

Essentially any process that heats something up to a high temperature, then
cools it back down, can have its efficiency improved by the same principle.
Firing pottery, making glass, casting metal, powder-coating metal, baking
bread, sterilizing water, making cement (as mentioned above), and so on.  Many
of these processes are not inherently energy-consuming, or inherently consume
only a tiny fraction of the energy that we currently spend on them.

Kragen
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-07-03T23:24:19</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/200">
    <title>washing machines don't have to use energy to heat water</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/200</link>
    <description>&lt;pre&gt;I recently saw an article that claims that heating water to 40° for washing
laundry consumes around 5–10kWh per load.

However, it turns out that heating up water doesn't consume energy.  You need
energy to do it, to be sure — but that energy is still in the water after you
heat it up.  The Carnot limit prevents you from recycling most of that heat
into exergy to, say, drive the washing machine motor, but nothing is preventing
you from transferring that heat from dirty water into clean water, or into a
heat reservoir that holds it until your next wash.

First, is such a reservoir possible?  Most definitely.  You can buy
fractionated paraffin wax that melts at more or less whatever temperature you
want, to a few degrees of precision, with a heat of fusion near that of water
ice.  A big Thermos inside the washing machine, full of paraffin or another
phase-change material, could hold a laundry load's full of heat for several
days, if not a week.  But how to get the heat into it?

The key is a clever little machine called a "countercurrent heat exchanger": in
its simplest form, two parallel long pipes that have been welded together, with
water, or some other fluid, flowing in opposite directions through them.  As
the hot water flows in one direction, it loses heat to the cold water flowing
in the other direction — and the cold water, you might say, loses its cold to
the hot water.  When the formerly hot water exits, it's just a little warmer
than the cold water going in, and similarly, the formerly cold water exits just
a little cooler than the hot water was originally.

The countercurrent heat exchanger is part of many animals (a nose is a variant
that needs only one tube, which acts as a heat reservoir, and the reason you
don't get hypothermia just from breathing) and its use in human engineering
goes back decades, if not centuries.  And indeed devices like cement kilns and
Passivhausen use CCHE to reduce heat loss to manageable levels.  

So why doesn't your washing machine use one?  Maybe because 5-10kWh per load is
maybe a dollar at residential electricity prices, and saving US$30 a year isn't
worth the extra machinery, fragile glass, and extra space.

Essentially any process that heats something up to a high temperature, then
cools it back down, can have its efficiency improved by the same principle.
Firing pottery, making glass, casting metal, powder-coating metal, baking
bread, sterilizing water, making cement (as mentioned above), and so on.  Many
of these processes are not inherently energy-consuming, or inherently consume
only a tiny fraction of the energy that we currently spend on them.

Kragen
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-07-03T23:24:19</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/199">
    <title>Meghan Saweikis on the development of human society</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/199</link>
    <description>&lt;pre&gt;Meghan Saweikis wrote the following; I thought it was really excellent,
so with her permission, I am posting it here.  Maybe it should go to
kragen-fw instead, but kragen-fw is mostly dead, really.

I think the issues touched on in this mail are among the most important
issues for every human being to think about.

---

I’ve been thinking a lot about what you said and about how I quantify
“net positive” vs. “net negative” impact on the world. I agree with
you that the objective level of suffering may be decreasing. I agree
that there is increasing literacy, autonomy and access to basic health
care services. I agree that these are positive things. I agree with
you that humanity as a whole has accomplished these positive
things. But, I think humanity as a whole has also created some
significant problems (both for other humans and for the
ecosystem). I’m not sure I think the benefits you mention are positive
enough to suggest that humanity in general is a “net positive.” And
even if humanity in general is a “net positive,” I don’t think that
implies that any particular individual (being a part of humanity)
would necessarily be a net positive.

As far as literacy goes, I think literacy is often equated with
opportunity and in many ways that is why increasing literacy has been
so important. In the past, I think this was a relevant correlation
because literacy was generally both necessary and sufficient to
provide a living wage and economic opportunity. Currently in the US
nearly 10% of individuals between 18-64 with basic literacy (measured
as high school diploma) live in families with incomes below a living
wage. I’m not certain that increasing literacy is still an indicator
of increasing opportunity. The drastic changes in literacy rate over
the past century has not correlated with changes in poverty rate
(which is currently at a 50 year high in the US). I should add that I
admit that I am not nearly as well traveled as many of my friends
(including you) - so my perspective is significantly more limited and
is very centered within the U.S.

Literacy itself is a wonderful thing (my mom was a librarian and I
distinctly remember that day when I turned five and was finally able
to get my own library card - it was a bigger deal than my driver’s
license when I turned 16). For me personally literacy has had numerous
positive impacts. It has provided me with a means of self-expression,
a valuable and virtually limitless learning tool, and a much needed
escape from reality. But most importantly, it has provided me with a
means of caring for myself and those who are important to me. Basic
literacy is still generally necessary for economic mobility, but I
don’t think it is sufficient. So I’m not sure that tracking increases
in literacy rates necessarily demonstrates a decrease in human
suffering.

Disregarding participation in humanity as a whole, and looking at
things on a more personal level, as an average US citizen I most
likely produce 4.6 lbs of solid waste a day (not to mention gas,
electricity, air-conditioning, sewage, etc). The clothes I’m wearing
and the laptop I’m typing on impact an entire supply chain of people
many of whom are not earning a living wage. I take four daily
prescriptions supporting companies that use animal testing. The local,
organic food I eat is probably from Monsanto seeds. There are a lot of
really nasty impacts that I on the world both actively and passively.

As an average citizen I also contribute a significant amount to taxes
which are used for great things like literacy (but possibly cancelled
out by not great things which are also paid for by taxes). A vast
array of people are receiving free appropriate mental health care for
sexual trauma in part because of my professional activities. I’m on
the board of directors for a cat rescue where I manage a network of 30
foster homes. I personally have fostered over 20 cats in the past
year. I’m on the leadership committee and serve as volunteer
coordinator for a second rescue organization. Does this cancel out the
negative? Maybe. Probably not. It is hard to quantify suffering.

On a broader note, you mentioned that humanity overall was positive
and I think that is very subjective. You suggested that the positive
trend towards less suffering supported this. I don’t necessarily
agree. If there are 50% content people and 50% suffering people
(obviously a simplification) and through literacy and health care and
other things we move to 60% content people and 40% suffering people
that certainly means that there is a net positive change. But does
that mean humanity as a whole has a net positive impact on the world?
It may mean that the particular generation of humans has a net
positive impact on humanity. But if there were no people there would
be no suffering people. If there were no people, there might be fewer
suffering animals. How many content people do we need before it is OK
for those people to “create” one suffering person/animal? Is moving in
the right direction good enough? How fast do we have to be moving?
Does the fact that most people don’t seem to care (or even consider)
their impact say something? I currently feel like humanity isn’t “net
positive.”

I don’t see any evidence that society is becoming more
compassionate. And that makes me sad.
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-06-28T01:49:47</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/198">
    <title>FeML: a skeleton of a femto-ML with nothing but polymorphicvariants and functions</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/198</link>
    <description>&lt;pre&gt;A year or two ago, Darius Bacon and I were talking about making a sort
of variant of Scheme with ML-like pattern-matching as the basic
primitive.  You have a slightly more complicated lambda, but you don’t
need cond, car, cdr, eq, or null? in the base language, just cons or
quasiquote, plus the pattern-matching lambda and either labels or
define.

For example:

    (define assoc 
      (lambda (key ((key . value) . more)) (cons key value)
              (key (x . more))             (assoc key more)
              (key ())                     '()))

Here `lambda` takes an arbitrary set of pattern-action pairs and
matches the patterns in order against the argument list, evaluating
and returning the action expression for the pattern that matches.  In
this case, first we check to see if the key is the caar of the list of
key-value pairs, and if so, we return its (reconstructed) car;
otherwise, if there are more items in the list, we recurse down the
list; otherwise, if the list is the empty list, we return the empty
list.

(It’s not clear what happens if you say `(assoc 3 4)`.  Should that be
an error, or should it have unspecified behavior to permit compiler
optimization?)

This is awfully close to Prolog, although you wouldn’t do it this way
in Prolog:

    assoc(Key, [[Key | Value] | _], [Key | Value]).
    assoc(Key, [_ | More], Y) :- assoc(Key, More, Y).
    assoc(_, [], []).

However, you don’t actually need general backtracking search or
Prolog’s unification of inputs and outputs to make it Turing-complete.
The result is an arguably simpler notation for general recursive
functions than McCarthy’s ur-Lisp, and certainly one that’s easier to
use for writing compiler-like programs, things that operate on
recursively-structured data.

(This is probably in Essentials of Programming Languages somewhere.
And, if not, it’s very similar indeed to Aardappel and, for that
matter, a simple ML.)

So in April, I was thinking about this again, and it occurred to me
that you might be able to implement this without runtime type
checking.

OCaml polymorphic variants
--------------------------

In particular, it turns out Objective Label, which is integrated into
the versions of OCaml over the last several years, does decidable type
inference on a language including higher-order functions, mutability,
and what it calls “polymorphic variants”, which are basically Prolog
terms or Aardappel trees.  The consequence is that it needs very
little runtime type checking, and is immune to runtime type errors,
catching at compile-time the `(assoc 3 4)` example above.

Here's some Objective Caml code that type-checks, compiles, and works,
without any explicit type annotations:

    let rec length = function | `Nil -&amp;gt; 0 | `Cons(_, x) -&amp;gt; 1 + length x;;

For this, OCaml infers the type ``([&amp;lt; `Cons of 'b * 'a | `Nil ] as 'a)
-&amp;gt; int``, which is to say it’s a function returning int (since that’s
the return type of `+`) and taking as its argument *at most* 
`` `Nil `` with no arguments or `` `Cons `` with two arguments, whose
first argument can be of any type but whose second argument must be of
the same type as the function’s argument.  The “at most” means it’s
okay to call the function in a context where it is guaranteed to be
passed *only* `` `Nil ``, for example, but it is not okay to invoke it
where it might be passed things with some other kind of value.

Here's a higher-order function which invokes a given function with
`` `Nil ``:

    let callnil x = x `Nil ;;

OCaml infers the type ``([&amp;gt; `Nil ] -&amp;gt; 'a) -&amp;gt; 'a`` for this function;
that is, it’s a function that takes a function argument that takes,
*at least*, `` `Nil ``, and returns whatever that function does.  That
means that it’s okay for the function to be able to handle other kinds
of arguments as well.  So we can apply this `callnil` function to
`length` above:

    callnil length ;;

which returns 0.

This makes type inference substantially more complicated than in ML,
because you aren’t just unifying types; there’s a subtyping relation,
or I think actually two of them, and you sometimes take the meet or
join of types.

One drawback of polymorphic variants is that every occurrence of the
variant tag could be associated with a different type, which
apparently produces some slight efficiency reductions in OCaml.

Here’s a list made out of polymorphic variants in OCaml:

    `Cons(1, `Cons(2, `Cons(3, `Nil)));;

OCaml infers the type `` [&amp;gt; `Cons of int * [&amp;gt; `Cons of int * [&amp;gt; `Cons
of int * [&amp;gt; `Nil ] ] ] ] `` for this list, which is to say that each
of the three Conses that make it up is of a different type; one is
``[&amp;gt; `Cons of int * [&amp;gt; `Nil ] ]``, another is ``[&amp;gt; `Cons of int * [&amp;gt;
`Cons of int * [&amp;gt; `Nil ] ] ]``, and the third is the entire list.

However, the above `length` function can successfully be applied to
it, so OCaml is able to unify all three of those types (plus the type
of the Nil node) with the argument type of the function.

FeML
----

So I was thinking that an eager language with:

* polymorphic variants as its only data structure;
* pattern-matching as its only conditional;
* (higher-order) functions as its only means of abstraction;
* tail-calls as its only means of iteration;
* and complete run-time type erasure

Ought to be:

* feasible to implement;
* as safe as ML;
* as flexible as Smalltalk (or more so, since the “messages” can
  contain other messages you pattern-match on);
* easy to optimize to the neighborhood of C performance;
* relatively terse;
* and a language in the simplicity neighborhood of C, Scheme, and Lua.

Let’s call it “FeML”, for “Femto-ML”.

What’s the equivalent of classes?  You need *either* λ-calculus-style
automatic currying *or* closures, but you don’t need both.  (That is,
you could make it work like C, without nested functions; you could
require people to write their code directly in supercombinators.)

Interestingly, you can take advantage of polymorphic-inline-cache
optimizations from Self for your pattern-matching; at call-sites to
functions that pattern-match their arguments, you can test the common
cases for that callsite and jump straight to the clause that is
usually used.

It would be pretty awesome to be able to write programs in
happy-go-lucky Python style, but with full pattern-matching and C-like
efficiency.  *Predictable* C-like efficiency.  And parametric
polymorphism (aka templates) without the hassle that accompanies it in
C++ or Java.  And it seems like FeML might be able to deliver that.

How does this map to object-oriented programming?  I’m not 100% sure
that I haven’t screwed something up with inheritance, but I don’t
think so.  It requires a sort of mental inversion:

* Functions correspond to classes.
* Variables captured from outer scopes, or early arguments to curried
  functions, correspond to instance variables, as is common practice
  in JavaScript.
* The last argument to the function corresponds to a method name and
  arguments.
* Pattern-matching on that argument corresponds to method dispatch.
* The variant tag of that argument corresponds to a method name.
* The pattern-match clauses correspond to method definitions.
* And a catch-all clause in that pattern-matching that tail-calls
  another function with the same argument corresponds to inheritance
  from a single superclass.

You can do all of that in ordinary ML, but then the “subclasses” can’t
add new “methods”.  Polymorphic variants give you that ability.

There are a couple of flies in the ointment, though.

First, OCaml’s type inference algorithm for polymorphic variants
doesn’t actually handle that implementation of inheritance:

    let a = function `X -&amp;gt; 1 | `Y -&amp;gt; 2 and b x = match x with `Z -&amp;gt; 3 | _ -&amp;gt; a x ;;
                                                                               ^
    Error: This expression has type [&amp;gt; `Z ]                
           but an expression was expected of type [&amp;lt; `X | `Y ]
           The second variant type does not allow tag(s) `Z

That doesn’t, in itself, mean that this approach is infeasible.  It
might be trivial to extend the type-inference algorithm to handle that
case — it’s clearly true that in the second branch of the conditional,
`x` cannot be `` `Z `` — but I don’t know enough about type inference
to be confident that that won’t, for example, make the type-inference
problem for the new language undecidable.

Second, in the above case, the “methods” of the “class” `a` are pure
functions.  But what if they actually needed something out of the
object?  Say, they need to call some other method on “self”?  How do
you write the code so that they get a reference to “self”?

Because the language is so simple, you can use a very minimal syntax for
it.  For example:

    assoc key (Cons (Cons key value) _): Cons key value
    assoc key (Cons _ more): assoc key more
    assoc _ Nil: Nil

    a X: 1
    a Y: 2
    b Z: 3
    b x: a x

    length Nil: 0
    length (Cons _ x): + 1 (length x)

Or:

    assoc key =
        Cons (Cons key value) _: Cons key value
        Cons _ more: assoc key more
        Nil: Nil

    a =
        X: 1
        Y: 2
    b =
        Z: 3
        x: a x

    length =
        Nil: 0
        Cons _ x: + 1 (length x)

(The above makes it seem like some infix operators would be a really
good idea, for lists if nothing else.)

Aardappel has the interesting idea of not distinguishing syntactically
between functions (node heads with rewrite rules existing) and data
structures (polymorphic variants, i.e. heads without any rewrite rules
defined).  Some tree-rewriting languages allow you to define rewrite
rules that rewrite a tree partway, so the same node head can serve as
either one.  Intuitively I think that these would make both the
type-checking and the syntax of FeML more complicated.
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-06-14T07:37:01</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/197">
    <title>short-term predictions: 2014, 2017, 2022</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/197</link>
    <description>&lt;pre&gt;The release of Meteor in April reminded me about an abandoned project of my own
called Kogluktualuk, which was basically Meteor, but never implemented to any
significant extent. It was kind of obvious that something like Meteor needed to
exist. So I thought I would record some other predictions about things that
haven't happened yet, to see if my foresight is really as good as it seems in
hindsight. After all, it's easy to fool yourself into thinking about only your
correct predictions, forgetting the stupid ones. So this is, like debugging, a
sort of exercise in therapeutically feeling stupid. 

---

Two years from now, most iOS applications will be written in Cordova (formerly
PhoneGap) or a successor, rather than in ObjC. This is both because JS is a much
more productive language than ObjC and in order to target Android as well. 

Within 5 years, P2P protocols will resurge in importance. This is despite the
massive move from desktop and laptop computers to handheld computers running
iOS and Android and using cellphone networks. The driver will be better
connectivity and crackdowns on user-generated content on centrally-operated
network services like YouTube and Megaupload. 

When automated fabrication—the scenario where you get your next bicycle by
downloading bicycle blueprints over the network and sending them to a machine
that then produces a bicycle for you without human intervention—happens, it
will not be by means of 3-D printers, which work by depositing layers of a
small number of materials. Instead, it will take the form of automated assembly
by robots of parts mostly made by other means, such as laser cutting, torch
cutting, CNC machining, and planar printing processes. 

More and more communication between people will be mediated by computers. 

Within a few years, most of our lives will be recorded and permanently archived
without our knowledge or consent. 

Android will continue to grow over the next three years at the expense of iOS.

Apple will release at most one groundbreaking new product (like the Apple ][,
Macintosh, Newton, HyperCard, NeXTStep, iMac, iPad, Lisa, iPhone, iTunes Music
Store, and Macbook Air, in the next ten years. 

Nokia will collapse. Their Windows phones will be a failure. 

Solar energy will provide nearly as much marketed energy as coal by 2022.

Oil production will not exceed its 2008 peak by more than 30% by 2022.
Traditional oil production has peaked. 

Oil prices will exceed their 2008 peak at least once by 2017.

US influence will wane; that is, it will be less significant in 2022 than
today. 

China's laissez-faire copyright enforcement will become more widespread by
2017, despite US protests. 

Argentina will have another financial crisis by 2017, with a collapse of the
peso, but not as severe as in 2001.

Photovoltaic, rather than solar thermal, will still be the major form of
marketed solar energy until 2022.

In 2022, the rather stupid opinion that Chinese manufacturers are mere copycats
will be much less widespread, held only by the occasional crank rather than, as
today, garden-variety ignorant people. 

By 2022, Brazil will be a bigger startup hub than New York, England, Russia,
Australia, or Japan, but not Silicon Valley or China. 

The murder rate in Argentina will be higher in 2013, 2014, and 2015 than in
2012. 

A generic GPU programming language will arise to replace CUDA and enable
competition with NVIDIA by 2020.

Photographic and audio recording evidence will be easy to fake so that human
eyes can't tell the difference by 2017. 

Quantum computers will turn out to work, but building ones big enough to
revolutionize anything will take longer than five more years. 

Most performance-critical code will scale up to at least 16 cores by 2017,
despite doing it with explicit concurrency, such as threads and locks or
message passing like Erlang and Golang, rather than implicit forms like
transactional memory or APL-like array operations.

Cash payment will still be common throughout the poor countries in 2022.

As computerized communication, planning, and manufacturing take over the
economy, companies will continue to shrink while depending on ever more custom
software. The consequence is that programming will partly displace management
as a core competency of running a business. 

China will largely shift to nuclear power generation for electricity by 2022.

By 2017, desktop computers will be something like CRTs a couple of years back:
used in special circumstances (gamers, say) and where money is tight, but not
many places.  Instead people will use laptops, phones, cloud applications,
microservers, and rackmount servers. As I explained in "people, places, things.
and ideas" in 1999, I see this development as profoundly prejudicial to
software freedom, but it is, if anything, accelerating. 

Computer security will keep getting worse at least until 2017, with
exponentially more software deployed on networks, almost all without the
requisite knowledge to secure or audit it. This will drive OS-level sandboxing
like PNaCl, but that will be only moderately effective, in large part because
free operating systems will not be used much (although Linux is) so users will
not be allowed to make their own computers secure, and absent that, software
vendors don't have the right incentives.

Self-driving cars will be a substantial minority of cars in rich countries by
2017. 

Assassinations will rise dramatically by 2017 as their cost falls dramatically,
due to the lack of computer security, to the pervasive gathering and
warehousing of previously private information, and to lower-cost killer robots.

Spam and viruses will remain major problems at least until 2017.

No language will replace C as the language of nearly all serious software by
2022. C lost that position around 1992, to a combination of C and C++, and
since then there's been a diversity of languages in use. That will remain true.
Not Scala, not JS, not Python or Ruby, will shut out the other languages like
CRTs did.

Mining mineral resources such as copper from existing landfills will employ
tens of thousands of people by 2022, mostly illegally and in very hazardous
conditions. 

Population growth will continue to slow until 2022. 

The US GDP in 2022 will not be more than 22% bigger than in 2011, measured in
oil, kilowatt hours, or wheat, or probably measured in gold, at their average
prices over the year.  US GDP in 2011 is estimated at US$15.1 billion.

Humans will not go beyond low Earth orbit, for example to geosynchronous orbit,
to the moon, or to other planets, by 2022. 
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-06-12T10:39:58</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/196">
    <title>math for 3-D printers without any linear actuators</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/196</link>
    <description>&lt;pre&gt;How do you control a polar 3-D printer?

Like, you have a turntable with your workpiece on it, and another
turntable that moves your tool, that can swing it over the center of
the first turntable.  And you can move your other turntable up and
down.  Now how do you convert x and y coordinates into angular
positions?

Perhaps a handy thing to know is that the polar equation r = sin θ
produces a circle running through the origin, retracing the same
circle when θ is in [0, π] and in [π, 2π], with its center at r = ½, θ
= ½π.  So, given the radius r at which a point is located on the
workpiece turntable, you can rotate the workpiece turntable to any θ
such that sin θ = r, such as sin⁻¹ θ, to give the tool turntable a
chance to hit the point.  Then it’s just a matter of finding the angle
for the tool turntable, a simple atan2.

In short, starting from x, y on the workpiece turntable, scaled so
that the distance between turntable centers is 1:

    θw = sin⁻¹ √(x² + y²)
    s = sin θw
    c = cos θw
    θt = atan2(1 - (c·x - s·y), s·x + c·y)

If you’re trying to design a toolpath, it may be worthwhile to take
advantage of the other alternatives available for θw.
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-04-26T07:37:01</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/195">
    <title>printing microfilm on ordinary laser printers on paper</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/195</link>
    <description>&lt;pre&gt;#!/usr/bin/python
# -*- coding: utf-8 -*-
"""Compute proportional-font print size of a text.

The laser printer at my new workplace is 600dpi in both directions.
It prints on A4 or similar paper: 216×279mm.

Simple multiplication yields a capacity of 33.6 megabits per page, or
about 4 megabytes.

At 600dpi, a 4×6 pixel character cell like the one I use in
&amp;lt;http://canonical.org/~kragen/sw/dofonts-1k.html&amp;gt; gives you an 80×66
page of 13.5 mm × 16.8 mm.  (Janne Kujala designed the font.) If you
can successfully control every pixel, the result should be clearly
readable with a magnifying glass.  (If we consider 5-point text as the
lower limit of comfortable readability, and these 6-pixel-tall
characters are 1/100 inch, you need 7× magnification to make the text
comfortably readable.)

Further calculation suggests that the A4 page will contain an array
fitting 16 such reduced pages horizontally and 16.6 of them
vertically; practically this is probably 15 horizontally and 16
vertically, or 240 pages, 480 pages on the two sides of the paper, or
31680 80-column lines.  This is on the order of a megabyte and a half
of text, assuming an average of about 50 bytes per line.  You could
print the King James Bible on four sheets of paper.

In this form, the Rosetta Disk's 13000-page archive would require some
260 sheets of paper, the size of an average hardcover book.  Their
metal disk is probably more durable than the paper, but the paper
version can be printed for about US$20-100, rather than the several
thousand dollars for the disk.

But perhaps we can do better!  That's only about a third of the total
data capacity of the page.  Can a proportional font let us use less
than 4 pixels horizontally, on average?

After inspection, I think that the 4×6 pixel font would need the
following widths for its 96 character glyphs, modified slightly, and
ensuring one pixel of space on the right of each one:
"""

proportions_1 = map(int, """
2 2 5 4 4 4 4 2 3 3 4 4 3 3 2 4
4 3 4 4 4 4 4 4 4 4 2 3 4 3 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 3 4 3 4 3
3 4 4 4 4 4 4 4 4 2 3 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4 2 4 5 0
""".split())

"""
So we can definitely use less than 4 pixels on average.  We could
probably even give an extra pixel or two of width to M, N, W, m, w,
and maybe #, Z, and z, and get an improvement in readability:
"""

proportions_2 = map(int, """ 
2 2 6 4 4 4 4 2 3 3 4 4 3 3 2 4
4 3 4 4 4 4 4 4 4 4 2 3 4 3 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 6 5 4
4 4 4 4 4 4 4 6 4 4 6 3 4 3 4 3
3 4 4 4 4 4 4 4 4 2 3 4 4 6 4 4
4 4 4 4 4 4 4 6 4 4 5 4 2 4 5 0
""".split())

"""
If you're really shooting for density instead of readability, you
could make up some other glyph to represent parapraph breaks or line
breaks or whatever, probably only 3 pixels wide.
"""

def total_pixel_width(inputfile, proportions, newline_width):
    assert len(proportions) == 96
    widths = [0] * 32 + proportions + [0] * 128
    widths[ord('\n')] = newline_width
    
    total_width = 0
    for line in inputfile:
        for char in line:
            total_width += widths[ord(char)]
    return total_width

def main():
    import sys, codecs
    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
    kjv = lambda: open('bible-pg10.txt')
    print u"Fixed-width 4×6 with a 4-pixel newline:"
    print total_pixel_width(kjv(), [4] * 96, 4)
    print "With some characters narrower:"
    print total_pixel_width(kjv(), proportions_1, 3)
    print "With some narrower and some wider:"
    print total_pixel_width(kjv(), proportions_2, 3)

if __name__ == '__main__':
    main()

"""
Output:
Fixed-width 4×6 with a 4-pixel newline:
17407376
With some characters narrower:
15187418
With some narrower and some wider:
15486129

So yes, the proportional font saves 11% of the space, or 13% in the
version where none of the glyphs are wider.  You’d need 15.5 million
pixels horizontally to represent the PG KJV this way with the wider
variant, or 92 916 774 total pixels, 92.9 megabits or 11 614 597
bytes’ worth.  So you could print the entire KJV on three sheets of
paper, using a regular laser printer. And with some human effort you
can do substantially better; about 0.3 million of those 15.5 million
are used just in representing newlines, and most of those are
linebreaks to keep the lines from getting too wide, not for separating
paragraphs.

"""
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-04-23T20:42:30</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/192">
    <title>backtracking HTML templating</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/192</link>
    <description>&lt;pre&gt;This is an idea originally from 2010-04-08 that I just never got
around to publishing until now.

Now that I’ve written the below, I’m tempted to try to hack this
together tonight, but I think I should probably sleep instead, and
maybe do this on the weekend.

Motivation
----------

This week I’ve been working on a bibliographic website, which, among
other things, renders citations into HTML.  And this is resulting in
me writing a lot of HAML templates that say things like:

    - if &amp;lt; at &amp;gt;publication.booktitle?
      In #{publication.booktitle}.
    - if &amp;lt; at &amp;gt;publication.month?
      #{&amp;lt; at &amp;gt;publication.month} #{&amp;lt; at &amp;gt;publication.year}.

The general pattern here is that there are one or more properties that
need to be present, and if they’re present, we format them together,
along with some other window dressing intended to format them
properly.  But this involves a lot of very local duplication in the
code, and even though that duplication is local, it is still
error-prone; consider what happens in the above case if the year is
missing.

Now, aside from the question of whether there are already existing
BibTeX HTML formatters for Rails, this is an interesting kind of
problem to solve.

Introducing the solution: an example
------------------------------------

So I remembered this thing I’d written up in a notebook a couple of
years ago, under the name "Backtracking HTML Templates", which makes
that kind of thing very simple, and eliminates the duplication.
Here’s a presentation by example:

    In $booktitle.&amp;lt;;&amp;gt; $month $year.&amp;lt;;&amp;gt;

This template consists of three "transactions", separated by `&amp;lt;;&amp;gt;`.
The first one concatenates "In ", the value of the variable
`booktitle`, and ".".  If it succeeds, that concatenation will be
emitted.  If it fails, nothing is emitted for that transaction.  It
will fail if any of the three things it’s concatenating fail.  Two of
them are literal strings, which always succeed, but the middle one is
a variable reference, which will only succeed if the variable is set.
The second transaction is similar, but it interpolates two variables
instead of one, so it fails unless *both* variables are set.  The
third transaction is the empty string after the second `&amp;lt;;&amp;gt;`, which is
boring but worth mentioning.

A longer example
----------------

This shows off the full feature set of this templating language,
albeit in a fairly artificial setting:

    &amp;lt;!DOCTYPE html&amp;gt;
    &amp;lt;html&amp;gt;
      &amp;lt;head&amp;gt;&amp;lt;title&amp;gt;&amp;lt;{&amp;gt;$title - rabujogopo.com&amp;lt;|&amp;gt;RABUJOGOPO&amp;lt;}&amp;gt;&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;
    &amp;lt;body&amp;gt;&amp;lt;;&amp;gt;
      &amp;lt;h1&amp;gt;$title&amp;lt;/h1&amp;gt;&amp;lt;;&amp;gt;

      &amp;lt;p&amp;gt; Hello&amp;lt;{&amp;gt;, $fullname&amp;lt;|&amp;gt;, $firstname $lastname&amp;lt;;&amp;gt;&amp;lt;}&amp;gt;. &amp;lt;;&amp;gt; You
          have arrived from a search engine seeking $searchterms. &amp;lt;;&amp;gt;
          This article is tagged as 
          &amp;lt;{&amp;gt;&amp;lt;&amp;lt; at &amp;gt;tags&amp;gt;&amp;lt;a href="/tag/$tagname"&amp;gt;$tagname&amp;lt;/a&amp;gt;&amp;lt;,&amp;gt; &amp;lt;}&amp;gt;.
          &amp;lt;|&amp;gt; This article is not tagged.&amp;lt;;&amp;gt; 
          This article is important. &amp;lt;if $importance == high&amp;gt; &amp;lt;;&amp;gt;
      &amp;lt;/p&amp;gt;
      &amp;lt;{&amp;gt;
        &amp;lt;&amp;lt; at &amp;gt;sections&amp;gt;
        &amp;lt;h2&amp;gt;$title&amp;lt;/h2&amp;gt; &amp;lt;!-- the section title --&amp;gt;
        &amp;lt;&amp;lt; at &amp;gt;contents&amp;gt;
        &amp;lt;if $type == paragraph&amp;gt;
          &amp;lt;p&amp;gt;$text&amp;lt;/p&amp;gt;
        &amp;lt;|&amp;gt;
          &amp;lt;&amp;lt; at &amp;gt;subsections&amp;gt;
          &amp;lt;h3&amp;gt;$title&amp;lt;/h3&amp;gt;
          &amp;lt;&amp;lt; at &amp;gt;contents&amp;gt;
          &amp;lt;p&amp;gt;$p&amp;lt;/p&amp;gt;
      &amp;lt;}&amp;gt;
    &amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;

### Informal explanation ###

This contains literal text, `$var`, `&amp;lt;;&amp;gt;`, `&amp;lt;{&amp;gt;&amp;lt;}&amp;gt;`, `&amp;lt;|&amp;gt;`, `&amp;lt;if
...&amp;gt;`, `&amp;lt;&amp;lt; at &amp;gt;var&amp;gt;`, and `&amp;lt;,&amp;gt;`, which are the entire feature set of the
language.  I’ve already explained literal text, `$var`, and `&amp;lt;;&amp;gt;`, and
concatenation.

#### Alternation: `&amp;lt;|&amp;gt;` and `&amp;lt;{&amp;gt;&amp;lt;}&amp;gt;` ####

The `&amp;lt;;&amp;gt;` is used to isolate a transaction, so that if it fails, you
just get an empty string instead of, say, an error-message page.  It’s
just syntactic sugar for the more general alternation construct
written with `&amp;lt;|&amp;gt;`, though.  This template:

    $foo bar&amp;lt;;&amp;gt;

could be written like this, and mean exactly the same thing:

    &amp;lt;{&amp;gt;$foo bar&amp;lt;|&amp;gt;&amp;lt;}&amp;gt;

The `&amp;lt;|&amp;gt;` makes an *alternation* of two transactions.  If the first
transaction succeeds, the alternation yields the result of the first
transaction; if it fails, the alternation runs the other transaction
and yields its result.  In this case, the second transaction is the
empty string, which will always succeed.  The `&amp;lt;{&amp;gt;&amp;lt;}&amp;gt;` syntactically
delimit the reach of the `&amp;lt;|&amp;gt;` operator, so that it doesn’t suck up
your entire document as its operands.

But your second transaction could be something more interesting than
an empty string.  For example, it could explain that some piece of
data was missing.

    &amp;lt;{&amp;gt;By $author.&amp;lt;|&amp;gt;Author unknown.&amp;lt;}&amp;gt;

So, you see, in the longer example template above, the page title has
a fallback of “RABUJOGOPO”.  And we greet the user by the variable
`fullname` if we have it, otherwise `firstname lastname`, or otherwise
just with “Hello.”

#### `&amp;lt;if condition&amp;gt;` ####

`&amp;lt;if condition&amp;gt;` succeeds and produces the empty string if `condition`
is true, and fails otherwise.  Because of how failure works, you can
put it before, after, or in the middle of the text you want it to
control.  So far I’ve only thought about string-equality conditions.

So, in the example above, the sentence “This article is important.” is
only emitted if the variable `importance` has the value `high`,
because otherwise the `&amp;lt;if&amp;gt;` fails, failing the whole transaction
containing that sentence.

#### Iteration: `&amp;lt;&amp;lt; at &amp;gt;var&amp;gt;` and `&amp;lt;,&amp;gt;` ####

`&amp;lt;&amp;lt; at &amp;gt;var&amp;gt;` begins a loop.  The variable named needs to name a list of
dicts.  If the variable doesn’t exist or is an empty list, the
transaction fails.  Otherwise, the text that follows `&amp;lt;&amp;lt; at &amp;gt;var&amp;gt;` is
evaluated once for each of the dict in the list — with the names from
the dict added to the local namespace — and the resulting values are
concatenated.  If any of the iterations of the loop fails, the whole
loop fails.

(This namespace thing is the one thing I’m not sure about here;
merging the global namespace with the per-iteration namespace means
that both human readers and the compiler have to guess which scope a
given variable refers to.)

Because the loop fails if it executes zero times, you can use `&amp;lt;|&amp;gt;` to
provide an alternative to display in the case where a collection came
up empty, as in the “This article is not tagged.” example above.

`&amp;lt;,&amp;gt;` is an optional part of the loop construct.  `&amp;lt;&amp;lt; at &amp;gt;var&amp;gt;a&amp;lt;,&amp;gt;b`
evaluates `ab` once for each item in the list *except the last*, and
for the last item, evaluates only `a`.

So, in the above, we actually have loops nested four deep, with
sections containing contents, which may contain either text or
subsections, which contain further contents.  And in the loop over
`&amp;lt;&amp;lt; at &amp;gt;tags&amp;gt;`, the links to the individual tags are separated by spaces,
but the last tag is not followed by a space, because the space is
between the `&amp;lt;,&amp;gt;` and the outside of the `&amp;lt;{&amp;gt;&amp;lt;}&amp;gt;`.

Some notes on grammar
---------------------

For the most part, this is a relatively traditional infix grammar,
masquerading as a markup language.  `&amp;lt;{&amp;gt;&amp;lt;}&amp;gt;` provide nothing more than
syntactic grouping; `&amp;lt;|&amp;gt;`, `&amp;lt;;&amp;gt;`, bare juxtaposition, and `&amp;lt;&amp;lt; at &amp;gt;var&amp;gt;` are
infix operators; and `&amp;lt;,&amp;gt;` can be thought of as an infix operator that
only makes sense within the right operand of `&amp;lt;&amp;lt; at &amp;gt;var&amp;gt;`.

The precedence order I’ve implicitly used above, from tightest binding
to loosest:

* concatenation or juxtaposition;
* `&amp;lt;|&amp;gt;`
* `&amp;lt;;&amp;gt;`
* `&amp;lt;&amp;lt; at &amp;gt;var&amp;gt;` and `&amp;lt;,&amp;gt;`

I don’t know if this is the best order, or even a sane one.

As it happens, all of juxtaposition, `&amp;lt;|&amp;gt;`, and `&amp;lt;;&amp;gt;` are associative,
which provides some liberty; but the looping construct is
not. `&amp;lt;{&amp;gt; a &amp;lt;&amp;lt; at &amp;gt;b&amp;gt; c &amp;lt;}&amp;gt; &amp;lt;&amp;lt; at &amp;gt;d&amp;gt; e` is not the same as `a &amp;lt;&amp;lt; at &amp;gt;b&amp;gt; &amp;lt;{&amp;gt; c &amp;lt;&amp;lt; at &amp;gt;d&amp;gt; e &amp;lt;}&amp;gt;`, 
because in the first case, `&amp;lt;&amp;lt; at &amp;gt;b&amp;gt;` and `&amp;lt;&amp;lt; at &amp;gt;d&amp;gt;` are separate iterations
that get concatenated, and in the second case, `&amp;lt;&amp;lt; at &amp;gt;d&amp;gt;` is nested inside
`&amp;lt;&amp;lt; at &amp;gt;b&amp;gt;`.

(It’s kind of nice to have a syntactically flat structure that’s
nevertheless capable of expressing complex things.)

In the above, I treated `&amp;lt;&amp;lt; at &amp;gt;var&amp;gt;` as right-associative: `&amp;lt;&amp;lt; at &amp;gt;a&amp;gt;&amp;lt;&amp;lt; at &amp;gt;b&amp;gt;x` is
`&amp;lt;&amp;lt; at &amp;gt;a&amp;gt;&amp;lt;{&amp;gt;&amp;lt;&amp;lt; at &amp;gt;b&amp;gt;x&amp;lt;}&amp;gt;`, not `&amp;lt;{&amp;gt;&amp;lt;&amp;lt; at &amp;gt;a&amp;gt;&amp;lt;}&amp;gt;&amp;lt;&amp;lt; at &amp;gt;b&amp;gt;x`.  That was just because it
made that particular example come out syntactically simpler, not
because of any deep analysis.

Variations
----------

The current syntax completely fails to take advantage of the most
desirable ASCII punctuation characters, which are “.”, “:”, “'”, and
“"”.  You could write `&amp;lt;.sections&amp;gt;` instead of `&amp;lt;&amp;lt; at &amp;gt;sections&amp;gt;`, and `&amp;lt;:
$importance == high &amp;gt;` instead of `&amp;lt;if $importance == high&amp;gt;`, thus
reducing the visual noise a bit.  However, `&amp;lt;if ...&amp;gt;` is more
understandable.  Maybe `&amp;lt;:if ...&amp;gt;` to avoid clashes with SGML.

Similarly, `&amp;lt;&amp;lt;` and `&amp;gt;&amp;gt;` might be better than `&amp;lt;{&amp;gt;` and `&amp;lt;}&amp;gt;`, which
look a bit unbalanced.  `&amp;gt;&amp;gt;` has the disadvantage that it could
legitimately occur in normal HTML, and both could occur in normal JS
or similar languages.  Also, they’re ambiguous when they’re used with
HTML tags adjacent to them.

With those two variations we would get:

    &amp;lt;!DOCTYPE html&amp;gt;
    &amp;lt;html&amp;gt;
      &amp;lt;head&amp;gt;&amp;lt;title&amp;gt;&amp;lt;&amp;lt;$title - rabujogopo.com&amp;lt;|&amp;gt;RABUJOGOPO&amp;gt;&amp;gt;&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;
    &amp;lt;body&amp;gt;&amp;lt;;&amp;gt;
      &amp;lt;h1&amp;gt;$title&amp;lt;/h1&amp;gt;&amp;lt;;&amp;gt;

      &amp;lt;p&amp;gt; Hello&amp;lt;&amp;lt;, $fullname&amp;lt;|&amp;gt;, $firstname $lastname&amp;lt;;&amp;gt;&amp;gt;&amp;gt;. &amp;lt;;&amp;gt; You
          have arrived from a search engine seeking $searchterms. &amp;lt;;&amp;gt;
          This article is tagged as 
          &amp;lt;&amp;lt;&amp;lt;.tags&amp;gt;&amp;lt;a href="/tag/$tagname"&amp;gt;$tagname&amp;lt;/a&amp;gt;&amp;lt;,&amp;gt; &amp;gt;&amp;gt;.
          &amp;lt;|&amp;gt; This article is not tagged.&amp;lt;;&amp;gt; 
          This article is important. &amp;lt;:$importance == high&amp;gt; &amp;lt;;&amp;gt;
      &amp;lt;/p&amp;gt;
      &amp;lt;&amp;lt;
        &amp;lt;.sections&amp;gt;
        &amp;lt;h2&amp;gt;$title&amp;lt;/h2&amp;gt; &amp;lt;!-- the section title --&amp;gt;
        &amp;lt;.contents&amp;gt;
        &amp;lt;:$type == paragraph&amp;gt;
          &amp;lt;p&amp;gt;$text&amp;lt;/p&amp;gt;
        &amp;lt;|&amp;gt;
          &amp;lt;.subsections&amp;gt;
          &amp;lt;h3&amp;gt;$title&amp;lt;/h3&amp;gt;
          &amp;lt;.contents&amp;gt;
          &amp;lt;p&amp;gt;$p&amp;lt;/p&amp;gt;
      &amp;gt;&amp;gt;
    &amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;

You need some kind of escaping mechanism in any case.

The loop-namespace problem bothers me.  One approach is to name the
loop variable, and extend `$var` to `$var.prop`:

        &amp;lt;&amp;lt; at &amp;gt;sec in $sections&amp;gt;
        &amp;lt;h2&amp;gt;$sec.title&amp;lt;/h2&amp;gt;
        &amp;lt;&amp;lt; at &amp;gt;c in $contents&amp;gt;
        &amp;lt;if $c.type == paragraph&amp;gt;
          &amp;lt;p&amp;gt;$c.text&amp;lt;/p&amp;gt;
        &amp;lt;|&amp;gt;
          &amp;lt;&amp;lt; at &amp;gt;s in $c.subsections&amp;gt;
          &amp;lt;h3&amp;gt;$s.title&amp;lt;/h3&amp;gt;
          &amp;lt;&amp;lt; at &amp;gt;p in $s.contents&amp;gt;
          &amp;lt;p&amp;gt;$p.p&amp;lt;/p&amp;gt;

Another approach might be to use Perl/BASIC/Ruby/CoffeeScript sigils
to distinguish variables from different scopes.  Ruby has
`&amp;lt; at &amp;gt;instance_variables` and `&amp;lt; at &amp;gt;&amp;lt; at &amp;gt;class_variables`.  If we only supported
two levels of scope — globals, and variables inside the innermost loop
— we could do the same thing, using separate sigils, like `$.` and
`&amp;lt;&amp;lt; at &amp;gt;. &amp;gt;`, for loop-locals:

        &amp;lt;&amp;lt; at &amp;gt;sections&amp;gt;
        &amp;lt;h2&amp;gt;$.title&amp;lt;/h2&amp;gt; &amp;lt;!-- the section title --&amp;gt;
        &amp;lt;&amp;lt; at &amp;gt;.contents&amp;gt;
        &amp;lt;if $.type == paragraph&amp;gt;
          &amp;lt;p&amp;gt;$.text&amp;lt;/p&amp;gt;
        &amp;lt;|&amp;gt;
          &amp;lt;&amp;lt; at &amp;gt;.subsections&amp;gt;
          &amp;lt;h3&amp;gt;$.title&amp;lt;/h3&amp;gt;
          &amp;lt;&amp;lt; at &amp;gt;.contents&amp;gt;
          &amp;lt;p&amp;gt;$.p&amp;lt;/p&amp;gt;

I think that if you want to have more than two active scopes, you
can’t rely on sigils; you end up having to count nested scopes, and
basically writing de Bruijn indices by hand, which is not an
acceptable user interface.  You’d need to use the `$c.text` approach I
suggested earlier.

I had at one point thought about making the data model simpler:
instead of a namespace being a mapping from names to either strings or
lists of namespaces, a namespace would be a mapping from names to
values, where a value was either a string or a list of values, like in
Scheme’s macro system.  So you’d iterate down lists in parallel,
hoping they were the same length and depth:

      &amp;lt;dl&amp;gt;
        &amp;lt;{&amp;gt; &amp;lt;&amp;lt; at &amp;gt; $n $v&amp;gt;
          &amp;lt;dt&amp;gt;$n&amp;lt;/dt&amp;gt;
          &amp;lt;dd&amp;gt;$v&amp;lt;/dd&amp;gt;
        &amp;lt;}&amp;gt;
      &amp;lt;/dl&amp;gt;
    &amp;lt;|&amp;gt;
      Empty.

Or:

    &amp;lt;p&amp;gt; &amp;lt;&amp;lt; at &amp;gt; $word $synonym&amp;gt;
    &amp;lt;b&amp;gt;$word&amp;lt;/b&amp;gt;: &amp;lt;{&amp;gt;&amp;lt;&amp;lt; at &amp;gt; $synonym&amp;gt;$synonym&amp;lt;,&amp;gt;, &amp;lt;}&amp;gt;
    &amp;lt;/p&amp;gt;

In Scheme, you only have to mention the variable names once, instead
of twice; you stick a `...` after the structure containing them to
indicate at which syntactic level the repetition is supposed to occur.
If you did the same thing here, using `&amp;lt;[&amp;gt;&amp;lt;]&amp;gt;` to indicate repetition,
you’d get this:

    &amp;lt;[&amp;gt;
    &amp;lt;p&amp;gt;&amp;lt;b&amp;gt;$word&amp;lt;/b&amp;gt;: &amp;lt;[&amp;gt;$synonym&amp;lt;,&amp;gt;, &amp;lt;]&amp;gt;&amp;lt;/p&amp;gt;
    &amp;lt;]&amp;gt;

That gives the same kind of DRY implicit-DWIM feeling to iteration
that `&amp;lt;|&amp;gt;` gives to conditionals, and with the sigil approach, you
could even mix global variables with loop variables.  Still, I’m not
convinced that it’s better, particularly since it forces what I think
of as a fairly inflexible data model.
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-04-13T04:30:44</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/190">
    <title>dithering by optimizing perceptually-weighted error in the Gabortransform</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/190</link>
    <description>&lt;pre&gt;The spatial responses of neurons in the first stages of the human visual system
are well approximated by Gabor features (the wavelet that arises from the
multiplication of a spatial sinusoid, i.e.  sin(omega dotproduct(somedirection,
[x y]) with a Gaussian), and the visual system is more sensitive to some
frequencies of these Gabor features than others.  (Interestingly, amblyopic
vision is sensitive to a different set of spatial frequencies.) JPEG, JPEG2000,
and MPEG compression exploit this feature of visual perception by spending more
bits on spatial frequencies that are likely to be perceptually salient, given a
typical viewing distance and pixel density.

(I'm going to stop saying "spatial frequencies".  All the frequencies in this
post are spatial; I'm not going to talk about how many terahertz is red or how
frequently you microsaccade.)

Dithering or halftoning is another sort of lossy data compression problem.  In
a typical case, given a picture with, say, 9 bits per pixel of luminance
information, you want to produce another picture with only 1 bit per pixel, but
which nevertheless appears identical to the original, to a human viewer.  This
is similar to the problem JPEG solves, but with the additional constraint that
you don't have the freedom to specify the "decompression algorithm", because
biology has already determined it for you.

There is clearly a tradeoff here between different frequencies: the bits you
use to encode higher frequencies are not available to encode lower frequencies,
and vice versa.  This implies that there's an unavoidable tradeoff between
precisely locating edges and precisely rendering levels of brightness.  But
that doesn't mean that standard algorithms are Pareto-optimal in these
tradeoffs, and because of the human visual system's varying sensitivity to
errors at different Gabor frequencies, even Pareto-optimal algorithms could
possibly be improved.

A special case where this is a particularly easy tradeoff is when the pixels
are very, very small: say, 600 or 1200 dpi. At a typical viewing distance of 1
meter, a 1200dpi pixel subtends 9 seconds of arc.  A foveal cone subtends some
30 arcseconds, and ideal human foveal visual resolution is commonly cited as 20
arcseconds, although "vernier acuity", the ability to visually detect edge
misalignments which is used by vernier scales, goes down to 4-6 arcseconds.  So
modern inkjet printers have the advantage of being able to use spatial
frequencies to which we are not only neurally insensitive but actually not even
able to see at all except perhaps via aliasing and microsaccade-based subpixel
effects.  (Apparently this "visual hyperacuity" is still not understood, and is
listed on Wikipedia's "Perceptual paradox" page; my bet is on microsaccades and
antialiasing.)  So you can guiltlessly devote none of your bits to representing
the top octave or two of the frequency range, which gives you plenty of bits to
encode the lower frequencies that you can actually see.  That's a tradeoff you
don't even have to think about.  (You just have to be careful that your
halftoning algorithm isn't going to generate artifacts that alias down into the
visual range.)

However, we don't always enjoy the advantage of being able to use 10-arcsecond
pixels.  The pixels on the netbook screen I'm typing this on are about 0.2mm
across, or about 130dpi, which is considerably denser than is typical for a
computer screen.  If I look at the screen from 25cm, each pixel subtends about
2½ minutes of arc.  And I'm thinking about a laser-printed microfilm project
where you'll typically be magnifying the printed pixels, for reading, to about
the same size.  At these resolutions, bilevel images generated using the usual
algorithms (halftone screens, ordered dithering, Floyd-Steinberg error
diffusion; [Cris Luengo wrote an excellent overview][2]) are dramatically
inferior to grayscale images.

They're also dramatically inferior to the visual quality achieved by stippling,
like the work of Noli Novak and her colleagues in the Wall Street Journal, and
copperplate engraving, which also produce bilevel images with limited
resolution.

This, then, is the background against which I wrote my kragen-hacks post last
month, which crudely [makes images look like you ran them through a
photocopier][3].

This weekend, I was working on rendering text with a very small pixel
proportional pixel font (a followup to [my font rendering engine in DHTML][0],
with application to archival data recording as in [Carving stone tablets at
home in your spare time][1]) and it occurred to me that a really useful thing
to archive in a tiny font would be Wikipedia.  Maybe not all of Wikipedia, but
at least [the most important articles][4].  And it occurred to me that you'd
pretty much need to include at least some of the images; nearly all of the
articles are substantially improved by them (see
&amp;lt;http://en.wikipedia.org/wiki/Claude_Monet&amp;gt;,
&amp;lt;http://en.wikipedia.org/wiki/Richard_Wagner&amp;gt;, and
&amp;lt;http://en.wikipedia.org/wiki/Maize&amp;gt;) but articles about two- or
three-dimensional inventions, such as &amp;lt;http://en.wikipedia.org/wiki/Arch&amp;gt;;
visual arts, such as &amp;lt;http://en.wikipedia.org/wiki/Mona_Lisa&amp;gt;; or anatomy, such
as &amp;lt;http://en.wikipedia.org/wiki/Kidney&amp;gt;; would be totally vitiated by their loss.

With an eye toward finding a reasonable way to include images in what is
effectively microfilm made on a standard laser printer, I [tried out some
different dithering methods on a photograph of singer Gwen Stefani][5],
including my kragen-hacks post.  I was able to get results nearly as good as my
photocopier hack by the well-known technique of applying a "sharpen" filter to
the image before dithering it, thus amplifying the high frequencies.  But none
of the results even approached the quality of WSJ stipple portraits.

[0]: http://canonical.org/~kragen/dofonts.html
[1]: http://lists.canonical.org/pipermail/kragen-tol/2005-May/000777.html
[2]: http://www.cb.uu.se/~cris/blog/index.php/archives/355
[3]: http://canonical.org/~kragen/sw/byn/ (Contains bare breasts.)
[5]: http://canonical.org/~kragen/sw/stefani-images/ (Contains bare arms.)
[4]: http://en.wikipedia.org/wiki/Wikipedia:Vital_articles

Why was the quality so bad? I looked at the academic literature on dithering
and halftoning using computers, and found a lot of stuff about trying to
minimize the error between a smoothed version of the output image and the
original input image, but nothing about minimizing an approximation of
perceptual error using a Gabor-wavelet-based model of the human visual system.

The only thing I came up with where someone was using Gabor wavelets to measure
and optimize the performance of halftoning algorithms was an article called
"Joint Edge-Preserving and Moire-Suppressing Image Resampling Using the
Discrete Gabor Transform", which was specifically about preventing the creation
of moiré artifacts or "aliasing" in gravure printing --- that is, visible
difference frequencies created by heterodyning through nonlinear interaction
between a sampling grid and high-frequency components of the input image ---
without low-pass filtering the holy living fuck out of the input image with a
single-pole filter, thus blurring the edges of everything, and without just
brick-wall filtering its Fourier transform, which will create all kinds of
poorly localized ringing artifacts.  Instead they use its the Gabor transform
to figure out which parts of the image to low-pass filter the holy living fuck
out of.  At least that's what it sounds like from the abstract; I haven't
actually read the whole paper.  You'd think maybe they'd just low-pass filter
it in the Gabor domain instead of the Fourier frequency domain or the spatial
domain, but what do I know?

There's been some dramatic recent work by Chang, Alain, and Ostromoukhov called
"[Structure-Aware Error Diffusion][6]", building on a 2008 paper by Pang, Wong,
Cohen-Or, and Heng called "[Structure-aware Halftoning][7]", which has even
more dramatic results but runs orders of magnitude slower than other halftoning
algorithms.

[6]: http://liris.cnrs.fr/Documents/Liris-4442.pdf
[7]: http://www.cse.cuhk.edu.hk/~ttwong/papers/structurehalftone/paper/texhalftone.pdf

But none of this work is exploiting the bandpass nature of the contrast
sensitivity function (CSF) of the human visual system, a well-documented
phenomenon that has been known for decades among vision scientists.  The HVS is
most sensitive to frequencies between 2 and 5 cycles per degree; frequencies
*on both sides* are less visible.  (Those seem to be the half-power points,
i.e.  this is a band-pass filter of just over an octave of bandwidth, with a Q
factor of just below 1.) That means that when papers say things like, "It is
accepted that the ideal halftone image has a blue-noise spectrum," ([Pang et
al.  2008][7]) which implies that the lower-frequency the noise, the less
acceptable it is, they are setting themselves an arbitrary goal that cannot be
justified in terms of perceptual quality.

So here's what I was thinking: halftone by minimizing some vector norm in the
Gabor space between the input and output images, where the Gabor space
components are perceptually weighted.  That is, you do the Gabor transform of
both images, multiply the Gabor components pointwise by a CSF model of the HVS,
subtract corresponding components, and measure the size of the resulting
difference vector, say by the algebraic sum or Euclidean sum of its components.
That's what you want to minimize.  The great thing is that because the Gabor
transform is a linear operator, modifications to the spatial-domain image
affect the Gabor-domain image and difference vector linearly.

It would be nice if the Gabor transform were invertible, but the Gabor basis
functions are apparently not orthogonal, which I think means that most sets of
coefficients for the Gabor basis functions do not correspond to an image in the
spatial domain.  If they were, you could tell which pixels in your image most
need to be flipped in order to improve the image quality: they're the ones with
the highest peaks in the inverse Gabor transform of the motherfucking
difference vector.  Presumably you can make some kind of headway by throwing
out coefficients at random until a solution starts to exist.

There was actually a 1997 SPIE paper, [Image Quality Assessment with a Gabor
Pyramid Model of the Human Visual System][8], by Taylor, Pizlo, Allebach, and
Bouman, which proposed doing exactly this in order to measure the fidelity of
image approximation methods, including, among other things, halftoning.  This
hasn't been completely ignored (Google Scholar finds 34 citations) but you
still find even the best current halftoning work using totally boneheaded
blue-noise criteria instead.  Maybe it's because Taylor was a grad student at
Purdue, where he'd actually just done a bunch of work on halftoning and then
went on to do his Ph.D. thesis on this model, and then went off to teach at a
vocational school in Milwaukee and stopped doing research.

[8]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.2727&amp;amp;rep=rep1&amp;amp;type=pdf

Taylor et al.'s algorithm doesn't let you invert the Gabor transform, but it
does produce for each pixel a probability that a human will notice a difference
between the images at that pixel, which seems like it would be just as good for
the case of halftoning, where it tells you which pixels you most need to flip.
From there, it's gradient descent and boom, Bob's your uncle.

Or maybe I just had too much Diet Coke and it's 5 AM.  Your call.
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-04-09T08:03:52</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/189">
    <title>GitHub's "responsible disclosure" stance is irresponsible</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/189</link>
    <description>&lt;pre&gt;GitHub has generally been exemplary in their handling of crises, service
interrruptions, and security, but there has recently arisen a significant
exception, one where I would like to see GitHub correct their irresponsible
behavior.

Four paragraphs explaining the `attr_accessible` problem
--------------------------------------------------------

Ruby on Rails by default allowed web requests to update arbitrary database
fields unless those fields were declared `attr_protected`, or unless some other
fields in the same table ("model") were declared `attr_accessible`.  

This means that, unless you're extremely careful, your Rails application would
allow random users to change things in your database that they shouldn't be
able to change or even see, just by guessing the names you'd given them and
typing those names into their browser.  Egor Homakov reported this as a bug in
Rails.  The Rails people argued it wasn't a bug, because people would be
extremely careful, unless they were stupid.  

The Rails project keeps its source code on GitHub, which is written in Rails,
by some of the world's top Rails experts.  Homakov found a place where they had
forgotten to use `attr_accessible` and used it to add a modify the Rails
project adding a file explaining that even the best Rails developers made that
mistake.

The Rails project fixed the problem.

GitHub's irresponsible response
-------------------------------

In [GitHub's blog post][0] almost a month ago, mojombo wrote, in part:


As [I posted][1] at the time, there's a big problem with this blog post.   It's
not GitHub's place to set policy on what kind of disclosure is or isn't
"responsible". Egor Homakov's responsibility is not to GitHub; his
responsibility is to other users. His moral duty upon finding a security
vulnerability is to act in such a way that other users will be minimally hurt.
It appears that he has fulfilled that responsibility spectacularly in this
very unusual case, and he could not have done so by following GitHub's new
policy.

GitHub has no business demanding his, or your, agreement to a legal contract
that prohibits you from exercising your best judgment in such a case.

Furthermore, "responsible disclosure" is a propaganda euphemism for "allowing
irresponsible vendors to cover their asses, possibly at the expense of their
users". Terms like "responsible disclosure" have no place in a serious
discussion. Please see the [Responsible Disclosure blog post by the Google
security team][2] for further details.

What GitHub should do
---------------------

GitHub should change the *title* and *URL* of this section of their policy so
that it no longer accuses anyone who acts otherwise of acting irresponsibly,
retract the section of their blog post that accuses Homakov of acting of
irresponsibly, and publicly apologize for the accusation.  The actual *content*
of the policy is exemplary.

Because GitHub is generally such a responsible company, I have a lot of hope
that they will do this once it is pointed out to them that it is the
responsible thing to do.

[0]: https://github.com/blog/1069-responsible-disclosure-policy "Responsible Disclosure Policy, 2012-03-04"
[1]: http://news.ycombinator.com/item?id=3665427
[2]: http://googleonlinesecurity.blogspot.com/2010/07/rebooting-responsible-disclosure-focus.html "Rebooting Responsible Disclosure: a focus on protecting end users, by Chris Evans, Michal Zalewski, et al."
[3]: http://help.github.com/responsible-disclosure
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-04-02T16:11:11</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.culture.people.kragen.thinking/188">
    <title>higher-order programming and concurrency in low-level languages</title>
    <link>http://comments.gmane.org/gmane.culture.people.kragen.thinking/188</link>
    <description>&lt;pre&gt;(Warning: almost none of the code in this post is tested.)

In C, I miss the kind of higher-order programming I can do in
languages like Ruby and Python, but in its usual form, that kind of
programming seems to require garbage collection (for closures), and
that’s not really compatible with running on a microcontroller.  But
I've thought up some relatively simple enhancements that could
substantially increase the power of C-like languages for
microcontroller-like targets.

I was thinking about the problems of stack overflow that I was
[complaining about on the Arduino back in January][0], and it turns
out John Regehr and the TinyOS folks have done some [interesting work
on bounding stack sizes through static analysis][stacktool].  (Thanks
for telling me, Brandon!)  You feed an AVR executable to their
analyzer, and it tells you what its maximum stack depth is, or the
maximum stack depth of each thread.  This is **awesome**.

[0]: http://lists.canonical.org/pipermail/kragen-tol/2012-January/000943.html
[stacktool]: http://www.cs.utah.edu/~regehr/stacktool/

Stack overflows
---------------

This is important in the first place because, if your stack gets big
enough, it collides with other data, such as the heap, and then
writing to the stack corrupts the other data, and writing to the other
data modifies the stack and very likely crashes your program in a
mysterious way.  This basically never happens in 32-bit operating
systems because, first, the gap between the stack and the heap is
gigabytes in size, and, second, it has other write-protected stuff in
it, so the stack collides with that stuff first, so you get a very
non-mysterious error report: a segfault with, typically, a stack
hundreds of thousands of calls deep.  Or, occasionally, just one, as
Yossi Kreinin hilariously explains in his [The C++ Sucks Series:
Petrifying Functions][1].

[1]: http://www.yosefk.com/blog/the-c-sucks-series-petrifying-functions.html

But it does happen regularly on embedded microcontrollers.

Stack analysis and interrupts
-----------------------------

A lot of their work has to do with interrupts: when an interrupt
fires, the interrupt pushes the current program counter (and, on some
architectures, flags) onto the stack.  Then, the interrupt handler
itself may push more state onto the stack, and if it re-enables
interrupts, it may become a reëntrant interrupt handler, dramatically
complicating the job of stack analysis.

This requires some explanation.  Platforms like the PDP-11 (I think)
allow a higher-priority interrupt to preempt the handler for a
lower-priority interrupt, so that the maximum handling latency for an
interrupt depends only on its handler and the higher-priority
interrupts.  Platforms like the x86 and AVR don’t have multiple
priority levels for executing interrupt handlers; they just have an
"interrupts enabled" flag.  So you have the temptation to make your
interrupt handler reëntrant by reënabling interrupts before the
interrupt handler is done, in order to reduce the latency for
higher-priority interrupts --- but the same original interrupt could
then re-occur and make your stack deeper without limit.

Stacked interrupt handler data is particularly a problem in a system
with a lot of different threads, each of which have small stacks,
since the interrupt handlers could stack up on any of the threads.
This means that the maximum interrupt handler stack depth gets added
to the maximum depth of each thread stack.  A solution to this is for
the interrupt handler to switch to the main stack before pushing
anything else on the stack, so that the cost to each thread stack is
only one word of extra maximum depth.

So that means that if you have, say, a thread that just endlessly
loops in a leaf function with three or four local variables, you could
figure that out at compile-time and **give that thread a stack space
of, say, 10 bytes**.  Which is to say that the entire run-time state
of the thread could comfortably and safely be a local variable of
another function.  Or you could have a global array of them.

Python-style generators
-----------------------

Anyway, though, I was thinking about Python generators.  Python
generators are a limited form of coroutine, which, like [Adam
Dunkels’s protothreads][2], can only yield control to another thread
directly, not from within some other function they’ve called.  So,
although their own local bindings can’t be kept on a call stack
(because some function can instantiate a generator and return it),
functions they call could use the main stack, since those functions
must return before the generator has a chance to yield control back to
another generator or the main thread.

[2]: http://www.sics.se/~adam/pt/

I’ve written before about how [generators simplify problems like Peter
Naur’s telegram problem][3], although [Dave Long demurred][4], but I
haven’t tried to give an explanation of why that is.

[3]: http://lists.canonical.org/pipermail/kragen-hacks/2010-September/000514.html
[4]: http://lists.canonical.org/pipermail/kragen-discuss/2010-September/001144.html

Consider these three Python generators:

    def each_slice(n, items):
        buf = []

        for item in items:
            buf.append(item)
            if len(buf) + 1 &amp;gt; n:
                yield buf
                buf = []

        yield buf

    def words(file, pattern='[a-zA-Z]+'):
        for line in file:
            for word in re.findall(pattern, line):
                yield word

    def ngrams(n, words):
        queue = []
        for word in words:
            queue.append(word)
            if len(queue) == n:
                yield tuple(queue)          # Lists aren’t hashable.
                queue.pop(0)

The above allows you to write expressions like the following:

    ngrams(2, words(open('corpus')))
    each_slice(2, words(open('corpus')))
    ngrams(3, (word.lower() for word in words(open('corpus'))))

Now, you could also write ngrams() or each_slice() to eagerly consume
an already-computed input array, and fill up and return an array of
their results, at a substantial cost in promptness, storage space,
locality of reference, and (if you don’t have garbage collection)
memory management complexity.  So, for efficiency, you could write
each of the expressions above as an explicit loop, with the various
stages fused together; for example, the third one becomes:

    file = open('corpus')
    queue = []
    for line in file:
        for word in re.findall('[a-zA-Z]+', line):
            word = word.lower()
            queue.append(word)
            if len(queue) == 3:
                use(tuple(queue))
                queue.pop(0)

I hope, however, that you will agree that this is considerably harder
to read and change than the one-line expression above, which is to say
that the parts of the algorithm named as `ngrams` and `words` are
mixed together and therefore not reusable.

The way that traditional APL and NumPy work is to be fully eager, as I
described before; if you write an expression like:

    (peak * sin(arange(n_samples) * (2*pi * hz / rate))).astype(Int16)

then Numeric first allocates an array for the output of `arange` and
fills it up, then multiplies each number by `(2*pi*hz/rate)` (in a new
array), then computes the sine of each number (in a new array), then
multiplies each number by `peak` (in a new array), then converts the
result to 16-bit integers (in a new array), making a total of six
arrays.  For sufficiently large values of `n_samples`, it would
certainly be more efficient to do this computation one sample at a
time, although the optimal value is probably somewhere in the middle.
(I'm obviously no expert here, but you might want to compute at least
one SIMD register’s worth of results, and probably several, before you
go haring off after function pointers and branch target buffers and
mispredicted branches and whatnot.)

You could regain that kind of separation of concerns, without the
generator or coroutine facility, by constructing explicit iterator
objects with `next`; here I use closures to avoid the syntactic tax of
Python classes, but I assume that the iteration is terminated by a
StopIteration exception, which it’s okay to simply pass along to our
caller, as with the normal Python iteration protocol:

    def words(fnext, pattern='[a-zA-Z]+'):
        buf = []
        def next():
            while not buf:
                buf = re.findall(pattern, fnext())
            return buf.pop()
        return next

    def ngrams(n, wnext):
        queue = []
        def next():
            while len(queue) &amp;lt; n:
                queue.append(wnext())
            rv = tuple(queue)
            queue.pop(0)
            return rv

        return next

I’ve done some work here to restructure the code to avoid explicit
“state” variables that encode what was previously implicit in the
program counter, but it seems to me that the code is more complicated
anyway.  Letting each generator track its own state with its program
counter, rather than having to use explicit variables, allows us to
use sequence, alternation, and iteration rather than explicit state
transition functions.

Recasting `each_slice` without generators is a little hairier:

    def each_slice(n, items):
        done = False
        def next():
            if done:
                raise StopIteration

            buf = []
            while len(buf) + 1 &amp;lt;= n:
                try:
                    buf.append(items.next())
                except StopIteration:
                    done = True
                    return buf
            return buf
        return next

Python generators are pull-driven (“external iterators”), but that's
irrelevant; the benefit of coroutines applies equally to push-driven
(“internal”) iterators.  Here's `ngrams` as a push-driven iterator;
note that it is almost exactly the same as before:

    def ngrams(n, output):
        queue = []
        while True:
            queue.append(yield)
            if len(queue) == n:
                output.send(tuple(queue))
                queue.pop(0)

Python-style generators in an extended C
----------------------------------------

So Python-style generators are a very useful language feature.  Are
they compatible with being added to C, or a language with similar
sensibilities?  In particular, are they still useful without closures
and garbage collection?  Yes; here I sketch an outline of how.

Each generator routine would be a data type, like a struct or union,
of some computable and possibly large size.  The value returned from
the generator each time would be returned in the same way as a normal
function return value --- typically in registers for small return
values, or in a fixed spot on the stack or via a hidden pointer
argument for larger ones --- and similarly for values passed in to the
generator upon initial invocation or resumption.  (If you’re trying to
implement this yourself in C, the hidden pointer may not be so hidden.
The key point is that it doesn't make sense for the iterator to
allocate memory for a large return value; its client should do that.)

Perhaps it would be nice to automatically (statically or at allocation
time) initialize variables of generator type when they are created so
that they contain the proper start address for the first resumption;
but if you’re allocating them on the heap, you probably need some kind
of operation to do the initialization explicitly, and in any case,
generators that take arguments are very useful.  So perhaps the syntax
should look something like this:

    int generator range(int start, int stop, int step)
    {
      for (int ii = start; ii &amp;lt; stop; ii += step) yield(ii);
    }

    // ... and here we create a temporary variable on the stack.
    for (int ii: range(0, 10, 1)) printf("%d\n", ii);

    // But we can also create a variable:
    range rr;
    // Later initialize it.  There is no problem using function-call
    // syntax for this, since C can already do function-call syntax
    // with some types of values, namely functions and function
    // pointers.  It just needs a rule for how to apply it to a new
    // type.
    rr(0, 10, 1);
    // And later still use it:
    for (int ii: rr) printf("%d\n", ii);    

Since they will normally all be invoked from the same place, all of
the yield points in the generator need to take the same set of
arguments, if they can take arguments, and all that are not the first
entry point need to return the same type, as well.  (Or types, if
you’re wandering a bit further afield from C.)

When the generator finally does finish, it needs a way to tell its
caller that it’s finished and should not be resumed again.  The
main alternatives are:

- A second return value, or CPU flag.  This is easy to achieve at the
  machine-code level.

- Twiddling the return address; for example, you could add the length
  of a short jump instruction to it, avoiding the need for a
  conditional jump in the caller, but it still needs an unconditional
  jump in the caller.

- A `longjmp()` (or equivalent) to the end of the loop, using a
  `jmp_buf` or equivalent passed as a hidden parameter.

There’s also the question of how to *write* a wider variety of
invocations than `for (foo: somegen) { ... }`.  I mean, to do
Python’s:

    try:
        x = somegen.next()
    except StopIteration:
        y()
    else:
        z(x)

you could do:

    sometype x;
    for (x: somegen) goto ok;
    y();
    if (0) {
    ok:
      z(x);
    }

but that seems rather obscure!

A reasonable, if not very C-like, syntax might be

    z(next(somegen) { y(); goto somewhere; });

That’s only reasonable because you almost always want to handle the
end of an iterator using some kind of nonlocal transfer of control,
even if only a `break;`.

### Generator objects with just the local variables ###

How big is this in-memory generator object?  One approach would be to
make the in-memory generator object contain just the local variables,
program counter, and perhaps temporary variables of the generator
routine, and never to point the stack pointer to it.  (Unless it
happens to be the top thing in the stack frame of some other function,
that is.)  In this case, the generator’s local-variable references
will need to indirect through a context pointer; this is basically the
same thing GCC does with nested functions, and if you want to make the
generator resumption look like a normal function call to the caller,
you could use the same hack GCC uses for that --- as Norm Hardy showed
me one day, GCC wedges this into C by generating a two- or
three-instruction trampoline on the stack that loads the context
pointer and then jumps to the function.

In that case, the generator code itself can execute its “yield” by
stuffing the yielded value into the appropriate place and calling a
yield routine with the address of its generator object.  The yield
routine is something like `savu`/`aretu` in V6 Unix: it sticks what
ought to be its return address into the generator object it’s passed,
and then pops it off the stack before returning --- to the callsite
that had last resumed the generator.

### Generator objects containing entire thread stacks ###

Lua has a more powerful coroutine facility: its coroutines have entire
stacks of their own, so their callees, or callees' callees, can yield
control on their behalf --- so a Lua coroutine is essentially a whole
thread, but one that only runs when you pass control to it explicitly.

Could you do that in C?  `swapcontext()` does more or less that
already, but with the usual risks of stack overflows.  To get rid of
that risk, instead of having a special “generator” kind of function,
with an icky extra indirection on its local variable accesses, you’d
have a “coroutine stack for” type-constructor that could be applied to
any stack-depth-bounded function, like the existing type-constructors
“pointer to” or “array of ints”, whose size was the
statically-computed maximum stack depth for a particular function.

Providing such types would be really hard to do with a normal C
toolchain, since its size would depend on the entire library you were
linking against --- the maximum stack depths and nonrecursivity of the
nonrecursive library functions would become part of their ABI! --- but
it would be entirely reasonable to implement for embedded targets
where the whole linked program might be 16KiB.  It does, however,
require compiling the entire transitive call tree below a function
qwertz in order to figure out how big “coroutine stack for qwertz” was
going to be, so multipass compilers as well as separate compilation
would be pretty much out.

Then the “yield” routine would be more like a normal task switch, like
`swapcontext`, simply switching stacks and returning.

This approach should make it possible to run dozens, if not hundreds,
of cooperative threads, each with full-fledged stacks, within a
microcontroller with 4KiB of RAM.

Ruby’s iterator mechanism, and implementing it in a C-like language
-------------------------------------------------------------------

Ruby’s iterators are almost, but not quite, completely different from
Python iterators.

It occurred to me that Ruby’s iterator mechanism, which I think
originally came from CLU, should *also* be practical to implement in a
language for embedded programming --- in particular, it doesn’t
require garbage collection, but it is much more convenient.  I think
it supports higher-order composition of iterator transformations to
more or less the same extent as Python's mechanism, but I'm still not
quite sure.

So, what are Ruby iterators?

Ruby has several mechanisms for passing around procedures, which is to
say writing higher-order functions, which have the appearance of
having been added one by one over time.  The simplest one is
remarkably effective and remarkably limited:

    def each start, stop, step
      i = start
      while i &amp;lt; stop
        yield i
        i += step
      end
    end

    each 1, 10, 2 do |i|
      puts i
    end

So here we’ve extended the language with a looping construct which
takes a sort of lambda (called, in Ruby, a block) as an argument, but
*without reifying the block as a first-class value*.  Passing
arguments to the block is done with a special-purpose `yield`
construct, which in Ruby is an expression.  `yield` by itself means to
invoke the block with no arguments.  So there is, by construction, no
way to pass the block anywhere else, or do anything other than just
invoke it.  (Other Ruby mechanisms do allow you to do a wider range of
things.)

I think this design comes from CLU iterators originally.

That doesn’t prevent you from doing the usual downward-funarg things
with it; you just have to eta-expand them:

    def dotimes n
      each 0, n, 1 do |i|
        yield i
      end
    end

It does, of course, prevent you from returning the block or storing it
into a data structure, but in a much more elegant way than Pascal’s
downward-funarg mechanism.  It also prevents the block from being
recursive, that is, directly or indirectly calling itself.

You can use this block construct for more than just iteration; for
example, here it provides a comparator for a sort routine:

    # (yield a, b) &amp;lt;= 0 whenever a is metaphorically less than or equal to b.
    def insertion_sort! array
      (0...array.length).each do |nsorted|
        # Find latest point to insert next value:
        pos, to_insert = nsorted, array[nsorted]
        (nsorted-1).downto 0 do |ii|
          if (yield array[ii], to_insert) &amp;lt;= 0
            break
          end
          pos = ii
        end

        # Move later items over to make space, then insert it
        (nsorted-1).downto pos do |ii|
          array[ii+1] = array[ii]
        end
        array[pos] = to_insert
      end
    end

    string_array = ["yes", "YES", "no", "Yes!", "No!"]
    insertion_sort!(string_array) do |a, b|
      a.downcase &amp;lt;=&amp;gt; b.downcase
    end

    p string_array

Traditionally, however, methods like `insertion_sort` above are still
called “iterators”, even though this is somewhat imprecise.

### Efficient compilation of iterators without a heap ###

It occurred to me that you can also *compile* this in a simpler way
than Pascal’s general downward-funarg mechanism, which GCC also
implements for C, albeit without the type-system restrictions that
make it safe in Pascal.

C’s function-pointer mechanism is very simple to implement.  A
function pointer is a single word, and can be invoked (without
arguments or caller-saves, anyway) with a single instruction:

    call *%ecx         # and call the function pointed to

Pascal’s mechanism, by contrast, requires two words --- the function
pointer and a context pointer:

    struct fp { int (*fp)(); void *cp; };

The minimal implementation of the call is therefore two instructions,
plus an additional instruction either inside or outside the procedure;
here it is shown outside:

    mov (%eax), %ecx   # load function pointer
    mov 4(%eax), %edx  # load context pointer
    call *%ecx

And then of course you have to allocate memory for this two-word
structure and initialize it.

(As I mentioned above, GCC provides this facility for C, too.)

Then, inside the nested function, some local variable references
indirect through the context pointer, and some do not.  This means
that some variable references are slower, and the compiler is more
complicated.

It occurred to me that, under specific circumstances, you can
implement Ruby-like blocks more simply.  Later I realized that this
was something I had learned and forgotten from Raphael Finkel’s book,
“Advanced Programming Language Design”, although I think I may be
explaining a technique here isn’t in Finkel.

The key fact is that, since the block cannot be recursive, it doesn’t
need a stack frame for its parameters or local variables.  It can
simply use storage carved out of its caller’s stack frame.  This means
that it only needs one context pointer --- the usual stack frame
pointer --- not two.  And that means that variable references inside
the block can be compiled the same as those outside of it.

(None of the following applies to SPARC or similar register-window
architectures.)

### With a separate frame pointer ###

In runtimes that have a separate frame pointer and stack pointer, and
where functions store their parameters in their stack frame, this is
particularly easy to arrange.  The block is represented as a simple
code pointer, as in C.  The block is invoked by:

1. pushing the iterator’s frame pointer onto the stack;
2. pushing arguments to the block onto the stack or into registers;
3. loading the address for the block code;
4. restoring the iterator’s caller’s frame pointer (which the iterator
   saved on the stack much earlier);
5. calling the block code, pushing a return address onto the stack, or
   wherever your architecture likes to keep it.

For example:

        push %ebp               # save frame pointer
        mov -8(%ebp), %eax      # block parameter 1
        mov -16(%ebp), %ecx     # block parameter 2
        mov -20(%ebp), %edx     # code address of block
        mov 4(%ebp), %ebp       # restore caller’s frame pointer
        call *%edx              # invoke block
        pop %ebp

In the zero-argument case, if we already have the block in a register,
with no caller-saves, this reduces to:

        push %ebp               # save frame pointer
        mov 4(%ebp), %ebp       # restore caller’s frame pointer
        call *%edx              # invoke block
        pop %ebp

Upon entry to the block, the frame pointer is already set up to point
to the stack frame that the block belongs to, so access to local
variables is simply by indexing off the frame pointer.  (And the block
doesn’t need the usual function prologue/epilogue of push %ebp; mov
%esp, %ebp and pop %ebp.)  Block parameters are accessible by indexing
off the stack pointer or in registers.  The block returns control to
the iterator with a single `ret` instruction, and either the `ret` or
the caller pops the parameters, if they were pushed on the stack
instead of passed in registers.

This allows the block to call further functions with abandon; their
stack frames will be pushed on top of the stubby stack frame
containing the block’s return address.

However, nonlocal control flow from the block requires care, as do
nested blocks.  If the block executes a `goto`, `break`, or `return`
that leaves the block, the stack pointer must first be adjusted to
deallocate the iterator’s frame.  In some languages, you also need to
run cleanups such as `finally`, `unwind-protect`, `ensure`, or C++
destructors contained in the iterator.  But, absent cleanups, this can
be handled by storing the desired stack pointer in the frame itself;
it can then be restored with a single indexed load:

        mov 28(%ebp), %esp

(`return` in Ruby, like `^` in Smalltalk, exits the function
containing the block, not just the block itself.  Ruby has a `next`
which exits just the block, returning a value to the iterator.)

Nested blocks are slightly tricky because they may want to access the
parameters of their outer block; for that to work, the outer block
must store its parameters into local variables in the stack frame.

Although it's probably not worth it, this could be made simpler by
giving the iterator a pointer to store those parameters at; instead of
just passing a code address, we pass, Pascal-like, a pointer to a
structure containing the block code address and its parameters, such
as:

    struct twoargblock { void (*cp)(); int param1; int param2; }

Then the sequence to invoke the block becomes something like this
instead:

        push %ebp               # save frame pointer
        mov -20(%ebp), %edx     # address of block structure
        mov -8(%ebp), %ecx      # block parameter 1
        mov %ecx, 4(%edx)
        mov -16(%ebp), %ecx     # block parameter 2
        mov %ecx, 8(%edx)
        mov 4(%ebp), %ebp       # restore caller’s frame pointer
        mov (%edx), %edx        # load code address
        call *%edx              # invoke block

This probably simplifies the compiler, but particularly since most
blocks don’t have other blocks nested inside of them, it’s probably a
net loss for performance.  Even blocks that contain nested blocks may
be better off simply storing the parameters in a slot in the stack
frame on entry:

        mov %eax, -40(%ebp)
        mov %ebx, -44(%ebp)

...because that avoids the need to allocate space for, initialize, and
fetch the field containing the block code address.

### Without a separate frame pointer ###

If you access your local variables by indexing off the stack pointer,
the situation is considerably hairier.  It would seem that the
iterator can’t simply restore its caller’s stack pointer before
invoking the block, because then if the block calls any functions,
those functions’ stack frames will overwrite the iterator’s own stack
frame, causing a crash.

But there’s a solution, as found in Finkel’s book, which can even be
implemented unreliably with no compiler support in pure C.  You just
have to locate the iterator’s stack frame far enough away from its
caller’s stack frame that it won’t get overwritten!  Finkel’s
solution, IIRC, is to stack-allocate a big unused array in a
trampoline function, and then have the iterator eventually return to
its caller by longjmp(), avoiding the trashed return address of the
trampoline function (which probably points into the block).  The
iterator also uses longjmp() to invoke the block, and the block uses
longjmp() to jump back down the stack into the iterator each time it
finishes.

If the compiler can statically bound the stack depth of the block, you
can do slightly better.  Before calling the iterator, the caller
allocates enough stack space for the deepest possible call stack that
the block might invoke, and passes both the block’s code address and
the stack pointer to use to call the block with.  Then the iterator’s
invocation of the block looks like this:

        mov %esp, %eax          # save iterator’s stack pointer
        mov %ecx, %esp          # restore block’s stack pointer
        push %eax
        call *%edx              # invoke block
        pop %esp                # restore iterator’s stack pointer

With this approach, you need only pop the single-word block return
address from the stack before a nonlocal transfer of control out of
the block.
&lt;/pre&gt;</description>
    <dc:creator>Kragen Javier Sitaker</dc:creator>
    <dc:date>2012-03-27T11:13:13</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.culture.people.kragen.thinking">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.culture.people.kragen.thinking</link>
  </textinput>
</rdf:RDF>
