<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.comp.lang.smalltalk.exupery">
    <title>gmane.comp.lang.smalltalk.exupery</title>
    <link>http://blog.gmane.org/gmane.comp.lang.smalltalk.exupery</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/291"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/290"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/276"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/275"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/274"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/273"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/270"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/262"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/258"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/257"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/255"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/254"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/249"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/244"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/243"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/242"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/236"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/231"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/226"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/225"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/291">
    <title>Exupery method cache</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/291</link>
    <description>Hi,

I just fired up an image where I compiled a few methods. When I start
the image again, I can't compile because Exupery complains that the
plugin is missing.

Is this how it's supposed to be or is there a bug in my vm-compilation/
image-compilation?
</description>
    <dc:creator>Markus Fritsche</dc:creator>
    <dc:date>2008-10-05T18:41:48</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/290">
    <title>An update on progress</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/290</link>
    <description>The three things that have been added since the last release are:
 * More x86 addressing modes
 * Generic primitives (calling out to the interpreter's primitives)
 * Eliot's closure bytecodes and Freetype support merged into the VM

More x86 addressing modes and generic primitives have helped out
with performance.

The addressing modes help out in send/return code which accesses
VM variables. Previously the address of the variable had to be loaded
into a register, then that used to access it. Now it can be done in
a single instruction.

Generic primitive support increases the number of cases where Exupery
can improve performance. If a primitive is called that Exupery
doesn't know how to compile, it can now dispatch to it via a PIC, which
is much faster than running through the interpreter's lookup code.

Eliot's closure bytecodes have been merged in because the Pharo
developers would like to start using them. I'd like to see proper
closure support. At the moment the VM will run closure code but
Exupery will not be able to compile it. That will take another VM
change, and an Exupery compiler change to free up the "unused" slot in
MethodContext.

The subpixel changes to bitblt have been merged into the VM, and
the exupery-dev project on the Universes modified to load the packages
required to build the FreeType plugin. I've got makefiles that work,
they're not ideal but I ran into autoconf/automake issues that
prevented using a better solution.


Also, I think I know what one of the remaining bugs is. When returning
into compiled code from interpreted code the compiled context is not
being marked as a root. Contexts are real objects, and need to maintain
the garbage collector's book-keeping. Any objects in old space must be
marked as roots if they point to new space.  To avoid write barrier
checks on context writes (all stack operations, and temporary
assignments), all contexts are marked as roots if they're in old space
when they're entered. Exupery's code does the marking in the return
bytecode because there are fewer returns than points to return to.
Every send, or potential send, has an entry point, so there are many
more entry points than return bytecodes.

Next, I'll investigate this potential bug then fix it if it's real.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-10-05T17:02:32</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/276">
    <title>ExuperyPlugin?</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/276</link>
    <description>Hello,

I'm trying to build a win32 exupery vm - but there is no ExuperyPlugin
in the monticello package. Is this intended?
</description>
    <dc:creator>Markus Fritsche</dc:creator>
    <dc:date>2008-10-04T21:50:08</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/275">
    <title>Planning for Exupery 0.15</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/275</link>
    <description>The two areas most in need of improvement before a 1.0 are now run
time performance and reliability. Hopefully 0.15 will lead to a decent
improvement in both. Run time performance comes first, as the end of
0.14 involved a decent round of testing and debugging.

Here are some benchmarks:

  arithmaticLoopBenchmark  417 compiled  94 ratio: 4.436
  bytecodeBenchmark        725 compiled 262 ratio: 2.767
  sendBenchmark            692 compiled 403 ratio: 1.717
  doLoopsBenchmark         389 compiled 385 ratio: 1.010
  pointCreation            423 compiled 426 ratio: 0.993
  largeExplorers           198 compiled 199 ratio: 0.995
  compilerBenchmark        245 compiled 249 ratio: 0.984
  Cumulative Time          401 compiled 260 ratio: 1.542

The primary goal is to improve the last two benchmarks, the
two macro benchmarks. Both benchmarks use a profiler to decide
what to compile; the goal is to compile enough methods to make
a difference reasonably quickly so the benchmark doesn't take
too long to run.

Here's the profile for compilerBenchmark:
CPU: Core 2, speed 3005.67 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
Counted INST_RETIRED.ANY_P events (number of instructions retired) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        samples  %        image name               app name                 symbol name
4122385  62.5654  4860169  58.3687  squeak                   squeak                   interpret
447635    6.7937  715498    8.5928  anon (tgid:6321 range:0xb1c91000-0xb7bf0000) squeak                   (no symbols)
224375    3.4053  412666    4.9560  squeak                   squeak                   exuperyCreateContext
157809    2.3951  269117    3.2320  squeak                   squeak                   exuperyIsNativeContext
126427    1.9188  230642    2.7699  squeak                   squeak                   allocateheaderSizeh1h2h3doFillwith
96316     1.4618  107350    1.2892  squeak                   squeak                   sweepPhase
87506     1.3281  47304     0.5681  squeak                   squeak                   lookupMethodInClass
53014     0.8046  71782     0.8621  squeak                   squeak                   markAndTrace
52262     0.7932  84112     1.0102  squeak                   squeak                   exuperySetupMessageSend
51999     0.7892  76301     0.9163  squeak                   squeak                   exuperyCallMethod
50920     0.7728  84568     1.0156  squeak                   squeak                   instantiateContextsizeInBytes
47231     0.7168  31047     0.3729  no-vmlinux               no-vmlinux               (no symbols)
42841     0.6502  52130     0.6261  MiscPrimitivePlugin      MiscPrimitivePlugin      primitiveStringHash
42560     0.6459  77447     0.9301  squeak                   squeak                   activateNewMethod

Only 14% of the time is going into code compiled by Exupery and its
helper functions. 62% of the time is still in the main interpreter
loop. Interestingly, the ratio between time in native code and time in
exuperyCreateContext is the same as in the send benchmark, so it's likely
that the native code is mostly in send processing, either being called
from interpreted code or sending to compiled code.  The native code is
executing 1.6 instructions per cycle; the CPU maxes out at 4
instructions per cycle. 1.6 instructions per cycle would be excellent
for an Athlon but Cores are more efficient; it's still good though.

Half of the time spent in exuperySetupMessageSend is going to
dispatching to unhandled primitives; the other half will be
going to sends to interpreted code.

There are a few obvious things to do to improve performance:
 * Implement more addressing modes
 * Natively compile calls to C primitives
 * Implement the ^ true, ^ false, and ^ nil primitives
 * Remove jumps to jumps

Implementing more addressing modes looks the most promising. It should
speed up most of the benchmarks, as all but the bytecode benchmark
spend significant time in code that suffers badly from a single
missing addressing mode, especially object creation code and
send/return code.

The current send optimisation is PICs which only work when sending
from compiled code to compiled code. Sends to and from interpreted
code are about the same speed or a little slower than interpreted to
interpreted sends. It's true that this can be avoided by compiling
more methods so most sends are compiled to compiled but it's much
easier to decide what to compile if compiling anything is likely
to lead to a speed improvement and not risk a speed loss.

Compiling the call to the primitive function into native code will
allow the primitives to be dispatched via PIC instead of needing to go
through exuperySetupMessageSend. Half of the calls to
exuperySetupMessageSend in the compiler benchmark are for primitives;
in the large explorers benchmark three quarters of the calls are for
primitives. That time will disappear. Evaluating blocks uses a
primitive send which takes a large proportion of the block dispatch
time.

There's a handful of primitives that are implemented inside the main
interpret loop. ^ true, ^ false, and ^ nil are some of them. They
often show up when they fail to inline as Exupery can not yet compile
them. If code uses them, then compiling it will cause a large time
loss due to using a full primitive dispatch compared with the
interpreter. Given how simple they are, implementing them makes sense.

Exupery can create code that jumps directly to an unconditional
jump. This does happen in some inner loops. The jumps should be
modified to go to the target jump's destination. Jumping to a jump
makes the CPU's front end's life difficult. In the compiler benchmark
only for 9% of the time are the reservation stations full, which
indicates that for most of the time the front end can not keep up with
instruction execution.
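
One way to picture the jump-to-jump clean-up described above is the
following Python sketch. It is not Exupery's code: the representation
(a dict from block names to lists of instruction tuples) and the names
final_target and thread_jumps are assumptions made for illustration,
and only "jmp" and "jnz" are treated as jumps here.

```python
# Hypothetical model: a block that contains nothing but an unconditional
# ("jmp", target) is a forwarding block that other jumps can skip over.
JUMPS = {"jmp", "jnz"}


def final_target(blocks, name):
    """Follow chains of forwarding blocks, stopping if a cycle is found."""
    seen = set()
    while name not in seen:
        seen.add(name)
        body = blocks[name]
        if len(body) == 1 and body[0][0] == "jmp":
            name = body[0][1]
        else:
            break
    return name


def thread_jumps(blocks):
    """Return a copy in which every jump goes to its final destination."""
    result = {}
    for name, body in blocks.items():
        new_body = []
        for instr in body:
            if instr[0] in JUMPS:
                new_body.append((instr[0], final_target(blocks, instr[1])))
            else:
                new_body.append(instr)
        result[name] = new_body
    return result


example = {
    "block1": [("cmp", "eax", "ebx"), ("jnz", "block2"), ("jmp", "block3")],
    "block2": [("jmp", "block4")],
    "block3": [("mov", "eax", "ecx"), ("jmp", "block2")],
    "block4": [("ret",)],
}
threaded = thread_jumps(example)
print(threaded["block1"])
print(threaded["block3"])
```

In this example both jumps that pointed at block2 end up pointing at
block4, so block2 is no longer on any hot path. The jump to block3 is
left alone because block3 does real work before it jumps.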

Here's an example of the kind of code that's commonly generated with
addressing mode problems. This example is from the method return
sequence. Every compiled method goes through a block like this when
returning:
  (block24
    (mov #nilObj eax)
    (mov (eax) eax)
    (mov eax (8 ecx))
    (mov #activeContext eax)
    (mov ebx (eax))
    (mov #youngStart eax)
    (mov #activeContext ebx)
    (mov (ebx) ebx)
    (cmp (eax) ebx)
    (jumpUnsignedGreaterEqualThan block25)
    (mov #activeContext eax)
    (mov (eax) eax)
    (mov (eax) ebx)
    (mov 1073741824 eax)
    (and ebx eax)
    (jnz block25)
    (mov 2400 eax)
    (mov #rootTableCount ecx)
    (cmp eax (ecx))
    (jumpSignedGreaterEqualThan block26)
    (jmp block27)
   )

The problem is instructions like "(mov #nilObj eax)": the address
should be encoded in the memory access that uses it. There's no
need to move an address into a register before using it. There are
other problems besides not handling literal indirect addressing but
the literal indirect problem is the largest by a long shot.

I'm going to add literal indirect addressing first as it's harder to
estimate what it'll do to overall performance but it is a problem for
almost all the benchmarks.

It would also be worthwhile improving the profiling tools. It should
be relatively easy to get oprofile to show the compiled method names
instead of lumping all compiled code into the "anon" memory bucket.
It would also be worthwhile and easy to write some code to read the
oprofile files and compute the ratios rather than calculate them by
hand. 
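
As a rough illustration of that last idea, here is a minimal Python
sketch that reads a few rows of the oprofile report quoted earlier in
this message and computes the figures that were worked out by hand. It
is not part of Exupery. The function names (parse_report,
exupery_share, native_ipc) and the rule for recognising Exupery rows
(the "anon" bucket plus symbols starting with "exupery") are
assumptions made for this example, and it parses the text report
rather than oprofile's own sample files.

```python
# Rows copied from the report above: cycle samples, cycle %, instruction
# samples, instruction %, then image name, app name and symbol name.
REPORT = """\
4122385  62.5654  4860169  58.3687  squeak  squeak  interpret
447635    6.7937  715498    8.5928  anon (tgid:6321 range:0xb1c91000-0xb7bf0000) squeak  (no symbols)
224375    3.4053  412666    4.9560  squeak  squeak  exuperyCreateContext
157809    2.3951  269117    3.2320  squeak  squeak  exuperyIsNativeContext
96316     1.4618  107350    1.2892  squeak  squeak  sweepPhase
52262     0.7932  84112     1.0102  squeak  squeak  exuperySetupMessageSend
51999     0.7892  76301     0.9163  squeak  squeak  exuperyCallMethod
"""


def parse_report(text):
    """Turn each non-empty line into a dict of its numeric columns and label."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        rows.append({
            "cycles": int(fields[0]),
            "cycle_pct": float(fields[1]),
            "instructions": int(fields[2]),
            "label": " ".join(fields[4:]),
        })
    return rows


def is_native(row):
    # Assumption: all Exupery-generated code lands in the "anon" bucket.
    return row["label"].startswith("anon")


def is_helper(row):
    # Assumption: Exupery's C helper functions have symbols starting "exupery".
    return row["label"].split()[-1].startswith("exupery")


def exupery_share(rows):
    """Percentage of cycles in native code plus Exupery helper functions."""
    return sum(r["cycle_pct"] for r in rows if is_native(r) or is_helper(r))


def native_ipc(rows):
    """Instructions per cycle in native code. Both counters were sampled
    with the same count (100000), so the sample ratio is the event ratio."""
    native = [r for r in rows if is_native(r)]
    return sum(r["instructions"] for r in native) / sum(r["cycles"] for r in native)


rows = parse_report(REPORT)
print("Exupery share of cycles: %.1f%%" % exupery_share(rows))
print("Native instructions per cycle: %.2f" % native_ipc(rows))
```

On these rows the sketch reports a share of about 14.2% and about 1.60
instructions per cycle, in line with the 14% and 1.6 quoted above.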

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-08-08T21:38:57</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/274">
    <title>[ANN] Exupery 0.14 is released</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/274</link>
    <description>
The major improvement is to the speed of register allocation. I've
fixed a couple of major performance problems so register allocation
takes 50% of compilation time even for the largest methods. Register
allocation now appears to take roughly linear time.

There's still plenty of room to improve compilation time. Every stage
copies the entire intermediate graph to produce the input to the next
stage, which is redundant as most stages only change a few places. The
register allocator's liveness analyser still uses Sets to represent
which variables are live rather than bit vectors.

This release can compile cascades, the last missing core language
feature. Exupery still can only compile a handful of the core
primitives; it only compiles #at: for pointer objects. Cascades were
added because they were used more in 3.10. The choice was either to
delete 2 system tests or to add cascades. I delayed the release to add
cascades.

There are a few fixes for old bugs but this release is not noticeably
more reliable than the previous release. It should be a bit more
reliable though, especially when running the new Exupery VM. The new
Exupery VM just has a single bug fix in it.

Here are the benchmarks:
  arithmaticLoopBenchmark 414 compiled  94 ratio: 4.404
  bytecodeBenchmark       726 compiled 264 ratio: 2.750
  sendBenchmark           707 compiled 454 ratio: 1.557
  doLoopsBenchmark        388 compiled 398 ratio: 0.975
  pointCreation           433 compiled 423 ratio: 1.024
  largeExplorers          257 compiled 258 ratio: 0.996
  compilerBenchmark       248 compiled 249 ratio: 0.996
  Cumulative Time         419 compiled 275 ratio: 1.519

  ExuperyBenchmarks&gt;&gt;arithmeticLoop              105ms
  SmallInteger&gt;&gt;benchmark                        362ms
  InstructionStream&gt;&gt;interpretExtension:in:for: 6051ms
  Average 612.691

The key benchmark for this release is the last one, compiling
interpretExtension:in:for: which now only takes 6 seconds. With the
previous version of the compiler it used to take over 2 minutes
to compile.

The changes in times in the other benchmarks are mostly due to
me upgrading from an Athlon 64 2.2GHz to a Core 2 3.0GHz. The
register allocator should be slightly more efficient, especially
when compiling send heavy code.

Reliability and compile time performance have dominated the
last few releases. Now run time performance and reliability are
the biggest issues. Exupery still can crash after about an hour's
active use depending on what's being done.

Bryce

P.S. I don't think it'll be that useful to have VMs for all
platforms. The old VM 0.12 should be fine for messing around.
The new VM would be useful if anyone felt like trying to capture
bugs though.  

If anyone wants to rebuild the VM for non-linux platforms, I'll delay
announcing the release for a few days. I doubt it's worth the effort
of re-creating a VM build environment though. I suspect that the major
remaining bug has to do with de-compilation of contexts and may
require VM changes to fix.
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-07-27T21:25:58</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/273">
    <title>Exupery update preparing for the 0.14 release</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/273</link>
    <description>
The Exupery 0.14 release is in final testing now. The major
improvement is to register allocator performance. The register
allocator should also produce slightly better code due to the
changes. There's also now support for cascades, the last major missing
language feature.

The main gain in this release is that register allocation is much
faster on large methods. Now it takes about 50% of the compilation time
for all methods; it used to take almost all the time for large methods.
This means compilation time is now roughly linear with method size.

The major improvements for 0.14 were done by mid April. The release
has been delayed due to upgrading to 3.10. Exupery's testing uncovered
two bugs in 3.10 and four tests were failing due to changes in the
base image.

Two tests have been deleted as they were trying to compile methods
that have been removed. Those methods were involved in bugs found by
the stress test. There should be unit tests that cover any changes to
the compiler required to fix them.

Two tests were failing because of cascades. Cascades have been added
so they pass again. The other options were to delete the tests or to
release with failing tests on 3.10.

All that's left to do before releasing is run the stress test again
and do some Seaside development with Exupery running. Both were done
successfully before upgrading to the latest squeak-dev 3.10 images.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-06-15T19:59:50</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/270">
    <title>Problem with ColouringRegisterAllocator</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/270</link>
    <description>in method addInterferenceEdge: firstRegister to: secondRegister

I stumbled upon the error 'key not found' when it does:

secondNode := interferenceGraph at: secondRegister.

an interferenceGraph keys  = a Set(t9 t11 t14 t10 t2 t24 t22 t19 t25
t3 t8 t15 eax t20 t21 t16 t11 edx t17 t6 t3 t26 t12 t2 t4 t21 ecx t16
t12 t9 t23 t1 t5 t20 t13 t15 t14 t5 t27 t1 t6 t19 eax t13 t4 t17 t18
t28 t7 t8 t18 t10 t7)

and secondRegister = eax.

It looks like MedMachineRegister needs #= and #hash methods to make
things work correctly with dictionaries.
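
The effect can be reproduced outside Smalltalk. The Python sketch
below is only an analogy, not Exupery code: IdentityRegister and
ValueRegister are hypothetical classes standing in for a register
class without and with value-based equality and hashing. The Set
printed above lists eax (and several temporaries) twice, which is what
a hashed collection looks like when equal-looking objects are compared
by identity.

```python
class IdentityRegister:
    """Compared by identity: two instances named eax are different keys."""

    def __init__(self, name):
        self.name = name


class ValueRegister:
    """Compared by name: the analogue of defining both #= and #hash."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, ValueRegister) and self.name == other.name

    def __hash__(self):
        # Equal objects must hash equally, so hash the same field that
        # __eq__ compares.
        return hash(self.name)


graph_by_identity = {IdentityRegister("eax"): "node for eax"}
graph_by_value = {ValueRegister("eax"): "node for eax"}

# A second instance with the same name is only found when equality and
# hashing are based on the name.
found_by_identity = IdentityRegister("eax") in graph_by_identity
found_by_value = ValueRegister("eax") in graph_by_value
print(found_by_identity, found_by_value)
```

Defining equality without a matching hash is not enough for hashed
collections, which is why both methods are needed together.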

I'll try adding these methods and see whether the issue disappears.

Btw, I don't know why or where it creates another instance of the same
register (MedMachineRegister name: #eax). In my code I'm using
'machine registerNamed: #eax', which should create the instance only
once and then return the same instance for subsequent calls.

</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2008-06-01T18:25:06</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/262">
    <title>Register spilling mechanism</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/262</link>
    <description>I currently want to implement inlining for methods of a specific
format (let's call them 'native' methods).
These methods should work like ST methods (same calling convention and
stack layout when you enter the method), but
they don't do polymorphic sends unless explicitly specified;
instead all sends are inlined statically by the compiler.
I don't want to go into much detail here, I just want your help with
how to deal with register spilling when inlining.

Suppose I'm inlining a send like:

object message: a+b with: c+d with: a+d with: ....

so all arguments to the method should now use machine temporary
registers instead of the stack, because it's more efficient.

What I fear is that if the register allocator sees that there are not
enough registers in the CPU to fit them all, it starts using the stack
for storing temp values,
and there are some situations where I need to know the exact stack depth
for a method.
Is there any way in Exupery to find out how many registers will be
spilled to the stack, so this information can be used by my compiler?


</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2008-04-22T12:44:08</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/258">
    <title>Debugging a bug that wasn't.</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/258</link>
    <description>
I'm testing and debugging before the 0.14 release. The major benefit
is faster compile times and better generated code. There's around a
10% gain or loss for the macro benchmarks depending on what's
compiled.

The bug was triggered by the below code. Originally it was compiling
"^[42] value". The set-up shown below is what generates the fault
without needing to compile the code. The fault was odd because it
wasn't crashing, and wasn't stuck in an infinite loop but the image
had locked up. 

I was investigating by interrupting execution with gdb and looking at
both the Smalltalk stack using "p printCallStack()" and the C stack
using "where". Normally, if compiled code was corrupting the object
memory it would crash fairly quickly. That it kept executing, which
could be seen because the Smalltalk stack kept changing, was weird.  If
the GC's got a corrupt view of the object memory then normally it'll
end up really corrupting it when it decides to try to interpret the
middle of an object as an object header.

The "Error signal." at the front of the test is because running the
test will lock up the image. This allows the rest of the suite to
be run.

    testInterruptCausesCrashes
        "Crashing is a failure"
        | processes |
        Error signal.
        self compileExpression: 'self createPoint'.
        processes := (1 to: 10) collect:
            [:each| [[(ExuperyProfiler
                       spyOn: [(Delay forMilliseconds: 500) wait])
                 queueCompilationCommandsOn: SharedQueue2 new] repeat] fork].
        "Force a garbage collect on every allocation"
        SmalltalkImage current vmParameterAt: 5 put: 1.
        10000 timesRepeat: [self example].
        Exupery log: 'after running example'.
        "Reset garbage collects to a sensible value"
        SmalltalkImage current vmParameterAt: 5 put: 4000.
        processes do: [:each| each terminate].

    createPoint
        "^ 10 @ 20"

What I think is happening is the 10 profiling processes end up
consuming all the time and starving the user level process so it
never completes and never resets the garbage collector's collection
rate to a sane value. Having produced the lockup twice without any
compiled code, I'm reasonably happy that this is the case.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-04-20T09:38:45</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/257">
    <title>Deciding how to fix a bug.</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/257</link>
    <description>
I've started the final release testing for Exupery 0.14. There are two
bugs captured with tests so far. The first has something to do with
interrupts, it needs a few running profilers to cause the bug to
happen. I'm hoping this one is the source of the unrepeatable bugs
that were in the previous release; if so, solving it may allow
Exupery to run for much longer as a background compiler.

The second bug is caused by the current work. What's happening is the
currentContext register is being spilled but is being used by code
generated in the register allocator to load the stack. It may also be
needed by spill code to spill back into the stack.

The simplest fix is to avoid spilling the currentContext. I doubt that
spilling the currentContext is a good idea as it'll make spills of
stack registers much more complex. Not spilling currentContext is
probably the right thing to do unless the spill heuristics are
improved to avoid spilling currentContext then spilling many stack
registers that will use it to load themselves from the stack.

A theoretically better fix would be to get the register allocator to
reload the currentContext if required when generating spill code that
needs it. This is nicer as it doesn't require making another register
special. There are a few practical problems to make sure that this
doesn't lead to very bad code when we decide to spill currentContext
then spill other registers. Also, the only gain to handling
currentContext as a real register is allowing it to be spilled when
appropriate which could be very nice if it was a special register that
would be loaded from its real location rather than spilled from the
stack.

My feeling is the right solution for now is to avoid spilling
currentContext and leave the better solution until a time when it's
appropriate to develop a more complete solution including both proper
spill heuristics and reloading spilled registers from the
interpreter's variables.

The background compiler compiled over 1200 methods before running into
one where the currentContext was spilled. So at worst, forcing it into
a register will only make 1 method in a thousand worse, and may make
it better.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-04-12T21:40:47</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/255">
    <title>Floating point support</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/255</link>
    <description>Hello Bryce,

I'd like to ask how you envision adding floating point support
to Exupery
so that it can use the FPU.

The problem I see is that it would require adding a type to registers
to be able to use float-type values in the intermediate code
representation.

Otherwise, what would it look like when I need to encode floating point
operations like:

temp1 := floatValue1.
temp2 := floatValue2.
temp3 := temp1 + temp2. "here compiler should generate FPU
instructions instead of integer addition"

Or maybe I'm looking at the problem from the wrong angle?
Maybe it would be easier to give the intermediate code a format like:

unaryFloatOp(argumentAddress, resultAddress)
binaryFloatOp(argument1Address, argument2Address, resultAddress)
compareFloatsOp(arg1address, arg2address)

then the compiler doesn't have to deal with float registers (in register
spilling code and other optimization patterns).

Another thing is support for byte-wide operations, like
loading/storing a byte at a specific address, and of course being able
to do some operations with byte-sized values.
How do you plan to support this?

The reason I'm asking is that I'm currently busy with
this: http://wiki.squeak.org/squeak/6041

As you may remember, I had plans to build such a system before. Now I've
spent some time pushing it to the point where it can become a reality
:)

</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2008-03-16T13:05:06</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/254">
    <title>The next step towards Exupery 0.14</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/254</link>
    <description>
The next step after improving the spill heuristics is likely
to be one of the following:
 * Spilling context registers directly to the context
 * Adding a few more primitives (String&gt;&gt;at: and return true, false, or
   nil)
 * More work on speeding up compilation
 * More bug chasing.

There are two reasons to want to spill context registers directly
into the context rather than onto the C stack. First, it'll reduce
memory traffic for sends with many arguments or a deep stack. At
the moment the entire stack is copied from the context into registers
when the method is entered (and possibly spilled on the C stack)
then copied back onto the context on exit. For send heavy code this
can lead to code that reads the stack into registers to immediately
write it back out again. This is especially wasteful if that section
of code doesn't use any of the stack variables copied. The second
reason is to allow temporaries and arguments to be held in registers.
If they're moved into registers then it's likely there will be more
spilling as the x86 only has 6 registers available for general use.

The compiler benchmark can try to inline String&gt;&gt;at: and the large
explorers benchmark can try to inline return true and false. Not
having these primitives may be causing Exupery to be slower for some
parts of both benchmarks than the interpreter. These primitives are
inlined into the main interpretation function so are especially
efficient to interpret.

Compilation of large methods has been sped up significantly but medium
and small methods have only smaller gains. Further performance gains
would be nice, if only to make it faster to run larger acceptance
tests.

Exupery is currently just reliable enough to develop with. I saw
a crash after a few hours of Seaside development. This is barely
good enough as it's easy to recover using the changes file. I don't
know how to reproduce that crash so haven't tried to debug it.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-01-08T20:02:26</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/249">
    <title>Compiler heuristics</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/249</link>
    <description>It's a theoretical question rather than a practical one concerning Exupery,
but I'd like to know what you know or think about compiler
heuristics for simplifying expressions when inlining.

The problem can be demonstrated by the following example.

Suppose you have this code:

mainMethod
   object = 0 ifTrue: [ object method1 ] ifFalse: [ object method2]

and now method bodies:

method1
   object = 0 ifTrue: [ do something] ifFalse: [ do something else ]

method2
   object ~= 0 ifTrue: [ do something] ifFalse: [ do something else ]

Now, when you inline method1 and method2 into mainMethod, you can
see that the tests in the inlined methods become redundant.
If we inline both methods, the code will look like:

mainMethod
   object = 0 ifTrue: [
       object = 0 ifTrue: [ do something]
       ifFalse: [ do something else ]  ]
   ifFalse: [
       object ~= 0 ifTrue: [ do something]
       ifFalse: [ do something else ] ]

Are there any techniques that can help a compiler remove such
redundancy when analyzing code?
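
To make the redundancy concrete: one general approach is to record
what each enclosing branch has already established and fold any later
test of the same condition. The Python sketch below is only an
illustration, not anything Exupery does. The node classes (Do, If) and
the simplify function are hypothetical names, and it assumes the test
has no side effects and that object is not reassigned between the
outer and inner tests. The test "object ~= 0" is modelled as the
negation of "object = 0".

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Do:
    """A leaf action such as 'do something' (hypothetical node type)."""
    name: str


@dataclass(frozen=True)
class If:
    """Branch on cond; negated=True means the test is the negation of cond."""
    cond: str
    negated: bool
    then: "Node"
    other: "Node"


Node = Union[Do, If]


def simplify(node, known=None):
    """Fold tests whose outcome is already fixed by an enclosing branch.

    known maps a condition string to the truth value it must have on the
    current path. Assumes conditions are side-effect free.
    """
    known = dict(known or {})
    if isinstance(node, Do):
        return node
    if node.cond in known:
        # The test outcome is the known value, flipped if the test is negated.
        outcome = known[node.cond] != node.negated
        return simplify(node.then if outcome else node.other, known)
    # Inside the true branch the test succeeded, inside the false branch it failed.
    then_known = {**known, node.cond: not node.negated}
    else_known = {**known, node.cond: node.negated}
    return If(node.cond, node.negated,
              simplify(node.then, then_known),
              simplify(node.other, else_known))


# mainMethod after inlining method1 and method2.
inlined = If("object = 0", False,
             If("object = 0", False, Do("m1 something"), Do("m1 something else")),
             If("object = 0", True, Do("m2 something"), Do("m2 something else")))
print(simplify(inlined))
```

Both inner tests disappear, leaving a single test on "object = 0"
that selects between the two surviving actions.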

</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2007-12-25T03:11:00</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/244">
    <title>Thinking about Exupery 0.14</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/244</link>
    <description>
The primary goal for the next releases will be making the following
benchmarks more compelling. I've added a compile time benchmark as
there are a few performance bugs in the compiler that should be
removed.

  arithmaticLoopBenchmark 1396 compiled  128 ratio: 10.906
  bytecodeBenchmark       2111 compiled  460 ratio:  4.589
  sendBenchmark           1637 compiled  668 ratio:  2.451
  doLoopsBenchmark        1081 compiled  715 ratio:  1.512
  pointCreation           1245 compiled 1317 ratio:  0.945
  largeExplorers           728 compiled  715 ratio:  1.018
  compilerBenchmark        483 compiled  489 ratio:  0.988
  Cumulative Time         1125 compiled  537 ratio:  2.093

  ExuperyBenchmarks&gt;&gt;arithmeticLoop                249ms
  SmallInteger&gt;&gt;benchmark                         1112ms
  InstructionStream&gt;&gt;interpretExtension:in:for: 113460ms
  Average                                         3155.360

First, I'll get the register allocator to allocate each section of
a method separately. After that, I'll probably do some work on further
optimising the register allocator but I might work on improving the
generated native code.

Register allocating each section separately will allow for both better
and faster allocation. It will make it easy to avoid dealing with
registers and interference from other sections of the code and will
reduce the size of the problem. Colouring register allocation, written
well, should take n log n time on average, but the performance bugs
probably raise that to n^2.

It's possible that just allocating each section of the method
separately will be enough to bring allocation down to a reasonable
time. It should definitely help for the larger methods but is unlikely
to do anything for the arithmaticLoop and will only help the bytecode
benchmark slightly. Compiling quicker will make it easier to run more
extensive tests.
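
The idea of allocating each section separately can be sketched in a
few lines of Python. This is a rough illustration under stated
assumptions, not Exupery's allocator: it uses a simple greedy
colouring rather than a full colouring allocator, the function names
(colour_section, allocate_by_section) are made up for the example, and
the interference edges are assumed to be already computed per section.
The default of 6 registers is the x86 figure quoted on this list.

```python
from collections import defaultdict


def colour_section(interference, k):
    """Greedy-colour one section's interference graph with k registers.

    interference maps each variable to the set of variables live at the
    same time. Returns (assignment, spilled).
    """
    assignment, spilled = {}, []
    # Visit the most constrained variables first, breaking ties by name.
    for var in sorted(interference, key=lambda v: (-len(interference[v]), v)):
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(k) if r not in taken]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)
    return assignment, spilled


def allocate_by_section(edges_by_section, k=6):
    """Colour every section on its own, ignoring other sections entirely."""
    result = {}
    for section, edges in edges_by_section.items():
        graph = defaultdict(set)
        for a, b in edges:
            graph[a].add(b)
            graph[b].add(a)
        result[section] = colour_section(dict(graph), k)
    return result


# Two sections separated by a message send. Only two registers are
# allowed here so that a spill shows up in a tiny example.
sections = {
    "before send": [("t1", "t2"), ("t2", "t3"), ("t1", "t3")],
    "after send": [("t4", "t5")],
}
print(allocate_by_section(sections, k=2))
```

Because each graph only contains the variables of one section,
interference in "before send" cannot force a spill in "after send",
and each colouring problem is smaller.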

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2007-12-17T21:50:31</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/243">
    <title>[ANN] Exupery 0.13 Release</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/243</link>
    <description>
Exupery 0.13 is released. This release is available from both the
Universes and SqueakMap. A prebuilt image and the 0.12 VMs can be
found here:

      http://wiki.squeak.org/squeak/3945

This version has a few more bug fixes and some support to run
Seaside. This release is reliable enough to use Seaside with.
There are likely to still be bugs but it is reliable enough to
play with or load during development.

Either use Seaside 2.9 or the wbk version of Seaside 2.8. There's a
minor change to continuations to make it easier to safely serialise
them.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2007-12-12T21:03:34</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/242">
    <title>Preparing for Exupery 0.13</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/242</link>
    <description>
Exupery now runs Seaside for more than a few hours without crashing.  It's
time to release again. (1) I had Exupery and Seaside running for 12
hours on Sunday though only about 2-3 hours were using it solidly.

This release will not need new VMs as there's been no VM changes.

The next release will focus on speeding up compilation. The first
step will be getting the register allocator to allocate each section
of a method separately. Methods are often divided into many sections
that are not connected as each message send exits the method then
re-enters it. Register allocating each section separately should speed
up register allocation, it will make allocation more accurate as it
will ignore register interference in other sections, and it will make
it easier to avoid loading context variables (including the stack)
that are not used in that section.

After register allocating each section separately it's debatable
whether it will be better to continue optimising compilation speed as
a few methods take several minutes to compile or to focus on speeding
up the compiled code by avoiding loading unused stack variables and
moving temporaries and arguments into registers. Making compilation
faster seems like the best option because it will make testing faster
which will speed up development and bug fixing.

Bryce

(1) You'll need to either use my version of Seaside 2.8 or Seaside
2.9. There's a tiny change required to continuations to allow Exupery
to easily deoptimise contexts before they're serialised. 
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2007-12-10T21:08:04</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/236">
    <title>Speedup from Exupery for SqueakElib</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/236</link>
    <description>Hi All!

I have reached step 2.4 in my SqueakElib plan documented here: 
http://wiki.squeak.org/squeak/6011, but I don't know anything about Exupery. 
I suppose my first question is whether I am right in assuming that I will 
see speedups if I incorporate Exupery?

My next question is related to the work I did in steps 1.3, 1.4, and 1.5.  I 
modified the VM to use larger contexts (80+7), to add extra long jump 
bytecodes for jump:, jumpIfTrue: and jumpIfFalse:, and to add bytecodes for 
doing receiver class tests.  Will these additions cause problems with 
Exupery?

If all is ok so far, I want to build my modified Windows VM with Exupery, so 
I am reading http://wiki.squeak.org/squeak/5904.  I am confused as to which 
VMMaker I can use to build.  I currently have VMMaker-tpr.58.mcz loaded. 
Can you confirm that I need to either overwrite this VMMaker with the one 
from the Exupery repository or that I need to build fresh with the one from 
the Exupery repository?   Also, where is the VMMaker from the Exupery 
repository - where is the Exupery repository?  Finally, what is the best 
Subversion client for Windows and how do I connect to this Exupery 
repository?

Thanks and Cheers,
Rob 
</description>
    <dc:creator>Rob Withers</dc:creator>
    <dc:date>2007-11-17T20:14:56</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/231">
    <title>Exupery 0.12 release progress</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/231</link>
    <description>
We've a Windows VM thanks to Andy.

There's a Linux VM and a pre-built Exupery development image.
There are also entries in SqueakMap and Package Universes.

Links to the pre-built items are here:
http://wiki.squeak.org/squeak/3945

The Package Universes Exupery Development package doesn't fully
work. To use it load CommandShell from SqueakMap after loading
it. This is because GraphViz needs the latest CommandShell
which hasn't yet made it to universes. I've emailed Lex to get
this sorted.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2007-10-28T22:05:40</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/226">
    <title>Planning Exupery 0.13</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/226</link>
    <description>
My current plan for 0.13 is:
  * Get background compiler to ignore methods that failed to compile
  * Don't flush the code cache on every class change
  * Fix more bugs
  * Add some instrumentation to get quantitative information on time
    spent compiling and the percentage of time spent in compiled code.

The goal is to make the background compiler practical. It will still
be limited due to missing features slowing down compiled code and slow
compile times. A test would be to run the background compiler until
Exupery was compiled then to start playing with tinyBenchmarks and to
see the improvement as it compiles the methods and inlines the
primitives.

Currently the background compiler doesn't avoid methods that have
failed to compile once. This is a problem as the compilation queue
could get full of methods that don't compile blocking compilation of
methods that do compile.
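
A small Python sketch of the behaviour wanted here: a compilation
queue that remembers failures and refuses to queue them again. The
class and method names (CompilationQueue, request, compile_next) are
invented for the illustration and the compile function is passed in,
so nothing here reflects Exupery's actual implementation.

```python
from collections import deque


class CompilationQueue:
    """Pending methods plus a record of methods that failed to compile."""

    def __init__(self, compile_method):
        self._compile = compile_method  # callable that raises on failure
        self._pending = deque()
        self._failed = set()

    def request(self, method):
        """Queue a method unless it already failed or is already queued."""
        if method in self._failed or method in self._pending:
            return False
        self._pending.append(method)
        return True

    def compile_next(self):
        """Compile one queued method and remember it if compilation fails."""
        if not self._pending:
            return None
        method = self._pending.popleft()
        try:
            self._compile(method)
        except Exception:
            self._failed.add(method)
            return (method, False)
        return (method, True)


def fake_compile(method):
    # Stand-in compiler: pretend that methods using cascades are unsupported.
    if "cascade" in method:
        raise ValueError("unsupported feature")


queue = CompilationQueue(fake_compile)
queue.request("Foo bar")
queue.request("Foo cascadeBaz")
print(queue.compile_next())
print(queue.compile_next())
print(queue.request("Foo cascadeBaz"))
```

The failed set would need clearing whenever a method is redefined or
the compiler gains a feature. That is left out to keep the sketch
short.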

Flushing the code cache on every class change is overkill. It also
reduces the value of some of the acceptance testing as it can't
effectively test compiling tests that change classes and several do.

It would be nice if Exupery reported the percentage of time spent in
compiled code to gauge how well the background compiler was doing. I
can measure this using oprofile but not in Squeak. It would also be
nice if Exupery recorded how long a method took to compile to gauge
how much of a problem compilation speed currently is. It would be nice
if Exupery recorded why methods failed to compile so it was easy to
figure out how important adding the remaining missing features is.


The other possibility for 0.13 would be to optimise compilation time.
There are a few methods that take minutes to compile due to known
performance bugs in the register allocator. Increasing the speed of
the compiler will both make compilation more practical and make it
cheaper to compile more code in tests.

The third major thing required before a 1.0 would be to implement
cascades and a few more bytecodes. This should allow Exupery to
optimise more code successfully. This is aiming to optimise run time
performance. I'm not considering this for 0.13 but it is needed before
1.0.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2007-10-25T21:11:12</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/225">
    <title>Preparing for Exupery 0.12</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/225</link>
    <description>
I've decided to release what I've got. There's more than enough in
this release to justify releasing. The background compiler now runs
for about an hour. It'll still need some work before it is practical;
that's probably next.

This release
  * Has a background compiler that can be run for about an hour
  * Has many bug fixes
  * Merges in David and John's 32 bit clean work and makes Exupery
    32 bit clean.

Andy, when can you build a Windows VM?

Mathieu is planning on building a Mac VM.

To use the background compiler:

First initialise Exupery to prepare the code cache. This will
also clear out any currently compiled code.

   Exupery initialiseExupery

Then start the background compiler using the following expression.

   Exupery startBackgroundCompilation.

Then when done use the following to stop it.
   
   Exupery stopBackgroundCompilation.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2007-10-25T20:24:35</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/221">
    <title>An update after ESUG.</title>
    <link>http://comments.gmane.org/gmane.comp.lang.smalltalk.exupery/221</link>
    <description>
ESUG saw a lot of progress for Exupery. There is now a preliminary
Mac port and 2 bugs were fixed. There is a Mac VM available here:

 http://wiki.squeak.org/squeak/3945

This version is compiled without optimisation. It looks like there's a
bug in GCC 4.0.1 which is preventing sends from working in Exupery.
Exupery works fine with gcc 3.3.3 and gcc 4.1.2. Hopefully there will
be a more polished Mac VM soon compiled with optimisations. Thanks to
Matthieu for building this.

Two bugs were fixed. The major bug was due to Exupery overflowing
the root table; this was the one that was causing the crashes that
I was debugging. The minor bug was a failure to mark Exupery's
interpreter integration objects. As far as I'm aware it has never
caused a crash, but it's fixed too.

I'm starting to think about the next release seriously as I'm
hopeful that it's only one bug fix away. The pre-release tasks
are:
  * Fix the current bug
  * Test that Exupery can compile in the background for an hour
  * Check that the stress test still runs
  * Get Exupery installing from Package Universes
  * Build an Exupery development package on Package Universes
  * Integrate the 2G VM fixes from Dave and John.

I'd like to make it easier to build Exupery development images.
Exupery's development tools use more packages than Exupery needs to
run. The two things that need to be done are automating the build and
making sure that the catalogue has up to date pointers to
packages. Some packages still point to the old minnow location for the
Squeak swiki.

I'd also like to integrate the 2G unsigned pointer comparison fixes
as they were causing crashes on my laptop after fixing the root table
bugs. At the time, I thought it was image corruption which is
plausible after debugging a GC related bug.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2007-09-04T08:45:17</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.lang.smalltalk.exupery">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.lang.smalltalk.exupery</link>
  </textinput>
</rdf:RDF>
