<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.comp.lang.smalltalk.exupery">
    <title>gmane.comp.lang.smalltalk.exupery</title>
    <link>http://blog.gmane.org/gmane.comp.lang.smalltalk.exupery</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/275"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/274"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/273"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/272"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/271"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/270"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/269"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/268"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/267"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/266"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/265"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/264"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/263"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/262"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/261"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/260"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/259"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/258"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/257"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/256"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/275">
    <title>Planning for Exupery 0.15</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/275</link>
    <description>The two areas most in need of improvement before a 1.0 now are run
time performance and reliability. Hopefully 0.15 will lead to a decent
improvement in both. Run-time performance comes first, as the end of 0.14
involved a decent round of testing and debugging.

Here's some benchmarks:

  arithmaticLoopBenchmark  417 compiled  94 ratio: 4.436
  bytecodeBenchmark        725 compiled 262 ratio: 2.767
  sendBenchmark            692 compiled 403 ratio: 1.717
  doLoopsBenchmark         389 compiled 385 ratio: 1.010
  pointCreation            423 compiled 426 ratio: 0.993
  largeExplorers           198 compiled 199 ratio: 0.995
  compilerBenchmark        245 compiled 249 ratio: 0.984
  Cumulative Time          401 compiled 260 ratio: 1.542

The primary goal is to improve the last two benchmarks, the
two macro benchmarks. Both benchmarks use a profiler to decide
what to compile; the goal is to compile enough methods to make
a difference reasonably quickly, so the benchmark doesn't take
too long to run.

Here's the profile for compilerBenchmark:
CPU: Core 2, speed 3005.67 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
Counted INST_RETIRED.ANY_P events (number of instructions retired) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        samples  %        image name               app name                 symbol name
4122385  62.5654  4860169  58.3687  squeak                   squeak                   interpret
447635    6.7937  715498    8.5928  anon (tgid:6321 range:0xb1c91000-0xb7bf0000) squeak                   (no symbols)
224375    3.4053  412666    4.9560  squeak                   squeak                   exuperyCreateContext
157809    2.3951  269117    3.2320  squeak                   squeak                   exuperyIsNativeContext
126427    1.9188  230642    2.7699  squeak                   squeak                   allocateheaderSizeh1h2h3doFillwith
96316     1.4618  107350    1.2892  squeak                   squeak                   sweepPhase
87506     1.3281  47304     0.5681  squeak                   squeak                   lookupMethodInClass
53014     0.8046  71782     0.8621  squeak                   squeak                   markAndTrace
52262     0.7932  84112     1.0102  squeak                   squeak                   exuperySetupMessageSend
51999     0.7892  76301     0.9163  squeak                   squeak                   exuperyCallMethod
50920     0.7728  84568     1.0156  squeak                   squeak                   instantiateContextsizeInBytes
47231     0.7168  31047     0.3729  no-vmlinux               no-vmlinux               (no symbols)
42841     0.6502  52130     0.6261  MiscPrimitivePlugin      MiscPrimitivePlugin      primitiveStringHash
42560     0.6459  77447     0.9301  squeak                   squeak                   activateNewMethod

Only 14% of the time is going into code compiled by Exupery and its
helper functions. 62% of the time is still in the main interpreter
loop. Interestingly, the ratio between time in native code and time in
exuperyCreateContext is the same as in the send benchmark, so it's likely
that the native code is mostly doing send processing, either being called
from interpreted code or sending to compiled code. The native code is
executing 1.6 instructions per cycle; the CPU maxes out at 4
instructions per cycle. 1.6 instructions per cycle would be excellent
for an Athlon, but Cores are more efficient. It's still good though.

Half of the time spent in exuperySetupMessageSend is going to
dispatching to unhandled primitives; the other half is
going to sends to interpreted code.

There are a few obvious things to do to improve performance:
 * Implement more addressing modes
 * Natively compile calls to C primitives
 * Implement the ^ true, ^ false, and ^ nil primitives
 * Remove jumps to jumps

Implementing more addressing modes looks the most promising. It should
speed up most of the benchmarks, as all but the bytecode benchmark
spend significant time in code that suffers badly from a single
missing addressing mode, especially object creation code and
send/return code.

The current send optimisation is PICs which only work when sending
from compiled code to compiled code. Sends to and from interpreted
code are about the same speed or a little slower than interpreted to
interpreted sends. It's true that this can be avoided by compiling
more methods so most sends are compiled to compiled but it's much
easier to decide what to compile if compiling anything is likely
to lead to a speed improvement and not risk a speed loss.

Compiling the call to the primitive function into native code will
allow the primitives to be dispatched via PIC instead of needing to go
through exuperySetupMessageSend. Half of the calls to
exuperySetupMessageSend in the compiler benchmark are for primitives,
in the large explorers benchmark three quarters of the calls are for
primitives. That time will disappear. Evaluating blocks uses a
primitive send which takes a large proportion of the block dispatch
time. 

There's a handful of primitives that are implemented inside the main
interpret loop. ^ true, ^ false, and ^ nil are some of them. They
often show up when they fail to inline, as Exupery cannot yet compile
them. If code uses them, then compiling it will cause a large time
loss due to using a full primitive dispatch, which the interpreter
avoids. Given how simple they are, implementing them makes sense.

Exupery can create code that jumps directly to an unconditional
jump. This does happen in some inner loops. The jumps should be
modified to go to the target jump's destination. Jumping to a jump
makes the CPU's front end's life difficult. In the compiler benchmark
only for 9% of the time are the reservation stations full, which
indicates that for most of the time the front end can not keep up with
instruction execution.
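
A minimal sketch of the jump-to-jump fix, written in Python rather than
Smalltalk. The block representation (a dictionary of label to a list of
(opcode, operand) tuples) and the function names are assumptions made
for illustration; they are not Exupery's real intermediate form.

```python
# Hypothetical representation: each block is a list of (opcode, operand)
# tuples. This is not Exupery's real intermediate form.
JUMP_OPS = {"jmp", "jnz"}


def final_target(blocks, label):
    """Follow chains of blocks that contain nothing but an unconditional jmp."""
    seen = set()
    while label not in seen:  # the seen set stops us looping on jump cycles
        seen.add(label)
        body = blocks.get(label, [])
        if len(body) == 1 and body[0][0] == "jmp":
            label = body[0][1]
        else:
            break
    return label


def thread_jumps(blocks):
    """Return a copy of blocks where every jump skips jmp-only blocks."""
    return {
        name: [
            (op, final_target(blocks, arg)) if op in JUMP_OPS else (op, arg)
            for op, arg in body
        ]
        for name, body in blocks.items()
    }


blocks = {
    "loop": [("cmp", "eax"), ("jnz", "hop")],
    "hop": [("jmp", "exit")],
    "exit": [("ret", None)],
}
threaded = thread_jumps(blocks)
print(threaded["loop"])
```

After the pass the conditional jump in "loop" goes straight to "exit",
so the front end no longer has to fetch the intermediate jump.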

Here's an example of the kind of code that's commonly generated with
addressing mode problems. This example is from the method return
sequence. Every compiled method goes through a block like this when
returning:
  (block24
    (mov #nilObj eax)
    (mov (eax) eax)
    (mov eax (8 ecx))
    (mov #activeContext eax)
    (mov ebx (eax))
    (mov #youngStart eax)
    (mov #activeContext ebx)
    (mov (ebx) ebx)
    (cmp (eax) ebx)
    (jumpUnsignedGreaterEqualThan block25)
    (mov #activeContext eax)
    (mov (eax) eax)
    (mov (eax) ebx)
    (mov 1073741824 eax)
    (and ebx eax)
    (jnz block25)
    (mov 2400 eax)
    (mov #rootTableCount ecx)
    (cmp eax (ecx))
    (jumpSignedGreaterEqualThan block26)
    (jmp block27)
   )

The problem is instructions like "(mov #nilObj eax)": the address
should be encoded in the memory access that uses it. There's no
need to move an address into a register before using it. There are
other problems besides not handling literal indirect addressing, but
the literal indirect problem is the largest by a long shot.
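
A rough Python sketch of the simplest case of that rewrite, assuming
instructions are (opcode, source, destination) tuples of strings as in
the listing above. It only folds a literal load followed by a load
through the same register into that same register, because then the
register is overwritten and no liveness information is needed. The
"(#nilObj)" operand syntax and the function name are invented for the
example.

```python
def fold_literal_loads(instructions):
    """Merge (mov #lit r) (mov (r) r) into a single (mov (#lit) r)."""
    out = []
    i = 0
    while i != len(instructions):
        ins = instructions[i]
        nxt = instructions[i + 1] if i + 1 != len(instructions) else None
        if (
            nxt is not None
            and ins[0] == "mov"
            and nxt[0] == "mov"
            and str(ins[1]).startswith("#")      # first mov loads a literal address
            and nxt[1] == "(" + ins[2] + ")"     # second mov reads through that register
            and nxt[2] == ins[2]                 # and overwrites it, so the address is dead
        ):
            out.append(("mov", "(" + ins[1] + ")", ins[2]))
            i += 2
        else:
            out.append(ins)
            i += 1
    return out


code = [
    ("mov", "#nilObj", "eax"),
    ("mov", "(eax)", "eax"),
    ("mov", "eax", "(8 ecx)"),
]
folded = fold_literal_loads(code)
print(folded)
```

Stores such as "(mov #activeContext eax) (mov ebx (eax))" are left
alone by this sketch, since folding them is only safe when the address
register is not used afterwards.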

I'm going to add literal indirect addressing first as it's harder to
estimate what it'll do to overall performance but it is a problem for
almost all the benchmarks.

It would also be worthwhile improving the profiling tools. It should
be relatively easy to get oprofile to show the compiled method names
instead of lumping all compiled code into the "anon" memory bucket.
It would also be worthwhile and easy to write some code to read the
oprofile files and compute the ratios rather than calculate them by
hand. 
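
As a starting point, here is a small Python sketch that computes the two
figures quoted above from report lines. It assumes the text layout of
the profile shown earlier (cycle samples, cycle %, instruction samples,
instruction %, then names) and that both events use the same sample
count, so the ratio of samples equals instructions per cycle. The
function names are made up for the example, and it reads pasted text
rather than oprofile's own sample files.

```python
# Lines copied from the compilerBenchmark profile above.
REPORT = """\
4122385  62.5654  4860169  58.3687  squeak  squeak  interpret
447635    6.7937  715498    8.5928  anon (tgid:6321 range:0xb1c91000-0xb7bf0000) squeak  (no symbols)
224375    3.4053  412666    4.9560  squeak  squeak  exuperyCreateContext
157809    2.3951  269117    3.2320  squeak  squeak  exuperyIsNativeContext
52262     0.7932  84112     1.0102  squeak  squeak  exuperySetupMessageSend
51999     0.7892  76301     0.9163  squeak  squeak  exuperyCallMethod
"""


def parse(report):
    """Split each line into the four numeric columns plus the trailing names."""
    rows = []
    for line in report.splitlines():
        cycles, cycle_pct, instrs, _instr_pct, where = line.split(None, 4)
        rows.append(
            {
                "cycles": int(cycles),
                "cycle_pct": float(cycle_pct),
                "instrs": int(instrs),
                "where": where,
            }
        )
    return rows


def is_exupery(row):
    """Native code shows up as 'anon'; the helpers are named exupery*."""
    return row["where"].startswith("anon") or "exupery" in row["where"]


rows = parse(REPORT)
native = next(r for r in rows if r["where"].startswith("anon"))
ipc = native["instrs"] / native["cycles"]
share = sum(r["cycle_pct"] for r in rows if is_exupery(r))
print(f"native IPC {ipc:.1f}, Exupery share of cycles {share:.1f}%")
```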

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-08-08T21:38:57</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/274">
    <title>[ANN] Exupery 0.14 is released</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/274</link>
    <description>
The major improvement is to the speed of register allocation. I've
fixed a couple of major performance problems so register allocation
takes 50% of compilation time even for the largest methods. Register
allocation now appears to take roughly linear time.

There's still plenty of room to improve compilation time. Every stage
copies the entire intermediate graph to produce the input to the next
stage, which is redundant as most stages only change a few places. The
register allocator's liveness analyser still uses Sets to represent
which variables are live rather than bit vectors.

This release can compile cascades, the last missing core language
feature. Exupery can still only compile a handful of the core
primitives; it only compiles #at: for pointer objects. Cascades were
added because they are used more in 3.10: the choice was either to
delete 2 system tests or to add cascades. I delayed the release to add
cascades.

There are a few fixes for old bugs, but this release is not noticeably
more reliable than the previous release. It should be a bit more
reliable though, especially when running the new Exupery VM. The new
Exupery VM just has a single bug fix in it.

Here's the benchmarks:
  arithmaticLoopBenchmark 414 compiled  94 ratio: 4.404
  bytecodeBenchmark       726 compiled 264 ratio: 2.750
  sendBenchmark           707 compiled 454 ratio: 1.557
  doLoopsBenchmark        388 compiled 398 ratio: 0.975
  pointCreation           433 compiled 423 ratio: 1.024
  largeExplorers          257 compiled 258 ratio: 0.996
  compilerBenchmark       248 compiled 249 ratio: 0.996 
  Cumulative Time         419 compiled 275 ratio: 1.519

  ExuperyBenchmarks&gt;&gt;arithmeticLoop              105ms
  SmallInteger&gt;&gt;benchmark                        362ms
  InstructionStream&gt;&gt;interpretExtension:in:for: 6051ms
  Average 612.691

The key benchmark for this release is the last one, compiling
interpretExtension:in:for:, which now only takes 6 seconds. With
the previous version of the compiler it took over 2 minutes
to compile.

The changes in times in the other benchmarks are mostly due to
my upgrading from an Athlon 64 2.2GHz to a Core 2 3.0GHz. The
register allocator should be slightly more efficient, especially
when compiling send-heavy code.

Reliability and compile time performance have dominated the
last few releases. Now run time performance and reliability are
the biggest issues. Exupery can still crash after about an hour's
active use, depending on what's being done.

Bryce

P.S. I don't think it'll be that useful to have VMs for all
platforms. The old VM 0.12 should be fine for messing around.
The new VM would be useful if anyone felt like trying to capture
bugs though.  

If anyone wants to rebuild the VM for non-linux platforms, I'll delay
announcing the release for a few days. I doubt it's worth the effort
of re-creating a VM build environment though. I suspect that the major
remaining bug has to do with de-compilation of contexts and may
require VM changes to fix.
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-07-27T21:25:58</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/273">
    <title>Exupery update preparing for the 0.14 release</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/273</link>
    <description>
The Exupery 0.14 release is in final testing now. The major
improvement is to register allocator performance. The register
allocator should also produce slightly better code due to the
changes. There's also now support for cascades, the last major missing
language feature.

The main gain in this release is that register allocation is much faster
on large methods. Now it takes about 50% of the compilation time for all
methods; it used to take almost all the time for large methods. This
means compilation time is now roughly linear with method size.

The major improvements for 0.14 were done by mid April. The release
has been delayed due to upgrading to 3.10. Exupery's testing uncovered
two bugs in 3.10 and four tests were failing due to changes in the
base image.

Two tests have been deleted as they were trying to compile methods
that have been removed. Those methods were involved in bugs found by
the stress test. There should be unit tests that cover any changes to
the compiler required to fix them.

Two tests were failing because of cascades. Cascades have been added
so they pass again. The alternatives were to delete the tests or to
release with failing tests on 3.10.

All that's left to do before releasing is run the stress test again
and do some Seaside development with Exupery running. Both were done
successfully before upgrading to the latest squeak-dev 3.10 images.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-06-15T19:59:50</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/272">
    <title>Re: Problem with ColouringRegisterAllocator</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/272</link>
    <description>The bug was in my code. I had:

machine
^ machine ifNil: [ MachineX86 new]

instead of:

machine
^ machine ifNil: [ machine := MachineX86 new]
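
The same mistake sketched in Python, for anyone who doesn't read
Smalltalk. The class names are made up, and Machine stands in for
MachineX86. The broken accessor builds a fresh object on every call
because it never stores it, so anything that relies on identity sees a
different machine (and different registers) each time.

```python
class Machine:
    """Stand-in for MachineX86."""


class BrokenHolder:
    def __init__(self):
        self._machine = None

    def machine(self):
        # Bug: the new Machine is returned but never remembered.
        return self._machine if self._machine is not None else Machine()


class FixedHolder:
    def __init__(self):
        self._machine = None

    def machine(self):
        # Fix: store the lazily created Machine before returning it.
        if self._machine is None:
            self._machine = Machine()
        return self._machine


broken = BrokenHolder()
fixed = FixedHolder()
print(broken.machine() is broken.machine())
print(fixed.machine() is fixed.machine())
```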

2008/6/2  &lt;bryce&lt; at &gt;kampjes.demon.co.uk&gt;:



</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2008-06-05T09:32:57</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/271">
    <title>Problem with ColouringRegisterAllocator</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/271</link>
    <description> &gt; in method addInterferenceEdge: firstRegister to: secondRegister
 &gt; 
 &gt; is stumbled upon error: key not found when it does:
 &gt; 
 &gt; secondNode := interferenceGraph at: secondRegister.
 &gt; 
 &gt; an interferenceGraph keys  = a Set(t9 t11 t14 t10 t2 t24 t22 t19 t25
 &gt; t3 t8 t15 eax t20 t21 t16 t11 edx t17 t6 t3 t26 t12 t2 t4 t21 ecx t16
 &gt; t12 t9 t23 t1 t5 t20 t13 t15 t14 t5 t27 t1 t6 t19 eax t13 t4 t17 t18
 &gt; t28 t7 t8 t18 t10 t7)
 &gt; 
 &gt; and secondRegister = eax.
 &gt; 
 &gt; It looks like MedMachineRegister needs #= and #hash methods to make
 &gt; things working correctly with dictionaries.
 &gt; 
 &gt; I'll try add these methods, lets see if issue will disappear.

There should only ever be one instance of each register. The bug is
creating the second one. That said, you may find it easier to
implement = and hash than to track it down. Using identity is a
deliberate design decision.

Having multiple versions of the same item may cause problems. I
think you may be OK with duplicate registers but am not sure.
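
To illustrate the identity point, a small Python sketch under the
assumption that registers are handed out by a single factory. The
names Register, Machine and register_named are hypothetical and only
mirror 'machine registerNamed: #eax'; they are not the real classes.
Register defines no equality or hash of its own, so a dictionary
compares registers by identity, and a second "eax" object is simply a
different key.

```python
class Register:
    """Compared by identity: no __eq__ or __hash__ is defined on purpose."""

    def __init__(self, name):
        self.name = name


class Machine:
    def __init__(self):
        self._registers = {}

    def register_named(self, name):
        # Create each register once and hand back the same object afterwards.
        if name not in self._registers:
            self._registers[name] = Register(name)
        return self._registers[name]


machine = Machine()
eax = machine.register_named("eax")
interference_graph = {eax: set()}

same = machine.register_named("eax")   # the one true eax
duplicate = Register("eax")            # a second object with the same name

print(same in interference_graph)
print(duplicate in interference_graph)
```

The lookup with the duplicate is the "key not found" case: the name
matches, but the object is not the one stored in the graph.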

 &gt; Btw, i don't know why or where it creates another instance of same
 &gt; register (MedMachineRegister name: #eax), in my code i using:
 &gt; 'machine registerNamed: #eax', which should create instance only once,
 &gt; and then return same instance for consequent calls.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-06-02T20:11:34</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/270">
    <title>Problem with ColouringRegisterAllocator</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/270</link>
    <description>In method addInterferenceEdge: firstRegister to: secondRegister

I stumbled upon the error "key not found" when it does:

secondNode := interferenceGraph at: secondRegister.

an interferenceGraph keys  = a Set(t9 t11 t14 t10 t2 t24 t22 t19 t25
t3 t8 t15 eax t20 t21 t16 t11 edx t17 t6 t3 t26 t12 t2 t4 t21 ecx t16
t12 t9 t23 t1 t5 t20 t13 t15 t14 t5 t27 t1 t6 t19 eax t13 t4 t17 t18
t28 t7 t8 t18 t10 t7)

and secondRegister = eax.

It looks like MedMachineRegister needs #= and #hash methods to make
things work correctly with dictionaries.

I'll try adding these methods; let's see if the issue disappears.

Btw, I don't know why or where it creates another instance of the same
register (MedMachineRegister name: #eax). In my code I'm using
'machine registerNamed: #eax', which should create the instance only once
and then return the same instance for subsequent calls.

</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2008-06-01T18:25:06</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/269">
    <title>Re: Register spilling mechanism</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/269</link>
    <description> &gt; Don't worry about it. I'm planning to overthrow the rule of C and make
 &gt; every bit in VM be implemented as native methods
 &gt; which can be compiled at run time. :)
 &gt; So, i care little about what calling convention C or other libraries having.
 &gt; All foreign calls will be handled by FFI class(es) (compiled as
 &gt; anything else as well), while inside i'll have methods compiled to
 &gt; machine code and i'm free to choose any calling convention for them,
 &gt; as well as choose own stack layout which will be convenient for GC and
 &gt; object memory.
 &gt; That's why i asked how i can control these aspects with Exupery.

If a signal comes in from the OS, then the signal will be executed on
the machine (C) stack, so the stack pointer must always be correct. If
not, you risk having your C stack trashed by the signal.

 &gt; &gt;
 &gt; &gt;   &gt; Can be there any contracts between Exupery and caller, where i can
 &gt; &gt;   &gt; state that code should compile under certain rules, like:
 &gt; &gt;   &gt; - specify a set of registers, which should be preserved after given
 &gt; &gt;   &gt; code done working
 &gt; &gt;   &gt; - specify, what registers code expects to be preserved when doing calls
 &gt; &gt;   &gt; - specify, what registers should be reverted just before ret/non-local
 &gt; &gt;   &gt; jump instructions
 &gt; &gt;
 &gt; &gt;  It would be possible to control what you want to, but there's no
 &gt; &gt;  published interface. The intermediate languages will change as
 &gt; &gt;  Exupery evolves.
 &gt; &gt;
 &gt; 
 &gt; Thank you for explanation. I will look forward for any updates
 &gt; concerning these features.
 &gt; Maybe, as temporary solution, i can patch register allocator to get
 &gt; some feedback on how many stack space i need, and then
 &gt; re-run it again.
 &gt; Or i can try to measure maximum number of live temps in code by own.

The best bet would be to read the final stack height from the register
allocator. Though for now it'll always be 64 bytes big. Look
at SpillInserter and MachineX86&gt;&gt;newSpillSlot. That's the code
responsible for managing the stack.
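
A toy Python model of a fixed-size spill area, only to show the idea
of reading the final height once allocation has finished. The 64 bytes
come from the paragraph above; the 4-byte slot width, the class and
the method names are assumptions and are not the interface of
SpillInserter or MachineX86&gt;&gt;newSpillSlot.

```python
class SpillFrame:
    """Fixed-size spill area that hands out slot offsets in order."""

    def __init__(self, size_bytes=64, slot_bytes=4):
        self.slot_bytes = slot_bytes
        self.capacity = size_bytes // slot_bytes
        self.used = 0

    def new_spill_slot(self):
        # Running out of slots models the compile failing for lack of stack.
        if self.used == self.capacity:
            raise RuntimeError("out of spill space, compilation fails")
        offset = self.used * self.slot_bytes
        self.used += 1
        return offset

    def height(self):
        """Bytes of spill space actually used so far."""
        return self.used * self.slot_bytes


frame = SpillFrame()
offsets = [frame.new_spill_slot() for _ in range(3)]
print(offsets, frame.height())
```

A caller that wants an exact frame size could run allocation once,
read height(), and then emit the prologue with that value.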

Also look at the loadStack and the storeStack instructions as they
generate the Smalltalk stack loading and storing instructions in the
register allocator after all spills have been made. This is to avoid
loading spilled values from the Smalltalk stack.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-04-24T20:16:37</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/268">
    <title>Re: Register spilling mechanism</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/268</link>
    <description>2008/4/24  &lt;bryce&lt; at &gt;kampjes.demon.co.uk&gt;:

Don't worry about it. I'm planning to overthrow the rule of C and have
every bit of the VM implemented as native methods
which can be compiled at run time. :)
So I care little about which calling convention C or other libraries use.
All foreign calls will be handled by FFI class(es) (compiled like
everything else), while inside I'll have methods compiled to
machine code and I'm free to choose any calling convention for them,
as well as my own stack layout, one that is convenient for the GC and
object memory.
That's why I asked how I can control these aspects with Exupery.


Thank you for the explanation. I look forward to any updates
concerning these features.
Maybe, as a temporary solution, I can patch the register allocator to get
some feedback on how much stack space I need, and then
run it again.
Or I can try to measure the maximum number of live temps in the code myself.




</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2008-04-24T00:08:09</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/267">
    <title>Re: Register spilling mechanism</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/267</link>
    <description> &gt; 2008/4/22  &lt;bryce&lt; at &gt;kampjes.demon.co.uk&gt;:
 &gt; &gt;
 &gt; &gt; Igor Stasenko writes:
 &gt; &gt;   &gt; More broader explanation why i need this..
 &gt; &gt;   &gt; I want to allow direct stack manipulation for native methods to some extent.
 &gt; &gt;   &gt;
 &gt; &gt;   &gt; In code, one can write:
 &gt; &gt;   &gt;
 &gt; &gt;   &gt; (object1 expression1) push.
 &gt; &gt;   &gt; (object2 expression2) push.
 &gt; &gt;   &gt; address call.
 &gt; &gt;   &gt;
 &gt; &gt;   &gt; Now i need to make sure that Exupery will produce correct stack frame
 &gt; &gt;   &gt; for a call, regardless what code inlined in expression1/expression2
 &gt; &gt;   &gt; it should not interfere with top stack layout, which should be:
 &gt; &gt;   &gt;
 &gt; &gt;   &gt; &lt;result of expression1&gt;
 &gt; &gt;   &gt; &lt;result of expression2&gt;
 &gt; &gt;   &gt; &lt;return address&gt;
 &gt; &gt;   &gt;
 &gt; &gt;   &gt; when entering routine at (address call).
 &gt; &gt;
 &gt; &gt;  Are you talking about the C stack or the Smalltalk stack?
 &gt; &gt;
 &gt; &gt;  If you're talking about the Smalltalk stack then in Exupery it's only
 &gt; &gt;  dealt with in the front half of Exupery. It's handled in the
 &gt; &gt;  ByteCodeReader and IntermediateSimplifier. Exupery uses exactly the
 &gt; &gt;  same stack locations as the interpreter so doesn't need to check the
 &gt; &gt;  stack height. It'll break for exactly the same methods that will cause
 &gt; &gt;  the interpreter to crash. Exupery does model the stack in both classes
 &gt; &gt;  so it would be fairly easy to monitor stack height if you were
 &gt; &gt;  interested.
 &gt; &gt;
 &gt; 
 &gt; No, forget about Smalltalk stack, i plan to use Exupery at lower
 &gt; levels to produce machine code fed by my own compiler.
 &gt; 
 &gt; &gt;  The C stack is maintained by Exupery, and is currently a fixed size
 &gt; &gt;  but will be dynamically sized sometime in the future. Exupery will
 &gt; &gt;  fail to compile if it uses more C stack than exists. The amount of C
 &gt; &gt;  stack used isn't known until after register allocation because that's
 &gt; &gt;  where registers are spilled to.
 &gt; &gt;
 &gt; 
 &gt; I don't know what you mean under 'C stack'.
 &gt; Isn't Exupery compiles directly to machine code? Then why you speaking
 &gt; about C stack at all? ;)

The machine stack, which C also uses. The stack follows C's
conventions for calls and returns. It's the stack that the C
in the interpreter uses.

 &gt; As i understand (correct me if i'm wrong) by fixed size stack you
 &gt; mean, that after done register allocation, Exupery determines the
 &gt; exact size of stack, and then it inserts an instruction in method's
 &gt; preamble to allocate it, like:
 &gt; 
 &gt; push %ebp
 &gt; mov %ebp, %esp
 &gt; sub %esp, stacksize
 &gt; 
 &gt; and at return point it does reverse:
 &gt; 
 &gt; add %esp, stacksize
 &gt; pop %ebp
 &gt; ret
 &gt; 
 &gt; or, does just 'ret' , in case if caller are responsible from cleaning
 &gt; the stack and can restore %ebp.

It generates sequences like you suggest, but they're generated
as low level intermediate and then instruction selected before
register allocation. So the size is set before the number of
spills is known. I've been using a fixed size stack frame as
a temporary measure. It's been good enough for now; there isn't
a great deal of pressure on that memory, especially as Exupery
can now spill directly into the context frame.

If you're planning on generating C style calls in code be careful.
If a GC happens as part of the call any object may be moved. Exupery
doesn't really save the C (machine) stack because it doesn't have any
state at any time it would risk making calls, it moves all the state
back into real objects so the GC can see it.

 &gt; Can be there any contracts between Exupery and caller, where i can
 &gt; state that code should compile under certain rules, like:
 &gt; - specify a set of registers, which should be preserved after given
 &gt; code done working
 &gt; - specify, what registers code expects to be preserved when doing calls
 &gt; - specify, what registers should be reverted just before ret/non-local
 &gt; jump instructions

It would be possible to control what you want to, but there's no
published interface. The intermediate languages will change as
Exupery evolves.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-04-23T21:34:21</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/266">
    <title>Re: Register spilling mechanism</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/266</link>
    <description>2008/4/22  &lt;bryce&lt; at &gt;kampjes.demon.co.uk&gt;:

No, forget about the Smalltalk stack. I plan to use Exupery at lower
levels to produce machine code fed by my own compiler.


I don't know what you mean by 'C stack'.
Doesn't Exupery compile directly to machine code? Then why are you speaking
about a C stack at all? ;)

As I understand it (correct me if I'm wrong), by fixed size stack you
mean that after register allocation is done, Exupery determines the
exact size of the stack, and then inserts an instruction in the method's
preamble to allocate it, like:

push %ebp
mov %ebp, %esp
sub %esp, stacksize

and at the return point it does the reverse:

add %esp, stacksize
pop %ebp
ret

or does just 'ret', in case the caller is responsible for cleaning up
the stack and can restore %ebp.

And by dynamic stack size, you mean that instead of preallocating a
fixed number of bytes on the stack, in future it can use pushes/pops in
code based on demand?

I have nothing against either model; I just need a safe way to
manipulate the stack directly without the register
allocator corrupting the stack layout the developer expects.

Let me elaborate on the problem as I see it.
Suppose my compiler generates code in the form we call 'Exupery
low-level intermediate'.
So there are 'virtual' machine instructions arranged in blocks and
using N temporary registers.
Now, after the register allocator has done its job, we know that we need
an extra K bytes on the stack to hold temporary values which do not fit in
CPU registers at some points in the code.
The instruction selector must then use, for those particular temps,
instructions that store and read values on the stack.
But there is also a need to generate instructions to preallocate the
stack space to hold these values and deallocate it at some point
where they are no longer needed.
Now, who controls these actions? And can an outer subsystem
influence where to put the stack allocation instruction, for instance?

Can there be any contracts between Exupery and the caller, where I can
state that code should be compiled under certain rules, like:
- specify a set of registers which should be preserved after the given
code has finished
- specify which registers the code expects to be preserved when doing calls
- specify which registers should be restored just before ret/non-local
jump instructions




</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2008-04-22T22:56:53</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/265">
    <title>Re: Register spilling mechanism</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/265</link>
    <description> &gt; More broader explanation why i need this..
 &gt; I want to allow direct stack manipulation for native methods to some extent.
 &gt; 
 &gt; In code, one can write:
 &gt; 
 &gt; (object1 expression1) push.
 &gt; (object2 expression2) push.
 &gt; address call.
 &gt; 
 &gt; Now i need to make sure that Exupery will produce correct stack frame
 &gt; for a call, regardless what code inlined in expression1/expression2
 &gt; it should not interfere with top stack layout, which should be:
 &gt; 
 &gt; &lt;result of expression1&gt;
 &gt; &lt;result of expression2&gt;
 &gt; &lt;return address&gt;
 &gt; 
 &gt; when entering routine at (address call).

Are you talking about the C stack or the Smalltalk stack?

If you're talking about the Smalltalk stack then in Exupery it's only
dealt with in the front half of Exupery. It's handled in the
ByteCodeReader and IntermediateSimplifier. Exupery uses exactly the
same stack locations as the interpreter so doesn't need to check the
stack height. It'll break for exactly the same methods that will cause
the interpreter to crash. Exupery does model the stack in both classes
so it would be fairly easy to monitor stack height if you were
interested.

The C stack is maintained by Exupery, and is currently a fixed size
but will be dynamically sized sometime in the future. Exupery will
fail to compile if it uses more C stack than exists. The amount of C
stack used isn't known until after register allocation because that's
where registers are spilled to.
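
A rough illustration of why the figure is only available that late (a toy
sketch, not Exupery code: the live ranges, the register count and the
peak-pressure estimate are all assumptions made up for the example). The
number of spill slots depends on how many values are live at once, which is
only settled once allocation has run:

```smalltalk
"Toy estimate, not Exupery's allocator: values live at once, minus registers."
| intervals registers peak live |
intervals := #( #(1 6) #(2 4) #(3 8) #(5 7) ).   "hypothetical (start end) live ranges"
registers := 2.                                  "hypothetical register count"
peak := 0.
1 to: 8 do: [:point |
    live := (intervals select: [:each |
        point between: each first and: each last]) size.
    peak := peak max: live].
(peak - registers) max: 0    "1 with these made-up ranges"
```

The real number comes from the allocator's own spill decisions, so this only
shows the shape of the dependency.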

It's also possible to spill directly into the context (into the
Smalltalk stack) which is done for stack registers. That's new in
0.14 and will be extended to temporaries as well.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-04-22T20:39:13</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/264">
    <title>Re: Debugging a bug that wasn't.</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/264</link>
    <description>Igor Stasenko writes: 
 &gt; Okay. So the problem is that forcing a garbage collection after every
 &gt; single allocation is too stressful for your tests.
 &gt; Then why not divide the test into parts: one for testing GC, another
 &gt; for testing interrupts that cause active process switching?

There's no bug hence the title of this thread. For a week I thought
there was a bug because with one instance it was appearing and
disappearing based on whether I compiled the code or not. That was
just a consequence of a very close race.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-04-22T19:19:07</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/263">
    <title>Re: Register spilling mechanism</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/263</link>
    <description>A broader explanation of why I need this.
I want to allow direct stack manipulation for native methods to some extent.

In code, one can write:

(object1 expression1) push.
(object2 expression2) push.
address call.

Now I need to make sure that Exupery will produce a correct stack frame
for the call: regardless of what code is inlined in expression1/expression2,
it should not interfere with the layout of the top of the stack, which should be:

&lt;result of expression1&gt;
&lt;result of expression2&gt;
&lt;return address&gt;

when entering the routine at (address call).

</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2008-04-22T13:01:16</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/262">
    <title>Register spilling mechanism</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/262</link>
    <description>I currently want to implement inlining for methods of a specific format
(let's call them 'native' methods).
These methods should work as ST methods (same calling convention &amp;
stack layout when entering the method), but
they don't do polymorphic sends unless explicitly specified;
instead, all sends are inlined statically by the compiler.
I don't want to go into much detail here, I just want your help with
how to deal with register spilling when inlining.

Suppose I am inlining a send like:

object message: a+b with: c+d with: a+d with: ....

so all arguments to the method should now use machine temporary registers
instead of the stack, because that is more efficient.

What I fear is that if the register allocator sees that there are not
enough registers in the CPU to fit them all, it starts using the stack for
storing temp values,
and there are some situations where I need to know the exact stack depth
for a method.
Is there any way in Exupery to find out how many registers will be
spilled to the stack, so this information can be used by my compiler?


</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2008-04-22T12:44:08</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/261">
    <title>Re: Debugging a bug that wasn't.</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/261</link>
    <description>2008/4/20  &lt;bryce&lt; at &gt;kampjes.demon.co.uk&gt;:

Okay. So the problem is that forcing a garbage collection after every
single allocation is too stressful for your tests.
Then why not divide the test into parts: one for testing GC, another
for testing interrupts that cause active process switching?
</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2008-04-21T22:18:01</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/260">
    <title>Re: Debugging a bug that wasn't.</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/260</link>
    <description> &gt; The GC signals a semaphore after finishing (I don't remember whether
 &gt; the weak finalization process uses it or a different one).
 &gt; I'm not sure: can it somehow interfere in this case?
 &gt; 
 &gt; If the weak finalization loop creates new objects, it can end up
 &gt; looping in there exclusively, because user processes run at lower
 &gt; priority.
 &gt; 

The problem has nothing to do with finalisation. The issue is running
a GC for every allocation slows down execution a lot. Enough that
there's a race to get through the test code before all 10 profiling
threads start profiling. If they all start profiling then they can
consume all available CPU.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-04-20T20:51:08</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/259">
    <title>Re: Debugging a bug that wasn't.</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/259</link>
    <description>2008/4/20  &lt;bryce&lt; at &gt;kampjes.demon.co.uk&gt;:

The GC signals a semaphore after finishing (I don't remember whether
the weak finalization process uses it or a different one).
I'm not sure: can it somehow interfere in this case?

If the weak finalization loop creates new objects, it can end up
looping in there exclusively, because user processes run at lower
priority.

finalizationProcess

    [true] whileTrue:
        [FinalizationSemaphore wait.
        FinalizationLock critical:
            [FinalizationDependents do:
                [:weakDependent |
                weakDependent ifNotNil:
                    [weakDependent finalizeValues.
                    "***Following statement is required to keep weakDependent
                    from holding onto its value as garbage.***"
                    weakDependent := nil]]]
            ifError:
                [:msg :rcvr | rcvr error: msg].
        ].

See, if FinalizationSemaphore has excess signals after executing the
cycle once, it continues with execution.
I think it is better to place FinalizationSemaphore initSignals at the end
of weak finalization, so it won't be triggered by a garbage
collection that may happen inside the weak finalization routines.
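
A small workspace sketch of that effect (an illustration only, assuming
Squeak's Semaphore protocol with isSignaled and initSignals; the local
semaphore stands in for FinalizationSemaphore and the count of three
signals is made up):

```smalltalk
"Illustration only: sem stands in for FinalizationSemaphore."
| sem passes |
sem := Semaphore new.
3 timesRepeat: [sem signal].     "e.g. three GCs finished before the loop woke up"
passes := 0.
[sem isSignaled] whileTrue:
    [sem wait.                   "returns at once, an excess signal is pending"
    passes := passes + 1].
passes.                          "3: one extra pass per stored signal"

3 timesRepeat: [sem signal].
sem wait.                        "one pass of the loop"
sem initSignals.                 "drop the signals that piled up meanwhile"
sem isSignaled                   "false: the next wait blocks until a new GC"
```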
</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2008-04-20T13:25:45</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/258">
    <title>Debugging a bug that wasn't.</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/258</link>
    <description>
I'm testing and debugging before the 0.14 release. The major benefit
is faster compile times and better generated code. There's around a
10% gain or loss for the macro benchmarks depending on what's
compiled.

The bug was triggered by the below code. Originally it was compiling
"^[42] value". The set-up shown below is what generates the fault
without needing to compile the code. The fault was odd because it
wasn't crashing, and wasn't stuck in an infinite loop but the image
had locked up. 

I was investigating by interrupting execution with gdb and looking at
both the Smalltalk stack using "p printCallStack()" and the C stack
using "where". Normally, if compiled code was corrupting the object
memory it would crash fairly quickly. That it kept executing, which
could be seen because the Smalltalk stack kept changing, was weird. If
the GC's got a corrupt view of the object memory then normally it'll
end up really corrupting it when it decides to try to interpret the
middle of an object as an object header.

The "Error signal." at the front of the test is because running the
test will lock up the image. This allows the rest of the suite to
be run.

    testInterruptCausesCrashes
        "Crashing is a failure"
        | processes |
        Error signal.
        self compileExpression: 'self createPoint'.
        processes := (1 to: 10) collect:
            [:each |
            [[(ExuperyProfiler spyOn: [(Delay forMilliseconds: 500) wait])
                queueCompilationCommandsOn: SharedQueue2 new] repeat] fork].
        "Force a garbage collect on every allocation"
        SmalltalkImage current vmParameterAt: 5 put: 1.
        10000 timesRepeat: [self example].
        Exupery log: 'after running example'.
        "Reset garbage collects to a sensible value"
        SmalltalkImage current vmParameterAt: 5 put: 4000.
        processes do: [:each | each terminate].

    createPoint
        "^ 10 @ 20"

What I think is happening is the 10 profiling processes end up
consuming all the time and starving the user level process so it
never completes and never resets the garbage collector's collection
rate to a sane value. Having produced the lockup twice without any
compiled code, I'm reasonably happy that this is the case.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-04-20T09:38:45</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/257">
    <title>Deciding how to fix a bug.</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/257</link>
    <description>
I've started the final release testing for Exupery 0.14. There are two
bugs captured with tests so far. The first has something to do with
interrupts; it needs a few running profilers to cause the bug to
happen. I'm hoping this one is the source of the unrepeatable bugs
that were in the previous release; if so, solving it may allow
Exupery to run for much longer as a background compiler.

The second bug is caused by the current work. What's happening is the
currentContext register is being spilled but is being used by code
generated in the register allocator to load the stack. It may also be
needed by spill code to spill back into the stack.

The simplest fix is to avoid spilling the currentContext. I doubt that
spilling the currentContext is a good idea as it'll make spills of
stack registers much more complex. Not spilling currentContext is
probably the right thing to do unless the spill heuristics are
improved to avoid spilling currentContext then spilling many stack
registers that will use it to load themselves from the stack.

A theoretically better fix would be to get the register allocator to
reload the currentContext if required when generating spill code that
needs it. This is nicer as it doesn't require making another register
special. There are a few practical problems to make sure that this
doesn't lead to very bad code when we decide to spill currentContext
then spill other registers. Also, the only gain from handling
currentContext as a real register is allowing it to be spilled when
appropriate, which could be very nice if it was a special register that
would be loaded from its real location rather than spilled from the
stack.

My feeling is the right solution for now is to avoid spilling
currentContext and leave the better solution until a time when it's
appropriate to develop a more complete solution including both proper
spill heuristics and reloading spilled registers from the
interpreter's variables.

The background compiler compiled over 1200 methods before running into
one where the currentContext was spilled. So at worst, forcing it into
a register will only make 1 method in a thousand worse, and may make
it better.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-04-12T21:40:47</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/256">
    <title>Floating point support</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/256</link>
    <description> &gt; Hello Bryce,
 &gt; 
 &gt; I'd like to ask how you envision adding floating point support
 &gt; to Exupery
 &gt; so that it can use the FPU.
 &gt; 
 &gt; The problem I see is that it would require adding a type to registers
 &gt; to be able to use float-type values in the intermediate code
 &gt; representation.
 &gt; 
 &gt; Otherwise, what would it look like when I need to encode floating point
 &gt; operations like:
 &gt; 
 &gt; temp1 := floatValue1.
 &gt; temp2 := floatValue2.
 &gt; temp3 := temp1 + temp2. "here the compiler should generate FPU
 &gt; instructions instead of integer addition"
 &gt; 
 &gt; Or maybe I'm looking at the problem from the wrong angle?
 &gt; Maybe it is easier to make the intermediate format like:
 &gt; 
 &gt; unaryFloatOp(argumentAddress, resultAddress)
 &gt; binaryFloatOp(argument1Address, argument2Address, resultAddress)
 &gt; compareFloatsOp(arg1address, arg2address)
 &gt; 
 &gt; then the compiler doesn't have to deal with float registers (in register
 &gt; spilling code and other optimization patterns).

I'll probably create floating point operations and have floating point
registers. The operations will be to make instruction selection easy,
the registers to play nicely with the register allocator. Floating
point registers are a separate register bank independent from the
integer registers.

I'm planning on supporting SSE2 first. It should be widely enough
deployed, and it's easier to deal with than the legacy x87 stack based
floating point instruction set. SSE2 is also the default in 64 bit
mode as the number of SSE2 registers was doubled from 8 to 16 but the
x87 register set was left unchanged.

 &gt; Another thing is support for byte-wide operations, like
 &gt; loading/storing a byte at a specific address. And of course being able to
 &gt; do some operations on byte-sized values.
 &gt; How do you plan to support this?

With an 8 bit mem operation. Make operations that make it easy to load
and store an 8 bit quantity. Operate on it in 32 bit registers.
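
Roughly what that means, modelled in a workspace (a sketch, not Exupery's
intermediate code; the ByteArray stands in for memory and the values are
made up):

```smalltalk
"Sketch only: a ByteArray stands in for memory, an Integer for a 32 bit register."
| memory register |
memory := ByteArray new: 4.
memory at: 1 put: 250.
register := memory at: 1.                        "8 bit load, zero extended"
register := (register + 10) bitAnd: 16rFFFFFFFF. "arithmetic at 32 bit width"
memory at: 1 put: (register bitAnd: 16rFF).      "8 bit store keeps the low byte"
memory at: 1                                     "4, i.e. 260 truncated to 8 bits"
```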

 &gt; The reason I'm asking about this is that I'm currently busy with
 &gt; this: http://wiki.squeak.org/squeak/6041
 &gt; 
 &gt; As you may remember, I had plans to build such a system before. And now I
 &gt; have spent some time pushing it to the point where it can become a reality
 &gt; :)

I remember.

Bryce
</description>
    <dc:creator>bryce&lt; at &gt;kampjes.demon.co.uk</dc:creator>
    <dc:date>2008-03-16T21:37:00</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/255">
    <title>Floating point support</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.smalltalk.exupery/255</link>
    <description>Hello Bryce,

I'd like to ask how you envision adding floating point support
to Exupery
so that it can use the FPU.

The problem I see is that it would require adding a type to registers
to be able to use float-type values in the intermediate code
representation.

Otherwise, what would it look like when I need to encode floating point
operations like:

temp1 := floatValue1.
temp2 := floatValue2.
temp3 := temp1 + temp2. "here the compiler should generate FPU
instructions instead of integer addition"

Or maybe I'm looking at the problem from the wrong angle?
Maybe it is easier to make the intermediate format like:

unaryFloatOp(argumentAddress, resultAddress)
binaryFloatOp(argument1Address, argument2Address, resultAddress)
compareFloatsOp(arg1address, arg2address)

then the compiler doesn't have to deal with float registers (in register
spilling code and other optimization patterns).

Another thing is support for byte-wide operations, like
loading/storing a byte at a specific address. And of course being able to
do some operations on byte-sized values.
How do you plan to support this?

The reason I'm asking about this is that I'm currently busy with
this: http://wiki.squeak.org/squeak/6041

As you may remember, I had plans to build such a system before. And now I
have spent some time pushing it to the point where it can become a reality
:)

</description>
    <dc:creator>Igor Stasenko</dc:creator>
    <dc:date>2008-03-16T13:05:06</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.lang.smalltalk.exupery">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.lang.smalltalk.exupery</link>
  </textinput>
</rdf:RDF>
