Saturday 22 March 2014

Saturday arvo elf-loader hack-a-thon

So I hacked a bit more of the elf loader today. Initially it was just documenting where I got to so I could work out where to go next. I wrote up a bit more background and detail on how it works. Then I cleared out the out of date experimental stuff and focussed on the ez_ interfaces.

But then I got a bit side-tracked ...

Compact startup

First I thought i'd see if i could replace crt0 with my own. Apart from an initialisation issue with bss and data there isn't really anything wrong with the bundled one ... apart from it dragging in a bunch of C support stuff which isn't necessary if writing small kernel code that doesn't need the full libc.

So I took the crt0.s and stripped out a bunch of stuff. The trampoline to a potentially off-core start routine (it's going to be on-core). The atexit init (not necessary). The constructors init (hopefully not necessary). The .bss clearing (it's problematic when you have possibly more than one block of bss such as shared and given that .data isn't re-initialised anyway the C language behaviour is already broken). And the argument setting (three zero's isn't useful for anything and isn't even correct).

I toyed with the idea of passing arguments to kernel but decided to just have a void main instead.

I just pass -nostartfiles e-crt0minimal.o to the link line to replace the standard start-up code.

 e-gcc -Wl,-r -o e-test-reloc.elf -nostartfiles e-crt0minimal.o e-test-reloc.o -le-lib

Worth it? Probably ...

Minimal crt0:

$ e-size e-test-workgroup-a.elf
   text    data     bss     dec     hex filename
    548      64     120     732     2dc e-test-workgroup-a.elf

Standard crt0:

$ e-size e-test-workgroup-a.elf
   text    data     bss     dec     hex filename
   1418    1196     128    2742     ab6 e-test-workgroup-a.elf

When you've only got 32K to play with saving 2K isn't to be sniffed at.

Matching addresses (aka fake hsa)

Next I looked at just using mmap to map the shared memory so pointers can be shared between the epiphany and the host arm directly. I tried to use e_get_platform_info() to get the list of memory blocks but for some odd reason that zeros out the memory array pointer? Odd. So ... I just access the struct directly via an extern instead.

This is just an implementation of stuff from a previous post but using the platform_info to find the addresses.

I have no idea whether this will work on a multi-epiphany setup but since I don't have one it's not something i'll lose sleep over :)

*COM* symbols

About this point I noticed that some symbols weren't being allocated any location in the output file and thus could not be resolved during the loader-linker execution. These were symbols marked with the section id of COMMON. I had hit this before but I had forgotten all about it. Last time I solved it using a linker script but I found I can just pass '-d' to the linker to achieve the same result which puts any such values into bss.

Automatic remote-core on-chip symbol resolution

Then I had a look into implementing fully automatic resolution of remote but on-chip symbols. The options are limited but the desired target core can be indicated by the symbol name.

For example:

program a:
   extern int buffer[12];

program b:
   // current cell relative +0, +1
   extern int buffer_0_1[12];

   // group relative 0,1
   extern int buffer$0$1[12];

Or some variation thereof. This is easy enough to parse and implement, and not too ugly to use.

But it would mean that the binary for every core would need to be linked individually and thus it wouldn't be possible to just copy the same code across to multiple cores when they share the implementation. For this reason I've dropped the idea for now. Having to use e_get_global_address() on a weak symbol isn't too difficult.

No comments: