Haxxoring the elf format for 1k/4k stuff
category: code [glöplog]
greetings to iblis!
Quote:
339bytes + (strlen(libraryname)+9) bytes per library + 4 bytes per symbol
wow! we want a paper please *_*
Yup I'll write something about it and release sauce once I've actually made a prod with it. I'd rather not disclose the source just yet because if I did someone would surely beat me to it :)
So what does import-by-hash refer to if you're storing the full names of the libraries you're importing?
Full names for the libraries, hashes of the function names you want to use. As far as I know there's no way to import the libraries themselves by hash, and it wouldn't represent much of a saving (if any) because generally you only import 2 or 3 libraries and the standard elf way of importing libraries is small anyway.
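To make the "hashes of the function names" idea concrete, here is a minimal sketch in C. The rotate-and-xor hash, the `struct export` table and both function names are assumptions made up for illustration; the thread does not say which hash is used or how the exported symbols are enumerated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit hash: rotate left by 5, then xor in the next byte.
   The real hash used by the poster is not disclosed in the thread. */
static uint32_t sym_hash(const char *name)
{
    uint32_t h = 0;
    while (*name) {
        h = (h << 5) | (h >> 27);
        h ^= (uint8_t)*name++;
    }
    return h;
}

/* One exported symbol as a loader might see it: a name and an address. */
struct export {
    const char *name;
    void *addr;
};

/* Return the address of the first export whose name hashes to `wanted`,
   or NULL if none matches. Only the 4-byte hash has to live in the intro. */
static void *resolve_by_hash(const struct export *tab, size_t n, uint32_t wanted)
{
    assert(tab != NULL);
    for (size_t i = 0; i < n; i++)
        if (sym_hash(tab[i].name) == wanted)
            return tab[i].addr;
    return NULL;
}
```

The intro would then carry one precomputed 32-bit value per function instead of its name, which matches the "4 bytes per symbol" figure quoted earlier.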
parapete: The Linux kernel makes a few automatic assumptions if the ELF header is modified into a nominally incompatible form, so you can save a few bytes (by means of a better compression rate) by changing certain informational bytes to 0x00 (the architecture information byte, for example, iirc). Changing the header's section access rights to RWX or an equivalent probably helps the compression a bit too.
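A rough sketch of the zeroing idea in C, working on a raw 32-bit ELF header. The byte offsets come from the ELF32 header layout; the choice of fields (the `e_ident` padding and the section header table fields) is an assumption of this sketch, and `zero_unused_ehdr32` is a made-up name. Which other bytes the loader tolerates has to be verified by experiment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    EHDR32_SIZE   = 52, /* size of a 32-bit ELF header */
    OFF_IDENT_PAD = 9,  /* e_ident padding, runs up to byte 15 */
    OFF_SHOFF     = 32, /* e_shoff, 4 bytes */
    OFF_SHENTSIZE = 46  /* e_shentsize, e_shnum, e_shstrndx, 2 bytes each */
};

/* Zero the fields this sketch ASSUMES are not needed to run the file:
   the identification padding and everything describing section headers.
   Long runs of zeros tend to pack better than arbitrary values. */
static void zero_unused_ehdr32(uint8_t *hdr)
{
    assert(hdr != NULL);
    memset(hdr + OFF_IDENT_PAD, 0, 16 - OFF_IDENT_PAD);
    memset(hdr + OFF_SHOFF, 0, 4);
    memset(hdr + OFF_SHENTSIZE, 0, 6);
}
```

The magic number, type, machine, entry point and program header fields are left untouched here.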
Kudos for your&leblane's research. Shall buy you guys a beer at some demoparty.
waffle, on the contrary we owe you a beer for doing 1k on Linux in the first place :)
We fiddled about with redundant bytes in the header a bit. I found that you get the best pack rate by zeroing most of them, although there might be an opportunity to store hashes in the header, I've not tested this yet though.
I'm ashamed to admit I did have a quick look at "Paeaeministeri Vaeyrynen greets Accession only" in a hex editor and it looks like you're using the traditional dlopen+dlsym technique. Is that right? Are there any more tricks at work there?
For anyone who's interested, here's my port of flow2 to Linux. 143 bytes smaller than the original windows version, and it even includes proper timer code :)
It's written in 100% asm, and the source is included. It's really meant as a proof of concept of the technique. It requires hand crafted elf headers, so an exe packer or crinkler-style replacement linker needs to be written to make the technique useful for intros written in C/C++. I was hoping to use an ld linker script to do it, but ld seems to be far less flexible than I'd hoped.
parapete, leblane: I love you, guys :)
And just to piss on leblane's bonfire a bit, I have 794 bytes now :D
1024 - 143 -749 = 132 bytes gain?
wow flow2 on linux! Very cool work.
*bump*
@leblane: thx for the flow2 source code =)
Quote:
143 bytes smaller than the original windows version
To be fair I guess you don't have to bother with all that nasty getprocaddress crap I had to deal with under windows to get the shaders up and running. Right?
I got a self compilable version of crings and it's only 585 bytes.
problem is you need gcc and glut dev... for those who are interested:
crings in 585 bytes
I'm sure this can be done better, but this was thrown together in 2 hours.
Self compiling? You might want to put that in a different thread then ;)
The self compiling technique does have potential for making tiny "executables" but I find it really impure. I guess it's only as impure as using a gzip dropper but then I plan on eradicating the need for that soon enough.
Why not write the intro in Python/Perl/etc instead of compiling C?
Deltafire: Because Python/Perl/etc is usually a few orders of magnitude slower than C. For the typical "one GL_QUAD and a huge fragment shader" 1k intros that might not matter, though.
I just had that silly idea.. why not write a perl script that expands to C code that gets compiled ;)
why not write a quine that outputs its source code and compiles itself?
Write a perl script to perform inverse BWT on your C source code!
This is the fruit of my labours from this thread (sauce included). </spam>
does the same kind of pouet topic/discussion/article exist for win32? the only thing i found on the web is this http://www.phreedom.org/solar/code/tinype/
i believe that combining this with an algo that does some LZ, decompresses the stuff in memory and then executes it would be nice (is this possible in win32, or is it easier to write the stuff to hdd and then execute it, like bat/cab do?)
seems some top 1k (like himalaya or tracie) use much more advanced techniques (like context mixing) but i have no idea how to implement this properly
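For the "some LZ, decompress in memory" part, a toy decoder can be sketched in a few lines of C. The token format below is invented for this sketch and is not the format of any packer mentioned in the thread; `lz_decode` is a hypothetical name. The sketch also trusts the input to be terminated, so it is not safe on untrusted data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a toy LZ77 stream (format made up for this sketch):
     0x00             end of stream
     0x01..0x7F       copy that many literal bytes from the input
     0x80..0xFF       match: length = (token & 0x7F) + 1,
                      the next input byte is the distance back
   Returns the number of bytes written, or 0 on bad input or overflow. */
static size_t lz_decode(const uint8_t *src, uint8_t *dst, size_t cap)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    for (;;) {
        uint8_t t = *src++;
        if (t == 0)
            return out;
        if (t < 0x80) {
            if (out + t > cap)
                return 0;
            while (t--)
                dst[out++] = *src++;
        } else {
            size_t len = (size_t)(t & 0x7F) + 1;
            size_t dist = *src++;
            if (dist == 0 || dist > out || out + len > cap)
                return 0;
            /* Copy byte by byte so an overlapping match repeats itself. */
            while (len--) {
                dst[out] = dst[out - dist];
                out++;
            }
        }
    }
}
```

Executing the decoded buffer is a separate step that needs executable memory from the OS; that part is not shown.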
I'm currently trying around a bit too; the first step was to improve the make_it_4k by-string loader from fit (it can be made smaller).
with a nice import.h & import.o you can now use the exported symbols within normal C code.
parapete... now, can we do that with import by hash too? the by string method is so 2001.
Let's make linux more interesting for 4k intros again!
Code:
bits 32
extern dlopen, dlsym
global import
global oglBegin
global oglEnd
global sdlInit
global sdlQuit

; constant definitions
RTLD_NOW equ 2

section .text
; import: walks the string table at `data`, dlopen()s every library and
; overwrites the first 4 bytes of each function name with its dlsym() address.
; Clobbers eax, ebx, ecx, edx and edi.
import:
    ; start at data-1... _lib increments first
    mov edi, dword (data-1)
_lib:
    inc edi
    mov al, byte [edi]
    ; empty string where a library name should be? -> done
    test al, al
    jz _zero
    ;--- we have a library: ebx = dlopen(edi, RTLD_NOW)
    push dword RTLD_NOW
    push edi
    call dlopen
    mov ebx, eax
    pop eax                 ; drop the two stack arguments
    pop eax
    ;--- done!
_func:
    ; go to the next string: scan for the terminating zero
    xor eax, eax
    xor ecx, ecx
    not cl                  ; ecx = 255, upper bound for the scan
    repne scasb
    mov al, byte [edi]
    test al, al
    ; two zeros found? -> new lib follows!
    jz _lib
    ;--- we have a function we want to import: eax = dlsym(ebx, edi)
    push edi
    push ebx
    call dlsym
    ; store it over the name... all function names have to be at least
    ; 4 characters long (5 with the trailing '\0') so the terminator survives
    stosd
    pop eax
    pop eax
    ;--- done - go to next func
    jmp _func
_zero:
    ret

section .data
data:
    db "libGL.so",0
oglBegin db "glBegin",0
oglEnd   db "glEnd",0,0         ; two zeros, new lib follows
    db "libSDL.so",0
sdlInit  db "SDL_Init",0
sdlQuit  db "SDL_Quit",0,0,0    ; three zeros -> DONE!
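For readers who find the table walk easier to follow in C, here is a rough equivalent of the loop over the `data` table in the listing above. It is a sketch, not a drop-in replacement: it writes the pointers to a separate array instead of over the names, and the loader is passed in as two callbacks. `import_table`, `demo_open` and `demo_sym` are hypothetical names; in real use the callbacks could be thin wrappers around `dlopen` and `dlsym`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*open_fn)(const char *lib);
typedef void *(*sym_fn)(void *handle, const char *name);

/* Walk a table laid out as
     lib\0 func\0 func\0 \0 lib\0 func\0 \0 \0
   An empty string ends a library's function list; an empty string where a
   library name is expected ends the table. Returns the number of pointers
   stored in `out`. */
static size_t import_table(const char *p, open_fn open_lib, sym_fn find_sym,
                           void **out, size_t cap)
{
    size_t n = 0;
    while (*p) {
        void *handle = open_lib(p);
        p += strlen(p) + 1;         /* skip the library name */
        while (*p) {
            assert(n < cap);
            out[n++] = find_sym(handle, p);
            p += strlen(p) + 1;     /* skip the function name */
        }
        p++;                        /* step over the list terminator */
    }
    return n;
}

/* Stand-ins so the sketch runs without libGL or SDL installed:
   the "handle" is the library name, the "address" is the symbol name. */
static void *demo_open(const char *lib)
{
    return (void *)lib;
}

static void *demo_sym(void *handle, const char *name)
{
    (void)handle;
    return (void *)name;
}
```

With the table from the listing, the walk yields four entries in the order glBegin, glEnd, SDL_Init, SDL_Quit.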
Okay, I'm currently writing "HOW DO I 1K IN LINUX?" (or a commented source at least). The whole technique I used is too complicated to sum up sufficiently in a paragraph, what I can tell you now though is that you want to avoid dlopen and dlsym altogether.