State of art for 64bit EXE/ELF compression.
category: code [glöplog]
Just wondering what's the latest progress on x64 compression for intros and things?
As people have known for years, I've been working on and off on my own LZMA1-based packer. It has existed in several forms, but after my 5-year personal hiatus, I decided to get back into coding demos again. That means working on my packer (plus personal DRM) again, now with support for x64.
Just wondering what the state of binary compression is these days?
* I noticed that on consoles it's upkr, but that can be ported to PC.
* Is Squishy still the dominant one for 64-bit intros, and has there been progress on kkrunchy-like code preprocessing for 64-bit, or are E8 filters still the go? I noticed Razor's prods even use uncustomized UPX.
* What's packing on Linux like? Are there specific compressing linkers now?
IIUC, Squishy's code processing is even more sophisticated (and effective) than kkrunchy's, as it is tightly integrated with the context modelling, i.e. rather than predicting from preceding bytes, it predicts from preceding instructions, with knowledge of the roles each byte has in the instruction encoding (simply put).
Hmmm, so it might not be worth much effort then to pursue LZMA1 alone with something similar to 7zip's code filter, or even to abuse recent Windows compression APIs such as LZMS and similar.
Might just work on and flesh out the packer for fun (and add antidebug etc. to a fork of it anyway).
Quote:
re: Linux: best in town is currently epoqe's cold, but it's not publicly available. The other best option is probably to use smol and oneKpaq.
Is there any dedicated intro (16kb-96kb) packer for Linux around? Last I saw, there were some LZMA shenanigans with shell scripts/cat or something. I do wonder whether a small depacker (500 or so bytes for LZMA1, like the one I have) would help there? Or does the current Linux hackery work well enough, assuming that everyone has xz or something?
Quote:
IIUC, Squishy's code processing is even more sophisticated (and effective) than kkrunchy's, as it is tightly integrated with the context modelling, i.e. rather than predicting from preceding bytes, it predicts from preceding instructions, with knowledge of the roles each byte has in the instruction encoding (simply put).
I noticed LZMA2 currently has a similar system in BCJ2, I wonder how that would go in size-optimized form.
Quote:
Quote:IIUC, Squishy's code processing is even more sophisticated (and effective) than kkrunchy's, as it is tightly integrated with the context modelling, i.e. rather than predicting from preceding bytes, it predicts from preceding instructions, with knowledge of the roles each byte has in the instruction encoding (simply put).
I noticed LZMA2 currently has a similar system in BCJ2, I wonder how that would go in size-optimized form.
Sorry, only seeing this now.
You can use xz or LZMA just fine (though for 16..64k, I'd advise using an in-memory decompressor using memfd_create, like vondehi). oneKpaq is meant for sizes between 1k and 8k; it's probably too slow for anything bigger. There's no 64k-focussed (de)compressor (like Squishy or kkrunchy) for Linux, though.
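For readers unfamiliar with the memfd_create approach mentioned above, here is a minimal Python sketch of the mechanism only: decompress into an anonymous in-memory file, then exec it through /proc. The function names are made up for illustration, and this is not how vondehi itself is implemented (that is a tiny hand-written stub); it assumes Linux and Python 3.8+.

```python
import lzma
import os

def unpack_to_memfd(packed: bytes, name: str = "intro") -> int:
    """Decompress an xz/LZMA blob into an anonymous in-memory file; return its fd."""
    fd = os.memfd_create(name)            # Linux only; no file touches the disk
    view = memoryview(lzma.decompress(packed))
    while view:                           # os.write may write fewer bytes than asked
        written = os.write(fd, view)
        view = view[written:]
    return fd

def run_from_memfd(fd: int, argv: list[str]) -> None:
    """Replace the current process with the unpacked image (no return on success)."""
    os.execv(f"/proc/self/fd/{fd}", argv)

if __name__ == "__main__":
    # Stand-in payload: a real intro would embed its own compressed ELF here.
    with open("/bin/true", "rb") as f:
        packed = lzma.compress(f.read())
    run_from_memfd(unpack_to_memfd(packed), ["true"])
```

The point of the design is that nothing has to be written to /tmp and no external xz binary has to be installed, at the cost of shipping the decompressor in the stub.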
BTW, the (de)compression tools (oneKpaq/vondehi) are completely orthogonal to smol (so you can use smol+oneKpaq, or smol+xz/vondehi, however you like). I've seen several instances of people using the regular GCC/clang linker with xz/lzma because they thought smol was only meant for 4ks, while it works just fine with 64k binaries.
xz/LZMA2's BCJ2 is a bit of a hack for one single special-case pattern. Compressors that look for patterns in a more general way will vastly outperform it.
Oh wow, thanks for the link for vondehi, it took me down a rabbit hole. I wonder how hard it would be to do a UPX replacement for Linux using that :)
I remember the itch.io guy doing an ELF packer in Rust. I wonder if he still has his posts up; I would love to read those too. Exploring ELF in general for EXE/SO packing sounds kinda fun, not just x64 PE.
Quote:
I decided to get back into coding demos again.
I just wanted to say: Excellent!
Quote:
IIUC, Squishy's code processing is even more sophisticated (and effective) than kkrunchy's, as it is tightly integrated with the context modelling, i.e. rather than predicting from preceding bytes, it predicts from preceding instructions, with knowledge of the roles each byte has in the instruction encoding (simply put).
For kkrunchy it's a pre/postprocess where different data gets put into separate 'streams' and then crunched. Squishy keeps around multiple contexts and switches between them based on the type of data being compressed at that point in time (e.g. 'this is data, this is an opcode, this is an argument', etc).
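To make the "separate streams" idea above concrete, here is a minimal Python sketch. It assumes a toy instruction format (one opcode byte followed by a 4-byte little-endian operand) and uses zlib as a stand-in back end; the function names are hypothetical, and real x86 needs an actual disassembler, so this is not kkrunchy's code.

```python
import struct
import zlib

def split_streams(code: bytes) -> tuple[bytes, bytes]:
    """Split toy instructions (1 opcode byte + 4 operand bytes) into two streams."""
    if len(code) % 5:
        raise ValueError("toy format expects 5-byte instructions")
    opcodes = bytearray()
    operands = bytearray()
    for i in range(0, len(code), 5):
        opcodes.append(code[i])
        operands += code[i + 1:i + 5]
    return bytes(opcodes), bytes(operands)

def merge_streams(opcodes: bytes, operands: bytes) -> bytes:
    """Inverse of split_streams: re-interleave opcodes and operands."""
    out = bytearray()
    for i, op in enumerate(opcodes):
        out.append(op)
        out += operands[4 * i:4 * i + 4]
    return bytes(out)

def packed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

if __name__ == "__main__":
    # Synthetic "program": two opcodes with slowly changing operands.
    toy = b"".join(
        struct.pack("<BI", 0xE8 if n % 3 else 0xB8, 0x1000 + 16 * n)
        for n in range(200)
    )
    ops, args = split_streams(toy)
    assert merge_streams(ops, args) == toy   # the transform is lossless
    print("interleaved:", packed_size(toy))
    print("split      :", packed_size(ops) + packed_size(args))
```

The split is done once before compression and undone once after decompression, which is what makes it a pre/postprocess rather than part of the model.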
Quote:
Quote:
IIUC, Squishy's code processing is even more sophisticated (and effective) than kkrunchy's, as it is tightly integrated with the context modelling, i.e. rather than predicting from preceding bytes, it predicts from preceding instructions, with knowledge of the roles each byte has in the instruction encoding (simply put).
For kkrunchy it's a pre/postprocess where different data gets put into separate 'streams' and then crunched. Squishy keeps around multiple contexts and switches between them based on the type of data being compressed at that point in time (e.g. 'this is data, this is an opcode, this is an argument', etc).
Isn't that similar to BCJ2 tho? Either way, for the x86 port, I might just use kkrunchy, and for x64, use BCJ/BCJ2. Older versions of my packer just used BCJ1 and a compiler-generated, then optimized LZMA decoder (~1000 bytes, way less than UPX's 2-3kb); new ones use a codegolfed 514-byte version. Might just add in upkr for shits and giggles. Not sure how to approach ARM yet.
Quote:
Quote:I decided to get back into coding demos again.
I just wanted to say: Excellent!
Release schedule will be purely sporadic (so if it makes a demoparty's deadline, it does; if not, there's always a next one). I learnt over the years in hiding and on hiatus that I cannot rush art, which was a problem in my previous prods. I picked up doing physical artwork, learnt a lot about myself in the process, and overcame my block of perfectionism, which was basically why I didn't do a prod in 5 years (I felt that if I couldn't at least surpass the old ones, I might as well not try at all). Massive amounts of therapy helped, as did improving my physical fitness, to give me self-confidence again.
Quote:
Quote:Quote:
IIUC, Squishy's code processing is even more sophisticated (and effective) than kkrunchy's, as it is tightly integrated with the context modelling, i.e. rather than predicting from preceding bytes, it predicts from preceding instructions, with knowledge of the roles each byte has in the instruction encoding (simply put).
For kkrunchy it's a pre/postprocess where different data gets put into separate 'streams' and then crunched. Squishy keeps around multiple contexts and switches between them based on the type of data being compressed at that point in time (e.g. 'this is data, this is an opcode, this is an argument', etc).
Isn't that similar to BCJ2 tho? Either way, for the x86 port, I might just use kkrunchy, and for x64, use BCJ/BCJ2. Older versions of my packer just used BCJ1 and a compiler-generated, then optimized LZMA decoder (~1000 bytes, way less than UPX's 2-3kb); new ones use a codegolfed 514-byte version. Might just add in upkr for shits and giggles. Not sure how to approach ARM yet.
Quote:
Quote:Quote:
IIUC, Squishy's code processing is even more sophisticated (and effective) than kkrunchy's, as it is tightly integrated with the context modelling, i.e. rather than predicting from preceding bytes, it predicts from preceding instructions, with knowledge of the roles each byte has in the instruction encoding (simply put).
For kkrunchy it's a pre/postprocess where different data gets put into separate 'streams' and then crunched. Squishy keeps around multiple contexts and switches between them based on the type of data being compressed at that point in time (e.g. 'this is data, this is an opcode, this is an argument', etc).
Isn't that similar to BCJ2 tho? Either way, for the x86 port, I might just use kkrunchy, and for x64, use BCJ/BCJ2. Older versions of my packer just used BCJ1 and a compiler-generated, then optimized LZMA decoder (~1000 bytes, way less than UPX's 2-3kb); new ones use a codegolfed 514-byte version. Might just add in upkr for shits and giggles. Not sure how to approach ARM yet.
Quote:
Quote:Quote:
IIUC, Squishy's code processing is even more sophisticated (and effective) than kkrunchy's, as it is tightly integrated with the context modelling, i.e. rather than predicting from preceding bytes, it predicts from preceding instructions, with knowledge of the roles each byte has in the instruction encoding (simply put).
For kkrunchy it's a pre/postprocess where different data gets put into separate 'streams' and then crunched. Squishy keeps around multiple contexts and switches between them based on the type of data being compressed at that point in time (e.g. 'this is data, this is an opcode, this is an argument', etc).
Isn't that similar to BCJ2 tho? Either way, for the x86 port, I might just use kkrunchy, and for x64, use BCJ/BCJ2. Older versions of my packer just used BCJ1 and a compiler-generated, then optimized LZMA decoder (~1000 bytes, way less than UPX's 2-3kb); new ones use a codegolfed 514-byte version. Might just add in upkr for shits and giggles. Not sure how to approach ARM yet.
what you're talking about only looks at branch targets, and changes them from absolute to relative and vice versa, while not reordering code bytes or anything. kkrunchy almost entirely disassembles x86 to figure out what byte represents what, and puts them in separate streams to be compressed. squishy goes even further by having maths rule out which byte means what instead of having 'static' rules about this. hence, in comparison, BCJ1/2 are kind of garbage if you have a full 64k to play with
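For reference, the branch-target conversion described above can be sketched in a few lines of Python. This is a naive E8-style filter that treats every 0xE8 byte as a CALL rel32 and rewrites its operand to an absolute offset within the buffer; the function name is made up, and production filters (such as the x86 BCJ filter) add heuristics to avoid mangling data bytes that merely look like calls.

```python
import struct

def e8_filter(code: bytes, encode: bool = True) -> bytes:
    """Convert CALL rel32 operands to absolute targets (encode) or back (decode)."""
    buf = bytearray(code)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == 0xE8:
            (value,) = struct.unpack_from("<I", buf, i + 1)
            next_ip = i + 5                      # rel32 is relative to the next instruction
            if encode:
                value = (value + next_ip) & 0xFFFFFFFF
            else:
                value = (value - next_ip) & 0xFFFFFFFF
            struct.pack_into("<I", buf, i + 1, value)
            i += 5                               # skip the operand we just rewrote
        else:
            i += 1
    return bytes(buf)

if __name__ == "__main__":
    # Two calls to the same target (offset 0x100) from different places.
    code = (b"\xE8" + struct.pack("<I", 0x100 - 5) + b"\x90" * 5 +
            b"\xE8" + struct.pack("<I", 0x100 - 15))
    enc = e8_filter(code, encode=True)
    print(enc[1:5].hex(), enc[11:15].hex())      # both operands are now identical
    assert e8_filter(enc, encode=False) == code
```

Repeated calls to the same function become identical byte strings after encoding, which is all an LZ-style matcher gains from it; nothing else about the instruction stream is modelled.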
kkrunchy's "stream reordering" principle is explained here: https://www.farbrausch.de/~fg/seminars/workcompression.html. squishy is explained in one of ferris' youtube videos (and is tbh what you should try to target nowadays, as opposed to kkrunchy-style fixed rulesets). oneKpaq and crinkler do something like squishy as well, except with lower complexity and worse algorithms (hence why they are too slow for 64ks)
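A toy way to see why modelling by byte role helps: the Python sketch below measures the ideal code length of a byte sequence under simple adaptive order-0 models, one model per context id. The role labels are assigned by hand here (a real packer would derive them while decoding instructions), and this is only an assumption-laden illustration of context selection, not Squishy's, Crinkler's, or oneKpaq's actual modelling.

```python
import math

def adaptive_cost_bits(symbols, contexts):
    """Ideal code length in bits using one adaptive order-0 byte model per context id."""
    counts = {}
    total_bits = 0.0
    for sym, ctx in zip(symbols, contexts):
        table = counts.setdefault(ctx, [1] * 256)   # Laplace-smoothed byte counts
        total_bits -= math.log2(table[sym] / sum(table))
        table[sym] += 1                             # adapt after coding the symbol
    return total_bits

if __name__ == "__main__":
    # Toy stream: an "opcode" byte alternating with an "operand" byte.
    data = bytes([0x01, 0xFF] * 500)
    roles = [i % 2 for i in range(len(data))]       # 0 = opcode, 1 = operand
    one_context = [0] * len(data)                   # everything shares one model
    print("single context :", round(adaptive_cost_bits(data, one_context)))
    print("role contexts  :", round(adaptive_cost_bits(data, roles)))
```

With one shared model, the two kinds of byte dilute each other's statistics; with a model per role, each one converges on its own distribution and the total cost drops.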
shit how did 3 copies of the same quote end up here? sorry.
Quote:
Quote:Quote:I decided to get back into coding demos again.
I just wanted to say: Excellent!
Release schedule will be purely sporadic (so if it makes a demoparty's deadline, it does; if not, there's always a next one). I learnt over the years in hiding and on hiatus that I cannot rush art, which was a problem in my previous prods. I picked up doing physical artwork, learnt a lot about myself in the process, and overcame my block of perfectionism, which was basically why I didn't do a prod in 5 years (I felt that if I couldn't at least surpass the old ones, I might as well not try at all). Massive amounts of therapy helped, as did improving my physical fitness, to give me self-confidence again.
Mate, I also had this back in the day: I could not make music, well, "real" music; I just made joke tracks. I had zero self-belief and felt I couldn't surpass or even reach the same level. Some of this came from what some sceners said behind my back to, for example, Yolk back then. To me they were "yeah cool nice" etc.
Of course we all were kids back then...
Physical fitness is a real lifesaver.
Quote:
shit how did 3 copies of the same quote end up here? sorry.
Extremely disappointing compression ratio :(
