Let's Get Brute'al by Arise
-------------------------------------------------------------------------------
" l e t ' s   g e t   b r u t e ' a l " .
-------------------------------------------------------------------------------

4k intro by arise designs, for the almighty commodore 64. still not dead. yet.

released at moonshine dragons party 2019 @ opole, poland - 17-19 may 2019.
it reached some place there ;)

c r e d i t s .
-------------------------------------------------------------------------------

code: wacek. sample replay routine uses mahoney's method.
help/tools code: mojzesh.
music/sampling: wacek. samples taken from "technoid (dub mix)" by technoid.
these are 11 kHz 8-bit samples, accept no limitations ;)
"graphics": wacek.
design: bimber & wacek.
ideas: mojzesh & wacek.

r e q u i r e m e n t s .
-------------------------------------------------------------------------------

the intro requires an 8580 sid and unmodified roms to function properly.
please use the supplied tester executable to verify. in case of vice, set up
an 8580 and disable digi boost, as it creates unpleasant distortion clicks.
use real hardware if possible, as it simply sounds better on a real machine.
if the samples sound like crap, you're probably doing something wrong ;)

t h i s   i s   i m p o r t a n t .
-------------------------------------------------------------------------------

kordee, get well soon, brother! we're keeping our fingers crossed.

t h e   s t o r y .
-------------------------------------------------------------------------------

[wacek]

the basic idea behind this intro was born in my head many years ago, probably
back in the '90s: pack samples in a lossy way, using reference byte sequences
already present in the c64 rom memory. mojzesh gave it an appropriate name
while working on this intro: "brute crunch". it sums up the idea quite
nicely ;)

back in the 90s i did not consider the available sample replay methods good
enough to play with that idea, especially since i hated 4-bit samples and
owned an 8580-based c64 ;) so the idea never came to any execution. that
changed recently with the many new 8-bit sample playback routines invented,
of which i consider mahoney's superior in both quality and simplicity ;) so
"brute crunch" came back to me as an idea to do a 4k with some samples.

but what is brute crunch anyway? we take a sample sequence (source), and we
take the "static" memory we have, meaning the basic, chargen and kernal roms
(reference). then we decide on a "frame" (a number of bytes) that we will
search for in the reference memory. we choose the closest match (the
comparison method is a separate issue), and we store it as an offset into the
reference memory (2 bytes). so, for example, with a 32-byte frame we compress
the whole sample stream to ~6% of its original size. and we all know that
samples compress lousily, right? ;)

sounds great and easy, right? fuck nope. the devil is in the details. first
of all, statistics are a bitch. taking into account only the available roms,
we have 20kB of reference, so roughly 20,400 byte sequences to choose from.
that is far from enough, and the results were obviously crap. so the first
thing we added to the method was 4kB of ram filled with generated bytes. my
music producing intuition suggested some sines with different amplitudes and
cycles, plus a basic flat climbing 0-255 sequence. as it turns out, with the
chosen sample that was an excellent choice (what a shocker ;). that gave us
~24,500 combinations. better, but still not enough. we needed big data!
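to make the basic scheme concrete, here is a minimal go sketch (go being the
language the actual pc tool was written in, as described below). the frame
size, the names and the plain squared-difference metric are illustrative
assumptions, not code from the real tool:

    package brutecrunch

    // basic "brute crunch": for every frame of the source sample, find
    // the offset in the reference memory (roms + generated ram) whose
    // bytes are closest, and store only that 2-byte offset.

    const frameSize = 32 // illustrative; the real tool tested many sizes

    // distance is the sum of squared differences between a source frame
    // and a reference window; for a fixed frame size it picks the same
    // winner as the "variance" (mean squared difference) used later in
    // the text.
    func distance(src, ref []byte) int {
        d := 0
        for i := range src {
            diff := int(src[i]) - int(ref[i])
            d += diff * diff
        }
        return d
    }

    // crunch maps each source frame to its best-matching reference offset.
    func crunch(sample, ref []byte) []uint16 {
        var out []uint16
        for pos := 0; pos+frameSize <= len(sample); pos += frameSize {
            frame := sample[pos : pos+frameSize]
            bestOff, bestDist := 0, int(^uint(0)>>1) // max int
            for off := 0; off+frameSize <= len(ref); off++ {
                if d := distance(frame, ref[off:off+frameSize]); d < bestDist {
                    bestOff, bestDist = off, d
                }
            }
            out = append(out, uint16(bestOff))
        }
        return out
    }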
so we thought about adding one more byte to the result, as a byte modifier.
we started with the ADD and EOR methods: every sequence would be modified by
$00,$01,$02...$ff, either by ADDing the modifier byte to every byte in the
sequence with the carry discarded (ADD method), or by EORing every byte with
the modifier byte (EOR method). the number of possible combinations rapidly
grew to ~6,300,000 :) and the statistics became far more favourable. the
crunched samples started to sound decent at a 16-17 byte frame.

still, we wanted more. the more combinations you have, the better the
probability of finding good matches with longer frames, which equals better
compression. before, we were using the ADD and EOR methods separately and
comparing the resulting streams. why not have both? we called this the MIX
method: every frame can be done with one method or the other. but how to
store this info without sacrificing too much? well, if we look at the offset
data, it does not really need 2 full bytes: since the highest offset is
around $5fff, we can easily store it in 15 bits and use the remaining bit as
the EOR/ADD flag. that gave us ~12,600,000 combinations. and finally we
doubled that to ~25,200,000 combinations by adding one more bit which marks
whether the sequence is "normal" (fwd) or reversed (rev). with that amount
of possibilities things looked promising, and the statistics switched from a
bitch to a lady ;)

another question is the comparison method. it has to be procedural for our
purposes, because we cannot compare every combination by ear. so we started
with the simple variance between the frame and the reference. variance is
the arithmetic average of the squared differences; it is quite fast to
calculate and, intuitively, should give decent results. and so it did. we
still wanted to improve, so we deliberated on and tested other methods.
there was a concept to calculate a trend within the sequence and then
compare the slope and the intercept. the most sophisticated method i was
thinking about was the FFT, the fast fourier transform, which would let us
compare sequences by their frequencies. mojzesh actually coded that
comparison algorithm, but somewhat surprisingly it did not give results
superior to the variance comparison. a few things probably played into that;
the size of the frame and the low bit resolution of the samples themselves
come to mind. so finally we came back and stuck with the variance method as
the optimal one, both from the perspective of the computational power needed
and the quality of the final outcome. the resulting streams sounded good
enough.

now, 25 million comparisons per sample frame is far too much to test at
1MHz, we can safely agree on that. here is where mojzesh made it all
possible with his mad pc coding skills! for the purpose of this 4k he wrote
a special tool that makes all the statistical comparisons with the blazing
power of a multi-threaded modern cpu. it does all the analysis, which would
probably take at least a few days on the c64, in just a few minutes on a pc.
the tool was written in golang, and underwent many changes, tweaks and
optimisations during the last few weeks. without this effort from mojzesh
this intro would not exist, as i had to test many different versions of the
sample, with different byte sizes, looking for the (close to) perfect
balance between size and quality of the output. so, that's the story behind
the brute crunch method.
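the extended candidate space and the bit packing can be sketched like this
(same assumed package as the sketch above; where exactly the fwd/rev bit
lands in the stored data is not spelled out here, so it stays a separate
field in this sketch):

    package brutecrunch

    // one candidate in the ~25.2M-entry search space: a reference
    // offset plus a modifier byte, applied with either ADD (carry
    // discarded) or EOR, to a forward or reversed reference window.
    type candidate struct {
        offset   uint16 // 0..$5fff into the 24kB reference
        modifier byte   // $00..$ff, applied to every byte of the window
        eor      bool   // true: EOR method, false: ADD method
        reversed bool   // true: read the reference window backwards
    }

    // window materialises the candidate's byte sequence from the
    // reference, so it can be compared against a source frame.
    func (c candidate) window(ref []byte, n int) []byte {
        out := make([]byte, n)
        for i := 0; i < n; i++ {
            j := i
            if c.reversed {
                j = n - 1 - i
            }
            b := ref[int(c.offset)+j]
            if c.eor {
                out[i] = b ^ c.modifier
            } else {
                out[i] = b + c.modifier // byte addition wraps = carry discarded
            }
        }
        return out
    }

    // pack stores the offset and the ADD/EOR flag in two bytes, as
    // described above: the offset never exceeds $5fff, so the top bit
    // is free for the flag.
    func (c candidate) pack() uint16 {
        v := c.offset & 0x7fff
        if c.eor {
            v |= 0x8000
        }
        return v
    }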
as for the code on the commodore side, there was also a lot of work to make
it all play out nicely and fit in memory. much of the data inside the 4k is
compressed in some way even before the final exec compression done with
alz64. the 256-byte sine sequence is delta-compressed to save space, and the
same goes for mahoney's $d418 value table for sample playback. even where
the code is not doing anything, it is "nothing code" optimized for
compression ;) ultimately i managed to make it small enough to have some
additional text displayed before decrunch ;) all in all, three weeks of
work. i hope you enjoy it.

[mojzesh]

hello folks! when wacek shared his idea of brute force sample compression on
our arise slack channel, i instantly picked up the challenge. i knew from
the beginning that this algorithm would be rather computationally intensive,
so my natural choice of language for implementing it on a pc was golang,
which i love to code in because it is powerful, easy and fun to use. i
highly recommend go whenever you are building a tool whose goal is to
squeeze every bit of computing power out of all your cpu cores. as wacek
already explained, our initial tests showed that a trivial implementation of
this algorithm would be way too slow, leading to days of waiting for the
results. my tool was able to shrink that waiting time to just 35 seconds on
my dual-xeon, 40-thread machine :) thanks to goroutines!

the fft method was even more challenging, as the entire fft algorithm has to
operate on complex numbers (with real and imaginary parts). my approach was
to first calculate "rainbow" tables of fft spectra per block of the sample
and per block of the rom "sample". almost 4GB of ram was needed to compute
all of the spectra. unfortunately, because of the small blocks used by this
method, there were many very similar fft spectra for completely different
kinds of sample blocks (which sounded different), and that caused a lot of
noise and distortion in the result. so finally we dropped this method and
opted for the MIX one - and here it is. i had a lot of fun working on this
project! enjoy!
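the tool itself is not included here, but the shape of that goroutine
fan-out can be sketched roughly like this - not the actual tool source,
names and structure are assumptions (per-frame search reuses the simple
variance search from the first sketch):

    package brutecrunch

    import (
        "runtime"
        "sync"
    )

    // crunchParallel fans the per-frame searches out over one worker
    // per cpu core. frames are independent, so each result slot is
    // written by exactly one goroutine.
    func crunchParallel(frames [][]byte, ref []byte) []uint16 {
        results := make([]uint16, len(frames))
        jobs := make(chan int)

        var wg sync.WaitGroup
        for w := 0; w < runtime.NumCPU(); w++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for i := range jobs {
                    results[i] = bestMatch(frames[i], ref)
                }
            }()
        }
        for i := range frames {
            jobs <- i
        }
        close(jobs)
        wg.Wait()
        return results
    }

    // bestMatch is the plain variance search over all reference
    // offsets (the full candidate space with modifiers and direction
    // is sketched earlier).
    func bestMatch(frame, ref []byte) uint16 {
        best, bestDist := 0, int(^uint(0)>>1)
        for off := 0; off+len(frame) <= len(ref); off++ {
            d := 0
            for i := range frame {
                diff := int(frame[i]) - int(ref[off+i])
                d += diff * diff
            }
            if d < bestDist {
                best, bestDist = off, d
            }
        }
        return uint16(best)
    }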
t h e   e n d .
-------------------------------------------------------------------------------

[c] arise designs 2019.

w e   a r e   o n e .
-------------------------------------------------------------------------------