Let's Get Brute'al by Arise
-------------------------------------------------------------------------------
" l e t ' s   g e t   b r u t e ' a l " .
-------------------------------------------------------------------------------

4k intro by arise designs, for the almighty commodore 64. still not dead. yet.

released at moonshine dragons party 2019 @ opole, poland - 17-19 may 2019.
it reached some place there ;)

c r e d i t s .
-------------------------------------------------------------------------------

code: wacek. sample replay routine uses mahoney's method.
help/tools code: mojzesh.
music/sampling: wacek. samples taken from "technoid (dub mix)" by technoid.
these are 11 kHz 8-bit samples, accept no limitations ;)
"graphics": wacek.
design: bimber & wacek.
ideas: mojzesh & wacek.

r e q u i r e m e n t s .
-------------------------------------------------------------------------------

the intro requires an 8580 sid and unmodified roms to function properly.
please use the supplied tester executable to verify. in case of vice, set up
an 8580 and disable digi boost, as it creates unpleasant distortion clicks.
use real hardware if possible, as it simply sounds better on a real machine.
if the samples sound like crap, you're probably doing something wrong ;)

t h i s   i s   i m p o r t a n t .
-------------------------------------------------------------------------------

kordee, get well soon, brother! we're keeping our fingers crossed.

t h e   s t o r y .
-------------------------------------------------------------------------------

[wacek]

the basic idea behind this intro was born in my head many years ago, probably
back in the '90s: pack samples in a lossy way, using reference byte sequences
already present in the c64 rom memory. mojzesh gave it an appropriate name
while working on this intro: "brute crunch". it sums up the idea quite
nicely ;)

back in the 90s i did not consider the available sample replay methods good
enough to play with that idea, especially since i hated 4-bit samples and
owned an 8580-based c64 ;) so the idea never came to any execution. that
changed recently with the many new 8-bit sample playback routines invented,
of which i consider mahoney's superior in both quality and simplicity ;) so
"brute crunch" came back to me as an idea to do a 4k with some samples.

but what is brute crunch anyway? we take a sample sequence (source), and we
take the "static" memory we have, meaning the basic, chargen and kernal roms
(reference). then we decide on a "frame" (a number of bytes) that we will
search for in the reference memory. we choose the closest match (the
comparison method is a separate issue), and we store it as an offset into the
reference memory (2 bytes). so, for example, with a 32-byte frame we compress
the whole sample stream to ~6% of its original size. and we all know that
samples compress lousily, right? ;)

sounds great and easy, right? fuck nope. the devil is in the details. first
of all, statistics are a bitch. taking into account only the available roms,
we have 20kB of reference, so roughly 20,400 byte sequences to choose from.
that is far from enough, and the results were obviously crap. so the first
thing we added to the method was 4kB of ram filled with generated bytes. my
music producing intuition suggested some sines with different amplitudes and
cycles, plus a basic flat climbing 0-255 sequence. as it turns out, with the
chosen sample that was an excellent choice (what a shocker ;). that gave us
~24,500 combinations. better, but still not enough. we needed big data!
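to make the basic scheme concrete, here is a minimal go sketch (go being the
language the actual pc tool was written in, as described below). the frame
size, the names and the plain squared-difference metric are illustrative
assumptions, not code from the real tool:

    package brutecrunch

    // basic "brute crunch": for every frame of the source sample, find
    // the offset in the reference memory (roms + generated ram) whose
    // bytes are closest, and store only that 2-byte offset.

    const frameSize = 32 // illustrative; the real tool tested many sizes

    // distance is the sum of squared differences between a source frame
    // and a reference window; for a fixed frame size it picks the same
    // winner as the "variance" (mean squared difference) used later in
    // the text.
    func distance(src, ref []byte) int {
        d := 0
        for i := range src {
            diff := int(src[i]) - int(ref[i])
            d += diff * diff
        }
        return d
    }

    // crunch maps each source frame to its best-matching reference offset.
    func crunch(sample, ref []byte) []uint16 {
        var out []uint16
        for pos := 0; pos+frameSize <= len(sample); pos += frameSize {
            frame := sample[pos : pos+frameSize]
            bestOff, bestDist := 0, int(^uint(0)>>1) // max int
            for off := 0; off+frameSize <= len(ref); off++ {
                if d := distance(frame, ref[off:off+frameSize]); d < bestDist {
                    bestOff, bestDist = off, d
                }
            }
            out = append(out, uint16(bestOff))
        }
        return out
    }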
so we thought about adding one more byte to the result, as a byte modifier.
we started with the ADD and EOR methods: every sequence would be modified by
$00,$01,$02...$ff, either by ADDing the modifier byte to every byte in the
sequence with the carry discarded (ADD method), or by EORing every byte with
the modifier byte (EOR method). the number of possible combinations rapidly
grew to ~6,300,000 :) and the statistics became far more favourable. the
crunched samples started to sound decent at a 16-17 byte frame.

still, we wanted more. the more combinations you have, the better the
probability of finding good matches with longer frames, which equals better
compression. before, we were using the ADD and EOR methods separately and
comparing the resulting streams. why not have both? we called this the MIX
method: every frame can be done with one method or the other. but how to
store this info without sacrificing too much? well, if we look at the offset
data, it does not really need 2 full bytes: since the highest offset is
around $5fff, we can easily store it in 15 bits and use the remaining bit as
the EOR/ADD flag. that gave us ~12,600,000 combinations. and finally we
doubled that to ~25,200,000 combinations by adding one more bit which marks
whether the sequence is "normal" (fwd) or reversed (rev). with that amount
of possibilities things looked promising, and the statistics switched from a
bitch to a lady ;)

another question is the comparison method. it has to be procedural for our
purposes, because we cannot compare every combination by ear. so we started
with the simple variance between the frame and the reference. variance is
the arithmetic average of the squared differences; it is quite fast to
calculate and, intuitively, should give decent results. and so it did. we
still wanted to improve, so we deliberated on and tested other methods.
there was a concept to calculate a trend within the sequence and then
compare the slope and the intercept. the most sophisticated method i was
thinking about was the FFT, the fast fourier transform, which would let us
compare sequences by their frequencies. mojzesh actually coded that
comparison algorithm, but somewhat surprisingly it did not give results
superior to the variance comparison. a few things probably played into that;
the size of the frame and the low bit resolution of the samples themselves
come to mind. so finally we came back and stuck with the variance method as
the optimal one, both from the perspective of the computational power needed
and the quality of the final outcome. the resulting streams sounded good
enough.

now, 25 million comparisons per sample frame is far too much to test at
1MHz, we can safely agree on that. here is where mojzesh made it all
possible with his mad pc coding skills! for the purpose of this 4k he wrote
a special tool that makes all the statistical comparisons with the blazing
power of a multi-threaded modern cpu. it does all the analysis, which would
probably take at least a few days on the c64, in just a few minutes on a pc.
the tool was written in golang, and underwent many changes, tweaks and
optimisations during the last few weeks. without this effort from mojzesh
this intro would not exist, as i had to test many different versions of the
sample, with different byte sizes, looking for the (close to) perfect
balance between size and quality of the output. so, that's the story behind
the brute crunch method.
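the extended candidate space and the bit packing can be sketched like this
(same assumed package as the sketch above; where exactly the fwd/rev bit
lands in the stored data is not spelled out here, so it stays a separate
field in this sketch):

    package brutecrunch

    // one candidate in the ~25.2M-entry search space: a reference
    // offset plus a modifier byte, applied with either ADD (carry
    // discarded) or EOR, to a forward or reversed reference window.
    type candidate struct {
        offset   uint16 // 0..$5fff into the 24kB reference
        modifier byte   // $00..$ff, applied to every byte of the window
        eor      bool   // true: EOR method, false: ADD method
        reversed bool   // true: read the reference window backwards
    }

    // window materialises the candidate's byte sequence from the
    // reference, so it can be compared against a source frame.
    func (c candidate) window(ref []byte, n int) []byte {
        out := make([]byte, n)
        for i := 0; i < n; i++ {
            j := i
            if c.reversed {
                j = n - 1 - i
            }
            b := ref[int(c.offset)+j]
            if c.eor {
                out[i] = b ^ c.modifier
            } else {
                out[i] = b + c.modifier // byte addition wraps = carry discarded
            }
        }
        return out
    }

    // pack stores the offset and the ADD/EOR flag in two bytes, as
    // described above: the offset never exceeds $5fff, so the top bit
    // is free for the flag.
    func (c candidate) pack() uint16 {
        v := c.offset & 0x7fff
        if c.eor {
            v |= 0x8000
        }
        return v
    }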
as for the code on the commodore side, there was also a lot of work to make
it all play out nicely and fit in memory. much of the data inside the 4k is
compressed in some way even before the final exec compression done with
alz64. the 256-byte sine sequence is delta-compressed to save space, and the
same goes for mahoney's $d418 value table for sample playback. even where
the code is not doing anything, it is "nothing code" optimized for
compression ;) ultimately i managed to make it small enough to have some
additional text displayed before decrunch ;) all in all, three weeks of
work. i hope you enjoy it.

[mojzesh]

hello folks! when wacek shared his idea of brute force sample compression on
our arise slack channel, i instantly picked up the challenge. i knew from
the beginning that this algorithm would be rather computationally intensive,
so my natural choice of language for implementing it on a pc was golang,
which i love to code in because it is powerful, easy and fun to use. i
highly recommend go whenever you are building a tool whose goal is to
squeeze every bit of computing power out of all your cpu cores. as wacek
already explained, our initial tests showed that a trivial implementation of
this algorithm would be way too slow, leading to days of waiting for the
results. my tool was able to shrink that waiting time to just 35 seconds on
my dual-xeon, 40-thread machine :) thanks to goroutines!

the fft method was even more challenging, as the entire fft algorithm has to
operate on complex numbers (with real and imaginary parts). my approach was
to first calculate "rainbow" tables of fft spectra per block of the sample
and per block of the rom "sample". almost 4GB of ram was needed to compute
all of the spectra. unfortunately, because of the small blocks used by this
method, there were many very similar fft spectra for completely different
kinds of sample blocks (which sounded different), and that caused a lot of
noise and distortion in the result. so finally we dropped this method and
opted for the MIX one - and here it is. i had a lot of fun working on this
project! enjoy!
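the tool itself is not included here, but the shape of that goroutine
fan-out can be sketched roughly like this - not the actual tool source,
names and structure are assumptions (per-frame search reuses the simple
variance search from the first sketch):

    package brutecrunch

    import (
        "runtime"
        "sync"
    )

    // crunchParallel fans the per-frame searches out over one worker
    // per cpu core. frames are independent, so each result slot is
    // written by exactly one goroutine.
    func crunchParallel(frames [][]byte, ref []byte) []uint16 {
        results := make([]uint16, len(frames))
        jobs := make(chan int)

        var wg sync.WaitGroup
        for w := 0; w < runtime.NumCPU(); w++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for i := range jobs {
                    results[i] = bestMatch(frames[i], ref)
                }
            }()
        }
        for i := range frames {
            jobs <- i
        }
        close(jobs)
        wg.Wait()
        return results
    }

    // bestMatch is the plain variance search over all reference
    // offsets (the full candidate space with modifiers and direction
    // is sketched earlier).
    func bestMatch(frame, ref []byte) uint16 {
        best, bestDist := 0, int(^uint(0)>>1)
        for off := 0; off+len(frame) <= len(ref); off++ {
            d := 0
            for i := range frame {
                diff := int(frame[i]) - int(ref[off+i])
                d += diff * diff
            }
            if d < bestDist {
                best, bestDist = off, d
            }
        }
        return uint16(best)
    }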
t h e   e n d .
-------------------------------------------------------------------------------

[c] arise designs 2019.

w e   a r e   o n e .
-------------------------------------------------------------------------------