Help with 4k
category: offtopic [glöplog]
Using Crinkler's /REPORT option together with yasm/nasm can give you quite awesome results if you are willing to invest some time.
Even if it looks like a total pain to go 100% ASM, it's not that bad - since you are not going to write 40k LOC in ASM anyways.
I still claim that you can achieve the best results size/compression wise by taking the development path with the most control over all your bytes and that's going ASM.
Based on my experiences byte-squeezing our winning 1k entry for Assembly this year, I'd say, go 100% asm if you want to get the most bang for byte.
Originally I thought that the machine code produced by Visual C is already so clean and compresses so well with Crinkler that it would be a waste of time trying to do better "manually". I mean, it's sometimes incredible what sort of optimizations it has figured out, it's almost like the C code was only some kind of "description of intent". But I was so wrong. At some point it started to look impossible to get the sort of features we wanted in the 1024 bytes allowed, so Rale ruthlessly cut away things from my music routine to get enough bytes for visuals. So I had no other choice than start fiddling with assembly code manually. (Well, actually I had already made pure asm music routines earlier, but somehow I couldn't get them to compress well enough - and they were musically the wrong style as well) So I took Visual C's asm output and started trying out different ways to say something equivalent, but so that it compresses better. This was a route that seemed to have no end, and no limit as to how well it can compress. The more hours I spent looking at the asm code and changing it, the more I learned about my own music routine, and the better I could get it to compress. Eventually, I was working in all asm code, and doing new features straight to asm, without doing a C version. It was great fun, and I don't remember doing that so much since the 1990s. :)
I also did the same thing to the program's main() routine, and I got quite an incredible amount of bytes off of it, even though I had originally thought it was already as good as it could possibly be. In the end, there were 0 bytes of code coming from any C stuff at all, it was all asm code. The point here is not at all "size optimization" as it was traditionally understood. It's about getting smaller output from Crinkler. A 100% increase in uncompressed size can really mean a 50% reduction in compressed size! Some people have said this is a useless waste of time, because what compresses and doesn't compress is totally random... but I say it is far from random, and if you spend hours looking at the asm code and trying out different things, you start to get a feel for it, and you can actually have it mostly under your own control. Of course, there are surprises all the time, but it's not random, it can be learned, and it's totally doable. But if you don't go 100% asm, you are definitely not getting anywhere near the tightest possible packed code. (Unless Crinkler support is built into the C compiler so that it does all the work for you, but don't hold your breath)
Exactly.
you could opt to generate chunks of compressor challenged asm from c, if you so must
but for 1k that's moot
It's also entirely possible to hand-write asm that compresses worse than the stuff the C compiler creates... You need to spend hours with the code. It's not rocket science, all it takes is a lot of hours. The thing that IS difficult is judging if the stuff is actually good enough, can it be made good enough or should you throw it away completely. And then actually throwing it away... (at least this is difficult for me, YMMV)
Not sure if it is reassuring or depressing to read about yzi's experience. With my 1ks, I always have the feeling that I could gain 30-50 bytes if I stopped optimizing the "wrong" code/way :( So many things to try.
asm yez
If someone thinks it's depressing... well the things you have to do are quite simple. Try to produce as many similar bytes and bits as possible. ;) If you have lots of bytes that Crinkler can't guess very well, i.e. the bytes are marked with red color, then it's a sign that you should find a different structure for it. Different order, different way of saying it. Or say a totally different thing. Forget the old ways of what "good" asm code looks like... It doesn't necessarily matter if you can keep things in registers, or use registers efficiently. It might not be so good to be able to have single byte instructions, if the opcode numbers happen to be totally unique and not found elsewhere in the code. Try to do the same thing with two-byte instructions that have something in common with some other bytes somewhere. Or even larger instructions.
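To make the "similar bytes" idea a bit more concrete, here is a rough sketch that measures how repetitive a chunk of machine code is at the single-byte level. This is only a crude proxy and an assumption on my part: Crinkler's context-mixing model looks at much more than byte frequencies, so the number can at best hint at a trend. The name `byte_entropy` is made up for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Order-0 Shannon entropy in bits per byte: 0.0 when every byte is identical,
// 8.0 when all 256 byte values are equally likely.
// NOTE: crude proxy only; this is NOT how Crinkler models the data.
double byte_entropy(const std::vector<std::uint8_t>& code)
{
    if (code.empty()) return 0.0;

    // Count how often each byte value occurs.
    std::size_t counts[256] = {};
    for (std::uint8_t b : code) ++counts[b];

    // Sum -p * log2(p) over the byte values that actually occur.
    double bits = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;
        const double p = static_cast<double>(c) / static_cast<double>(code.size());
        bits -= p * std::log2(p);
    }
    return bits;
}
```

Feeding it two variants of the same routine and comparing the numbers is cheap, but the /REPORT output of Crinkler itself stays the real judge.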
I guess most tiny-intro coders have toyed with the idea of being able to express the intent of the code as some kind of formula, and then letting a compiler brute-force through all possible ways of saying it, and choosing the combination that produces the smallest output. :) Maybe this could be a reality some day, but currently there are too many options to cover exhaustively, so it's more like art than brute-force calculation. In this regard, I think Javascript code has the most potential of compressing well, because (I guess) there are so many more ways to express the same intent. I'm not really a Javascript coder at all, so this is just guessing.
less 1 bits in raw opcode is better compression. huff yo code. float is evil. token repeating would make the rest kinda compresible. look how api calls are a repeatable sequences. what more? it's kinda...
Try to do everything via the stack, try pushing immediate values instead of registers, try moving the pushes and pops around other instructions if it doesn't break the code, try to move the compare and jump instructions further from each other, try using other registers than you currently use, try to use less data and more code or the other way around
Some possibly outdated memories from eight years ago:
- Minimize the number of different external functions used
- Avoid local dynamic variables and arrays since they generate all sorts of initialization code. Static ones don't.
- Compiler parameters did make a difference
- Float is evil, but double is even worse. sinf/cosf and so on work with floats instead of doubles.
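A minimal sketch of two of the points above (static storage, float-only math). The table name and size are made up for illustration, and what the compiler actually emits depends on the compiler and its flags, so check the generated asm rather than trusting this.

```cpp
#include <math.h>

// Static storage: zero-initialised by the loader, so (typically) no
// initialization code ends up in the executable for it.
static float g_wave[256];

// Fill the table with one sine period using float math only:
// sinf instead of sin, and float literals (the 'f' suffix) throughout,
// so nothing is silently promoted to double.
void fill_wave()
{
    for (int i = 0; i < 256; ++i)
        g_wave[i] = sinf(6.2831853f * (float)i / 256.0f);
}
```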
oo: no, I haven't completed my intro yet. I'm basically missing the music - wanna help?
Regarding the sound seek/pause/play, I wrote the following code, just call the related methods for seek/pause/resume..
Regarding the 4k version of GNU Rocket, I basically wrote the following code, in debug mode the project includes a modified GNU Rocket source, it's very compact when compiled in standalone mode...
I also made a little utility to convert the Rocket exported files to a c++ const:
Code:
Hope this helps...
// Animations and timeline
const int R4K_Rows[]= {
0, // eye_pos_x
0, // eye_pos_y
0, 400, // eye_pos_z
0, // eye_trg_x
0, // eye_trg_y
0, 400}; // eye_trg_z
const int R4K_Values[]= {
0, // eye_pos_x
-51, // eye_pos_y
896, 2048, // eye_pos_z
0, // eye_trg_x
256, // eye_trg_y
768, 1536}; // eye_trg_z
const char R4K_KeyType[]= {
0, // eye_pos_x
0, // eye_pos_y
1, 0, // eye_pos_z
0, // eye_trg_x
0, // eye_trg_y
1, 0}; // eye_trg_z
const char R4K_KeyCount[]= {
1, // eye_pos_x
1, // eye_pos_y
2, // eye_pos_z
1, // eye_trg_x
1, // eye_trg_y
2}; // eye_trg_z
What type of music do you need? Duration? Style? Tempo?
Hey again TLM,
and thank you very much for your reply and the piece of code.
Yesterday, after talking with two "4k-veterans" I ended up using a different approach (which was my plan-B all the time): to use bass.dll with debug-release (no buffering, easy to sync with) during the syncing, and only use 4klang with the actual 4k-release.
However, I was about to figure out how to customize the Rocket sync_player to "export" something else than those track-files. Your code may help me a lot with that (when that time comes, soon).
I may also try what I was trying in the first place: to do the syncing with 4klang also in the debug-release.
As for the music (I mean helping you with a track): if you like my music (look up my two 4k-intros on YouTube, Black Patch and Decoloring Darkness), just let me know.
Thanks again,
oo
Punqtured and oo, thanks for offering help with the music.
Currently, Ferris is having a fight with my synth. So far it's looking 50/50: it looks like he might manage to pull something together, but if he doesn't I'm back to the drawing board as I don't have a spare 1.5kb for a 4klang. I'll probably ask for help if the shit hits the fan...
oo, I'm running the following:
Debug mode: Slave GNU Rocket + audio sync on top of an audio buffer (buffer is generated by the synth once)
Release mode: Standalone GNU Rocket + simple audio buffer play (no seeking, pause & resume)
Both modes get the "intro time" from the audio interface (Audio_GetCurrentTime)
Keep in mind that I am no 4k expert in any way. All I can say is that this setup works for me...
TLM,
yeah, maybe one option would be to "skip" the whole rocket_sync in the 4k-release. To achieve that, the "sync_save_tracks" and "save_track" methods could be customized so that they give you C++ code for your busy loop (a timeline, in a way).
I was thinking and dreaming about this the whole night ;).
Thanks again,
Olli
There's also Clinkster you could use. It's definitely possible to do tracks <1k and there's some pretty good documentation for implementing it in your prod. And it provides a nice VST for the musician to use, too :) Only 2 weeks after its release, the Assembly 4k compo featured a couple of entries using it, so it's definitely doable in whatever little time you've got.
Punqtured, Good point and a good alternative. Thanks
TLM,
thanks for your help. I got everything working (rewriting everything, but your code helped with pause/resume). I also tried your tool for converting binary track-files to arrays - works quite ok (even though I completely rewrote the replayer).
I think I'll customize "save_tracks" and "save_track" so that they export a file suitable for including (automatically).
Thanks again,
oo
oo, that's very cool!
can you please share the replayer and the save_tracks rewrite after you are done with it?
TLM,
sure!
BR,
oo
Hey TLM,
here's the code. I customized Rocket's "saving" so that it exports arrays automatically. After some heavy experiments I ended up - after all - using your "replayer" with only some minor changes.
device.c (the code is quite bloated, but that doesn't matter since it naturally does not affect the size of the actual intro):
Code:
static int save_track(const struct sync_track *t, const char *path)
{
#ifdef OO
int i;
FILE *fp = fopen("rocket_tracks.cpp", "a");
fprintf(fp, "%d, ",t->num_keys);
fclose(fp);
for (i = 0; i < (int)t->num_keys; ++i) {
FILE *fp = fopen("rocket_rows.cpp", "a");
fprintf(fp, "%d, ",t->keys[i].row);
fclose(fp);
fp = fopen("rocket_values.cpp", "a");
fprintf(fp, "%1.2f, ",t->keys[i].value);
fclose(fp);
fp = fopen("rocket_types.cpp", "a");
fprintf(fp, "%d, ",t->keys[i].type);
fclose(fp);
}
#else
int i;
FILE *fp = fopen(path, "wb");
if (!fp)
return -1;
fwrite(&t->num_keys, sizeof(size_t), 1, fp);
for (i = 0; i < (int)t->num_keys; ++i) {
char type = (char)t->keys[i].type;
fwrite(&t->keys[i].row, sizeof(int), 1, fp);
fwrite(&t->keys[i].value, sizeof(float), 1, fp);
fwrite(&type, sizeof(char), 1, fp);
}
fclose(fp);
#endif
return 0;
}
void sync_save_tracks(const struct sync_device *d)
{
int i;
#ifdef OO
FILE *fp = fopen("rocket_tracks.cpp", "w");
fprintf(fp, "#define ROCKET_TRACKS %d\nconst int rocket_keyframes[]={\n",d->data.num_tracks);
fclose(fp);
//rows
fp = fopen("rocket_rows.cpp", "w");
fprintf(fp, "const int rocket_rows[]={\n");
fclose(fp);
//values
fp = fopen("rocket_values.cpp", "w");
fprintf(fp, "const float rocket_values[]={\n");
fclose(fp);
//types
fp = fopen("rocket_types.cpp", "w");
fprintf(fp, "const char rocket_types[]={\n");
fclose(fp);
#endif
for (i = 0; i < (int)d->data.num_tracks; ++i) {
const struct sync_track *t = d->data.tracks[i];
save_track(t, sync_track_path(d->base, t->name));
}
#ifdef OO
//endings
fp = fopen("rocket_tracks.cpp", "a");
fprintf(fp, "\n};");
fclose(fp);
fp = fopen("rocket_rows.cpp", "a");
fprintf(fp, "\n};");
fclose(fp);
fp = fopen("rocket_values.cpp", "a");
fprintf(fp, "\n};");
fclose(fp);
fp = fopen("rocket_types.cpp", "a");
fprintf(fp, "\n};");
fclose(fp);
#endif
}
intro.cpp:
Code:
//init music -------------------------------------------------------------------------------------------------------------------------
#ifdef _4KLANG
long music_start_offset = 0;
bool music_is_playing = false;
static const float music_samples_per_row = 60.0f * (float)SAMPLE_RATE / (float)BPM / (float)ROCKET_ROWS_PER_BEAT;
float music_get_position(HWAVEOUT h)
{
#if defined(_ROCKET)
return (float)(music_start_offset + MMTime.u.sample);
#else
return (float)MMTime.u.sample;
#endif
}
float rocket_get_row(HWAVEOUT h)
{
float pos = music_get_position(h);
return pos / music_samples_per_row;
}
#endif
//init rocket ------------------------------------------------------------------------------------------------------------------------
#ifdef _ROCKET
#include "../sync/sync.h"
#ifndef SYNC_PLAYER
static void rocket_pause(void *d, int flag)
{
HWAVEOUT h = *((HWAVEOUT *)d);
if (flag){
waveOutPause(h);
music_is_playing = false;
} else {
waveOutRestart(h);
music_is_playing = true;
}
}
static void rocket_set_row(void *d, int row)
{
HWAVEOUT h = *((HWAVEOUT *)d);
music_start_offset = (long)((float)row * music_samples_per_row); //multiply before truncating, so the rounding error does not grow with the row
WaveHDR.lpData = (LPSTR)(float*)(lpSoundBuffer + music_start_offset * 2);
WaveHDR.dwBufferLength = (MAX_SAMPLES - (DWORD)music_start_offset) * 2;
//send buffer to waveOut
waveOutReset ( h );
waveOutPrepareHeader( h, &WaveHDR, sizeof(WaveHDR) );
waveOutWrite ( h, &WaveHDR, sizeof(WaveHDR) );
//put on pause
if (!music_is_playing)
waveOutPause(h);
}
static int rocket_is_playing(void *d)
{
return music_is_playing;
}
static struct sync_cb rocket_cb = {
rocket_pause,
rocket_set_row,
rocket_is_playing
};
#endif
#endif
//init rocket-player (for 4k-release)-----------------------------------------------------------------------------------------
#ifdef _ROCKET_PLAYER
#include "..\Debug Client\rocket_tracks.cpp"
#include "..\Debug Client\rocket_rows.cpp"
#include "..\Debug Client\rocket_values.cpp"
#include "..\Debug Client\rocket_types.cpp"
#define KEY_STEP 0
#define KEY_RAMP 3
#define KEY_SMOOTH 2
void rocket_player(void *d, float fp[16], int first_fp)
{
HWAVEOUT h = *((HWAVEOUT *)d);
float row = (float)rocket_get_row(h);
int index = 0;
for (UINT32 i = 0; i < ROCKET_TRACKS; i++)
{
int key = rocket_keyframes[i] - 2;//last?
for (;(key>=0) && (rocket_rows[index + 1] < row); key--, index++);
//type, ratio
int type = rocket_types[index];
float ratio = type == KEY_STEP ? 0 : (row - rocket_rows[index]) / (rocket_rows[index + 1] - rocket_rows[index]);
if (ratio > 1) ratio = 1;
if (type == KEY_RAMP) ratio *= ratio;
if (type == KEY_SMOOTH) ratio *= ratio * (3 - 2 * ratio);
//interpolation
fp[first_fp + i] = rocket_values[index] + (rocket_values[index + 1] - rocket_values[index]) * ratio;
//next key
index += key+2;
}
}
#endif
Code:
//parameters to the shaders
static float fp[4*4];
//loop
do
{
#ifdef _4KLANG
waveOutGetPosition(hWaveOut, &MMTime, sizeof(MMTIME));
//time
fp[2] = music_get_position(hWaveOut);
//envelope
fp[3] = (&_4klang_envelope_buffer)[((MMTime.u.sample >> 8) << 5) + 2*2+0];
#else
fp[2] = (float)GetTickCount()*44.1f;//44100/1000
#endif
//rocket
#ifdef _ROCKET
float row = rocket_get_row(hWaveOut);
//get values
fp[4] = (float)sync_get_val(r_red, row);
fp[5] = (float)sync_get_val(r_green, row);
//sync the tracker
#ifndef SYNC_PLAYER
if (sync_update(rocket, (int)floor(row), &rocket_cb, (void *)&hWaveOut))
sync_connect(rocket, "localhost", SYNC_DEFAULT_PORT);
#endif
#endif
//rocket_player
#ifdef _ROCKET_PLAYER
rocket_player((void *)&hWaveOut, fp, 4);//first index; rocket_player() dereferences d as HWAVEOUT*
#endif
glColor3us((unsigned short)GetTickCount(),0,0);
glRects(-1,-1,1,1);
SwapBuffers(hDC); //wglSwapLayerBuffers( hDC, WGL_SWAP_MAIN_PLANE );
//update shader
((PFNGLUNIFORM4FVPROC)wglGetProcAddress("glUniform4fv"))( ((PFNGLGETUNIFORMLOCATIONPROC)wglGetProcAddress("glGetUniformLocation"))(p,"fp"), 4, fp );
#ifdef _4KLANG
}while (MMTime.u.sample < MAX_SAMPLES && !GetAsyncKeyState(VK_ESCAPE) );
#else
}while (!GetAsyncKeyState(VK_ESCAPE) );
#endif
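For reference, the sample-to-row conversion used in intro.cpp above can be checked in isolation. A small sketch follows; the sample rate, BPM and rows-per-beat values are assumptions picked for the example (the thread does not state the real ones), and the `k...` names are made up.

```cpp
// Assumed example values, not taken from the actual intro.
constexpr float kSampleRate  = 44100.0f;
constexpr float kBpm         = 120.0f;
constexpr float kRowsPerBeat = 8.0f;

// Same formula as music_samples_per_row above:
// 60 seconds per minute * samples per second / beats per minute / rows per beat.
constexpr float samples_per_row()
{
    return 60.0f * kSampleRate / kBpm / kRowsPerBeat;
}

// Convert a playback position in samples to a (fractional) Rocket row.
float row_from_samples(float sample_pos)
{
    return sample_pos / samples_per_row();
}
```

With these numbers one row is 2756.25 samples, so one second of audio (44100 samples) corresponds to row 16. Note that samples per row is generally not an integer, which is why the row is kept as a float.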
Wow, cool rocket-job people. I wonder if there's something I can do to make these things smoother... like upstreaming code to write C code for the track-data. The tiny replayer could also be included in the distribution, that way it's easier to keep it in sync with the rest of the code, no?
kusma, I think it will be great to have this included in the source.
I think that the basic idea is to have something like:
void getAllTracks(float row, float* currentValues)
An interesting thing I have done in my failed 4k attempt is to have a shader array uniform that has the track values array assigned to it. It basically allows shader access to any track channel without adding any code.
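A rough sketch of what such a getAllTracks could look like on top of flattened arrays like the exported ones. The sample data and the `k...` names are hypothetical, and only step and linear keys are handled here (smooth and ramp are left out to keep it short), so this is an illustration of the idea, not the replayer from the thread.

```cpp
// Hypothetical sample data: keys per track, then all keys flattened in
// track order. Key types follow Rocket's numbering: 0 = step, 1 = linear.
static const int   kTrackCount  = 2;
static const int   kKeyCounts[] = { 2, 1 };
static const int   kRows[]      = { 0, 16,   0 };
static const float kValues[]    = { 0.0f, 4.0f,   7.0f };
static const char  kTypes[]     = { 1, 0,   0 };

// Evaluate every track at 'row' and write one value per track.
void getAllTracks(float row, float* currentValues)
{
    int base = 0; // index of the current track's first key
    for (int t = 0; t < kTrackCount; ++t) {
        const int n = kKeyCounts[t];

        // Find the last key at or before 'row' (stays on key 0 if row is earlier).
        int k = 0;
        while (k + 1 < n && (float)kRows[base + k + 1] <= row) ++k;

        float v = kValues[base + k];

        // Interpolate only if a following key exists within the same track.
        if (k + 1 < n && kTypes[base + k] == 1 && row > (float)kRows[base + k]) {
            const float span = (float)(kRows[base + k + 1] - kRows[base + k]);
            const float u = (row - (float)kRows[base + k]) / span;
            v += (kValues[base + k + 1] - v) * u;
        }

        currentValues[t] = v;
        base += n; // skip to the next track's keys
    }
}
```

The output array can then be handed to the shader as one uniform array, as described above.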
kusma: brilliant idea.
Having a common tiny replayer for release-mode compiles of prods would help a lot.