4k intro, C vs assembler?
category: code [glöplog]
The interactive GCC is interesting, Scali! I thought there was a difference between --i and i--, where --i would decrement the value without reading it into a register. But it seems this is not the case.
i-- and --i are only different if you try to read them. --i decrements before the read, i-- decrements after. Otherwise they're interchangeable.
Seeing threads like this shows how the scene is dead[/sarcasm]
You guys rock.
I was thinking that --i was translated into a single instruction, while i-- was translated into i = i - 1, with a register used to compute it. Well, I guess modern compilers look around and optimize things. But I thought that, assuming no optimization, this was how it got translated.
"Performance" x86 assembler coding is extremely difficult due to all the things to consider with pipelining, prefetching, cache size,etc... but size coding is much easier if you don't actually care about the performance:
Just pick up whatever instruction would work nicely, possibly use all these obsolete 8086 instructions (rep stos anyone?), make sure you output the actual generated code so you don't get tricked by hidden prefixes, heck, you can even cheat by using the segment registers if you want.
Just because they are set to zero by default in modern OSes doesn't mean you can't use them if you know what you are doing, I guess?
Dbug: like Blueberry et al said in this thread, the only thing that matters is the size of your compressed i.e. final executable code. I don't think it's at all trivial to "just pickup whatever instruction would work nicely". To get optimal results, you'd basically have to try out with brute force all combinations of equivalently behaving instructions, and pick the one that compresses the best as a whole.
yzi: What about a pseudo-assembler with macro definitions for instructions that have alternate, semantically equivalent representations? You write the program using the macro instructions, then let the build system try all the combinations and pack each one :)
There is a saying that no program is ever free of bugs.
So is it true to say that a program, in our case intros, can ALWAYS be size optimized?
This would imply that the limit would be 1 byte? Which is impossible?
Something like that, I guess.
Quote:
I thought there was a difference between --i and i--, where --i would decrement the value without reading it into a register. But it seems this is not the case.
Yes, that is a common 'wisdom' in C/C++ circles. Not sure where that originally came from, but it's nonsense obviously. There's no reason why a register can't be used in both scenarios.
There IS a difference, but that only applies to objects. Namely, prefix and postfix need to be implemented as two separate operators, in which case you can have different code for one and the other:
http://stackoverflow.com/questions/3846296/overloading-of-the-operator
As you can see, the postfix version makes a copy of the object, which would make it less efficient (which may translate into it not just using a register directly; who knows, that may be where all that nonsense started years ago).
Lol that would actually be a funny compo: The 1 byte challenge.
"Do realtime gfx and music with only 1 byte"
And rrrola would do a prod with 3d spheres and bump-mapping...
There're already some 0 byte intros out there...
wasn't there a topic about that 1 byte intro already? and it's impossible. you need at least a load, some arithmetic, a loop with a store and perhaps a jump instruction, plus a minimum of 1 byte of data.
sure thing i'm thinking about that again. just about how to transform the compressed stream to get it compressible again? :D
Quote:
And rrrola would do a prod with 3d spheres and bump-mapping...
lol, this is funny...
Old thread, but I'll write this here anyway.
I found a very nice C-to-asm workflow last year when size-optimizing the Tendrils 1k intro. I took Visual Studio's asm output listing and basically copy-pasted it into the original C source code (some syntax changes and variable name un-mangling were necessary) where it had come from, replacing the main() function with a pure asm version, but as inline asm, not using a stand-alone assembler. Then I started changing the code piece by piece, by looking at Crinkler's HTML report, and trying out different ways to say the same thing. I was of course also able to identify and remove machine code statements or blocks of statements that were simply unnecessary. In the end, it was 100% hand-crafted asm, even though it was nominally "Visual C" code. But anyway, we ended up with a quite nice code structure where it's possible to even plug in C functions for trying out something quickly.
I don't know how well this translates from 1k to 4k, but maybe someone finds it useful.
...maybe I wasn't clear about the point. I tried to say that by using that approach, you don't have to choose between C or asm. You can do your quick sketching and prototyping in C, and then gradually move to 100% hand-written asm, when you're happy with the contents.
Guys, I know that this is not the most relevant thing for 4K on a PC, but the very fact that you can shrink your assembly with a compressor is telling you something about how size-(in)efficient your code is. My real experience with size coding is assembly on 8-bit platforms, and there I would normally expect that a program that was truly well optimized for size cannot actually be compressed any further. Compressors are normally pretty stupid; you should always be able to beat one. Running one of the standard compressors should literally increase the size of the program, even if the decompressor length is not taken into account.
TinyC+fsg(win32)
introspec: PC opcode formats and sizes are vastly different than 8-bit platforms though.
Gargaj, I understand, however... :)
I was always curious whether, performance-wise, there is anything to gain from using 8-bit tricks on modern PCs, like extremely unrolled code hardwired very specifically to the effect you want, e.g. unrolled code for every pixel of a tunnel effect. When people say compilers optimize, they mean the typical optimizations a robot can do, not the crazy optimization ideas and special unrolled code a human brain can think of.
But maybe unrolled code for a 1080p tunnel would be a terror, overflowing the cache or something. Maybe it's not worth it when you have such powerful machines, with many cores and such. I am just curious whether someone has tried it or whether it's not going to work. Maybe I will try it in the future with some of the easy stuff that only needs per-scanline unrolled code, and compare with similar C code.
ASSsembler, because there is ASS in it...
introspec: you have it completely wrong, and it is just as wrong for 8-bit or any bit machine code. For some ideas, read about Crinkler and think about what it does. There's a good article about it on the net.
What would you think, which of these two alternatives is most probably better for modern size-coding?
Code:
push 0
push 0
push 0
push 0
Code:
sub eax, eax
push eax
push eax
push eax
push eax
...given that "push eax" is a one-byte instruction.
It does not matter how big your uncompressed size is. The only thing that matters is, how small your compressor can make it. And your job is to find ways to produce stuff that the compressor likes.
introspec: I know where you're coming from... there is an art to writing highly compact code, and it's an art that's particularly valued in the 8-bit world. However, if your goal is to minimize the size of the compressed output, then there are two possible approaches: write highly compact code, or write highly compressible code. I do think that writing compressible code can be an art too - it's not just a sign of inferior coding. In fact, I'll stick my neck out and say that this was the key insight that Farbrausch had back in 2000, which has revolutionised PC intros ever since.
To give a concrete example: when I made Haiku (which to this day is still the only thing I've done which really explores this side of size-coding), I was very much in the 'compact code' school of thought. The music player is a whole load of nested loops: repeated phrases, consisting of repeated notes, where each note decays according to a loop counter. If I was doing it all over again, I would consider unrolling most of those loops (as far as possible within the Spectrum's memory limitations) and letting the compressor do the work.
I mean, Crinkler's arithmetic coder can encode stuff to something like one or two bits per uncompressed instruction. I find it very, very hard to understand how you could somehow code asm instructions into something smaller than that, without compression. ;) LOL