pouët.net


Noob 64-bit code questions (Win64)

category: code [glöplog]
 
Hi,

I haven't bothered much with 64-bit code so far, but now I'd like to take a closer look. Maybe someone can enlighten the ignorant newbie a bit :)

I know Win64 uses the LLP64 data model ('long long and pointers are 64-bit'), meaning that sizeof(int) == 4 and sizeof(long) == 4. Furthermore, Win64 defines BOOL as a 32-bit type:
Code:
typedef int BOOL;
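The LLP64 sizes can be checked at compile time. A minimal C11 sketch, assuming a compiler that predefines `_WIN64` for 64-bit Windows targets (MSVC, clang-cl and MinGW-w64 do); the `data_model()` helper and the enum names are made up for illustration:

```c
/* Compile-time checks of the Win64 (LLP64) sizes; skipped on other targets.
   _Static_assert needs C11 (with MSVC: /std:c11). */
#if defined(_WIN64)
_Static_assert(sizeof(int) == 4, "LLP64: int is 32-bit");
_Static_assert(sizeof(long) == 4, "LLP64: long is still 32-bit");
_Static_assert(sizeof(long long) == 8, "LLP64: long long is 64-bit");
_Static_assert(sizeof(void *) == 8, "LLP64: pointers are 64-bit");
#endif

/* Common data models (I = int, L = long, LL = long long, P = pointer;
   the number is the width in bits of the listed types). */
enum data_model { MODEL_ILP32, MODEL_LLP64, MODEL_LP64, MODEL_OTHER };

/* Classifies the current target from the sizes of int, long and pointers. */
enum data_model data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return MODEL_ILP32;   /* e.g. 32-bit x86 */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return MODEL_LLP64;   /* e.g. Win64 */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return MODEL_LP64;    /* e.g. 64-bit Linux, macOS */
    return MODEL_OTHER;
}
```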


My question: Is there a performance penalty when using the x64 instruction set and accessing variables that are not 64-bit? I assume this mainly concerns loading from memory into registers? What happens when I use a uint32_t as a counter variable in a for loop? Will the generated code just use a full 64-bit register and do all compares/branches at full 64-bit width?
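One way to answer the loop-counter question for a given compiler is to compile the same loop twice and compare the assembly (e.g. with `gcc -O2 -S`, `cl /O2 /FA`, or Compiler Explorer). A minimal sketch; the function names are hypothetical, and the optimizer may rewrite either loop (e.g. into pointer increments or vector code), so the source types do not dictate the final instructions:

```c
#include <stddef.h>
#include <stdint.h>

/* Counter declared as a 32-bit unsigned integer. */
uint64_t sum_u32_counter(const uint32_t *a, uint32_t n)
{
    uint64_t sum = 0;
    for (uint32_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Same loop with a pointer-sized (64-bit on x64) counter. */
uint64_t sum_size_t_counter(const uint32_t *a, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```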

Also, if anyone has some good links regarding these topics, please share :)
Thanks.
added on the 2021-10-18 12:33:19 by arm1n
There's a table of latency, throughput and port usage for instructions. I imagine, though, that any performance penalty comes from 64-bit values filling the caches faster than 32-bit values, not from the instructions themselves. The performance characteristics of modern CPUs can be surprising and unintuitive, so either way it's best to benchmark!
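A crude benchmark along those lines might look like this. It is a sketch, not a rigorous harness: the function names are made up, `clock()` is a coarse timer, and meaningful results need arrays much larger than the caches (say, tens of millions of elements) plus repeated runs:

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Sums n 32-bit elements and returns the elapsed time in seconds. */
double time_sum_u32(const uint32_t *a, size_t n, uint64_t *sum_out)
{
    clock_t start = clock();
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    *sum_out = sum;   /* use the result so the loop is not optimized away */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Same loop over 64-bit elements: twice the bytes for the same count. */
double time_sum_u64(const uint64_t *a, size_t n, uint64_t *sum_out)
{
    clock_t start = clock();
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    *sum_out = sum;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```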
added on the 2021-10-18 13:22:16 by absence
On Win64, the standard data sizes are identical to 32-bit x86, except for pointers (obviously).
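Since only the pointers grow, structures that mix ints and pointers can grow by more than the pointer size because of alignment padding. A small sketch with a hypothetical list node:

```c
#include <stddef.h>

/* Hypothetical linked-list node. The int is 4 bytes on both x86 and x64;
   only the pointer grows from 4 to 8 bytes. On x64 the compiler also puts
   4 bytes of padding after 'id' so that 'next' is 8-byte aligned. */
struct node {
    int id;
    struct node *next;
};

/* Memory used by 'count' nodes: 8 bytes each on x86, 16 bytes each on x64. */
size_t node_bytes(size_t count)
{
    return count * sizeof(struct node);
}
```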

32-bit values in registers usually perform as well as 64-bit ones; in memory, however, they halve the footprint (read: cache usage and bandwidth).
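To put numbers on the footprint argument, here is the cache-line arithmetic as a tiny helper. It assumes 64-byte cache lines (typical for current Intel and AMD x86-64 CPUs) and an array starting on a line boundary; the names are made up:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_BYTES 64u

/* Number of cache lines touched when streaming through 'count' elements
   of 'elem_size' bytes each (rounded up to whole lines). */
size_t cache_lines_touched(size_t count, size_t elem_size)
{
    size_t bytes = count * elem_size;
    return (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

For 1000 elements that is 63 cache lines as uint32_t versus 125 as uint64_t.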

Note that 64-bit operands in most instructions require a REX prefix byte, which increases code size. This is negligible, though, as the prefix is also needed as soon as one of the extended registers (R8-R15, etc.) is used, and using those is usually more efficient than shuffling values between the legacy registers or spilling to memory.
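For the curious, the REX byte is just a fixed high nibble plus four flag bits. A sketch of how it is composed (layout 0100WRXB as documented in the Intel/AMD manuals; the helper name is made up):

```c
#include <stdint.h>

/* Builds an x86-64 REX prefix byte, bit layout 0100WRXB:
   W = 1 selects 64-bit operand size; R, X and B extend the ModRM.reg,
   SIB.index and ModRM.rm/SIB.base fields so that R8-R15 can be encoded. */
uint8_t rex_prefix(int w, int r, int x, int b)
{
    return (uint8_t)(0x40 | (w ? 8 : 0) | (r ? 4 : 0) | (x ? 2 : 0) | (b ? 1 : 0));
}
```

For example, `add eax, ecx` encodes as 01 C8, `add rax, rcx` as 48 01 C8 (REX.W set), and `add r8d, ecx` as 41 01 C8 (REX.B set): one extra byte in both cases.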

Converting values between different sizes often requires an extra instruction. These are usually 1-cycle ops, but still better avoided.
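The three kinds of conversion, with the instructions compilers typically emit for them on x64 noted in the comments. This is a sketch: inlining and the surrounding code can remove or change these instructions, so check the generated code:

```c
#include <stdint.h>

/* Signed widening needs a real sign extension (typically MOVSXD). */
int64_t widen_signed(int32_t v) { return (int64_t)v; }

/* Unsigned widening: any write to a 32-bit register clears the upper
   32 bits, so this is often free or at most a MOV r32, r32. */
uint64_t widen_unsigned(uint32_t v) { return (uint64_t)v; }

/* Narrowing drops the upper bits; usually the code just uses the
   low 32-bit part of the register. */
uint32_t narrow(uint64_t v) { return (uint32_t)v; }
```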

16-bit values (typically "short" in C) require an operand-size prefix byte (0x66), which not only increases code size but may also incur an additional performance penalty on some CPUs.
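To see the prefixes side by side, here is a sketch that encodes the register-to-register form of `add` (opcode 01 /r) for 16-, 32- and 64-bit operands. It only handles this one instruction form and is meant as an illustration, not an assembler; the function name is made up:

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes "add dst, src" with register numbers 0..15
   (0 = AX/EAX/RAX, 1 = CX/ECX/RCX, ..., 8..15 = R8..R15).
   Writes at most 4 bytes to out and returns the instruction length. */
size_t encode_add_rr(int bits, int dst, int src, uint8_t out[4])
{
    size_t n = 0;
    if (bits == 16)
        out[n++] = 0x66;                          /* operand-size prefix */
    int w = (bits == 64), r = (src >= 8), b = (dst >= 8);
    if (w || r || b)                              /* REX goes right before the opcode */
        out[n++] = (uint8_t)(0x40 | (w << 3) | (r << 2) | b);
    out[n++] = 0x01;                              /* ADD r/m, r */
    out[n++] = (uint8_t)(0xC0 | ((src & 7) << 3) | (dst & 7)); /* ModRM, mod = 11 */
    return n;
}
```

So `add eax, ecx` is 2 bytes (01 C8) while `add ax, cx` is 3 bytes (66 01 C8).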

Depending on what you do, it may be better to use "bool", which is usually only 8 bits, instead of the (outdated) BOOL. Accessing it is usually fast, since single-byte access is a common task, and the compiler can choose among several instructions to pick the most suitable one.
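The footprint difference is easy to check. A sketch using a hypothetical `WIN32_BOOL` stand-in for the `<windows.h>` BOOL so that it compiles anywhere; note that `sizeof(bool)` is implementation-defined, though it is 1 with MSVC, GCC and Clang on x86/x64:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Win32 BOOL (typedef int BOOL;). */
typedef int WIN32_BOOL;

/* 1000 flags stored both ways. */
struct flags_bool  { bool       f[1000]; };
struct flags_win32 { WIN32_BOOL f[1000]; };
```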

There is one thing to watch out for, old but increasingly important:

Always align data structures in memory: any 4-byte value should reside at an address that is a multiple of 4, and 64-bit (8-byte) values at a multiple of 8.
Normally the compiler and the memory allocator do this for you, unless you apply the "#pragma pack" directive with a value lower than the sizes of the contained data.
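What `#pragma pack` does to a layout, as a sketch (the struct names are made up; the offsets in the comments assume x64, where uint64_t is 8-byte aligned inside structs):

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: 7 padding bytes follow 'tag' so that 'value'
   starts at an 8-byte aligned offset (offset 8, sizeof 16 on x64). */
struct aligned_rec {
    uint8_t  tag;
    uint64_t value;
};

/* #pragma pack (understood by MSVC, GCC and Clang) removes the padding:
   'value' lands at offset 1 and may be misaligned in memory. */
#pragma pack(push, 1)
struct packed_rec {
    uint8_t  tag;
    uint64_t value;
};
#pragma pack(pop)
```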

Note: if you want to write fast and portable code, better use either plain int/short/char or int_fast16_t/int_fast32_t/int_fast64_t etc. instead of the fixed-size variants.
The fixed-size types are only recommended if you need an exact memory representation, e.g. when reading or writing files directly from/to disk or network, or when passing them by pointer/reference to external APIs.
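A sketch of that split: fixed-width types where the bytes leave the program, "fast" types for in-memory work. The little-endian byte order and both function names are assumptions for illustration, not part of any particular file format:

```c
#include <stddef.h>
#include <stdint.h>

/* Serializing a field of a hypothetical file format: the width must be
   exact, so use uint32_t and write the bytes in a fixed (little-endian) order. */
void put_u32le(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v);
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* In-memory computation: only a minimum width is needed, so let the
   implementation choose the "fast" type of at least 32 bits. */
int_fast32_t count_nonzero(const uint8_t *buf, size_t len)
{
    int_fast32_t count = 0;
    for (size_t i = 0; i < len; i++)
        count += (buf[i] != 0);
    return count;
}
```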
added on the 2021-10-18 17:30:06 by T$
Quote:
Note, if you want to write fast and portable code better use either plain int/short/char or int_fast16_t/int_fast32_t/int_fast64_t etc. instead of the fixed size variants


I wonder what criteria compiler vendors use to decide the width of e.g. int_fast32_t on x86-64. GCC uses 32 bits on Windows and 64 bits on Linux, which seems rather arbitrary and makes me question the value of those typedefs.
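What the standard does promise about the fast types is only a minimum width, and that much can be checked at compile time. A C11 sketch; `fast32_bits()` is a made-up helper that reports what the current implementation chose:

```c
#include <limits.h>
#include <stdint.h>

/* Guaranteed by the standard: at least 32 bits, whatever the vendor picks. */
_Static_assert(sizeof(int_fast32_t) * CHAR_BIT >= 32, "int_fast32_t has at least 32 bits");
_Static_assert(INT_FAST32_MAX >= INT32_MAX, "int_fast32_t can hold any int32_t");

/* Width in bits that this implementation chose for int_fast32_t. */
int fast32_bits(void)
{
    return (int)(sizeof(int_fast32_t) * CHAR_BIT);
}
```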
added on the 2021-10-18 18:16:17 by absence
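In 64-bit mode, signed 32-bit ints can be slower than signed 64-bit ints wherever they have to be widened to 64 bits (e.g. when used as an array index or added to a pointer), as that needs a separate sign-extending instruction (the default is zero-extension: writing a 32-bit register clears the upper half).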
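A minimal pair to compare in the disassembly. The comments describe what x64 compilers typically emit when these functions are not inlined; inside loops the compiler can often widen the induction variable once instead. The function names are hypothetical:

```c
#include <stddef.h>

/* Signed 32-bit index: i has to be sign-extended to 64 bits before it
   can be used in the address calculation (typically a MOVSXD). */
int load_int_index(const int *a, int i)
{
    return a[i];
}

/* Pointer-sized index: the register can go into the address directly. */
int load_ptrdiff_index(const int *a, ptrdiff_t i)
{
    return a[i];
}
```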

In general, the usual optimization advice applies: write good code first, then run the profiler and look at the generated code in your hotspots.
added on the 2021-10-18 18:27:06 by Sesse
Quote:
Note, if you want to write fast and portable code better use either plain int/short/char or int_fast16_t/int_fast32_t/int_fast64_t etc. instead of the fixed size variants:

I'd say if it is supposed to be portable and you don't want to use the provided typedefs, better roll your own typedefs or at least statically assert that the built-in types can hold the values you intend to store in them ("int" doesn't have to be 32 bits wide on every platform, and especially with "long" there are actual differences between 64-bit Windows and Linux code).
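The static-assert approach could look like this. A C11 sketch; `MAX_PARTICLES`, `MAX_TOTAL_BYTES` and `total_bytes()` are hypothetical application values chosen for illustration:

```c
#include <limits.h>

/* Hypothetical application limits. */
#define MAX_PARTICLES   100000          /* stored in an int */
#define MAX_TOTAL_BYTES 3000000000LL    /* needs more than 32 bits */

/* Break the build instead of silently overflowing on a platform whose
   built-in types are narrower than assumed. The standard only guarantees
   INT_MAX >= 32767, so this check is not a given. */
_Static_assert(INT_MAX >= MAX_PARTICLES, "int too narrow for MAX_PARTICLES");

/* LONG_MAX would fail this check on Win64 (long is 32-bit there) while
   passing on x86-64 Linux, hence long long. */
_Static_assert(LLONG_MAX >= MAX_TOTAL_BYTES, "long long too narrow");

long long total_bytes(int particles, int bytes_each)
{
    return (long long)particles * bytes_each;  /* widen before multiplying */
}
```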
As for GCC on Windows vs. Linux: same gcc version? Same standard library, same standard library version?

They do have value if implemented properly. For instance, IIRC MSVC uses a 32-bit type for int_fast16_t, just as one would expect.
added on the 2021-10-18 18:33:32 by Moerder
Thanks for all the hints so far, much appreciated!
added on the 2021-10-18 19:46:48 by arm1n
