guess what this function does
category: general [glöplog]
static inline uint32_t YeahImNotGonnaGiveYouTheNameOfCourse(uint32_t v)
{
uint32_t tmp = v - ((v >> 1) & 0x55555555);
tmp = (tmp & 0x33333333) + ((tmp >> 2) & 0x33333333);
uint32_t r = ((tmp + (tmp >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
return r;
}
BTW I didn't come up with this
{
uint32_t tmp = v - ((v >> 1) & 0x55555555);
tmp = (tmp & 0x33333333) + ((tmp >> 2) & 0x33333333);
uint32_t r = ((tmp + (tmp >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
return r;
}
BTW I didn't come up with this
i'll let ryg do this, he's into this kinky stuff.
That function counts the number of set bits in a value.
wild stab in the dark, its one of those interview questions about counting the number of 1 bits ? There are millions of variations but this looks like one of them.
http://gurmeetsingh.wordpress.com/2008/08/05/fast-bit-counting-routines/
I must say that __builtin_popcount in GNU was new to me.
Anyway, yeah, one of the most pointless things to know.
I must say that __builtin_popcount in GNU was new to me.
Anyway, yeah, one of the most pointless things to know.
SSE4.2 has POPCNT
....aaaand the function is taken from this site.
How long till Adok finds this thread and tries to impress everyone?
Counting bits isn't useless.
yes, i'm into this kind of stuff, and yes, it's population count. oooold!
i'll shoot a few back that should be a bit more fun:
(both pixel manipulation stuff)
the advantage of this kind of tricks is (or well, was) that they are easy to generalize to "non-homogenous" pixel formats (e.g. rgb565), where you can't simply use mmx instructions :)
though my favorite one by far is this (i've seen it first described by blinn):
as for bit tricks that are actually useful, the "chunky to planar-esque" bitwise-transpose tricks often come in handy in SIMD code. even though you really only need the key insight (decomposing a matrix transpose into several block matrix transposes) and then can do the rest with pack/unpack/shift instructions (without the xor trickery).
time for link galore!
http://aggregate.org/MAGIC/
http://graphics.stanford.edu/~seander/bithacks.html
http://home.pipeline.com/~hbaker1/hakmem/hakmem.html
http://realtimecollisiondetection.net/blog/?p=90
http://realtimecollisiondetection.net/blog/?p=78
and knuth has a pretty good (and systematic) collection too.
and with that, i shall leave you for the evening. merry christmas.
(this post was brought to you by the number 2 and the letter pi).
i'll shoot a few back that should be a bit more fun:
Code:
static inline uint32 mystery1(uint32 a,uint32 b)
{
return (a & b) + (((a ^ b) >> 1) & 0x7f7f7f7f);
}
static inline uint32 mystery2(uint32 a,uint32 b)
{
return ((a & ~b) & 0x80808080u) ^ ((a & 0x7f7f7f7f) + (b & 0x7f7f7f7f));
}
(both pixel manipulation stuff)
the advantage of this kind of tricks is (or well, was) that they are easy to generalize to "non-homogenous" pixel formats (e.g. rgb565), where you can't simply use mmx instructions :)
though my favorite one by far is this (i've seen it first described by blinn):
Code:
int mul255(int a,int b) // =round(a*b/255) for a,b small enough (-255<=a,b<=255 always works)
{
int t = a*b + 128;
return (t + (t>>8)) >> 8;
}
as for bit tricks that are actually useful, the "chunky to planar-esque" bitwise-transpose tricks often come in handy in SIMD code. even though you really only need the key insight (decomposing a matrix transpose into several block matrix transposes) and then can do the rest with pack/unpack/shift instructions (without the xor trickery).
time for link galore!
http://aggregate.org/MAGIC/
http://graphics.stanford.edu/~seander/bithacks.html
http://home.pipeline.com/~hbaker1/hakmem/hakmem.html
http://realtimecollisiondetection.net/blog/?p=90
http://realtimecollisiondetection.net/blog/?p=78
and knuth has a pretty good (and systematic) collection too.
and with that, i shall leave you for the evening. merry christmas.
(this post was brought to you by the number 2 and the letter pi).
The first one looks like it is an average of two 4 component colours (0..255) packed into a and b. If true, thats a super cool trick.
I cant get the second one. At a guess it does the same for signed values?
I cant get the second one. At a guess it does the same for signed values?
poops on a baby
Adok need not impress anyone. :)
ryg: Write something for Hugi!
ryg: Write something for Hugi!
merry christmas to you too.
duffman: yeah, nice bit counter. it proceeds from fine grain to coarse, that was easy. but the first step.. i wouldn't have thought of that.
ryg:
..and since they will release everything without further ado you really can send him anything... GUTEN RRRRRUTSCH, adok!
..and since they will release everything without further ado you really can send him anything... GUTEN RRRRRUTSCH, adok!
ryg: what i always did was:
return (a>>1)&0x7f7f7f7f + (b>>1)&0x7f7f7f7f;
it wasn't 4-component pixel manip. but for water/fire effects. it's nice that your version would probably be faster on some implementations (a shift traded against an ALU operation).
return (a>>1)&0x7f7f7f7f + (b>>1)&0x7f7f7f7f;
it wasn't 4-component pixel manip. but for water/fire effects. it's nice that your version would probably be faster on some implementations (a shift traded against an ALU operation).
correction: with implementations i mean cpu architectures.
earxmas: The difference being that yours has some undesirable rounding behaviour. E.g. the average of 0xff and 0xff becomes 0xfe. Oh and you need some more brackets. :)
xmas device: yeah, that's true.
.. come to think of it, i probably did shift and mask afterwards, which made it alot faster.
.. come to think of it, i probably did shift and mask afterwards, which made it alot faster.
Guess what this is..
*hint* It has something to do with 565 packed pixels...
*hint* It has something to do with 565 packed pixels...
Code:
unsigned int inl_WtfPixelOp(unsigned int a, unsigned int b)
{
unsigned int sum = a+b;
unsigned int mask = ((sum & 0x08010020)>>5)*63;
return (sum | mask) & 0x07E0F81F;
}
my second function was simply (a+b-128) without saturation.
torus: saturating add. green is in the top 16bits, rb in the low 16bits.
torus: saturating add. green is in the top 16bits, rb in the low 16bits.
v3nomsoup: "Guten Rutsch" - same to you! But ryg's contributions could even be published uncontrolled. He writes only politically correct articles. :)
ryg, that's it.. saturating add.
btw - on the C64x+ DSP I'm using the following one:
_shfl interleaves an integer bit-wise with 0-bits (e.g. Pow (x,2) over the Galious Field 2). _deal does more or less the opposite (sorts even bits into MSB16 and odd bits into LSB16).
It took me quite a while to come up with this one. It runs roughly twice as fast as the version I've posted above and does not need interleaving of the color-channels.
btw - on the C64x+ DSP I'm using the following one:
Code:
unsigned int inl_SatAdd565 (ui16 a, ui16 b)
{
// for C64x+ DSP
unsigned int mask = ((15<<11)|(31<<5)|15)<<16;
unsigned int temp = _deal (_shfl (a) + _shfl (b | mask));
unsigned int color = temp;
unsigned int carry = temp;
if (carry & 0x80000000) color |= 0xf800;
if (carry & 0x04000000) color |= 0x07e0;
if (carry & 0x00100000) color |= 0x001f;
return color;
}
_shfl interleaves an integer bit-wise with 0-bits (e.g. Pow (x,2) over the Galious Field 2). _deal does more or less the opposite (sorts even bits into MSB16 and odd bits into LSB16).
It took me quite a while to come up with this one. It runs roughly twice as fast as the version I've posted above and does not need interleaving of the color-channels.
Quote:
ryg's contributions could even be published uncontrolled. He writes only politically correct articles. :)
ryg, you know what to do.