guess what this function does

category: general [glöplog]

static inline uint32_t YeahImNotGonnaGiveYouTheNameOfCourse(uint32_t v)
{
uint32_t tmp = v - ((v >> 1) & 0x55555555);

tmp = (tmp & 0x33333333) + ((tmp >> 2) & 0x33333333);

uint32_t r = ((tmp + (tmp >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;

return r;
}

BTW I didn't come up with this

added on the 2008-12-23 22:44:55 by duffman

i'll let ryg do this, he's into this kinky stuff.

added on the 2008-12-23 22:48:18 by Gargaj

That function counts the number of set bits in a value.

added on the 2008-12-23 22:49:23 by xernobyl

wild stab in the dark, its one of those interview questions about counting the number of 1 bits ? There are millions of variations but this looks like one of them.

added on the 2008-12-23 22:49:43 by auld

http://gurmeetsingh.wordpress.com/2008/08/05/fast-bit-counting-routines/

I must say that __builtin_popcount in GNU was new to me.

Anyway, yeah, one of the most pointless things to know.

added on the 2008-12-23 23:00:59 by snoutmate

SSE4.2 has POPCNT

added on the 2008-12-23 23:33:19 by misioslaw

....aaaand the function is taken from this site.

added on the 2008-12-24 00:26:48 by kusma

How long till Adok finds this thread and tries to impress everyone?

added on the 2008-12-24 00:31:01 by Shockwave

Counting bits isn't useless.

added on the 2008-12-24 00:38:14 by doomdoom

yes, i'm into this kind of stuff, and yes, it's population count. oooold!

i'll shoot a few back that should be a bit more fun:

Code:

static inline uint32 mystery1(uint32 a,uint32 b)
{
  return (a & b) + (((a ^ b) >> 1) & 0x7f7f7f7f);
}

static inline uint32 mystery2(uint32 a,uint32 b)
{
  return ((a & ~b) & 0x80808080u) ^ ((a & 0x7f7f7f7f) + (b & 0x7f7f7f7f));
}

(both pixel manipulation stuff)

the advantage of this kind of tricks is (or well, was) that they are easy to generalize to "non-homogenous" pixel formats (e.g. rgb565), where you can't simply use mmx instructions :)

though my favorite one by far is this (i've seen it first described by blinn):

Code:

int mul255(int a,int b) // =round(a*b/255) for a,b small enough (-255<=a,b<=255 always works)
{
  int t = a*b + 128;
  return (t + (t>>8)) >> 8;
}

as for bit tricks that are actually useful, the "chunky to planar-esque" bitwise-transpose tricks often come in handy in SIMD code. even though you really only need the key insight (decomposing a matrix transpose into several block matrix transposes) and then can do the rest with pack/unpack/shift instructions (without the xor trickery).

time for link galore!
http://aggregate.org/MAGIC/
http://graphics.stanford.edu/~seander/bithacks.html
http://home.pipeline.com/~hbaker1/hakmem/hakmem.html
http://realtimecollisiondetection.net/blog/?p=90
http://realtimecollisiondetection.net/blog/?p=78
and knuth has a pretty good (and systematic) collection too.

and with that, i shall leave you for the evening. merry christmas.

(this post was brought to you by the number 2 and the letter pi).

added on the 2008-12-24 00:46:52 by ryg

The first one looks like it is an average of two 4 component colours (0..255) packed into a and b. If true, thats a super cool trick.

I cant get the second one. At a guess it does the same for signed values?

added on the 2008-12-24 02:25:11 by auld

poops on a baby

added on the 2008-12-24 02:31:42 by PENETRATOR

Adok need not impress anyone. :)

ryg: Write something for Hugi!

added on the 2008-12-24 10:43:34 by Adok

merry christmas to you too.

added on the 2008-12-24 11:10:49 by rudi

duffman: yeah, nice bit counter. it proceeds from fine grain to coarse, that was easy. but the first step.. i wouldn't have thought of that.

added on the 2008-12-24 11:28:04 by earx

ryg:
..and since they will release everything without further ado you really can send him anything... GUTEN RRRRRUTSCH, adok!

added on the 2008-12-24 11:30:49 by v3nom

ryg: what i always did was:

return (a>>1)&0x7f7f7f7f + (b>>1)&0x7f7f7f7f;

it wasn't 4-component pixel manip. but for water/fire effects. it's nice that your version would probably be faster on some implementations (a shift traded against an ALU operation).

added on the 2008-12-24 11:41:19 by earx

correction: with implementations i mean cpu architectures.

added on the 2008-12-24 11:42:16 by earx

earxmas: The difference being that yours has some undesirable rounding behaviour. E.g. the average of 0xff and 0xff becomes 0xfe. Oh and you need some more brackets. :)

added on the 2008-12-24 13:27:16 by doomdoom

xmas device: yeah, that's true.

.. come to think of it, i probably did shift and mask afterwards, which made it alot faster.

added on the 2008-12-24 14:21:03 by earx

Guess what this is..

*hint* It has something to do with 565 packed pixels...

Code:


unsigned int inl_WtfPixelOp(unsigned int a, unsigned int b)
{
  unsigned int sum  = a+b;
  unsigned int mask = ((sum & 0x08010020)>>5)*63;
  return (sum | mask) & 0x07E0F81F;
}

added on the 2008-12-24 14:23:34 by torus

my second function was simply (a+b-128) without saturation.

torus: saturating add. green is in the top 16bits, rb in the low 16bits.

added on the 2008-12-24 14:46:25 by ryg

v3nomsoup: "Guten Rutsch" - same to you! But ryg's contributions could even be published uncontrolled. He writes only politically correct articles. :)

added on the 2008-12-24 14:49:14 by Adok

ryg, that's it.. saturating add.

btw - on the C64x+ DSP I'm using the following one:

Code:


unsigned int inl_SatAdd565 (ui16 a, ui16 b)
{
  // for C64x+ DSP
  unsigned int mask  = ((15<<11)|(31<<5)|15)<<16;
  unsigned int temp  = _deal (_shfl (a) + _shfl (b | mask));  
  unsigned int color = temp;
  unsigned int carry = temp;
  if (carry & 0x80000000) color |= 0xf800;
  if (carry & 0x04000000) color |= 0x07e0;
  if (carry & 0x00100000) color |= 0x001f;
  return color;
}

_shfl interleaves an integer bit-wise with 0-bits (e.g. Pow (x,2) over the Galious Field 2). _deal does more or less the opposite (sorts even bits into MSB16 and odd bits into LSB16).

It took me quite a while to come up with this one. It runs roughly twice as fast as the version I've posted above and does not need interleaving of the color-channels.

added on the 2008-12-24 15:01:59 by torus

Quote:

ryg's contributions could even be published uncontrolled. He writes only politically correct articles. :)

ryg, you know what to do.

added on the 2008-12-24 15:22:12 by Gargaj

pouët.net

guess what this function does

login