pouët.net

integer clamp

category: code [glöplog]
Quote:
=> Just let the compiler do its job.

That was interesting. It might be fun to look at the assembly dump to see how much gets optimized out, though, before dismissing the branchless clamping completely =)
added on the 2010-12-03 08:52:33 by sol_hsa
Use an unsigned comparison and CMOVcc.
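
A minimal sketch of the unsigned-comparison idea in C, assuming a 32-bit value clamped to [0, 255]. The name clamp_u8 is hypothetical, and whether the compiler actually emits CMOVcc for the inner select depends on the compiler, its flags and the target, so check the assembly.

```c
#include <stdint.h>

/* Clamp a signed 32-bit value to [0, 255].
   A single unsigned comparison detects both out-of-range cases,
   because a negative value converts to a very large unsigned value. */
static inline int32_t clamp_u8(int32_t x)
{
    if ((uint32_t)x > 255u) {
        /* Out of range: negative inputs go to 0, large inputs go to 255.
           This select is the part a compiler may turn into a CMOVcc. */
        x = (x < 0) ? 0 : 255;
    }
    return x;
}
```

In-range values take only the one comparison and pass through unchanged.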
added on the 2010-12-03 11:06:06 by ponce
mmx indeed
Quote:
The PADDUS (Packed Add Unsigned with Saturation) instructions add the packed unsigned data elements of the source operand to the packed unsigned data elements of the destination operand and saturate the results.
PADDUS support packed byte (PADDUSB) and packed word (PADDUSW) data types.
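
The saturating behaviour quoted above can be modelled in plain C. This is only a scalar sketch of what PADDUSB computes per byte lane, not the instruction itself; adds_u8 and adds_u8_array are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an unsigned saturating byte add:
   sums above 255 stick at 255 instead of wrapping around. */
static inline uint8_t adds_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Applies adds_u8 lane by lane: dst[i] = saturate(dst[i] + src[i]).
   PADDUSB performs this on 8 bytes at once in a 64-bit MMX register. */
static void adds_u8_array(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = adds_u8(dst[i], src[i]);
}
```

Note that this only saturates the upper bound of an unsigned add; it does not clamp a negative signed value to 0.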
added on the 2010-12-03 11:44:54 by the_Ye-Ti
Using no conditionals?

// Assumes a 32-bit int and an arithmetic right shift for signed values.
// if (number < 0) number = 0;
number &= ~number >> 31;
// if (number > 0xFF) number = 0xFF;
number = (number & 0xFF) | ((-((unsigned)number >> 8)) >> 24);

Alternatively:

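// if (number < 0) number = 0;
number &= ~number >> 31;
// if (number > 0xFF) number = 0xFF;
number = 0xFF - number;   // negative exactly when number > 0xFF
number &= ~number >> 31;  // same negative-to-zero trick as above
number = 0xFF - number;   // mirror back into 0..0xFF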

Whether this is faster than conditionals depends on your platform. On platforms with built-in clamping, min/max, or "set register if condition is true"-style instructions, this code is unlikely to be faster.
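
For anyone who wants to check both variants against a plain conditional clamp, here is a minimal sketch wrapping them in functions. It assumes a 32-bit int and that right-shifting a negative signed value is an arithmetic shift (implementation-defined in C, but what GCC and Clang do); clamp_mask_or and clamp_mirror are hypothetical names.

```c
#include <stdint.h>

/* Variant 1: zero out negatives, then OR in 0xFF if any bit above bit 7 is set. */
static inline int32_t clamp_mask_or(int32_t number)
{
    /* ~number >> 31 is 0 for negative input and all ones otherwise. */
    number &= ~number >> 31;
    /* (number >> 8) is nonzero only above 0xFF; negating it as unsigned
       sets the top byte to 0xFF, which the shift moves into the low byte. */
    return (number & 0xFF) | (int32_t)((0u - ((uint32_t)number >> 8)) >> 24);
}

/* Variant 2: apply the negative-to-zero trick twice, mirrored around 0xFF. */
static inline int32_t clamp_mirror(int32_t number)
{
    number &= ~number >> 31;   /* if (number < 0) number = 0; */
    number = 0xFF - number;    /* negative exactly when number > 0xFF */
    number &= ~number >> 31;   /* clamp that negative to 0 */
    return 0xFF - number;      /* mirror back into 0..0xFF */
}
```

Both functions should agree with `x < 0 ? 0 : (x > 255 ? 255 : x)` over the whole int32_t range under the stated assumptions.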
added on the 2010-12-03 12:13:06 by Kabuto
las: Now try benchmarking again with something that isn't trivially branch-predictable by the CPU. :-)
added on the 2010-12-03 12:51:33 by Sesse
Sesse: Does it really matter? The compiler probably turned those branches into CMOVs anyway...
added on the 2010-12-03 13:14:11 by kusma
yeh.... does it really matter? :) seems a theme on pouet these days.. in this day and age of optimised compilers why are people still discussing this? I'm sure there are many other flaws in your code that impact perf more :)
added on the 2010-12-03 13:24:50 by dv$
Sesse:
Quote:
// Just some not so serious testing...


Benchmark it yourself ;)
added on the 2010-12-03 13:26:44 by las
Of course, the real power of the nonconditional version is that you can do SIMD without SIMD:

sum0 = (a0 + b0) | ((((a0 + b0) >> 8) & 0x801008) * 255);
sum1 = (a1 + b1) | ((((a1 + b1) >> 8) & 0x801008) * 255);
*dest = ((sum0 << 1) & 0xf81f0000) | ((sum0 << 16) & 0x7e00000) | ((sum1 >> 15) & 0xf81f) | (sum1 & 0x3e0);
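
The same "overflow bit times 255" idiom can be shown on a simpler layout. This is a minimal sketch, assuming four unsigned 8-bit lanes packed in a uint32_t rather than the packed pixel layout used above; sat_add_u8x4 is a hypothetical name.

```c
#include <stdint.h>

/* Saturating add of four unsigned bytes packed in one 32-bit word,
   using only ordinary integer operations (no SIMD instructions). */
static inline uint32_t sat_add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t lanes = 0x00FF00FFu;

    /* Split into even and odd bytes so each lane has 8 spare bits for the carry. */
    uint32_t lo = (a & lanes) + (b & lanes);
    uint32_t hi = ((a >> 8) & lanes) + ((b >> 8) & lanes);

    /* The carry of each lane lands in bit 8 of that lane. Shifting it down
       to bit 0 and multiplying by 255 turns it into a 0xFF mask for the lane. */
    lo |= ((lo >> 8) & 0x00010001u) * 255u;
    hi |= ((hi >> 8) & 0x00010001u) * 255u;

    /* Drop the carry bits and interleave the lanes back together. */
    return (lo & lanes) | ((hi & lanes) << 8);
}
```

For example, sat_add_u8x4(0x10FF8001, 0x20017FFF) gives 0x30FFFFFF: three lanes reach or exceed 0xFF and saturate, and the top lane adds normally.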
added on the 2010-12-03 13:56:55 by 216
