half-decent coder's n00b question about GLSL optimization
category: code [glöplog]
In my pixel shader code, which is better (in terms of speed):
v2 = v1 + v1 + v1;
instead of
v2 = v1 * 3.0;
Also, are there any secret hints to speed up code, like those good old bit-shifting tricks to divide/multiply by 2, 4, 8, ...?
mul and add are the same speed, so 1 op is faster than 2. And I guess that answers your question about shift too (except on older architectures without integer support..).
then again, the compiler will optimize it anyway.
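To illustrate (a minimal sketch with made-up names; integer shifts need GLSL 1.30+, and the equal-cost claim assumes a reasonably modern GPU compiler):

int timesFourMul(int x)   { return x * 4; }  // compiler may emit a shift or integer mad here
int timesFourShift(int x) { return x << 2; } // manual shift: same cost, just harder to read

vec3 scaleTwoOps(vec3 v1) { return v1 + v1 + v1; } // two adds
vec3 scaleOneOp(vec3 v1)  { return v1 * 3.0; }     // one multiply: prefer this form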
ok, thank you boys :)
Gargaj: I wouldn't be so sure :)
Gargaj: Within well-defined (but very arcane) limits. The HLSL and GLSL compilers have considerable leeway in this; but at least for D3D, once you're down to shader bytecode, don't expect lots of arithmetic transformations to happen.
The general building block of shader ALUs is a fused multiply-accumulate unit: fma(a,b,c) computes "a*b+c". Addition in such a unit is computed as fma(1,a,b) = "1*a+b", and multiplication as fma(a,b,0) = "a*b+0". This is important to consider. E.g. "0.25 * (a+b+c+d)" looks like it should be fewer ops (4) than "0.25*a + 0.25*b + 0.25*c + 0.25*d" (7), but in fact both take 4 ops when executed using FMAs. If you don't take this into account, estimates based on instruction counting will be even more worthless than they already are :)
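Spelled out as a sketch (using the fma() builtin from GLSL 4.00+ as a stand-in for the hardware unit; function names are made up):

float quarterSumA(float a, float b, float c, float d)
{
    // "0.25 * (a+b+c+d)": looks like 4 ops, and is 4 FMAs
    float t = fma(1.0, a, b);  // a + b
    t = fma(1.0, c, t);        // ... + c
    t = fma(1.0, d, t);        // ... + d
    return fma(0.25, t, 0.0);  // * 0.25
}

float quarterSumB(float a, float b, float c, float d)
{
    // "0.25*a + 0.25*b + 0.25*c + 0.25*d": looks like 7 ops, but is also 4 FMAs
    float t = fma(0.25, a, 0.0); // 0.25*a
    t = fma(0.25, b, t);         // 0.25*b + ...
    t = fma(0.25, c, t);         // 0.25*c + ...
    return fma(0.25, d, t);      // 0.25*d + ...
}

Both variants fill the same four FMA slots, which is exactly why counting source-level operations misleads.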
ryg: well aware, but I think that's way outside the context of this thread :)
I could talk about this kind of stuff all day. Sometimes I even do. :)
Well, check both versions in the AMD Shader Analyzer and see which has the best throughput. Does anyone know anything similar for nVidia? Parallel Nsight only analyses whole programs afaik, not single shaders.. or did i miss something? :)
also looking for this kind of info.
NVPerfHUD perhaps?
ryg: please do, it's useful :)
Anyone working on SGX GPUs should install the PowerVR SDK: http://imgtec.com/powervr/insider/powervr-sdk.asp
There's an unreliable and pretty limited (as in it doesn't properly support many GPUs) shader profiler. It's still pretty useful, as it tells you the cycle count for each part of the shader. The architecture guide and application recommendations are well worth reading too: http://imgtec.com/powervr/insider/powervr-sdk-docs.asp
ryg: Might one run into issues when optimizing with AMD's ShaderAnalyzer, considering nVidia GPUs have a vector architecture instead of a scalar one? Or am I just full of shit? :)
I thought it was scalar on nVidia (CUDA cores) and vector on AMD? :) Thanks for PerfHUD, but apparently it only works with DirectX, sorry for my lameness.
nVidia replaced the vector processing units with scalar ones, so it's nVidia that doesn't feature a vector architecture. However, it's not entirely fair to label recent nVidia architectures scalar either, considering they are actually superscalar. For example, GF104 (I guess?) features 3 CUDA cores with 2 warps, hence sort of requiring a superscalar approach for instruction-level parallelism.
Well AMD is not straightforward vector either, it's VLIW.
Scalar/superscalar is a misnomer here though. Neither is shading single pixels/quads per invocation; both shade pixels in groups of at least 16 at once.
btw ryg, will there be some new version of kkrunchy in the future, or is the current version the final .product? :)
i haven't touched the kkrunchy code in years and deliberately avoided backing up that directory for years in the hope that i'll lose it to a bad disk at some point. without success so far.
ryg wins with the most elegant solution ever.
ryg: Then we have lost to the clumsy AV companies and those evil virus makers. Very sad. It seems they've ruined the future of the 64k scene.
Who watches 64ks? Demosceners. People who know that 64ks are not viruses. Sceners with antivirus software should acknowledge that it's the AV that's not working, instead of blaming the intros or the tools that were used to create them. Sceners with antivirus software should get used to creating filters, ignoring the warnings, or simply uninstall that useless piece of performance-hogging software.
... look at me, a happy windows user without antivirus software for over 3 years now.
epilogue: sceners who need antivirus software because they don't know how to handle the internet should stop complaining anyway. no personal offense intended.
xTr1m: Of course everything you say is right, but you still don't understand me. I am like you: I don't use AV software.
I am worried about the future. Look at ryg. If you are right and we shouldn't care about AV and virus makers, then why doesn't he want to work on his cruncher anymore?
There's a good answer for that, and many coders know it: who wants to maintain an old piece of code that you once wrote as a hobby for your own projects? If you're not using it anymore, and it's not generating income, motivation is very low.
Ask kb why he doesn't work on v2 anymore ;)
xTr1m: A good conjecture, you might say, but the proper answer is up to ryg in any case.
;)
xTr1m: still, it would be nice if they released the source code (ok, ryg already gave us disfilter). I'm sure there would be people interested in it; they could even fix bugs or improve the tools.