half-decent coder's n00b question about GLSL optimization
category: code [glöplog]
In my pixel shader code, which is better (in terms of speed):
v2 = v1 + v1 + v1;
instead of
v2 = v1 * 3.0;
Also, are there any secret hints to speed up code, like those good old bit-shifting tricks to divide/multiply by 2, 4, 8, ...?
mul and add are the same speed, so 1 op is faster than 2. And I guess that answers your question about shift too (except on older architectures without integer support..).
then again, the compiler will optimize it anyway.
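To illustrate (a minimal sketch with made-up names; integer shifts need GLSL 1.30+, and the equal-cost claim assumes a reasonably modern GPU compiler):

int timesFourMul(int x)   { return x * 4; }  // compiler may emit a shift or integer mad here
int timesFourShift(int x) { return x << 2; } // manual shift: same cost, just harder to read

vec3 scaleTwoOps(vec3 v1) { return v1 + v1 + v1; } // two adds
vec3 scaleOneOp(vec3 v1)  { return v1 * 3.0; }     // one multiply: prefer this form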
ok, thank you boys :)
Gargaj: I wouldn't be so sure :)
Gargaj: Within well-defined (but very arcane) limits. The HLSL and GLSL compilers have considerable leeway in this; but at least for D3D, once you're down to shader bytecode, don't expect lots of arithmetic transformations to happen.
The general building block of shader ALUs is a fused multiply-accumulate unit: fma(a,b,c) computes "a*b+c". Addition in such a unit is computed as fma(1,a,b) = "1*a+b", and multiplication as fma(a,b,0) = "a*b+0". This is important to consider. E.g. "0.25 * (a+b+c+d)" looks like it should be fewer ops (4) than "0.25*a + 0.25*b + 0.25*c + 0.25*d" (7), but in fact both take 4 ops when executed using FMAs. If you don't take this into account, estimates based on instruction counting will be even more worthless than they already are :)
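Spelled out as a sketch (using the fma() builtin from GLSL 4.00+ as a stand-in for the hardware unit; function names are made up):

float quarterSumA(float a, float b, float c, float d)
{
    // "0.25 * (a+b+c+d)": looks like 4 ops, and is 4 FMAs
    float t = fma(1.0, a, b);  // a + b
    t = fma(1.0, c, t);        // ... + c
    t = fma(1.0, d, t);        // ... + d
    return fma(0.25, t, 0.0);  // * 0.25
}

float quarterSumB(float a, float b, float c, float d)
{
    // "0.25*a + 0.25*b + 0.25*c + 0.25*d": looks like 7 ops, but is also 4 FMAs
    float t = fma(0.25, a, 0.0); // 0.25*a
    t = fma(0.25, b, t);         // 0.25*b + ...
    t = fma(0.25, c, t);         // 0.25*c + ...
    return fma(0.25, d, t);      // 0.25*d + ...
}

Both variants fill the same four FMA slots, which is exactly why counting source-level operations misleads.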
ryg: well aware, but I think that's way outside the context of this thread :)
I could talk about this kind of stuff all day. Sometimes I even do. :)
Well, check both versions in the AMD Shader Analyzer and see which has the best throughput. Does anyone know anything similar for nVidia? Parallel Nsight only analyses whole programs afaik, not single shaders.. or did i miss something? :)
also looking for this kind of info.
NVPerfHUD perhaps?
ryg: please do, it's useful :)
Anyone working on SGX GPUs should install the PowerVR SDK: http://imgtec.com/powervr/insider/powervr-sdk.asp
There's an unreliable and pretty limited (as in it doesn't properly support many GPUs) shader profiler. It's still pretty useful, as it tells you the cycle count for each part of the shader. The architecture guide and application recommendations are well worth reading too: http://imgtec.com/powervr/insider/powervr-sdk-docs.asp
ryg: Might one run into issues when optimizing with AMD's ShaderAnalyzer, considering nVidia GPUs have a vector architecture instead of a scalar one? Or am I just full of shit? :)
I thought it was scalar on nVidia (CUDA cores) and vector on AMD? :) Thanks for PerfHUD, but apparently it only works with DirectX, sorry for my lameness.
nVidia replaced the vector processing units with scalar ones, so it's nVidia that doesn't feature a vector architecture. However, it's not entirely fair to label recent nVidia architectures scalar either, considering they are actually superscalar. For example, GF104 (I guess?) features 3 CUDA cores with 2 warps, hence sort of requiring a superscalar approach for instruction-level parallelism.
Well AMD is not straightforward vector either, it's VLIW.
Scalar/superscalar is a misnomer here though. Neither is shading single pixels/quads per invocation; both shade pixels in groups of at least 16 at once.
btw ryg, will there be some new version of kkrunchy in the future, or is the current version the final .product? :)
i haven't touched the kkrunchy code in years and deliberately avoided backing up that directory for years in the hope that i'll lose it to a bad disk at some point. without success so far.
ryg wins with the most elegant solution ever.
ryg: Then we have lost to the clumsy AV companies and those evil virus makers. Very sad. It seems they've ruined the future of the 64k scene.
Who watches 64ks? Demosceners. People who know that 64ks are not viruses. Sceners with antivirus software should acknowledge that it's the AV that's not working, instead of blaming the intros or the tools that were used to create them. Sceners with antivirus software should get used to creating filters, ignoring the warnings, or simply uninstall that useless piece of performance-hogging software.
... look at me, a happy windows user without antivirus software for over 3 years now.
epilogue: sceners who need antivirus software because they don't know how to handle the internet should stop complaining anyway. no personal offense intended.
xTr1m: Of course everything you say is right, but you still don't understand me. I am like you: I don't use AV software.
I am worried about the future. Look at ryg. If you are right and we shouldn't care about AV and virus makers, then why doesn't he want to work on his cruncher anymore?
There's a good answer for that, and many coders know it: who wants to maintain an old piece of code that you once wrote as a hobby for your own projects? If you're not using it anymore, and it's not generating income, motivation is very low.
Ask kb why he doesn't work on v2 anymore ;)
xTr1m: A good conjecture, you might say, but the proper answer is up to ryg in any case.
;)
xTr1m: still, it would be nice if they released the source code (ok, ryg already gave us disfilter). I'm sure there would be people interested in it; they could even fix bugs or improve the tools.