pouët.net

Optimizing Closure

category: code [glöplog]
You are seriously telling us, several times, that a shader taking 256 samples per fragment/pixel is almost free, and you insist that your "measurements" using fraps (!) prove that.
I'm sorry to tell you: not convinced. My bullshit detector exploded while reading about your measurements. I guess that's an adequate measure of what I think of them.

Maybe we all have a different understanding of "almost" and "for free".
added on the 2012-09-18 18:06:01 by las las
i am lazy, so demand framerate-counter-display-code here now! else i dont re-measure!
what chock said!
Quote:
why 256 texture lookups for blurring has to be super expensive.

We are not talking about "super expensive" - we are talking about "not measurable" which is a bit of a different level.
added on the 2012-09-18 18:10:06 by las las
mmh. 256 lookups for free? let me think. how do you do that? what does the kernel look like? do you go without filtering, use precise pixel offsets and sample exactly the middle of each pixel? does that work? the fragments involved would be cached, i guess. but how do you get the same fps? is it perhaps because you are bottlenecked on vram write speed while the shaders are still faster than that?

weird theory. but reads like bullshit.
added on the 2012-09-18 18:19:00 by yumeji yumeji
there, i told you!
but i get flamed instead of some "thank you!"
the scene is not 1994 and i am NOT Lord Helmet!
Quote:
i am lazy, so demand framerate-counter-display-code here now!

means
Quote:
I suck, can't do anything complex properly, please everyone do the hard stuff for me kthxbai

Also you actively refuse to give a valid argument for your statement:
Quote:
else i dont re-measure!

kb already gave you the measuring tools you should use, should you not be smart enough to measure your stuff properly:
Quote:
PIX and/or NVPerfHud and/or AMD GPUPerfStudio

And last but not least, FPS is a bad performance measurement unit.
added on the 2012-09-18 18:20:19 by xTr1m xTr1m
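To make the point about FPS concrete: frame rate is the reciprocal of frame time, so the same relative drop in fps can mean very different costs. A minimal sketch, assuming hypothetical fps readings (the function names and the numbers are illustrative, not taken from anyone's measurements in this thread):

```cpp
#include <cassert>

// Convert a frame rate (frames per second) into a frame time in milliseconds.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Cost of an effect in milliseconds, derived from the frame rate measured
// without and with the effect. Both inputs are hypothetical readings.
double effectCostMs(double fpsWithout, double fpsWith) {
    return frameTimeMs(fpsWith) - frameTimeMs(fpsWithout);
}
```

With these helpers, going from 1000 fps to 500 fps corresponds to 1 ms of extra work, while going from 60 fps to 30 fps corresponds to roughly 16.7 ms, although both halve the frame rate. Per-pass timings in milliseconds from a GPU profiler are easier to reason about than an fps counter for that reason.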
what kernel? it's all on the GPU! le textures and le code, everything resides on that addCPU ;) there's RAM and the way from gpu-ram to the gpu-"kernel" is like no way at all!
xTr1m: got you! still lazy! :p i am sure about what i do! so no need to prove anything to you!
prove it yourselves! :P i dont care anymore!
Quote:
The code that is run for each pixel is the same, and those 256 texture lookups are repeated for neighboring pixels.

"neighboring"? no. (and there's of course the problem of filtering, etc.)
added on the 2012-09-18 18:26:31 by Gargaj Gargaj
BB Image
<3
added on the 2012-09-18 18:27:18 by las las
there's no filtering in 4ks, if you play the game straight!
on my ati 7970 (just about the fastest card on the market), for a 1080p source and destination texture and assuming that xy.xy gives a step of one pixel per tap (because i dont know the value of your constants), this piece of code runs in 4.5ms.

so if that was running alone on the gpu you'd be getting a best possible frame rate of 222 fps.


added on the 2012-09-18 18:28:38 by smash smash
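The 222 fps figure follows directly from the 4.5 ms timing. A small sketch of that arithmetic, plus the number of texture lookups such a pass would issue per frame, assuming that 1080p means 1920x1080 and that every pixel does 256 taps in a single pass (both are assumptions, and the helper names are made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on the frame rate if a pass of the given duration
// were the only work the GPU had to do each frame.
double bestCaseFps(double passMs) {
    assert(passMs > 0.0);
    return 1000.0 / passMs;
}

// Total texture lookups per frame for a full-screen pass
// (assumed: every pixel issues the same number of taps).
std::uint64_t tapsPerFrame(std::uint64_t width, std::uint64_t height,
                           std::uint64_t tapsPerPixel) {
    return width * height * tapsPerPixel;
}
```

Under those assumptions, bestCaseFps(4.5) comes out at about 222, and tapsPerFrame(1920, 1080, 256) at 530,841,600 lookups per frame.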
there's so many bad assumptions about gpus going on here it's beyond a joke. like what, you think the shader gets scheduled to run in neat little lines working across the framebuffer?
added on the 2012-09-18 18:29:37 by smash smash
no, it doesn't do lines. 4x4 fragments, i think i read somewhere.
added on the 2012-09-18 18:31:18 by yumeji yumeji
smash - thanks for helping out. I hoped that would not be necessary.

Let's try to save the thread:
Something one could read.
added on the 2012-09-18 18:38:53 by las las
Large Hedron Collider, GF560, 1080p - according to PIX the two draw calls that make up the PostFX (checked by turning them off in the selector) take an absolutely negligible amount of....

fucking 16 milliseconds.

Absolutely no performance difference indeed!
added on the 2012-09-18 18:40:50 by kb_ kb_
las: One of the creepier realizations in life is that one is always tempted to link to ryg's blog whenever it comes to GPU internals. :)
added on the 2012-09-18 18:42:25 by kb_ kb_
Or blur filters, for that matter.
added on the 2012-09-18 18:42:37 by kb_ kb_
ok, you all made me sick, have the complete code already (concerning this):

Code:
#define SCREEN_WIDTH 1280
#define SCREEN_HEIGHT 720
#define BLUR_WIDTH SCREEN_WIDTH
#define BLUR_HEIGHT SCREEN_HEIGHT
#define BLUR_AMOUNT 1.05f
#define SCREEN_TARGET 0
#define BLUR1_TARGET 1
#define BLUR2_TARGET 2
#define ORIGINAL_TARGET 3
#define MAX_TARGETS 4

[...]

#ifdef POST_FX
#pragma data_seg(".rendrtgt")
struct strRenderTarget
{
    LPDIRECT3DTEXTURE9 lpTexture;
    LPDIRECT3DSURFACE9 lpColorSurface;
};
strRenderTarget RenderTarget[4];
#endif

#pragma data_seg(".screen")
D3DPRESENT_PARAMETERS d3d_paramZ =
{
    SCREEN_WIDTH, SCREEN_HEIGHT, D3DFMT_A8R8G8B8, 1,
    D3DMULTISAMPLE_NONE, // D3DMULTISAMPLE_8_SAMPLES,
    0,
    D3DSWAPEFFECT_DISCARD,
    0, // hWnd
#ifdef _DEBUG
    #ifdef DEBUG_FULLSCREEN
    0, // Fullscreen = 0
    #else
    1, // Windowed = 1
    #endif
#else
    #ifdef RELEASE_WINDOWED
    1, // Windowed = 0
    #else
    0, // Fullscreen = 1
    #endif
#endif
    1,
    D3DFMT_D24S8,
    0,
    D3DPRESENT_RATE_DEFAULT, // D3DPRESENT_INTERVAL_DEFAULT
    D3DPRESENT_INTERVAL_IMMEDIATE
};

#pragma data_seg(".tristrip")
#ifdef POST_FX
float DX_TriangleStrip[2][24] =
{
    // screen sized
    {
        0.0f, 0.0f, 0.0f, 1.0f, 0.5f/(SCREEN_WIDTH), 0.5f/(SCREEN_HEIGHT),
        SCREEN_WIDTH, 0.0f, 0.0f, 1.0f, 1.0f+0.5f/(SCREEN_WIDTH), 0.5f/(SCREEN_HEIGHT),
        0.0f, SCREEN_HEIGHT, 0.0f, 1.0f, 0.5f/(SCREEN_WIDTH), 1.0f+0.5f/(SCREEN_HEIGHT),
        SCREEN_WIDTH, SCREEN_HEIGHT, 0.0f, 1.0f, 1.0f+0.5f/(SCREEN_WIDTH), 1.0f+0.5f/(SCREEN_HEIGHT)
    },
    // blur sized
    {
        0.0f, 0.0f, 0.0f, 1.0f, 0.5f/(BLUR_WIDTH), 0.5f/(BLUR_HEIGHT),
        BLUR_WIDTH, 0.0f, 0.0f, 1.0f, 1.0f+0.5f/(BLUR_WIDTH), 0.5f/(BLUR_HEIGHT),
        0.0f, BLUR_HEIGHT, 0.0f, 1.0f, 0.5f/(BLUR_WIDTH), 1.0f+0.5f/(BLUR_HEIGHT),
        BLUR_WIDTH, BLUR_HEIGHT, 0.0f, 1.0f, 1.0f+0.5f/(BLUR_WIDTH), 1.0f+0.5f/(BLUR_HEIGHT)
    }
};
#else
float DX_TriangleStrip2[24] =
{
    0.0f, 0.0f, 0.0f, 1.0f, -1.0f,-1.0f,
    SCREEN_WIDTH, 0.0f, 0.0f, 1.0f, 1.0f,-1.0f,
    0.0f, SCREEN_HEIGHT, 0.0f, 1.0f, -1.0f,1.0f,
    SCREEN_WIDTH, SCREEN_HEIGHT, 0.0f, 1.0f, 1.0f,1.0f
};
#endif

[...]

    DX_Effect->Begin( 0, 0 );
#ifdef POST_FX
    DX_d3dDevice->SetRenderTarget( 0, RenderTarget[SCREEN_TARGET].lpColorSurface );
#endif
    DX_Effect->BeginPass( part );
#ifdef POST_FX
    DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip[0], 24 );
#else
    DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip, 24 );
#endif
    DX_Effect->EndPass();

[...]

#ifdef POST_FX
    DX_d3dDevice->SetRenderTarget( 0, RenderTarget[BLUR1_TARGET].lpColorSurface );
    DX_d3dDevice->SetTexture( 0, RenderTarget[SCREEN_TARGET].lpTexture );
    DX_Effect->SetVector( "xy", &D3DXVECTOR4( 1.0f/(SCREEN_WIDTH), 1.0f/(SCREEN_HEIGHT),0.0f,1.0f) );
    DX_Effect->BeginPass( post_fx );
    DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip[1], 24 );
    DX_Effect->EndPass();

    // blur horizontal
    DX_d3dDevice->SetRenderTarget( 0, RenderTarget[BLUR2_TARGET].lpColorSurface );
    DX_d3dDevice->SetTexture( 0, RenderTarget[BLUR1_TARGET].lpTexture );
    DX_Effect->SetVector( "xy", &D3DXVECTOR4( BLUR_AMOUNT/(BLUR_WIDTH), 0.0f,0.0f,2.0f) );
    DX_Effect->BeginPass( post_fx );
    DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip[1], 24 );
    DX_Effect->EndPass();

    // blur vertical
    DX_d3dDevice->SetRenderTarget( 0, RenderTarget[BLUR1_TARGET].lpColorSurface );
    DX_d3dDevice->SetTexture( 0, RenderTarget[BLUR2_TARGET].lpTexture );
    DX_Effect->SetVector( "xy", &D3DXVECTOR4( 0.0f, BLUR_AMOUNT/(BLUR_HEIGHT),0.0f,2.0f) );
    DX_Effect->BeginPass( post_fx );
    DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip[1], 24 );
    DX_Effect->EndPass();
#endif
    DX_Effect->End();

#ifdef POST_FX
    DX_d3dDevice->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_BLENDFACTOR);
    DX_d3dDevice->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);
    DX_d3dDevice->SetRenderState(D3DRS_BLENDFACTOR, 0xffffffff);
    DX_d3dDevice->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
    DX_d3dDevice->SetRenderTarget( 0, RenderTarget[SCREEN_TARGET].lpColorSurface );
    DX_d3dDevice->SetTexture( 0, RenderTarget[BLUR1_TARGET].lpTexture );
    DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip[0], 24 );
    DX_d3dDevice->SetRenderState(D3DRS_ALPHABLENDENABLE, FALSE);
    DX_d3dDevice->SetRenderTarget( 0, RenderTarget[ORIGINAL_TARGET].lpColorSurface );
    DX_d3dDevice->SetTexture( 0, RenderTarget[SCREEN_TARGET].lpTexture );
    DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip[0], 24 );
    DX_d3dDevice->SetRenderState(D3DRS_ALPHABLENDENABLE, FALSE);
#endif
    DX_d3dDevice->EndScene();
    DX_d3dDevice->Present( NULL, NULL, NULL, NULL );
}

[...]

#ifdef POST_FX
    // Rendertargets
    DX_d3dDevice->CreateTexture( SCREEN_WIDTH, SCREEN_HEIGHT, 1, D3DUSAGE_RENDERTARGET, D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT, &(RenderTarget[ORIGINAL_TARGET].lpTexture), NULL );
    DX_d3dDevice->CreateTexture( BLUR_WIDTH, BLUR_HEIGHT, 1, D3DUSAGE_RENDERTARGET, D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT, &(RenderTarget[BLUR1_TARGET].lpTexture), NULL );
    DX_d3dDevice->CreateTexture( BLUR_WIDTH, BLUR_HEIGHT, 1, D3DUSAGE_RENDERTARGET, D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT, &(RenderTarget[BLUR2_TARGET].lpTexture), NULL );
    DX_d3dDevice->CreateTexture( SCREEN_WIDTH, SCREEN_HEIGHT, 1, D3DUSAGE_RENDERTARGET, D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT, &(RenderTarget[SCREEN_TARGET].lpTexture), NULL );
    RenderTarget[ORIGINAL_TARGET].lpTexture->GetSurfaceLevel( 0, &(RenderTarget[ORIGINAL_TARGET].lpColorSurface));
    RenderTarget[BLUR1_TARGET].lpTexture->GetSurfaceLevel( 0, &(RenderTarget[BLUR1_TARGET].lpColorSurface));
    RenderTarget[BLUR2_TARGET].lpTexture->GetSurfaceLevel( 0, &(RenderTarget[BLUR2_TARGET].lpColorSurface));
    RenderTarget[SCREEN_TARGET].lpTexture->GetSurfaceLevel( 0, &(RenderTarget[SCREEN_TARGET].lpColorSurface));
    DX_d3dDevice->GetRenderTarget( 0, &(RenderTarget[ORIGINAL_TARGET].lpColorSurface) );
    // DX_d3dDevice->SetRenderState(D3DRS_lING, false);
    DX_d3dDevice->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_LINEAR);
    DX_d3dDevice->SetSamplerState(0, D3DSAMP_ADDRESSU, D3DTADDRESS_BORDER );
    DX_d3dDevice->SetSamplerState(0, D3DSAMP_ADDRESSV, D3DTADDRESS_BORDER );
#endif





thats all i can tell you about "xy" ;)
D3DRS_IING ? seems i replaced "light" with "i" once, while concatenating, and did it on the entire solution instead of on the shaderCode alone! ;) i just wonder if i can delete that line completely now!
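For readers wondering about the 0.5f/(SCREEN_WIDTH) terms in the triangle strip: in Direct3D 9, pixel centers sit on integer screen coordinates while texel centers sit at half-texel positions, so the quad's texture coordinates are shifted by half a texel to line pixels up with texels. A rough sketch of that mapping, plus the size of one blur step in texels, assuming the shader (not shown here) offsets its taps in multiples of "xy"; the helper names are hypothetical:

```cpp
#include <cassert>

// Texture coordinate u at integer screen position x, for a quad that spans
// screen x in [0, width] with u running from 0.5/width to 1 + 0.5/width,
// as in the DX_TriangleStrip data.
double uAtPixel(int x, int width) {
    assert(width > 0);
    const double u0 = 0.5 / width;
    const double u1 = 1.0 + 0.5 / width;
    return u0 + (u1 - u0) * (static_cast<double>(x) / width);
}

// Center of texel i in normalized texture coordinates.
double texelCenter(int i, int width) {
    assert(width > 0);
    return (i + 0.5) / width;
}

// Size of one step in texels, given a step in normalized coordinates
// (for example the x component of "xy" in the horizontal blur pass).
double stepInTexels(double normalizedStep, int width) {
    return normalizedStep * width;
}
```

With a 1280 pixel wide target, uAtPixel(x, 1280) lands on texelCenter(x, 1280) for every column, and the horizontal blur step of BLUR_AMOUNT/(BLUR_WIDTH) works out to about 1.05 texels per step, slightly more than the one pixel per tap that smash assumed.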
ok @hardy. it's a basic blur pass framework. but how do you explain your "256 lookup for free" with that? *scratches head*
added on the 2012-09-18 19:01:06 by yumeji yumeji
well, the shader-code-snippet about the 256lookups is some pages back in this thread! try to implement it yourself and see!
