Optimizing Closure
category: code [glöplog]
You are seriously telling us, several times, that a shader taking 256 samples per fragment/pixel is almost for free, and you insist that your "measurements" using Fraps (!) prove that.
I'm sorry to tell you - not convinced. My bullshit detector exploded while reading about your measurements - I guess that's an adequate measure of what I think of your measurements.
Maybe we all have a different understanding of "almost" and "for free".
  
i am lazy, so demand framerate-counter-display-code here now! else i dont re-measure!
  
what chock said!
  
Quote:
why 256 texture lookups for blurring has to be super expensive.
We are not talking about "super expensive" - we are talking about "not measurable" which is a bit of a different level.
mmh. 256 lookups for free? let me think. how do you do that? what does the kernel look like? do you go without filtering, use precise pixel offsets and sample exactly the middle of each pixel? does that work? the fragments involved would be cached, i guess. but how do you get the same fps? is it perhaps because you are bottlenecked on vram write speed while the shaders are still faster than that?
weird theory. but reads like bullshit.
  
there, i told you!
but get flame instead of some "thankyou!"
the scene is not 1994 and i am NOT Lord Helmet!
  
Quote:
 i am lazy, so demand framerate-counter-display-code here now!
means
Quote:
I suck, can't do anything complex properly, please everyone do the hard stuff for me kthxbai
Also, you actively refuse to give a valid argument for your statement:
Quote:
 else i dont re-measure!
kb already gave you the measuring tools you should use, should you not be smart enough to measure your stuff properly:
Quote:
PIX and/or NVPerfHud and/or AMD GPUPerfStudio
And last but not least, FPS is a bad performance measurement unit.
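A small sketch of why frame time tends to be the more useful unit: the same fixed cost in milliseconds moves an FPS counter by very different amounts depending on the baseline. The function names and the 1 ms cost used in the note below are hypothetical, chosen only for illustration.

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Frame rate after adding a fixed per-frame cost (in ms) to a baseline.
// Assumes the added work is not hidden behind some other bottleneck.
double fpsAfterAddedCost(double baseFps, double addedMs) {
    assert(addedMs >= 0.0);
    return 1000.0 / (frameTimeMs(baseFps) + addedMs);
}
```

With these, `fpsAfterAddedCost(1000.0, 1.0)` gives 500 fps while `fpsAfterAddedCost(60.0, 1.0)` gives roughly 56.6 fps: the same 1 ms looks like a loss of 500 fps in one case and about 3.4 fps in the other, which is why comparing costs in milliseconds is less misleading.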
what kernel? it's all on the GPU! le textures and le code, everything resides on that addCPU ;) there's RAM, and the way from gpu-ram to the gpu-"kernel" is like no way at all!
  

xTr1m: got you! still lazy! :p i am sure about what i do! so no need to prove anything to you!
  
prove it yourselves! :P i dont care anymore!
  
Quote:
The code that is run for each pixel is the same, and those 256 texture lookups are repeated for neighboring pixels.
"neighboring"? no. (and there's of course the problem of filtering, etc.)

<3
there's no filtering in 4ks, if you play the game straight!
  
on my ati 7970 (just about the fastest card on the market), for a 1080p source and destination texture, and assuming that xy.xy gives a step of one pixel per tap (because i don't know the value of your constants), this piece of code runs in 4.5ms.
so if that was running alone on the gpu you'd be getting a best possible frame rate of 222 fps.
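The arithmetic behind those numbers can be sketched as follows. The 4.5 ms figure and the 1080p size come from the post above; treating the pass as 256 taps per pixel (the figure argued about in this thread) and as a single 1920x1080 pass are assumptions, and the helper names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on frame rate if a pass costing passMs were the only GPU work.
double bestCaseFps(double passMs) {
    assert(passMs > 0.0);
    return 1000.0 / passMs;
}

// Texture fetches issued by one full-screen pass.
std::uint64_t fetchesPerPass(std::uint64_t width, std::uint64_t height,
                             std::uint64_t tapsPerPixel) {
    return width * height * tapsPerPixel;
}

// Fetch rate in billions of fetches per second for a pass of passMs.
double gigaFetchesPerSecond(std::uint64_t fetches, double passMs) {
    assert(passMs > 0.0);
    return static_cast<double>(fetches) / (passMs * 1.0e-3) / 1.0e9;
}
```

Under those assumptions, `bestCaseFps(4.5)` is about 222, `fetchesPerPass(1920, 1080, 256)` is 530,841,600, and `gigaFetchesPerSecond(530841600, 4.5)` comes out at roughly 118 billion fetches per second. That is a measurable amount of work, whatever an FPS counter shows.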
  
there are so many bad assumptions about gpus going on here it's beyond a joke. like what, you think the shader gets scheduled to run in neat little lines working across the framebuffer?
  
no, it doesn't do lines. 4x4 fragments, i thought i read somewhere.
  
smash - thanks for helping out. I hoped that would not be necessary.
Let's try to save the thread:
Something one could read.
  
Large Hedron Collider, GF560, 1080p - according to PIX the two draw calls that make up the PostFX (checked by turning them off in the selector) take an absolutely negligible amount of....
fucking 16 milliseconds.
Absolutely no performance difference indeed!
  
las: One of the creepier realizations in life is that one is always tempted to link to ryg's blog whenever it comes to GPU internals. :)
  
Or blur filters, for that matter.
  
ok, you all made me sick, have the complete code already (concerning this):
thats all i can tell you about "xy" ;)
  
Code:
#define		SCREEN_WIDTH	    1280
#define		SCREEN_HEIGHT	720
#define		BLUR_WIDTH		SCREEN_WIDTH
#define		BLUR_HEIGHT		SCREEN_HEIGHT
#define		BLUR_AMOUNT		1.05f
#define SCREEN_TARGET		0
#define BLUR1_TARGET		1
#define BLUR2_TARGET		2
#define ORIGINAL_TARGET		3
#define MAX_TARGETS			4
[...]
#ifdef POST_FX
#pragma data_seg(".rendrtgt")
struct strRenderTarget
{
	LPDIRECT3DTEXTURE9 lpTexture;
	LPDIRECT3DSURFACE9 lpColorSurface;
};
strRenderTarget RenderTarget[4];
#endif
#pragma data_seg(".screen")
D3DPRESENT_PARAMETERS	d3d_paramZ = {
	SCREEN_WIDTH,
	SCREEN_HEIGHT,
	D3DFMT_A8R8G8B8,
	1,
	D3DMULTISAMPLE_NONE,
//	D3DMULTISAMPLE_8_SAMPLES,
	0,
	D3DSWAPEFFECT_DISCARD,
	0,						//	hWnd
#ifdef _DEBUG
	#ifdef DEBUG_FULLSCREEN
	0,						//	Windowed = FALSE (fullscreen)
	#else
	1,						//	Windowed = TRUE
	#endif
#else
	#ifdef RELEASE_WINDOWED
	1,						//	Windowed = TRUE
	#else
	0,						//	Windowed = FALSE (fullscreen)
	#endif
#endif
	1,
	D3DFMT_D24S8,
	0,
	D3DPRESENT_RATE_DEFAULT,
//	D3DPRESENT_INTERVAL_DEFAULT
	D3DPRESENT_INTERVAL_IMMEDIATE
	};
#pragma data_seg(".tristrip")
#ifdef POST_FX
float DX_TriangleStrip[2][24] =
{
	//  screen sized
	{
		0.0f, 0.0f, 0.0f, 1.0f,
		0.5f/(SCREEN_WIDTH), 0.5f/(SCREEN_HEIGHT),
		SCREEN_WIDTH, 0.0f, 0.0f, 1.0f,
		1.0f+0.5f/(SCREEN_WIDTH), 0.5f/(SCREEN_HEIGHT),
		0.0f, SCREEN_HEIGHT, 0.0f, 1.0f,
		0.5f/(SCREEN_WIDTH), 1.0f+0.5f/(SCREEN_HEIGHT),
		SCREEN_WIDTH, SCREEN_HEIGHT, 0.0f, 1.0f,
		1.0f+0.5f/(SCREEN_WIDTH), 1.0f+0.5f/(SCREEN_HEIGHT)
	},
	//  blur sized
	{
		0.0f, 0.0f, 0.0f, 1.0f,
		0.5f/(BLUR_WIDTH), 0.5f/(BLUR_HEIGHT),
		BLUR_WIDTH, 0.0f, 0.0f, 1.0f,
		1.0f+0.5f/(BLUR_WIDTH), 0.5f/(BLUR_HEIGHT),
		0.0f, BLUR_HEIGHT, 0.0f, 1.0f,
		0.5f/(BLUR_WIDTH), 1.0f+0.5f/(BLUR_HEIGHT),
		BLUR_WIDTH, BLUR_HEIGHT, 0.0f, 1.0f,
		1.0f+0.5f/(BLUR_WIDTH), 1.0f+0.5f/(BLUR_HEIGHT)
	}
};
#else
float DX_TriangleStrip2[24] =
{
		0.0f, 0.0f, 0.0f, 1.0f,
			-1.0f,-1.0f,
		SCREEN_WIDTH, 0.0f, 0.0f, 1.0f,
			1.0f,-1.0f,
		0.0f, SCREEN_HEIGHT, 0.0f, 1.0f,
			-1.0f,1.0f,
		SCREEN_WIDTH, SCREEN_HEIGHT, 0.0f, 1.0f,
			1.0f,1.0f
};
#endif
[...]
	DX_Effect->Begin( 0, 0 );
#ifdef POST_FX
			DX_d3dDevice->SetRenderTarget( 0, RenderTarget[SCREEN_TARGET].lpColorSurface );
#endif
			DX_Effect->BeginPass( part );
#ifdef POST_FX
				DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip[0], 24 );
#else
				DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip2, 24 );
#endif
			DX_Effect->EndPass();
[...]
#ifdef POST_FX
		
	DX_d3dDevice->SetRenderTarget( 0, RenderTarget[BLUR1_TARGET].lpColorSurface );
	DX_d3dDevice->SetTexture( 0, RenderTarget[SCREEN_TARGET].lpTexture );
	DX_Effect->SetVector( "xy", &D3DXVECTOR4( 1.0f/(SCREEN_WIDTH), 1.0f/(SCREEN_HEIGHT), 0.0f, 1.0f ) );
	DX_Effect->BeginPass( post_fx );
		DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip[1], 24 );
	DX_Effect->EndPass();
	// blur horizontal
	DX_d3dDevice->SetRenderTarget( 0, RenderTarget[BLUR2_TARGET].lpColorSurface );
	DX_d3dDevice->SetTexture( 0, RenderTarget[BLUR1_TARGET].lpTexture );
	DX_Effect->SetVector( "xy", &D3DXVECTOR4( BLUR_AMOUNT/(BLUR_WIDTH), 0.0f, 0.0f, 2.0f ) );
	DX_Effect->BeginPass( post_fx );
		DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip[1], 24 );
	DX_Effect->EndPass();
	// blur vertical
	DX_d3dDevice->SetRenderTarget( 0, RenderTarget[BLUR1_TARGET].lpColorSurface );
	DX_d3dDevice->SetTexture( 0, RenderTarget[BLUR2_TARGET].lpTexture );
	DX_Effect->SetVector( "xy", &D3DXVECTOR4( 0.0f, BLUR_AMOUNT/(BLUR_HEIGHT), 0.0f, 2.0f ) );
	DX_Effect->BeginPass( post_fx );
		DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip[1], 24 );
	DX_Effect->EndPass();
#endif
	DX_Effect->End();
#ifdef POST_FX
	DX_d3dDevice->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_BLENDFACTOR);
	DX_d3dDevice->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);
	DX_d3dDevice->SetRenderState(D3DRS_BLENDFACTOR, 0xffffffff);
	DX_d3dDevice->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);		
	DX_d3dDevice->SetRenderTarget( 0, RenderTarget[SCREEN_TARGET].lpColorSurface );
	DX_d3dDevice->SetTexture( 0, RenderTarget[BLUR1_TARGET].lpTexture );
		DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip[0], 24 );
	DX_d3dDevice->SetRenderState(D3DRS_ALPHABLENDENABLE, FALSE);
	DX_d3dDevice->SetRenderTarget( 0, RenderTarget[ORIGINAL_TARGET].lpColorSurface );
	DX_d3dDevice->SetTexture( 0, RenderTarget[SCREEN_TARGET].lpTexture );
		DX_d3dDevice->DrawPrimitiveUP( D3DPT_TRIANGLESTRIP, 2, DX_TriangleStrip[0], 24 );
	
	DX_d3dDevice->SetRenderState(D3DRS_ALPHABLENDENABLE, FALSE);
#endif
DX_d3dDevice->EndScene();
	DX_d3dDevice->Present( NULL, NULL, NULL, NULL );
}
[...]
#ifdef POST_FX
// Rendertargets
		DX_d3dDevice->CreateTexture(	SCREEN_WIDTH, SCREEN_HEIGHT, 1, D3DUSAGE_RENDERTARGET, 
										D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT, &(RenderTarget[ORIGINAL_TARGET].lpTexture), NULL );
		DX_d3dDevice->CreateTexture(	BLUR_WIDTH, BLUR_HEIGHT, 1, D3DUSAGE_RENDERTARGET, 
										D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT, &(RenderTarget[BLUR1_TARGET].lpTexture), NULL );
		DX_d3dDevice->CreateTexture(	BLUR_WIDTH, BLUR_HEIGHT, 1, D3DUSAGE_RENDERTARGET, 
										D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT, &(RenderTarget[BLUR2_TARGET].lpTexture), NULL );
		DX_d3dDevice->CreateTexture(	SCREEN_WIDTH, SCREEN_HEIGHT, 1, D3DUSAGE_RENDERTARGET, 
										D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT, &(RenderTarget[SCREEN_TARGET].lpTexture), NULL );
		RenderTarget[ORIGINAL_TARGET].lpTexture->GetSurfaceLevel( 0, &(RenderTarget[ORIGINAL_TARGET].lpColorSurface));
		RenderTarget[BLUR1_TARGET].lpTexture->GetSurfaceLevel( 0, &(RenderTarget[BLUR1_TARGET].lpColorSurface));
		RenderTarget[BLUR2_TARGET].lpTexture->GetSurfaceLevel( 0, &(RenderTarget[BLUR2_TARGET].lpColorSurface));
		RenderTarget[SCREEN_TARGET].lpTexture->GetSurfaceLevel( 0, &(RenderTarget[SCREEN_TARGET].lpColorSurface));
	DX_d3dDevice->GetRenderTarget( 0, &(RenderTarget[ORIGINAL_TARGET].lpColorSurface) );
//	DX_d3dDevice->SetRenderState(D3DRS_lING, false); 	
	DX_d3dDevice->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_LINEAR);
	DX_d3dDevice->SetSamplerState(0, D3DSAMP_ADDRESSU, D3DTADDRESS_BORDER );
	DX_d3dDevice->SetSamplerState(0, D3DSAMP_ADDRESSV, D3DTADDRESS_BORDER );
#endif
D3DRS_IING ? seems i replaced "light" with "i" once, while concatenating,  and did it on the entire solution instead of on the shaderCode alone! ;) i just wonder if i can delete that line completely now!
  
ok @hardy. it's a basic blur pass framework. but how do you explain your "256 lookup for free" with that? *scratches head*
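For readers following along, here is a minimal CPU sketch of what a horizontal-then-vertical blur framework like the host code above amounts to. The actual shader is not part of this post, so the box weights, the integer step and the `Image` type are assumptions made only for illustration. Out-of-range reads return 0, mimicking D3DTADDRESS_BORDER with a black border.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a single-channel render target.
struct Image {
    int width;
    int height;
    std::vector<float> pixels; // row-major, width * height entries

    Image(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0.0f) {
        assert(w > 0 && h > 0);
    }

    // Out-of-range reads return 0 (border addressing, black border).
    float fetch(int x, int y) const {
        if (x < 0 || y < 0 || x >= width || y >= height) return 0.0f;
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// One 1D box-blur pass: average (2 * radius + 1) taps along (stepX, stepY).
// (1, 0) is the horizontal pass, (0, 1) the vertical pass.
Image blurPass(const Image& src, int stepX, int stepY, int radius) {
    assert(radius >= 0);
    Image dst(src.width, src.height);
    const float weight = 1.0f / static_cast<float>(2 * radius + 1);
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            float sum = 0.0f;
            for (int t = -radius; t <= radius; ++t) {
                sum += src.fetch(x + t * stepX, y + t * stepY);
            }
            dst.pixels[static_cast<std::size_t>(y) * src.width + x] = sum * weight;
        }
    }
    return dst;
}

// Horizontal then vertical, the same order as the two blur passes above.
Image separableBlur(const Image& src, int radius) {
    return blurPass(blurPass(src, 1, 0, radius), 0, 1, radius);
}

// Taps per pixel: two 1D passes versus one full 2D kernel of the same footprint.
int tapsSeparable(int radius) { return 2 * (2 * radius + 1); }
int tapsFull2D(int radius) { return (2 * radius + 1) * (2 * radius + 1); }
```

Blurring a single bright pixel with `separableBlur(img, 1)` spreads it evenly over a 3x3 block, the same result a full 3x3 box kernel would give. For a radius of 7, `tapsSeparable` returns 30 taps per pixel against 225 for `tapsFull2D`, which is the usual reason for splitting a blur into two passes. It says nothing about how the 256 lookups discussed in this thread are laid out.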
  
well, the shader-code-snippet about the 256lookups is some pages back in this thread! try to implement it yourself and see!
  








