Current ( >= 3.3) OpenGL - some questions.
category: code [glöplog]
Hi fellow sceners,
given that I'm too lazy to read tons of spec pages right now - I'll ask here first a couple of questions:
1. What is the best/fastest way of rendering to texture and rendering to multiple render targets?
Now that there are PBOs, FBOs and so on... what's the proper way to do that?
2. What kind of float texture/buffer format should one use - which can be considered the fastest?
3. What is the best/fastest way to render a fullscreen quad with current GL? (Currently we use a VAO without an index buffer, containing just the data for 2 triangles, plus glDrawArrays - maybe that's uncool for some reason?)
4. What is the tiniest solution for rendering a fullscreen quad in GL >= 3.3? (I thought of something like binding a "NULL" buffer and letting the vertex shader do the job - not sure how to implement that in GL; in DX it's simple.)
Thanks in advance ;)
Code snippets appreciated.
Code:
10 PRINT "pants off!"
20 GOTO 10
i know it doesn't answer your questions, but i'm just curious what you need opengl>3 for? afaik, (0) every new functionality is already available via extensions in older contexts, (1) on win32 you would still need to wglGetProcAddress all the gl>3 functions, so no gain here either.
am i missing something here?
using *EXT and *ARB is something I don't like - I prefer sticking to a more or less properly defined specification - avoiding total ATI/NV compatibility horror.
And consider the wglGetProcAddress problem solved. ;)
Quote:
1. What is the best/fastest way of rendering to texture and rendering to multiple render targets? Now that there are PBOs, FBOs and so on... what's the proper way to do that?
FBOs.
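For reference, a minimal setup could look like this (just a sketch - the GL_RGBA16F format and 1280x720 size are placeholders, and there's no error checking):
Code:
// Render-to-texture via FBO (GL 3.0+).
GLuint tex, fbo;
glGenTextures( 1, &tex );
glBindTexture( GL_TEXTURE_2D, tex );
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA16F, 1280, 720, 0, GL_RGBA, GL_FLOAT, 0 );
glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR );
glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR );
glGenFramebuffers( 1, &fbo );
glBindFramebuffer( GL_FRAMEBUFFER, fbo );
glFramebufferTexture2D( GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, tex, 0 );
// For MRT, attach more textures to GL_COLOR_ATTACHMENT1..n and select them:
// GLenum bufs[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
// glDrawBuffers( 2, bufs );
// ... render to the texture(s) here ...
glBindFramebuffer( GL_FRAMEBUFFER, 0 ); // back to the default framebuffer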
Quote:
2. What kind of float texture/buffer format should one use - which can be considered the fastest?
The one with the fewest bits is always the fastest.
Quote:
3. What is the best/fastest way to render a fullscreen quad with current GL? (Currently we use a VAO without an index buffer, containing just the data for 2 triangles, plus glDrawArrays - maybe that's uncool for some reason?)
If you're drawing one quad, the method of doing it doesn't really affect your performance. So yeah, a triangle strip of two triangles is the usual method.
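In GL 3.3 core terms that could be sketched like this (attribute location 0 and the buffer names are assumptions, no error checking):
Code:
// Fullscreen quad as a 4-vertex triangle strip in clip space - no index buffer.
static const float verts[8] = { -1,-1,  1,-1,  -1,1,  1,1 };
GLuint vao, vbo;
glGenVertexArrays( 1, &vao );
glBindVertexArray( vao );
glGenBuffers( 1, &vbo );
glBindBuffer( GL_ARRAY_BUFFER, vbo );
glBufferData( GL_ARRAY_BUFFER, sizeof( verts ), verts, GL_STATIC_DRAW );
glEnableVertexAttribArray( 0 );
glVertexAttribPointer( 0, 2, GL_FLOAT, GL_FALSE, 0, 0 );
// per frame:
glDrawArrays( GL_TRIANGLE_STRIP, 0, 4 );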
Quote:
4. What is the tiniest solution for rendering a fullscreen quad in GL >= 3.3? (I thought of something like binding a "NULL" buffer and letting the vertex shader do the job - not sure how to implement that in GL; in DX it's simple.)
Now, this is something a 4k intro coder could answer.
http://opengl.org/wiki
3. draw a triangle instead, to get fewer wasted pixels along the diagonal (pixels are rendered in batches of 2x2 quads on most platforms these days).
4. you can use gl_VertexID - see the FXAA vertex shader
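A sketch of that trick in GL terms (the shader embedded as a C string; a core profile also wants an empty VAO bound for the draw call):
Code:
// Fullscreen triangle from gl_VertexID - no vertex buffer at all.
// IDs 0,1,2 map to clip space (-1,-1), (3,-1), (-1,3); uv spans [0,1] on screen.
static const char *vs =
    "#version 330\n"
    "out vec2 uv;"
    "void main(){"
    "  uv = vec2( ( gl_VertexID << 1 ) & 2, gl_VertexID & 2 );"
    "  gl_Position = vec4( uv * 2.0 - 1.0, 0.0, 1.0 );"
    "}";
// ... compile/link, bind an empty VAO, then:
glDrawArrays( GL_TRIANGLES, 0, 3 );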
Hi las,
Quote:
1. What is the best/fastest way of rendering to texture and rendering to multiple render targets?
Now that there are PBOs, FBOs and so on... what's the proper way to do that?
FBOs are the proper way, and shader_image_load_store the fastest and possibly the smallest as well now from what I've seen. Note that shader_image_load_store is a SM5 (in D3D terminology) extension. Don't expect it to be supported on SM4 hardware.
PBOs are only needed if you need to stream data between a given texture and main memory.
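For what it's worth, the image_load_store side could be sketched like this (GL 4.2 / SM5 hardware only; image unit 0 and GL_RGBA16F are assumptions):
Code:
// Host side: bind a texture level to image unit 0 for writing.
glBindImageTexture( 0, tex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA16F );
// Shader side (fragment shader):
//   layout( rgba16f, binding = 0 ) uniform writeonly image2D dst;
//   imageStore( dst, ivec2( gl_FragCoord.xy ), color );
// Image writes bypass the normal raster output path, so before sampling
// the texture afterwards:
glMemoryBarrier( GL_TEXTURE_FETCH_BARRIER_BIT );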
Quote:
2. What kind of float texture/buffer format should one use - which can be considered the fastest?
Formats of 32 bpp and lower run at full speed, 64 bpp at half the sampling rate, and 128 bpp at a quarter, from what I've seen in some benchmarks. Maybe some of the most recent NV GPUs can do 64 bpp nearest/bilinear at full speed; I'm not sure. You'll have to ask a testing guru here. :)
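Put concretely (w/h and the exact internal formats are only illustrative):
Code:
// Per-texel bandwidth grows with the format width - rates as per the benchmarks above.
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA8,   w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0 ); // 32 bpp - full rate
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA16F, w, h, 0, GL_RGBA, GL_FLOAT, 0 );         // 64 bpp - ~half rate
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA32F, w, h, 0, GL_RGBA, GL_FLOAT, 0 );         // 128 bpp - ~quarter rate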
Quote:
3. What is the best/fastest way to render a fullscreen quad with current GL? (Currently we use a VAO without an index buffer, containing just the data for 2 triangles, plus glDrawArrays - maybe that's uncool for some reason?)
That's a somewhat strange question - I thought that for fullscreen quads we're limited by what we do in the shader, and that the drawing method doesn't really matter? If it does matter, I'd pick the one that changes the fewest states (the fewest calls). Actually, immediate mode is a good candidate. If you're really running a GL 3.3 context (i.e. you use the extra extensions to create a context of a specific GL version, in either debug, compatibility or core profile), immediate mode isn't available though.
Quote:
4. What is the tiniest solution for rendering a fullscreen quad in GL >= 3.3? (I thought of something like binding a "NULL" buffer and letting the vertex shader do the job - not sure how to implement that in GL; in DX it's simple.)
glRecti( -1, -1, 1, 1 );
Code:
// Minimal 4k-style setup: only dwFlags, cColorBits (32) and cDepthBits (32) are set;
// nSize/nVersion etc. are left 0, which works in practice.
static PIXELFORMATDESCRIPTOR pfd =
{ 0, 0, PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0 };
// Reuse the built-in "edit" window class - no RegisterClass needed.
HDC hDC = GetDC( CreateWindowA( "edit", 0, WS_POPUP|WS_MAXIMIZE, 0, 0, 0, 0, 0, 0, 0, 0 ) );
SetPixelFormat( hDC, ChoosePixelFormat( hDC, &pfd ), &pfd );
wglMakeCurrent( hDC, wglCreateContext( hDC ) ); // plain context -> compatibility, so glRecti works
ShowCursor( 0 );
do
{
    glRecti( -1, -1, 1, 1 ); // covers all of clip space = fullscreen
    SwapBuffers( hDC );
}
while( !GetAsyncKeyState( VK_ESCAPE ) );
ExitProcess( 0 );
Have fun :-)
Quote:
Current ( >= 3.3) OpenGL - some questions.
I'm sorry nystep... no news here - this is straight from the spec on the screen to my right: glRecti is not available in GL >= 3.3, IIRC.
Code:
; glRecti(-1, -1, 1, 1), assuming edx holds 1 (any nonzero value covers the screen)
push edx          ; y2
push edx          ; x2
neg edx           ; edx = -edx
push edx          ; y1
push edx          ; x1
call _glRecti@16  ; stdcall: args pushed right to left
Quote:
3. draw a triangle instead, to get fewer wasted pixels along the diagonal...
Ouch, ouch ouch, I guess that's a good point! That explains a lot...
So going for one triangle is the way to go, performance-wise?
I bet your performance bottleneck isn't drawing the initial quad, but the shader ;)
Yeah, nVidia has been recommending the single-triangle approach for years. The performance difference isn't exactly earth-shattering, but still.
Mmm, but just out of curiosity, what are you actually using GL >= 3.3 for in your program/application/intro?
see answer above.
Quote:
using *EXT and *ARB is something I don't like - I prefer sticking to a more or less properly defined specification - avoiding total ATI/NV compatibility horror.
IMO it won't be avoided this way. Have fun. kthxbye. :)
To a certain degree it's a better idea to stick to a specific specification. :)
I ask because it looks like you've misunderstood: the GLSL version used in a shader is completely orthogonal to and independent of the version of the OpenGL context. But I can be wrong. :)
actually you were not so wrong at all with the glRect thing... with compatibility profiles you can still do all the dirty things...
Quote:
An OpenGL 4.2 implementation must be able to create a context supporting the core profile, and may also be able to create a context supporting the compatibility profile.
I guess now it's up to ATI to go core profile only... Not tested yet what they are doing ;)
But you're right that glRecti isn't always going to be supported in newer OpenGL drivers when you create contexts >= 3.3. The triangle trick looks good. And the rect thing works if edx != 0, if I understood correctly. :) I'm so bad at coding. :)
you are right about the edx. :)
glRecti should be supported using a 4.2 compatibility profile (also known as FULL profile) - but it's up to your gfx-card vendor to implement compatibility profile support.
This intro uses a 4.2 context and glRecti - therefore it's pretty surely using the compatibility profile.
Actually I just tried it quickly and I noticed that wglGetProcAddress returns a valid and functional function pointer to glCreateShaderProgramv under the normal, basic 4k OpenGL context (OpenGL 1.4 here, on Windows 7 64-bit). The fragment shader compiles just as fine as with the traditional method.
glCreateShaderProgramv creates, compiles and links a shader program in a single call. Pure awesomeness and love from the Khronos Group to the 4k community! <3 :)
I'll have to clean up before I can post a serious figure for how much it reduces executable size... But yeah, thanks a lot for pointing it out, it's great. :)
Only 56 bytes get saved; apparently you still need to call glUseProgram to bind the program after compiling.
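For reference, the pattern in question (the fragment source string is a placeholder):
Code:
// glCreateShaderProgramv: create, compile and link a separable program in one call.
const char *fs = "#version 330\n/* ... your fragment shader ... */";
GLuint prog = glCreateShaderProgramv( GL_FRAGMENT_SHADER, 1, &fs );
glUseProgram( prog ); // still required, as noted above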
@nystep
GLSL and context versions are not fully orthogonal. for example, on osx you get a 2.1 compatibility context by default, and it supports GLSL 1.20 max.
on win32 you actually get a context that is guaranteed to be backward-compatible with 2.X. and according to official docs, you need to use wgl extensions for a "new" (>3.X, 4.X) context. but in practice it seems that you get the new context already, just in some weird backwards-compatible mode (all these fancy glRecti).
(and thanks for glCreateShaderProgramv, i didn't know such marvellous thing existed :D)
@las
so, if you just get your first context via simple wglCreateContext and no fancy wgl extensions, you are guaranteed to have glRecti. and if you're lucky you already get the "new" context at this point for free.
those were my thoughts, feel free to prove them wrong :D.
I don't see why one should use 3.3 Core Profile in 4k intros at all. The space required for the wglCreateContextAttribsARB() call and its attributes (which alone are 28 bytes uncompressed) is probably better spent on other stuff. Also you can't use that nice glRecti() hack and have to go down the glVertexAttribPointer()+glDrawArrays() route (the vertexid trick mentioned by hornet is D3D-only, AFAIK).
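(Those 28 bytes being the seven ints of the attribute list - something like this sketch:)
Code:
// Requesting a 3.3 core profile context: 7 ints = 28 bytes of attributes.
static const int attribs[] = {
    WGL_CONTEXT_MAJOR_VERSION_ARB, 3,
    WGL_CONTEXT_MINOR_VERSION_ARB, 3,
    WGL_CONTEXT_PROFILE_MASK_ARB,  WGL_CONTEXT_CORE_PROFILE_BIT_ARB,
    0
};
HGLRC rc = wglCreateContextAttribsARB( hDC, 0, attribs );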
If you are concerned about deprecation of Compatibility Profile in newer drivers, don't worry. If you use good old wglCreateContext() for context creation, it acts as if you called wglCreateContextAttribsARB() with an empty parameter list, hence default values are used. And the WGL_ARB_create_context spec says:
Quote:
The default values for WGL_CONTEXT_MAJOR_VERSION_ARB and WGL_CONTEXT_MINOR_VERSION_ARB are 1 and 0 respectively. In this case, implementations will typically return the most recent version of OpenGL they support which is backwards compatible with OpenGL 1.0 (e.g. 3.0, 3.1 + GL_ARB_compatibility, or 3.2 compatibility profile)
The only thing that could theoretically happen is that nVidia or AMD suddenly decide to create OpenGL 1.0 or 2.1 contexts instead of a Compatibility Profile of the highest possible version when no specific version is requested. This is extremely unlikely to happen though, as it would break tons of existing software.
In other words, if you get a 4.2 Compatibility Profile context using wglCreateContext() today, you can safely assume that you will get at least that version tomorrow as well. Also, you will never get a Core Profile context using this method, so glRecti() and friends are fair game.