Current ( >= 3.3) OpenGL - some questions.
category: code [glöplog]
Hi fellow sceners,
given that I'm too lazy to read tons of spec pages right now - I'll ask here first a couple of questions:
1. What is the best/fastest way of rendering to texture and rendering to multiple render targets?
Now that there are PBOs, FBOs and so on... what's the proper way to do that?
2. What kind of float texture/buffer format should one use - which can be considered the fastest?
3. What is the best/fastest way to render a fullscreen quad with current GL? (Currently we use a VAO without an index buffer, containing just the data for 2 triangles, plus glDrawArrays - maybe that's uncool for some reason?)
4. What is the tiniest solution for rendering a fullscreen quad in GL >= 3.3? (I thought of something like binding a "NULL" buffer and letting the vertex shader do the job - not sure how to implement that in GL; in DX it's simple.)
Thanks in advance ;)
Code snippets appreciated.
Code:
10 PRINT "pants off!"
20 GOTO 10
i know it doesn't answer your questions, but i'm just curious what you need opengl>3 for? afaik, (0) every new functionality is already available via extensions in older contexts, (1) on win32 you would still need to wglGetProcAddress all the gl>3 functions, so no gain here either.
am i missing something here?
using *EXT and *ARB is something I don't like - I prefer sticking to a more or less properly defined specification - avoiding total ATI/NV compatibility horror.
And consider the wglGetProcAddress problem solved. ;)
Quote:
1. What is the best/fastest way of rendering to texture and rendering to multiple render targets? Now that there are PBOs, FBOs and so on... what's the proper way to do that?
FBOs.
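For reference, a minimal setup could look like this (just a sketch - the GL_RGBA16F format and 1280x720 size are placeholders, and there's no error checking):
Code:
// Render-to-texture via FBO (GL 3.0+).
GLuint tex, fbo;
glGenTextures( 1, &tex );
glBindTexture( GL_TEXTURE_2D, tex );
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA16F, 1280, 720, 0, GL_RGBA, GL_FLOAT, 0 );
glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR );
glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR );
glGenFramebuffers( 1, &fbo );
glBindFramebuffer( GL_FRAMEBUFFER, fbo );
glFramebufferTexture2D( GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, tex, 0 );
// For MRT, attach more textures to GL_COLOR_ATTACHMENT1..n and select them:
// GLenum bufs[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
// glDrawBuffers( 2, bufs );
// ... render to the texture(s) here ...
glBindFramebuffer( GL_FRAMEBUFFER, 0 ); // back to the default framebuffer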
Quote:
2. What kind of float texture/buffer format should one use - which can be considered the fastest?
The one with the fewest bits is always the fastest.
Quote:
3. What is the best/fastest way to render a fullscreen quad with current GL? (Currently we use a VAO without an index buffer, containing just the data for 2 triangles, plus glDrawArrays - maybe that's uncool for some reason?)
If you're drawing one quad, the method of doing it doesn't really affect your performance. So yeah, a triangle strip of two triangles is the usual method.
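In GL 3.3 core terms that could be sketched like this (attribute location 0 and the buffer names are assumptions, no error checking):
Code:
// Fullscreen quad as a 4-vertex triangle strip in clip space - no index buffer.
static const float verts[8] = { -1,-1,  1,-1,  -1,1,  1,1 };
GLuint vao, vbo;
glGenVertexArrays( 1, &vao );
glBindVertexArray( vao );
glGenBuffers( 1, &vbo );
glBindBuffer( GL_ARRAY_BUFFER, vbo );
glBufferData( GL_ARRAY_BUFFER, sizeof( verts ), verts, GL_STATIC_DRAW );
glEnableVertexAttribArray( 0 );
glVertexAttribPointer( 0, 2, GL_FLOAT, GL_FALSE, 0, 0 );
// per frame:
glDrawArrays( GL_TRIANGLE_STRIP, 0, 4 );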
Quote:
4. What is the tiniest solution for rendering a fullscreen quad in GL >= 3.3? (I thought of something like binding a "NULL" buffer and letting the vertex shader do the job - not sure how to implement that in GL; in DX it's simple.)
Now, this is something a 4k intro coder could answer.
http://opengl.org/wiki
3. draw a triangle instead, to get fewer wasted pixels along the diagonal (pixels are rendered in batches of 2x2 quads on most platforms these days).
4. you can use gl_VertexID - see the FXAA vertex shader
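A sketch of that trick in GL terms (the shader embedded as a C string; a core profile also wants an empty VAO bound for the draw call):
Code:
// Fullscreen triangle from gl_VertexID - no vertex buffer at all.
// IDs 0,1,2 map to clip space (-1,-1), (3,-1), (-1,3); uv spans [0,1] on screen.
static const char *vs =
    "#version 330\n"
    "out vec2 uv;"
    "void main(){"
    "  uv = vec2( ( gl_VertexID << 1 ) & 2, gl_VertexID & 2 );"
    "  gl_Position = vec4( uv * 2.0 - 1.0, 0.0, 1.0 );"
    "}";
// ... compile/link, bind an empty VAO, then:
glDrawArrays( GL_TRIANGLES, 0, 3 );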
Hi las,
Quote:
1. What is the best/fastest way of rendering to texture and rendering to multiple render targets?
Now that there are PBOs, FBOs and so on... what's the proper way to do that?
FBOs are the proper way, and shader_image_load_store the fastest and possibly the smallest as well now from what I've seen. Note that shader_image_load_store is a SM5 (in D3D terminology) extension. Don't expect it to be supported on SM4 hardware.
PBOs are only needed if you need to stream data between a given texture and main memory.
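For what it's worth, the image_load_store side could be sketched like this (GL 4.2 / SM5 hardware only; image unit 0 and GL_RGBA16F are assumptions):
Code:
// Host side: bind a texture level to image unit 0 for writing.
glBindImageTexture( 0, tex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA16F );
// Shader side (fragment shader):
//   layout( rgba16f, binding = 0 ) uniform writeonly image2D dst;
//   imageStore( dst, ivec2( gl_FragCoord.xy ), color );
// Image writes bypass the normal raster output path, so before sampling
// the texture afterwards:
glMemoryBarrier( GL_TEXTURE_FETCH_BARRIER_BIT );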
Quote:
2. What kind of float texture/buffer format should one use - which can be considered the fastest?
Formats of 32 bpp and lower run at full speed, 64 bpp at half the sampling rate, and 128 bpp at a quarter, from what I've seen in some benchmarks. Maybe some of the most recent NV GPUs can do 64 bpp nearest/bilinear at full speed; I'm not sure. You'll have to ask a testing guru here. :)
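Put concretely (w/h and the exact internal formats are only illustrative):
Code:
// Per-texel bandwidth grows with the format width - rates as per the benchmarks above.
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA8,   w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0 ); // 32 bpp - full rate
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA16F, w, h, 0, GL_RGBA, GL_FLOAT, 0 );         // 64 bpp - ~half rate
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA32F, w, h, 0, GL_RGBA, GL_FLOAT, 0 );         // 128 bpp - ~quarter rate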
Quote:
3. What is the best/fastest way to render a fullscreen quad with current GL? (Currently we use a VAO without an index buffer, containing just the data for 2 triangles, plus glDrawArrays - maybe that's uncool for some reason?)
That's a somewhat strange question - I thought that for fullscreen quads we're limited by what we do in the shader, and that the drawing method doesn't really matter? If it does matter, I'd pick the one that changes the fewest states (the fewest calls). Actually, immediate mode is a good candidate. If you're really running a GL 3.3 context (i.e. you use the extra extensions to create a context of a specific GL version, in either debug, compatibility or core profile), immediate mode isn't available though.
Quote:
4. What is the tiniest solution for rendering a fullscreen quad in GL >= 3.3? (I thought of something like binding a "NULL" buffer and letting the vertex shader do the job - not sure how to implement that in GL; in DX it's simple.)
glRecti( -1, -1, 1, 1 );
Code:
// Minimal 4k-style setup: only dwFlags, cColorBits (32) and cDepthBits (32) are set;
// nSize/nVersion etc. are left 0, which works in practice.
static PIXELFORMATDESCRIPTOR pfd =
{ 0, 0, PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0 };
// Reuse the built-in "edit" window class - no RegisterClass needed.
HDC hDC = GetDC( CreateWindowA( "edit", 0, WS_POPUP|WS_MAXIMIZE, 0, 0, 0, 0, 0, 0, 0, 0 ) );
SetPixelFormat( hDC, ChoosePixelFormat( hDC, &pfd ), &pfd );
wglMakeCurrent( hDC, wglCreateContext( hDC ) ); // plain context -> compatibility, so glRecti works
ShowCursor( 0 );
do
{
    glRecti( -1, -1, 1, 1 ); // covers all of clip space = fullscreen
    SwapBuffers( hDC );
}
while( !GetAsyncKeyState( VK_ESCAPE ) );
ExitProcess( 0 );
Have fun :-)
Quote:
Current ( >= 3.3) OpenGL - some questions.
I'm sorry nystep... no news here - this is straight from the spec on the screen to my right: glRecti is not available in GL >= 3.3, IIRC.
Code:
; glRecti(-1, -1, 1, 1), assuming edx holds 1 (any nonzero value covers the screen)
push edx          ; y2
push edx          ; x2
neg edx           ; edx = -edx
push edx          ; y1
push edx          ; x1
call _glRecti@16  ; stdcall: args pushed right to left
Quote:
3. draw a triangle instead, to get fewer wasted pixels along the diagonal...
Ouch, ouch ouch, I guess that's a good point! That explains a lot...
So going for one triangle is the way to go, performance-wise?
I bet your performance bottleneck isn't drawing the initial quad, but the shader ;)
Yeah, nVidia has been recommending the single-triangle approach for years. The performance difference isn't exactly earth-shattering, but still.
Mmm, but just out of curiosity, what are you actually using GL >= 3.3 for in your program/application/intro?
see answer above.
Quote:
using *EXT and *ARB is something I don't like - I prefer sticking to a more or less properly defined specification - avoiding total ATI/NV compatibility horror.
IMO it won't be avoided this way. Have fun. kthxbye. :)
To a certain degree it's a better idea to stick to a specific specification. :)
I ask because it looks like you've misunderstood: the GLSL version used in a shader is completely orthogonal to and independent of the version of the OpenGL context. But I can be wrong. :)
actually you were not so wrong at all with the glRect thing... with compatibility profiles you can still do all the dirty things...
Quote:
An OpenGL 4.2 implementation must be able to create a context supporting the core profile, and may also be able to create a context supporting the compatibility profile.
I guess now it's up to ATI to go core profile only... Not tested yet what they are doing ;)
But you're right that glRecti isn't always going to be supported in newer OpenGL drivers when you create contexts >= 3.3. The triangle trick looks good. And the rect thing works if edx != 0, if I understood correctly. :) I'm so bad at coding. :)
you are right about the edx. :)
glRecti should be supported using a 4.2 compatibility profile (also known as FULL profile) - but it's up to your gfx-card vendor to implement compatibility profile support.
This intro uses a 4.2 context and glRecti - therefore it's pretty surely using the compatibility profile.
Actually I just tried it quickly and I noticed that wglGetProcAddress returns a valid and functional function pointer to glCreateShaderProgramv under the normal, basic 4k OpenGL context (OpenGL 1.4 here, on Windows 7 64-bit). The fragment shader compiles just as fine as with the traditional method.
glCreateShaderProgramv creates, compiles and links a shader program in a single call. Pure awesomeness and love from the Khronos Group to the 4k community! <3 :)
I'll have to clean up before I can post a serious figure for how much it reduces executable size... But yeah, thanks a lot for pointing it out, it's great. :)
Only 56 bytes get saved; apparently you still need to call glUseProgram to bind the program after compiling.
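For reference, the pattern in question (the fragment source string is a placeholder):
Code:
// glCreateShaderProgramv: create, compile and link a separable program in one call.
const char *fs = "#version 330\n/* ... your fragment shader ... */";
GLuint prog = glCreateShaderProgramv( GL_FRAGMENT_SHADER, 1, &fs );
glUseProgram( prog ); // still required, as noted above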
@nystep
GLSL and context versions are not fully orthogonal. for example, on osx you get a 2.1 compatibility context by default, and it supports GLSL 1.20 max.
on win32 you actually get a context that is guaranteed to be backward-compatible with 2.X. and according to official docs, you need to use wgl extensions for a "new" (>3.X, 4.X) context. but in practice it seems that you get the new context already, just in some weird backwards-compatible mode (all these fancy glRecti).
(and thanks for glCreateShaderProgramv, i didn't know such marvellous thing existed :D)
@las
so, if you just get your first context via simple wglCreateContext and no fancy wgl extensions, you are guaranteed to have glRecti. and if you're lucky you already get the "new" context at this point for free.
those were my thoughts, feel free to prove them wrong :D.
I don't see why one should use 3.3 Core Profile in 4k intros at all. The space required for the wglCreateContextAttribsARB() call and its attributes (which alone are 28 bytes uncompressed) is probably better spent on other stuff. Also you can't use that nice glRecti() hack and have to go down the glVertexAttribPointer()+glDrawArrays() route (the vertexid trick mentioned by hornet is D3D-only, AFAIK).
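(Those 28 bytes being the seven ints of the attribute list - something like this sketch:)
Code:
// Requesting a 3.3 core profile context: 7 ints = 28 bytes of attributes.
static const int attribs[] = {
    WGL_CONTEXT_MAJOR_VERSION_ARB, 3,
    WGL_CONTEXT_MINOR_VERSION_ARB, 3,
    WGL_CONTEXT_PROFILE_MASK_ARB,  WGL_CONTEXT_CORE_PROFILE_BIT_ARB,
    0
};
HGLRC rc = wglCreateContextAttribsARB( hDC, 0, attribs );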
If you are concerned about deprecation of Compatibility Profile in newer drivers, don't worry. If you use good old wglCreateContext() for context creation, it acts as if you called wglCreateContextAttribsARB() with an empty parameter list, hence default values are used. And the WGL_ARB_create_context spec says:
Quote:
The default values for WGL_CONTEXT_MAJOR_VERSION_ARB and WGL_CONTEXT_MINOR_VERSION_ARB are 1 and 0 respectively. In this case, implementations will typically return the most recent version of OpenGL they support which is backwards compatible with OpenGL 1.0 (e.g. 3.0, 3.1 + GL_ARB_compatibility, or 3.2 compatibility profile)
The only thing that could theoretically happen is that nVidia or AMD suddenly decide to create OpenGL 1.0 or 2.1 contexts instead of a Compatibility Profile of the highest possible version when no specific version is requested. This is extremely unlikely to happen though, as it would break tons of existing software.
In other words, if you get a 4.2 Compatibility Profile context using wglCreateContext() today, you can safely assume that you will get at least that version tomorrow as well. Also, you will never get a Core Profile context using this method, so glRecti() and friends are fair game.