Optimizing Closure
category: code [glöplog]
OK, now here's some maths for noobs. Yes hArDy, I'm looking at you!
Let's talk about FPS. That's the amount of frames that are rendered in one second (1s).
Let's talk about ms. That's 1/1000 of one second (1s).
Let's talk about realtime. That's an *expected* framerate of 60 FPS.
Now here comes the math. If I want 60 frames in one second, how long can one frame take at most? *scratches on paper* That's 1/60s = 0,01667s = 16,67ms. Exactly. Your whole rendering pipeline must render everything in less than 16,67ms to stay fluid.
Now let's talk about performance measurements. I trust smash's experience, which shows in his awesome intros, to make an accurate measurement. If your PostProcess FX chain takes fucking 4,5ms, then you're forfeiting a large part of your computing power.
How much? Let's talk about maths. If 16,67ms is 100%, then 4,5ms is 26,99%. That's more than a fucking quarter of the entire frame time that's available to you!
Now let's talk about "for free"... ignorance surely is bliss. Now hArDy, go read ryg's blog and then come back with REAL arguments if you're going to discuss this with people who actually do have a clue.
Ah and kb...
Quote:
fucking 16 milliseconds.
I forgive you for not having turned off VSYNC :)
OK, when people are already getting ponied, it's time for a weird proposal regarding the initial question.
The backbuffer is scaled down for post processing anyway, so you could encode the image into BC4 (or BC1 for colored glow) in that step (see the link below for a how-to).
Using that new compressed texture, the texture cache can now hold 8x as many texels as before (BC4 and BC1 store 4 bits per texel versus 32 bits for uncompressed RGBA8), resulting in fewer cache misses and thereby less stalling while waiting for texture reads. This approach may be most useful when you take a lot of somewhat random samples (SSAO?), and it assumes the cache uses an LRU strategy and, of course, that the fragments are somewhat local to each other. It's lossy compression though, so handling artefacts will be a problem. Maybe later BCn formats work better.
In case someone wants to explore this further, here's a link to a presentation on how to render to BCn with DirectX:
http://twvideo01.ubm-us.net/o1/vault/gdc10/slides/Tranchida_TextureCompressionInRealtime.pdf