What's a fillrate and how does it work?
category: gfx [glöplog]
Let's say I have an awesome Eyefinity setup… 6 displays with 2560x1600 pixels apiece.
That is 24.576 million pixels. Let's also say that I am running these things at 120 frames per second…
Let's also assume that there are at least 8 raster ops per pixel because of overdraw, multiple passes, render to texture, anti-aliasing, overhead, etc...
That would come out to be about 23.6 gigapixels a second, or a little under the theoretical pixel fillrate of the Radeon 6850.
However, I have yet to see a [modern gaming] benchmark that goes anywhere near 120 frames per second on three 1920x1080 screens for the 6850. So...
Explain plox? Fillrate, not magnets.
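For anyone who wants to redo the arithmetic in the question, here is a minimal sketch. The display count, resolution, frame rate and 8 ops per pixel are the poster's numbers; the 32 ROPs and 775 MHz core clock for the Radeon HD 6850 are assumed from spec sheets, not stated in the thread. With decimal prefixes the demand works out to roughly 23.6 gigapixels a second.

```python
# Back-of-the-envelope fillrate demand for the setup in the question.
DISPLAYS = 6
WIDTH, HEIGHT = 2560, 1600
FPS = 120
OPS_PER_PIXEL = 8  # the poster's guess: overdraw, extra passes, AA, ...

# Assumed spec-sheet values for the Radeon HD 6850 (not from the thread).
ROPS = 32
CORE_CLOCK_HZ = 775e6

pixels_per_frame = DISPLAYS * WIDTH * HEIGHT
required = pixels_per_frame * FPS * OPS_PER_PIXEL  # pixel writes per second
theoretical = ROPS * CORE_CLOCK_HZ                 # one pixel per ROP per clock

print(f"pixels per frame: {pixels_per_frame / 1e6:.3f} million")
print(f"required:         {required / 1e9:.2f} Gpixels/s")
print(f"theoretical peak: {theoretical / 1e9:.2f} Gpixels/s")
```

This only compares a demand figure with a paper peak; it says nothing about whether the ROPs can actually be kept busy.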
The hardware does more than just filling areas with plain colors? All the computations in shaders take GPU time, which is not spent filling?
>The hardware does more than just filling areas with plain colors
Given that each ROP tends to have one or more texture units behind it (that can do loopbacks!!), and each tex unit can do 1 perspective-corrected, bilinear-filtered sample per cycle... it is hard to believe that ROPs are sitting idle while texture samplers and shaders do their magic.
I already allowed a generous 1 pixel drawn for every 8 potential ROP draws or other operations.
You have no idea how many awesome words there are in that sentence that I have no clue about :D
hurp durp
QUINTIX: http://en.wikipedia.org/wiki/Figure_of_merit :)
it's l33tsp34k for how much ya biaatch is a ho, bro!
I guess Gargaj has the closest answer, but still...
I'm sorry if I wasn't clear. I was not trying to sound smart. In fact, I have on more than one occasion revealed how ignorant I am.
So far my understanding is:
Hardware GPUs, from the beginning, had two things:
Texture samplers, which picked a texel and did some blending for bi/tri/aniso filtering
Raster ops, which placed the picked texel on the screen
Multiply the number of raster ops, or ROPs, by the clock, and you get the number of pixels drawn in a specific amount of time.
Do the same thing for the texture samplers, which always equaled or were greater in number than the ROPs, and you get the number of texels that can be sampled per period of time.
I would understand if the actual usable fillrate were 1/8th the theoretical fillrate, but it seems to be more like 1/64th or less. Why is that? Is there any benefit to having 16+ ROPs if they spend most of the time waiting on data from shaders? Considering how expensive they are, I doubt design houses like AMD or NVIDIA just put them in for marketing. This is not hard drive manufacturers boasting "SATA 3.0 compliance" and the bandwidth of the interface. Raster ops are real hardware that take up real die space. So, what's up? Is my understanding of fill rate fundamentally wrong?
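The "units times clock" model described above can be written down as a short sketch. The unit counts and clock (32 ROPs, 48 texture units, 775 MHz) are assumed Radeon HD 6850 spec-sheet values, not figures from the thread, and `peak_rate` is just an illustrative helper name. The loop shows what frame rate on three 1920x1080 screens each "usable fraction" would correspond to if fillrate were the only limit.

```python
def peak_rate(units: int, clock_hz: float) -> float:
    """Peak throughput if every unit finishes one operation per clock."""
    return units * clock_hz

# Assumed spec-sheet values for a Radeon HD 6850 (not stated in the thread).
ROPS, TMUS, CLOCK_HZ = 32, 48, 775e6

pixel_rate = peak_rate(ROPS, CLOCK_HZ)  # pixels written per second
texel_rate = peak_rate(TMUS, CLOCK_HZ)  # filtered texels per second

# Frame rate on three 1920x1080 screens if only a fraction of the
# theoretical pixel rate ended up as visible pixels.
pixels_per_frame = 3 * 1920 * 1080
fps_at = {}
for fraction in (1 / 8, 1 / 64):
    fps_at[fraction] = pixel_rate * fraction / pixels_per_frame
    print(f"usable fraction {fraction:.4f}: about {fps_at[fraction]:.0f} fps")
```

Under these assumptions a 1/64 usable fraction lands in the region of 60 fps, which fits the observation that benchmarks stay well below 120.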
you can also play youtube videos.
Quintix: one way to know if the theory matches the practice.
Make yourself a program that just fills as many pixels as you can, without any shading, just flat colors, and check the frame rate it gives you.
Maybe the numbers will match :)
or maybe not.
In the case of real tasks, I would suppose that the card has MANY bottlenecks, not just fillrate. Stuff like transform, lighting/shading, texture fetching, etc.
Makes me think of the NDS. That machine is so crappy there's a hard limit on polygons per screen, but also one on polygon segments on the same horizontal line - not fun to work with.
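If you do write such a fill test, the bookkeeping to turn its timing into a fillrate figure could look like the sketch below. The drawing itself has to happen in a real graphics API and is not shown; `effective_fillrate` is a hypothetical helper, and the numbers in the example call are made-up placeholders, not measurements.

```python
def effective_fillrate(width: int, height: int, layers: int,
                       frames: int, elapsed_seconds: float) -> float:
    """Pixels written per second by a test that draws `layers` flat-colored
    fullscreen layers per frame, for `frames` frames, in `elapsed_seconds`."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return width * height * layers * frames / elapsed_seconds

# Placeholder inputs: 100 fullscreen layers at 1920x1080,
# 600 frames timed over 10 seconds.
rate = effective_fillrate(1920, 1080, layers=100, frames=600,
                          elapsed_seconds=10.0)
print(f"effective fillrate: {rate / 1e9:.2f} Gpixels/s")
```

Comparing that figure with ROPs times clock shows how close the flat-color case gets to the paper number.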
I believe a fillrate is used in a demoscene.
for one thing, you won't get anywhere near the ideal unless you're rendering a single fullscreen triangle - most modern gaming benchmarks do a bit more than that.
saturating the rops (= writes, not texture reads) on modern hardware is quite difficult in my experience unless you're doing something specifically designed to do it. it's essentially a memory bandwidth issue, so if you're aiming for the figures you could try binding the max number and size of multiple render targets - on dx11 that's 6? or 8? x 128bit. or, conversely, normalise your figures (presumably based on 1x 32 bit render target) by that factor (divide by 8x4 = 32).
i'd like to point you (and everyone else who wants to understand modern gpus) at ryg's excellent series of articles here: http://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/
and then point all the questions at him too :)
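The normalisation suggested above can be sketched as plain arithmetic. Direct3D 11 allows 8 simultaneous render targets, so 8 is used here; the 128 GB/s memory bandwidth is an assumed Radeon HD 6850 spec-sheet value, and the helper names are only illustrative. The model ignores depth traffic, blending reads, compression and caches, so treat it as a rough bound only.

```python
def bytes_per_pixel(targets: int, bits_per_target: int) -> int:
    """Bytes written per pixel across all bound render targets."""
    return targets * bits_per_target // 8

def bandwidth_bound_rate(mem_bandwidth: float, bpp: int) -> float:
    """Pixels per second if color writes alone used all memory bandwidth."""
    return mem_bandwidth / bpp

baseline_bpp = bytes_per_pixel(1, 32)    # one 32-bit target: 4 bytes
heavy_bpp = bytes_per_pixel(8, 128)      # eight 128-bit targets: 128 bytes
factor = heavy_bpp // baseline_bpp       # the "8x4" factor from the post

MEM_BANDWIDTH = 128e9  # bytes/s, assumed HD 6850 spec value

print(f"normalisation factor: {factor}")
for name, bpp in (("1 x 32-bit", baseline_bpp), ("8 x 128-bit", heavy_bpp)):
    rate = bandwidth_bound_rate(MEM_BANDWIDTH, bpp)
    print(f"{name}: at most {rate / 1e9:.1f} Gpixels/s from bandwidth")
```

With fat render targets the bandwidth bound drops far below the ROP peak, which is the sense in which ROP throughput turns into a memory bandwidth question.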
I wanted to write something here, but what everyone said was said already. Games are not a good example for testing theoretical fillrate; they do much more on the CPU side too. Draw a triangle of death or something :)
Hold up. Wtf is a pixel?? Is that what the kids are calling cubes these days?
See, this (being that article) is the kind of explanation that is actually useful because it gives you the whole picture instead of really detailed specific parts of it, which is what I usually encounter and am frustrated by. Squee.
ferris: it's a term used in the gaming community, those kids have no respect for demoscene history.
dagnabbit!!
Quote:
but what everyone said was said already
I bow in awesomeness over that sentence.
Just a note: I have already spent far too much time reading the blog series in question, and I keep finding more over there.
Serious kudos to ryg for this. Hmm, I wonder what the most current linkfarm with a list of useful articles like this is . . .