The fastest way of rendering a huuuge model?
category: general [glöplog]
A friend of mine just bumped into this problem, and I realized I've never had such issues myself, as my models always fit into card memory.
So, there's a large model, say over 3 million vertices, and we need to pass normal, tangent, binormal, texcoords, color, etc., so the vertex chunks are pretty big themselves.
What would be the fastest way of rendering it, then, given that it doesn't fit into VRAM?
One point is that either the tangent or the binormal could be reconstructed in the vertex shader from the other two values (normal + binormal, or normal + tangent, respectively). Though I really doubt the extra shader work would cost any time compared to the memory-transfer time saved, at least not with current GPU performance.
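For reference, the reconstruction itself is just a cross product plus a handedness sign; a minimal C++/D3DX sketch of the math (the same one-liner would live in the vertex shader; the sign convention is an assumption and depends on how the tangent basis was exported):

    #include <d3dx9.h>

    // Rebuild the binormal from normal + tangent. 'handedness' is +1 or -1,
    // usually stored in the tangent's w component for mirrored UVs.
    D3DXVECTOR3 ReconstructBinormal(const D3DXVECTOR3& normal,
                                    const D3DXVECTOR3& tangent,
                                    float handedness)
    {
        D3DXVECTOR3 binormal;
        D3DXVec3Cross(&binormal, &normal, &tangent);
        return binormal * handedness;
    }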
I also had this idea of splitting the model into two subsets (submodels, actually; their index spaces must not overlap for the method to work efficiently), such that one part fills VRAM (leaving some space for rendertargets, textures, etc.) and can be stored on the card the whole time. The other part, the leftovers, could just be drawn from RAM using DrawIndexedPrimitiveUP, in D3D terms. That way less data has to be transferred from RAM, which could save some time.
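In D3D9 terms the split render could look roughly like this (a sketch; buffer creation, error checking, and the actual subset split are assumed to happen elsewhere):

    #include <d3d9.h>

    // Draw the VRAM-resident subset from static buffers, then the leftovers
    // straight from system memory via DrawIndexedPrimitiveUP.
    void DrawSplitModel(IDirect3DDevice9* dev,
                        IDirect3DVertexBuffer9* residentVB,
                        IDirect3DIndexBuffer9* residentIB,
                        UINT residentVerts, UINT residentTris,
                        const void* leftoverVerts, UINT leftoverVertCount,
                        const DWORD* leftoverIndices, UINT leftoverTris,
                        UINT stride)
    {
        // Resident part: lives on the card the whole time.
        dev->SetStreamSource(0, residentVB, 0, stride);
        dev->SetIndices(residentIB);
        dev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                                  0, 0, residentVerts, 0, residentTris);

        // Leftover part: copied out of RAM every frame. Note the *UP calls
        // reset stream source 0 and the index buffer as a side effect.
        dev->DrawIndexedPrimitiveUP(D3DPT_TRIANGLELIST,
                                    0, leftoverVertCount, leftoverTris,
                                    leftoverIndices, D3DFMT_INDEX32,
                                    leftoverVerts, stride);
    }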
And there are some obvious solutions, like rendering the whole model from RAM, or rewriting the on-card vertex and index buffers several times per frame with new data, thus rendering the model through a kind of imitation of 'subsets'.
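The buffer-rewriting variant is the classic dynamic-buffer pattern; a sketch, assuming a reusable D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY buffer in D3DPOOL_DEFAULT sized for the largest chunk:

    #include <d3d9.h>
    #include <cstring>

    // Stream one chunk of the model through a reusable dynamic vertex buffer.
    void DrawChunkFromRAM(IDirect3DDevice9* dev,
                          IDirect3DVertexBuffer9* dynamicVB,
                          const void* chunkVerts, UINT chunkBytes,
                          UINT stride, UINT vertCount,
                          IDirect3DIndexBuffer9* chunkIB, UINT triCount)
    {
        void* dst = 0;
        // DISCARD hands back a fresh buffer so the GPU can keep reading
        // the previous contents without stalling.
        dynamicVB->Lock(0, chunkBytes, &dst, D3DLOCK_DISCARD);
        std::memcpy(dst, chunkVerts, chunkBytes);
        dynamicVB->Unlock();

        dev->SetStreamSource(0, dynamicVB, 0, stride);
        dev->SetIndices(chunkIB);
        dev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, triCount);
    }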
Also, I'm curious whether the gfx card needs some free VRAM to do ...PrimitiveUP renders, and if so, how big the buffer it uses is.
Aight, any ideas? :)
downsample
and switch to pokemon mini
skrebbel: no, he can't really do that. Precision is what's valued in this project; the exact model must be rendered. That's the point.
You go and buy a bigger card, obviously.
On the other hand, considering that at standard resolutions many triangles end up competing for each pixel, yes, a slightly downsampled model could do, though one would have to rebuild such a model (or at least recalculate its normals) every frame to ensure precise shading.
But still, even at one triangle per pixel that's over 1.3 million visible triangles at 1280x1024, so the memory problem is not solved.
BSP trees and such don't really do the trick in this particular case, as the model is not some racing-game city model where only 1/100th of it is visible at any moment; at least 50-60% of the model is visible every frame, and that's only with transparency disabled.
First, ditch the tangent or the binormal and calculate it in the vertex shader.
The next thing I'd try is smaller vertex formats. So instead of having, say, normals/tangents as floats, you encode them as D3DCOLORs (1 byte per component, 1 unused) or in 10:10:10:2 format. For position/normal/tangent/UV/color this takes the vertex size from 48 bytes down to 32 bytes, which is no miracle, but now a 3-million-vertex buffer takes 96MB instead of 144MB.
Then you split the model into chunks, put them into managed buffers, render them separately, and hey, you have a nice stress test for the D3D resource manager!
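The D3DCOLOR packing itself is just a bias and scale into bytes, something like this (a minimal sketch; the shader undoes it with v * 2 - 1):

    #include <d3d9types.h>

    // Pack a unit vector into a D3DCOLOR DWORD: [-1,1] -> [0,255] per component.
    // The vertex shader decodes it back with v * 2 - 1; the alpha byte is unused.
    DWORD PackUnitVector(float x, float y, float z)
    {
        DWORD r = (DWORD)((x * 0.5f + 0.5f) * 255.0f + 0.5f);
        DWORD g = (DWORD)((y * 0.5f + 0.5f) * 255.0f + 0.5f);
        DWORD b = (DWORD)((z * 0.5f + 0.5f) * 255.0f + 0.5f);
        return D3DCOLOR_XRGB(r, g, b);
    }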
Yeah, that's a nice tip indeed, thanks NeARAZ.
The only thing I didn't get: how do you handle the 10:10:10:2 format in a shader?
Anyway, it's aimed at compatibility too, so what I really meant was: how to do it best on a 64MB card? :D
I guess asking "what the hell is it?" would be pointless, because the answer would either be "you'll see" or "it's a secret". Right? :)
Well, it's not related to a demo. I don't know exactly, but I suppose it's a laser 3D-scanner output reconstruction project or something. Not a commercial one either, pure science. I'll ask when I get the chance.
Rendering a 3-million-vertex mesh and using normal maps is sick indeed.
Academics just have to wait for newer graphics and computing technology, sadly.
Can't you, like, split the data into a grid, and sort of mipmap-downsample it a few times into system memory, and show the high-detail cubes of the grid only when the camera is close? I'm sure there are difficult words made up by beardy professors for stuff like this, but it doesn't sound so hard. Naturally this only increases total size, but not in vidmem.
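That grid idea is basically chunked LOD; per-cell level selection is just a distance check (a toy sketch, with made-up thresholds):

    #include <cmath>

    // Pick a "mip level" for one grid cell from its distance to the camera:
    // level 0 = full detail, higher = coarser downsampled copy in sysmem.
    int SelectCellLod(const float cellCenter[3], const float camPos[3],
                      float lod0Distance, int maxLevel)
    {
        float dx = cellCenter[0] - camPos[0];
        float dy = cellCenter[1] - camPos[1];
        float dz = cellCenter[2] - camPos[2];
        float dist = std::sqrt(dx*dx + dy*dy + dz*dz);

        // Each LOD level covers twice the distance of the previous one.
        int level = 0;
        while (level < maxLevel && dist > lod0Distance * (float)(1 << level))
            ++level;
        return level;
    }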
or LOD?
skrebbel: an octree? :)
skrebbels is confused, he probably meant a c\-/tree
Software rendering.
Or... maybe precalc the lighting and color per triangle? If you have just vertex, triangle, and color data, maybe it fits in VRAM.
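(Rough numbers for that, assuming a closed scanned mesh with about twice as many triangles as vertices: positions alone are 3M x 12 bytes = 36MB, and ~6M triangles with 32-bit indices add 6M x 3 x 4 bytes = 72MB, so even a stripped-down position-plus-color mesh is a squeeze on a 64MB card.)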
Just search nv for VAR (vertex array range)... use a couple of VAR buffers, transfer into one while rendering from the other... gets you peak-throughput rendering from sysmem.
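A rough sketch of that double-buffering with NV_vertex_array_range plus NV_fence for synchronization (the extension entry points are assumed to have been fetched via wglGetProcAddress already; CopyChunkVertices, ChunkIndexCount, and ChunkIndices are made-up app-side helpers):

    // Assumes windows.h / GL headers and extension-pointer setup elsewhere.
    void DrawModelViaVAR(int numChunks, int stride)
    {
        const int CHUNK_BYTES = 2 * 1024 * 1024;
        // Priority ~0.5 requests AGP memory rather than video memory.
        char* varMem = (char*)wglAllocateMemoryNV(2 * CHUNK_BYTES, 0.0f, 0.0f, 0.5f);
        glVertexArrayRangeNV(2 * CHUNK_BYTES, varMem);
        glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

        GLuint fence[2];
        glGenFencesNV(2, fence);

        for (int chunk = 0; chunk < numChunks; ++chunk)
        {
            int half = chunk & 1;
            char* dst = varMem + half * CHUNK_BYTES;

            // Wait until the GPU finished reading this half (two chunks ago).
            if (chunk >= 2)
                glFinishFenceNV(fence[half]);

            CopyChunkVertices(dst, chunk);              // fill one half...
            glVertexPointer(3, GL_FLOAT, stride, dst);  // (normals/UVs likewise)
            glDrawElements(GL_TRIANGLES, ChunkIndexCount(chunk),
                           GL_UNSIGNED_INT, ChunkIndices(chunk)); // ...render other
            glSetFenceNV(fence[half], GL_ALL_COMPLETED_NV);
        }

        glFinish();                 // crude: ensure GPU is done before freeing
        wglFreeMemoryNV(varMem);
    }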
In other news, you are doing something fundamentally wrong by not using LOD.
Maybe one day though; e.g., look at the Boeing CAD model stuff ;]
...and in my opinion this publication is quite interesting.
What about displacement mapping? Maybe that could help reduce the vertex count a bit...
Does anyone know how to filter a 64-bit cubemap on an ATI card??
fadeout: the 10:10:10:2 format is D3DDECLTYPE_UDEC3 or D3DDECLTYPE_DEC3N (of course, not all hardware supports those... if not, you just fall back to 8:8:8:8, aka D3DCOLOR, and do a foo*2-1 in the shader)
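For completeness, the declaration for the packed 32-byte layout discussed above could look something like this (a sketch; check caps.DeclTypes & D3DDTCAPS_DEC3N first, and substitute D3DDECLTYPE_D3DCOLOR at the same offsets on the fallback path):

    #include <d3d9.h>

    // float3 position, 10:10:10:2 normal and tangent, float2 UV, D3DCOLOR
    // color = 32 bytes per vertex. DEC3N is signed-normalized, so it needs
    // no *2-1 decode in the shader; the D3DCOLOR fallback does.
    const D3DVERTEXELEMENT9 packedDecl[] =
    {
        { 0,  0, D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
        { 0, 12, D3DDECLTYPE_DEC3N,    D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL,   0 },
        { 0, 16, D3DDECLTYPE_DEC3N,    D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TANGENT,  0 },
        { 0, 20, D3DDECLTYPE_FLOAT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
        { 0, 28, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,    0 },
        D3DDECL_END()
    };

You then create it with CreateVertexDeclaration and bind it with SetVertexDeclaration as usual.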
If there were a fast way to render all that on consumer hardware, wouldn't Carmack have done it already?