memory usage
category: general [glöplog]
I don't have the habit of complaining about demo requirements, but I have something against the huge memory usage we get from some recent demos..
Not so much because it would mean upgrading my 256meg to something higher, but mostly because even with a lot of mem, due to win9x's inane VM, you're never assured it won't swap something out and kill the framerate for a second or two. It gets better with NT kernels, of course.
Is the memory usage there only because people like to take it easy, loading everything into mem at start and doing no further access, or is there something special in today's demos that takes a whole lot of memory? (I'd think the days of massive precalcs are over, with the hardware we have.)
Why not stream data from disk? Remember the old Amiga demos that would fetch data from disk while the demo was running? And after all, audio sequencers like Logic or Cubase can manipulate way over 500meg of data without using anywhere near that much RAM.
Plus, by doing the streaming yourself you get to beat the VM, which doesn't have a clue what your access patterns are..
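Something like this is what I have in mind -- a very rough win32 sketch of a double-buffered background loader (the file name, chunk size and function names are all made up for illustration, not from any real demo):

    // rough sketch: a loader thread fills one buffer while the demo
    // consumes the other. if the disk can't keep up, nextChunk()
    // returns NULL and you need a plan B.
    #include <windows.h>

    #define CHUNK_SIZE (256*1024)

    static HANDLE s_file;
    static char s_buf[2][CHUNK_SIZE];    // double buffer
    static volatile LONG s_ready[2];     // nonzero when buffer i holds data

    static DWORD WINAPI loaderThread(LPVOID)
    {
        int cur = 0;
        DWORD read;
        for (;;)
        {
            while (s_ready[cur]) Sleep(1);   // wait until buffer consumed
            if (!ReadFile(s_file, s_buf[cur], CHUNK_SIZE, &read, NULL) || !read)
                break;                       // end of data (or error)
            InterlockedExchange(&s_ready[cur], 1);
            cur ^= 1;
        }
        return 0;
    }

    // called from the main loop; returns NULL if the data isn't in yet
    const char* nextChunk(int* index)
    {
        static int cur = 0;
        if (!s_ready[cur]) return NULL;
        *index = cur;
        cur ^= 1;
        return s_buf[*index];
    }

    void releaseChunk(int index) { InterlockedExchange(&s_ready[index], 0); }

    void startStreaming(void)
    {
        DWORD tid;
        s_file = CreateFile("demodata.bin", GENERIC_READ, FILE_SHARE_READ,
                            NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
        CreateThread(NULL, 0, loaderThread, NULL, 0, &tid);
    }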
Yeah, accessing the disk while playing the demo... Suddenly gets interesting when your data doesn't come in fast enough and it's time to play that part :-) Plus you'd have to meddle around with threads etc. and add a whole lot of extra complexity to your demo -- NOT a good idea.
The alternative, of course, is loading between parts, but how many would like to have demos that stop every few minutes today?
you don't need to stream data from disk. if your demo is 10 MB packed, you can load the packed data into RAM without any problem. you just shouldn't hand all your textures to DirectX or OpenGL at the beginning of the demo. if you don't take care, you end up having two or three copies of each decrunched texture in RAM.
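roughly like this, i mean -- a sketch of the idea, where depack() and uploadTexture() are placeholders for your own unpacker and API upload (glTexImage2D, or CreateTexture plus UpdateTexture):

    // keep only the packed blobs resident; decrunch each texture into one
    // reused scratch buffer right before the part that needs it.
    void depack(const unsigned char* src, int srcSize, unsigned char* dst);
    void uploadTexture(const unsigned char* pixels, int w, int h);

    struct PackedTex { const unsigned char* data; int packedSize; int w, h; };

    static unsigned char scratch[1024*1024*4];  // room for one 1024x1024 RGBA

    void prepareTexturesForPart(PackedTex* tex, int count)
    {
        for (int i = 0; i < count; i++)
        {
            depack(tex[i].data, tex[i].packedSize, scratch); // packed -> pixels
            uploadTexture(scratch, tex[i].w, tex[i].h);      // API keeps its copy
            // scratch is reused, so there's never a second decrunched copy
        }
    }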
i know, i am guilty (fr-019). but things are not as easy as they were back on amiga. an example:
our demo Roots had hand-written memory management, a hand-written diskloader and hand-written decrunching. the effects themselves ran completely in the interrupt. thus, i could use the little time left over to load, decrunch and initialize the next effect (no need for multithreading!). memory was split, one half for the current and one half for the next effect (of course the split point was dynamic...). even loading the next tune worked with no major pause (as opposed to Arte...). this is how it should be done.
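in spirit, something like this (a reconstruction of the idea from memory, not the actual Roots code; names and pool size are mine):

    // one pool with a moving split point: the current effect owns one end,
    // the loader fills the next effect into the other end. at the part
    // switch, the old half is freed wholesale and the roles swap.
    #include <stddef.h>

    static unsigned char pool[512*1024];   // whatever the machine allows
    static size_t lo = 0;                  // low-end allocation cursor
    static size_t hi = sizeof(pool);       // high-end allocation cursor
    static int curAtLow = 1;               // current effect lives at low end?

    static void* allocLow(size_t n)
    { if (hi - lo < n) return 0; void* p = pool + lo; lo += n; return p; }
    static void* allocHigh(size_t n)
    { if (hi - lo < n) return 0; hi -= n; return pool + hi; }

    void* allocCurrent(size_t n) { return curAtLow ? allocLow(n)  : allocHigh(n); }
    void* allocNext(size_t n)    { return curAtLow ? allocHigh(n) : allocLow(n);  }

    // part switch: everything the old effect had disappears in one go;
    // the freshly loaded "next" data stays put, so its pointers stay valid.
    void switchPart(void)
    {
        if (curAtLow) lo = 0; else hi = sizeof(pool);
        curAtLow = !curAtLow;
    }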
i don't dare to code something like this on PC. of course, no diskloading is needed, which makes things a bit simpler. you can control virtual memory quite well, you could upload your textures on your own (not with DX MANAGED textures), decrunching them directly between parts. you could use more dynamic vertex buffers instead of millions of precalculated vertices (well, few demos have that complexity today). but getting that to run is really a lot of work, and no fun at all. or you can make "please wait for part II"-style demos.
i think for most demos the biggest problem is textures.
maybe competitions with limits on memory usage as well as the size of the productions could be introduced?
..the ultimate challenge for a coder would naturally be "make a demo which works flawlessly on diamondie's computer, and which she also happens to like" :) I mean, when not even console demos can be run properly, what can? :)
lug00ber, that would not be a challenge, it would be masochism.
it's not like it's cool having less memory usage anyway. i prefer less GPU/CPU... memory is free, oh wait, so is CPU... so we have GPU left.. and the difference between a G3 and a GF2 is like huge.. oh.. let's all get rich, demoscene is dead anyway, why bother
hm, textures.. it's not at all hard to load all your textures into system memory ones, and between parts upload them into your default pool ones.. but then again.. i also prefer just whacking everything into mem prior to starting the demo ;)
i wouldn't know about directx, but in OpenGL, you -ARE- recommended to upload all the textures you'll need at once, and let the driver manage moving them to the card (while keeping the ones you aren't using in system memory). Even though most drivers fail to do this well [thus introducing delays when big textures get uploaded -the first time you *use* them-], this is the way it should be done in OGL. The reason is that for most cards you can't know beforehand how much memory they actually have available. For example, if you uploaded 24bpp, or even 8-bit textures, and measured the size, it'd be pretty different from the card's actual memory allocation (nVidia converts them to 32bpp internally).
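in code the upload-everything-at-init approach is just this (a minimal sketch; NUM_TEXTURES and the pixel data are placeholders):

    #include <windows.h>    // gl.h on win32 needs this first
    #include <GL/gl.h>

    #define NUM_TEXTURES 64              // made-up count

    GLuint texIds[NUM_TEXTURES];

    void uploadAllTextures(unsigned char** pixels, int* w, int* h)
    {
        glGenTextures(NUM_TEXTURES, texIds);
        for (int i = 0; i < NUM_TEXTURES; i++)
        {
            glBindTexture(GL_TEXTURE_2D, texIds[i]);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
            // the driver keeps its own system memory copy and decides when
            // to move it to the card; you never see the real allocation
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, w[i], h[i], 0,
                         GL_RGB, GL_UNSIGNED_BYTE, pixels[i]);
        }
    }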
Oh well, my 0.2€
I don't think opengl best practices preclude loading all the textures you'll need at once, but we may have different opinions here on "need" :)
(one doesn't "need" a texture that is used 3 min from now, in a completely different scene, and it surely could be left out until then. even though i admit to not having any opengl experience.. are the drivers coded in such a way that simply creating and binding new textures mid-demo would kill some runtime deadline?)
(one doesn't "need" a texture that is used 3min from now, in a complete
different scene, and it surely could be left out until then, even though i admit not having any opengl experience.. are the drivers coded in such a way that simply binding and creating new textures would kill some runtime deadline?)
when you tell opengl to upload a texture, it'll make a copy of it in system memory (in fact, you can free or otherwise deallocate your own bitmap afterwards), and will upload it to the card when needed... actually, all of this depends on the driver implementation... some drivers might upload everything to the card, some might just share memory between the TMU and system memory, etc.
OpenGL is not a strict system (as far as implementation goes); every manufacturer/card might do the same thing completely differently. That's why you don't need to do anything special for (say) T&L, and any current T&L card -automatically- accelerates even very old OGL software... and that's why whatever you make now will benefit from card updates without you explicitly coding for them (... or, well, it would in an ideal world without vendor-specific extensions and stuff ;D)
Well said about memory, u lazy bastards! =)
opengl practically precludes you from keepin' it real
glPrioritizeTextures
GL_COMPRESSED_RGB_ARB
just two hints my dearo moles :)
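in practice (sketch; the ids, priorities and sizes are arbitrary):

    #include <windows.h>
    #include <GL/gl.h>

    #ifndef GL_COMPRESSED_RGB_ARB        // from ARB_texture_compression
    #define GL_COMPRESSED_RGB_ARB 0x84ED
    #endif

    void applyHints(GLuint* texIds, const unsigned char* pixels)
    {
        // 1) tell the driver which textures matter right now, so the ones
        //    the current part uses stay resident on the card:
        GLclampf prio[2] = { 1.0f, 0.1f };   // high now, low for later
        glPrioritizeTextures(2, texIds, prio);

        // 2) ask the driver to store a texture compressed, cutting its
        //    footprint roughly 4-6x:
        glBindTexture(GL_TEXTURE_2D, texIds[0]);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_ARB, 256, 256, 0,
                     GL_RGB, GL_UNSIGNED_BYTE, pixels);
    }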
oh, the joys of memory management and 3d hardware. this is not supposed to be a "coder-sorgenecke" (roughly: a coders' agony column), but I'll still whine a bit I guess.
the problem is that you have next to no control over how much memory is actually allocated/used. and I mean it - if you want some figures: fr-022: ein.schlag uses around 60MB of memory during texture precalculation. I use 16 bits per color channel in the texture generator, so that's about 30MB worth of 8bit/channel RGB texture data; some of that is only temporary buffers, so it's a bit less, say 28 or so. After the initial texture calculation, the textures are uploaded and all now-unused textures are freed (some are required to stay because they're displacement maps). Then there's geometry calculation etc. So much for the intro.
Memory allocated by the app during runtime is around 13MB (textures+geometry both). Say we have about that much static geometry (in reality it's somewhat less, say 11MB) adding to the textures uploaded. Then we have about 28MB+11MB=39MB of memory managed by the driver plus another 13MB allocated by me, which sums to 52MB theoretical (system) memory usage for the backing storage of textures and geometry, plus working memory. Judging by that, einschlag should run ok on 64MB machines, with some initial swapping, and be fine on 128MB.
If you know the readme, you'll see that I say something like "use 256MB RAM or more". This turned out to be a wrong conclusion from how it ran on MY system - with the ATI drivers I was using at the time, the 384MB I had were already far too little when running DevStudio and RG2 (the tool used to make the demos) in the background, which summed up to a memory usage of about 130MB; if I ran the demo with those two running, I already got *loads* of swapping. Later it turned out that it actually ran ok (if not great) on 128MB systems with NVidia drivers.
Some weeks ago, new ATI drivers came out, and since then I can run devstudio, rg2 and einschlag without any real swapping. Mind you, I have changed nothing in either program, and the memory used by RG2 and the intro for driver-managed objects should be around 80MB - it just seems that until recently, the driver used FAR more.
This is getting incredibly lengthy already, but it illustrates the problem quite nicely IMHO - getting decent memory usage is extremely hard since you have absolutely no control over what memory the driver decides it needs for buffering. Especially textures are a pain in the a** for 64k intros, since you don't really have the option of reducing that data, e.g. by using S3 texture compression, because it takes too much code to write a decent-quality, decent-speed S3TC compressor. Besides, it doesn't solve the issue, it just reduces the danger of hitting some serious problem by lowering overall memory usage.
Note that this is all just about memory allocation - once you actually decide to use those textures/vertex buffers, you'll quickly run into the next wall: driver buffering/caching. I won't really go into that since this is already far too long, but trust me, getting a "simple 3DS player" to work without noticeable texture upload stutters in the middle of a scene is actually more complex than one thinks (as with almost everything in PC coding).
Anyway, the madonion demos all have "loading part 2" screens for a reason - it's definitely the easiest and, more importantly, most reliable solution, even though it sucks for the viewer.
shiva: texture priorities help with the caching problems, but they don't change a thing about memory consumption (which is the topic :).
that buffering/caching wall indeed is a bitch.. one of the reasons i hate d3dpool_managed ;)
still i don't really see the problem
let's say we have 100mb of uncompressed textures. store them in memory using a simple compression scheme, or perhaps a few of them uncompressed in d3dpool_systemmem textures. then let your texture manager create a set of default-pool textures: a bunch of empty static textures of different sizes and specifications, a few dynamic ones (both backed by one or more sysmem 'transfer' textures - you can easily use one or two large ones, since you can always specify the exact rectangle to update), and a few others (rendertargets for example). then, whenever you need a texture, simply decompress it into one of the temporary system memory textures and upload it. shouldn't give much trouble, especially because YOU are in charge of when the uploading occurs instead of d3d's manager (who doesn't have a clue about what's going on), and of course it saves a truckload of default/sysmem backup textures (these get created for each managed texture you load - one of each, or even more in some situations).
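in d3d8 terms it'd look roughly like this (just a sketch; the sizes, the slot count and depack() are made up):

    // a fixed set of default-pool textures plus one sysmem transfer texture;
    // decompress into the transfer texture, then UpdateTexture() when *we*
    // decide -- not when d3d's manager feels like it.
    #include <d3d8.h>

    void depack(const unsigned char* src, int srcSize,
                unsigned char* dst, int dstPitch);   // your own unpacker

    IDirect3DTexture8* g_transfer;        // one D3DPOOL_SYSTEMMEM texture
    IDirect3DTexture8* g_slots[16];       // preallocated D3DPOOL_DEFAULT set

    void initManager(IDirect3DDevice8* dev)
    {
        dev->CreateTexture(512, 512, 1, 0, D3DFMT_A8R8G8B8,
                           D3DPOOL_SYSTEMMEM, &g_transfer);
        for (int i = 0; i < 16; i++)
            dev->CreateTexture(512, 512, 1, 0, D3DFMT_A8R8G8B8,
                               D3DPOOL_DEFAULT, &g_slots[i]);
    }

    // fill slot 'slot' from a packed texture, at a moment we choose
    void loadInto(IDirect3DDevice8* dev, int slot,
                  const unsigned char* packed, int packedSize)
    {
        D3DLOCKED_RECT lr;
        g_transfer->LockRect(0, &lr, NULL, 0);
        depack(packed, packedSize, (unsigned char*)lr.pBits, lr.Pitch);
        g_transfer->UnlockRect(0);
        dev->UpdateTexture(g_transfer, g_slots[slot]);  // sysmem -> default
    }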
although it's not 100% optimal, it would work fine in most situations (especially for demos). one hint you must follow when implementing such a system: be specific, not generic. only create what you WILL use.
oh just my thoughts..
this of course is for demos (and 100mb of uncompressed textures is quite a lot). for games and such i'd go for a slightly altered version of the system stated above (probably another layer between the copies in system memory and those on disk).
just use d3dpool_managed, but don't use it without thinking about how you're uploading. important textures get a higher priority using ->SetPriority(), and between scenes you make sure you have small pauses to upload new textures, simply by 'touching' them (i.e. rendering a triangle using that texture without even flipping it to the frontbuffer) and, if necessary, flushing the old ones using ->ResourceManagerDiscardBytes().
if you really must have 100mb of textures, don't stick 'em all in memory, but decompress them halfway through your demo. with a moderately efficient decompressor this can be done nearly as fast as a plain 'read from swapfile', and it will cause fewer performance problems. you'd need a small break in your action, but that's just a design choice you'd have to make.
touching them is what causes the stall. i think some dynamic priorities could solve some of the problems tho, indeed.
THIS IS WHY D3D IS LAME
yes, of course touching causes a stall, but you'd rather have it stall when your screen is black than on the first frame of rendering.
you have the ->PreLoad() function for exactly that.
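putting the last few posts together, the between-scenes dance with the managed pool would look something like this (d3d8 sketch; the priority value and byte count are arbitrary):

    #include <d3d8.h>

    // make room, then pull in the next scene's textures while the screen
    // is showing something cheap (or nothing at all)
    void prepareNextScene(IDirect3DDevice8* dev,
                          IDirect3DTexture8** next, int nextCount,
                          DWORD bytesToFlush)
    {
        // ask the manager to evict least-recently-used managed data first
        dev->ResourceManagerDiscardBytes(bytesToFlush);

        for (int i = 0; i < nextCount; i++)
        {
            next[i]->SetPriority(1);   // keep these resident once uploaded
            next[i]->PreLoad();        // the "touch": the upload happens now,
                                       // during the transition, not mid-effect
        }
    }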
plek: problems with your approach:
- you still need backing storage, even though you manage it on your own
- fixed size preallocated textures suck if you have more than 3 or so different texture sizes in use
- you can't reliably allocate a good number of buffers for the available VRAM since a) you don't get an accurate measure of available VRAM and b) memory usage of allocated surfaces can be relatively strange, depending on the constraints the card puts on texture layout
- the driver-cached versions are probably swizzled and transformed in other ways to guarantee optimal performance. OTOH, if you update them every time from a plain linear system memory copy, the driver either has to give you linear textures (which are far slower in drawing operations because of worse texture cache efficiency) or swizzle them every time you upload a new texture (which sucks performance-wise)
- the driver is unlikely to find a good configuration of which textures to put in VRAM and which in AGP memory, because with your approach the usage patterns are pretty much random from the driver's point of view. this'll again cause a slowdown.
so all it does is trade away lots of optimization opportunities for the driver for a bit more direct control. not a good exchange IMHO - it's a far better idea to look for better schemes to use the texture management the driver provides. this is the real problem.
for fuck's sake.. sysram is FREE today, where do you guys live?? it's the fucking graphics cards that COST.. i don't have any computer with less than 512 MB of ram.. except for the laptop, it "only" has 320MB
You can make a demo that needs 1gb of ram to work. It's your demo, you can do whatever you want with it. The problem is that not many people will be able to watch it. So, stefan, I also stuff my machines with memory, and not just for demos. But not everyone does, and as I said earlier, there are still machines out there with 128mb, when 256 should really be the minimum.
As for the multipart demos, madonion did loading between parts rather cleverly... showing a picture etc. Simple effects/scenes could be shown when the textures for the next part are loaded.
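in code the trick is just to keep presenting frames while the uploads trickle in, e.g. one texture per frame (sketch; all three helpers are placeholders for your own code):

    void uploadOneTexture(int i);               // your decrunch + upload
    void renderLoadingScene(int i, int total);  // picture, progress bar, ...
    void presentFrame(void);                    // flip / Present

    void transitionToNextPart(int texCount)
    {
        for (int i = 0; i < texCount; i++)
        {
            uploadOneTexture(i);                // exactly one per frame
            renderLoadingScene(i, texCount);    // cheap effect meanwhile
            presentFrame();                     // the screen keeps moving
        }
    }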
ryg;
- the backing storage won't disappear.. unless you want to load stuff from disk, which at least in demos sucks ass (i run demos from a server, imagine the horror of pumping data through my 10mbps hub between two parts ;)
- why does it suck to have textures of different sizes?
- i'm not sure about the "swizzling" and "transforming" thing; if it were so, then why does uploading occasionally result in painful stalls? and i was referring to uploading to the onboard textures on a not-so-regular basis of course :) and even if it's done, i don't think executing an UpdateTexture is such a pain
- your last point would be valid, but if you don't have all that many onboard textures lying around, those problems would be a bit moot i think
using the driver's management is of course a very valid option (there are several roads to Rome), but the approach i proposed *does* consume less memory (managed textures simply eat a lot, system ram mainly.. i would just keep a stack of default-pool textures, a few system mem textures for transfer purposes, and compressed textures in regular mem), and that was my main point =)
i think i'd like to see some developer documentation on d3d's management system :) does anyone have it or know where to find it?
oh and sagacity: exactly when will you be rendering black frames for a while in a demo? that would indeed suggest a certain type of design ;)
When loading from disk, you have to read seamlessly. The problem with this method, though, is getting the timing of the reads right.