
5 faces "making of"

category: code [glöplog]
I put up a couple of blog posts about the making of 5 faces and my implementation of real-time raytracing on the GPU. The first half is more of an overview; the second gets a bit deeper & more technical.

part 1
part 2
added on the 2013-05-08 15:33:34 by smash
cool shit is cool
added on the 2013-05-08 16:25:14 by psenough
Very interesting read! Thanks for the write-up.

As for SVO building: You don't necessarily require a full-res grid and then work bottom up. For the Voxel Cone Tracing paper, they build an SVO by rasterizing fragments for the geometry at low resolution, with each fragment thread traversing the existing tree and adding itself to a leaf node if it exists, or atomically subdividing nodes until it creates a leaf node. Race conditions are resolved by back-off - if a thread wants to subdivide, but another thread is already doing that, the thread adds itself to a secondary queue and terminates. The secondary queue is then processed in a second pass and the contained threads run again, until all threads succeed in finding a leaf node. It sounds like a massive kludge (and it is), but in practice it appears to work to some degree. Their paper handled fully dynamic scenes only interactively (~10 FPS I think, with octree building + the indirect lighting pipeline), but UE4 did have a SVOGI implementation at some point, so it seems to be possible in real-time with some compromises.

While I completely agree with your choice of data structure, I don't think that raytracing is the way to go for more advanced lighting effects. Indirect lighting (and also AO) seem to require too large of a sample budget to look good in a real-time context, unless you do something like AO reprojection each frame, which limits the camera/scene movement. The noisy look can be improved with more careful sampling strategies and adaptive post-process noise filters (although research is somewhat lacking for the latter), but I think the "hacky" approaches (like Ambient Occlusion Volumes or SVOGI) are more promising. If it is known what kind of scenes to expect, it might be possible to simplify the geometry and use some of the analytic/near closed-form solutions for the solid angle, but I haven't had much luck with those so far.

Subsurface Scattering is imo beyond what is reasonable for real-time raytracing. Using one of the diffusion BSSRDFs and an irradiance cache (like the hierarchical pointcloud from the 2001 Jensen/Buhler paper) seems to be much more feasible. Using these, I got to about 50FPS at 720p on a GTX480 with a few 100k point samples, a precomputed diffusion profile, a sparse octree and a poorly written shader, but I'm sure that could be improved if the coder actually knows what he is doing.
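As a rough sketch of that pipeline (not Noobody's actual shader): diffusion-BSSRDF evaluation sums a diffusion profile over an irradiance point cloud. Here a single exponential stands in for a real dipole profile, and the hierarchy (clustering distant samples) is omitted for brevity; all names are mine.

```python
import math

# Hedged sketch of diffusion-BSSRDF evaluation over an irradiance
# point cloud. A real implementation would use a precomputed
# dipole/multi-pole diffusion profile and a hierarchy that merges
# distant samples into single aggregated ones.

def diffusion_profile(r, sigma_tr=8.0):
    # stand-in for a dipole diffusion profile: simple exponential falloff
    return math.exp(-sigma_tr * r)

def exit_radiance(x, samples):
    # sum the light diffused from every irradiance sample to point x;
    # each sample is (position, irradiance, surface area)
    total = 0.0
    for p, irradiance, area in samples:
        r = math.dist(x, p)
        total += diffusion_profile(r) * irradiance * area
    return total
```

The hierarchical pointcloud variant evaluates far-away clusters with one aggregated sample each, which is what makes a few hundred thousand points feasible per frame.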
added on the 2013-05-08 16:39:56 by Noobody
"You don't necessarily require a full-res grid and then work bottom up."

I know. That was the point - it became practical for me when I fixed it to work top down (see the post).
I just add cells atomically with a mutex as required. If you have to do it bottom up it's simply not workable considering the memory budget requirements.
I've used SVO/brick map-style structures for a whole range of things, all built top down - because the main reason i wanted a sparse structure was to increase the resolution beyond what i could actually fit in memory if it wasn't sparse. :)
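A toy illustration of such a top-down two-level structure (single-threaded Python; on the GPU the brick-pool counter would be an `atomicAdd`, and all names here are mine, not smash's):

```python
# Toy two-level "brick map": a coarse grid whose cells lazily point
# into a pool of dense bricks, allocated on first touch. Memory is
# only spent on bricks that actually contain voxels, which is the
# whole point of building top-down. (Illustrative names only.)

COARSE = 8    # coarse grid resolution per axis
BRICK = 8     # voxels per brick per axis -> 64^3 effective resolution
MAX_BRICKS = 64

coarse = [-1] * COARSE ** 3   # -1 = empty cell, else index into pool
bricks = []                   # brick pool; len(bricks) is the counter

def insert_voxel(x, y, z, value):
    cx, cy, cz = x // BRICK, y // BRICK, z // BRICK
    ci = (cz * COARSE + cy) * COARSE + cx
    if coarse[ci] == -1:                  # cell empty: allocate a brick
        if len(bricks) >= MAX_BRICKS:
            raise MemoryError("brick pool exhausted")
        coarse[ci] = len(bricks)          # atomicAdd on the GPU
        bricks.append([0.0] * BRICK ** 3)
    brick = bricks[coarse[ci]]
    bi = ((z % BRICK) * BRICK + y % BRICK) * BRICK + x % BRICK
    brick[bi] = value
```

Only the touched bricks cost memory, so the effective resolution can exceed what a dense grid of the same size would allow.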

"Indirect lighting (and also AO) seem to require too large of a sample budget to look good in a real-time context, unless you do something like AO reprojection each frame, which limits the camera/scene movement"

I came to the same conclusion (see the post). that's the main reason i went for a focus on reflections / refractions instead.

SSS is something i will probably revisit - the setup I have now is way better than what I had over a year ago when I last tried it.
added on the 2013-05-08 16:47:55 by smash
So this sounds like a case of optimizing for specific conditions (in this case the focus on reflections / refractions) that yields lessons which are more generally applicable?

Will add these to my recommendation pile, along with ryg's series on the graphics pipeline.
can't wait to read this later!
added on the 2013-05-08 17:48:35 by Canopy
I just love the way that Smash is still finding techniques on modern hardware that follow all the best traditions of oldschool demo making: things that would never make sense in the context of a "real world" project, but happen to be exactly the right tool if you have a single-minded determination to achieve a particular result, and no game to get in the way. It's the modern equivalent of a C64 effect that relies on using up three quarters of the available RAM on a giant lookup table.

So many people on the 8-16 bit scenes seem to dismiss PC demos on the assumption that those 'heroics' don't exist any more - that you have practically unlimited hardware resources, and that there's no opportunity to innovate beyond what games are doing already. They are wrong. :-)
added on the 2013-05-09 02:33:34 by gasman gasman
+1 gasman
added on the 2013-05-09 04:57:58 by rc55
I really did enjoy reading all this (despite lacking the math skills to even trace a sphere); I even wished there was more :) It makes it possible for me to understand and appreciate what's going on in this demo. Extremely complex stuff compared to what we do on the c64, and a lot of hard work.

and what gasman said: I do indeed often dismiss PC demos on this basis, but now this makes me deeply respect this demo.
added on the 2013-05-09 08:28:58 by Oswald
Great reading as always.

I like the fact that tweaking and faking are still big players. That is creativity :)
added on the 2013-05-09 09:02:28 by maytz
Gasman: and still most of the data is runtime generated.

I love smash's scientific approach to algorithm development, because it's the perfect contrast to the way I and some other people work by mostly randomly dicking around with tried and tested algorithms and hopefully producing new and unique combinations.

Hopefully this will lead to something that takes the same ideas and turns them into something unexpected and wholly unreal.
added on the 2013-05-09 10:23:49 by visy
Awesome read, thanks for sharing... :)
added on the 2013-05-09 11:18:05 by doz
+2 gasman :)

doing everything possible to get the 'scene on screen' without a care for managing every possible case under the sun or a game engine behind it.

stuff on the edge like this shows what will be realised in games once hardware allows.
added on the 2013-05-09 11:47:55 by Canopy
I personally didn't like the "end result" of 5 faces, but was pretty much fascinated about the techniques used... reading about the whole process just makes it even more fascinating.

Also, what gasman said.
added on the 2013-05-09 11:57:32 by Jcl
What gasman said... plus it also shows that the GPU is still somewhat of an unknown beast when it comes to algorithm development. Most of the stuff I've seen in the non-demoscene world still has the feel about it that it's trying to be the optimal algorithm on a single-threaded CPU. Cool algorithms are cool, but it seems that there's much more bang for buck in keeping it simple and letting the GPU do its thing.

Smash: Thank you for the super-awesome write-up.
added on the 2013-05-09 12:12:30 by bloodnok
Fantastic read! Thanks for sharing this with us, it is an eye-opener. I laughed when I read how you had to beg, borrow and steal to create resources for this demo. I share your pain bro!!
added on the 2013-05-09 13:32:19 by Navis
Dr Claw commented favorably upon this article, as he has studied these topics enough to discuss it in detail. Thank you for furnishing us with a lengthy dinner conversation for last night (and that *won't* be the last time we discuss that topic). ( :

Connecting abstract things like data structures to actually making things is a lot easier (although I still have a LOT to learn) when one has a case study like this to help one understand them.
@metoikos: you guys have the best dinner conversations, it seems.
added on the 2013-05-10 08:47:35 by visy
How exactly does the bokeh/dof thing work?
added on the 2013-05-10 13:32:36 by msqrt
msqrt: I think that was already discussed in the numb res write-up back in 2011.
added on the 2013-05-10 13:34:08 by noby
I thought it looked way better this time, though it might just be that it fits these scenes more naturally.
added on the 2013-05-10 13:40:58 by msqrt
nice write-up, thanks..

and good to see that the old "keep it simple" rule can still apply in some scenarios..
multi-level grids still fail in production for various reasons, but in your case they make perfect sense, especially with the cool "limit rays by the overall number of cells traversed instead of material bounces/ray depth" idea..
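That budget idea can be illustrated with a toy 1-D marcher (my names, not code from the demo): reflected rays keep consuming the same shared cell budget, so a ray trapped between mirrors terminates gracefully without needing a separate bounce limit.

```python
# Toy 1-D grid march where the termination criterion is the total
# number of cells traversed across all bounces, not the bounce count.
# (Illustrative only.)

def march(grid, start, direction, cell_budget):
    # grid: string of '.' (empty) and 'M' (mirror)
    pos, d, cells, bounces = start, direction, 0, 0
    while 0 <= pos < len(grid) and cells < cell_budget:
        if grid[pos] == 'M':
            d = -d            # bounce; the same budget keeps ticking
            bounces += 1
        pos += d
        cells += 1
    return pos, cells, bounces
```

Cheap rays (short paths, few bounces) leave budget for expensive ones, which evens out per-pixel cost compared to a fixed ray-depth cap.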

there is also a nice paper on a different encoding of hierarchical grids out there, that might be an interesting read in that context, as it -could- handle more scene types than the fixed two level grid: http://www.cs.purdue.edu/cgvlab/papers/xmt/matrixtree.pdf
added on the 2013-05-10 14:01:45 by toxie
msqrt: I wondered a bit about the bokeh too, since there are a few ways of doing it: one is how smash did it before in numb res, and another involves casting multiple camera rays from different lens positions and combining them to get DOF (arranging the samples in a hexagonal shape should give hexagonal bokeh too).
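The multi-ray variant psonice describes would look roughly like this (thin-lens sampling with a hexagonal aperture; all names are mine, and this is explicitly not the technique 5 faces uses):

```python
import math, random

# Hedged sketch of lens-sampled depth of field: jitter the ray origin
# over a hexagonal aperture and re-aim at the in-focus point, so that
# averaging many such rays per pixel produces DOF with hexagonal bokeh.
# (Illustrative only.)

SQRT3 = math.sqrt(3.0)

def in_hexagon(x, y, r):
    # point-in test for a flat-top regular hexagon of circumradius r
    return abs(y) <= SQRT3 / 2 * r and abs(y) <= SQRT3 * (r - abs(x))

def sample_hexagon(r):
    # rejection-sample a point uniformly inside the hexagon
    while True:
        x, y = random.uniform(-r, r), random.uniform(-r, r)
        if in_hexagon(x, y, r):
            return x, y

def lens_ray(pixel_dir, focus_dist, aperture_radius):
    # the pinhole ray hits the focal plane at 'focal'; the actual ray
    # starts on the lens and aims at that same point
    lx, ly = sample_hexagon(aperture_radius)
    focal = tuple(c * focus_dist for c in pixel_dir)
    origin = (lx, ly, 0.0)
    d = tuple(f - o for f, o in zip(focal, origin))
    inv = 1.0 / math.sqrt(sum(c * c for c in d))
    return origin, tuple(c * inv for c in d)
```

Points at the focus distance project to the same pixel for every lens sample, while everything else smears into the aperture's shape.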

But looking at the demo, it looks like the fake/post process variety he used before to me, and since smash mentions in the blog for 5 faces that the camera rays aren't traced at all we can pretty much rule out the tracing method. So I guess this part hasn't changed, or at least not enough to be worth mentioning.
added on the 2013-05-10 14:51:09 by psonice
The bokeh dof is new, but it's basically a tweaked version of the one by Matt Pettineo ( mynameismjp.wordpress.com ), so not much to say. :)

toxie: true, but don't forget: the main performance of brick maps comes from their simplicity and the resulting shader code quality. A simple but less optimal algorithm with faster shader code can win out over a more optimal algorithm with a slower shader, if the algorithm isn't pushed too hard in that situation. :)
added on the 2013-05-10 16:09:32 by smash
@smash: nice! i quite like the shots in part one. can you give some stats of the "AO in viewport – 10 second refine" (number of rays/ao bounces per pixel, that kind of stuff)?
added on the 2013-05-10 17:08:04 by abductee

