Raymarching Beginners' Thread
category: code [glöplog]
Thanks all!
@rudi Yea, the distance function is really tiny, I totally could do a 1k. Didn't think of that. xD
wow, third one looks fantastic!
I've been having problems trying to work out which cell/integer the nearest position is in. I'm using floor() right now and it's giving barely passable results. I know this probably isn't the best way to do this. Any ideas?
An example of the artifacts.
Maybe you are off by 0.5f or something like that.
Btw. some of you implemented triangle mesh to SDF stuff. How fast is it? Can you get that stuff realtime for let's say a 4k poly mesh? I saw some papers dealing with GPU implementations but their results were not as awesome as I need them to be (but given their measurements were from 2007 or something like that - maybe stuff is now realtime).
And how fast and reliable is rendering the SDF from a volume texture? I think of first using an OBB intersection test and then sphere tracing the discretized SDF.
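For what it's worth, the tracing side of that plan is straightforward. A minimal sketch, assuming an axis-aligned box for simplicity (an OBB version would first transform the ray into box space); sdfTex, boxMin and boxMax are made-up names, not anyone's actual code:
Code:
uniform sampler3D sdfTex;
uniform vec3 boxMin;
uniform vec3 boxMax;

float traceVolumeSDF(vec3 ro, vec3 rd)
{
    // slab test against the bounding box
    vec3 t0 = (boxMin - ro) / rd;
    vec3 t1 = (boxMax - ro) / rd;
    vec3 tmin = min(t0, t1);
    vec3 tmax = max(t0, t1);
    float tn = max(max(tmin.x, tmin.y), tmin.z); // entry
    float tf = min(min(tmax.x, tmax.y), tmax.z); // exit
    if (tn > tf || tf < 0.0) return -1.0;        // missed the box

    // sphere trace the discretized SDF inside the box
    float t = max(tn, 0.0);
    for (int i = 0; i < 64; i++)
    {
        vec3 p = ro + t * rd;
        vec3 uvw = (p - boxMin) / (boxMax - boxMin);
        float d = texture3D(sdfTex, uvw).x;      // trilinear lookup
        if (d < 0.001) return t;                 // close enough: hit
        t += d;
        if (t > tf) return -1.0;                 // left the volume
    }
    return -1.0;
}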
round instead of floor?
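In case it helps: mod-style repetition and a floor-based cell index are half a cell apart, which is an easy way to be off by 0.5. A minimal sketch of the usual idiom (the cell size of 2.0 is an assumption):
Code:
float c = 2.0;                    // cell size
vec3 cell = floor(p / c);         // integer id of the cell containing p
vec3 q = mod(p, c) - 0.5 * c;     // local coords, centered in the cell
// round(x) == floor(x + 0.5), so round-based ids are shifted by half a
// cell relative to floor-based ones.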
las: our tri mesh to sdf convertor used to be cpu and glacially slow, but we just recently moved to a gpu-based new way of doing it and it's very fast.
realtime with 4k polys? depends on the res of the sdf. if it's quite low then yea, why not. :) one problem is that on dx9 i have to either use it as a 2d render target made from slices, which makes sampling slower, or copy it back to a volume texture, which... adds a copy back.
rendering the sdf from a volume texture is good, you can interpolate it fine too. (we also have bicubic support.) yes, you need to handle clamping as it isn't infinite.
the biggest irritation i've found so far is that adding texture sampling in the sphere tracing loop makes it much easier to explode the shader compiler.
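On the clamping: one simple way to handle the finite volume is to march on the bounding-box distance while outside and only sample the texture inside - a sketch, reusing the assumed sdfTex/boxMin/boxMax names from above. The box distance is a valid lower bound outside, since the surface lives inside the box:
Code:
float sceneDist(vec3 p)
{
    vec3 b = 0.5 * (boxMax - boxMin);              // box half-extents
    vec3 d = abs(p - 0.5 * (boxMin + boxMax)) - b;
    float boxDist = length(max(d, 0.0));           // 0 inside, >0 outside
    if (boxDist > 0.0) return boxDist;             // outside: march to the box
    vec3 uvw = (p - boxMin) / (boxMax - boxMin);
    return texture3D(sdfTex, uvw).x;               // inside: sample the SDF
}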
Quote:
texture sampling in the sphere tracing loop makes it much easier to explode the shader compiler.
Yes (Do you also run out of temporary registers all the time with dx9? ;)).
The most current dx HLSL compiler is - let's be honest - complete crap.
It's both (DAMN F**KING) slow and buggy.
Given the fact that I don't care about people who don't buy dx11 level hardware I can go the cs_5_0 way...
But that will take me some time given the huge turn-around times: Waiting for a shader to compile with ~300 loc can take ~5 minutes depending on what you do. Really sucks.
Did you implement the "Interactive 3D distance field computation using linear factorization" approach from 2006? Or is there anything "improved" already ;)
las: the one I've done at work does around 15B/s triangle-point distances (6870/460). That includes a bit of spatial optimization in the shader itself (around a factor of 2). As we don't have to care about long distances (i.e. for AO in the building models it's clamped at 3-4m), there's a lot of spatial optimization on the cpu side before going into the shader, on top of that speed.
So without cpu spatial it would be around 70 ms for a 64x64x64 grid and 4k polys, I don't remember how much of the time is spent with all the gpu-cpu transfers and sync during the process (it's for big models, like 5M polys * 32M voxels).
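For anyone wanting to try the brute-force route: the inner loop is just a point-triangle distance per voxel-triangle pair. One well-known GLSL formulation is iq's unsigned triangle distance (not necessarily what Psycho's shader does):
Code:
float dot2(vec3 v) { return dot(v, v); }

// unsigned distance from point p to triangle (a,b,c)
float udTriangle(vec3 p, vec3 a, vec3 b, vec3 c)
{
    vec3 ba = b - a; vec3 pa = p - a;
    vec3 cb = c - b; vec3 pb = p - b;
    vec3 ac = a - c; vec3 pc = p - c;
    vec3 nor = cross(ba, ac);

    // inside the prism over the triangle? use the plane distance,
    // otherwise the distance to the nearest edge segment.
    return sqrt(
        (sign(dot(cross(ba, nor), pa)) +
         sign(dot(cross(cb, nor), pb)) +
         sign(dot(cross(ac, nor), pc)) < 2.0)
        ? min(min(
            dot2(ba * clamp(dot(ba, pa) / dot2(ba), 0.0, 1.0) - pa),
            dot2(cb * clamp(dot(cb, pb) / dot2(cb), 0.0, 1.0) - pb)),
            dot2(ac * clamp(dot(ac, pc) / dot2(ac), 0.0, 1.0) - pc))
        : dot(nor, pa) * dot(nor, pa) / dot2(nor));
}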
70 ms is far too much, and 64x64x64 is kinda low res - but that might even be enough... I guess I have to go for the other approach I have in mind (that one should be faster by design); I hope to find the time to implement both.
smash: Just get over that whole DX9-thing already, life is much better with DX11 :)
Code:
return box(vec3(q.x, q.y-round(p.x), q.z), vec3(1.0, 1.0, 1.0));
Is what I'm using. Still the same artifacts. I'm sure I'm doing it completely the wrong way, but I've got no idea of other approaches. Tried offsetting it by 0.5 and 1.0, but no luck.
Sorry.
There's still intersection going on there. Maybe try
Code:
return box(vec3(p.x, p.y-round(p.x), p.z), vec3(1.0))
or
Code:
return box(vec3(p.x, p.y-round(p.x), p.z), vec3(0.9))
?
That doesn't seem to work either. I'm stumped :o
Aha! I knew I was doing it the wrong way. Here's the code, now actually working. Generates a city of randomly tall cubes.
Code:
vec3 f = mod(p, 2.0) - 1.0;   // repeat space every 2 units, centered on the cell
float rd = fract(sin(dot(vec2(round(p.x),round(p.z)),vec2(31.9898,78.233)))*43758.5453); // per-cell pseudo-random height in [0,1)
f.y = p.y+1.;                 // no repetition along y
vec3 di = max(abs(f)-vec3(0.1, rd, 0.1), 0.0); // box with half-extents (0.1, rd, 0.1)
return sqrt(dot(di, di));     // = length(di), distance to the box
Minor suggestions:
- vec2(round(p.x),round(p.z)) ==> round(p.xz)
- sqrt(dot(di,di)) ==> length(di)
- inline variables rd and di (as they are used only once each now)
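Applied to the snippet above, that would end up as something like:
Code:
vec3 f = mod(p, 2.0) - 1.0;
f.y = p.y + 1.;
return length(max(abs(f) - vec3(0.1, fract(sin(dot(round(p.xz), vec2(31.9898, 78.233))) * 43758.5453), 0.1), 0.0));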
Another thing:
mod/fmod is way slower than fract/frac.
#define _mod(X,Y) (frac(X*(1./Y))*Y)
At least on my GTS 450 this seems way faster than mod(X,Y).
(Ouch... this is not correct... Fail).
Take this one:
#define _mod(X,Y) (X-floor(X*(1./Y))*Y)
No, it depends on the definition of fmod and floor - whether you round towards negative infinity or towards zero. In HLSL, fmod rounds towards zero (so a negative x gives a negative result), while floor rounds towards -inf (so _mod will always return a positive result).
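A quick worked example of the difference:
Code:
// with X = -0.5, Y = 2.0:
// _mod(-0.5, 2.0) = -0.5 - floor(-0.25)*2.0 = -0.5 + 2.0 = 1.5  (always >= 0)
// fmod(-0.5, 2.0) = -0.5  (truncates towards zero, keeps the sign of X)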
d3d assembly for _mod:
Code:
rcp r0.x, v0.y
mul r0.x, r0.x, v0.x
frc r0.y, r0.x
add r0.x, -r0.y, r0.x
mad oC0, r0.x, -v0.y, v0.x
and fmod:
Code:
rcp r0.x, v0.y
mul r0.x, r0.x, v0.x
frc r0.y, r0_abs.x
cmp r0.x, r0.x, r0.y, -r0.y
mul oC0, r0.x, v0.y
And they will perform the same on current ati (rcp,mul,fract,cndge,mul vs rcp,mul,fract,add,muladd) - doesn't fermi have a conditional move?
Good question. But tests show that it's _really_ faster... for strange reasons.
I just realized something. Because raymarching scales really nicely speed-wise with resolution, the best way to do specular/reflections would be to render a cubemap of the scene at a low resolution and use that. Anybody tried that?
nope. do it.
I've tried something like that, not exactly with cubemaps tho. Worked very well :)
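For reference, the lookup side of the cubemap idea is tiny - a sketch, assuming the scene has already been rendered into a low-res cubemap (envMap, specAmount and the n/rd names are made up):
Code:
uniform samplerCube envMap; // scene rendered at low resolution, e.g. 64x64 per face

// after the primary march: n = surface normal, rd = incoming ray direction
vec3 refl = textureCube(envMap, reflect(rd, n)).rgb;
color += specAmount * refl; // cheap, slightly blurry specular/reflections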