Raymarching Beginners' Thread
category: code [glöplog]
Thanks all!
@rudi Yea, the distance function is really tiny, I totally could do a 1k. Didn't think of that. xD
wow, third one looks fantastic!
I've been having problems trying to work out which cell/integer the nearest position is in. I'm using floor() right now and it's giving barely passable results. I know this probably isn't the best way to do this. Any ideas?
An example of the artifacts.
Maybe you are off by 0.5f or something like that.
Btw. some of you implemented triangle mesh to SDF stuff. How fast is it? Can you get that stuff realtime for let's say a 4k poly mesh? I saw some papers dealing with GPU implementations but their results were not as awesome as I need them to be (but given their measurements were from 2007 or something like that - maybe stuff is now realtime).
And how fast and reliable is rendering the SDF from a volume texture? I think of first using an OBB intersection test and then sphere tracing the discretized SDF.
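For what it's worth, the tracing side of that plan is straightforward. A minimal sketch, assuming an axis-aligned box for simplicity (an OBB version would first transform the ray into box space); sdfTex, boxMin and boxMax are made-up names, not anyone's actual code:
Code:
uniform sampler3D sdfTex;
uniform vec3 boxMin;
uniform vec3 boxMax;

float traceVolumeSDF(vec3 ro, vec3 rd)
{
    // slab test against the bounding box
    vec3 t0 = (boxMin - ro) / rd;
    vec3 t1 = (boxMax - ro) / rd;
    vec3 tmin = min(t0, t1);
    vec3 tmax = max(t0, t1);
    float tn = max(max(tmin.x, tmin.y), tmin.z); // entry
    float tf = min(min(tmax.x, tmax.y), tmax.z); // exit
    if (tn > tf || tf < 0.0) return -1.0;        // missed the box

    // sphere trace the discretized SDF inside the box
    float t = max(tn, 0.0);
    for (int i = 0; i < 64; i++)
    {
        vec3 p = ro + t * rd;
        vec3 uvw = (p - boxMin) / (boxMax - boxMin);
        float d = texture3D(sdfTex, uvw).x;      // trilinear lookup
        if (d < 0.001) return t;                 // close enough: hit
        t += d;
        if (t > tf) return -1.0;                 // left the volume
    }
    return -1.0;
}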
round instead of floor?
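In case it helps: mod-style repetition and a floor-based cell index are half a cell apart, which is an easy way to be off by 0.5. A minimal sketch of the usual idiom (the cell size of 2.0 is an assumption):
Code:
float c = 2.0;                    // cell size
vec3 cell = floor(p / c);         // integer id of the cell containing p
vec3 q = mod(p, c) - 0.5 * c;     // local coords, centered in the cell
// round(x) == floor(x + 0.5), so round-based ids are shifted by half a
// cell relative to floor-based ones.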
las: our tri mesh to sdf convertor used to be cpu and glacially slow, but we just recently moved to a gpu-based new way of doing it and it's very fast.
realtime with 4k polys? depends on the res of the sdf. if it's quite low then yea, why not. :) one problem is that on dx9 i have to either use it as a 2d render target made from slices, which makes sampling slower, or copy it back to a volume texture, which... adds a copy back.
rendering the sdf from a volume texture is good, you can interpolate it fine too. (we also have bicubic support.) yes, you need to handle clamping as it isn't infinite.
the biggest irritation i've found so far is that adding texture sampling in the sphere tracing loop makes it much easier to explode the shader compiler.
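On the clamping: one simple way to handle the finite volume is to march on the bounding-box distance while outside and only sample the texture inside - a sketch, reusing the assumed sdfTex/boxMin/boxMax names from above. The box distance is a valid lower bound outside, since the surface lives inside the box:
Code:
float sceneDist(vec3 p)
{
    vec3 b = 0.5 * (boxMax - boxMin);              // box half-extents
    vec3 d = abs(p - 0.5 * (boxMin + boxMax)) - b;
    float boxDist = length(max(d, 0.0));           // 0 inside, >0 outside
    if (boxDist > 0.0) return boxDist;             // outside: march to the box
    vec3 uvw = (p - boxMin) / (boxMax - boxMin);
    return texture3D(sdfTex, uvw).x;               // inside: sample the SDF
}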
Quote:
texture sampling in the sphere tracing loop makes it much easier to explode the shader compiler.
Yes (Do you also run out of temporary registers all the time with dx9? ;)).
The most current dx HLSL compiler is - let's be honest - complete crap.
It's both (DAMN F**KING) slow and buggy.
Given the fact that I don't care about people who don't buy dx11 level hardware I can go the cs_5_0 way...
But that will take me some time given the huge turn-around times: Waiting for a shader to compile with ~300 loc can take ~5 minutes depending on what you do. Really sucks.
Did you implement the "Interactive 3D distance field computation using linear factorization" approach from 2006? Or is there anything "improved" already ;)
las: the one I've done at work does around 15B/s triangle-point distances (6870/460). That includes a bit of spatial optimization in the shader itself (around a factor of 2). As we don't have to care about long distances (i.e. for AO in the building models it's clamped at 3-4m), there's a lot of spatial optimization on the cpu side before going into the shader, on top of that speed.
So without cpu spatial it would be around 70 ms for a 64x64x64 grid and 4k polys, I don't remember how much of the time is spent with all the gpu-cpu transfers and sync during the process (it's for big models, like 5M polys * 32M voxels).
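For anyone wanting to try the brute-force route: the inner loop is just a point-triangle distance per voxel-triangle pair. One well-known GLSL formulation is iq's unsigned triangle distance (not necessarily what Psycho's shader does):
Code:
float dot2(vec3 v) { return dot(v, v); }

// unsigned distance from point p to triangle (a,b,c)
float udTriangle(vec3 p, vec3 a, vec3 b, vec3 c)
{
    vec3 ba = b - a; vec3 pa = p - a;
    vec3 cb = c - b; vec3 pb = p - b;
    vec3 ac = a - c; vec3 pc = p - c;
    vec3 nor = cross(ba, ac);

    // inside the prism over the triangle? use the plane distance,
    // otherwise the distance to the nearest edge segment.
    return sqrt(
        (sign(dot(cross(ba, nor), pa)) +
         sign(dot(cross(cb, nor), pb)) +
         sign(dot(cross(ac, nor), pc)) < 2.0)
        ? min(min(
            dot2(ba * clamp(dot(ba, pa) / dot2(ba), 0.0, 1.0) - pa),
            dot2(cb * clamp(dot(cb, pb) / dot2(cb), 0.0, 1.0) - pb)),
            dot2(ac * clamp(dot(ac, pc) / dot2(ac), 0.0, 1.0) - pc))
        : dot(nor, pa) * dot(nor, pa) / dot2(nor));
}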
70 ms is far too much, and 64x64x64 is kinda low res - but that might even be enough... I guess I have to go for the other approach I have in mind (that one should be faster by design); I hope to find the time to implement both.
smash: Just get over that whole DX9-thing already, life is much better with DX11 :)
Code:
return box(vec3(q.x, q.y-round(p.x), q.z), vec3(1.0, 1.0, 1.0));
Is what I'm using. Still the same artifacts. I'm sure I'm doing it completely the wrong way, but I've got no idea of other approaches. Tried offsetting it by 0.5 and 1.0, but no luck.
Sorry.
There's still intersection going on there. Maybe try
Code:
return box(vec3(p.x, p.y-round(p.x), p.z), vec3(1.0))
or
Code:
return box(vec3(p.x, p.y-round(p.x), p.z), vec3(0.9))
?
That doesn't seem to work either. I'm stumped :o
Aha! I knew I was doing it the wrong way. Here's the code, now actually working. Generates a city of randomly tall cubes.
Code:
vec3 f = mod(p, 2.0) - 1.0;   // repeat space every 2 units, centered on the cell
float rd = fract(sin(dot(vec2(round(p.x),round(p.z)),vec2(31.9898,78.233)))*43758.5453); // per-cell pseudo-random height in [0,1)
f.y = p.y+1.;                 // no repetition along y
vec3 di = max(abs(f)-vec3(0.1, rd, 0.1), 0.0); // box with half-extents (0.1, rd, 0.1)
return sqrt(dot(di, di));     // = length(di), distance to the box
Minor suggestions:
- vec2(round(p.x),round(p.z)) ==> round(p.xz)
- sqrt(dot(di,di)) ==> length(di)
- inline variables rd and di (as they are used only once each now)
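Applied to the snippet above, that would end up as something like:
Code:
vec3 f = mod(p, 2.0) - 1.0;
f.y = p.y + 1.;
return length(max(abs(f) - vec3(0.1, fract(sin(dot(round(p.xz), vec2(31.9898, 78.233))) * 43758.5453), 0.1), 0.0));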
Another thing:
mod/fmod is way slower than fract/frac.
#define _mod(X,Y) (frac(X*(1./Y))*Y)
At least on my GTS 450 this seems way faster than mod(X,Y).
(Ouch... this is not correct... Fail).
Take this one:
#define _mod(X,Y) (X-floor(X*(1./Y))*Y)
No, it depends on the definition of fmod and floor - whether you round towards negative infinity or towards zero. In HLSL, fmod rounds towards zero (so a negative x gives a negative result), while floor rounds towards -inf (so _mod will always return a positive result).
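A quick worked example of the difference:
Code:
// with X = -0.5, Y = 2.0:
// _mod(-0.5, 2.0) = -0.5 - floor(-0.25)*2.0 = -0.5 + 2.0 = 1.5  (always >= 0)
// fmod(-0.5, 2.0) = -0.5  (truncates towards zero, keeps the sign of X)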
d3d assembly for _mod:
Code:
rcp r0.x, v0.y
mul r0.x, r0.x, v0.x
frc r0.y, r0.x
add r0.x, -r0.y, r0.x
mad oC0, r0.x, -v0.y, v0.x
and fmod:
Code:
rcp r0.x, v0.y
mul r0.x, r0.x, v0.x
frc r0.y, r0_abs.x
cmp r0.x, r0.x, r0.y, -r0.y
mul oC0, r0.x, v0.y
And they will perform the same on current ati (rcp,mul,fract,cndge,mul vs rcp,mul,fract,add,muladd) - doesn't fermi have a conditional move?
Good question. But tests show that it's _really_ faster... for strange reasons.
I just realized something. Because raymarching scales really nicely speed-wise with resolution, the best way to do specular/reflections would be to render a cubemap of the scene at a low resolution and use that. Anybody tried that?
nope. do it.
I've tried something like that, not exactly with cubemaps tho. Worked very well :)
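For reference, the lookup side of the cubemap idea is tiny - a sketch, assuming the scene has already been rendered into a low-res cubemap (envMap, specAmount and the n/rd names are made up):
Code:
uniform samplerCube envMap; // scene rendered at low resolution, e.g. 64x64 per face

// after the primary march: n = surface normal, rd = incoming ray direction
vec3 refl = textureCube(envMap, reflect(rd, n)).rgb;
color += specAmount * refl; // cheap, slightly blurry specular/reflections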