pouët.net

So, what do distance field equations look like? And how do we solve them?

category: code [glöplog]
Try coding a simple CPU marcher and speed-test it, then do this and speed-test it again :) If you use hardly any if() statements in your inner loops, it should be plenty fast on a GPU :)
added on the 2009-08-22 06:43:38 by ferris ferris
I've done it. It works...

I've tested it with this simple scene:

inline float dist(float x, float y, float z)
{
    float x1=x-20;
    float y1=y;
    float z1=z-500;

    float x2=x-150;
    float y2=y+100;
    float z2=z-350;

    float a, b;

    a=sqrtf((x1*x1)+(y1*y1)+(z1*z1))-200+sinf(x/15.0f)*20.0f+sinf((z*y)/7000)*15.0f;
    b=sqrtf((x2*x2)+(y2*y2)+(z2*z2))-100;
    return (m(a, b));
}

and it is about 20% faster with my algorithm. I suppose it should be even faster with more complex scenes...

Also, I've implemented my algorithm in a very, very simple way, so it does less marching than would be possible...

I will continue trying, but I have very little time to do it...
added on the 2009-08-22 07:35:35 by texel texel
Cool :) This is very interesting. I'd like to see it tested with more complex scenes with deformations etc, as well as pictures of the output or .exe's for us all to test :)
added on the 2009-08-22 07:40:06 by ferris ferris
iq: I remember reading a paper about that algorithm you mentioned. I don't know why but I have completely forgotten about it; thus thanks a lot for reminding me of the algorithm again :). It's worth testing.
added on the 2009-08-22 14:03:38 by decipher decipher
I suspect the more complex the scene, the less effective Texel's algorithm will be. Imagine an infinite forest full of trees and mushrooms, or an endless room full of columns. As rays start to hit nearby trees/columns you have to remove them from RAYS (or deactivate/mask them out, in the terminology of packet tracing). As fewer and fewer rays are left in RAYS, fewer of them can benefit from the common marching (traversal). In simple scenes with a few big primitives, by contrast, rays travel together for longer and benefit more from each other's computations. As in packet raytracing, however, one can always artificially increase the coherence again by increasing the screen resolution (or adding supersampling).

Again, this is just speculation; usually I only believe in framerates produced by real tests, so go ahead Texel and tell us. I can provide you with some pseudo-complex float dist( float x, float y, float z) functions, or just pick a cube or some rings and fmod them.

Side note: for packet tracing have a look at the PhD thesis of Ingo Wald or Toxie's PhD thesis on BIH (Carsten Wachter).
added on the 2009-08-22 15:05:03 by iq iq
please iq, do it.
added on the 2009-08-22 18:33:08 by texel texel
lately i tried describing distance fields not with spheres but with the surface of an ellipsoid, so something like axis-aligned distance fields. but it's very complicated to estimate the best-fitting ellipsoids for a given function, and i didn't manage to do the math properly, so instead i coded a volume texture thingy (like iq promoted) with 3 values per texel describing the ellipsoid. it's slow. but ryg said there might be a way to render the three axis-aligned distance fields directly with some math tricks. in my tests the iteration count dropped to around half. but wth, how do i generate these ellipsoids.. just a suggestion, perhaps something could evolve from it.
added on the 2009-08-22 21:29:09 by mad mad
what i've done is basically look for the best-fitting ellipsoid that doesn't collide with the geometry at any given point, and it was hard enough to make this work with volume textures. most probably there is no way to calc that on the fly. but if..
added on the 2009-08-22 21:34:02 by mad mad
he already did it ;)
added on the 2009-08-22 22:14:14 by nystep nystep
I will send it again, now a bit more explicitly ;)

Texel, the dist(x,y,z) of Slisesix

Code:
#include <math.h>

// can be replaced by (int)floorf(x)
static __forceinline int mfloorf( float x )
{
    x = x - 0.5f;
    int t;
    _asm fld x
    _asm fistp t
    return t;
}

// can be replaced by return powf( 2.0f, f );
static __forceinline float m2xf(float f)
{
    _asm fld dword ptr [f]
    _asm fld1
    _asm fld st(1)
    _asm fprem
    _asm f2xm1
    _asm faddp st(1), st
    _asm fscale
    _asm fstp st(1)
    _asm fstp dword ptr [f]
    return f;
}

static float clamp01( float x )
{
    if( x<0.0f ) x=0.0f;
    if( x>1.0f ) x=1.0f;
    return x;
}

static float smoothstep( float x, float a, float b )
{
    if( x<a ) return 0.0f;
    if( x>b ) return 1.0f;
    x = (x-a)/(b-a);
    return x*x*(3.0f-2.0f*x);
}

static float coolfFunc3d2( int n )
{
    n = (n << 13) ^ n;
    n = (n * (n * n * 15731 + 789221) + 1376312589) & 0x7fffffff;
    return (float)n;
}

static __forceinline float smoothstep3( float x )
{
    return x*x*(3.0f-2.0f*x);
}

static __forceinline float smoothstep5( float x )
{
    return x*x*x*(x*(x*6.0f-15.0f)+10.0f);
}

static __forceinline float lerp( float x, float a, float b )
{
    return a+(b-a)*x;
}

static float noise3f( const float x, const float y, const float z, int sem )
{
    const int ix = mfloorf( x );
    const int iy = mfloorf( y );
    const int iz = mfloorf( z );

    const float u = smoothstep3( x-(float)ix );
    const float v = smoothstep3( y-(float)iy );
    const float w = smoothstep3( z-(float)iz );

    const int n = ix + 57*iy + 113*iz + sem;

    const float res = lerp(w, lerp(v, lerp(u, coolfFunc3d2(n+(0+57*0+113*0)),
                                              coolfFunc3d2(n+(1+57*0+113*0))),
                                      lerp(u, coolfFunc3d2(n+(0+57*1+113*0)),
                                              coolfFunc3d2(n+(1+57*1+113*0)))),
                              lerp(v, lerp(u, coolfFunc3d2(n+(0+57*0+113*1)),
                                              coolfFunc3d2(n+(1+57*0+113*1))),
                                      lerp(u, coolfFunc3d2(n+(0+57*1+113*1)),
                                              coolfFunc3d2(n+(1+57*1+113*1)))));

    return 1.0f - res*(1.0f/1073741824.0f);
}

static float fbm( float x, float y, float z )
{
    float v = 0.5000f*noise3f( x*1.0f, y*1.0f, z*1.0f, 0 ) +
              0.2500f*noise3f( x*2.0f, y*2.0f, z*2.0f, 0 ) +
              0.1250f*noise3f( x*4.0f, y*4.0f, z*4.0f, 0 ) +
              0.0625f*noise3f( x*8.0f, y*8.0f, z*8.0f, 0 );
    return v;
}

static float distToBox( float x, float y, float z, float a, float b, float c )
{
    float di = 0.0f;
    const float dx = fabsf(x)-a; if( dx>0.0f ) di+=dx*dx;
    const float dy = fabsf(y)-b; if( dy>0.0f ) di+=dy*dy;
    const float dz = fabsf(z)-c; if( dz>0.0f ) di+=dz*dz;
    return di;
}

static float columna( float x, float y, float z, float mindist, float offx )
{
    const float di0 = distToBox( x, y, z, 0.14f, 1.0f, 0.14f );
    if( di0 > (mindist*mindist) ) return mindist + 1.0f;

    const float y2=y-0.40f;
    const float y3=y-0.35f;
    const float y4=y-1.00f;

    const float di1 = distToBox( x, y , z, 0.10f, 1.00f, 0.10f );
    const float di2 = distToBox( x, y , z, 0.12f, 0.40f, 0.12f );
    const float di3 = distToBox( x, y , z, 0.05f, 0.35f, 0.14f );
    const float di4 = distToBox( x, y , z, 0.14f, 0.35f, 0.05f );
    const float di9 = distToBox( x, y4, z, 0.14f, 0.02f, 0.14f );

    const float di5 = distToBox( (x-y2)*0.7071f, (y2+x)*0.7071f, z, 0.10f*0.7071f, 0.10f*0.7071f, 0.12f );
    const float di6 = distToBox( x, (y2+z)*0.7071f, (z-y2)*0.7071f, 0.12f, 0.10f*0.7071f, 0.1f*0.7071f );
    const float di7 = distToBox( (x-y3)*0.7071f, (y3+x)*0.7071f, z, 0.10f*0.7071f, 0.10f*0.7071f, 0.14f );
    const float di8 = distToBox( x, (y3+z)*0.7071f, (z-y3)*0.7071f, 0.14f, 0.10f*0.7071f, 0.10*0.7071f );

    float di = di1;
    if( di2<di ) di=di2;
    if( di3<di ) di=di3;
    if( di4<di ) di=di4;
    if( di5<di ) di=di5;
    if( di6<di ) di=di6;
    if( di7<di ) di=di7;
    if( di8<di ) di=di8;
    if( di9<di ) di=di9;

    const float fb = fbm(10.1f*x+offx,10.1f*y,10.1f*z);
    if( fb>0.0f )
        di = di + 0.00000003f*fb;

    return di;
}

static __forceinline unsigned int coolfFunc3d3( unsigned int n )
{
    n = (n << 13) ^ n;
    n = (n * (n * n * 15731 + 789221) + 1376312589) & 0x7fffffff;
    return n;
}

static float bicho( float x, float y, float z, float mindist )
{
    x = x-0.64f;
    y = y-0.50f;
    z = z-1.50f;

    float r2 = x*x + y*y + z*z;

    float sa = smoothstep(r2,0.0f,0.5f);
    float fax = 0.75f + 0.25f*sa;
    float fay = 0.80f + 0.20f*sa;

    x *= fax;
    y *= fay;
    z *= fax;

    r2 = x*x + y*y + z*z;

    // bighacks
    #if 1
    if( r2>5.0f ) return mindist;
    if( y >0.5f ) return mindist;
    if( y>-0.20f && (x*x+z*z)>0.60f ) return mindist;
    if( r2>(1.70f+mindist)*(1.70f+mindist) ) return mindist;
    #endif

    const float r = sqrtf(r2);

    if( r<0.75f )
    {
        float a1 = 1.0f-smoothstep( r, 0.0f, 0.75 );
        a1 *= 0.60f;
        const float si1 = sinf(a1);
        const float co1 = cosf(a1);
        float nx = x;
        float ny = y;
        x = nx*co1 - ny*si1;
        y = nx*si1 + ny*co1;
    }

    float mindist2 = 100000.0f;
    const float p[3] = { x, y, z };
    const float rr = 0.05f+sqrtf(x*x+z*z);
    const float ca = (0.5f-0.045f*0.75f) -6.0f*rr*m2xf(-10.0f*rr);
    for( int j=1; j<7; j++ )
    {
        const float an = (6.2831f/7.0f) * (float)j;
        const float aa = an + 0.40f*rr*noise3f(4.0f*rr, 2.5f, an, 0 ) + 0.29f;
        const float rc = cosf(aa);
        const float rs = sinf(aa);
        const float q[3] = { p[0]*rc-p[2]*rs, p[1]+ca, p[0]*rs+p[2]*rc };
        const float dd = q[1]*q[1] + q[2]*q[2];
        if( q[0]>0.0f && q[0]<1.5f && dd<mindist2 ) mindist2=dd;
    }

    const float c = sqrtf(mindist2) - 0.045f;
    const float d = r-0.30f;
    const float a = clamp01( r*3.0f );

    return c*a + d*(1.0f-a);
}

static float techo2( float x, float y )
{
    y = 1.0f - y;
    if( x<0.1f || x>0.9f ) return y;
    x = x - 0.5f;
    return -(sqrtf(x*x+y*y)-0.4f);
}

float dist( float x, float y, float z, int *sid )
{
    float mindist;// = 1e20f;
    float dis;

    //-----------------------
    // floor
    //-----------------------
    dis = y;
    const float ax = 128.0f + (x+z)*6.0f;
    const float az = 128.0f + (x-z)*6.0f;
    const unsigned int ix = mfloorf(ax);
    const unsigned int iz = mfloorf(az);
    const int submat = coolfFunc3d3(ix+53*iz);
    const int ba = ( ((submat>>10)&7)>6 );
    const float peldx = fmodf( ax, 1.0f );
    const float peldz = fmodf( az, 1.0f );
    float peld = peldx; if( peldz>peld) peld=peldz;
    peld = smoothstep( peld, 0.975f, 1.0f );
    if( ba )peld = 1.0f;
    dis += 0.005f*peld;
    mindist = dis;
    if( peld>0.0000001f ) sid[0] = 2; else sid[0] = 0;
    sid[0] = sid[0]+(submat<<8);

    //-----------------------
    // roof
    //-----------------------
    const float fx = fmodf( x+128.0f, 1.0f );
    const float fz = fmodf( z+128.0f, 1.0f );
    if( y>1.0f )
    {
        dis = techo2( fx, y );
        float disz = techo2( fz, y );
        if( disz>dis ) dis=disz;
        if( dis<mindist )
        {
            mindist = dis;
            sid[0] = 5;
        }
    }

    //-----------------------
    // columns
    //-----------------------
    const float fxc = fmodf( x+128.5f, 1.0f ) - 0.5f;
    const float fzc = fmodf( z+128.5f, 1.0f ) - 0.5f;
    dis = columna( fxc, y, fzc, mindist, 13.1f*(int)(x)+17.7f*(int)z );
    if( dis<(mindist*mindist) )
    {
        mindist = sqrtf(dis);
        sid[0] = 1;
    }

    //-----------------------
    // monster
    //-----------------------
    dis = bicho( x, y, z, mindist );
    if( dis<mindist )
    {
        mindist = dis;
        sid[0] = 4;
    }

    return mindist;
}



The last parameter sid is the object/material id of the closest surface.
added on the 2009-08-22 23:15:32 by iq iq
I removed the ten tiles that are randomly spread over the floor so you don't need initialization code. Also, it's ugly code I know...

If you want to match the image you can generate your ray position ro and ray direction rd as in this function, for a given pixel [i,j] in a buffer of resolution [xres,yres] with xres/yres=16/9:

Code:
static void generateRay( float *ro, float *rd, int i, int j, int xres, int yres )
{
    // screen coords
    const float sx = -1.75f + 3.5f*(float)i/(float)xres;
    const float sy = 1.00f - 2.0f*(float)j/(float)yres;

    // bend rays (fish eye)
    const float r2 = sx*sx*0.32f + sy*sy;
    const float tt = (7.0f-sqrtf(37.5f-11.5f*r2))/(r2+1.0f);
    const float dx = sx*tt;
    const float dy = sy*tt;

    // rotate ray (write into the caller's rd instead of redeclaring it)
    rd[0] = dx*0.955336f + 0.29552f;
    rd[1] = dy;
    rd[2] = 0.955336f - dx*0.29552f;

    // normalize
    normalize( rd );

    // ray origin
    ro[0] = 0.195f;
    ro[1] = 0.5f;
    ro[2] = 0.0f;
}



Could you test the rendering time with and without your trick? (try both low 720x405 and high 1920x1080 res...). Thx!
added on the 2009-08-22 23:30:36 by iq iq
iq:
Code:
static float coolfFunc3d2( int n )
{
    n = (n << 13) ^ n;
    n = (n * (n * n * 15731 + 789221) + 1376312589) & 0x7fffffff;
    return (float)n;
}

optimize dammit! :D then we wonder why the rendering is so slow :P.
added on the 2009-08-22 23:36:28 by decipher decipher
oh well nevermind, I now see it's not using float -> int -> float all the time like hugo elias' example.

also are you sure it is correct? that doesn't look like a reinterpret_cast more like a static_cast!?
as in:
Code: return ((float *)&n);

versus
Code: return (float)n;
added on the 2009-08-22 23:39:24 by decipher decipher
fuck... late night...
Code: return *((float *)&n);

instead of the first one.
added on the 2009-08-22 23:40:21 by decipher decipher
Decipher, if you do it that way, you won't have good distributed random numbers...
added on the 2009-08-22 23:46:07 by texel texel
but texel, the return (float)n; doesn't really look correct. seriously, because for n = 300 the value produced will be 2027252649 or 2.02725e+09... this surely doesn't look like it's in the range [-1 : 1] does it?
added on the 2009-08-22 23:54:34 by decipher decipher
Quote:
res*(1.0f/1073741824.0f);
ouch I haven't seen that, my bad :)
added on the 2009-08-22 23:59:02 by decipher decipher
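The range question above can be checked with a small sketch. The & 0x7fffffff mask keeps the hash in [0, 2^31), and 1073741824 is 2^30, so 1.0f - h*(1.0f/1073741824.0f) lands in [-1, 1]. The names hash31 and hash_to_unit are hypothetical, and the hash is written on uint32_t here, an assumption made so that the wrap-around is well defined in C (the posted version uses int).

```c
#include <stdint.h>

/* Same integer hash as coolfFunc3d2, on unsigned 32-bit values so the
   multiplications wrap around in a defined way. Result is below 2^31. */
static uint32_t hash31(uint32_t n)
{
    n = (n << 13) ^ n;
    return (n * (n * n * 15731u + 789221u) + 1376312589u) & 0x7fffffffu;
}

/* Map the 31-bit hash to [-1, 1]: dividing by 2^30 gives [0, 2],
   and subtracting that from 1 gives [-1, 1]. */
static float hash_to_unit(uint32_t n)
{
    return 1.0f - (float)hash31(n) * (1.0f / 1073741824.0f);
}
```

In noise3f the scaling is applied once after the trilinear interpolation rather than per lattice value, which gives the same range because the interpolation weights stay in [0, 1].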
[image]

The top one is rendered directly, the second one with "my trick".

I don't have a 1920 screen, so I tested it at 1420*850 pixels, with these results in milliseconds:

- Without: 57357
- With: 45187

My implementation is very shitty, I think this can be further optimized... I will try when I have some time.

Also, I had to use the non-asm version of pow and floor, since I'm using GNU C...
added on the 2009-08-23 00:30:56 by texel texel
Note that I'm using 6 more distance calcs for the shading, so the difference in raymarching is higher
added on the 2009-08-23 00:32:14 by texel texel
and how do we measure the performance? :) what do the numbers mean?
added on the 2009-08-23 00:32:48 by decipher decipher
okay again, didn't see. ffs I should get some sleep :/
added on the 2009-08-23 00:34:09 by decipher decipher
So it looks like, if you measured only the raymarching without the shading, you would indeed get something like a 25% time saving. Are you running the trick at fullscreen (RAYS initialized with the rays of all pixels) or by tiles? I suppose working with, say, 16x16 px tiles would make step 4 of your algorithm lighter?
added on the 2009-08-23 02:19:41 by iq iq
[image]

Now it is getting MUCH faster...
added on the 2009-08-23 02:50:36 by texel texel
No, iq, I'm not working with tiles now... I'm just testing different approaches, but for now just by scanlines, which are better for memory coherence
added on the 2009-08-23 02:51:23 by texel texel
