pouët.net


phöng shading and clipping?

category: general [glöplog]
sounds neat. have you tried this on dynamic objects? The overhead of calculating the spheres might not be worth it in this case...
added on the 2008-01-28 15:46:21 by raer
If you center the sphere on (0,0,0) in object space, which you usually would, you only need to evaluate x^2+y^2+z^2 once per vertex. That's at least three times faster than a 3D transformation, but only helps if you can cull the object, or it obscures another object.
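A minimal C sketch of that, assuming a rigid object-to-view transform (no scaling) so the radius carries over unchanged. The squared radius of the origin-centred sphere costs one x^2+y^2+z^2 per vertex, and comparing squared distances avoids even the one square root. The `vec3` and `plane` types and the plane convention (unit normal pointing into the visible side, sphere centre already transformed to view space) are assumptions for illustration.

```c
typedef struct { float x, y, z; } vec3;

/* Squared radius of the smallest origin-centred sphere enclosing all
   vertices: one x^2+y^2+z^2 per vertex, no square root. */
float bounding_radius_sq(const vec3 *v, int count)
{
    float r2 = 0.0f;
    for (int i = 0; i < count; ++i) {
        float d2 = v[i].x * v[i].x + v[i].y * v[i].y + v[i].z * v[i].z;
        if (d2 > r2)
            r2 = d2;
    }
    return r2;
}

/* Plane n.p + d = 0, with unit normal n pointing into the visible side. */
typedef struct { vec3 n; float d; } plane;

/* Returns 1 if the sphere (centre c in view space, squared radius r2) lies
   entirely behind at least one plane, i.e. the whole object can be skipped.
   dist < -r is tested as dist < 0 && dist^2 > r^2. */
int sphere_culled(vec3 c, float r2, const plane *planes, int count)
{
    for (int i = 0; i < count; ++i) {
        float dist = planes[i].n.x * c.x + planes[i].n.y * c.y
                   + planes[i].n.z * c.z + planes[i].d;
        if (dist < 0.0f && dist * dist > r2)
            return 1;
    }
    return 0;
}
```

Only the object's origin needs a full transform per frame; if `sphere_culled` returns 1, none of the vertices do.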

If you're generating the object in polar coordinates anyway (fucking spikeballs!), it's even simpler. You might also have a minimum inner and maximum outer radius that you know the dynamic object will stay within, so you only have to generate the object once you already know it's potentially visible.

And of course you can mix and match any way you like for complex scenes. E.g. a translucent object could have an outer radius to recognise when it's being obscured, but can't obscure anything itself. And so on.
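Once the sphere centres are projected, the inner/outer radius idea reduces to a disc-in-disc test plus a depth check. A sketch, assuming the engine has already turned the inner and outer radii into conservative screen-space radii and a depth range per object; the `screen_bounds` struct and its fields are hypothetical.

```c
/* Hypothetical per-object screen-space bounds, assumed conservative:
   inner_px: disc the object is guaranteed to cover completely
             (0 for a translucent object, which never occludes anything);
   outer_px: disc that contains the whole object on screen;
   znear/zfar: view-space depth range of the object's bounding sphere. */
typedef struct {
    int sx, sy;               /* projected centre in pixels */
    int inner_px, outer_px;
    float znear, zfar;
} screen_bounds;

/* Conservative: returns 1 only if b is certainly hidden behind a. */
int hidden_behind(const screen_bounds *a, const screen_bounds *b)
{
    if (a->inner_px <= 0 || b->outer_px > a->inner_px)
        return 0;                  /* a can't occlude, or b's disc is larger */
    if (b->znear < a->zfar)
        return 0;                  /* depth ranges overlap: can't be sure */
    int dx = b->sx - a->sx, dy = b->sy - a->sy;
    int slack = a->inner_px - b->outer_px;   /* >= 0 here */
    /* b's outer disc lies inside a's inner disc iff centres are <= slack apart */
    return dx * dx + dy * dy <= slack * slack;
}
```

A translucent object simply gets `inner_px = 0`: it can still be hidden by others, but never hides anything itself.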

There aren't that many cases where you won't gain anything at all, and I think in most cases you gain a significant deal from this kind of optimisation. Generally a lot more than if you manage to do your triangle edge orientation check with one less multiplication or whatever.
added on the 2008-01-28 17:10:01 by doomdoom
Quote:
lol. and I noticed step 7 is missing :D

That is because step 7 is "???", and step 16 is "Profit!"
I usually prefer when Profit! comes earlier than step 16 ;)
added on the 2008-01-28 17:42:44 by raer
To be honest, I've never seen early back-face culling to be efficient enough to justify big pipeline-complications in software renderers. You'll mostly be busy looking up textures anyway. Occlusion culling to reduce overdraw is however very much worth it.
added on the 2008-01-28 18:33:18 by kusma
Yeah. As ryg pointed out, back-face culling applies to faces, not vertices. So there's a lot of bookkeeping to also eliminate unused vertices, but it's not unrealistic. Whenever you cull a face, make a note of that in an array (by writing a unique frame number, so you won't have to reset the array), then before you project each vertex, check the array to see whether any of the faces using the vertex are still visible. This saves you about half the divisions for the perspective projection. So it depends on whether half a division is more expensive than, what, 5-6 memory accesses (which are probably cache-unfriendly).
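A sketch of that bookkeeping, assuming an object-space back-face test (eye position transformed into object space) and a precomputed vertex-to-face adjacency list. The `mesh` layout and `cull_and_mark` are hypothetical names; the frame stamp means `culled_stamp` never needs clearing, as long as the frame counter starts at 1 with the array zeroed.

```c
#include <stdint.h>

typedef struct { float x, y, z; } vec3;
typedef struct { int v[3]; } tri_t;

/* Hypothetical mesh layout. The faces using vertex v are
   vf_list[vf_start[v]] .. vf_list[vf_start[v + 1] - 1]. */
typedef struct {
    int nverts, nfaces;
    const vec3 *pos;       /* object-space vertex positions */
    const tri_t *tri;      /* vertex indices per face       */
    const vec3 *normal;    /* object-space face normals     */
    const int *vf_start;
    const int *vf_list;
} mesh;

/* Back-face cull in object space, then flag the vertices that still need
   projecting. culled_stamp[f] == frame means "face f culled this frame". */
void cull_and_mark(const mesh *m, vec3 eye, uint32_t frame,
                   uint32_t *culled_stamp, uint8_t *needs_projection)
{
    for (int f = 0; f < m->nfaces; ++f) {
        vec3 p = m->pos[m->tri[f].v[0]], n = m->normal[f];
        /* The face points away if the eye is on or behind its plane. */
        float s = n.x * (eye.x - p.x) + n.y * (eye.y - p.y) + n.z * (eye.z - p.z);
        if (s <= 0.0f)
            culled_stamp[f] = frame;
    }
    for (int v = 0; v < m->nverts; ++v) {
        needs_projection[v] = 0;
        for (int k = m->vf_start[v]; k < m->vf_start[v + 1]; ++k) {
            if (culled_stamp[m->vf_list[k]] != frame) {   /* a face survived */
                needs_projection[v] = 1;
                break;
            }
        }
    }
}
```

The inner loop is where the extra memory accesses per vertex come from: one adjacency lookup plus one stamp read per face touching the vertex.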
added on the 2008-01-28 19:14:09 by doomdoom
More like 10-14... hm... but yeah.
added on the 2008-01-28 19:14:57 by doomdoom
I think the projective divisions are over-hyped as a problem. In our GBA engine, the division takes 15-20 cycles (some clever LUT trickery), and at a total of 2^24 cycles per second and roughly one new vertex per triangle, that's not really something I'm too worried about.
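For scale, a rough budget. The GBA's ARM7TDMI runs at 2^24 Hz (about 16.78 MHz); assuming roughly 60 frames per second and a hypothetical 1000 projected vertices per frame at the upper figure of 20 cycles per division:

```latex
\frac{2^{24}\ \text{cycles/s}}{60\ \text{frames/s}} \approx 279\,620\ \text{cycles/frame},
\qquad
\frac{1000 \times 20\ \text{cycles}}{279\,620\ \text{cycles}} \approx 7.2\,\%
```

So even a fairly busy scene spends only a small slice of the frame on divisions.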

Of course, if you're doing a software renderer for a PC today, it might make sense to use a lot more triangles than I'm using, and then more aggressive culling might pay off. In our software T&L + hardware rasterizer solution at work, we've found early back-face culling to be a really good speed-up, but a bit tricky to get robust. But it's definitely possible.
added on the 2008-01-28 19:32:45 by kusma
On an M68000 though, as in the A500, divisions are very expensive, around 70 cycles IIRC. But then there are division tables too.

Anyway for huge amounts of triangles you really want to look at culling groups anyway. If you group faces into convex patches, making sure no two normals in a patch are more than 180 degrees apart (minus a bit to account for perspective distortion), and then transform, project and cull the border faces of all patches in a first pass, you'd be able to cull the interior parts of patches that are completely hidden, without processing the faces or vertices at all. Just a thought.
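A minimal sketch of the two-pass patch idea, using an object-space back-face test instead of transforming and projecting the border faces. It relies on the post's own assumption that, for a patch built under the normal constraint, all border faces facing away implies the interior faces do too; the shortcut is only as safe as that assumption. The `patch` layout and helper names are hypothetical.

```c
typedef struct { float x, y, z; } vec3;

/* Hypothetical layout: face indices of one convex patch, split into the
   faces on its border and the faces inside it. */
typedef struct {
    const int *border;   int nborder;
    const int *interior; int ninterior;
} patch;

/* Object-space back-face test: n is the face normal, p any point on the face. */
static int back_facing(vec3 n, vec3 p, vec3 eye)
{
    return n.x * (eye.x - p.x) + n.y * (eye.y - p.y) + n.z * (eye.z - p.z) <= 0.0f;
}

/* First pass tests only the border faces; if they all face away, the
   interior faces are marked invisible without testing them.
   Returns the number of interior faces skipped that way. */
int cull_patch(const patch *pt, const vec3 *normal, const vec3 *point,
               vec3 eye, unsigned char *visible)
{
    int all_back = 1;
    for (int i = 0; i < pt->nborder; ++i) {
        int f = pt->border[i];
        visible[f] = (unsigned char)!back_facing(normal[f], point[f], eye);
        if (visible[f])
            all_back = 0;
    }
    for (int i = 0; i < pt->ninterior; ++i) {
        int f = pt->interior[i];
        visible[f] = all_back ? 0 : (unsigned char)!back_facing(normal[f], point[f], eye);
    }
    return all_back ? pt->ninterior : 0;
}
```

In a real engine the skipped case would also skip the interior faces' vertices entirely, which is where the saving comes from.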
added on the 2008-01-28 20:36:14 by doomdoom
Well, it's not like the GBA can divide in hardware either. We just wrote a fast-enough fixed point division routine. The way we're doing it isn't too far away from what an FPU would have done, BTW.
added on the 2008-01-28 21:18:55 by kusma
Seed! Newton! Newton! Tadaa!

Oh, the joy of ARM's relatively fast multipliers :)
added on the 2008-01-28 22:38:31 by ryg
ryg: actually, we're doing a (manually since it's an arm7) clz followed by an interpolated lut-lookup.
added on the 2008-01-28 22:45:52 by kusma
yeah, that's what i meant with seed, but you're not even doing any post-iteration? how... inaccurate :)
added on the 2008-01-28 22:53:58 by ryg
We're not doing any Newton-Raphson iterations. The result we get from the interpolation is only 2 bits off at max. Yes, the LUT is quite big (512 entries), but it's in ROM anyway, so who cares? :)
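A portable C sketch of a reciprocal along those lines: normalise with a software count-leading-zeros, index a 512-entry table with the 9 bits below the leading one, and interpolate linearly with the next 16 bits. The table layout, bit split and the Q2.30 / 2^62 fixed-point formats are assumptions for illustration, not kusma's actual code; the optional Newton-Raphson step is the post-iteration ryg suggested.

```c
#include <stdint.h>

#define LUT_BITS 9
#define LUT_SIZE (1u << LUT_BITS)    /* 512 entries */

/* recip_lut[i] ~= 2^30 / f for f = 0.5 + i/1024 (1/f in Q2.30).
   One extra entry so interpolation at i = 511 has a right neighbour. */
static uint32_t recip_lut[LUT_SIZE + 1];

void recip_init(void)
{
    for (uint32_t i = 0; i <= LUT_SIZE; ++i) {
        uint64_t den = 512 + i;                        /* f * 1024 */
        recip_lut[i] = (uint32_t)(((1ULL << 40) + den / 2) / den);
    }
}

/* Count leading zeros by binary search (the ARM7TDMI has no CLZ). x != 0. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (!(x & 0xFFFF0000u)) { n += 16; x <<= 16; }
    if (!(x & 0xFF000000u)) { n += 8;  x <<= 8;  }
    if (!(x & 0xF0000000u)) { n += 4;  x <<= 4;  }
    if (!(x & 0xC0000000u)) { n += 2;  x <<= 2;  }
    if (!(x & 0x80000000u)) { n += 1; }
    return n;
}

/* Approximates 2^62 / x for x > 0; newton != 0 adds one refinement step. */
uint64_t recip62(uint32_t x, int newton)
{
    int n = clz32(x);
    uint32_t m = x << n;                      /* f = m / 2^32, in [0.5, 1) */
    uint32_t i = (m >> 22) & (LUT_SIZE - 1);  /* 9 bits below the leading one */
    uint32_t t = (m >> 6) & 0xFFFFu;          /* next 16 bits: blend weight */
    uint32_t a = recip_lut[i], b = recip_lut[i + 1];   /* a >= b */
    uint32_t r = a - (uint32_t)(((uint64_t)(a - b) * t) >> 16);  /* 1/f, Q2.30 */

    if (newton) {
        /* r' = r * (2 - f*r). m * r is f*r in Q2.62; 2.0 in Q2.62 is 2^63. */
        uint64_t e = (1ULL << 63) - (uint64_t)m * r;
        r = (uint32_t)(((uint64_t)r * (e >> 32)) >> 30);
    }
    /* 2^62 / x = 2^62 / (f * 2^(32-n)) = (2^30 / f) * 2^n */
    return (uint64_t)r << n;
}
```

Without the Newton step the relative error of this sketch is on the order of 1e-6, dominated by the linear interpolation; with it, fixed-point truncation dominates. A division then becomes a multiply by the reciprocal and a shift, with care taken over the width of the product.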

Here's some pouetization as requested: [image]
added on the 2008-01-28 23:03:41 by kusma
What a beautiful ball! But I can only see about half of the faces!
added on the 2008-01-28 23:12:15 by doomdoom
clz is a nice instruction for all sorts of things. sad the gba doesn't have it...

while we're at it:

kusma: have you ever played around with that Xport FPGA-thingy? Would be nice for a GBA accelerator...
And do you have any documentation on the GBA wireless link "cable"? I failed to find something about it on the net.

added on the 2008-01-28 23:52:40 by raer
rarefluid: it's still doable for most plain math stuff in around 10 cycles using two levels of binary search and a LUT for the last 8 bits. But of course, the ARM9's single-cycle version makes it possible to use in inner loops as well.
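A portable C sketch of the two levels of binary search followed by a 256-entry table for the last 8 bits. The table is built at start-up here (on the GBA it would live in ROM); whether it reaches the quoted ~10 cycles depends on the compiler and on ARM vs. Thumb code.

```c
#include <stdint.h>

/* Leading zeros of an 8-bit value; clz8_lut[0] = 8. */
static uint8_t clz8_lut[256];

void clz_init(void)
{
    for (int v = 0; v < 256; ++v) {
        int n = 8;
        for (int b = v; b; b >>= 1)   /* one step per significant bit */
            --n;
        clz8_lut[v] = (uint8_t)n;
    }
}

/* Count leading zeros of a 32-bit value (returns 32 for 0):
   two binary-search steps narrow it to the top byte, the LUT does the rest. */
int clz32(uint32_t x)
{
    int n = 0;
    if (!(x & 0xFFFF0000u)) { n = 16; x <<= 16; }
    if (!(x & 0xFF000000u)) { n += 8; x <<= 8; }
    return n + clz8_lut[x >> 24];
}
```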
added on the 2008-01-28 23:59:30 by kusma
IMHO extending the GBA sort of kills the purpose. Could be fun though.
But I would never do it before I have a good engine. I'm not extending my hardware before I use what I've got optimally :P
Oh, and no, I haven't been playing around with the Xport, but I have been playing a bit with Spookysys' GBA-FPGA-cart. It's quite fun. Over a couple of weekends we made a wifi-flasher for the Nintendo DS (boot a DS cart and flash + boot to GBA mode over wifi). I'm not really into hardware design though, but spooky is. After all, it's his job ;)
added on the 2008-01-29 00:03:09 by kusma
Graga: There's Manticore, which seems quite dead. But there may be no need for VGA output if you can read back the framebuffer.

I don't know if all the transactions you'd need to do to/from the FPGA would slow everything down though... But it may be useful as a powerful FPU/3D accelerator.

kusma: Are you or spooky coming to BP? I'd love to take a look at your toys :)
added on the 2008-01-29 00:12:03 by raer
Did it own the DS in polyfilling?
Dunno. The page says it runs at 50MHz, while the DS runs at 66 and has the extra ARM7 at 33MHz afaik. If it has the pipeline implemented in "hardware" it might be faster than the DS...
added on the 2008-01-29 00:30:43 by raer
rarefluid: I'm coming for sure, but I'm not the one who owns the FPGA board. Spooky said he's interested in going, but he wasn't entirely sure he would. I'll let him know you're interested in checking his project out, that might motivate him to go! ;)
added on the 2008-01-29 01:03:19 by kusma
Rarefluid: I don't need a 3d engine, I have my own which shares some of the customizations that Kusma does, but it's not done yet. The polyfiller badly needs to be redone. Actually, the whole drawing routine could use a touch-up, but it's mostly the polyfiller overhead that is eating frames at the moment. Maybe I should make a LUT with values for small triangles (area < 16), but something tells me it would take a lot of space.
kusma: Tell him. I'm quite new to FPGAs, but the idea of a GBA FPU or similar just thrills me :) Imagine sending 9 fp numbers and a short to an address and getting a filled polygon...
I might even bring beer. :)

btw. I was planning on doing a GBA engine, but I still don't have much time atm (which I obviously spend on doing all those video captures). Man, my days need more hours...

Do you guys use any sort of c-buffers, s-buffer, free span-buffers, z-buffers or <FANCY_TECHNIQUE_HERE>? Or is it painters algorithm for you?

I believe if you do stuff per scanline you can get away with c-buffering, which is cheap but can have artifacts.
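A per-scanline free-span buffer (one of the techniques listed above) can be sketched as a sorted list of still-uncovered intervals, with spans drawn front to back: each new span is clipped against the list, only the uncovered pieces are drawn, and those pieces are removed. The 240-pixel width matches the GBA screen; the struct layout and function names are hypothetical.

```c
#define SCREEN_W 240                  /* GBA screen width */
#define MAX_FREE ((SCREEN_W + 1) / 2) /* upper bound on free spans per line */

/* Still-uncovered pixels of one scanline as sorted, disjoint [x0, x1) spans.
   Spans are inserted front to back. */
typedef struct { int n; int x0[MAX_FREE], x1[MAX_FREE]; } free_spans;

void fs_init(free_spans *s)
{
    s->n = 1;
    s->x0[0] = 0;
    s->x1[0] = SCREEN_W;
}

/* Clips [a, b) against the free spans, writes the visible pieces to
   vis0/vis1 (room for MAX_FREE entries), removes them from the free list
   and returns how many pieces there were (0 = span completely hidden). */
int fs_insert(free_spans *s, int a, int b, int *vis0, int *vis1)
{
    free_spans out;
    int nvis = 0;
    out.n = 0;
    for (int i = 0; i < s->n; ++i) {
        int f0 = s->x0[i], f1 = s->x1[i];
        int lo = a > f0 ? a : f0, hi = b < f1 ? b : f1;
        if (lo >= hi) {                       /* no overlap: keep as is */
            out.x0[out.n] = f0; out.x1[out.n++] = f1;
            continue;
        }
        vis0[nvis] = lo; vis1[nvis++] = hi;   /* this part gets drawn */
        if (f0 < lo) { out.x0[out.n] = f0; out.x1[out.n++] = lo; }
        if (hi < f1) { out.x0[out.n] = hi; out.x1[out.n++] = f1; }
    }
    *s = out;
    return nvis;
}
```

Free spans are disjoint and separated by at least one covered pixel, so a 240-pixel line never holds more than 120 of them; that bound sizes the arrays. Once a line's `n` drops to 0, everything further back on it can be skipped.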

And: Anyone tried filling blocks of polygons using tilemaps?
added on the 2008-01-29 09:51:06 by raer
