## Fast Plasma Effect

**category:**general [glöplog]

Hi !

I have written some bad C# code using SDL.NET. I want to make a fast plasma effect. It's meant for an animated background, so it must be quick :).

Can someone give me some faster ways to edit a surface in SDL? Or see for yourself:

**Code:**

```
public virtual void UpdateEffect()
{
    int w = surface.Width;
    int h = surface.Height;
    if (cls == null)
    {
        cls = new int[w, h];
        for (int x = 0; x < w; x++)
        {
            for (int y = 0; y < h; y++)
            {
                cls[x, y] = (int)(
                    (127.5 + +(127.5 * Math.Sin(x / 32.0)))
                    + (127.5 + +(127.5 * Math.Sin(y / 32.0)))
                    + (127.5 + +(127.5 * Math.Sin(Math.Sqrt((x * x + y * y)) / 32.0)))
                ) / 3;
            }
        }
    }
    Color[,] colors = new Color[w, h];
    paletteShift = Convert.ToInt32(Environment.TickCount / 1000);
    for (int x = 0; x < w; x++)
    {
        for (int y = 0; y < h; y++)
        {
            colors[x, y] = palette[(cls[x, y] + paletteShift) % 255];
        }
    }
    surface.SetPixels(new Point(0, 0), colors);
}
```

SDL is slow by concept - if you're using C# and .NET, why don't you just go DirectDraw? would save the hassle - at least most of it... (sure, it's deprecated, but only because everything is accelerated now)

Exploring sw rendering can be a nice thing.

Some quick random hints looking at the code:

1) Always try to use look-up tables when it comes to computationally heavy stuff like sin, cos, tan, sqrt and friends (this should give you a good speed-up...)

2) Using integer math / fixed-point math can still improve speed a lot, especially when math precision isn't a must

3)

- multiply by (1/x) instead of dividing by x (if x is constant)

- when doing modulus operations where the divisor is a power of two (x % 2^n) (in your code it should be "... % 25**6**") you can replace it with a faster bitwise AND, "x & ((2^n) - 1)", i.e. x % 256 <-equals-> x & (256 - 1), as long as x is not negative.

- when dividing/multiplying by powers of two (integers) you can use bit shifting (i.e.: x * 32 <--> x << 5). Compilers probably already do these last two kinds of optimizations.

(bitwise operations)

Also found this site, maybe you'll find it useful: http://student.kuleuven.be/~m0216922/CG/index.html.
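A minimal sketch of the bitwise hints above, in C#. The class and method names are made up for illustration. One caveat worth keeping in mind: the AND and right-shift forms only match `%` and `/` for non-negative values, because C#'s `%` keeps the sign of the left operand and `/` rounds toward zero.

```csharp
using System;

static class BitTricks
{
    // x % 256 for non-negative x: 256 is 2^8, so the mask is 256 - 1 = 255.
    public static int Mod256(int x) => x & 255;

    // x * 32 as a shift: 32 is 2^5.
    public static int Times32(int x) => x << 5;

    // x / 32 as a shift; only equal to x / 32 for non-negative x.
    public static int Div32(int x) => x >> 5;
}
```

For example, `BitTricks.Mod256(300)` gives the same result as `300 % 256`, while `-1 % 256` is `-1` and `-1 & 255` is `255`.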

actually, cls doesn't seem to be time-dependent at all, so just compute it once (you don't even need sine tables for that, woohoo!).

and using an actual paletted image format and actual palette rotation instead of doing a paletted->truecolor conversion by hand in managed code each frame (it should be either %256 or &255, by the way) should make it run blazingly fast without any real work :)
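To illustrate the palette rotation idea: the per-pixel indices stay fixed and only the 256-entry palette is rotated each frame, so the per-frame work is 256 writes instead of one per pixel. This is only a sketch with plain arrays. How an 8-bit surface and its palette are handed to SDL.NET is not shown, the colors are assumed to be packed ints, and `PaletteRotator` is a made-up name.

```csharp
using System;

static class PaletteRotator
{
    // Rotates the palette left by 'shift' entries into 'rotated'.
    // Both arrays must have 256 entries; colors are packed ints (an assumption).
    public static void Rotate(int[] palette, int[] rotated, int shift)
    {
        for (int i = 0; i < 256; i++)
        {
            // & 255 wraps the index into 0..255, also for negative shifts.
            rotated[i] = palette[(i + shift) & 255];
        }
    }
}
```

Calling `Rotate(palette, rotated, paletteShift)` once per frame and then applying `rotated` as the surface palette would replace the whole per-pixel loop.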

ryg: smashing your lcd gives an even cooler plasma-effect in no-time!

IF the surface width is a power of 2 and the total pixel count is a multiple of 64 (the loop below handles 4 x 16 pixels per pass)

IF your buffers are in user memory

then try this

change your loops from this

buf = new int[w, h]

for(int x = 0; x < w; x++)

{

for(int y = 0; y < h; y++)

{

buf[x, y] = code here

...

to something like this

int Log2(int val)

{

ASSERT(val > 0)

int res = -1;

while((1 << ++res) <= val);

return(res - 1);

}

mask = w - 1 //w must be a power of 2 remember

bitshift = Log2(w)

pixelcount = w * h

//initialize buffer like this

buf = new int[pixelcount]

for(int j = 0; j < pixelcount; j += 48)

{

for(int i = 0; i < 16; ++i, ++j)

{

// i is never used

//if you need x and y, here is how to compute them

x = j & mask

y = j >> bitshift

buf[j] = palette[(cls[x,y] + paletteShift)%255];

x = (j+16) & mask

y = (j+16) >> bitshift

buf[j+16] = palette[(cls[x,y] + paletteShift)%255];

x = (j+32) & mask

y = (j+32) >> bitshift

buf[j+32] = palette[(cls[x,y] + paletteShift)%255];

x = (j+48) & mask

y = (j+48) >> bitshift

buf[j+48] = palette[(cls[x,y] + paletteShift)%255];

}

}

This memory access method gave me a better speed increase than anything else (in C).

The moral here is: don't access big buffers in user memory sequentially.

Damn, I can't post code correctly.

I use the magic number 16 because 16 ints = 16 x 4 bytes = 64 bytes = cache line size of my CPU.

But I'm not sure this is the reason for the speed increase.
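For reference, here is a small C# sketch of just the index arithmetic from the post above: one flat buffer, with x and y recovered from the flat index by a mask and a shift. It deliberately leaves out the 16-element interleaving and makes no claim about speed. `FlatBuffer` and the `pixel` callback are hypothetical names used only for illustration, and the callback would be inlined in real per-frame code.

```csharp
using System;

static class FlatBuffer
{
    // Returns log2(w) when w is a power of two, e.g. 256 -> 8.
    public static int Log2(int w)
    {
        if (w <= 0 || (w & (w - 1)) != 0)
            throw new ArgumentException("w must be a positive power of two");
        int shift = 0;
        while ((1 << shift) < w) shift++;
        return shift;
    }

    // Fills a row-major buffer of w * h pixels, recovering x and y from the flat index.
    public static int[] Fill(int w, int h, Func<int, int, int> pixel)
    {
        int mask = w - 1;
        int shift = Log2(w);
        int[] buf = new int[w * h];
        for (int j = 0; j < buf.Length; j++)
        {
            int x = j & mask;   // same as j % w
            int y = j >> shift; // same as j / w
            buf[j] = pixel(x, y);
        }
        return buf;
    }
}
```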

For the first program: in SDL you can use the function from the documentation to edit the surface.

But it's very slow.

I doubt that using sin tables would speed up anything, only the opposite. At least, lookup tables slow down shaders.

Allocating memory in a time-critical function called every frame isn't a good idea...

And can't SDL automagically convert to truecolor if you specify a 8-Bit surface with palette? I vaguely remember I saw some functions that did that. They might be optimized already, so at least that should work a bit quicker.

isn't cls allocated once already? and aren't the sin tables already calculated only one time? and couldn't you avoid these shifts and ands by using proper loops? you can predict it by using multiples of 64 as texture size. sorry, kidding, but it seems that this one can be improved a lot.. thinking about 8-bit mode and palette cycling, or making the palette double sized (and duplicated) so you can avoid the and (%255) at each pixel.. there are more possibilities, i suppose.. ;) maybe i'm wrong..
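The doubled palette idea can be sketched like this in C#. It assumes a 256-entry palette of packed int colors and that both the pixel index and the shift are already in 0..255, so their sum never exceeds 510. `DoubledPalette` is a made-up name.

```csharp
using System;

static class DoubledPalette
{
    // Builds a 512-entry table holding the 256-entry palette twice.
    public static int[] Build(int[] palette)
    {
        int[] doubled = new int[512];
        for (int i = 0; i < 512; i++)
        {
            doubled[i] = palette[i & 255];
        }
        return doubled;
    }

    // index and shift must each be in 0..255, so index + shift is at most 510.
    public static int Lookup(int[] doubled, int index, int shift)
    {
        return doubled[index + shift];
    }
}
```

The shift still has to be wrapped, but only once per frame (for example `paletteShift & 255`) instead of once per pixel.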

**Quote:**

I doubt that using sin tables would speed up anything, only the opposite. At least, lookup tables slow down shaders.

Of course they would. I once tried using Math.Sin per pixel instead of lookup tables for my regular plasmas, and it was 3 times slower. Well, that could be different for shaders, I guess.

p.s. Hmm, a plasma with sqrt? Never tried this equation, I wonder what kind of shape it shows!
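A sine lookup table of the kind discussed here could look like the sketch below. It uses an integer angle where 256 steps make one full turn, which is an assumption of this example and not what the original code does: the original `Math.Sin(x / 32.0)` has a period of about 201 pixels, so swapping the table in changes the pattern slightly. `SinTable` is a made-up name.

```csharp
using System;

static class SinTable
{
    // 256 entries covering one full period; values are scaled to 0..255.
    public static readonly int[] Table = Build();

    static int[] Build()
    {
        int[] t = new int[256];
        for (int i = 0; i < 256; i++)
        {
            t[i] = (int)(127.5 + 127.5 * Math.Sin(i * 2.0 * Math.PI / 256.0));
        }
        return t;
    }

    // 'phase' is an integer angle where 256 steps make one full turn.
    public static int Sin(int phase) => Table[phase & 255];
}
```

With this, a term like `127.5 + 127.5 * Math.Sin(...)` becomes a single array read such as `SinTable.Sin(x)`, and the rest of the sum can stay in integer math.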

but these "new"s hurt, are they really required here? it's something which is always (or at least mostly) a bad idea.

sqrt... well, i guess it's fast on pc, lol ;)

the sin(sqrt(x*x+y*y)) shows circular ripples..

it's amazing that a plasma with sdl (or directx) is programmable in 30 seconds, flat. when i did my first plasma in '96 (or was it 95?) it cost me days doing all those luts and palette calcs in assembler. </oldfart>

1. don't re-allocate colors[][] in every frame, just make it a class member variable like cls[][].

2. replace %255 by &255, because %255 is wrong and &255 is faster -- or eliminate it altogether, as mad suggested

3. don't use managed code :)
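Points 1 and 2 together might look roughly like this. It is a sketch, not a drop-in replacement: it uses flat int arrays and packed int colors instead of SDL.NET's `Color[,]`, and `PlasmaFrame` is a made-up name. The reason `% 255` is wrong for a 256-entry palette is that it only yields 0..254, so entry 255 is never used and the cycle wraps one step early.

```csharp
using System;

class PlasmaFrame
{
    readonly int[] cls;      // per-pixel palette indices, computed once
    readonly int[] palette;  // 256 packed colors
    readonly int[] colors;   // output buffer, allocated once and reused

    public PlasmaFrame(int[] cls, int[] palette)
    {
        this.cls = cls;
        this.palette = palette;
        this.colors = new int[cls.Length];
    }

    // Recolors every pixel for the given shift without allocating.
    public int[] Update(int paletteShift)
    {
        for (int i = 0; i < cls.Length; i++)
        {
            colors[i] = palette[(cls[i] + paletteShift) & 255];
        }
        return colors;
    }
}
```

Every call to `Update` returns the same array instance, so nothing is handed to the garbage collector per frame.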

people talking about bit manipulation to speed up a plasma effect on pc in 2007. who's trolling now?

thnx for the reactions

it's not a big thing (perhaps even optimized away by the compiler) but you also don't have to do new Point(0, 0) every frame, just store it as a class member... and you could do the "if (cls == null)"-thing in some function which is executed before the first drawing (one check less).