"Interference" oldschool effect - what's the trick?
category: code [glöplog]
Hi guys. A question for coders who have experience in programming classic amiga-effects on DOS PC.
In some old DOS demos from the early 90s, you can see the classic "interference" effect.
And it runs smoothly on such old machines as 80286, and I guess that it was not without "magic" 🙂 After all, on such hardware it is impossible to render a full-screen effect with nice fps, and without hack. And all old DOS-demos used VGA X-mode (for hardware switching of video pages, scrolling, etc.), instead of the usual video-buffer in RAM (as in later demos).
(example: Wish/Majic 12)
I have a hunch that for high-speed optimization of interference effect, the feature of planar organization of bit-plans in the A0000h segment (in x-mode) is somehow used. It's just intuition. But how exactly this can be used, I have not yet understood.
Can someone clarify what the trick is? 🙂
In some old DOS demos from the early 90s, you can see the classic "interference" effect.
And it runs smoothly on such old machines as 80286, and I guess that it was not without "magic" 🙂 After all, on such hardware it is impossible to render a full-screen effect with nice fps, and without hack. And all old DOS-demos used VGA X-mode (for hardware switching of video pages, scrolling, etc.), instead of the usual video-buffer in RAM (as in later demos).
(example: Wish/Majic 12)
I have a hunch that for high-speed optimization of interference effect, the feature of planar organization of bit-plans in the A0000h segment (in x-mode) is somehow used. It's just intuition. But how exactly this can be used, I have not yet understood.
Can someone clarify what the trick is? 🙂
Another version: a 2-bit video mode (05h) is used, and this allows a frame to be rendered much faster using only a 16000 byte framebuffer. Or this too wild?
Well, in the end, but how is it done on the Amiga OCS/ECS? :)
Well, in the end, but how is it done on the Amiga OCS/ECS? :)
On Amiga and Atari, the common approach would be to draw each layer on a seperate bitplane, which effectively gives you 2 layers and thus 4 colors (including background color) to use to present the effect.
On the Amiga you don't draw anything, set 5 registers, and be done with it. You then have the entire CPU and blitter left for cracking jokes.
it's not magic, but maybe you shouldn't think, that there is only 13h .. that should be enough, because where else is the fun ;)
Quote:
it's not magic, but maybe you shouldn't think, that there is only 13h .. that should be enough, because where else is the fun ;)
128000 byte read from memory + 64000 XOR operations + 64000 writes in memory?
I think it will be too slow on PC 80286 (even if we use STOSW). Considering that even a simple byte-by-byte frame rendering in mode 13h is too slow for this hardware. Therefore, in games, smooth scrolling of the screen is usually not found. The exceptions were a few games using mode-x VGA hardware scrolling, and this was a cool trick for IBM PC.
to keep things very dumb and simple
https://github.com/visualizersdotnl/hot-stuff-src/blob/master/code/demo.cpp
line 593
its like you would do one of those fake multi-tunnels - you simply address 2 buffers larger than your dest buffer (or calc the same pixel right there and there) and check which one has priority for that pixel (simple case: white or black?)
go from there, it's really easy to improve that to suit the hw you have
https://github.com/visualizersdotnl/hot-stuff-src/blob/master/code/demo.cpp
line 593
its like you would do one of those fake multi-tunnels - you simply address 2 buffers larger than your dest buffer (or calc the same pixel right there and there) and check which one has priority for that pixel (simple case: white or black?)
go from there, it's really easy to improve that to suit the hw you have
on 486 etc play with palette and whatnot too :)
Quote:
And all old DOS-demos used VGA X-mode (for hardware switching of video pages, scrolling, etc.), instead of the usual video-buffer in RAM (as in later demos).
Yeah this is not necessarily true at all :) It wasn't that binary/set in stone.
About Wish/Majic 12 just because it says min. requirements is 286 doesn't mean it was running smoothly on it. Everyone had 386 or even 486 in 1993 already. One common trick was to obviously write 2 (on 16bit CPU) or 4 aligned pixels at once (on 32bit CPU). For look-up you could also unpack even more pixels at once from a 1bit table with just concentric circles, do two look-ups to create interference and few shifts to unpack and combine and prepare data for 4 or 8 pixels and then set those pixels just using multiple "mov" operation (no "rep stosd" or "rep movsd" needed, but actually it was good idea to interleave memory writes with ALU operations). That if you wanted to do it nice&fast, but as far as I remember even more advanced look-up operations were already working fast on 386 e.g. for tunnels, so in 1993 you could probably just do whatever :P
Errata: I see people commenting it was indeed fast on XT - which is maybe a typo? cause 286 was actually AT not XT and XT is really old for 1993, but nevertheless, so could have been indeed more tricks then (nfo says it's just 13h though and judging from flash effect near the end it seems it's actually palletized mode).
Multiple tunnels (even 8-bit pixels) fast on a 386? Rethink that move son, you needed to optimize a fair but.
But that's the fun part. Or was, I've not coded a software effect on an actual 486 since.. OK last Friday, but not a serious one since 1997 or so (after that all went VESA and machines became faster and ModeX became more or less obsolete).
But if he just wants to know *how* the problem is *solved* why bother with all the details guys, trying to show off or something?
If someone knows how it's done the interesting puzzle is how to map it on your hardware with good performance and the look you want.
But that's the fun part. Or was, I've not coded a software effect on an actual 486 since.. OK last Friday, but not a serious one since 1997 or so (after that all went VESA and machines became faster and ModeX became more or less obsolete).
But if he just wants to know *how* the problem is *solved* why bother with all the details guys, trying to show off or something?
If someone knows how it's done the interesting puzzle is how to map it on your hardware with good performance and the look you want.
but->bit
Quote:
but actually it was good idea to interleave memory writes with ALU operations
yes, i think ALU (latches, BitMask) are used.
Moreover, all these tricks are used in a similar effect from "Second Reality"
https://github.com/fabiensanglard/SecondReality/blob/master/TECHNO/KOEA.ASM
(I did not understand its code in detail, but code contains "mov dx, 3c4h", that specify of witchcraft with VGA sequencer).
Details are important here. To put it simpler: my take is on 386 it could be as well 13h with optimized look-ups/packed-pixel writing. But since people report in comments it was working fast on AT/XT, maybe it was indeed using x-mode and then "rep movs*" to draw circular patterns with an (x,y) offset in each bit-plane (some info about the technique here). What's your take?
In this KOEA.ASM (SecondReality):
That's selecting a plane using 3c4h map mask register (al = 2).
My understanding is you still have to fill-up the plane using "rep movsw/movsd" every frame.
This code seem to be good example with comments. So there is no way to do hardware scrolling of each plane independently like you can do with the whole frame. But.. it's enough for interference effect, since you only have to blit shifted pattern (bitmap with an offset).
Code:
mov dx,3c4h
mov ax,0002h+pl*100h
out dx,ax
That's selecting a plane using 3c4h map mask register (al = 2).
My understanding is you still have to fill-up the plane using "rep movsw/movsd" every frame.
This code seem to be good example with comments. So there is no way to do hardware scrolling of each plane independently like you can do with the whole frame. But.. it's enough for interference effect, since you only have to blit shifted pattern (bitmap with an offset).
did anyone forgot about glorious 16-color modes and their 4-bitplane nature? :) i.e., you can set mode 0xD (320x200 16 colors), set appropriate palette (don't forget to remap attribute controller to linear palette mapping, for (i=0;i<16;i++){outp(0x3c0, i); outp(0x3c0, i);}), getting 16 colors out of 256k VGA palette.
After all those preps, all you need then is calculate two offsets for patterns, then simply copy them to two VGA bitplanes (since patterns are 1bpp, you could also store 8 preshifted copies in memory to save cycles on scrolling). in case of 320x200, ((320x200/8)* 2 * 60) ~= 1.1MB/sec, which is pretty doable for 16bit ISA VGA (not Trident :) and average 386.
After all those preps, all you need then is calculate two offsets for patterns, then simply copy them to two VGA bitplanes (since patterns are 1bpp, you could also store 8 preshifted copies in memory to save cycles on scrolling). in case of 320x200, ((320x200/8)* 2 * 60) ~= 1.1MB/sec, which is pretty doable for 16bit ISA VGA (not Trident :) and average 386.
(typo: I mean 70 frames per second, not 60, although you can tweak CRTC to reduce refresh rate, as I often did before :)
Quote:
did anyone forgot about glorious 16-color modes and their 4-bitplane nature? :)
what I hinted at above, now you've taken away the fun of figuring it out for himself ...
I like that rat.
trick is basically xor-operations per pixel or per byte. im not familiar with the amiga hardware, but my guess is that it is an xor-gate maybe doing some things in the copper/blitter or whatever.
On Amiga this effect is in the DNA of the hardware. There is a pointer array which lets you set where each bitplane should be read from in chip memory. This lets you scroll on 8-pixel granularity on X and 1 pixel granularity on Y. Additionally there are two scrollregisters that lets you scroll on less than 8-pixel granularity, one for odd and another for even bitplanes. On AGA you can even scroll on 1/4 pixel granularity in X for that extra pretentious interference effect.
You put a circle bit pattern into plane0 ptr and another in plane1 ptr and set the colors like this
0 = Black
1 = Green
2 = Red
3 = Interference color 1+2 (f.ex red+green)
Plane0 pixel bits: 01100001100
Plane1 pixel bits: 11001100110
Result pixel bits: GIGBRRBGIRB
You put a circle bit pattern into plane0 ptr and another in plane1 ptr and set the colors like this
0 = Black
1 = Green
2 = Red
3 = Interference color 1+2 (f.ex red+green)
Plane0 pixel bits: 01100001100
Plane1 pixel bits: 11001100110
Result pixel bits: GIGBRRBGIRB
Quote:
Errata: I see people commenting it was indeed fast on XT - which is maybe a typo?
I'm guessing you're referring to my comment on Wish, and I did indeed run it on an XT (9.54MHz V20 to be exact) and not an AT.
Quote:
I did indeed run it on an XT (9.54MHz V20 to be exact) and not an AT.
Yeah that's impressive. So definitely not 13h then. As wbc\\bz7 explained and the links I was sending from Michael Abrash's book and SecondReality code, the smallest 0Dh bit-plane is the way to go.
I just remember people moved to LFBs later and techniques like computing and writing few aligned pixels at once.. but that would work on 386 DX or 486+. My first PC was actually 386 DX-40, so it was already blazingly fast comparing to 286 and XT - that's why I suspect you could in fact pull off an interference effect in 13h. But I have no way to verify this statement today ;)
I remember Comanche Maximum Overkill came in 1992 and it was running smoothly on my 386-DX, definitely 256 colors, I guess using 320x240, which is a bit more demanding than 0Dh (16color 320x200).
Since the smart guys are here ... Let's take a look at another classic effect that is also present in Wish / Majic 12.
called "Kefrens Bar" (Vertical Raster Bars, Vertical Cooper Bars).
As far as I understand, it was created like this:
1. Draw several pixels into the first line of video memory segment (A0000h if it is PC in VGA mode).
2. We are waiting for the horizontal retrace (end of the screen line drawing by the videocard)
3. Draw a few pixels again on the first line
video memory segment (offset to the left or right if we draw waves).
4. (?) We force the video adapter to read the video memory again from the starting position (from the first line). Or we change the offset for video memory (in RAM) so that the video adapter reads the first line again.
Or, initially we define the size of the video buffer in one line (if the hardware allows it).
The 4th point is needed in order not to copy the contents of the previous line to the current line, and to ensure the minimum load on the CPU.
So here's the question. Is it possible on a PC (in 13h mode) to do something from the 4th point? Since this effect is available in the "Wish" demo, it is possible :)
PS: if you do not understand my explanation, watch this https://www.youtube.com/watch?v=PaHHMaLTNfA
called "Kefrens Bar" (Vertical Raster Bars, Vertical Cooper Bars).
As far as I understand, it was created like this:
1. Draw several pixels into the first line of video memory segment (A0000h if it is PC in VGA mode).
2. We are waiting for the horizontal retrace (end of the screen line drawing by the videocard)
3. Draw a few pixels again on the first line
video memory segment (offset to the left or right if we draw waves).
4. (?) We force the video adapter to read the video memory again from the starting position (from the first line). Or we change the offset for video memory (in RAM) so that the video adapter reads the first line again.
Or, initially we define the size of the video buffer in one line (if the hardware allows it).
The 4th point is needed in order not to copy the contents of the previous line to the current line, and to ensure the minimum load on the CPU.
So here's the question. Is it possible on a PC (in 13h mode) to do something from the 4th point? Since this effect is available in the "Wish" demo, it is possible :)
PS: if you do not understand my explanation, watch this https://www.youtube.com/watch?v=PaHHMaLTNfA