Fast/modern platforms that allow "bare-metal" style coding?
category: code [glöplog]
What are the fastest/newest platforms which can still be easily "bare-metal"-style coded for?
Bonus if it's also a fixed hardware so you don't have to take into account too many different configurations.
With modern OSes and systems you never have that full control, where you know that every cycle is the same as last time you ran this program. Instead some peripheral drive might need some processing time, or some deep-level OS stuff gets higher priority or...
That was never an issue with the C64 or any of the other 8-bitters, and I started thinking - which were the last platforms where you could get 100% control of the system, and know that your and ONLY your code gets those precious CPU cycles?
Amiga/Atari 16-bit era still allowed this. Though I've never done any coding for these myself, I think that even on the accelerated AGA and Falcon machines you still switch out the OS completely, or?
Still, with those you started to see the issues with people having different configs, some had 030@28MHz, some had 060@50MHz, different mem sizes etc.
Going even more modern, I actually recall trying to write bare-metal ARM-assembler for the original Raspberry Pi. Think I managed to put some pixel in a framebuffer but not much more...
Now, that gets us into the "easily" qualifier I put in the original statement, because of course it's even possible to go "raw" on a modern PC, but good luck getting graphics and sound going without any OS with drivers!
depending on what you call "modern":
RPi, NDS, Wii, PS2, 3DS and Switch and maybe Wii U (for these three you can sorta-kinda install custom firmware, but it's not always a good idea to completely hand-code *that*), PSVita and PS3 maaaybe??
also depends on your interpretation of "easily". for the 3DS, Switch and Wii U you need some hacks to kill the OS and replace it with your own, and the GPUs are nontrivial enough that anything other than a framebuffer will be basically as hard as a PC GPU driver. if you're lucky enough to get to a framebuffer ofc.
NDS, Wii, PS2 code all run on bare-metal, maybe PS3, PSVita (and PSP too?) can do that as well, but I don't know much about these platforms
RPi is the most modern and you can go completely bare metal there. I know people who are doing just that.
PS2 is pure metal bliss, but it's a lot of work to pull something off. No stupid OS in sight anywhere (except for a few calls to establish an interrupt handler or thread - but there isn't even a memory allocator), and you have at least three CPUs to work with.
Amiga AGA isn't as exciting for bare metal coding as one might think. The OS isn't a collection of syscalls, but a bunch of libraries and threads, and all file I/O is asynchronous under the hood, always. So if you disable multitasking, you cannot load from some random harddisk with some random filesystem anymore. Also you don't have access to all memory, as the OS places your code at arbitrary addresses. So to go fully bare metal on the Amiga, you better write a trackloader yourself and make OCS trackmos. Or you put everything into a really big binary and don't plan on returning to the OS. I recommend Amiga OCS for getting 100% control of the system. It's relatively easy, almost instant gratification included.
@bifat: Is that bare metal coding approach on the RPi interesting for sizecoding (<= 256 Bytes) or is the overhead too big ? Any Hello-World-Put-A-Pixel-Kind-Of-Thing somewhere to look into ?
@Kuemmel, I don't know the RPi platform myself and how much boilerplate is included for getting something up. Insane/TSCC can probably answer that.
Quote:
@bifat: Is that bare metal coding approach on the RPi interesting for sizecoding (<= 256 Bytes) or is the overhead too big ? Any Hello-World-Put-A-Pixel-Kind-Of-Thing somewhere to look into ?
Look here http://www.sizecoding.org/wiki/RISC_OS_on_ARM_based_CPUs
BTW I just noticed, that the Wiki got a lot of new content for non-x86 platforms!
I think there was a thread here on Pouet discussing the RISC OS on RPi thing. The code examples look very clean and nice, you seem to get a fullscreen framebuffer and can go to town on it, but I'm not sure how intrusive RISC OS itself is, and how much control you have over VSYNC status etc.
Some people say that Pico-8 is pretty cool.
@Dresdenboy: I wrote that article on sizecoding :-) ...it's for Risc OS, not bare metal...
Noname, Pico8 is as far from bare metal as you can possibly get. They could at least have opened the system for programming Lua's virtual machine code directly, which is pretty cool and easily accessible in stock Lua (see luac and luac -l for disassembly)
I like the GBA, but it's neither fast nor new.
I like both the CPU and the hardware, which you can use or just mostly ignore and do software rendering in one of the bitmap modes. Setting up one of these is like one write to REG_DISPCNT, plus setting up a palette if you're using a palette mode.
IIRC you can do shit like reprogramming video registers using the DMA controller triggered by hblank (so you have some sort of copper, if you wish)
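To make the "one write to REG_DISPCNT" part concrete, here is a minimal sketch in C, assuming mode 3 (240x160, 15-bit direct colour, so no palette needed). The register and VRAM addresses and the bit values are the standard GBA ones; the helper names (`mode3_init`, `mode3_plot`, `rgb15`) are made up for this example. The functions take the register and framebuffer as pointers so the same code can be tried on a PC against fake memory. The HBlank DMA trick is not shown here.

```c
#include <stdint.h>

/* Real GBA addresses (only meaningful on the hardware or in an emulator). */
#define GBA_REG_DISPCNT ((volatile uint16_t *)0x04000000)
#define GBA_VRAM        ((volatile uint16_t *)0x06000000)

#define DISPCNT_MODE3   0x0003u  /* 240x160, 15-bit direct colour bitmap */
#define DISPCNT_BG2_ON  0x0400u  /* bitmap modes are displayed on BG2 */

#define GBA_WIDTH  240
#define GBA_HEIGHT 160

/* Pack 5-bit components into the GBA colour format (red in the low bits). */
static inline uint16_t rgb15(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)((r & 31u) | ((g & 31u) << 5) | ((b & 31u) << 10));
}

/* The "one write": select mode 3 and enable BG2. */
static inline void mode3_init(volatile uint16_t *dispcnt)
{
    *dispcnt = DISPCNT_MODE3 | DISPCNT_BG2_ON;
}

/* Plot one pixel; no bounds checking, the caller keeps x and y on screen. */
static inline void mode3_plot(volatile uint16_t *vram, int x, int y,
                              uint16_t colour)
{
    vram[y * GBA_WIDTH + x] = colour;
}
```

On the real machine you would call `mode3_init(GBA_REG_DISPCNT)` once and then `mode3_plot(GBA_VRAM, x, y, rgb15(31, 0, 0))` for a red pixel.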
Regarding bare-metal on Raspberry Pi, I think it was the "Helloworld" from this I looked at as a starting point:
https://github.com/PeterLemon/RaspberryPi
I have to mention the bitbox console for being quite modern and very baremetal, though not that powerful. There's a limit to how baremetal you can go with modern hardware, which tends to be a lot more complex to set up and use (if it is documented at all)
a part of me still misses PS2
(the sick part)
The STM32F4xxx series was quite fun to program bare metal since you didn't have to write lots of code to set up the HW, unlike some other (but also slightly bigger) SoCs I've worked with (plus I liked the STM HW register design in general).
A while ago I abused its GPIOs to create a crude VGA output and create some kind of Amiga Copper-like displaylist processor (in SW).
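For anyone wondering what a Copper-like display list processor in software could look like, here is a rough sketch in C. This is not the actual STM32 code, just an illustration of the idea: a list of "at scanline N, write value V to register R" entries, sorted by scanline, that gets consumed once per line before the line is output. All names (`dl_entry`, `dl_state`, `dl_run_line`, the register indices) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { DL_NUM_REGS = 4 };                 /* hypothetical "video registers" */
enum { DL_REG_BGCOLOR = 0, DL_REG_SCROLL_X = 1 };

typedef struct {
    uint16_t line;    /* scanline at which the write takes effect */
    uint8_t  reg;     /* index into the register file */
    uint16_t value;   /* value to store */
} dl_entry;

typedef struct {
    const dl_entry *list;       /* entries sorted by ascending line */
    size_t          count;
    size_t          pos;        /* next entry to execute */
    uint16_t        regs[DL_NUM_REGS];
} dl_state;

/* Rewind the list at the start of every frame (e.g. in the vsync handler).
 * Register contents are kept, like real hardware registers would be. */
static void dl_begin_frame(dl_state *s, const dl_entry *list, size_t count)
{
    s->list  = list;
    s->count = count;
    s->pos   = 0;
}

/* Call once per scanline, before that line is output. Executes every
 * pending entry whose target line has been reached. */
static void dl_run_line(dl_state *s, uint16_t line)
{
    while (s->pos < s->count && s->list[s->pos].line <= line) {
        const dl_entry *e = &s->list[s->pos++];
        if (e->reg < DL_NUM_REGS)
            s->regs[e->reg] = e->value;
    }
}
```

The scanout routine would then read `regs[]` when generating each line, so changing the background colour halfway down the screen is just one more entry in the list.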
The OpenPandora was also quite fun since it has a DSP next to the CPU which gives you full control over the whole system (I wrote the driver for that, including a super-minimal bare metal DSP "OS" and a simple graphics library that used a locked-down L1D cache area and DMA streaming for rather fast alpha blended blits). The only really noteworthy application of it was the PSX emulator, though. It used the DSP for sound processing which boosted the FPS of some games to "first frame" vsync :)
Many of the modern platforms are quite boring, though: The display controller usually gives you just a framebuffer (no raster interrupts etc, just vsync, only few controllers support multiple layers/sprites, ..), and the HW often requires elaborate initialization sequences (compared to old school platforms).
Even if the platform has more sophisticated HW features like a GPU, you can basically forget about using it since it requires months, if not years to program it from scratch in a sensible fashion.
May I ask why you are worrying about the (few) extra cycles an OS takes ? You could pick one of the many (cheap/inexpensive) ARM SoCs, run your demo with root privileges and an elevated (i.e. close to realtime) process priority, and you'll basically have the system all for yourself (you could even run a kernel with realtime patches if you really need extra low latencies for some reason, e.g. audio).
On a C64 for example cycle-exact code was a must for all the custom-chip bit-banging trickery but on modern HW this is hardly the case.
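The "root privileges plus close-to-realtime priority" suggestion above could look roughly like this on Linux. A minimal sketch, assuming the POSIX scheduling calls `sched_setscheduler` and `mlockall`; the helper names `clamp_fifo_priority` and `go_realtime` are invented for the example, and locking memory is an extra step beyond what was described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <sys/mman.h>

/* Clamp a requested priority into the range the kernel accepts
 * for the SCHED_FIFO realtime policy. */
static int clamp_fifo_priority(int wanted)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    if (wanted < lo) return lo;
    if (wanted > hi) return hi;
    return wanted;
}

/* Switch the calling process to SCHED_FIFO and lock its memory.
 * Returns 0 on success, -1 on failure with errno set (typically EPERM
 * when not running as root or without the needed capability). */
static int go_realtime(int wanted_priority)
{
    struct sched_param sp;
    sp.sched_priority = clamp_fifo_priority(wanted_priority);

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -1;

    /* Keep current and future pages in RAM so a page fault
     * cannot stall a frame. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;

    return 0;
}
```

A demo would call `go_realtime(50)` early in `main` and fall back to normal scheduling if it returns -1. Note that a SCHED_FIFO process that never sleeps can starve the rest of the system, so block on vsync rather than busy-wait.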
maybe some esp32 based boards like the odroid go?
Quote:
@Dresdenboy: I wrote that article on sizecoding :-) ...it's for Risc OS, not bare metal...
Haha, my bad! I was even a bit confused whether you knew this already, because I somehow remembered the discussions after Outline online which I followed on pouet, twitch and discord with you and the other sizecoders being involved. =)
The missed difference was "bare metal".
BTW I'd agree with wisywtf about ESP32 (the ESP32-S2 has both a Tensilica 32b core and a RISC-V companion core).
Quote:
May I ask why you are worrying about the (few) extra cycles an OS takes ? You could pick one of the many (cheap/inexpensive) ARM SoCs, run your demo with root privileges and an elevated (i.e. close to realtime) process priority, and you'll basically have the system all for yourself (you could even run a kernel with realtime patches if you really need extra low latencies for some reason, e.g. audio).
Well, it's not just the cycles. Though I do like things running at a perfect steady framerate without a single drop - and if I've coded something that really pushes the limits, but still maintains that perfect framerate, I want it to *always* do that even on other people's systems, and not suddenly "Oh, now SuperComplexOS 8.53 is released, and it added this-and-that nice feature, and it only runs a tiny, tiny bit slower" - that might be enough to suddenly miss that 60Hz update.
But my major dislike when it comes to "big" OSes and demos is the constant incompatibility issues. For example I remember when I bought my first-gen Raspberry Pi (when it was very hyped), and thought, cool, maybe this will be a standardized platform that will live for a while and have some nice demos released for it.
There were a couple of demos, but when I for example tried to run "Bad Hair Decade" by Hedelmae, it didn't run, because the version of Raspbian I was running was not the same as the developer had used, and well, then it didn't work.
When it comes to this, Linux is the biggest culprit, I've never seen an OS which has such low binary compatibility, you code something and the chance of it running on someone else's box is zero, because GFrunkslbkxztLib1.3.88 is *of course* not binary compatible with GFrunkslbkxztLib1.3.87 or whatever crap it might be.
Reading about the RISC OS RPi stuff got me interested though, so I dug it out again, installed RISC OS, and wanted to try these tiny-intros. Well, did they run? No, because they seemed to use features only available on newer hardware versions of the Raspberry Pi. Finally, I found one (Edgedancer) that had some "ARM-only" compatible version that actually started, but then the gfx mode was all messed up, and then I gave up. So no RISC OS for me.
Anyway, yeah, I'd like a 100% stable platform, both hardware and OS, that does not move whatsoever in any direction, and that I know if someone buys an old "Hardware X" 15 years from now and loads up my demo, it will run exactly the same as it did for me when I coded it.
Linux is not exactly known for its great binary compatibility (different philosophy there), although the OpenPandora (which runs a lightweight Ångström Linux) is a notable exception (probably b/c it's primarily a games console).
It delivers a steady frame rate and also has a nice 50Hz/60Hz screen (and very long battery life, ~17 hours).
But yeah, it's out of production for some time now and its successor ("Pyra") is still under development (but will be ready in two months (tm)).
Actually, the Pyra is almost done now and it's not unlikely that it will finally (!) ship this year.
The main drawback is that the tech is a bit outdated by now (OMAP5 / PowerVR SGX544 (GLES 2.0) / Cortex A-15 @~1.2 GHz), and it's relatively pricey (~500 Euros).
I'll definitely get one anyway since it's one of a kind and probably will have great SW support, just like its predecessor.
Re the Raspberry Pi: My experience with that one is rather limited but I do have one (3B) and at one point did some benchmarks with it. Its GPU managed to push ~3.3mio 4x-multi-textured-mapped+alpha blended+z/s-tested triangles per second (most of them rather smallish, ~16x16 pixels). Seemed decent. Maybe linking your demo statically would solve most of the "dependency-hell" issues ? last but not least, it's an inexpensive and widespread platform so chances are high(er) that people will actually run your demo.
One thing I noticed on the Raspberry Pi is that *everything* (even simpler tests) seemed to be capped at 30fps. I didn't investigate further -- maybe it was b/c it was connected to a full-HD beamer and the swapbuffers call took its sweet time to blit things onto the screen ? *shrug* (I _think_ there is an API for changing screen modes, though..)
p.s.: out of those 3.3 mio aforementioned tris, >90% were offscreen. whoever wrote that benchmark obviously liked to torture GPUs
OpenPandora! Now that was something I've completely forgotten - I actually have one of those from the original first batch (I was active in the GP32 community at the time that the Pandora project started, so I had an early place in the list)
I never actually did anything with it, but now I dug it out of storage and I'm trying to charge the batteries. Let's keep our fingers crossed it still lives, and if it does, I'll see if I can code "the way I want" with it - just getting a nice double buffered raw framebuffer, doing some software rendering with a moderately fast CPU could be fun.
Ah crap, the Pandora had a PowerVR SGX 530 in it, I didn't remember that. So it's 3D accelerated, guess it doesn't make much sense to do a software rendered demo then. :/
every x86 based system running DOS is a bare metal system which gives you full control over the machine, and if you stick to the standards like VGA, your software will run fine on most of the systems
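As an illustration of "sticking to the standards like VGA": the classic choice is mode 13h (320x200, 256 colours), where the framebuffer is one linear block of 64000 bytes at segment A000h and a pixel is a single byte write. The sketch below, in C, only shows the addressing; the helper names are made up, and how you set the mode (BIOS INT 10h with AX=0013h) and obtain a pointer to A000:0000 depends on your compiler and memory model, so that part is left out.

```c
#include <stdint.h>

enum { VGA13_WIDTH = 320, VGA13_HEIGHT = 200 };

/* Byte offset of pixel (x, y) inside the 64000-byte mode 13h framebuffer.
 * The largest offset is 63999, so it fits in 16 bits. */
static inline uint16_t vga13_offset(int x, int y)
{
    return (uint16_t)(y * VGA13_WIDTH + x);
}

/* Write one palette index; coordinates outside the screen are ignored.
 * fb must point at the start of the framebuffer (A000:0000 under DOS,
 * or any 64000-byte buffer when testing elsewhere). */
static inline void vga13_plot(uint8_t *fb, int x, int y, uint8_t colour_index)
{
    if (x >= 0 && x < VGA13_WIDTH && y >= 0 && y < VGA13_HEIGHT)
        fb[vga13_offset(x, y)] = colour_index;
}
```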
Quote:
Ah crap, the Pandora had a PowerVR SGX 530 in it, I didn't remember that. So it's 3D accelerated, guess it doesn't make much sense to do a software rendered demo then. :/
Well, on most of these open handhelds, even the later ones with 3D acceleration (GCW Zero, or the better redesign, the RG350), releases are still software rendered and acceleration is rarely used. Though it's usually emulators and the same low-quality ported homebrew games. I also bought a Gamekiddy 350h recently (faster CPU, no 3D acceleration) because I am hoping to optimize the 3DOh emulator to actually be playable. Meanwhile, most of them are still limited to 240p screens, while the CPUs are pretty powerful for good software rendering nowadays. Even an old Dingoo or GP2X was powerful enough to do good software rendering (and if it doesn't feel as powerful, well, it's a good challenge. I usually work on much, much slower CPUs like the 3DO's 12.5 MHz ARM and still feel things are possible)
@Sdw: Don't give up so fast with Risc OS :-) The people in the forum would always help out. The screen mode thing is due to some lines in the config. And for Thumb-2/NEON you would need a RPi 2/3/4...only the first RPi doesn't work...as with every OS that is maintained entirely on a very limited, non-commercial, enthusiast basis, I guess there are some obstacles to deal with...