Fast/modern platforms that allow "bare-metal" style coding?

category: code [glöplog]

Nintendo DS: definitely baremetal, you can't run an OS on it because it has no MMU. But it's probably not the most powerful thing you can write baremetal code for: 4M RAM, 66/33 MHz ARM9 and ARM7, and two GBA PPUs (which are basically the same as a SNES PPU) plus a "GPU" that's shittier than the one in the PS1. It's pretty fun to code for (but I might be biased) and, as I said before, I'd love to have some competition, but, for the "fast/modern" part, PS1, PS2, N64, GCN, Wii are more powerful & are baremetal-codeable as well. See my InerciaDemoparty seminar for more info, also about development environments and emulators etc., but if you want a good debugger, you're out of luck.

Nintendo 3DS: it's definitely easier to code for (Citra has a good gdb stub, and you can even launch *and debug* code on real hardware over wifi, which is pretty cool, and the hardware is also capable of doing WPA2 instead of only WEP, which makes connecting it to your demotool easier), but demos are typically made on top of the OS that runs on the device, using the system-provided 'services' (it's a microkernel) instead of using homebrew drivers. However, with some homebrew launchers (eg. Luma3DS iirc), it is possible to run baremetal code (such as the homebrew tools open_agb_firm and GodMode9), but then you lose the benefits of easy code development: there's no standard set of device drivers (so good luck using the GPU or SD card or wifi, though you could use eg. open_agb_firm as a base), and testing is *incredibly* annoying: there are *no* good LLE emulators (Citra does HLE only, but maaaaybe Corgi3DS might help here). Also, to test on real hardware, you have to put the build on an SD card, install the .firm file, then launch it, which is a huge timesink. (disclaimer: I don't have a 3DS and I've never done homebrew stuff for it, but this is what others have been telling me. maybe ask halcy for more info on this)

added on the 2020-11-22 16:34:52 by porocyon

Quote:

you can't run an OS on it because it has no MMU

An MMU isn't a requirement for an OS. Amiga and early Macs didn't have MMUs for example.

added on the 2020-11-22 16:49:08 by absence

well yes, you could *technically* just put everything in a single address space, and use software task switching (because I'm pretty sure the ARMs in the NDS don't support this in hardware either), but, that's only going to be an impediment for a game (at these hw specs at least), and as Nintendo wanted some security features (which is why they had the entire ARM7<->ARM9 stuff to begin with), an OS like this would give you none of that

added on the 2020-11-22 17:38:16 by porocyon

Thank you porocyon that was interesting.

added on the 2020-11-22 20:15:30 by Moerder

Just to give you an inspiration (perhaps), I just tweaked EFI boot on my old spare iMac (2011) to natively install FREEDOS 1.2 on it, and it runs smooth and stable. DOS4GW does not work as it uses the keyboard controller for some reason, and that's the only major malfunction I've spotted so far (the 0x60/0x64 ports are deaf and cause internal DOS/16M timeouts in the extender).

Other than that, most of the stuff works relatively well. Of course you don't have SB, but you can access 64-bit (long mode), scan PCI directly to discover the HD Audio device plugged in, and it comes with a VESA 3.0 driver.

Perhaps having bare metal modern PC would be interesting to some to rediscover.
I find this HACKINGDOS box quite kind of sick and refreshing at the same time ;-)

Especially considering the number of forced reboots until I finally got it working as expected.

cheers,
h1

added on the 2020-11-22 22:19:19 by hollowone

something like this perhaps?

added on the 2020-11-24 12:58:01 by ferris

Thanks @ferris, that device looks beautiful and retro.

added on the 2020-11-24 13:25:26 by neoneye

OK I'm game. I scored a DS Lite and a DSi cheaply. They are in good condition, but neither came with any games, only that mind training crap.

Now what flash card do I get?

added on the 2020-11-26 17:06:54 by Moerder

I'd say R4i/Acekard2i usually work best, but any will do, really.

But, if you have an SD card, you don't need a flashcart to run code on a DSi: you can use this or this exploit to run this .nds loader/launcher (put it on the SD card as "BOOT.NDS")

If you want to run homebrew code on launch without using an exploit every time, you can try installing unlaunch, which will run the "bootcode.dsi" file on the SD card on boot.

For completeness, crossdev toolchain link.

added on the 2020-11-26 17:19:29 by porocyon

Oh boy. Where do I even start.

Quote:

well yes, you could *technically* just put everything in a single address space, and use software task switching (because I'm pretty sure the ARMs in the NDS don't support this in hardware either),

What is a hardware task switching feature? Multiple cores? Hyperthreading? I thought all OSes do timeslicing and task switching from software, that's what a pre-emptive kernel is about...? Whether those tasks are in separate address spaces or in the same one is really beside the point... (As even on a "modern" system with MMU, your threads will do timesharing in the same address space. And most OS-es will avoid delegating threads to other cores unless it gets necessary/beneficial due to higher load caused by one of those threads, due to the false sharing problem (and other issues)... So, yeah....)

Quote:

but, that's only going to be an impediment for a game (at these hw specs at least), and as Nintendo wanted some security features (which is why they had the entire ARM7<->ARM9 stuff to begin with), an OS like this would give you none of that

Now, this was a very very very VERY long time ago, almost in another lifetime, around 2006, but I was a Nintendo DS Game Developer, with an official Nintendo DevKit and everything, and I'm only like 95% sure, but I think the Nintendo DS ROM OS can do multiple threads/tasks on the ARM9. In "software task switching" and sharing the same address space, of course. Now, all of this is not available to Homebrew devs I guess, because the ROM docs never leaked, and/or not useable when running this kind of code and the hardware was simple enough to just do direct HW banging instead, and build everything from scratch in Homebrew.

(Do you want a copy? ... Err, just kidding Furukawa-san!)

added on the 2020-11-26 18:17:53 by Charlie

Quote:

What is a hardware task switching feature?

example (386), it's the CPU itself that swaps out the current registers etc. automatically, instead of having to do it in software by saving and loading every register separately.

The ARM11 MPCore has task and context ID registers as well in CP15 (but no automatic task switching), ARMv7 and up also has an "address space identifier" register. Neither of the ARMs in the NDS (ARM7TDMI and ARM946E-S) has any of these registers.

Quote:

but I think the Nintendo DS ROM OS can do multiple threads/tasks on the ARM9. In "software task switching" and sharing the same address space, of course

Yes, Ninty SDK code has a multithreading feature (on both ARMs actually, iirc), I've seen it in disassembly. It's basically just setjmp/longjmp, and those functions are available practically anywhere.
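To make the "it's basically just setjmp/longjmp" point concrete, here is a minimal sketch of cooperative task switching in a single address space. It is not the SDK's code. It uses POSIX `ucontext` as a stand-in, because plain setjmp/longjmp cannot portably give each task its own stack; on the real hardware you would save and restore the ARM registers yourself. All names (`run_tasks`, `task_yield`, `trace`) are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

#define MAX_TASKS  4
#define STACK_SIZE (64 * 1024)

typedef void (*task_fn)(void);

static ucontext_t scheduler_ctx;                    /* saved state of run_tasks() */
static ucontext_t task_ctx[MAX_TASKS];              /* saved state of each task */
static char       task_stack[MAX_TASKS][STACK_SIZE];
static task_fn    task_fns[MAX_TASKS];
static int        task_done[MAX_TASKS];
static int        current_task;

/* Small trace buffer so the switching order can be observed. */
static char   trace[32];
static size_t trace_len;

static void trace_put(char c)
{
    if (trace_len < sizeof trace - 1)               /* keep the final NUL */
        trace[trace_len++] = c;
}

/* A task calls this to hand the CPU back to the scheduler voluntarily. */
static void task_yield(void)
{
    swapcontext(&task_ctx[current_task], &scheduler_ctx);
}

/* Entry wrapper: runs the task and records when it has returned. */
static void task_trampoline(int index)
{
    task_fns[index]();
    task_done[index] = 1;
    /* Returning here resumes uc_link, i.e. the scheduler. */
}

/* Round-robin over all tasks until every one of them has returned. */
static void run_tasks(const task_fn *fns, int count)
{
    assert(count > 0 && count <= MAX_TASKS);

    for (int i = 0; i < count; i++) {
        task_fns[i]  = fns[i];
        task_done[i] = 0;
        getcontext(&task_ctx[i]);
        task_ctx[i].uc_stack.ss_sp   = task_stack[i];
        task_ctx[i].uc_stack.ss_size = STACK_SIZE;
        task_ctx[i].uc_link          = &scheduler_ctx;
        makecontext(&task_ctx[i], (void (*)(void))task_trampoline, 1, i);
    }

    int remaining = count;
    while (remaining > 0) {
        for (int i = 0; i < count; i++) {
            if (task_done[i])
                continue;
            current_task = i;
            swapcontext(&scheduler_ctx, &task_ctx[i]);  /* save ours, load theirs */
            if (task_done[i])
                remaining--;
        }
    }
}

/* Two example tasks that each yield once in the middle. */
static void task_a(void) { trace_put('A'); task_yield(); trace_put('a'); }
static void task_b(void) { trace_put('B'); task_yield(); trace_put('b'); }
```

Calling `run_tasks` with `task_a` and `task_b` interleaves them at the yield points. Nothing here needs an MMU or hardware task switching; the whole "thread library" is a saved register set and a stack per task.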

But that wasn't exactly what I meant. The 3DS still runs the launcher (and several other apps the user may have open) in the background, along with system services etc., as it's a microkernel. Games need to call into these services (which needs a context switch) to access basically any hardware things: read button/touchscreen input, play some audio, perform a 3D draw call. (Though on the NDS, if you want to play audio from the ARM9 side, it still has to go to the ARM7. Similarly, the 3DS OS *might* allow games to map the GPU MMIO registers into their address space, but I'm not sure about whether or not this is the case.)

On the 3DS this works fine because for a dualcore 300 MHz system, it's fast enough. But imagine running such an OS on the NDS as well, with the launcher still running etc. This would steal CPU time from the game for all the background tasks, waste some of the RAM (of which you only have 4 megs, as opposed to 128? 256? on the 3DS, plus separate physical RAM for the OS alone), and require two context switches *on a 33 MHz bus (at best, main RAM is often slower)* to eg. check whether a button is pressed instead of just reading from an MMIO register. I don't think this would've gone so well, regardless of stability/security issues of not having an MMU.
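For comparison, this is roughly what "just reading from an MMIO register" looks like. It's a sketch, assuming the KEYINPUT register at 0x04000130 with the GBA-style layout (bits 0 to 9, a bit reads 0 while the button is held). The helper takes the register address as a parameter so it can also be tried on a PC against a fake variable; `key_is_pressed` and the enum names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Address of KEYINPUT on the GBA/NDS; only meaningful on the real hardware. */
#define REG_KEYINPUT_ADDR 0x04000130u

/* Bit positions inside KEYINPUT (GBA-style layout). */
enum key_bit {
    KEY_A = 0, KEY_B = 1, KEY_SELECT = 2, KEY_START = 3,
    KEY_RIGHT = 4, KEY_LEFT = 5, KEY_UP = 6, KEY_DOWN = 7,
    KEY_R = 8, KEY_L = 9
};

/* One load, one shift, one mask: no syscall, no context switch.
 * The register is active low, so a cleared bit means "pressed". */
static inline bool key_is_pressed(const volatile uint16_t *keyinput,
                                  enum key_bit key)
{
    assert(keyinput != NULL);
    return ((*keyinput >> key) & 1u) == 0;
}
```

On the console itself the call would be `key_is_pressed((const volatile uint16_t *)REG_KEYINPUT_ADDR, KEY_A)`. On a microkernel OS the same question has to travel to a service and back.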

(Also the SDK source code did definitely leak. But homebrew allows for more control, as the SDK reserves the ARM7 for itself and doesn't allow the developer to touch it.)

added on the 2020-11-26 19:01:28 by porocyon

Quote:

Quote:
What is a hardware task switching feature?

example (386), it's the CPU itself that swaps out the current registers etc. automatically,

This feature is a legacy i386 feature, and most of it isn't even supported on AMD64. Nor do you actually need it to do multitasking on x86. Don't believe me? Believe Linux.

Quote:

instead of having to do it in software by saving and loading every register separately. (...) Yes, Ninty SDK code has a multithreading feature (on both ARMs actually, iirc), I've seen it in disassembly. It's basically just setjmp/longjmp, and those functions are available practically anywhere.

Yes, because that's really all you need for timeshare multitasking. Save a processor state (usually on a periodic interrupt or exception to be pre-emptive; if not, that's what you call cooperative multitasking) and switch over to another. Whether the CPU does part of it "by hardware" or you have to save/restore registers from software is an implementation detail, and barely even matters. Not even for the actual performance of the task/context switch, because what is slow there are the actual memory load/store operations and cache flushes involved (if any, differs from arch to arch), not the relatively few instructions you need to execute when you do it "from software". In fact, doing some/most of it from software is often beneficial, because you can fine-tune the storing/restoring of the registers, for example by skipping registers which weren't used/modified since the last context switch. This is what Linux does on PowerPC for example, to avoid storing/restoring all of the huge register file that CPU arch has on every context switch. (Maybe on other archs too?)
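A toy model of that last point, skipping registers that weren't touched since the last switch. This is not how Linux actually implements it (a real kernel typically disables the FPU and traps on first use); the `fpr_dirty` flag, the struct layouts and the register counts are assumptions made up for the sketch. The function returns the number of bytes it copied, so the saving is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_GPR 32
#define NUM_FPR 32

/* Stands in for the live register file of the CPU. */
struct cpu_state {
    uint32_t gpr[NUM_GPR];
    double   fpr[NUM_FPR];
    bool     fpr_dirty;     /* set whenever the running task touches an FPR */
};

/* Per-task save area. */
struct task_state {
    uint32_t gpr[NUM_GPR];
    double   fpr[NUM_FPR];
};

/* Saves the outgoing task's registers and returns how many bytes were
 * copied. The general-purpose registers are always saved; the (larger)
 * floating-point set only if the task actually used it. */
static size_t save_context(struct cpu_state *cpu, struct task_state *out)
{
    assert(cpu != NULL && out != NULL);

    size_t copied = sizeof cpu->gpr;
    memcpy(out->gpr, cpu->gpr, sizeof cpu->gpr);

    if (cpu->fpr_dirty) {
        memcpy(out->fpr, cpu->fpr, sizeof cpu->fpr);
        copied += sizeof cpu->fpr;
        cpu->fpr_dirty = false;     /* clean again until the next FPR write */
    }
    return copied;
}
```

For an integer-only task the switch moves a third of the data in this model, which is the kind of tuning a fixed hardware task switch can't do.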

Quote:

On the 3DS this works fine because for a dualcore 300 MHz system, it's fast enough.

Do you realize the Amiga was doing pre-emptive multitasking just fine on a 7 MHz 68000, with 512K RAM? (And let's not even mention some 8 bit systems capable of multitasking.)

Now, of course I realize, and I agree that as soon as you want to do something performance critical like demos or games, you usually want to switch off/avoid multitasking to provide more speed for your actual time critical code. Which is what most people did on the Amiga 500 at least, take over the OS. But for example a bunch of '060/C2P Amiga demos leave multitasking on. And that's still only ~50 MHz... Also, I imagine most people didn't really use the DS' multitasking (or better, multithreading) features. Again the details here are a bit cloudy, but I think we actually did use it for some kind of networking/multiplayer stuff.

So to me, the whole argument that the DS hardware is not enough for multitasking (multithreading), because it's somehow a magical hardware feature, or needs separate address spaces, or you need a shitton of memory or CPU power for it, is just entirely bogus. That was the part I was arguing about.

Quote:

(Also the SDK source code did definitely leak. But homebrew allows for more control, as the SDK reserves the ARM7 for itself and doesn't allow the developer to touch it.)

Yeah, fair. Would have been surprised if it didn't leak, actually. :)

added on the 2020-11-26 19:55:40 by Charlie

All new Game & Watch? Arm7, blit engine, tons of on chip peripherals? Screen, sound, buttons, small amounts of everything else.

hackaday G&W

added on the 2020-12-06 13:34:42 by kuiash

RP2040, like in the rpi pico, has two cores, each with two interpolation engines that seem very well suited to building rasterizers and other fun goodies. See section 2.3.1 in the datasheet (specifically 2.3.1.6 for the interpolators).

added on the 2021-01-22 10:32:42 by ferris

...not to mention the PIO in chapter 3!

added on the 2021-01-22 11:13:52 by ferris

yeah that one looks very nice, you could probably build some sort of video chip with those PIO thingies without having to sacrifice much CPU time

added on the 2021-01-22 19:38:07 by porocyon

Yes and no - VGA output seems to be pretty easy, and someone was mad enough to try DVI output already and that one takes 60% of one core for a 640x480 5:6:5 image :)

added on the 2021-01-22 20:06:08 by kb_

... and that's at almost 2x overclocking!

But yes, it's really crazy that you can push out TMDS from a Cortex-M0+ class MCU *at all*.
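Some back-of-the-envelope numbers for why this is tight, as a sketch. It assumes the standard 25.175 MHz pixel clock for 640x480 at 60 Hz, TMDS sending each 8-bit colour component as a 10-bit symbol per pixel clock, the RP2040's rated 133 MHz system clock, and a 252 MHz overclock for the DVI experiment (that last figure is my assumption, not from the posts above). The function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Bit rate of ONE TMDS data lane: 10 serial bits per pixel clock. */
static uint64_t tmds_lane_bits_per_second(uint64_t pixel_clock_hz)
{
    assert(pixel_clock_hz > 0);
    return pixel_clock_hz * 10u;
}

/* Actual clock as a percentage of the rated clock, rounded down. */
static unsigned clock_percent_of_rated(uint64_t actual_hz, uint64_t rated_hz)
{
    assert(rated_hz > 0);
    return (unsigned)(actual_hz * 100u / rated_hz);
}
```

With those inputs, `tmds_lane_bits_per_second(25175000)` gives 251750000 bits per second per lane (times three data lanes), and `clock_percent_of_rated(252000000, 133000000)` gives 189, i.e. the "almost 2x". So the chip has about one system clock cycle per serial bit per lane, which is why the PIO has to do the shifting.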

added on the 2021-01-24 00:25:28 by KeyJ

There's even affine texture-mapped span sample code :)
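For anyone who hasn't written one: an affine span is just accumulate, shift, mask, add per pixel, which is the pattern the interpolators are built around as far as I understand the datasheet. Below is a plain software sketch in 16.16 fixed point, not the sample code itself and not using any SDK interpolator API. The 8x8 texture size and all names are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEX_BITS 3                      /* 8x8 texels, picked for the example */
#define TEX_SIZE (1u << TEX_BITS)
#define TEX_MASK (TEX_SIZE - 1u)

/* Fills `count` pixels of one scanline span.
 * (u, v) is the start texture coordinate and (du, dv) the per-pixel step,
 * all in 16.16 fixed point. Coordinates wrap at the texture edge. */
static void draw_affine_span(uint8_t *dst, size_t count,
                             const uint8_t *texture,
                             uint32_t u, uint32_t v,
                             uint32_t du, uint32_t dv)
{
    assert(dst != NULL && texture != NULL);

    for (size_t x = 0; x < count; x++) {
        uint32_t tx = (u >> 16) & TEX_MASK;     /* shift + mask: texel column */
        uint32_t ty = (v >> 16) & TEX_MASK;     /* shift + mask: texel row */
        dst[x] = texture[(ty << TEX_BITS) | tx];
        u += du;                                /* accumulate for next pixel */
        v += dv;
    }
}
```

In software that's two shifts, two masks, a combine and two adds per pixel. Having hardware produce the texel address in one register read is what makes the interpolators interesting for rasterizers.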

added on the 2021-01-24 18:13:55 by raer

First RP2040 bare-metal demo in 3-2-1

added on the 2021-01-24 18:22:03 by raer

Maybe something like this: https://shop.pimoroni.com/products/pimoroni-pico-vga-demo-base would make a good base for a RP2040 demo. VGA and sound out, that should be enough!

added on the 2021-01-29 00:18:22 by Sdw

I procrastinated by reading about RP2040, good stuff. Waiting to get my hands on one.

added on the 2021-01-29 09:23:08 by pestis
