Linux on a Commodore 64
Onno Kortman has taken semu, a minimal RISC-V emulator, and cross-compiled it with llvm-mos, an LLVM port to the MOS 6502 processor, in order to run Linux on the Commodore 64. Kortman writes: "The screenshots took VICE a couple hours in 'warp mode' to generate. So, as is, a real C64 should be able to boot Linux within a week or so."
The 6502 is a notoriously poor target for C compilation, especially C that hasn't been written with the 6502's limitations in mind. I'll bet if you wrote a RISC-V emulator in native 6502 you could get Linux booting on a real machine in a day instead of a week. Think about how many lives that'd save!
https://www.folklore.org/StoryView.py?story=Saving_Lives.txt
Says you. llvm-mos generates surprisingly efficient 6502 code for a project of its age and maturity. Don't take my word for it - try some experiments with it on godbolt.
Sure. Here are a couple of functions that iterate over an array of "Ball" objects, as you might do in a Breakout-style game with a multi-ball powerup. I didn't do anything to make it particularly 6502-amenable; it's how I'd probably write it for a modern machine. Even compiled with -O2, the code is, as expected, slow and enormous - a single addition compiles to something like 30 instructions.
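The godbolt link isn't reproduced here, so for reference, something along these lines - the struct layout and function names are my guesses at the "array of Ball objects" style being described, not the actual snippet:

```c
#include <stdint.h>

/* An illustrative sketch of the array-of-structures style described
 * above; nothing here is tailored to the 6502. */
struct Ball {
    int16_t x, y;    /* position */
    int16_t dx, dy;  /* velocity */
};

/* Move every ball by its velocity, walking a pointer over the array. */
void MoveBalls(struct Ball *balls, uint8_t count) {
    for (struct Ball *b = balls; b != balls + count; ++b) {
        b->x += b->dx;
        b->y += b->dy;
    }
}

/* Bounce balls off a w-by-h playfield by negating velocity at the edges. */
void BounceBalls(struct Ball *balls, uint8_t count, int16_t w, int16_t h) {
    for (struct Ball *b = balls; b != balls + count; ++b) {
        if (b->x < 0 || b->x >= w) b->dx = -b->dx;
        if (b->y < 0 || b->y >= h) b->dy = -b->dy;
    }
}
```

Every 16-bit field access through the pointer forces multi-instruction address arithmetic on an 8-bit machine, which is where the "30 instructions for an addition" comes from.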
Seems like your problem is more with the venerable 6502 itself than with the compiler. Most of that assembly code is spent calculating the offsets inside the Ball struct, which must be done at 16 bits of resolution in every case. The compiler's using the indirect indexed (zero page address with Y offset) 6502 addressing mode to get at all the fields in your struct. It has placed all the variables in zero page, so no instruction is more than two bytes long; additionally, the code in question is entirely linear, with no JSRs or other subroutines. Note in particular how it efficiently uses DEY/INY pairs of one-byte instructions to get at the low and high bytes of 16-bit memory. Hand-written assembly might be speedier, but not by much, and it would still have to deal with all the corner cases that your generated code does. "While writing Apple BASIC for a 6502 microprocessor I repeatedly encountered a variant of Murphy's Law. Briefly stated, any routine operating on 16 bit data will require at least twice the code that it should." -Steve Wozniak
I think that was partially his point - on 6502, typical-looking C code will be horrifically inefficient, at least when compared to other architectures more suitable for C.
For the 6502, to get optimal assembly you'd have to structure your data as a structure-of-arrays instead of an array-of-structures, and use indices instead of pointers as much as possible (at least when the number of Ball objects is < 256).
Yes, exactly. Were I hand-writing the assembly for the 6502, I'd make all sorts of decisions that the C code doesn't - and that a compiler can't - to make it more efficient.
Instead of passing in a pointer to two separate functions, I'd write a single UpdateBalls procedure that operated on global data. This data is going to be core to my game logic and physics, so I'd put it all on the ZP. As you suggested, "structure-of-arrays". I'd choose a fixed number of balls so I don't need an argument; maybe I'd set my loop to iterate backwards so I get a free zero check with the decrement, maybe I'd unroll the loop ("dead" balls can be placed off-screen with a dx/dy of 0). I'd probably decide that I don't need 16-bit precision for the deltas (how fast could the balls move, really?), and a 16-8 addition is going to be quicker than a 16-16 one.
The compiler isn't going to make these optimizations; that's not a slight against the compiler. In fact, I just checked - the output [0] when I write my C code this way is pretty close to what I'd hand-write. It's roughly a third the number of instructions and - I'm not going to cycle-count, so this is a stab in the dark - would take maybe an order of magnitude fewer cycles to run. semu wasn't written with 6502 performance in mind and won't have taken considerations like this into account, so it's inevitably going to be slow when compiled.
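The restructured version isn't shown either, but the decisions described above sketch out to something like this - the names, the ball count of 8, and the exact layout are my own illustrative choices, not the linked output:

```c
#include <stdint.h>

/* A sketch of the 6502-friendly restructuring described above: global
 * structure-of-arrays data, a fixed ball count, 8-bit deltas, and a
 * backward loop. On a real llvm-mos build these arrays would be placed
 * in zero page; that placement is omitted here. */
#define NUM_BALLS 8

int16_t ball_x[NUM_BALLS];
int16_t ball_y[NUM_BALLS];
int8_t  ball_dx[NUM_BALLS];  /* 8-bit deltas: a 16+8 add beats a 16+16 one */
int8_t  ball_dy[NUM_BALLS];  /* "dead" balls sit off-screen with dx/dy == 0 */

void UpdateBalls(void) {
    /* Counting down to zero gives the 6502 a free zero check on the
     * decrement; indexing with an 8-bit i maps straight onto the
     * absolute,X addressing mode. */
    for (uint8_t i = NUM_BALLS; i-- > 0;) {
        ball_x[i] += ball_dx[i];
        ball_y[i] += ball_dy[i];
    }
}
```

No pointers, no arguments, no per-field offset arithmetic - which is why the generated code shrinks so dramatically.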
I'd actually like llvm-mos to do an automated AoS-to-SoA analysis and rewrite, but I haven't gotten around to it yet. There aren't any intrinsic theoretical obstacles I'm aware of, though; it's just difficult code to write.
Now that this has come up again as the stock reason why "you can't do C well on the 6502", I'm probably going to reprioritize it and put the register allocator work (replacing the stack, the zero page, and the register set) on pause.
I found it interesting that it didn't utilize the X reg. Maybe X could have been used instead of updating Y. Then I scrutinized the Y reg use:
On line 14, it uses Y, then decrements it to 0, uses it, increments it, uses it, decrements, uses it, then increments again... why not perform the indirect loads on lines 18 and 26 without the Y index and eliminate lines 16, 21, and 25?
here's my pseudocode:
rc2 <= base of struct
rc4 = rc2 + 4 // addr of dx
rc5 = rc2 + 0 // addr of x
rc6 = *(&rc2+4)
rc4 = *(&rc4+1) // get low byte
rc5 = rc6 + *(&rc2) // add high byte
rc4 = rc4 + *(&rc2+1) // add low byte
rc2 = rc5 // store high result
*(&rc2+1) = rc4 // store low result
I believe it could have done more to do the work in place, but my batt is about to die :(
I've yet to see the 6502 C compiler that can beat good assembly code. I appreciate the convenience and llvm-mos has certainly improved greatly, but if you want speed on a 6502, there's no substitute.
I've yet to see a 6502 C compiler get within sight of good assembly code. C makes many assumptions that are slow to implement on a 6502, and are notoriously hard to optimize.
The problem likely isn't what the compiler produces given the semantics of a piece of code, but rather what you'd do with data structures and memory layout if you'd target the 6502 directly.
Oh wow! This bootstrapping method reminds me of yet another Linux-on-an-8-bit-micro project (https://dmitry.gr/?r=05.Projects&proj=07.%20Linux%20on%208bi...), which used an 8-bit AVR with an ARMv5 emulator. But, this takes the cake in terms of geek coolness.
That's exactly what it reminded me of too; there's been plenty of HN discussion about it: https://news.ycombinator.com/item?id=19762928
I'd be curious to see this running on real hardware. I hope someone's able to make it work.
It might be more interesting than watching paint dry, if only for the risk that an old C64 will let the smoke out.
Why would it let smoke out? I doubt a C64 has any power management. So whether it idles or boots Linux via a couple of emulation layers, the thermal load will be exactly the same.
I know there can be issues with thermal saturation in a heatsink design - it may have been expected not just to handle N watts of heat, but to only do so for M hours at a stretch. Can you expect to leave a real 64 on for days or weeks and have it stay up? I wonder if stores that had them as demo units back when they were a relevant product, for example, power-cycled them regularly.
I know from experience if you block the bottom intake vents on a VIC 20, so it can't convect properly, it will eventually start acting funky.
On a related note, I understand the 64's original power bricks are considered timebombs; they might also not appreciate being left on for weeks at a time.
> Can you expect to leave a real 64 on for days or weeks and it will stay up?
Yes. These early computers found their way into various embedded control applications too, and I suspect there's quite a few C64s still in operation that way; they would've been replaced long ago if they weren't stable. An article occasionally appears when someone discovers this:
A fairly simple diagnostic test for a C64 is to touch the chips - if the machine is in working order the chips will be warm to the touch but not uncomfortably so. If any are uncomfortably warm odds are they've failed.
If you block all air circulation, sure you might eventually end up with problems, but it takes quite a bit.
I just saw a museum exhibition that featured a C64C, running all day, with a dust cover on, on what appeared to be the original PSU. I think it makes sense to be cautious about your own unit, but they’re probably not as vulnerable as people assume.
Definitely the Achilles heel in any C64 setup. Perhaps new PSUs are visually distinct from the old ones, but practically everyone had to replace theirs as they aged.
Now the 64C was released in 1986, four years after the 64 and its faulty power supplies came out. I don't know whether Commodore had decisively fixed the flawed PSUs by that time, but I know for sure that my second PSU lasted for the lifetime of that device too.
Just because it's likely an old and long dormant piece of electronics, nothing to do with linux beyond it having to run the machine for multiple days 24x7. My understanding is they don't come out of deep dusty storage in ready for service condition. Leaky caps.
C64 caps are generally fine and do not need replacing. The most common things to spontaneously go bad of "old age" on a 64 are probably RAM chips and the PLA, and of course the power supply is a time bomb.
Very nice, but my first thought was "surely this will not fit in 64k of ram!". And it doesn't. It requires a 16MB REU!
To explain for the uninitiated how rare this bit of hardware is: the REUs available for the C64 back in the day were 256KB and 512KB. The ones around now are most commonly home-built replicas, as schematics are available for them. Sometime in the late 90s there was also an "expansion" for the C64 that contained a completely new CPU (the SuperCPU's 65816) that was code-compatible with the original, and I believe that device could accommodate up to 16MB.
Later reimplementations based purely on FPGAs popped up, including a REU with 16MB. The original SuperCPU schematic was lost to time. Allegedly FPGA-based expansions are available to buy for a few hundred EUR now, but I don't know anyone who has attempted to buy one or owns one.
So, although it is a neat trick (still a cool tech achievement), saying it runs on a C64 is akin to saying I got Doom 3 running on a 386, but my 386 is actually a PCI card in a modern PC...
If I can't pull this off on my C64 with hardware available back in the day (or hardware one could realistically have built back in the day), I'm not sure saying "runs on c64" is correct.
Coming back to the subject of a REU, why has no one published a schematic for one yet? There are cheap SRAM chips floating around on eBay; it should be trivial to put one together. Unfortunately it isn't, because the original (SuperCPU) had two components we'd need a beefy FPGA to emulate: the SuperCPU itself and its DMA controller, which was a custom ASIC, I believe.
Perhaps as cheap(er) FPGAs or uCs with FPGA-like functionality become available, someone will create an open-source "SuperCPU". As of yet, everyone I've ever heard of using these uses emulation. Nothing wrong with that, but I get the most out of my "retro hobby" by running original hardware. Emulation is very useful for dev, but for general use it's a bit "meh" for me.
> If I can't pull my c64 with hardware available back in the day (or hardware one could realistically built back in the day) I'm not sure saying "runs on c64" is correct.
A 16MB REU could absolutely have been built in the 80s. It would have been absolutely astronomically expensive, but there’s no technical reason it could not be done. You seem to be confusing the SuperCPU with a plain REU expansion — the REU is just a bunch of RAM and an ASIC that talks to the 64 and allows it to store or retrieve banks of RAM (because obviously a 6502 cannot address more than 64K so instead you have to tell it to swap out system RAM) — there is no CPU on it.
The SuperCPU (65816) can indeed address up to 16MB directly and that is a different thing. The project in the OP runs on a stock C64 on the stock C64 CPU, it just needs a mountain of RAM that would have cost the equivalent of a house back then ;)
Is there any reason why the FPGA couldn't just be replaced with, for example, an on-device generic ARM SoC like an RPi Pico/Nano doing the same in software?
Or is that how far your purism in anti-emulation goes? (;
> but my 386 is actually a pci card in a modern pc...
Now I want one of those. I guess these days you could easily fit a 386, 486, Pentium and who knows what other SoCs on a single PCIe card, passively cooled…
This is exactly how some SPARC workstations offered x86 compatibility back in the day IIRC.
And how the Commodore Amiga 2000 provided a PC-compatible expansion card, funnily enough.
This is marginally "on a commodore 64" if it requires a 16MB memory addition
Not Linux related, but I've been trying recent (at least to me..) C-64 accessories:
I've tried the "Kung Fu Flash" - it's a cheap software-defined cartridge, just a single STM32, and it can do pretty much everything. I bought it because I'm trying to duplicate the developer experience I see on "8-bit show and tell" - it can emulate the "Super Snapshot", but not the REU. It's a really nice way to quickly try a lot of C-64 software and games.
https://8bithardware.wixsite.com/website/post/kung-fu-flash
https://github.com/KimJorgensen/KungFuFlash
I also have an SD2IEC: what I've learned is that it would have been useful to get a variant with an extra DIN socket. It's nice, but I was never a fan of the C-64's DOS, and this reinforces it. To mount a D64 disk image you have to: OPEN1,8,15,"CD:MYIMAGE.D64":CLOSE1... yuck..
JiffyDOS (replacement ROM for the C-64) improves this (it's faster and includes a permanent DOS wedge), I bought one- it's on the way. I'm curious to try it with the real 1541 drive.
What got me started on this recently is the "Penultimate +2" cartridge for the VIC-20:
https://www.youtube.com/watch?v=eNGyneXHKJQ
In this case, I basically bought a VIC-20 just to try out the cartridge.
A demonstration of Turing equivalency. Any Turing complete computer can do what any other Turing complete computer can do if you don’t care about time.
Time and memory.
Yeah, so technically a Turing machine has infinite memory.. so no real-world computer is fully Turing complete.
A Turing machine has infinite tape. Which can be implemented in terms of RAM but can just as well be implemented in terms of sequentially addressable IO, and pretty much every real-world computer has that, so it's a meaningless technicality as while there are practical limits on our ability to feed it more input, those are not conceptual limits of the machine.
How do you run Linux in lambda calculus?
Slowly. Very slowly.
This raises the question: what is the oldest hardware that can boot modern Linux but still be used as a daily driver?
The main "daily driver" constraint is probably the crypto required to access most modern websites. You can make the leanest and meanest system you can to run great on the slowest machine but the internet is nowadays an unforgiving place.
Surely video encoding / decoding is more compute intensive than the crypto. Taking video calls is a reasonable part of being daily driver capable.
Video delivered in real time over an encrypted connection, this is a double whammy.
You need to both decrypt and decode at above the framerate of the video; I doubt that will be doable on any older hardware, unless said hardware has dedicated components for those functions.
If I had to implement this on an old CPU I would likely be passing network, video and encryption off to co-processors and the older chip will effectively only be running control information.
But at that point, why not just use a modern low-power chip?
Assuming "daily driver" requires a modern web browser running modern web productivity apps I'd put the minimum at a Core 2 Duo with 4 GB memory. It wouldn't exactly be snappy but with a bit of patience you shouldn't be limited by the hardware. Throw in a GPU with hardware video decoding and you might even be able to watch YouTube in above-potato quality.
I've got a core 2 duo with 2GB RAM that I used for around 6 hours yesterday to write an application.
Only slightly noticeable waiting times when I accessed some sites, but it worked and the application works too.
Which distro, though?
> Which distro, though?
Linux Mint Vanessa, running the Mate DE.
The program is a C program with a single Makefile. My workflow was (and still is, even on my desktop) using a Vim with three vertical splits:
1. A LHS split which is a terminal to run make and execute the program for testing
2. A RHS split with the program source code (single file program).
3. A middle split with the test input file and test output file (in horizontal splits).
Although it is just a single file, on my other C projects I've used the same laptop, with the same 3-vert-split Vim, with multiple tabs, so up to maybe 16-20 source files open at a time for a single project.
Building C projects is very fast, even on the Core 2 Duo/2GB RAM setup. Running a similar workflow in VSCode on my desktop is less snappy than Vim on the laptop.
I haven't tried doing a Go project on that laptop yet with VSCode, but I am tempted to see what happens :-)
I tried Fedora (Gnome) on a 2GB machine and it was struggling a bit (mind you, there were several services, such as PackageKit, taking too many resources).
But now I can tell why your experience was good. Mate is certainly a phenomenal desktop environment, so 2GB is probably more than enough for a daily driver.
For the sake of argument, let's say a computer where you can install Debian 12 and run a WM and a browser, and it's not excruciatingly slow.
I think you'd want to aim somewhere around the Pentium 4 / Athlon XP era. The docs say it doesn't support the original Pentium, so I suppose you could go back as far as the Pentium II if you really want to suffer.
Three years ago I tried to run a modern distro on a Pentium 3 (without compiling anything myself). It turned out I wasn't able to, due to an "invalid opcode" error inside systemd. I switched to Devuan (a Debian fork without systemd) and it booted and was about as usable as the first Raspberry Pi.
Damn, and I still remember when the Pentium 4 was the epitome of speed and I had to make do with a Pentium 3 and even a 2.
Pentium 4 was the epitome of heat. The Athlon XP was generally faster and cheaper during that era.
Probably something with a Core 2 Duo. Though I'm sure a 230MHz CPU will "run" Linux with a desktop just fine.
Definitely not modern Linux. When I got Slackware 2.0 in the summer of 1995, I owned a 75MHz Pentium with 8MB RAM, a Trident card capable of 1024x768 (X could only handle 800x600 on it), and an IDE CD-ROM and HDD.
The bar for "daily driver" is different for different people's requirements. Would streaming Netflix be included? Running simple games?
Is this different from Lunix?
LUnix is an actual C64-native operating system that you can write apps for and run on C64 hardware directly. This is a RISC-V emulator running on C64 emulating a Linux boot up.
thanks. had hit 'post' too soon, but diving in, i see the emulator aspect of this. looks like it can't run in just the standard 64k though, and needs (much?) more memory?
this is linux
This isn't Linux running on C64 per se. This is C64 emulating a RISC-V environment on which Linux runs.
Still impressive of course, but semantics matter :)
Semu-ntics to be precise ;-)
I wonder if, instead of requiring the REU, it could work by using a few dozen floppy discs as RAM, prompting the user to swap discs as needed.
I’d be interested in watching a time-lapse video of that on real hardware, if someone has a couple of months/years to spare. ;)
Once you add more RAM to a Commodore 64, is it still a Commodore 64?
Yes, if you use an REU, which is a period-correct memory upgrade for the C64.
Expanding on Johnwbyrd's comment nearby:
-- Commodore sold a Ram Expansion Unit named "1764" to bring the C64 to 256KB of RAM;
-- it was possible to use the REU for the C128 named "1750" to bring the C64 to 512KB of RAM;
-- and it is possible to expand on that to have a 2MB REU for the C64 - see https://www.neperos.com/article/rlut8ce90fbb7701
You can have two megabytes on the C64, pretty "legally".
I can imagine "someone" back in the day could have taken PC SIMM modules and cobbled together some monstrosity that would allow one to fill 16MB of RAM on a C64 using simple bank switching. However, the main "innovation" of the original and later REUs wasn't the memory amount, but the chip that implemented DMA. That DMA chip could be used to copy RAM contents very quickly with minimal CPU involvement. This is why a C64 equipped with the REU has much better graphics capabilities (used for background animation etc.).
As far as I know, we still don't have an open-source equivalent of that DMA chip.
No, it's a Commodore 16384.
(The max addressable memory with a C64 REU is 16 Megabytes.)
I recently came into possession of a fully-functioning TRS-80 Model 4, and I fantasize regularly about putting some vaguely Unix-esque thing on it. The fantasy continues.
You should be able to boost that up to 128k. Once there you have a solid chance of being able to run Fuzix on it.
Start there and you’ll find a rabbit hole of reasonable depth.
Jesus, the documentation for Fuzix is... well, there isn't any is there! This will be an adventure.
Minimal FORTHs (https://en.wikipedia.org/wiki/Forth_(programming_language)) can run on an unexpanded VIC-20 (5K) or even early TRS-80 Model 1 (4K) -- with room and functionality to spare...
On the VIC-20, you even get a few colors!
That 16MiB memory requirement makes this rather disappointing, given that you can run Linux on machines with only 4 MiB of RAM: https://tldp.org/HOWTO/4mb-Laptops.html#toc3
That’s written for linux 2.2.x at best, SysV init, old versions of bash (or maybe even ash) and whatnot.
I don’t think you’ll manage with a recent linux kernel. Heck, even 2.6-era stuff won’t fit easily.
Now someone needs to do the same with a ZX spectrum ;-)
As others have mentioned, a 6502 is very poorly suited to C-style code, but a Z80 should be somewhat better with that.
How long does a kernel recompile take?
But will it impact the sales of GEOS?
Gah, this reminds me that I need to finish my "Risc-v on a Gameboy" lolproject.
RISC-V is inevitable.
What's the BogoMips?
The loading screen reports 130 BogoMIPS, but remember that it's emulating the timer as well, so the number is meaningless.
I assume that's the "warp speed" BogoMIPS, on real hardware the number would be around 1.
AFAIK, C64 software cannot detect the difference between normal and warp mode in normal circumstances. Warp mode speeds up the emulation, but all of the internal timings are still accurate to the C64 in a relative sense.
I have to wonder how the Bogomips are calculated on a machine with no RTC.
Run Neofetch!
So, I like Linux and I love my C64, but... Linux is for computers too primitive to come with their own kernel, and... the C64 comes with a kernel and shell right from the factory :P
The Commodore 64 doesn't come with a mere kernel, it comes with a mighty KERNAL
https://en.wikipedia.org/wiki/KERNAL
> The KERNAL was known as kernel[6] inside of Commodore since the PET days, but in 1980 Robert Russell misspelled the word as kernal in his notebooks. When Commodore technical writers Neil Harris and Andy Finkel collected Russell's notes and used them as the basis for the VIC-20 programmer's manual, the misspelling followed them along and stuck.[7]
> According to early Commodore myth, and reported by writer/programmer Jim Butterfield among others, the "word" KERNAL is an acronym (or, more likely, a backronym) standing for Keyboard Entry Read, Network, And Link, which in fact makes good sense considering its role. Berkeley Softworks later used it when naming the core routines of its GUI OS for 8-bit home computers: the GEOS KERNAL.
> Jim Butterfield
I had a 6502 machine language book of his as a kid. I figured out in my head what I thought I wanted to do with the various instructions, then wrote out on graph paper the (decimal) numbers for the ops and their args, then transcribed the whole affair into memory manually via POKEs. Good times.
Yeah, I should have figured that HN would probably be a place where it'd be okay to write kernal, but then again, someone might point out the spelling difference between the penguin one and the chicken lips one :P
The mighty KERNAL is how us mere mortals can JSR FFD2. (I think that's right)
> JSR [$]FFD2. (I think that's right)
Yes, that's the "print char in A and advance the screen position" entry in the KERNAL jump table, which jumps to the actual subroutine.
Yup! Crazy how you can remember the dumbest shit after years.
I wonder what the "Network" part of that referred to? The C64 didn't have much networking capability built in.
Yes, but it’s fun :-)