The Commodore 64 is one of the most beloved computers of all time, with a massive library of games, demos, and productivity software. Its distinctive sound, colorful graphics, and iconic design have left an indelible mark on computing history. On the other hand, the Atari ST is a powerhouse of 16-bit computing, with its crisp graphics, MIDI capabilities, and a loyal fanbase of its own.

So, why bring the C64 to the Atari ST? For starters, it’s about nostalgia. Many grew up with the C64 and fondly remember its games and software. But it’s also about exploration. The C64 Emulator for the Atari ST by Uwe Seimet allows ST users to experience the vast library of C64 software without needing to own a physical C64. It’s a way to preserve the past while celebrating the ingenuity of retro computing.

Short Description of the C64 Emulator for the Atari ST

The C64 Emulator replicates a system consisting of a Commodore 64, a compatible printer, and as many floppy drives as have been configured. The disk drives are assigned the following device numbers: A=8, B=9, and so on. The printer can be accessed as usual under device number 4. The printer emulation is designed for Epson-compatible printers, and various print modes can be activated using secondary addresses ranging from 0 to 10.
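
The drive mapping described above amounts to a fixed offset from the drive letter. A minimal sketch in C, assuming uppercase ASCII drive letters; the helper name `device_for_drive` is invented for illustration and is not taken from the emulator:

```c
#include <stdint.h>

/* Device numbers as stated in the text: printer = 4, first disk drive = 8. */
enum { PRINTER_DEVICE = 4, FIRST_DISK_DEVICE = 8 };

/* Map an ST drive letter ('A', 'B', ...) to a C64 device number (8, 9, ...).
   Returns 0 for anything that is not an uppercase letter.
   Hypothetical helper, for illustration only. */
static uint8_t device_for_drive(char letter)
{
    if (letter < 'A' || letter > 'Z')
        return 0;
    return (uint8_t)(FIRST_DISK_DEVICE + (letter - 'A'));
}
```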

At the heart of the emulation is the built-in 6502/6510 emulator, which can execute "all documented instructions" of this processor. On the Atari ST, the emulator achieves a speed equivalent to a 6510 clock frequency of nearly 0.4 MHz. (For comparison, the 6510 in the C64 runs at just under 1 MHz.) This makes the C64 Emulator significantly faster than other emulators available for the ST. Disk operations, in particular, are much faster than on the original C64. Additionally, the RS232 interface can now be used at baud rates that were not feasible on the C64 due to speed limitations.

The emulator’s keyboard layout matches the C64’s as closely as possible. So, forget the labels on the ST keyboard—you’re now working with a C64!


Which Programs Run on the C64 Emulator for the Atari ST?

Programs that do not run or do not run flawlessly are those that:

  1. Perform complex graphics operations

  2. Attempt to use RAM located under ROM

  3. Set up custom timer or IRQ routines

These limitations primarily affect games. However, many other programs—whether written in BASIC or assembly—run perfectly. This makes the emulator a great tool for exploring 65xx programming.

Characters or graphics that are written directly to the C64’s screen memory using POKE commands, or points on the graphics screen, do not normally appear on the ST’s display. This is to avoid the performance overhead of checking every access to the screen memory or bitmap. However, you can toggle the display of these elements using function keys, though this results in a slight speed loss of just over 1%. The current settings, along with other information, can be checked using the HELP key.


Resolution and Compatibility

The emulator runs in any resolution. If started in medium resolution, it automatically switches to low resolution to make full use of the available color capabilities. When exiting the program, the original resolution is restored.

The VERIFY routine is not implemented, as it is rarely needed on the ST. The emulator always returns an "OK" status since no comparison is performed.


Floppy and Printer Emulation

[Image: Atari ST C64 Emulator opening screen]

The following disk commands have been implemented:

  • S: Delete files

  • R: Rename files

  • C: Copy files

  • T: Set or remove write protection

  • I: Initialize floppy disk

  • U9: Floppy reset

Other commands, such as those for formatting, are ignored or generate an error message that can be retrieved via the command channel. Note that there is only one error channel for all configured drives.

Up to 10 floppy files can be opened simultaneously, though relative files are not yet supported.
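
The command set listed above could be recognized with a dispatcher along these lines. This is a sketch, not the emulator's parser: the enum, the function name, and the choice to lump every other command into a single "unsupported" result are assumptions, and no particular error text is implied.

```c
#include <string.h>

typedef enum {
    CMD_SCRATCH,      /* S: delete files */
    CMD_RENAME,       /* R: rename files */
    CMD_COPY,         /* C: copy files */
    CMD_PROTECT,      /* T: set or remove write protection */
    CMD_INITIALIZE,   /* I: initialize floppy disk */
    CMD_RESET,        /* U9: floppy reset */
    CMD_UNSUPPORTED   /* everything else, e.g. formatting */
} DiskCommand;

/* Classify a command string sent to the command channel. */
static DiskCommand classify_command(const char *cmd)
{
    if (strncmp(cmd, "U9", 2) == 0)
        return CMD_RESET;
    switch (cmd[0]) {
    case 'S': return CMD_SCRATCH;
    case 'R': return CMD_RENAME;
    case 'C': return CMD_COPY;
    case 'T': return CMD_PROTECT;
    case 'I': return CMD_INITIALIZE;
    default:  return CMD_UNSUPPORTED;
    }
}
```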


Details of the Commodore 64 Emulator for the Atari ST

Anyone who has spent time tinkering with computers knows the frustration of being unable to run a desired program on their own machine. The problem is particularly common with professional software, which is often written for operating systems such as MS-DOS or CP/M. Because these systems typically run on processors other than the 68000, their programs cannot normally be executed on the Atari ST.

The problem is not insurmountable, however. Emulators have been developed that allow programs written for other processors or operating systems to run on the ST. Macintosh emulators are especially notable: since both the Macintosh and the ST use the 68000, it is possible not only to run the Macintosh operating system on the ST but also to execute programs faster than on the original Mac, because the 68000 in the ST is clocked at a higher frequency.

Regrettably, this kind of emulator is likely to remain an exception, as most popular operating systems are designed for machines that do not use the 68000 family. In such cases the only viable solution is to emulate the corresponding processor in software. Some loss of speed is unavoidable, but acceptable performance can still be attained through careful programming, although not all emulators on the market achieve this equally well.

The C64’s operating system has been implemented as faithfully as possible on the ST. This does not mean that the ST can now run C64 games, as the architectures of the two computers are fundamentally different. Nevertheless, it has proved possible to build a functional emulator that replicates most of the C64’s capabilities, apart from certain specialized features. This article focuses on the considerations involved in programming such an emulator. Because the task requires knowledge of assembly language, it will mainly interest assembly programmers. A brief description of some features of the C64 emulator is also provided. For more detailed instructions, refer to the emulator diskette.


Understanding the Processor

Before implementing a specific operating system, the first step is to gather detailed information about the processor being emulated. (This assumes, of course, that you’re already proficient in 68000 programming.) To work with the operating system itself, you first need a solid emulation of the processor. For well-known processors like the 6502 (used in the C64), finding relevant literature isn’t a problem. Key details include how the processor handles flags for different instruction types and any unique characteristics of the processor. For example, the Carry flag doesn’t have the same meaning in all processors, and the 6502 has its quirks in this regard. If you’ve worked with the processor you’re emulating before, it will be a significant advantage during programming.
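
One well-known example of such a quirk is subtraction. On the 6502, SBC treats the Carry flag as an inverted borrow (Carry set means no borrow), whereas the 68000 sets Carry when a borrow occurs. The sketch below models binary-mode SBC in C. Decimal mode is deliberately left out, and the type and function names are illustrative assumptions rather than anything from the emulator.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t a;      /* accumulator */
    bool    carry;  /* 6502 C flag */
} Acc;

/* Binary-mode SBC: A = A - M - (1 - C).
   Afterwards C is set when NO borrow was needed, which is the
   opposite of the 68000's convention for SUB. */
static void sbc_binary(Acc *r, uint8_t m)
{
    unsigned borrow_in = r->carry ? 0u : 1u;
    unsigned diff = (unsigned)r->a - (unsigned)m - borrow_in;
    r->carry = (diff <= 0xFFu);   /* no wrap below zero means no borrow */
    r->a = (uint8_t)diff;
}
```

An emulator that runs the subtraction on the 68000 therefore has to invert the host's Carry before storing it as the 6502's Carry.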


The Speed Challenge in the Making of the C64 Emulator for the Atari ST

When programming the C64 Emulator for the Atari ST, the speed of instruction interpretation is critical. Since the 68000 can’t directly execute 6502 opcodes, each one must be interpreted, much like a BASIC interpreter handles a BASIC program. For every 6502 opcode, a corresponding routine in 68000 assembly must be developed to perform the equivalent actions. This is why a software emulator on the ST can’t match the speed of the original system. Even though the 68000 runs at 8 MHz compared to the 6510’s roughly 1 MHz, the overhead of interpreting each instruction costs more than the higher clock rate gains. The 68000 can carry out the work of an individual opcode faster than the 6502, but the time spent determining which routine to call for each opcode adds up, leaving the original 6510 faster overall. However, as processor clock speeds continue to increase, it’s only a matter of time before 8-bit processors can be emulated at their original speeds.


Optimizing Instruction Interpretation

The process of interpreting the next opcode repeats for every new 6502 instruction, so optimizing this step is crucial. Saving even one clock cycle here can significantly improve emulation speed. One common approach is to load the opcode into a data register and calculate the address of the corresponding emulation routine. For example:

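LOOP:  CLR D0                      ; clear the index register
       MOVE.B (A0)+,D0             ; fetch next 6502 opcode, advance A0
       ASL #2,D0                   ; opcode * 4 = offset into the table
       MOVE.L (A1,D0),A3           ; Algorithm 1 (A3 = scratch, so A0 survives)
       JSR (A3)                    ; call the emulation routine
       BRA LOOP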

Assuming A0 points to the next opcode in the 6502 address space and A1 points to a table of jump addresses for the emulation routines, this code works but is relatively slow. Replacing the ASL instruction with two ADD instructions and restructuring the routine can improve speed:

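       CLR D0                      ; clear the index register
       MOVE.B (A0)+,D0             ; fetch next 6502 opcode, advance A0
       ADD D0,D0                   ; two ADDs replace ASL #2
       ADD D0,D0
       MOVE.L (A1,D0),A3           ; Algorithm 2 (A3 = scratch, so A0 survives)
       JMP (A3)                    ; jump straight to the routine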

This version avoids time-consuming subroutine calls, but the dispatch sequence must be duplicated at the end of the routine for every opcode. While this makes the emulator larger, it significantly improves speed.
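
For readers more comfortable with C than with 68000 assembly, the table-driven dispatch of Algorithms 1 and 2 can be sketched as follows. This only illustrates the idea and is not the emulator's code: the `Cpu` structure, the handler names, and the choice of sample opcodes are assumptions, and a C compiler will not reproduce the cycle counts discussed here.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t  mem[65536];  /* 6502 address space */
    uint16_t pc;          /* program counter */
    uint8_t  a;           /* accumulator */
    int      halted;
} Cpu;

typedef void (*Handler)(Cpu *);

static void op_lda_imm(Cpu *c) { c->a = c->mem[c->pc++]; }  /* $A9: LDA #imm */
static void op_nop(Cpu *c)     { (void)c; }                  /* $EA: NOP */
static void op_stop(Cpu *c)    { c->halted = 1; }            /* stand-in for all other opcodes */

static Handler table[256];

static void init_table(void)
{
    for (size_t i = 0; i < 256; i++)
        table[i] = op_stop;
    table[0xA9] = op_lda_imm;
    table[0xEA] = op_nop;
}

/* Fetch the opcode, look up its routine, call it, repeat (cf. Algorithm 1). */
static void run(Cpu *c)
{
    while (!c->halted) {
        uint8_t opcode = c->mem[c->pc++];
        table[opcode](c);
    }
}
```

The `run` loop corresponds to Algorithm 1, where every routine returns to a central loop. Algorithm 2 removes that return by repeating the fetch-and-jump sequence at the end of each routine.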

An even faster approach uses 64 KB of additional memory to eliminate address calculations:

       MOVE.B (A0)+,LBL+2(A1)
LBL:   JMP $0(A1)                  ; Algorithm 3

Here, the opcode is used directly as a displacement in the jump instruction, reducing execution time to just 30 clock cycles—50% faster than Algorithm 2.
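
One way to read Algorithm 3: the MOVE.B patches the high byte of the 16-bit displacement inside the JMP instruction, so each opcode selects one of 256 slots spaced 256 bytes apart (256 × 256 bytes = 64 KB). Because the 68000 treats that displacement as a signed value, A1 would have to point at the middle of the block. The C sketch below only computes the resulting offsets under that reading; the function name is hypothetical.

```c
#include <stdint.h>

/* Offset (relative to A1) of the routine for a given 6502 opcode, assuming
   the opcode byte becomes the high byte of a signed 16-bit displacement
   whose low byte stays zero. */
static int32_t routine_offset(uint8_t opcode)
{
    int32_t disp = (int32_t)opcode << 8;   /* 0x0000 .. 0xFF00 */
    if (disp >= 0x8000)
        disp -= 0x10000;                   /* sign-extend as the 68000 does */
    return disp;
}
```

Under this assumption, opcodes $00 to $7F land in the upper half of the block and opcodes $80 to $FF in the lower half.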


Handling Registers and Flags

The next step is to decide where the registers of the emulated processor are to be kept. Here the answer is fairly straightforward. The 6502 has three registers (the accumulator, X, and Y), an 8-bit stack pointer, and a program counter. The 68000, by contrast, offers 15 registers, not counting A7, which serves as its own stack pointer. Mapping the 6502 registers onto the data and address registers of the 68000 is therefore not difficult: one address register holds the stack pointer, another holds the program counter, and three data registers hold the remaining 6502 registers, with only the low byte of each being used.

The 6502’s processor status register must also be accommodated. The flags of the 68000 generally cannot be used directly, as their behavior differs slightly from those of the 6502. For instance, unlike the 6502, the 68000 has no decimal flag. For most arithmetic operations, however, the flags behave consistently. It is therefore advisable to keep the flags in an additional data register and to transfer them to the 68000’s Condition Code Register (CCR) only when necessary. Once an arithmetic operation has set the flags in the CCR, they are copied back to the reserved data register. Note also that not every instruction modifies the processor status register.

By employing the registers in this manner, some registers remain available for the programmer’s use. These can be utilized to hold data that must remain accessible during emulation, such as a pointer to the 64K address space of the 6502 and to the 64K reserved for the C64 Emulator for the Atari ST, where the emulation routines for the individual 6502 opcodes are located. In principle, it is also possible to store the contents of the 6502 registers in memory. However, memory accesses are relatively time-consuming, making it impractical to achieve satisfactory performance speeds in this way.

When emulating processors with more registers than the 6502, such as the 8080, Z80, or 8086, the allocation strategy must be reconsidered. In such cases, care should be taken to minimize absolute memory accesses by carefully selecting register assignments, as these operations are particularly time-intensive. In the case of the C64 emulator, direct memory accesses were entirely avoided, with the address space of the 6502 processor being accessed exclusively through address registers.
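
The register allocation described in this section can be pictured as a small state record with the status flags kept apart from the host's own flags. The C sketch below is an illustration under assumptions: the bit values follow the standard 6502 status register layout, while the structure and function names are invented for the example.

```c
#include <stdint.h>

/* 6502 status register bits (standard layout). */
enum {
    FLAG_C = 0x01, FLAG_Z = 0x02, FLAG_I = 0x04, FLAG_D = 0x08,
    FLAG_B = 0x10, FLAG_V = 0x40, FLAG_N = 0x80
};

/* Emulated 6502 state. In the emulator each field would live in a 68000
   register; here they are plain struct members. */
typedef struct {
    uint8_t  a, x, y;   /* accumulator and index registers */
    uint8_t  sp;        /* 8-bit stack pointer */
    uint16_t pc;        /* program counter */
    uint8_t  p;         /* status flags, kept apart from the host flags */
} Regs6502;

/* Update only N and Z from a result, leaving the other flags untouched,
   as a load instruction such as LDA does. */
static void set_nz(Regs6502 *r, uint8_t value)
{
    r->p &= (uint8_t)~(FLAG_N | FLAG_Z);
    if (value == 0)
        r->p |= FLAG_Z;
    if (value & 0x80)
        r->p |= FLAG_N;
}

static void lda(Regs6502 *r, uint8_t value)
{
    r->a = value;
    set_nz(r, value);
}
```

Keeping `p` separate means flags such as D, which the 68000 does not have, survive operations that only touch N and Z.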


Addressing Modes and Challenges

Emulating certain 6502 addressing modes, like absolute addressing, can be tricky due to differences in how the 6502 and 68000 handle memory access. For example, the 68000 requires 16-bit words to be aligned on even addresses, while the 6502 has no such restriction. This means absolute addresses must be split into two bytes and reassembled, adding overhead. However, clever programming can minimize this cost.

When an 8-bit processor is emulated on the 68000, some addressing modes and instructions are easy to replicate, while others pose real difficulties, particularly when the goal is the fastest possible execution. Take the absolute addressing mode of the 6502. At first glance it looks unproblematic, but caution is required. The 68000 can only access 16-bit words that are aligned on even memory addresses. Violating this rule causes an address error, which the ST reports with the infamous "three bombs". In programs written for the 68000, instruction words and absolute addresses are naturally always aligned on even addresses, but no such restriction applies to 8-bit processors. An absolute address following a 6502 jump instruction may therefore lie at an odd address, which makes it impossible to load it into a 68000 register with a single instruction. Instead, the address must be read as two individual bytes. A further complication is that, unlike the 68000, the 6502 stores absolute addresses with the low byte first. Before such an address can be used, the two bytes must therefore be put into the correct order. The following routine shows how an absolute address can be loaded from memory into a data register:

       MOVE.B (A0)+,D0
       ASL #8,D0              ; Algorithm 4
       MOVE.B (A0)+,D0
       ROR #8,D0

In this routine, the two bytes for the absolute address are individually loaded from memory and then rearranged into the correct order using shift and rotate operations. While this approach is straightforward, it is important to note that shift and rotate operations are relatively time-consuming. Although Algorithm 4 is easy to understand, it requires 58 clock cycles, which is particularly disadvantageous given that absolute addressing is used frequently. However, if an address register is sacrificed, a significantly faster alternative can be implemented:

       MOVE.B (A0)+,-(A2)
       MOVE.B (A0)+,-(A2)     ; Algorithm 5
       MOVE (A2)+,D0

In Algorithm 5, A2 points to an arbitrary even address in memory where the two bytes are combined into a word and then transferred to D0. While this unconventional programming approach may seem cumbersome, it requires only 32 cycles, as the need for shift and rotate operations is eliminated. Unfortunately, the stack pointer (A7) cannot be used in the same way as A2 in this example, as the stack pointer is always incremented or decremented by a word (two bytes), even during byte operations, making it unsuitable for this purpose.
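
Both approaches can be mirrored in C. The sketch below is an illustration rather than the emulator's code: `fetch_shift` follows the shift-and-combine idea of Algorithm 4, while `fetch_scratch` imitates Algorithm 5 by writing the two bytes back to front into a small buffer and reading them as one big-endian word, which is what MOVE (A2)+,D0 does on the 68000. The function names are assumptions, and `mem` is expected to cover the full 64 KB address space.

```c
#include <stdint.h>

/* Combine low and high byte with a shift (the idea behind Algorithm 4).
   Works at any address, odd or even. */
static uint16_t fetch_shift(const uint8_t *mem, uint16_t addr)
{
    uint8_t lo = mem[addr];
    uint8_t hi = mem[(uint16_t)(addr + 1)];
    return (uint16_t)(lo | (hi << 8));
}

/* Imitate Algorithm 5: store the bytes in reverse order in a scratch
   buffer, then read the buffer as a big-endian word. */
static uint16_t fetch_scratch(const uint8_t *mem, uint16_t addr)
{
    uint8_t scratch[2];
    scratch[1] = mem[addr];                   /* low byte, written first */
    scratch[0] = mem[(uint16_t)(addr + 1)];   /* high byte, written second */
    return (uint16_t)((scratch[0] << 8) | scratch[1]);
}
```

Both functions return the same value; the difference that matters on the 68000 is the cycle count, which C does not expose.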

By employing such techniques, the challenges of emulating 8-bit processors on the 68000 can be addressed, though careful consideration must be given to optimizing performance while maintaining compatibility with the original system's behavior.

 


Implementing the Operating System

Once the processor emulation is complete, the next step is implementing the operating system. The C64’s operating system is less modular than systems like CP/M or MS-DOS, making emulation more challenging. Functions like screen output must be reimplemented to work on the ST, as the hardware differs significantly.

Once the actual emulation of the processor has been completed—though it remains uncertain how many errors may still be present—attention must then be turned to the implementation of the operating system. After all, the operating system is the first and most critical program that must be made functional. If the operating system runs flawlessly under the emulator, it can generally be assumed that few errors remain in the program, as all instructions of the emulated processor are likely to have been executed at some point.

Not every operating system can be equally well adapted to another computer. The operating system of the C64 leaves much to be desired in this regard. In the best-case scenario, there is a function number or jump vector for every important task the system is expected to perform, particularly for handling input and output. While the C64 does have a list of such jump vectors, it is unfortunately not as comprehensive as one might hope. In contrast, systems like CP/M and MS-DOS are far better equipped in this respect, offering significantly more functions than the C64, which simplifies their emulation. This is largely because these systems were designed to run on a variety of different computers, unlike the C64. Regardless of the system being emulated, all tasks invoked through such vectors or function numbers must be monitored by the emulator. For example, the C64 has a jump vector, BSOUT, for character output. Since the screen display of characters is implemented differently on the ST compared to the C64, modifications must be made at this point. The output to the screen cannot be handled as it would be on the C64, as this would result in no visible changes on the ST’s display. Instead, a custom output routine must be programmed. The same applies to many other functions of the C64’s operating system. Care must also be taken to ensure that the registers containing critical data for the emulator’s operation are not altered by these routines or are only modified in ways expected by the C64’s operating system. For instance, the LOAD routine is expected to return the end address of the loaded program in the index registers.
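
One common way to monitor such vectors, sketched here in C, is to compare the program counter with the entry point before dispatching and to run a native routine instead. Whether the emulator does exactly this is not stated in the article, so treat it as an assumption. The addresses $FFD2 (BSOUT, also known as CHROUT) and $FFD5 (LOAD) are the standard C64 KERNAL jump table entries; the structure, the function names, and the dummy end address are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define KERNAL_BSOUT 0xFFD2u  /* character output entry */
#define KERNAL_LOAD  0xFFD5u  /* LOAD entry */

typedef struct {
    uint16_t pc;
    uint8_t  a, x, y;
    char     out[64];   /* stands in for the ST screen */
    size_t   out_len;
} TrapCpu;

/* Native replacement for BSOUT: the character arrives in A. */
static void host_bsout(TrapCpu *c)
{
    if (c->out_len < sizeof c->out - 1)
        c->out[c->out_len++] = (char)c->a;
    c->out[c->out_len] = '\0';
}

/* Native replacement for the end of LOAD: report the end address
   in X (low byte) and Y (high byte). */
static void host_load_done(TrapCpu *c, uint16_t end_addr)
{
    c->x = (uint8_t)(end_addr & 0xFF);
    c->y = (uint8_t)(end_addr >> 8);
}

/* Returns 1 if the call was handled natively, 0 if the 6502 code should run. */
static int try_trap(TrapCpu *c)
{
    switch (c->pc) {
    case KERNAL_BSOUT: host_bsout(c); return 1;
    case KERNAL_LOAD:  host_load_done(c, 0x0900); return 1;  /* dummy end address */
    default:           return 0;
    }
}
```

After a trap has been handled, a real emulator would also have to perform the equivalent of the 6502's RTS so that execution continues at the caller; that step is omitted here.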

Speed Considerations of the C64 Emulator for the Atari ST

As previously mentioned, the emulator must be programmed carefully so that it runs as quickly as possible. Beyond optimizing the emulator’s own code, there are other ways to speed up programs on the ST, particularly when they do not run in the GEM environment. In such cases, vectors such as etv_timer, the GEM event timer vector, can be redirected to an RTS instruction so that the associated routines are no longer executed during interrupts. A side effect is that the clock in the control panel stops while the emulator is active. Since the clock is not needed for the Commodore 64 emulator, this is not a significant drawback, and the trade-off is increased speed. It may also be worthwhile to disable the mouse or, better yet, to take control of everything related to the keyboard processor.

Finally, the question of how much compatibility with the C64 can realistically be achieved on another computer should be addressed. When considering what made the C64 so successful, its vast library of games stands out as a primary factor. Games often take full advantage of the Commodore 64’s unique capabilities, such as sprites, raster interrupts, and timers. Since the ST’s hardware does not support sprites, and these graphical elements cannot be replicated in software without significant performance penalties, the emulation of C64 games is particularly challenging. Furthermore, executing the C64’s interrupt routines through the emulator would result in a noticeable slowdown, as interrupts are highly time-sensitive and occur frequently.

With these limitations outlined, it is worth noting that a large share of the programs that do not rely on these specialized features run successfully on the C64 emulator for the Atari ST. High-resolution graphics are also possible to a certain extent. However, the bitmap must reside in the memory range $E000-$FFFF, beneath the operating system, so that the emulator can detect when the graphics memory is accessed. Since monitoring accesses to the screen memory and bitmap is relatively time-consuming, disabling this output control further increases the emulator’s speed.
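
The trade-off can be illustrated with a store routine that inspects the target address only when the display check is switched on. This is a sketch under assumptions: the text screen is taken to be at its power-on default of $0400-$07E7, the bitmap at $E000-$FFFF as described above, and the names `Machine`, `store`, and `redraw_requests` are invented for the example.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t  ram[65536];
    bool     watch_video;       /* toggled by the user; off is faster */
    unsigned redraw_requests;   /* stands in for updating the ST display */
} Machine;

static bool is_video_address(uint16_t addr)
{
    return (addr >= 0x0400 && addr <= 0x07E7)   /* default text screen */
        || (addr >= 0xE000);                    /* bitmap under the ROM */
}

/* Every emulated store goes through here. The range check is the extra
   cost that the toggle lets the user avoid. */
static void store(Machine *m, uint16_t addr, uint8_t value)
{
    m->ram[addr] = value;
    if (m->watch_video && is_video_address(addr))
        m->redraw_requests++;
}
```

With `watch_video` off, a POKE to screen memory still changes the emulated RAM but never reaches the ST display, which matches the default behavior described earlier.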

In addition to emulating the C64, the emulator also includes printer and floppy drive emulation, closely matching the functionality of the 1541 drive. The program runs in both low and high resolutions, allowing every ST owner to load their own C64 experience from disk.


Limitations and Compatibility of the Atari ST C64 Emulator

The C64’s success was largely due to its games, which often relied on hardware features like sprites and raster interrupts. Since the ST lacks these capabilities, emulating C64 games is difficult. However, many other programs, especially those that don’t rely on these features, run well on the emulator. High-resolution graphics are also possible, though they require careful handling to avoid performance penalties.

The C64 Emulator for the Atari ST is a remarkable achievement, bringing the best of the C64 to the ST. While it has limitations, it opens up a vast library of software for ST users and serves as a testament to the ingenuity of retro computing enthusiasts. Whether you’re reliving nostalgic memories or exploring new possibilities, this emulator is a must-try.

For a closer look at the emulator in action, check out this video:
The C64 Emulator for the Atari ST - Watch Now!

Download the freeware Atari ST C64 Emulator here.

Happy retro computing! 🎮

by Uwe Seimet