In 1985 Beyond released Mike Singleton's excellent Lords of Midnight to the unsuspecting Commodore 64 gamer. The game had already been out on the Speccy for a year, though, so anyone not living under a rock at the time would have read the magazine reports on the conversion to other platforms. The PAL (EU) and NTSC (US) disk versions feature a fast loader, and the game loads almost instantly. Well, compared to standard Commodore 1541 transfer rates. The game also features an elaborate copy protection scheme. On the PAL disk, that is. You see, Beyond apparently licensed Mindscape to market the game in the US. Somehow Mindscape didn't get the memo that copy protection would be a must-have, so they opted out. But I digress. Back to the PAL version. Let's shed light on the copy protection (...attempt, I guess, as I played a pirated version back then). I will mark protection efforts with a 'P' and a number. For the impatient, the summary is in the table below. You can also download my disassembly of everything with my own comments.
P# | Protection | Description
1 | Obscure the XOR line | A machine code instruction (EOR) in the preloader is obscured by shifting and rotating bits in the bytes that make up the instruction |
2 | XOR encrypt the C64 loader | The bytes of the actual loader code have been XORed with the value $66. This value is tucked away in an area of crap bytes loaded in the pre-loader. |
3 | Add two sectors with checksum errors | Two dummy sectors have been added that have wrong checksums |
4 | Use a 40 track disk | Important code is stored on Track 36; normal copiers can't handle tracks beyond 35.
5 | Sectors have custom headers | Sectors on Track 36 have custom GCR headers that are looked for by loader code. |
6 | 1541 loader code is XOR encrypted | The two dummy sectors with the wrong checksum both give rise to error code $05, and that value is needed to XOR decrypt the code loaded from Track 36. |
7* | Data stored inverted | Game data is stored with bits inverted. This could also be just because of fast loader speed requirements. |
8* | Game start location needs to come from 1541 | The 1541 loader sends over the start offset of the game; it is not stored anywhere in the C64 loader. This could also just be part of a good loader.
The summary of the protection schemes used by Beyond in 1985. Download or read the source code!
Initial C64 load
The game is loaded by typing 'LOAD"*",8,1'. This means that the first file on the disk is loaded to the memory location stored as the first 16-bit value in the file, and then run from the program's start offset.
The program is loaded to $0308 and the start offset is $0334. This is where the first trick up their sleeves happens.
1. Unobscure the XOR line (P1)
The initial loader starts by disabling the screen so there are no badlines and loading can speed up. For some reason it also saves the current sprite enable state for later, in case errors occur. Very user-friendly. It then disables all sprites (of course, since they would steal raster time). But a line that follows a bit later, 'EOR $0316', is not visible because it has been obscured. The loader makes it appear by shifting left, rotating left and rotating right the three bytes that make up that instruction. After that, the program can continue.
2. DeXOR the actual C64 side loader (P2)
Since the EOR line has been revealed, what happens next is that the whole loaded data of 768 bytes, from $0365 onwards is XORed with the value at $0316 ($66, the shortened sign of the devil). So each byte is EORed with $66, revealing the true code. The program continues from the now valid code.
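A minimal Python sketch of that step, for those who prefer to see it run. Only the key ($66) and the region length (768 bytes) come from the text above; the function name `xor_decrypt` and the few plaintext bytes are made up for illustration and are not the real loader code.

```python
KEY = 0x66          # value stored at $0316 according to the text
REGION_LEN = 768    # number of bytes XORed, starting at $0365

def xor_decrypt(data: bytes, key: int) -> bytes:
    """XOR every byte with the same one-byte key (EOR on the 6502)."""
    return bytes(b ^ key for b in data)

# Hypothetical plaintext (SEI / LDA #$00 / RTS), padded with NOPs.
plain = bytes([0x78, 0xA9, 0x00, 0x60]).ljust(REGION_LEN, b"\xEA")

encrypted = xor_decrypt(plain, KEY)      # what would be stored in the file
decrypted = xor_decrypt(encrypted, KEY)  # what the preloader ends up with
```

XOR is its own inverse, so the same routine both encrypts and decrypts.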
3. Load the loader in the 1541 disk drive
What happens next is that the C64 side instructs the disk drive to load sector 0 of track 14: 256 bytes of internal 1541 loader code, down to the last byte. Really, this 1541 loader part is exactly 256 bytes. Awesome. It is loaded into disk drive buffer 2 ($0500). The C64 then jumps to its loader loop at $0489, waiting for the 1541 to start communicating.
1541 Initial loader
While the C64 is patiently waiting for the CLOCK line on the serial bus to become active (a signal from the 1541 to start reading incoming data), the 1541 first has other stuff to prepare. The initial loader code that was just loaded to buffer 2 is executed.
4. Read two sectors with checksum errors (P3)
Track 14 holds two dummy sectors with faulty checksums: sectors 3 and 14. Upon reading those, the 1541 DOS returns error code $05, which means a data block has a wrong checksum. The Beyond code has DOS read those two sectors, collects the two error codes and ANDs them. If both error codes were $05 then the final value is $05. This value is needed later. See shot from DirMaster below of one of those dummy sectors.
5. Get two code pages from Track 36 (P4)
Most C64 disks had a maximum of 35 tracks. As a result most copiers were only capable of copying data up to track 35. But a disk could be extended to 40 tracks, or even use half tracks, depending on the speed of writing to the disk. LOM uses 40 tracks and hides crucial 1541 loader code on track 36. The initial loader moves the 1541 head to track 36, sector 11. It then disables interrupts and takes control of the disk hardware via Port B of the disk controller at $1c00.
6. Generate custom sector headers and look for them (P5)
To load the two sectors with code on Track 36, custom sector headers are generated in GCR format and the drive is then directed to look for SYNC signals. When one is found, the sector header that follows (in GCR format) is compared to the custom generated one. Of note, the sector header to look for is, in normal format (DISK ID, DISK ID2, Track, Sector, Checksum), 'M T 36 11 Checksum'. The corresponding GCR bytes for the two sector headers are:
Track 36, sector 11: 52 67 65 6e 4e 7b 9d d5 29 4a
Track 36, sector 1 : 52 66 d5 2e 4e 7b 9d d5 29 4a
The checksums for the two sector headers are generated by starting from 0 and XORing in each of the following values in turn: ID1, ID2, TRACK, SECTOR.
Once the sectors have been identified, the GCR bytes are read and then converted to normal bytes. Sector 11 is stored in buffer 0 ($0300) and sector 1 in buffer 1 ($0400). Note that the order (in terms of sectors) is also switched, first 11 and then 1, probably another attempt at confusing people.
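The checksum and the GCR bytes can be rechecked with a short Python sketch. It assumes the usual 1541 header byte order (block marker $08, checksum, sector, track, ID2, ID1, two padding bytes) and assumes the padding bytes are $00, since that is what reproduces the ten bytes listed above. The function names are mine, not Beyond's.

```python
# Standard 1541 4-bit to 5-bit GCR table, indexed by nybble value.
GCR = [0x0A, 0x0B, 0x12, 0x13, 0x0E, 0x0F, 0x16, 0x17,
       0x09, 0x19, 0x1A, 0x1B, 0x0D, 0x1D, 0x1E, 0x15]

def header_checksum(id1: int, id2: int, track: int, sector: int) -> int:
    """Start from 0 and XOR in ID1, ID2, TRACK, SECTOR."""
    checksum = 0
    for value in (id1, id2, track, sector):
        checksum ^= value
    return checksum

def build_header(id1: int, id2: int, track: int, sector: int) -> bytes:
    """Assumed layout: $08, checksum, sector, track, ID2, ID1, $00, $00."""
    checksum = header_checksum(id1, id2, track, sector)
    return bytes([0x08, checksum, sector, track, id2, id1, 0x00, 0x00])

def gcr_encode(data: bytes) -> bytes:
    """Turn every nybble into 5 bits and pack the bit stream into bytes."""
    bits, nbits, out = 0, 0, bytearray()
    for byte in data:
        for nybble in (byte >> 4, byte & 0x0F):
            bits = (bits << 5) | GCR[nybble]
            nbits += 5
            while nbits >= 8:
                nbits -= 8
                out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1   # drop the bits already emitted
    return bytes(out)

ID1, ID2 = ord("M"), ord("T")
header_11 = gcr_encode(build_header(ID1, ID2, 36, 11))
header_1 = gcr_encode(build_header(ID1, ID2, 36, 1))
```

Under those assumptions the checksums come out as $36 for sector 11 and $3C for sector 1, and the encoded headers match the two byte strings given above.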
7. Decode the loader code at $0300-$04ff by XORing (P6)
XOR encryption was popular (and still is, by the way, on most major platforms). Beyond was also fond of it. The two sectors that were just stored in buffers 0 and 1 are encrypted. Remember those two sectors with checksum errors (see step 4)? How the value of $05, the error code for a bad data block checksum, was stored? Well, that is now used to decrypt the buffers. Each byte is XORed with $05 to get to the actual code. It is still awesome that the code to load the actual loader took exactly 256 bytes, one sector. Hats off to you, sir! So now the actual loader code is run from $0300.
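Here is a small Python sketch of why this ties the decryption to the bad sectors. The $05 error code and the AND come from the text; the $01 "OK" job code is the standard 1541 value, and the idea that a copy ends up with two good sectors is my assumption about what a checksum-repairing copier would produce. The three plaintext bytes are illustrative only, assuming the jump at $0300 is a plain JMP $0389.

```python
JOB_OK = 0x01             # 1541 job code: sector read without error
JOB_BAD_CHECKSUM = 0x05   # 1541 job code: data block checksum error

def derive_key(error_a: int, error_b: int) -> int:
    """AND the two error codes, as the initial 1541 loader does."""
    return error_a & error_b

def xor_buffer(data: bytes, key: int) -> bytes:
    """XOR every byte of a buffer with a one-byte key."""
    return bytes(b ^ key for b in data)

code = bytes([0x4C, 0x89, 0x03])      # illustrative plaintext: JMP $0389
on_disk = xor_buffer(code, 0x05)      # how it would sit on track 36

key_original = derive_key(JOB_BAD_CHECKSUM, JOB_BAD_CHECKSUM)
key_copy = derive_key(JOB_OK, JOB_OK)  # assumed: copier "fixed" both sectors

from_original = xor_buffer(on_disk, key_original)
from_copy = xor_buffer(on_disk, key_copy)
```

With both error codes at $05 the code decrypts cleanly. With any other combination the key is wrong and the drive would run garbage.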
1541 Loader
So now the actual loading can begin, and the C64 has been patiently waiting. At 1541 $0300 there is a jump to $0389. On the C64 side there was a jump to $0489, if you remember. The programmers mirrored the code layout on the 1541 and C64 sides, for timing reasons and for clarity. I like that. Yes, very agreeable and structured.
The 1541 starts by activating the CLOCK line on the serial bus, basically signalling: "Get ready, I am soon starting the process!". But before this process can begin, sector 4 of track 14 needs to be loaded to buffer 4 ($0700); this holds the CHUNKS information. Data is sent over to the C64 in 3 chunks of varying size. The 1541 sends information to the C64 per CHUNK: where to store the incoming chunk, the size of the incoming chunk, and the checksum of the incoming chunk. This information is loaded from that sector in buffer 4. The byte at $0700 holds the number of chunks that are going to be sent. Sending this byte over to the C64 starts the whole process.
8. CHUNK sectors are stored in a standard interleaved manner
Before going on about the next steps in the process, I just want to mention that the CHUNKS that are sent over are stored on the disk in the standard DOS manner with an interleave of 10, moving down a track whenever the end of a track is reached.
Beyond copied the 1541 DOS code into their own routines to do this. I am not sure I would call this copy protection, though.
9. Data is stored inverted for easy sending over (P7?)
However, the way the data is stored in the sectors does feel like it helps both efficiency and copy protection. The bytes (after GCR conversion) have their bits inverted, you know, what you get if you EOR with $ff. Basically, NOTing every bit. Anyone reading the sectors with a copier wouldn't make sense of them, unless it occurred to them that they might be inverted. Yet, we also have to consider that we are talking about a fast loader, and the way the serial bus is organized in the C64 leaves a lot to be desired. Specifically, Commodore forgot to invert any incoming signals from the disk drive on the CLOCK and DATA lines. This causes every bit that comes in from those lines to be the opposite of what the 1541 thinks it is sending. As a programmer I need to know what is going out and what is coming in. I am only interested in bit states. After all these years, I did not find a table like the one I show below, clearly connecting $dd00 and $1800 and showing what happens when one sends a bit over to the other. Re-fucking-diculous. So here it is, finally. I hope some poor chap who was looking for it will have use of it. As a bonus, I add the wiring schematic I created first.
Thus, every bit the 1541 is sending over the CLOCK or DATA line is coming in inverted. So, either you send the bits pre-inverted so they come out fine post-inversion, or you have the C64 do the inversion. By opting to store the data already inverted I think Beyond killed two birds with one stone. Every bit sent over will end up fine on the C64 side, a nice speed bonus, and any wannabe-pirates looking at a sector will scratch their heads in confusion. Copy-protection is boosted!
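A toy Python model of the two-birds trick. It treats the bus as flipping a whole byte at once, which is a simplification of the real 2-bits-at-a-time transfer, and the function names and the example byte are mine.

```python
def store_inverted(byte: int) -> int:
    """What goes on disk: the game byte with every bit flipped (EOR #$FF)."""
    return byte ^ 0xFF

def through_serial_bus(byte_sent: int) -> int:
    """Simplified bus model: every bit arrives inverted at the C64."""
    return byte_sent ^ 0xFF

game_byte = 0xA9                        # example byte (LDA immediate opcode)
on_disk = store_inverted(game_byte)     # what a sector editor would show
received = through_serial_bus(on_disk)  # what lands in C64 memory
```

The two inversions cancel, so the C64 can store what it reads without an extra EOR in the time-critical loop.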
10. Sectors are sent to C64 in two parts of 128 bytes
The chunks are sent to the C64 by reading the sectors, as usual, but Beyond sends the data in each sector in two parts of 128 bytes. A chunk consists of multiple sectors of 256 bytes, and then some leftover bytes. The remaining bytes (those that do not fill a whole 256-byte page) are sent over to the C64 byte by byte using a different routine, actually the same routine that sends over individual bytes, confirmatory checks etc. The C64 knows when to switch to that other receiving routine as well by checking the same memory size parameters (i.e. no more pages in the hi byte, then load the number of bytes in the lo byte using the other routine).
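The split between the two routines is just the hi and lo byte of the 16-bit chunk size. A small Python sketch, using a made-up chunk size of $1234 since the real chunk sizes are not listed here; `transfer_plan` is a hypothetical helper name.

```python
def transfer_plan(chunk_size: int) -> dict:
    """Split a 16-bit chunk size into full pages and leftover bytes."""
    pages, leftover = divmod(chunk_size, 256)   # hi byte, lo byte
    return {
        "full_sectors": pages,          # sent with the fast 128-byte routine
        "bursts_of_128": pages * 2,     # two parts per sector
        "single_bytes": leftover,       # sent with the byte-by-byte routine
    }

plan = transfer_plan(0x1234)   # hypothetical chunk size
```

For $1234 that gives 18 full sectors (36 bursts of 128 bytes) followed by 52 bytes through the slower routine.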
11. Game start offset is sent over from the 1541 (P8ish)
Perhaps just part of a good load system, but it may also help if you haven't stored the starting location of the game on the C64 side. The loader receives it from the 1541 after all the chunks have been loaded. These are the final two bytes and then the whole process stops on both sides. The C64 side prepares the start and that's it!
This concludes the copy protection tricks used on the original C64 PAL version of Lords of Midnight by Beyond in 1985.
Fast loader speed
Finally, let's take a look at the fast loader employed. The loader that sends over the 256-byte sectors, not the one that sends individual bytes. Here's the 1541 side with port $1800, and the C64 side with port $dd00.
The routines start right after the C64 has brought the data line to inactive (5V) and the 1541 has detected this. This is the signal for the data to be transferred. Check the cycle table next.
The colours match the different loops in the code. The core loop is the loop where 4 times 2 bits are sent or received over the data and clock lines, making up one byte. That takes 29 cycles on either end, so it is exactly matched. The extended loop is the loop that loads or stores each byte and sends/gets the next byte until 128 bytes are done. Notably, that takes 19 cycles on the 1541, but 17 cycles on the C64. The pre-delay is cycles wasted before the extended loop begins. For the 1541, this is 2 cycles, but it is 49 cycles on the C64. This is done to make sure the 1541 has put valid data on the lines. See what that means for one byte transferred below. There is a gap of about 18 microseconds between sending and receiving, and 11 microseconds between receiving and sending of the next bits. Finally, there is the post-delay, which is cycles wasted after all bytes have been transferred, but before the CLOCK line is set active (0V) again (which is detected as a 0 bit on the C64 side, due to the missing inverter for the clock line; see figure above).
The 128 bytes are sent over without any further handshaking, so the code better be tight or else bits will be missed. A full loop for one byte takes 134 cycles on the 1541 side and 132 cycles on the C64 (the 2 cycle difference of the extended loop). If you just tracked the number of cycles, things would start to go out of phase after just a couple of bytes!
But they don't, because a cycle on the 1541 is shorter than a cycle on a PAL C64. The C64 has a 17.7 MHz crystal clock that gets divided by 18, giving a CPU clock of 0.98524 MHz, slightly slower than 1 MHz. As a result, each cycle takes longer to complete: not 1 microsecond, like on the 1541, but 1.015 microseconds!
134 cycles on the 1541 is roughly 134 microseconds, 132 cycles on the C64 is 132/0.98524 = 133.98 microseconds! Pretty close to the 1541, right? Good enough! There is a gap of 17.76 microseconds between the first 2 bits of byte 0 being transferred and received; that gap becomes 14.7 microseconds by the time the first two bits of byte 127 have been transferred. So there is definitely a decline in the gap, but it is still reliable for successful transfer of 128 bytes.
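The drift can be rechecked with a few lines of Python. The cycle counts and the 17.76 microsecond starting gap are the ones quoted above; the exact PAL crystal frequency of 17,734,475 Hz is a value I am adding (the text only says 17.7 MHz), and the 1541 is taken as running at exactly 1 MHz.

```python
PAL_CRYSTAL_HZ = 17_734_475      # assumption: exact PAL crystal frequency
C64_HZ = PAL_CRYSTAL_HZ / 18     # roughly 985,249 Hz
DRIVE_HZ = 1_000_000             # 1541 CPU clock, taken as exactly 1 MHz

drive_byte_us = 134 * 1e6 / DRIVE_HZ   # one byte loop on the 1541
c64_byte_us = 132 * 1e6 / C64_HZ       # one byte loop on the C64

# The C64 loop is slightly shorter, so the gap shrinks a little per byte.
drift_per_byte_us = drive_byte_us - c64_byte_us

gap_byte_0_us = 17.76
gap_byte_127_us = gap_byte_0_us - 127 * drift_per_byte_us

print(f"C64 cycle: {1e6 / C64_HZ:.4f} us")
print(f"per-byte loop: 1541 {drive_byte_us:.2f} us, C64 {c64_byte_us:.2f} us")
print(f"gap at byte 127: {gap_byte_127_us:.2f} us")
```

The drift works out to a few hundredths of a microsecond per byte, which is why the gap only shrinks by about 3 microseconds over the whole burst.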
All in all, the routine gets to the end (setting the CLOCK line to active) after 17181 microseconds on the 1541 and after almost 17200 microseconds on the C64, a gap of about 18 microseconds, the 1541 finishing before the C64.
Now, one can calculate the speed of the transfer of those 128 bytes. The 1541 is sending them over at a rate of 7450 bytes per second, or about 7.3 KB/second, a raw bit rate of about 59.6 kbit/s. The fast loader as a whole does not achieve those rates, as it has other work to do: it resyncs using handshaking, and leftover bytes are transferred via a slower routine. It loads the game in roughly 25 seconds from hitting return on LOAD. The chunks total 42968 bytes. So the actual speed, including overhead, to load ~43K from disk is about 1718 bytes/second, or 1.7 KB/second.
Still, we're talking 1985 and this is a massive upgrade from the standard 300-400 bytes per second, at which the game would take around 2 minutes to load! No time to pour a nice cuppa, buster! Get into the game!
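For anyone who wants to redo the speed arithmetic, here is a short Python sketch. All inputs are the figures quoted above (128 bytes in 17181 microseconds, 42968 bytes in roughly 25 seconds, 300-400 bytes per second for the standard routines); the variable names are mine.

```python
BURST_BYTES = 128
BURST_US = 17181            # 1541 time for one 128-byte burst
TOTAL_BYTES = 42968         # all chunks together
LOAD_SECONDS = 25           # rough wall-clock load time

burst_rate = BURST_BYTES / (BURST_US / 1e6)   # bytes per second in a burst
burst_bit_rate = burst_rate * 8               # bits per second in a burst
effective_rate = TOTAL_BYTES / LOAD_SECONDS   # bytes per second overall

# Same amount of data at the standard 1541 rates quoted in the text.
slow_seconds = TOTAL_BYTES / 300
fast_seconds = TOTAL_BYTES / 400

print(f"burst: {burst_rate:.0f} bytes/s, {burst_bit_rate / 1000:.1f} kbit/s")
print(f"effective: {effective_rate:.0f} bytes/s")
print(f"standard load: {fast_seconds:.0f} to {slow_seconds:.0f} seconds")
```

The gap between the burst rate and the effective rate is the overhead of seeking, reading and decoding sectors, handshaking, and the byte-by-byte leftovers.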