How Not To Release Historic Source Code


This is how to not do it: GitHub.

Don’t get me wrong, it’s absolutely brilliant that Microsoft was able to release fairly complete source code (minus DOSSHELL) for MS-DOS 4.00 or 4.01 (see below). As much as it was hated, DOS 4.0 was an important milestone, and DOS 5.0 was much more similar to DOS 4.0 than not. This source code will be an excellent reference for modern-ish DOS until Microsoft officially releases the long-ago leaked MS-DOS 6.0 source code. The source code includes all required build tools, which makes it, compared to many other source releases, extremely easy to build.

But please, please don’t mutilate historic source code by shoving it into (stupid) git.

First of all, git does not preserve timestamps, which causes irreversible damage. Knowing when a source file was last modified is valuable information.

Second of all, the people releasing the source code clearly thought, hey, it’s source code, let’s shove it into git, what could possibly go wrong. Well, this is what could go wrong:

Nope, not building

For practical purposes, old source files are not text files. They are binary files and must be preserved without modification. It is not OK to take an old source file and convert it to UTF-8. For one thing, UTF-8 didn’t even exist in the days of MASM 5.10 and Microsoft C 5.1, so of course old tools can’t deal with it!

The above problem was most likely caused by taking a source line containing codepage 437 characters and badly converting it to UTF-8. That made the line too long, exceeding MASM’s line length limit of roughly 512 bytes.
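Why does the conversion lengthen lines? Every CP437 byte above 0x7F becomes two or three bytes in UTF-8, and box-drawing characters, which were common in old comment banners, take three. The minimal Python sketch below uses a hypothetical comment line (not one from the actual release) and takes the “roughly 512 bytes” figure above as an assumed limit. It shows a line that fits comfortably on disk growing past that limit after conversion:

```python
# Assumption: MASM's line length limit is taken as roughly 512 bytes, per the text above.
MASM_LINE_LIMIT = 512

def converted_length(cp437_line: bytes) -> int:
    """Byte length of a line after decoding it as CP437 and re-encoding it as UTF-8."""
    return len(cp437_line.decode("cp437").encode("utf-8"))

# Hypothetical comment line: "; " followed by 200 CP437 box-drawing bytes (0xC4 is U+2500).
line = b"; " + b"\xc4" * 200

print(len(line))                                 # 202 bytes as originally stored
print(converted_length(line))                    # 602 bytes: each 0xC4 became 3 bytes
print(converted_length(line) > MASM_LINE_LIMIT)  # True: the line is now too long
```

Even a correct CP437-to-UTF-8 conversion causes this. No decoding error is needed, only more bytes per character.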

In the case of getmsg.asm, it’s easy enough to manually delete the overlong comment line. But it’s much worse with the src\SELECT\USA.INF file. Here, the misguided use of git not only made some comment lines too long for MASM, but also actively destroyed the original source code. The byte arrays defined near labels PANEL36 and PANEL37 got turned into junk, or more accurately into a sequence of Unicode replacement characters.
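Exactly how those bytes were mangled isn’t known. One plausible mechanism, assumed here, is that the file was decoded as UTF-8 with invalid bytes replaced by U+FFFD. The Python sketch below uses hypothetical byte values (CP437 shade characters, which are not valid UTF-8 on their own). It shows why such damage is irreversible: different original bytes collapse into the identical replacement sequence.

```python
def decode_as_utf8_lossy(raw: bytes) -> bytes:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD, then re-encode."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")

# Hypothetical byte arrays; 0xB0, 0xB1, 0xB2 are CP437 shade characters.
panel_a = bytes([0xB0, 0xB1])
panel_b = bytes([0xB2, 0xB2])

damaged_a = decode_as_utf8_lossy(panel_a)
damaged_b = decode_as_utf8_lossy(panel_b)

print(damaged_a)               # b'\xef\xbf\xbd\xef\xbf\xbd': two U+FFFD characters
print(damaged_a == damaged_b)  # True: different originals, identical damage
```

Because distinct inputs map to the same output, nothing in the damaged file can tell you what the original bytes were. Only an untouched copy can.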

This blunder is all the more regrettable because similar problems affected the earlier GW-BASIC source release (very old MASM versions cannot deal with UNIX-style line endings).
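Anyone building such a release can at least detect the problem before running an old assembler. Below is a small Python sketch of a hypothetical checker. It counts line feeds that are not part of a DOS-style CR LF pair:

```python
def bare_lf_count(data: bytes) -> int:
    """Count LF bytes not preceded by CR, i.e. UNIX-style line endings."""
    # Every CR LF pair contains exactly one LF, so subtracting the pairs leaves the bare LFs.
    return data.count(b"\n") - data.count(b"\r\n")

unix_style = b"mov ax, 1\nint 21h\n"
dos_style = b"mov ax, 1\r\nint 21h\r\n"

print(bare_lf_count(unix_style))  # 2
print(bare_lf_count(dos_style))   # 0
```

A nonzero count flags a file that an old MASM may choke on. Converting the endings back is only a workaround. It cannot recover files whose original endings were mixed or otherwise unusual.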

The timestamp destruction makes it harder to pin down what the source code actually is. The DOS 4.0 release was very confused because IBM first released PC DOS 4.0 in June 1988 (files dated 06/17/1988), but soon followed with a quiet update (files dated 08/03/1988) where the disks were labeled 4.01 but the software still reported itself as 4.00.

The just released source code almost certainly corresponds to this quiet 4.01 update. At least one source comment implies 8/5/88 modification, i.e. August 1988.

At least the core files (IO.SYS, MSDOS.SYS, COMMAND.COM, FORMAT.COM, FDISK.EXE, SYS.COM) built from the source release are a perfect match for the files on the “MS-DOS 4.00” disk images that can be found on winworldpc.
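How the comparison above was done isn’t stated. A byte-for-byte check of that kind can be sketched in Python by hashing each file. The helper names here are hypothetical, and the demo uses tiny stand-in files rather than real DOS binaries. Real use would point `built_dir` at the build output and `reference_dir` at files extracted from a disk image.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's exact bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def compare_files(built_dir: Path, reference_dir: Path, names: list[str]) -> dict[str, bool]:
    """Map each file name to True if the built and reference copies are byte-identical."""
    return {n: sha256_of(built_dir / n) == sha256_of(reference_dir / n) for n in names}

# Demo with stand-in contents (not real DOS binaries).
with tempfile.TemporaryDirectory() as tmp:
    built, ref = Path(tmp, "built"), Path(tmp, "ref")
    built.mkdir()
    ref.mkdir()
    (built / "IO.SYS").write_bytes(b"\xeb\x3c\x90")
    (ref / "IO.SYS").write_bytes(b"\xeb\x3c\x90")
    (built / "SYS.COM").write_bytes(b"\xe9\x00\x00")
    (ref / "SYS.COM").write_bytes(b"\xe9\x00\x01")
    result = compare_files(built, ref, ["IO.SYS", "SYS.COM"])
    print(result)  # {'IO.SYS': True, 'SYS.COM': False}
```

Hashing rather than comparing the files directly has one advantage: digests can be published, and others can check their own builds against them without needing the disk images.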

These files are dated 10/06/1988, and DOS reports itself as version 4.00. However, the released source code includes the following line in the file SETENV.BAT:

echo setting up system to build the MS-DOS 4.01 SOURCE BAK...

This further suggests that the source code in fact corresponds to the quiet DOS 4.01 update and not to the original IBM DOS 4.00 from June 1988, which to the best of my knowledge was never available from Microsoft. After a few months, perhaps in late 1988, Microsoft changed DOS to report itself as 4.01 because, unsurprisingly, the 4.00 version number was confusing customers.

As a historical footnote, BAK stood for Binary Adaptation Kit. MS-DOS OEMs would receive the BAK to adapt DOS to their hardware. However, most OEMs did not receive the full source code, only the code for components that were likely to need modification, such as IO.SYS.

But the fact that the “Source BAK” was something Microsoft shipped to (select, lucky) customers is actually great: since it was meant to be built by third parties, it includes all of the required tools and is in fact quite easy to build.

Executive Summary

It’s terrific that the source code for DOS 4.00/4.01 was released! But don’t expect to build the source code mutilated by git without problems.

Historic source code should be released simply as an archive of files, ZIP or tar or 7z or whatever, with all timestamps preserved and every single byte kept the way it was. Git is simply not a suitable tool for this.
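The recommendation above can be sketched with Python’s standard `tarfile` module. It records each file’s modification time and stores contents verbatim, with no line-ending or encoding conversion. The helper names and the stand-in file below are hypothetical, and the 08/03/1988 timestamp is just an example value.

```python
import os
import tarfile
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def archive_tree(src: Path, archive: Path) -> None:
    """Pack src into an uncompressed tar; tarfile stores bytes verbatim and records mtimes."""
    with tarfile.open(archive, "w") as tar:
        tar.add(src, arcname=src.name)

def verify(src: Path, archive: Path) -> bool:
    """Check that every regular file in the archive matches the source bytes and mtime."""
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directory entries
            original = src.parent / member.name
            data = tar.extractfile(member).read()
            # Compare whole seconds; depending on tar format, mtime may be stored as float or int.
            if data != original.read_bytes() or int(member.mtime) != int(original.stat().st_mtime):
                return False
    return True

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "SRC")
    src.mkdir()
    asm = src / "EXAMPLE.ASM"                              # stand-in file, not from the release
    asm.write_bytes(b"; \xc4\xc4\xc4 CP437 comment\r\n")   # raw CP437 bytes, CR LF ending
    stamp = datetime(1988, 8, 3, tzinfo=timezone.utc).timestamp()
    os.utime(asm, (stamp, stamp))                          # give the file a 1988 timestamp
    archive = Path(tmp, "src.tar")
    archive_tree(src, archive)
    ok = verify(src, archive)
    print(ok)  # True
```

A ZIP archive would work too. ZIP stores DOS-style timestamps with two-second resolution, which suits files that came from DOS in the first place.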