Developing for a PDP-11

From CSLabsWiki
Revision as of 03:15, 2 June 2015 by Northug (talk | contribs) (Application Binary Interface: Yay manual information!)

The PDP-11 was a minicomputer developed by Digital Equipment Corporation as part of its Programmed Data Processor (PDP) series. The ISAs of this series (and their successors) were extremely influential in modern stored-program computer architecture; a brief glance at PDP-11 assembly reveals that it resembles a pleasant cross between the well-known Intel 8086 ISA in operation and ARM in its register model (registers are named rN for some natural N, and operational registers such as pc and sp are amongst them). Unlike many of the other PDPs--and perhaps owing to its popularity in real-time control systems, where supposedly many continue to run as intended--the PDP-11 is still a build target for modern compilers (gcc especially).

As a reference for (and expansion upon) the material in this article, you will want to keep the PDP-11 Handbook under your pillow. Rarely have I seen a platform simple enough to be fully documented in 112 pages.

Developing for the Platform

As a developer for this platform, it is worth noting that it is a 16-bit minicomputer whose instructions are built from 16-bit words: a single opcode word, followed by at most two further words holding immediate values, addresses, or index offsets. Addresses are similarly limited; the entire virtual address space spans 64k. Needing more than that was inconceivable when the machine was designed, but quickly became a reality. In order to cope with increasing memory capacity at lower costs, further models were released with 18-bit and 22-bit address buses--but the processor architecture did not change in any significant way. Rather, an MMU (Memory Management Unit) peripheral was added to convert the 16-bit addresses into the native address size. (Though the [www.bitsavers.org/pdf/dec/pdp11/handbooks/PDP11_Handbook1979.pdf processor handbook] calls these "pages", they are, more or less, the first version of segmentation, a type of virtual addressing still used in x86 processors when booted to real mode.) Luckily, PDP-11s which have larger native address sizes generally boot in a 16-bit mode that permits access to the IO bus at the same addresses used by earlier models.

Finally, as one more peculiarity, the conventional numeric notation for the PDP-11 is octal. Each octal digit maps to a sequence of three bits:

Octal   Binary
0       000
1       001
2       010
3       011
4       100
5       101
6       110
7       111

While this does not fit evenly into the native word size of the machine (16 bits is five full octal digits plus one extra bit, so all possible words are in the range 000000-177777), it remains the standard for all documentation written for the platform. Ultimately, the VAX-11, a DEC successor to this model, introduced the more familiar hexadecimal notation (along with the 32-bit word size and true paging).
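To make the uneven fit concrete, here is a small Python sketch that splits a 16-bit word into the six octal digits used in PDP-11 documentation. The helper names (octal_groups, to_octal) are made up for this illustration; the only fact taken from the text above is the word range 000000-177777.

```python
def octal_groups(word):
    """Split a 16-bit word into six octal digits, most significant first.

    The leading digit covers only bit 15, so it can only be 0 or 1;
    the remaining five digits cover three bits each.
    """
    if not 0 <= word <= 0o177777:
        raise ValueError("not a 16-bit word")
    return [(word >> shift) & 0o7 for shift in (15, 12, 9, 6, 3, 0)]


def to_octal(word):
    """Format a word as six octal digits, the way PDP-11 listings show it."""
    return "%06o" % word


if __name__ == "__main__":
    for value in (0, 0o12706, 0xFFFF):
        print(to_octal(value), octal_groups(value))
```

Running it on 0xFFFF shows the largest word as 177777, with a leading digit of 1 rather than 7.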

Getting Started

PDP-11 assembly is almost trivial enough that, with some experience, it can be written with a hex (oct?) editor. Unlike VAX and x86 instructions, every PDP-11 instruction has a single-word opcode (followed by at most two operand words), and operands have a consistent representation that is independent of the instruction for which they are encoded--such is the benefit of the orthogonal instruction set. Wikipedia's article on the topic contains more than enough information for anyone with a pencil, paper, and time to become an adequate PDP-11 (dis)assembler, and, indeed, you may want to become practiced in this if you plan to read through memory dumps.
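As a taste of how regular the encoding is, the following Python sketch decodes the double-operand instruction format: a 4-bit opcode followed by two 6-bit operand fields, each made of a 3-bit addressing mode and a 3-bit register. The function and table names are invented for this example, and the sketch deliberately covers only a dozen opcodes. Modes that need an extra word are shown with a placeholder X rather than fetched, and PC-relative forms are printed in their raw (pc) spelling instead of the #n and @#a shorthands an assembler would show.

```python
REG_NAMES = ["r0", "r1", "r2", "r3", "r4", "r5", "sp", "pc"]

# Operand syntax for addressing modes 0-7; X stands for a following index word.
MODE_FORMATS = ["{r}", "({r})", "({r})+", "@({r})+",
                "-({r})", "@-({r})", "X({r})", "@X({r})"]

# Top four bits of the word (bit 15 selects the byte variant).
DOUBLE_OPS = {
    0o01: "mov", 0o02: "cmp", 0o03: "bit", 0o04: "bic", 0o05: "bis", 0o06: "add",
    0o11: "movb", 0o12: "cmpb", 0o13: "bitb", 0o14: "bicb", 0o15: "bisb", 0o16: "sub",
}


def operand(field):
    """Render one 6-bit operand field (3-bit mode, 3-bit register)."""
    mode, reg = (field >> 3) & 0o7, field & 0o7
    return MODE_FORMATS[mode].format(r=REG_NAMES[reg])


def decode_double_operand(word):
    """Return assembly text for a double-operand word, or None if not one."""
    name = DOUBLE_OPS.get((word >> 12) & 0o17)
    if name is None:
        return None
    src = operand((word >> 6) & 0o77)
    dst = operand(word & 0o77)
    return "%s %s, %s" % (name, src, dst)


if __name__ == "__main__":
    for word in (0o010001, 0o060102, 0o012706):
        print("%06o  %s" % (word, decode_double_operand(word)))
```

Note how each octal digit of the operand fields lines up with either a mode or a register, which is a large part of why octal stuck around on this platform.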

But, of course, setting up an assembler and compiler is infinitely more pleasant. Most examples of assembly are written for the MACRO-11 assembler, which can still be obtained from an author who made a version of it for his Windows-based emulator (the source is still "cross-platform"). However, a byproduct of building gcc and binutils is gas, the GNU assembler, which--when targeting the PDP-11--understands all of the MACRO-11 syntax I've thrown at it so far (though that is not its native syntax; you have been warned).

Without further ado, then, let's set up a gcc cross-compiler. This is a fairly fundamental step in compiling for any non-native architecture (I would be very surprised if a host viewing this is on a PDP-11) and tends to be the most imposing, though it's not as hard as it seems--no one seems to document it well.

First things first, gcc depends on a matching binutils somewhere--this is where it derives its assembly and various other features. Get a binutils snapshot (the latest and greatest version still works at the time of writing--I'm using 2.24) and extract it somewhere; I like doing so in /tmp because the source tree is still available in pure form as the downloaded archive if I need it:

   cd /tmp
   mkdir build
   cd build
   tar xvf /path/to/binutils-version-stuff.tar.bz2

When this is done (and it may take a little bit), we can set up the build. A relatively undocumented feature of most GNU compiler-related projects is that they expect an out-of-tree build--and may break if you try to build in tree--so don't run "configure" from the directory where it's situated! Although redundant, I like to put build directories inside the working directory of the repository, and name them with their target (so that I may build multiple targets at once).

Targets for binutils and gcc consist of, at the least, a machine architecture and a binary format for output, separated by a dash. We will be using pdp11-aout for this demonstration, as pdp11-elf does not compile in binutils at present. Besides, a.out format was the native binary format for this machine (when running any of the Unices or derivatives it supported).

At this point, you will also want to choose a prefix. The default is /usr/local, but that requires root privileges (sudo make install). If you do not have these, you can still install into a directory you own (like $HOME), but remember to be sure that the directory you choose is in the relevant PATHs (particularly, <dir>/bin in PATH and <dir>/lib in LD_LIBRARY_PATH).

Let's get to it, then. As above, feel free to change the --target and --prefix arguments to configure:

   cd binutils-version-stuff
   mkdir build-pdp11-aout
   cd build-pdp11-aout
   ../configure --target=pdp11-aout --prefix=/usr/local
   make

Several minutes later (on fast machines; worse for slower ones), the build should finish without error. When that time comes:

   sudo make install

(or just make install if you don't have sudo--and you own the prefix directory.)

If your build errors out, you may have to choose a different target (especially in binary format). There are many ways the build can go wrong, so I couldn't possibly cover them all here. Just remember that, if you choose a different target, you will need to be consistent about it for the next step.

With binutils made and installed, you should be ready for gcc. The setup is about the same, so forgive me if I elide the details.

   cd /tmp/build
   tar xvf /path/to/gcc-version-stuff.tar.bz2
   cd gcc-version-stuff
   mkdir build-pdp11-aout
   cd build-pdp11-aout
   ../configure --target=pdp11-aout --prefix=/usr/local
   make
   sudo make install

If all goes well, after this procedure, you should be able to type pdp11-aout-gcc --version at a prompt and get back the version of GCC you just compiled.

If all hasn't yet gone well, it turns out that GCC building isn't exactly turn-key; fortunately, GCC developers are hosting an easy-to-read list of GCC dependencies, which includes their multiprecision libraries (GMP, MPFR, MPC). Don't worry, you only need these on the host platform, so you don't need to build them from source. Though instructions vary widely between the platforms you are building on, I found it sufficient to run the following on Ubuntu/Debian:

   sudo apt-get install libgmp-dev libmpfr-dev libmpc-dev

Other Linuces likely have similar packages available from their package managers, or at least ways of building these libraries if need be.
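Once both installs have finished, a quick sanity check is to confirm that the prefixed tools actually ended up on your PATH. The sketch below is only an illustration: it assumes the pdp11-aout target used above and checks for four tool names (gcc, as, ld, objdump) that a binutils and gcc install for that target would normally provide. Adjust TARGET if you chose a different one.

```python
import shutil

TARGET = "pdp11-aout"                    # must match --target from configure
TOOLS = ("gcc", "as", "ld", "objdump")   # a representative subset, not exhaustive


def find_cross_tools(target=TARGET, tools=TOOLS):
    """Map each expected tool name to its full path, or None if not on PATH."""
    names = ["%s-%s" % (target, tool) for tool in tools]
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_cross_tools().items():
        print("%-22s %s" % (name, path or "NOT FOUND (is <prefix>/bin in PATH?)"))
```

Any line reporting NOT FOUND usually means the prefix's bin directory is missing from PATH, or that the corresponding make install step did not complete.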

Application Binary Interface

The ABI of the PDP-11, as I've observed it being emitted by GCC, is corroborated exactly by the UNIX v5 specifications. In particular, should you be looking to link assembly to C, you'll want to know the calling convention:

Register   Usage                                     Saved By
r0-r1      Return registers (up to 32-bit)           Caller
r2-r4      Local variables (3x16bit)                 Callee
r5         Frame pointer (fp, bp--rarely aliased)
r6         Stack pointer (sp--usually aliased)
r7         Program counter (pc--usually aliased)

Interrupts are a slightly different beast. Unlike x86-compatibles, the Interrupt Vector Table is fixed at location 0 in memory, and has a fixed size (sources disagree on the size, but 256 words/512 bytes is a safe bet [citation needed]). Every interrupt to the processor has a vector, which is the (necessarily even!) address of two words in the IVT: the new PC, followed by the new PSW. Most hardware devices can be configured, either via software or hardware, to alter their vector, whereas certain software instructions generate interrupts to fixed vectors of their own. Some common ones are as follows:

Cause                                      Vector (octal)
TRAP instruction, generic trap             34
BPT instruction, debugger breakpoint       14
IOT instruction, IO emulation trap         20
EMT instruction, instruction emulation trap 30
Debugger serial input (from TTY)           60
Debugger serial output (to TTY)            64

The documentation of many devices lists their vectors and their configurability.

The overall trap procedure can be emulated in a processor by the following MACRO-11 for some vector VEC:

   PSW = 177776
   mov @#PSW, -(sp)
   mov pc, -(sp)
   mov @#VEC+2, @#PSW
   mov @#VEC, pc

Note that the processor status word (PSW) is memory-mapped at 177776. Also note that the EMT and TRAP instructions are related; both are encoded as (octal) 104xxx, where xxx is 0-377 for EMT and 400-777 for TRAP. From within a TRAP or EMT interrupt handler, the saved PC on the stack points just past the trapping instruction, so you can work back to the instruction word (and the value in its low byte) starting from something similar to:

   mov r0, -(sp)
   mov 2(sp), r0
   ...
   mov (sp)+, r0
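The decoding that a handler does with that value can be sketched in Python as follows. This is an illustration of the 104xxx encoding described above rather than handler code: the function names are invented, and read_word is a stand-in for however you read a 16-bit word from memory (here, any callable taking an address).

```python
def trapping_instruction(read_word, saved_pc):
    """Fetch the instruction that trapped.

    The saved PC points just past the EMT/TRAP word, so step back one word.
    """
    return read_word((saved_pc - 2) & 0o177777)


def classify_trap_instruction(word):
    """Return ('emt' or 'trap', low byte) for a 104xxx word, else None."""
    if (word & 0o177400) == 0o104000:    # 104000-104377
        return ("emt", word & 0o377)
    if (word & 0o177400) == 0o104400:    # 104400-104777
        return ("trap", word & 0o377)
    return None


if __name__ == "__main__":
    memory = {0o1000: 0o104405}          # hypothetical: TRAP 5 at address 1000
    word = trapping_instruction(memory.get, 0o1002)
    print("%06o" % word, classify_trap_instruction(word))
```

The low byte is free for the programmer to use, which is why these two instructions are a common way to implement system call numbers.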

It is critical that, from within a service routine, all registers are callee saved. After operation, assuming the stack pointer is back where it was when the handler was entered, one can use the RTI instruction to return from the handler, undoing the trap entry procedure:

   add #4, sp
   mov -2(sp), @#PSW
   mov -4(sp), pc

Do not use RTS pc for returning from an interrupt handler--you will corrupt the stack!
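To see why the entry and exit sequences mirror each other, here is a toy Python model of both. It is a sketch under simplifying assumptions: the Machine class is invented for this example, memory is a sparse dictionary of words, and the PSW is kept as a plain attribute rather than being memory-mapped at 177776.

```python
class Machine:
    """Minimal model of the state touched by trap entry and RTI."""

    def __init__(self):
        self.mem = {}    # even address -> 16-bit word
        self.pc = 0
        self.sp = 0
        self.psw = 0

    def read(self, addr):
        return self.mem.get(addr & 0o177776, 0)

    def write(self, addr, value):
        self.mem[addr & 0o177776] = value & 0o177777

    def push(self, value):
        self.sp = (self.sp - 2) & 0o177777
        self.write(self.sp, value)

    def pop(self):
        value = self.read(self.sp)
        self.sp = (self.sp + 2) & 0o177777
        return value

    def trap(self, vector):
        """Trap entry: save PSW then PC, load new PC and PSW from the vector."""
        self.push(self.psw)
        self.push(self.pc)
        self.pc = self.read(vector)
        self.psw = self.read(vector + 2)

    def rti(self):
        """Return from interrupt: restore PC, then PSW, in reverse push order."""
        self.pc = self.pop()
        self.psw = self.pop()


if __name__ == "__main__":
    m = Machine()
    m.sp, m.pc, m.psw = 0o1000, 0o2002, 0o000004
    m.write(0o34, 0o4000)    # TRAP vector: handler address
    m.write(0o36, 0o340)     # TRAP vector: handler PSW
    m.trap(0o34)
    print("in handler: pc=%06o psw=%06o sp=%06o" % (m.pc, m.psw, m.sp))
    m.rti()
    print("after rti:  pc=%06o psw=%06o sp=%06o" % (m.pc, m.psw, m.sp))
```

RTS pc would pop only one word (the PC), leaving the saved PSW on the stack, which is the corruption the warning above refers to.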

Bits <7:5> of the PSW (zero-indexed) are the Interrupt Priority Level (IPL). They specify a threshold which an interrupt's level must exceed before the processor will service it. Software interrupts occupy the lowest levels (0-3, respectively Stack Overflow, Trace Trap, Trap Instruction (as above), Bus Error), whereas levels 4-7 are reserved for hardware devices through four physical bus request lines, named BR4-BR7. Most devices can be somehow configured to use a different level. If the processor chooses to honor a bus interrupt, it will raise the matching bus grant line (BG4-BG7), which indicates to the device that it should put its vector on the bus. Note that you, the systems programmer, control how deeply interrupts can nest through the IPL in the PSW stored in each vector. A good practice is to set that IPL to the expected level of the received interrupt.
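The bit manipulation involved is small enough to show in full. This Python sketch uses invented helper names and models only the threshold rule stated above (a request is serviced when its level is above the current IPL); it does not model bus arbitration.

```python
IPL_SHIFT = 5      # IPL occupies PSW bits <7:5>
IPL_MASK = 0o7


def get_ipl(psw):
    """Extract the Interrupt Priority Level from a PSW value."""
    return (psw >> IPL_SHIFT) & IPL_MASK


def with_ipl(psw, level):
    """Return psw with its IPL field replaced by level (0-7)."""
    if not 0 <= level <= 7:
        raise ValueError("IPL must be 0-7")
    return (psw & ~(IPL_MASK << IPL_SHIFT) & 0o177777) | (level << IPL_SHIFT)


def is_serviced(psw, request_level):
    """A request is honored only if its level is above the current IPL."""
    return request_level > get_ipl(psw)


if __name__ == "__main__":
    psw = with_ipl(0, 4)    # e.g. the PSW word stored in a level-4 device's vector
    print("%06o" % psw, get_ipl(psw), is_serviced(psw, 4), is_serviced(psw, 5))
```

Storing with_ipl(0, 4) as the PSW word of a level-4 device's vector means that, while its handler runs, further level-4 requests wait but level-5 and higher requests may still nest.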

Working with SIMH

SIMH is an excellent little historic computer simulator that includes support for the PDP-11, amongst a long list of other contemporaneous systems. There are various ways to get it, including downloading Windows binaries, getting Debian packages, or building it from source tarballs. I won't cover that build process (I got it from a dpkg myself :), but I promise you it should be trivial after building a cross-compiler.

For how well it works, SIMH is one of the most abhorrently-documented projects I've seen. For example, each simulator supports different load formats; the one we're interested in, the PDP-11, states in the distributed PDF document that "load" will receive "standard binary format tapes". No, these aren't .tar files. The only place you can find any documentation whatsoever on the format actually accepted is by cracking open the source:

/* Binary loader.
   Loader format consists of blocks, optionally preceded, separated, and
   followed by zeroes.  Each block consists of:
        001             ---
        xxx              |
        lo_count         |
        hi_count         |
        lo_origin        > count bytes
        hi_origin        |
        data byte        |
        :                |
        data byte       ---
        checksum
   If the byte count is exactly six, the block is the last on the tape, and
   there is no checksum.  If the origin is not 000001, then the origin is
   the PC at which to start the program.
*/

simh/PDP11/pdp11_sys.c, lines 218-237

It's actually not too terrible, if you forgive the fact that there's no documentation for the algorithm computing the checksum. We can ignore the fact that it takes three octal digits to represent a byte for the moment, since we'll be dealing with that anyway.
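For reference, the format in that comment can be read back with a few lines of Python. This is a sketch, not a port of SIMH's loader: the function name is invented, and the checksum rule (all bytes of a block, including the checksum byte, sum to zero modulo 256) is an assumption that is not stated in the comment itself.

```python
import struct


def parse_tape(image):
    """Parse a binary-loader image into ([(origin, data), ...], start_pc).

    start_pc is None when the final block's origin is 1 (no start address).
    """
    blocks, pos = [], 0
    while pos < len(image):
        if image[pos] == 0:                  # zero padding between blocks
            pos += 1
            continue
        if image[pos] != 1 or pos + 6 > len(image):
            raise ValueError("bad block header at offset %d" % pos)
        # 001, ignored byte, 16-bit count, 16-bit origin (little-endian)
        _marker, count, origin = struct.unpack_from("<BxHH", image, pos)
        if count == 6:                       # final block: origin is the start PC
            return blocks, (None if origin == 1 else origin)
        if count < 6:
            raise ValueError("impossible byte count at offset %d" % pos)
        end = pos + count                    # index of the checksum byte
        if end >= len(image):
            raise ValueError("truncated block at offset %d" % pos)
        if sum(image[pos:end + 1]) % 256 != 0:   # assumed checksum rule
            raise ValueError("checksum mismatch at offset %d" % pos)
        blocks.append((origin, image[pos + 6:end]))
        pos = end + 1
    raise ValueError("no terminating block found")
```

Calling parse_tape(open("image.out", "rb").read()) on a loadable image should return every data block along with the start address, which makes it a handy check before handing a file to the simulator.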

Lucky for you, I already pounded my head against the wall and banged out this "small" Python script that should do the right thing:

#PDP-11 crappy "terp" (tape) format for loading into simh/pdp11's terp dervs.
#Written for Python 3.

import struct
import sys

if len(sys.argv) < 2:
    print('''Usage: python mkterp.py {BLOCK} {BLOCK} {BLOCK}
where each {BLOCK} is:
    [-O <origin>] to set the origin (default 0)
    -d <file> a binary file to read in data (terminates the block).
OR:
    -o <fname> to set the output file name (last instance wins).
OR:
    -p <pc> to set the address at which the program shall start (last instance wins).
Numbers use Python literal syntax: 512, 0o1000, and 0x200 are the same address.
''')
    sys.exit()


def checksum(pkt):
    # Return the one byte that makes the sum of the whole block zero (mod 256).
    return bytes([(-sum(pkt)) % 256])


ofile = 'image.out'
pc = 1  # An origin of 1 in the final block tells the loader to leave the PC alone.
blocks = []  # [{'org', 'fname'}]
org = 0

i = 1
while i < len(sys.argv):
    if sys.argv[i] == '-o':
        ofile = sys.argv[i+1]
        i += 2
    elif sys.argv[i] == '-p':
        pc = int(sys.argv[i+1], 0)
        i += 2
    elif sys.argv[i] == '-O':
        org = int(sys.argv[i+1], 0)
        i += 2
    elif sys.argv[i] == '-d':
        blocks.append({'org': org, 'fname': sys.argv[i+1]})
        org = 0
        i += 2
    else:
        print('Unrecognized option:', sys.argv[i], 'skipped')
        i += 1

print('Blocks to be put into %s:' % (ofile,))
for block in blocks:
    print('File', block['fname'], '@', block['org'])

with open(ofile, 'wb') as of:
    for block in blocks:
        with open(block['fname'], 'rb') as inf:
            data = inf.read()
        # Header: 001, 000, byte count (header + data), load address.
        pkt = struct.pack('<HHH', 1, 6 + len(data), block['org']) + data
        of.write(pkt + checksum(pkt))
    # A block whose count is exactly six ends the tape and carries the start PC.
    pkt = struct.pack('<HHH', 1, 6, pc)
    of.write(pkt + checksum(pkt))
print('Complete.')

The usage of this script should look something like the following:

   python mkterp.py -O ldaddr_A -d A -O ldaddr_B -d B -O ldaddr_C -d C ... [-o outfile] [-p startPC]

where A is to be loaded into memory starting at ldaddr_A, B is to be loaded into memory at ldaddr_B, and so on. The output file is cooked and ready to serve with "load" on the simulator command prompt.