An Imaginary History of S/370 Hardware Design

Before we get started with POPs, let's take an imaginary journey through the evolution of the S/370 hardware design project. Our journey will not be true to actual events, but rather is intended to get you started thinking about the hardware from a designer's viewpoint.

Let's begin with some requirements for the S/370 CPU. It should support the following operations:

Mathematics
Logical
Memory movement
I/O
Branching

The CPU will execute programs that reside in memory. Despite what Bill Gates is famously reported to have said, 64KB (16 bits of address) isn't enough for anyone; we'll add another byte to each memory address, so our memory will range from X'000000' through X'FFFFFF', a 24-bit address space.

Management assigns the software development team to provide the CPU specification, and after some time passes they deliver the following high level design.

Design One

Instructions are byte aligned

The first byte of each instruction is reserved for the operation code, allowing 256 different instructions

The CPU will treat X'00' as an invalid operation which will halt the CPU. This brings us down to a maximum of 255 valid instructions.

Instruction lengths are implied by the opcode

Instructions will be separated into several formats, and each format will consist of distinctly defined fields

The CPU has private internal registers and some private storage, such as might be required to establish a correspondence between opcodes and instruction lengths

One of the CPU's private storage areas will be called the PSW (Program Status Word) which will contain the address of the next instruction to be executed. The PSW will be 24 bits in length.

I/O hardware will be addressed by a structured 3 byte field:

One byte designating the channel to perform the operation

Two bytes designating the device number attached to the channel

The ASCII character set does not appear to meet our needs, so we will specify a different character set which we will name EBCDIC for Extended Binary Coded Decimal Interchange Code. The high order bit will indicate the character type: zero for control characters, one for non-control characters.

Numeric data will be encoded in binary format

The following instruction formats will be recognized by the CPU. The numbers beneath the fields show how many bits each field occupies.

Memory movement, Math, Logic (10 bytes)

    OPCODE   LENGTH   ADDR1   ADDR2
    8        24       24      24

I/O (4 bytes)

    OPCODE   CHANNEL   DEVICE
    8        8         16

Branch (5 bytes)

    OPCODE   FLAG   ADDR
    8        8      24

At this point, we notice we need some kind of 8-bit flag to control branching, so the PSW is extended from 24 bits to 32 bits.

Management gives the software team's CPU design to the hardware team for review and to determine an implementation schedule. The hardware team is not at all pleased. They begin calling the CPU design by the initials WRD (which they begin to pronounce "word"), for "Widely Reviled Design". Before they produce an implementation schedule they demand a design change: since instructions vary in length, the instruction length will be encoded in the high-order two bits of the instruction opcode. Some of the CPU's internal storage will be used to house the following table:

    Instruction type    High-order bits    Length
    I/O                 00                 4 bytes
    Branch              01                 5 bytes
    Reserved            10                 ??? bytes
    Memory movement     11                 10 bytes
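The table the hardware team asked for amounts to a lookup on the top two bits of the opcode. Here is a minimal sketch in Python. The names DESIGN_ONE_LENGTHS and instruction_length are made up for illustration, and treating the reserved group as an error is an assumption, since the story leaves its length undefined.

```python
# Illustrative sketch only: names and error handling are assumptions,
# not part of the imaginary design document.
DESIGN_ONE_LENGTHS = {
    0b00: 4,     # I/O
    0b01: 5,     # Branch
    0b10: None,  # Reserved: length not yet defined
    0b11: 10,    # Memory movement (and math, logic)
}


def instruction_length(opcode: int) -> int:
    """Return the instruction length in bytes implied by an 8-bit opcode."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in one byte")
    length = DESIGN_ONE_LENGTHS[opcode >> 6]  # keep only the high-order two bits
    if length is None:
        raise ValueError("reserved opcode group (high-order bits 10)")
    return length
```

With this sketch, any opcode from X'40' through X'7F' reports 5 bytes, and any opcode from X'C0' through X'FF' reports 10 bytes.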

Additionally, they suggest that they might be able to reduce instruction lengths and would like to provide an alternative instruction design. Management consents, and gives them the rest of the afternoon to do so. While not happy about the time constraint, they do win management's blessing to attend all future design meetings. Indeed, management decrees that henceforth no design meetings will be held without at least one hardware designer present.

Design Two

The first thing the hardware team notices is that the memory addresses and the maximum possible memory length take up most of each instruction. A quick look through the parts kit reveals some high-speed register chips that they've used in other projects. Being high-speed, they're of course also costly, so the team decides to provide a limited number of them: 256. These registers will be connected to the CPU, which will select them by number (0-255). They will be used for just about everything, so they will be called General Purpose Registers (GPRs).

Having decided to use registers to address main memory, they recall that prior projects asked for the ability to index off the address held in a register, so they provide that here as well (the work is already done, and they want to reuse the design). They're pressed for time, so they decide that 8 bits of offset is sufficient. This shrinks each memory address field in an instruction from 24 bits to 16 bits:

    REGISTER-NUMBER   MEMORY-OFFSET
    8                 8

This memory offset will address a 256-byte "page" of memory. The offset will be appended to the GPR's contents thusly:

    GPR-CONTENTS   MEMORY-OFFSET
    16             8

producing the 24-bit memory address. They've got a good supplier for 16-bit registers, so they're convinced they can deliver this GPR technology at a reasonable cost.
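Appending the offset to the GPR contents is just a shift and an OR. A small Python sketch of that address formation follows. The function name design_two_address is hypothetical, and the range checks are an assumption added for illustration.

```python
def design_two_address(gpr_contents: int, offset: int) -> int:
    """Append an 8-bit offset to a 16-bit GPR value, giving a 24-bit address.

    Hypothetical helper for the Design Two scheme described in the text.
    """
    if not 0 <= gpr_contents <= 0xFFFF:
        raise ValueError("GPR contents must fit in 16 bits")
    if not 0 <= offset <= 0xFF:
        raise ValueError("offset must fit in 8 bits")
    # The GPR supplies the high-order 16 bits (the 256-byte page);
    # the offset supplies the low-order 8 bits (the byte within the page).
    return (gpr_contents << 8) | offset
```

For example, a GPR holding X'1234' combined with an offset of X'56' yields the address X'123456'.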

Amongst themselves, the hardware team refers to the REGISTER-NUMBER as the BASE, and the MEMORY-OFFSET as the DISPLACEMENT (which they abbreviate DISP). Engineers like short labels.

    BASE   DISP
    8      8

The next day the hardware team presents their design to management, with the software team in attendance. The new instruction formats look like:

Memory movement, Math, Logical (6 bytes)

    OPCODE   LENGTH   ADDR1   ADDR2
    8        8        16      16

I/O (4 bytes)

    OPCODE   CHANNEL   DEVICE
    8        8         16

Branch (4 bytes)

    OPCODE   FLAG   ADDR1
    8        8      16

Each ADDR1 and ADDR2 field is a BASE-DISPLACEMENT pair.

The GPR technology is enthusiastically accepted by the software team, but they notice there is no ability to manipulate the GPRs themselves. They also ask that the GPRs be extended in length so they can use them to contain larger numbers. At this point, management interjects that money doesn't grow on trees and that if they want larger GPRs they'll have to agree to fewer of them.

The hardware team confers and decides they can string two 16-bit GPRs together to form a 32-bit GPR. They begin discussing how many GPRs they can provide when someone on the software team says they'd like the register number to be easy to read when a program aborts; if they can't have 256 GPRs, a single hex digit would be a convenient size for it. Other members of the software team really like the 2-byte GPR-OFFSET layout, since they can fit more code in the same memory space than with the old instruction formats. Everyone agrees: 2-byte GPR-OFFSETs it is. The new GPR-OFFSET layout looks like this:

    REGISTER-NUMBER   MEMORY-OFFSET
    4                 12

One of the new software developers says they'd like the CPU to halt when it tries to branch to an odd address, since all the instructions have lengths that are multiples of two. The hardware team, happy that the GPR technology they know so well is being used, readily agrees. The new software developer asks if the new 12-bit memory-offset thing will still be called a page, and everyone agrees that 4K pages seem OK.

One of the senior software team members asks if the CPU can add two GPRs together to calculate the memory address. He'd like to be able to use a branch table indexed by a GPR. The hardware team confers and decides they might be able to do that.
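To make the senior developer's request concrete, here is a rough Python sketch of an indexed address calculation. The story has not yet pinned down the arithmetic, so this assumes the base GPR, the index GPR, and the 12-bit offset are simply added and the sum truncated to 24 bits, which is how the real S/370 forms addresses. The names effective_address and ADDRESS_MASK are made up for illustration.

```python
ADDRESS_MASK = 0xFFFFFF  # addresses are 24 bits wide


def effective_address(gprs, base: int, index: int, offset: int) -> int:
    """Add the base GPR, the index GPR, and a 12-bit offset.

    Assumption: plain addition, with any carry out of bit 24 discarded.
    `gprs` is a sequence of 16 register values.
    """
    if not (0 <= base <= 15 and 0 <= index <= 15):
        raise ValueError("register numbers are a single hex digit (0-15)")
    if not 0 <= offset <= 0xFFF:
        raise ValueError("offset must fit in 12 bits")
    return (gprs[base] + gprs[index] + offset) & ADDRESS_MASK


# Branch table usage: GPR 12 addresses the code, GPR 3 holds the entry's
# byte position within a table that starts X'100' bytes past the base.
gprs = [0] * 16
gprs[12] = 0x001000
gprs[3] = 8
target = effective_address(gprs, base=12, index=3, offset=0x100)
```

Here target works out to X'001108': the base, plus the table's offset, plus the index.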

At this point, management asks for a new design document that consolidates all the ideas discussed so far. The lead hardware designer walks to the whiteboard and draws the following diagram.

Register to Register [RR] (2 bytes, opcode high-order bits 00)

    OPCODE   R1   R2
    8        4    4

Conditional Branch [BC] (4 bytes, opcode high-order bits 01)

    OPCODE   FLAG   INDEX   REGISTER   OFFSET
    8        8      4       4          12

Register to Storage [RS], Storage to Register [SR] (4 bytes, opcode high-order bits 01)

    OPCODE   REGISTER   SPARE   REGISTER   OFFSET
    8        4          4       4          12

I/O, Miscellaneous [??] (4 bytes, opcode high-order bits 10)

    OPCODE   CHANNEL   DEVICE
    8        8         16

Storage to Storage [SS] (6 bytes, opcode high-order bits 11)

    OPCODE   LENGTH   ADDR1   ADDR2
    8        8        16      16

Historical note: The lead hardware designer has a bowling league tournament in 45 minutes and is anxious to get going. Such constraints, although not prevalent, do occur in S/370 designs. This was explained to me by Doug White, and will hereafter be referred to as the Doug White Bowling League Theory. Doug originally introduced this theory to explain why various IBM Utilities use varying syntax to express the same thing, but it seems to apply well to our imaginary S/370 design story, so I'm borrowing it. It helps explain why the bits in the Conditional Branch row don't add up to 4 bytes. Doug also gave me a copy of the S/360 FE Handbook, which I treasure to this day; he's a swell guy.

Happily, someone notices that the bits in the Conditional Branch instructions don't add up properly. The lead hardware designer says he'll think about it, but he really has to go; they'll have to continue the meeting tomorrow. So off he goes. History fails to record how well he bowled that night.

The next day, the design group reconvenes and tries to clean up some loose ends. First on the agenda are the Conditional Branch instructions, which contain too many bits to fit in four bytes. The lead hardware designer proposes the FLAG byte be reduced in length to four bits. Since no one has yet specified what the FLAG byte does anyway, he doesn't think this will be much of a problem.

One of the software developers asks whether they'll be able to branch based on a GPR's contents, as long as they're talking about branching. This looks like it should work with a 4-bit FLAG and still fit in a 2-byte [RR] format instruction. Following a detailed explanation of Boolean Logic from one of the software developers, it is decided that the FLAG will be a 4-bit field; each bit will represent one of four possible states. After some further discussion, it is decided this FLAG should act as a mask thereby allowing the user program to test for more than one condition at a time. There is some general complaining from the hardware team until it is pointed out that they had already committed to providing the CPU with logical operations, at which point they agree that a logical MASK is OK.
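The mask idea is easy to show in a few lines of Python. This is only a sketch: the story does not say how the four states are numbered or which mask bit goes with which state, so the numbering 0 through 3 and the leftmost-bit-is-state-0 ordering below are assumptions, as is the name branch_taken.

```python
def branch_taken(mask: int, state: int) -> bool:
    """Decide whether a conditional branch is taken.

    mask:  the 4-bit MASK field from the instruction
    state: which of the four possible states is current (0-3)

    Assumption: the leftmost mask bit corresponds to state 0, the
    rightmost to state 3. The branch is taken when the mask bit for
    the current state is one.
    """
    if not 0 <= mask <= 0xF:
        raise ValueError("mask must fit in 4 bits")
    if not 0 <= state <= 3:
        raise ValueError("state must be 0, 1, 2, or 3")
    return bool(mask & (0b1000 >> state))
```

Under these assumptions, a mask of B'1010' branches on state 0 or state 2, a mask of B'1111' always branches, and a mask of B'0000' never does. That is the payoff of a mask over a single flag value: one instruction can test for several conditions at once.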

Someone notices there are four spare bits in the Register to Storage and Storage to Register instructions; can they do the same Index GPR thing there? Following more discussion, the new [RX] instructions replace the old [RS] and [SR] instructions. The [RS] and [SR] instructions were the same format anyway, and having two names for the same thing just confused matters.

It is at this point we leave our imaginary history of the S/370 design. We haven't quite accounted for the complete set of instruction formats, but we're getting closer. Should we need to call upon our imaginary designers at a later date, they'll be ready and waiting for our call. For now, they've served our purpose of showing how design tradeoffs occur and how those tradeoffs end up etched in silicon.

Design Three

Register to Register [RR] (2 bytes, opcode high-order bits 00)

    OPCODE   GPR1   GPR2
    8        4      4

Register Conditional Branch [BCR] (2 bytes, opcode high-order bits 00)

    OPCODE   MASK   GPR1
    8        4      4

Storage Conditional Branch [BC] (4 bytes, opcode high-order bits 01)

    OPCODE   MASK   INDEX-GPR   BASE-GPR   OFFSET
    8        4      4           4          12

Register Index [RX] (4 bytes, opcode high-order bits 01)

    OPCODE   GPR   INDEX-GPR   BASE-GPR   OFFSET
    8        4     4           4          12

I/O, Miscellaneous [??] (4 bytes, opcode high-order bits 10)

    OPCODE   CHANNEL   DEVICE
    8        8         16

Storage to Storage [SS] (6 bytes, opcode high-order bits 11)

    OPCODE   LENGTH   ADDR1   ADDR2
    8        8        16      16
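To see how these formats would be pulled apart, here is a minimal Python sketch that reads the length from the opcode's high-order bits and splits an [RX] instruction into its fields. The names DESIGN_THREE_LENGTHS, design_three_length, and decode_rx are made up for illustration, and the sample bytes are just an arbitrary bit pattern.

```python
# Illustrative sketch of the Design Three formats; all names are hypothetical.
DESIGN_THREE_LENGTHS = {
    0b00: 2,  # [RR], [BCR]
    0b01: 4,  # [BC], [RX]
    0b10: 4,  # I/O, Miscellaneous
    0b11: 6,  # [SS]
}


def design_three_length(opcode: int) -> int:
    """Return the instruction length in bytes from the opcode's top two bits."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in one byte")
    return DESIGN_THREE_LENGTHS[opcode >> 6]


def decode_rx(instruction: bytes) -> dict:
    """Split a 4-byte [RX] instruction into its five fields."""
    if len(instruction) != 4:
        raise ValueError("[RX] instructions are 4 bytes long")
    opcode = instruction[0]
    if opcode >> 6 != 0b01:
        raise ValueError("[RX] opcodes have high-order bits 01")
    return {
        "opcode": opcode,                        # 8 bits
        "gpr": instruction[1] >> 4,              # 4 bits
        "index_gpr": instruction[1] & 0x0F,      # 4 bits
        "base_gpr": instruction[2] >> 4,         # 4 bits
        # 12 bits: low nibble of byte 2 followed by all of byte 3
        "offset": ((instruction[2] & 0x0F) << 8) | instruction[3],
    }


fields = decode_rx(bytes.fromhex("5810C010"))
```

For the sample bytes X'5810C010', the fields come out as opcode X'58', GPR 1, index GPR 0, base GPR 12, and offset X'010'.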
