The Secret Life of Vector Generators
By Jed Margolin
During my time at Atari/Atari Games I worked on several XY games. This article represents what I know about Vector Generators. This is the companion piece to The Secret Life of XY Monitors.
Vector Generators  Contents
1. Digital Vector Generators
2. The Vector Generator State Machine
3. Lunar Lander, Asteroids, and Asteroids Deluxe
4. Analog Vector Generators
5. The Vector Generator State Machine Revisited
6. BattleZone, Red Baron, and Malibu Grand Prix
7. Tempest
8. Space Duel and the Gate Array
9. Quantum
10. Star Wars
11. Major Havoc and The Empire Strikes Back
12. TomCat
13. The Future of XY
14. A Final Thought
__________________________________________________________________________________________
Digital Vector Generators
The Digital Vector Generator was the first vector generator Atari developed, and was used in Lunar Lander, Asteroids, and Asteroids Deluxe.
We will start with the standard unipolar Digital-to-Analog Converter (DAC) shown in Figure 1.
First, notice that the DAC's most significant bit is 'B1' and that the order of the bits is backwards from what we normally see. This is common in DACs. In earlier days there was quite a battle over whether to start with '0' or '1' (as in 'd0' or 'd1') and whether 'd0' (or 'd1') should be the most significant bit or the least significant bit. Texas Instruments persisted in labeling their EPROM data as 'd1' - 'd8' long after others adopted the current standard. Perhaps the people who designed DACs were similarly late in getting the message.
Even today the world is divided into two warring factions when it comes to how to order the bytes in a word. The Motorola Camp uses the High Byte, Low Byte order; the Intel Camp uses the Low Byte, High Byte order. Not only does it matter in microprocessors, it can matter when two computers are exchanging data. The classic article on the subject is On Holy Wars and a Plea for Peace by Danny Cohen, written in 1980. You can find it in a number of places with a Google search.
Now, back to Figure 1.
Because the DAC is unipolar, 10 bits produce an output with 1024 steps ranging from 0 to 1023 (decimal). In this example we are assuming a voltage output which represents the position of the beam on the screen of the XY monitor.
What we want is to have 0 in the middle of the screen, with positive numbers to the right and negative numbers to the left. Referring to Figure 2, we introduce a negative offset of Vmax/2 to the output of the DAC. We now have a digital range of 0 to 1023 (decimal) producing an output of -Vmax/2 to +Vmax/2, with 512 (hexadecimal $200) representing an output of zero.
If we complement the most significant bit as shown in Figure 3, an input of $000 becomes $200 (512 decimal) which produces a VOUT of 0. An input of $1FF (511 decimal) becomes $3FF (1023 decimal) which produces a VOUT of +Vmax/2. An input of $200 becomes $000 which produces a VOUT of -Vmax/2. With a 10-bit number in Two's Complement Form the most significant bit is the sign bit. Positive numbers have a sign bit of '0' so the largest positive number is $1FF (511 decimal). Negative numbers have a sign bit of '1' so the most negative number is $200, representing -512 decimal. Notice that the range of positive and negative numbers is not exactly symmetrical (-512 to +511). Well, that's life I guess.
Two's Complement Form is exactly what we want. Numbers in Two's Complement Form are the easiest to manipulate in binary arithmetic.
As a final check, let's give it an input of -1 (decimal). In a 10-bit number, -1 has all the bits set ($3FF). Complementing the most significant bit produces $1FF (511 decimal), which is one less than 512, which produces a VOUT of -1 step. Essentially, what we are doing is adding 512 digitally to the input and subtracting 512 (in analog form) from the output.
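The MSB-complement trick can be checked with a few lines of Python. This is only an illustrative sketch of the arithmetic described above; the function names are made up for the example, and the analog offset is modeled simply as subtracting 512 steps.

```python
BITS = 10
MASK = (1 << BITS) - 1   # $3FF, all ten bits set
MSB = 1 << (BITS - 1)    # $200, the sign bit

def dac_code(value):
    """Map a two's complement input (-512..+511) to the unipolar DAC code (0..1023)."""
    # Keep the low ten bits, then complement the most significant bit.
    return (value & MASK) ^ MSB

def output_steps(value):
    """DAC output in steps after the analog offset of 512 steps is subtracted."""
    return dac_code(value) - MSB

for v in (0, 511, -512, -1):
    print(v, hex(dac_code(v)), output_steps(v))
```

For every input in the 10-bit two's complement range, the output in steps comes back equal to the input, which is the whole point of the scheme.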
Now that we have gotten that out of the way, let's do some digital stuff. Let's connect a counter to the DAC as shown in Figure 4.
We can load the counter by presenting data to d0-d9 and strobing the Load input. We can also increment or decrement the counter by selecting Up/Down as desired and strobing the Clock input.
There is a small problem to deal with. The DAC contains a resistor ladder network, and changing the input causes the DAC's internal switches to select a different combination of resistor taps. This causes a glitch in the DAC output. To prevent the glitch from getting to the XY Monitor, we use a sample-and-hold as shown in Figure 5. When the DAC output is stable we close Switch SW and charge Capacitor C. (That's the sample part.) Once Capacitor C is charged, we open Switch SW and are free to change the DAC data. (That's the hold part.) The Buffer amplifier has a high input impedance so it doesn't discharge Capacitor C within the period of the sample/hold cycle.
Since we have two axes (X and Y) we will use two circuits of the type shown in Figure 5.
Now that we can load the counter to position the beam, increment/decrement the counter to move the beam, and deglitch the DAC, let's draw some vectors.
Let's assume for this example that the Deflection Amplifiers can move the beam at a maximum speed of 1 screen unit/microsecond.
In Figure 6a we will draw a vector 40 units long, along only the X axis. This will take 40 us. We can use a 1 MHz clock and use an X vector-length counter to produce 40 pulses.
In Figure 6b we will draw a vector 30 units long, only along the Y axis. This will take 30 us. We can use a 1 MHz clock and use a Y vector-length counter to produce 30 pulses.
In Figure 6c we will draw a vector 40 units along the X axis and 30 units along the Y axis. If we use a 1 MHz clock on both the X and Y vector-length counters we end up with the vector shown in Figure 6c.
Oops. The vector that we actually want is shown in Figure 6d.
It's clear that the X and Y vector length counters cannot use the same clock unless the X and Y vectors happen to be the same length.
For example, if we draw the X component at 1 unit/us (40 us for 40 units), we have to draw the Y component slower, so that at the end of 40 us it has gone only 30 units. Therefore, the Y component must be drawn at a rate of 30/40 = 0.75 units/us. If we drew it so that the Y component was drawn in 30 us (1 unit/us), the X component would also have to be drawn in 30 us, so that its drawing rate would have to be 40/30 = 1.33 units/us.
However, that would exceed the maximum drawing speed of the X deflection amplifier, so we have to scale the drawing speed to the longest axis (in this example, the X axis).
There are several methods for producing the clock rates we need. Atari used Binary Rate Multipliers (BRMs). A BRM is a counter that passes a programmable fraction of its input clock pulses, in effect multiplying the input clock rate by a digital number divided by the length of the counting cycle. Although the pulses it produces are not guaranteed to be evenly distributed through the counting cycle they will be close enough for our purpose.
The BRM used by Atari was the 7497. The 7497 is a 6-bit BRM. With a digital input of 63, it will produce 63 output pulses for every 64 input clocks. With a digital input of 1 it will produce one output pulse every 64 input clocks. Two 7497s were chained together to produce a 12-bit BRM. The data sheet for the 7497 is available here (PDF 282KB).
Part way through the run of Asteroids, we used up the world's supply of 7497s and Texas Instruments (the only manufacturer of 7497s) did not have them on their schedule to make more for several months. Rather than shut down the production of Asteroids, Howard Delman designed a daughter board with small-scale ICs to replace the 7497s. A new layout for the Asteroids PCB was also done using the new circuitry.
The BRMs supply the appropriate clocks to the X and Y Position Counters (the counter in Figure 5). Now we have to either count the clocks or time them.
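To make the behavior concrete, here is a small Python model of the general binary rate multiplier principle. It is a sketch under assumptions: it gates each rate bit with a distinct set of counter states so that a rate input of N yields N pulses per cycle, but it has not been checked gate-for-gate against the 7497 data sheet, and the function name is invented for the example.

```python
def brm_output(rate, bits=6):
    """Return the 0/1 output pattern for one full cycle of 2**bits input clocks."""
    pattern = []
    for count in range(1 << bits):
        # Count how many low-order counter bits are 1 before the first 0.
        trailing_ones = 0
        while trailing_ones < bits and (count >> trailing_ones) & 1:
            trailing_ones += 1
        # Counter states with k trailing ones occur 2**(bits-1-k) times per
        # cycle, so they are gated by the rate bit that has that same weight.
        enabled = trailing_ones < bits and (rate >> (bits - 1 - trailing_ones)) & 1
        pattern.append(1 if enabled else 0)
    return pattern

print(sum(brm_output(63)), sum(brm_output(1)))   # pulses per 64 clocks
print(sum(brm_output(3072, bits=12)))            # 12-bit case, 3/4 of 4096 clocks
```

Printing the pattern itself (rather than its sum) shows that the pulses are spread through the cycle but not perfectly evenly, as noted above.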
That requires a discussion of how fast the resultant vector should be drawn.
If we want all vectors in our example to have the same brightness density, they should be drawn at 1 unit/us. Since the vector that results from 40 X units and 30 Y units is 50 units, the vector should be drawn so it takes 50 us. { We have a right triangle, so the Hypotenuse R = sqrt(x*x + y*y) = sqrt(40*40 + 30*30) = sqrt(1600 + 900) = sqrt(2500) = 50 }
Why do we want a constant brightness density? Well, if we take two vectors (Vector 1 and Vector 2) that are drawn in the same amount of time, if Vector 2 is twice as long as Vector 1, Vector 2 will have its energy distributed over twice the distance as Vector 1, and will appear dimmer. (How much dimmer it will appear will be discussed shortly.)
Therefore, if we want a constant vector density, we would have to scale the clocks for both the X and Y by a factor of R so that:
X Clock = X / R and Y Clock = Y / R
(Because we are interested in the length of the vectors, and not the sign, we need to take the absolute values of the vectors.)
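As a worked illustration of the constant-density scaling, the following Python sketch computes the two clock rates and the drawing time for a given vector. The function name and the convention of returning rates as fractions of the maximum rate (1 unit/us) are assumptions made for the example.

```python
import math

def constant_density_clocks(x, y):
    """Return (x_clock, y_clock, draw_time_us) for drawing at 1 unit/us along the vector."""
    r = math.hypot(x, y)          # length of the resultant vector
    if r == 0:
        return 0.0, 0.0, 0.0      # nothing to draw
    # Only the magnitudes matter for the clock rates; the signs set the direction.
    return abs(x) / r, abs(y) / r, r

x_clock, y_clock, draw_time = constant_density_clocks(40, 30)
print(x_clock, y_clock, draw_time)
```

For the 40 by 30 example this gives rates of about 0.8 and 0.6 units/us and a drawing time of 50 us.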
As an example, let's take the worst case, which occurs when the angle is 45 degrees.
In Figure 7a we will draw the vector in 100 us, the maximum rate for the deflection amplifiers. The resulting vector will be 141 units long. Since it is drawn in 100 us we will give it a density figure of 100 us/141 units = 0.71. If we were to draw only along the X axis, it would be 100 us/100 units = 1.0.
In Figure 7b we will draw the vector in 141 us. The resulting vector will again be 141 units, but the density figure will be 141 us/141 units = 1.0.
One of the downsides is that we have pissed away some drawing time, which we would probably rather use to put more vectors on the screen.
The other downside is that we would have to do two multiplications, an add, a square root, and two divides (one for each vector).
This is a lot to do during program runtime, even if we simplify it by using a kludge for calculating R.
(The square root of the sum of the squares can be approximated by taking the absolute values of the two numbers, and by adding the larger one to a fraction of the smaller one.)
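A minimal Python sketch of that kludge follows. The source only says "a fraction of the smaller one"; the value 3/8 used here is an assumption chosen for illustration, as is the function name.

```python
import math

def approx_length(x, y, fraction=0.375):
    """Approximate sqrt(x*x + y*y) as larger + fraction * smaller (fraction is assumed)."""
    big = max(abs(x), abs(y))
    small = min(abs(x), abs(y))
    return big + fraction * small

for x, y in ((40, 30), (100, 0), (100, 100)):
    print(x, y, approx_length(x, y), math.hypot(x, y))
```

With this particular fraction the 40 by 30 vector comes out as 51.25 instead of 50, an error of 2.5%, and a multiply by 3/8 can be done with shifts and adds.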
If our game shows only predetermined pictures, as in Lunar Lander and Asteroids, we can do the calculations during program assembly and avoid doing them during program runtime. The cost is increased program storage.
If the vectors are game dependent, as in the 3D objects in BattleZone, we don't have this option.
Let's resume the discussion of whether this method, as precise as it is, is necessary.
It turns out that a vector whose intensity is 40% greater than that of another vector will not appear to be 40% brighter to the human eye, because the human eye has a logarithmic response. In fact, the difference will be barely noticeable.
The object of this exercise was simply to understand what's really going on so we can make an intelligent decision about what to do and be confident we are making a reasonable decision.
The next choice of methods is to determine which axis is longer and use it to normalize the shorter vector, so that:
1. If X is longer: X Clock = 1 and Y Clock = Y / X
2. If Y is longer: Y Clock = 1 and X Clock = X / Y
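The two cases can be folded into one small Python sketch. The function name and return convention (clock rates as fractions of the maximum rate, plus the drawing time in clocks) are assumptions for illustration only.

```python
def longest_axis_clocks(x, y):
    """Normalize to the longer axis: it runs at full rate, the other at a fraction."""
    ax, ay = abs(x), abs(y)
    longest = max(ax, ay)
    if longest == 0:
        return 0.0, 0.0, 0        # nothing to draw
    # The longer axis gets a clock of 1; the drawing time equals its length.
    return ax / longest, ay / longest, longest

print(longest_axis_clocks(40, 30))
print(longest_axis_clocks(30, 40))
```

For the 40 by 30 example the X clock is 1 and the Y clock is 0.75, and the vector takes 40 us rather than the 50 us needed for constant density.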
We have simplified things a great deal but we still need to store more data if the calculations are performed during program assembly or, if performed during program runtime, we need a digital divider.
Atari's Digital Vector Generator simplifies one step further by using binary normalization performed during program assembly. The way binary normalization works is as follows.
X and Y are each loaded into a shift register; the Time register is loaded with a preset value. The X and Y Shift Registers are shifted left (made larger by a factor of two by each shift) until either register is in danger of overflowing. Each time the registers are shifted left the Time Register is shifted Right, decreasing the time the vector will be drawn by a factor of two each time.
Example: X Vector = 106 units, Y Vector = 14 units.
          X                      Y                      Timer
          Binary (Decimal)       Binary (Decimal)       Binary (Decimal)
Start:    000001101010 (106)     000000001110 (14)      100000000000 (2048)
Shift:    000011010100 (212)     000000011100 (28)      010000000000 (1024)
Shift:    000110101000 (424)     000000111000 (56)      001000000000 (512)
Shift:    001101010000 (848)     000001110000 (112)     000100000000 (256)
Shift:    011010100000 (1696)    000011100000 (224)     000010000000 (128)
Stop: otherwise X will overflow into the sign bit.
The X, Y and Timer registers always maintain the correct ratios. The vector is then drawn with the normalized values of X, Y and time (from the Timer register). The vectors are drawn at maximum speed within a worst-case factor of almost two (000010000000 [128] gets normalized the same as 000011111111 [255]).
Because the initial state of the Timer has only one bit set at a time (the remainder are always zero) it can be represented as a 4-bit number.
Thus, in the Digital Vector Generator, Binary Normalization is performed during program assembly and the initial state of the Timer is stored (as a 4-bit number) in the vector database.
A 4-bit Adder is used to allow for additional binary scaling for short vectors. (Otherwise, the 4-bit value would overflow.) This is especially useful for small objects such as asteroids.
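The shift procedure is easy to express in Python. This sketch assumes 12-bit registers with the top bit reserved for the sign (11 magnitude bits) and a Timer preset of 2048, matching the worked example; the function name and the guard against a zero-length vector are additions for illustration.

```python
def binary_normalize(x, y, timer=2048, magnitude_bits=11):
    """Shift X and Y left and the Timer right until another shift would overflow."""
    limit = (1 << magnitude_bits) - 1     # 2047, largest value below the sign bit
    x, y = abs(x), abs(y)
    # Stop before the larger magnitude would spill into the sign bit.
    # The extra conditions keep a zero-length vector from looping forever.
    while 0 < max(x, y) * 2 <= limit and timer > 1:
        x <<= 1
        y <<= 1
        timer >>= 1
    return x, y, timer

print(binary_normalize(106, 14))
print(binary_normalize(128, 0), binary_normalize(255, 0))
```

The first call reproduces the table (1696, 224, 128). The second line shows that 128 and 255 both end up with the same Timer value, which is the factor of almost two mentioned above.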
As we will see later in the Analog Vector Generator, circuitry was added to perform binary normalization during program runtime. This has nothing to do with whether the vector generator is Digital or Analog. It was added because the Analog Vector Generator was used in BattleZone where the object vectors were the result of 3D calculations performed during program runtime and therefore, could not be done during program assembly time.
Note that the Delta X and Delta Y values stored in Vector Generator memory are in Sign Magnitude form.
The Magnitude is the normalized absolute value and goes to the BRMs; the Sign determines whether the Counter counts Up or Down. The Outputs of the Counters are in Two's Complement Form.
The Vector Generator State Machine
Feeding the DACs with data and keeping everything going at full speed is a formidable task. The 6502 was nowhere near fast enough even if it didn't have to do anything else, like run the game.
What we used was a custom processor made out of SSI and MSI.
The heart of the Vector Generator processor is a State Machine consisting of a PROM and a Latch shown in Figure 8. The PROM is programmed so that the data at each address selects the next address. The Latch allows the output of the PROM to stabilize before it is applied back to its input, and provides the basic timing of the machine. Clearing the Latch causes the machine to enter State 0. The data at State 0 determines the next State. Because this machine allows us to select different states, it is called a State Machine.
We can decode the states to provide the maximum number of functions (eight). The disadvantage is that we will only be able to perform one function per machine cycle. By not decoding the states we will be able to perform several functions per machine cycle but then we will need a bit for each function. Or, we can do a little of each.
We could also combine decoded states on the backend, but since this is a teaching example, we won't.
In Figure 9 we have added a Decoder to the output of the Latch so each state can be used to perform a function. The Decoder is gated by the Clock signal to produce strobed signals for each state. These Functions will normally be performed at the end of the machine cycle.
We have also added two outputs to the PROM. We have not increased the number of states. These outputs are only there for the ride so we will be able to perform some functions in parallel with the strobed functions.
In Figure 10a (on the next page) we have added a ROM memory, controlled by a Counter which can be cleared and incremented. There are also two Latches whose outputs will each go to a DAC (X DAC and Y DAC). We will also use a Sample-and-Hold circuit on each DAC and control them with the same signal. The DAC and Sample-and-Hold circuits are shown in Figure 10b.
The Counter is labeled "Program Counter." Later we will find out why.
Because the ROM address is incremented after each DAC data access but not after a Sample-and-Hold Command, we will increment the Program Counter with one of the separate PROM outputs. Incrementing the Program Counter still requires a strobed signal so we have added an AND gate.
Here are the States and what we will make them do.
Current State                       Increment PC   Next State
State 0 - Latch Data to X Latch     1              State 1
State 1 - Latch Data to Y Latch     1              State 2
State 2 - Sample-and-Hold           0              State 0
State 3 - not used                  0              State 7
State 4 - not used                  0              State 7
State 5 - not used                  0              State 7
State 6 - not used                  0              State 7
State 7 - not used                  0              State 7
After a Reset, we start up at ROM Address 0, State 0.
Assuming the Reset is long enough to access the ROM, State 0 will load the data into the X Latch and increment the Program Counter to Address 1. The next state will be State 1.
State 1 will load the data into the Y Latch and increment the Program Counter to address 2. The next state will be State 2.
State 2 will trigger the Sample-and-Hold circuits. The Program Counter will not be incremented because it already points to the data for the next X DAC value. The next state will be State 0.
We have now created a simple processor with one Instruction consisting of three microinstructions. We will continue to execute this Instruction, fetching and loading data for the X and Y DACs and strobing their Sample-and-Holds forever, or until we get tired of it and turn it off.
In this example, if we were to get a glitch that put us into an unused state we would end up spinning in State 7. We could have just as easily programmed it to go to State 0 in order to continue. Or perhaps we should generate a Reset pulse. We could also have programmed it so that all errors go to State 7 and then used the signal to turn on an LED.
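The three-state machine can be simulated in a few lines of Python. This is a behavioral sketch only: the PROM is a list of (Increment PC, Next State) pairs taken from the table, the Sample-and-Hold strobe is modeled by recording the latched X,Y pair, and wrapping the ROM address at the end of the list is an assumption made so the example can run indefinitely.

```python
# One entry per state: (increment PC, next state), as in the table above.
PROM = [(1, 1), (1, 2), (0, 0), (0, 7), (0, 7), (0, 7), (0, 7), (0, 7)]

def run(rom, cycles):
    """Clock the state machine and return the X,Y pairs presented to the DACs."""
    pc = state = 0                # Reset: ROM address 0, State 0
    x = y = None
    displayed = []
    for _ in range(cycles):
        inc_pc, next_state = PROM[state]
        data = rom[pc % len(rom)]           # assumed address wrap-around
        if state == 0:
            x = data                        # latch data to X Latch
        elif state == 1:
            y = data                        # latch data to Y Latch
        elif state == 2:
            displayed.append((x, y))        # strobe the Sample-and-Holds
        pc += inc_pc
        state = next_state                  # the Latch clocks in the next state
    return displayed

print(run([1, 2, 3, 4], 6))   # [(1, 2), (3, 4)]
```

Forcing the state variable to an unused value (3 through 6) shows the machine falling into State 7 and staying there, as described above.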
Let's make our State Machine more interesting. Referring to Figures 11a and 11b, we have added several items.
(The X and Y DACs are the same as those used in Figure 10b.)
We have added three inputs to the PROM and have connected them to the output of a Latch which receives its data from the ROM Data Bus (D0-D7).
As a result, we now have the capability of performing eight different sequences. In other words, we now have eight instructions.
In addition, because the instructions to be executed come from the ROM we now have a Stored Program Computer. It is why the counter that provides the address to the ROM was given the name Program Counter.
If you look at the Program Counter you will see that we can now load it as well as increment it. Not only that, but the Multiplexer (Mux) gives us a choice of two different sources to load it from.
We can load it directly from the ROM data. This will allow us to make a Jump Instruction.
The other source requires some explanation. It loads the data for the Program Counter from the Register File Memory (see Figure 11b) which is configured as a memory stack. It will allow us to do subroutines.
In Figure 11b, a memory with separate input and output data paths receives its address from an Up/Down Counter. When we write to the current Register File address and increment the counter, it is called Pushing the Data on the Stack (otherwise known as a Push). When we decrement the counter and read the current Register File address, it is called Popping the Data from the Stack (otherwise known as a Pop).
To Jump to a Subroutine (JSR) we Push the Program Counter's data on the Stack so we will know where to come back to when we Return from the Subroutine (RTS).
Returning from a subroutine poses a subtle problem. The Stack pointer points to the next available address, so we have to use one state to pop the Stack and another one to Load the Program Counter. In this example it is a no-op whose strobe signal is not used for anything else.
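The Push and Pop behavior of the Register File and its Up/Down Counter can be sketched in Python as follows. The class name and the depth of four words are assumptions for illustration; the point is that the counter always addresses the next free location, so a Pop must decrement before reading.

```python
class RegisterFileStack:
    """Register File addressed by an Up/Down Counter, used as a stack."""

    def __init__(self, depth=4):          # depth is an assumed value
        self.memory = [0] * depth
        self.pointer = 0                  # the Up/Down Counter

    def push(self, value):
        # Write to the current address, then count up.
        self.memory[self.pointer] = value
        self.pointer = (self.pointer + 1) % len(self.memory)

    def pop(self):
        # Count down, then read the current address.
        self.pointer = (self.pointer - 1) % len(self.memory)
        return self.memory[self.pointer]

stack = RegisterFileStack()
stack.push(0x12)
stack.push(0x34)
print(hex(stack.pop()), hex(stack.pop()))   # 0x34 0x12
```

Because the counter simply wraps, nesting subroutines deeper than the Register File overwrites the oldest return address without any warning.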
Now, let's program this puppy.
Instruction 0 (000) - Load X,Y (3 bytes = Cmd, X, Y; 4 machine cycles)
Current State                                  Increment PC   LOAD PC   Next State
State 0 - Latch Data to Instruction Latch      1              0         State 1
State 1 - Latch Data to X Latch                1              0         State 2
State 2 - Latch Data to Y Latch                1              0         State 3
State 3 - Strobe Sample-and-Hold               0              0         State 0
State 4 - Push                                 0              0         State 7
State 5 - Pop                                  0              0         State 7
State 6 - (Nop)                                0              0         State 7
State 7 - Halt                                 0              0         State 7
Instruction 1 (001) - JMP addr (2 bytes = Cmd, Adr; 2 machine cycles)
Current State                                  Increment PC   LOAD PC   Next State
State 0 - Latch Data to Instruction Latch      1              0         State 6
State 1 - Latch Data to X Latch                0              0         State 7
State 2 - Latch Data to Y Latch                0              0         State 7
State 3 - Strobe Sample-and-Hold               0              0         State 7
State 4 - Push                                 0              0         State 7
State 5 - Pop                                  0              0         State 7
State 6 - (Nop)                                0              1         State 0
State 7 - Halt                                 0              0         State 7
Instruction 2 (010) - JSR addr (2 bytes = Cmd, Adr; 2 machine cycles)
Current State                                  Increment PC   LOAD PC   Next State
State 0 - Latch Data to Instruction Latch      1              0         State 4
State 1 - Latch Data to X Latch                0              0         State 7
State 2 - Latch Data to Y Latch                0              0         State 7
State 3 - Strobe Sample-and-Hold               0              0         State 7
State 4 - Push                                 0              1         State 0
State 5 - Pop                                  0              0         State 7
State 6 - (Nop)                                0              0         State 7
State 7 - Halt                                 0              0         State 7
Instruction 3 (011) - RTS (1 byte = Cmd; 3 machine cycles)
Current State                                  Increment PC   LOAD PC   Next State
State 0 - Latch Data to Instruction Latch      1              0         State 5
State 1 - Latch Data to X Latch                0              0         State 7
State 2 - Latch Data to Y Latch                0              0         State 7
State 3 - Strobe Sample-and-Hold               0              0         State 7
State 4 - Push                                 0              0         State 7
State 5 - Pop                                  0              0         State 6
State 6 - (Nop)                                0              1         State 0
State 7 - Halt                                 0              0         State 7
Instruction 4 (100) - NOP (1 byte = Cmd; 2 machine cycles)
Current State                                  Increment PC   LOAD PC   Next State
State 0 - Latch Data to Instruction Latch      1              0         State 6
State 1 - Latch Data to X Latch                0              0         State 7
State 2 - Latch Data to Y Latch                0              0         State 7
State 3 - Strobe Sample-and-Hold               0              0         State 7
State 4 - Push                                 0              0         State 7
State 5 - Pop                                  0              0         State 7
State 6 - (Nop)                                0              0         State 0
State 7 - Halt                                 0              0         State 7
Instruction 5 (101) - HALT (1 byte = Cmd; 1 machine cycle)
Current State                                  Increment PC   LOAD PC   Next State
State 0 - Latch Data to Instruction Latch      0              0         State 7
State 1 - Latch Data to X Latch                0              0         State 7
State 2 - Latch Data to Y Latch                0              0         State 7
State 3 - Strobe Sample-and-Hold               0              0         State 7
State 4 - Push                                 0              0         State 7
State 5 - Pop                                  0              0         State 7
State 6 - (Nop)                                0              0         State 7
State 7 - Halt                                 0              0         State 7
The remaining instructions (6-7) are all programmed the same as the Halt Instruction.
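The instruction tables can be exercised with a short Python simulation. Treat it as a sketch under several assumptions that the tables do not spell out: the opcode is taken from the low three bits of the command byte, the Mux selects the Stack only for RTS, the value pushed by JSR is the address just past its operand (so that RTS lands on the following instruction), and ROM addresses wrap at the end of the list. The names are invented for the example.

```python
HALT_ROW = (0, 0, 7)   # (increment PC, load PC, next state)
# Start with every entry programmed like Halt, then fill in the rows that differ.
PROM = {(instr, state): HALT_ROW for instr in range(8) for state in range(8)}
PROM.update({
    (0, 0): (1, 0, 1), (0, 1): (1, 0, 2), (0, 2): (1, 0, 3), (0, 3): (0, 0, 0),  # Load X,Y
    (1, 0): (1, 0, 6), (1, 6): (0, 1, 0),                                        # JMP
    (2, 0): (1, 0, 4), (2, 4): (0, 1, 0),                                        # JSR
    (3, 0): (1, 0, 5), (3, 5): (0, 0, 6), (3, 6): (0, 1, 0),                     # RTS
    (4, 0): (1, 0, 6), (4, 6): (0, 0, 0),                                        # NOP
})
RTS = 3

def run(rom, max_cycles=1000):
    """Execute the program in rom and return the X,Y pairs that were displayed."""
    pc = state = instr = 0
    x = y = popped = 0
    stack, displayed = [], []
    for _ in range(max_cycles):
        if state == 7:                        # Halt: the machine spins here
            break
        data = rom[pc % len(rom)]             # assumed address wrap-around
        if state == 0:
            instr = data & 0b111              # assumed: opcode in the low 3 bits
        inc_pc, load_pc, next_state = PROM[(instr, state)]
        if state == 1:
            x = data
        elif state == 2:
            y = data
        elif state == 3:
            displayed.append((x, y))          # strobe the Sample-and-Holds
        elif state == 4:
            stack.append(pc + 1)              # assumed: push address after operand
        elif state == 5:
            popped = stack.pop()
        if load_pc:
            pc = popped if instr == RTS else data   # Mux: Stack or ROM data
        elif inc_pc:
            pc += 1
        state = next_state
    return displayed

# JSR 6; Load X,Y 30,40; HALT; subroutine at 6: Load X,Y 10,20; RTS
PROGRAM = [2, 6, 0, 30, 40, 5, 0, 10, 20, 3]
print(run(PROGRAM))   # [(10, 20), (30, 40)]
```

Changing a single PROM entry is enough to change what an instruction does, which is the attraction of the microprogrammed approach.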
Other Things We Can Add
1. Replace the Program ROM with RAM and add multiplexers so that a host processor (like a 6502) can access it to change the program and data to display different patterns. We can even use a mixture of ROM and RAM, putting frequently used programs in ROM and accessing them with JSR instructions. (That's why we created the JSR and RTS instructions.) We probably also want to increase the size of the memory.
2. We are using only three of the eight bits in the command byte. We can use two of the five unused bits to control screen intensity.
3. While we're at it we should add screen blanking so that the screen is blanked during the DAC Sample phase and unblanked during the Hold phase.
4. We can add a Timer so we can control how long the dots are displayed. There are three unused bits left in the command byte. We can use them to select eight different timing lengths, or we can add an instruction to load eight bits of data into a Timer Register.
If we choose to add an eight-bit Timer Register we will need another strobe to load the data into it. For this we would need to either expand the State PROM to increase the number of states or squeeze in the extra instruction by creative state decoding.
Adding a Timer requires that we add another input to the State PROM so that while the Timer is running the State PROM executes a Spin instruction in which State 6 is programmed to remain in State 6.
This amounts to creating a second block of instructions where all the instructions are the same. We'll call it the Spin instruction.
An example, where we have expanded the number of States and where State 8 is "Timer Write" is:
Instruction 6 (110) - TIMER (2 bytes = Cmd, Timer Data)
Current State                                  Increment PC   LOAD PC   Next State
State 0 - Latch Data to Instruction Latch      1              0         State 8
State 1 - Latch Data to X Latch                0              0         State 7
State 2 - Latch Data to Y Latch                0              0         State 7
State 3 - Strobe Sample-and-Hold               0              0         State 7
State 4 - Push                                 0              0         State 7
State 5 - Pop                                  0              0         State 7
State 6 - (Nop)                                0              0         State 6
State 7 - Halt                                 0              0         State 7
State 8 - Timer (Sets Timer Flag)              1              0         State 6
The State PROM is programmed so that when Timer Flag is asserted, all instructions are programmed as:
Spin On Timer Flag
Current State                                  Increment PC   LOAD PC   Next State
State 0 - Latch Data to Instruction Latch      0              0         State 6
State 1 - Latch Data to X Latch                0              0         State 6
State 2 - Latch Data to Y Latch                0              0         State 6
State 3 - Strobe Sample-and-Hold               0              0         State 6
State 4 - Push                                 0              0         State 6
State 5 - Pop                                  0              0         State 6
State 6 - (Nop)                                0              0         State 6
State 7 - Halt                                 0              0         State 6
State 8 - Timer (Sets Timer Flag)              0              0         State 6
Since the Program Counter was incremented before we entered the Spin cycle, when the timer is done we will be ready to decode the next instruction.
5. We can reduce the number of memory accesses by increasing the width of the data bus to 16 bits so that both X and Y DAC values can be loaded in the same instruction. The cost is that the single-byte instructions will also now be 16 bits, thereby wasting some memory.
If we were to keep adding these things, pretty soon we would end up with the Asteroids Vector Generator State Machine.
We could add an Arithmetic Logic Unit (ALU) to perform addition and subtraction and logical operations such as AND, OR, XOR, Bit Shifts, and Bit Rotates using a register called the Accumulator. (We should also add the capability of writing to main memory.)
At this point we have created a real processor. We should add an Interrupt capability.
It's easy to get carried away when you're designing a processor.
The traditional view is that processors are divided into two types: They are either Microprogrammed with a State Machine (called a Control Store) or use Random Logic.
Microprogrammed processors are faster to design, easier to modify, but slower in operation than Random Logic processors.
Random Logic processors take longer to design, are more difficult to modify, but are faster in operation than Microprogrammed processors.
However, the tradeoffs between Microprogrammed processors and Random Logic processors are not as clear-cut as they used to be.
If you are building the processor out of discrete logic, the traditional view is certainly correct. Modifying a Microprogrammed processor mostly requires changing the State PROM. Changing a Random Logic processor requires rewiring physical devices.
However, if you are using a Field Programmable Gate Array (FPGA) to implement a Random Logic processor, changes are made to the FPGA software and the changes can be simulated in software first. Other software tools can also be used in designing the processor.
In the old days, PROMs (even Bipolar PROMs) were slower than random logic. When implemented in an FPGA this is not the case.
Also, in the old days, software tools for designing ICs were crude or nonexistent. You had to spend a large amount of money to fabricate an IC to find out if it worked.
Also consider that regardless of whether the processor is Microprogrammed or uses Random Logic it still has to talk to the same registers and perform the same micro-operations.
As an intellectual exercise, let's look at how we could implement Figure 11's Microprogrammed machine using Random Logic. This will be shown in Figures 12a - 12i.
The design hasn't been optimized. It hasn't even been tested. You could probably improve it if you wanted to.
Processors are designed using a mixture of Art and Science, not Magic.
Referring to Figure 12a, we'll start by generating a number of clock phases by using a Counter which is Decoded and gated to produce the timing chart as shown. (Clock is the Master Clock.)
We start with a Reset pulse to enable the Counter. (A Halt command will stop the Counter.)
Each Instruction will begin at Clk 0. When an Instruction ends, it will assert a Clear command to clear the Counter with the next Clock signal.
In Figure 12b, we load the Instruction into the Instruction Latch and Decode it so we know what Instruction to execute. (Each Instruction ends by providing the memory address of the next Instruction.)
In Figure 12c we show how the Load X,Y Command is implemented.
Clk1 will increment the Program Counter to fetch the X DAC data;
Clk2 will load the XDAC data into the XDAC Latch and also increment the Program Counter to fetch the YDAC data;
Clk3 will load the YDAC data into the YDAC Latch and also increment the Program Counter to fetch the next Instruction;
Clk4 will operate the Sample-and-Hold for the X and Y DACs and tell the Counter in Figure 12a to get ready to start a new Instruction cycle.
In Figure 12d, we have implemented the JMP Instruction.
All we need to do is Increment the Program Counter (Clk0) to get the JMP Address, and Load it back into the Program Counter (Clk2). This also ends the instruction.
A JSR is almost as easy. In Figure 12e we increment the Program Counter (Clk0) to get the JSR Address, and load it back into the Program Counter (Clk1) at the same time we Push the old Program Counter on the Stack and signal the end of the instruction.
In the RTS instruction in Figure 12f, we
Pop the Stack during Clk1;
Load it into the Program Counter during Clk2;
Increment the Program Counter during Clk3, which also ends the instruction.
The reason for incrementing the Program Counter during Clk3 is that when we did the JSR the address on the stack was the address of the JSR target.
A NOP instruction just increments the Program Counter and ends the instruction. If we wanted a longer NOP we could have used Clk2 (Clk3, etc.).
And, finally, the Halt Instruction. All we do is disable the Counter (Figure 12a).
The Memory Addressing mechanism (Mux, Program Counter and ROM) is shown in Figure 12i.
The X Latch and Y Latch are the same as shown in Figure 11a.
The XDAC and YDAC circuits are the same as shown in Figure 10b.
The Register File Memory is the same as shown in Figure 11b.
And that is all there is to a Processor using Random Logic.
There is a very good article called A Brief History of Microprogramming by Mark Smotherman, Associate Professor, Department of Computer Science, Clemson University at:
http://www.cs.clemson.edu/~mark/uprog.html
Also worth visiting is: Selected Historical Computer Designs at:
http://www.cs.clemson.edu/~mark/hist.html
His home page is at:
http://www.cs.clemson.edu/~mark/
Lunar Lander, Asteroids, and Asteroids Deluxe
The Digital Vector Generator used in Lunar Lander, Asteroids, and Asteroids Deluxe was designed by Howard Delman.
Lunar Lander was released in August 1979. Asteroids came out in November 1979. Asteroids Deluxe came out in 1980.
I have scanned the schematic for the Digital Vector Generator used in Asteroids and broken it down into printable sheets. I have also included the commentary from the original schematic. (PDF 772KB)
As you can see, although it uses a State Machine there is also a considerable amount of random logic.
I am also including the Data Sheets for ICs of special interest.
The Data Sheet for the 7497 Binary Rate Multiplier is here (PDF 282KB).
The Data Sheet for the 74LS670 Register File is here (PDF 319KB).
The Data Sheet for the AD561J DAC is here (PDF 211KB).
The AD561J is generally considered difficult and expensive to find. I found it on the Analog Devices Web site (www.analog.com). They sell it through their Web site for $32.59 (1's), $27.27 (25's), and $20.79 (100's).
Analog Vector Generators
As we discussed in the section on Digital Vector Generators, the Digital Vector Generator uses an Up/Down Counter connected to a DAC. The DAC's output is deglitched with a Sample-and-Hold. This is shown in Figure 5 which has been reproduced here. The result is that VOUT is a sum of all the clock pulses that have been applied to the Counter. (Add all the UPs and subtract all the DOWNs.)
In the Analog Vector Generator we will accomplish this using Analog means, which is why it's called the Analog Vector Generator.
We start with a capacitor.
The current through a capacitor is given by:
