The basic operation is a Multiply/Accumulate with a Pre-Multiply Subtract of the form:
ACC = ACC + (A - B) * C          (Equation 2)
Although there were some very fast bipolar multiplier/accumulators available from TRW, they were much too expensive (and Star Wars was designed before the invention of the DSP), so I settled for 74LS384 Serial Multipliers and 74LS385 Serial Adder/Subtractors.
The 74LS384 multiplies an 8-bit word multiplicand by a 1-bit multiplier. It produces a 1-bit product and keeps a running sum for multiplying the next most-significant-bit. By using two 74LS384s we can multiply a 16-bit multiplicand by a 1-bit multiplier. The 74LS385 Serial Adder/Subtractor operates with two serial bit streams, adding with Carry or subtracting with Borrow (however it is set up) and producing a 1-bit result while saving the Carry (or Borrow) for the next most-significant-bit.
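To make the bit-serial idea concrete, here is a minimal behavioural sketch in Python of Equation 2 computed one bit per clock, least-significant bit first. It is not a model of the 74LS384 or 74LS385 themselves: the 32-bit word width, the function names, and the way the sign is handled (everything is treated as two's complement modulo 2^32) are assumptions made for illustration.

```python
WIDTH = 32                      # assumed word width for this sketch
MASK = (1 << WIDTH) - 1

def to_bits(value, width=WIDTH):
    """Two's-complement bits of value, least-significant bit first."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Signed integer from an LSB-first two's-complement bit list."""
    value = sum(bit << i for i, bit in enumerate(bits))
    if bits[-1]:
        value -= 1 << len(bits)
    return value

def serial_add_sub(x_bits, y_bits, subtract=False):
    """Add or subtract two serial streams, saving the carry between clocks."""
    carry = 1 if subtract else 0        # x - y is x + (not y) + 1
    out = []
    for x, y in zip(x_bits, y_bits):
        if subtract:
            y ^= 1
        total = x + y + carry
        out.append(total & 1)           # one result bit per clock
        carry = total >> 1              # kept for the next more-significant bit
    return out

def serial_multiply(multiplier_bits, multiplicand):
    """Parallel multiplicand times a serial multiplier, one product bit per clock."""
    partial = 0                         # running sum kept inside the multiplier
    out = []
    for bit in multiplier_bits:
        if bit:
            partial += multiplicand & MASK
        out.append(partial & 1)         # product bit for this clock
        partial >>= 1                   # remainder feeds the next bit position
    return out

def mac(acc, a, b, c):
    """ACC = ACC + (A - B) * C, built only from the serial pieces above."""
    difference = serial_add_sub(to_bits(a), to_bits(b), subtract=True)
    product = serial_multiply(difference, c)
    return from_bits(serial_add_sub(to_bits(acc), product))
```

For example, `mac(10, 5, 3, 4)` works through 10 + (5 - 3) * 4 and returns 18. In this sketch the three stages run one after another; in hardware the serial streams would flow through the subtractor, multiplier, and accumulator on the same clocks.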
With a 12 MHz clock, Equation 2 takes 2.58 us to perform. This is for 31 clocks. Normally it would be 32 clocks to get the entire 32-bit result of multiplying two 16-bit numbers. Why there is one less clock relates to how we represent the number 1.000. I think it will take another article to explain it.
Although a TRW Multiplier/Accumulator would do it in 50 ns, the Bit-Slice Math Box would probably take 10 us (I'm guessing).
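The timing figures are easy to check. This short Python sketch uses only numbers quoted in the text (a 12 MHz clock, 31 clocks, 50 ns for the TRW part, and the guessed 10 us for the Bit-Slice Math Box); the function and constant names are made up for the example.

```python
CLOCK_HZ = 12_000_000      # serial clock from the text
CLOCKS_PER_MAC = 31        # clocks needed for Equation 2, per the text

def mac_time_us(clocks=CLOCKS_PER_MAC, clock_hz=CLOCK_HZ):
    """Time for one serial multiply/accumulate, in microseconds."""
    return clocks / clock_hz * 1e6

serial_us = mac_time_us()
trw_us = 0.050             # 50 ns, as quoted
bit_slice_us = 10.0        # the author's guess

print(f"Serial MAC: {serial_us:.2f} us")
print(f"TRW part would be about {serial_us / trw_us:.0f} times faster")
print(f"Bit-Slice Math Box would be about {bit_slice_us / serial_us:.1f} times slower")
```

With the full 32 clocks the same calculation gives about 2.67 us, so the missing clock saves roughly 83 ns per operation.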
The basic block diagram for the Multiplier/Accumulator with Pre-Subtract (the MAC) is shown in Figure 25.
For controlling the Multiplier/Accumulator and keeping it supplied with data I designed a simple Data Pump. Although it is microprogrammed, it is not a State Machine.
We start with a PROM to provide the various strobe signals to load the shift registers and the parallel input to the Multipliers and to start the counter to provide the correct number of serial clocks to the MAC.
The PROM is programmed to provide the desired set of microinstructions starting from a selected address. After each micro-instruction a counter increments the PROM address. The last instruction in the sequence sets the MHalt flag which stops the process and alerts the 68B09E that it has finished its task.
The programmer selects different programs by selecting the appropriate starting address in the PROM.
The data for the MAC is stored in a completely separate memory (a RAM) that the 68B09E writes to when the MAC is halted. The results of the MAC are not stored in the RAM; the 68B09E reads the contents of the Accumulator directly.
The address for the RAM comes from a counter that can be set by the 68B09E. After each program this counter is automatically incremented to make it easy to perform an operation on blocks of data. An example is when we transform the points that make up an object (like Darth Vader's ship).
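The Data Pump described above can be sketched as a small Python model: a PROM holds microinstructions, a counter steps through them from a chosen starting address, the last one sets MHalt, and the RAM address counter advances after each program. The strobe names, the PROM contents, and the class layout are all hypothetical; only the control flow follows the description.

```python
from dataclasses import dataclass

@dataclass
class MicroInstruction:
    strobes: tuple          # hypothetical strobe names, for illustration only
    halt: bool = False      # True on the last instruction of a program

# Hypothetical PROM: one program at address 0, another at address 3.
EXAMPLE_PROM = [
    MicroInstruction(("load_a", "load_b")),
    MicroInstruction(("load_c", "start_clocks")),
    MicroInstruction((), halt=True),
    MicroInstruction(("load_a", "start_clocks")),
    MicroInstruction((), halt=True),
]

class DataPump:
    def __init__(self, prom):
        self.prom = prom
        self.ram_address = 0    # counter the host processor can set
        self.mhalt = True       # halted until the host starts a program
        self.log = []           # (PROM address, RAM address, strobes) per step

    def run(self, start_address):
        """Run one program from start_address until it sets MHalt."""
        self.mhalt = False
        prom_address = start_address
        while not self.mhalt:
            step = self.prom[prom_address]
            self.log.append((prom_address, self.ram_address, step.strobes))
            if step.halt:
                self.mhalt = True       # stops the pump and alerts the host
            prom_address += 1           # counter increments after each step
        self.ram_address += 1           # auto-increment after each program
```

Calling `run(0)` repeatedly applies the same program to successive RAM addresses, which is the block-of-points case. Note that there is no branching or state feedback in the loop, which matches the point that the Data Pump is microprogrammed without being a State Machine.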
The block diagram for the Matrix Processor Controller is shown in Figure 26.
The Divider for performing the Perspective Divide is a totally separate unit. It is unsigned and is of conventional design.
Because they are separate, the programmer can perform Perspective Divides while waiting for the Matrix Processor to deliver the next transformed point.
I designed the Matrix Processor specifically to perform Equation 1. However, Greg kept figuring out how to make it do more things, such as Dot Products and Cross Products.
The 3D math that Star Wars was capable of performing allowed any object (and the observer) to be in any orientation. However, it was decided that players might be confused by being approached by an upside-down TIE Fighter, so they were forced to be right-side up most of the time.
I want to mention the Sound Board, which had its own processor (another 68B09E). Although other companies had done games where the Sound System had its own processor, this was a first for Atari.
Star Wars didn't start out with a separate processor for sounds. In fact, in the first prototype board there was only one 68B09E that did sounds as well as the game. However, well into the project Greg and Norm absolutely needed more ROM space for the game and the 68B09E was completely full. I jokingly suggested that the best way to give the game more ROM was to add a separate processor for sounds. They liked the idea; so did Mike. And so did Rick Moncrief, the Director of our group. So, I came up with a Sound Board with its own processor. And, as long as I was at it, I put in four Pokeys. (The Sound System always had the Texas Instruments TMS5220 Speech Synthesizer.)
At one point I suggested we modify the Sound Board so it could accept either a Quad Pokey or four Pokeys. I was told by Engineering Management, "No. We don't want people to know that a Quad Pokey is four Pokeys."
I also put in something else; an image expander, also known as a stereo faker. The way this works is that you take your mono signal, delay it, and use it as a difference signal that you separately add to the original signal to produce the Left Channel and subtract from the original signal to produce the Right Channel (or vice versa). The result is to randomize the phase which your ears interpret as, "Where the Hell is that coming from?"
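In digital terms the image expander amounts to the following Python sketch. The real circuit is analog; the sample-based delay, the function name, and the choice of which channel gets the sum are assumptions for illustration.

```python
def image_expand(mono, delay_samples):
    """Return (left, right) made from a mono signal and a delayed copy of it."""
    left, right = [], []
    for n, sample in enumerate(mono):
        # Delayed copy of the mono signal (silence before the delay fills up).
        delayed = mono[n - delay_samples] if n >= delay_samples else 0.0
        left.append(sample + delayed)     # difference signal added
        right.append(sample - delayed)    # difference signal subtracted
    return left, right

left, right = image_expand([1.0, 0.0, 0.0, 0.0], delay_samples=2)
print(left)    # [1.0, 0.0, 1.0, 0.0]
print(right)   # [1.0, 0.0, -1.0, 0.0]
```

Summing the two channels cancels the delayed part and gives back twice the original signal, so the effect disappears when the output is mixed down to mono.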
Unfortunately, the speakers in Star Wars are mounted too close together and too far from the Player to produce much of a stereo effect. It sounds great in headphones, though.
At some point, Rick hired Earl Vickers to do sounds and speech development. Up until then, the game programmers were responsible for sounds, which they generally hated doing, so usually the programmers would use Pokey sounds that had already been developed. That's why, for many years, most of Atari's games sounded alike. The other game groups decided that having someone just do sounds was a good idea, so Earl became the nucleus of a completely separate Sound Group that did sounds for all of the games. Well, almost all of the games. Hard Drivin'/Race Drivin' was an exception. I will leave that story for perhaps another time.
There is a design on the Sound Board artwork that is the signature of the PC Board designer, Denny Simard. Denny was experienced in doing analog boards and was a member of our group, as opposed to being in the PCB Group. He was a nice guy and also looked *exactly* like Frank Zappa.
Unfortunately, he only got to do a few boards before being caught in one of the first Great Layoffs. [Which became "Reduction-In-Force" and then "Downsizing"].
The control yoke for Star Wars was a downsized version of the control from Army BattleZone (minus the palm switches), which came directly from an actual Bradley Fighting Vehicle. (It was the Gunner's control.) You might have noticed that the centering of the Star Wars control yoke is funny at times. Star Wars originally used a Pokey to read the pots. At that time, people either made their own A/D converter with a counter, a comparator, and a ramp, or they used Pokey. The Pokey was a full custom IC designed for the Atari 800/400 to read pots and keys, which gave it its name, POts and KEYs. There was some room left over so they put in some crude square wave sound generators as well as a UART. Unfortunately, Pokey does a really awful job of reading pots; it is guaranteed to produce occasional wrong values. The software to deal with it is pretty nasty. After Greg Rivera brought this to my attention I took the daring step of actually putting in a real A/D (Gasp!), the ADC-0809. Unfortunately, many people continued to use the original code to treat the A/D values as though they had come from a Pokey. Like Greg. That is why the Controller in Star Wars keeps getting recentered, usually badly.
During an early phase of development, when it was still Warp Speed, we tried a two-player game by connecting two monitors to the hardware with a circuit to allow the software to independently blank the video to the monitors. The vectors were still drawn on both monitors, but you didn't see them on the blanked monitor.
There was still only one hardware system, so it could only calculate and draw half as many vectors for each player, but it was an instructive experiment. It turns out that if you fire the laser (phasor, photon torpedo, photonic cannon, or whatever) by pointing the ship, the best strategy is to sit and fire at the other ship. If you try to maneuver, you have to point away from your opponent's ship. When you point away from your opponent's ship, you can't fire at it. (Well, you can but you won't hit it.) Therefore, the ship that tries to maneuver, loses. It wasn't much fun.
At one point during the development of the game there was a Thrust Control. However, the faster you could pilot your ship, the less game time you got, so Mike and Greg took it out. The Thrust input is still shown in the schematic.
At one time, the game had a joystick instead of the Flight Controller. The people at the focus group were confused about which way they were supposed to move the stick, and we were able to use that to justify the higher cost of the Flight Controller.
Star Wars joke: "No, No, No," exclaimed Obi Wan at dinner one evening, "Use the Fork, Luke!"
When Star Wars came out, it was the Number 1 Game until Dragon's Lair came out. Unfortunately, Dragon's Lair came out two weeks after Star Wars.
Nonetheless, we sold around 15,000 Star Wars games with a margin of about $1,000 per game for a total of about $15M. The period 1983-1984 was especially tough for Atari. We referred to it as "Going Supernova."
Warner Communications gave Atari Consumer to Jack Tramiel for $70M in paper which they later tore up. After Jack Tramiel had built up Commodore from a small manufacturer of business equipment and turned it into a successful maker of personal computers, he had a disagreement with the chairman of the company, and quit. He then turned around and offered to take Atari Inc. off of Warner's hands. Two things to note:
1. Warner retained a minority ownership.
2. Tramiel didn't want Coin-Op because he didn't think games had any future.
We (Coin-Op) were given to Namco in payment for the royalties that Atari, Inc. owed Namco for Consumer's Pac Man game.
Toward the end of that period we were told that Atari Games (our new name) had just barely broken even.
Presumably, that's why we hadn't been shut down or otherwise gutted.
Breaking-even included the $15M from Star Wars.
Major Havoc and the Empire Strikes Back
After Star Wars, the remaining XY games that were produced were Major Havoc and The Empire Strikes Back.
Major Havoc (November 1983) had Linear Scaling, a Window circuit that operates on the Y Axis only, the AVG Gate Array, and two 6502 processors.
The Empire Strikes Back (March 1985) was a kit for Star Wars.
It featured the Slapstic for expanding the 64 KByte address space of the 6809 as well as providing security. Slapstic was a small semi-custom IC and had three security levels: Simple, Medium, and Complex. Most games used the Simple level because it was the easiest for the programmer to deal with. Most programmers preferred to spend their time working on the game, so Slapstic security was usually left until the night before the ROMs had to be released for production.
The last game to use Slapstic was Hard Drivin'/Race Drivin'. After that, Slapstic was scrapped and replaced with a GAL6001 PLD which had buried states that were not brought out to the outside world, making it difficult to analyze the states programmed into it. It was also supposedly secure once the security bit was programmed. Unfortunately, the GAL6001 was cracked even more easily than the Slapstic in Simple Mode.
This is about the hardware.
I have scanned the schematics and put them into a printable form (PDF 1.8MB). There is also a picture of the board (526KB), one of only two boards known to exist. The picture was taken by Scott Evans and comes from the TomCat article.
TomCat used a 68010 running at 6 MHz. The 68010 was a terrific processor. (I still like it.)
The 68010 is a 32-bit machine internally, with a 16-bit bus to the outside world. (There was another variant, the 68008, which was 32 bits inside with 8 bits to the outside world.)
One of the nice things about the 68010 was that it had a small cache which was large enough to hold the following instructions:
* Initialize Register A1 = Source Address, A2 = Destination Address, D1 = (# of words to move) - 1
loop: MOVE.W (A1)+,(A2)+ * move the word whose address is contained in Register A1 to the
* address contained in Register A2, then increment the contents of
* Register A1 and Register A2
DBRA D1,loop * Decrement the contents of Register D1 and branch to loop
* unless D1 has just gone from 0 to -1
This allows you to move data around faster than with the vanilla 68000 since it does not have to spend time repeatedly fetching the instructions from external memory.
The 68010 also executed the Divide instruction faster than the 68000. (I guess they fixed the microcode.)
Unfortunately, the 68010 never caught on and was always more expensive than the 68000.
The 68010 presents special problems when interfacing it to the AVG, or any other processor for that matter.
The 6502 has a synchronous memory bus where the master clock is divided into two phases (Phase 1 and Phase 2). The address is always generated during Phase 1 and all memory accesses take place during Phase 2.
The 68010 has an asynchronous bus where the number of clock phases (and the phases where memory is accessed) depends on the instruction. This is a consequence of the 68010 being a microprogrammed machine with its own state machine, while the 6502 is a random-logic machine. If you have read the previous section of this article on the Vector Generator State Machine you know exactly what the difference is.
Because it has an asynchronous bus, the 68010 requires that the hardware acknowledge that it has completed the bus transaction by asserting a Data Acknowledge signal (/DTACK). Until /DTACK is asserted, the 68010 will just sit and wait.
To deal with the problems of an asynchronous bus Quantum used a rudimentary circuit to generate Wait states for the 68000 so it wouldn't interfere with the Vector Generator State Machine when the Vector Generator State Machine was accessing data from the Vector Memory it shared with the 68000. As I recall, there were times when the 68000 stalled the Vector Generator when it didn't need to, and there were times when the Vector Generator stalled the 68000 when it didn't need to, either. To be fair, it's a difficult problem because the 68000's asynchronous bus may start a memory access in the middle of a Vector Generator access.
I felt I could do better.
In addition to /DTACK, the 68010 provides three function-code outputs (FC0, FC1, and FC2) that tell the hardware where the 68010 is in its execution cycle.
I decoded these signals so I would know exactly where the 68010 was in its execution cycle.
I also decoded the Vector Generator state so I would know exactly where it was during its execution cycle.
Then I used a PROM to decide who would get the memory and who would get stalled depending on who was where.
The 68010 state decoding is shown on Sheet 2A. The Vector Generator state decoding is shown on Sheet 9B.
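The idea of letting a PROM arbitrate can be sketched as a lookup table indexed by the two decoded states. The Python below is only an illustration of that idea: the three states per side, the tie-break rule (the Vector Generator wins a simultaneous request), and all of the names are assumptions, not the contents of the actual PROM on the schematics.

```python
from itertools import product

# Hypothetical decoded states for each side of the shared memory.
CPU_STATES = ("idle", "requesting", "in_access")
VG_STATES = ("idle", "requesting", "in_access")

def build_arbiter_prom():
    """Build a table: (cpu_state, vg_state) -> (owner, stall_cpu, stall_vg)."""
    prom = {}
    for cpu, vg in product(CPU_STATES, VG_STATES):
        if cpu == "in_access":
            owner = "cpu"           # never pull memory away mid-access
        elif vg == "in_access":
            owner = "vg"
        elif vg == "requesting":
            owner = "vg"            # assumed tie-break: Vector Generator wins
        elif cpu == "requesting":
            owner = "cpu"
        else:
            owner = None            # nobody wants the memory
        # A side is stalled only if it wants the memory and did not get it.
        stall_cpu = cpu != "idle" and owner != "cpu"
        stall_vg = vg != "idle" and owner != "vg"
        prom[(cpu, vg)] = (owner, stall_cpu, stall_vg)
    return prom

ARBITER_PROM = build_arbiter_prom()
```

The useful property of a table like this is that neither side is ever stalled while the other is idle, which is the failure the simpler Wait-state circuit could not avoid. (The entry where both sides are mid-access is filled in only for completeness; it should be unreachable.)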
I later used a variant of this method in Hard Drivin'/Race Drivin' so its 68010 could talk to the TMS34010 Graphics Signal Processor.
On to the Vector Generator.
The boards that were made contained the same basic Vector Generator as the one used in Star Wars. However, I modified one board in a special way that dramatically improved the quality of the vectors.
The original circuit is shown in Figure 18 and is reproduced here as Figure 27. The vector is started and stopped by turning the DAC reference on and off.
In the new circuit, shown in Figure 28, the DAC is on all the time. It is the connection between the voltage produced by the DAC and the input to the integrator that is turned on and off. (I am using the symbol for an analog switch.) A series-only or parallel-only switch does not work because of leakage in a series switch and non-zero resistance in a closed switch. Combining the two is the solution. The inverter causes one switch to be off when the other is on, and vice versa. (This is called an 'L' Switch.) Note that I have converted the DAC output current into a voltage before applying it to the 'L' Switch. In the actual circuit shown in the TomCat schematics on Sheets 12A and 12B there is an additional potentiometer on each axis to null out any offset to the integrators. The LF13201 analog switches turn on and off faster than the DAC, producing vectors with sharply defined end points.
Next is the Game Link.
I had always wanted to link games together. I even described it in the game description I brought to my job interview. In Star Wars I brought the Pokey's Serial I/O Ports out to the edge connector, hoping to get people interested in linking games. It didn't happen. So I did it in TomCat.
The TomCat Game Link is a 16-bit interface with full handshaking. See Sheets 7A and 7B.
My best work on TomCat, though, is probably what I did with the TMS32010 Digital Signal Processor (DSP) that had only recently been brought out by Texas Instruments. Although it wasn't the first single-chip DSP (that honor goes to the NEC uPD7720), it was the most successful. (As a result of Texas Instruments' success with DSPs they would eventually dump many of their other product lines such as DRAMs and EPROMs in order to concentrate on DSPs.)
I used the TMS32010 for performing the 3D math calculations.
The 32010 had a 200 ns instruction cycle time and most of its instructions required only one cycle, including the Multiply/Accumulate instruction.
The Multiplier was 16 x 16 bits with a 32-bit Accumulator, and all this in 200 ns! This was a gigantic leap forward compared to the Bit-Slice Math Box and the Star Wars Matrix Processor.
It also had a 0-15 bit barrel shifter, so you could shift anywhere from 0-15 places in one instruction cycle.
The DSP had 288 bytes of on-chip RAM and was available with 3K Bytes of on-chip (masked) program ROM or it could use external memory.
I used external memory.
And that presented a problem.
It seems that Texas Instruments had neglected to consider that someone might want to use the TMS32010 with a Host Processor sharing memory with it.
There was no way to stall the TMS32010. And even worse, Texas Instruments specifically warned that the contents of the internal RAM were not guaranteed to stay put when the device was given a Hardware Reset.
I could have used a parallel interface, but that would have slowed down the data transfer more than I wanted.
The TMS32010 had one saving grace, a Conditional Branch instruction that tested the state of an I/O pin (the BIO pin). The instruction was BIOZ, which would branch to the address given in the instruction when the BIO pin was low.
So here's what I did.
I used RAM for the Program Memory with buffers that could connect it to either the 32010 or the 68010.
I then decoded the 32010's address bus to set a flag when the 32010 address was in any of its first four memory addresses (addresses 0-3).
When the 32010 was in any of the first four memory addresses I switched the buffers to disconnect the 32010's address and data buses from the Program RAM (giving it to the 68010) as well as switching the 32010's data bus to a set of buffers hardwired to produce a BIOZ 0 instruction. Therefore, when the BIO pin was low and the 32010 was at one of the first four memory addresses, it would branch to address 0 and stay there until the BIO pin (controlled by the 68010) was released.
I chose the first four addresses because the 32010's Reset vector was at address 0.
The way it worked was that the 68010 started out by setting the BIO Pin and resetting the 32010, sending it to address 0 where it would 'stick.' The 68010 could then talk to the Program Memory, setting up program code and data.
When the 68010 released the BIO pin, the 32010 would cease branching to address 0, would breeze past addresses 1, 2, and 3, and get to address 4 which would reconnect the Program RAM to the 32010.
The 32010 would then start executing the program in the Program RAM starting at address 4.
Each program would end with a branch to address 0, where it would 'stick', alerting the 68010 that the program was done.
The 68010 was then free to access the Program RAM in order to read the results of the program and set things up for the next program.
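The handshake is easier to follow as a small simulation. This Python sketch models only the program counter and who owns the Program RAM; it treats each of addresses 0-3 as holding a complete "BIOZ 0" and every program instruction as taking one step, which is a simplification. The class, its attribute names, and the fixed program length are invented for the example.

```python
HARDWIRED_ADDRESSES = range(0, 4)   # addresses 0-3 return the hardwired BIOZ 0
PROGRAM_START = 4                   # first address fetched from Program RAM

class Dsp32010Model:
    def __init__(self, program_length):
        self.pc = 0                 # Reset sends the DSP to address 0
        self.bio_low = True         # host holds BIO low, so the DSP 'sticks'
        self.last_address = PROGRAM_START + program_length - 1

    def ram_owner(self):
        """Which processor the Program RAM buffers are connected to."""
        return "68010" if self.pc in HARDWIRED_ADDRESSES else "32010"

    def step(self):
        """Execute one (simplified) instruction and return the new address."""
        if self.pc in HARDWIRED_ADDRESSES:
            # Hardwired BIOZ 0: branch to 0 while BIO is low, else fall through.
            self.pc = 0 if self.bio_low else self.pc + 1
        elif self.pc < self.last_address:
            self.pc += 1            # ordinary program instruction
        else:
            self.pc = 0             # every program ends with a branch to 0
        return self.pc

dsp = Dsp32010Model(program_length=3)
print([dsp.step() for _ in range(3)], dsp.ram_owner())   # stuck at 0, host owns RAM
dsp.bio_low = False                                      # host releases BIO
print([dsp.step() for _ in range(4)], dsp.ram_owner())   # 1, 2, 3, 4: DSP owns RAM
dsp.bio_low = True                                       # host re-arms the trap
print([dsp.step() for _ in range(5)], dsp.ram_owner())   # runs, then sticks at 0
```

In this model the host has to pull BIO low again before the program finishes, otherwise the DSP would fall straight through addresses 0-3 and run the program a second time.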
In practice, several programs were downloaded into Program RAM during initialization so the programmer (me) only had to place a Branch instruction at Address 4 to call the desired program.
It worked out very well.
I later used the TMS32010 in Hard Drivin', but only on the Sound Board. By then, Analog Devices had come out with a DSP that was better suited for doing 3D calculations (the ADSP-2100).
I also used the TMS320P15 (a TMS32010 with an on-board EPROM Program Memory) in Race Drivin', mostly to provide security. I supported another team's project using the TMS320P15 for security. The TMS320P15 was supposedly hack-proof once the Security Bit was set. It wasn't. The reason I know is that in that project I put in an undocumented program that sent out the Atari copyright message in Morse Code. Because of the DSP's speed it could be received by just placing a standard AM radio near the PC Board. The program was called only by grounding an innocuously unused I/O pin during Reset. When Atari received a counterfeited game to examine, I placed an AM radio near the board, grounded the aforementioned I/O pin, gave it a Reset, and heard my Copyright Message on the radio.
The name of that game was Road Riot.
The Future of XY
There is something different about XY. Perhaps it is because an object is completely drawn before moving on to the next one, while in a raster game objects are drawn in pieces at different times on different scan lines.
It may simply be nostalgia; the old memory of when XY games had so much better resolution than their raster brethren.
And, since there haven't been any new XY games in 16 years, we may associate XY games with that period of our lives.
Some people are interested in XY games the same way other people are interested in old radios. Some people are purists; they would rather have a non-working old radio than use a few modern parts to get it going.
Some people are perfectly happy with a modern replication; they are in love with the idea of an old radio. Either way is fine. The nice thing about a hobby is that you don't have to justify it.
There are several possibilities for someone wanting to continue the development of XY technology.
1. Put the entire State Machine Vector Generator into a Field-Programmable Gate Array (FPGA). FPGAs are now large enough that you could probably include the vector memory as well. The downside is that FPGAs that are large enough to do this are available only in Surface-Mount Devices (SMDs) which require multi-layer PC Boards with solder masks. In prototype quantities these boards are several times more expensive than two-layer boards with no solder mask. SMDs also require specialized soldering equipment, which is also expensive. Finally, large FPGAs tend to operate only at 3.3 Volts, so that if it is necessary to interface them to 5 Volt circuitry, some kind of level translation is necessary.
2. Before I ended TomCat, the next thing I was going to try was to replace the State Machine Vector Generator with a TMS32010 DSP. Today, there are much more advanced DSPs available such as the ADSP-2186 from Analog Devices (www.analog.com). The ADSP-2186 is a fast (33 MHz) DSP with enough RAM memory on-chip to do the whole thing. All you need is the analog back end. It's even a 5 Volt part. One of the downsides is that it is an SMD part. The other downside is that Analog Devices has not made it easy to interface other devices to it (like latches). Although the buses are brought out to the outside world, the signals are very fast and the address and data setup and hold times are difficult to work with.
3. Replace the entire State Machine Vector Generator with a Pentium-class PC. Just put the back-end circuitry (and a vector timer) on a PCI (or AGP) card. You would probably have to run DOS or Linux. Windows would be difficult to work with and might be too slow.
The back end I was referring to could be the one that ended up in TomCat, shown in Figure 28.
It could also be an updated version of the Digital Vector Generator. The AD561 was a 4 MHz DAC. Today, you can easily get a 30 MHz DAC. (Of course, you would need an equally fast Sample-and-Hold.)
A third alternative is something I have not completely worked out. Each axis has two DACs, two analog multipliers, an adder, and a Ramp. Referring to Figure 29, as the Ramp goes from 0 to 1 Volt, the 1-Ramp goes from 1 to 0 Volts, causing VOUT to smoothly transition between position X2 and position X1.
We start at position X2 when Ramp = 0 and 1-Ramp = 1. When the Ramp reaches 1, 1-Ramp will have reached 0, so we will have arrived at position X1. At this point we can change X2 for the next end point. We reverse the Ramp so when it reaches 0, we will end up at the new position X2.
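Reading the description of Figure 29 as a formula, each axis computes VOUT = X1 * Ramp + X2 * (1 - Ramp). The Python sketch below is a numerical illustration of that reading, with the Ramp normalized to run between 0 and 1; the function names and the stepped ramp are assumptions, since the real circuit would be continuous and analog.

```python
def vout(x1, x2, ramp):
    """Two multipliers and an adder: X1 weighted by Ramp, X2 by (1 - Ramp)."""
    return x1 * ramp + x2 * (1.0 - ramp)

def trace_polyline(points, steps):
    """Trace a path through points on one axis by alternating the ramp direction.

    While the ramp rises, X1 holds the next end point; while it falls,
    X2 does. Only the DAC that is not currently in control gets reloaded.
    """
    x1, x2 = points[0], points[0]       # start at X2 with Ramp = 0
    ramp_rising = True
    output = []
    for next_point in points[1:]:
        if ramp_rising:
            x1 = next_point             # X1 has no effect while Ramp = 0
            ramps = [i / steps for i in range(steps + 1)]
        else:
            x2 = next_point             # X2 has no effect while Ramp = 1
            ramps = [1.0 - i / steps for i in range(steps + 1)]
        output.extend(vout(x1, x2, ramp) for ramp in ramps)
        ramp_rising = not ramp_rising   # reverse the Ramp for the next vector
    return output

print(trace_polyline([0.0, 4.0, 2.0], steps=2))
# [0.0, 2.0, 4.0, 4.0, 3.0, 2.0]
```

Because the DAC being reloaded is always the one whose multiplier weight is zero at that moment, changing it does not disturb the beam position.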