In my previous column, I commenced the task of cogitating and ruminating on what I might do to put food on the table if I inadvertently wandered into a timeslip and found myself transported back to the late 1930s or early 1940s. Just to increase the fun and frivolity, I pondered the possibility that this slippery timeslip also transported me into a parallel dimension in which computer science remained rooted in the analog domain.
One of the things I flatter myself I would be good at would be designing a digital computer from the ground up (the alternative of supporting myself based on my skills as an interpretative dancer really would not be a great option).
As you may recall, the process we determined to follow was to decide on an instruction set architecture (ISA) for our central processing unit (CPU), select an implementation technology (relays, vacuum tubes…), and build the hardware portions of the beast, including components like read-only memory (ROM), random-access memory (RAM), and devices like perforated paper product writers, readers, and media in the form of Teletype terminals and paper tapes or punched cards.
The next step on which we would have to focus our attention would be how to program our bodacious beauty. At its core, any digital computer understands only its own machine code; that is, the binary patterns of 0s and 1s that represent its core instructions and any associated data.
Our first option (the one adopted by people in our own slice of the multiverse) would be to capture our programs in machine code using pencil and paper. Assuming a 16-bit address bus and an 8-bit data bus, for example, such a program might look a bit like the following:
Machine code program captured using pencil and paper
(Image source: Max Maxfield)
We could subsequently input this machine code program into our computer’s memory using two banks of trusty toggle switches: one to specify the address and the other to define the opcodes (instructions) and operands (additional data or address values associated with the instructions).
In addition to being time-consuming and prone to effort, capturing programs in machine code is not as much fun as I make it sound, especially when you consider that the act of editing the program to insert or delete an instruction would change the addresses of everything that came after it.
Thus, our next step would be to conceive and document some form of symbolic representation that would be easier for humans to write, read, and understand. In addition to having mnemonics to represent each of the instructions supported by our CPU (e.g., LDA = “Load Accumulator,” DECA = “Decrement Accumulator,” JNZ = “Jump if Not Zero”), such a language would also support things like constants, comments, symbolic labels, assembly directives, and suchlike.
Returning to the way things actually happened in our world, since programs captured in this symbolic form eventually need to be “assembled” into their machine code counterparts, the symbolic form came to be called “assembly” language, while the act of translating these programs into machine code came to be called “assembling.”
Initially, we would capture our assembly code programs using pencil and paper. An example program might look something like the following:
Assembly code program captured using pencil and paper
(Image source: Max Maxfield)
The great advantage of assembly language over machine code is that we (humans) can actually read and understand it. At first, however, we would still have to assemble the program (i.e., convert it into executable machine code) by hand. Once again, we would do this using pencil and paper. And, once again, once we had hand-assembled our program, we would still have to enter it into our computer using our banks of toggle switches.
As we also discussed in my previous column, in addition to defining our version of the ASCII code (the binary patterns of 0s and 1s used to represent alphanumeric characters, punctuation characters, and control codes), our next step would probably be to use our hand coding and assembly process to create some simple utility programs to perform tasks like reading and writing paper tapes and/or punched cards.
We would also use our hand coding and assembly process to create a simple text editor and a simple assembler, where the latter is a program that can read an assembly source file and assemble it into machine code for us. At this point, we would almost certainly start to loop around using our simple text editor to capture the assembly source code for a more sophisticated assembler, and then use our current assembler to assemble the more sophisticated assembler. At the same time, we would probably use our current text editor to capture the assembly source code for a more sophisticated editor, and then use our latest and greatest assembler to assemble the more sophisticated editor.
On the one hand, this process is pretty mindboggling, but it also has a certain elegant simplicity about it (I’m also reminded of the old programmer’s joke: “In order to understand recursion, one must first understand recursion”).
One of the problems with an assembly language is that it is tied to its computer’s core instruction set, which means each machine has its own language. In turn, this means that (if we exclude the concept of cross assemblers from our discussion), it isn’t possible to write an assembly language program for one computer and then assemble it for use on another computer.
Another problem with assembly languages is that, although an expert user can craft concise and efficient code, this typically takes a lot of time, which means users typically do things only one way and they don’t spend a lot of time experimenting with alternative implementations.
Eventually, our users will become dissatisfied working at the assembly language level and will request an easier way of doing things. One solution is to define a higher-level form of language representation that allows us to use statements like “if ( (a== 6) and (b < 10) ) then…”. One such language in our real world is called C. Why ‘C’? Well, there was already a language called ‘B’ (seriously).
Once we’ve defined our new language, we can use our latest and greatest editor to capture a source code program in this language, but what do we do next? We still need some way to translate this source code into an equivalent machine code representation that can run on our computer. One way to perform this translation would be to use a program that we might call a compiler, so named because it will start by reading the source code and compiling lists of variables and functions and suchlike.
Of course, when you come to think about it, we are going to have to create our first C compiler in our assembly language, because that’s all we currently have to work with. It’s only after we’ve used our editor to capture the assembly language version of our compiler, and after we’ve used our assembler to assemble this into machine code, that we can finally use this machine code version of our compiler to translate (“compile”) programs written in our C source code into their machine code equivalents.
It possibly won’t surprise you to hear that our first C compiler will probably be a rudimentary implementation that supports only a subset of our C language. Once again, we will follow a cyclical process of using our current text editor to capture the C source code for a more sophisticated C compiler, and then use our current C compiler to compile the more sophisticated compiler, and around and around we go (and, once again, my head is spinning).
Just for giggles and grins, a C compiler doesn’t actually generate machine code, although it may appear to do so from the perspective of a casual observer. What actually happens is that – behind the scenes and under the hood – the compiler generates assembly code, which is handed over to an assembler to generate the final machine code.
There are multiple advantages of moving to a higher level of programming abstraction like C. Suppose we create several different types of computers, for example. In this case, once we’ve created a compiler for each computer, we can write a single program in our C source code and compile it into the machine code required by each machine. Another advantage is that working at a higher level of abstraction allows us to more quickly and easily explore different solutions for each programming problem.
So, there we have it. Three levels of abstraction: C, assembly language, and machine code. Ah, if only things were this simple. We still have to consider the fact that there are myriad high-level programming languages (see also What Is a Compiler, Anyway? by Jim Turley); also, that compilers are just one way to go because we also have interpreted languages, threaded interpretive languages, languages that are translated into bytecode that runs on virtual machines, and… then things start to get complicated.
I don’t know about you, but I find this sort of thing to be extremely interesting. I feel the urge to delve deeper into these alternative ways of doing things, but I don’t want to do so if you don’t share my enthusiasm, so I’ll leave it up to you to post a comment saying “Yay” or “Nay.” As usual, of course, I also welcome any other comments, questions, or suggestions.
13 thoughts on “Building a Computer, Should You Inadvertently Travel Back in Time (Part 2)”
Hi Max, interesting as usual and I am a Yay vote for more. Turing Machines have a lot of parts and pieces to make them go and most users never think about the parts and pieces. A Flowpro Machine is an alternate way to compute and comparing pieces of the two machines might be interesting to your readers.
Hi Ron — thanks for the kind words — in addition to the interpreter / threaded interpretive stuff, you’ve given me some ideas for a follow-on column — I have some other columns to write first, but I’ll add this to the list 🙂
“Been there, dun that!” C was great in my opinion mainly because of loops, pointers, and libraries of system functions. Before C there were branches and subroutine calls. (not sure of “goto’s”). There was a hidden notion that there was only one program running at a time (no multiprogramming). Yes, Virginia, those were the “dark ages”. Luckily those days have passed.
But the sun still isn’t shining too bright because everybody and his cousins are defining new languages because they have found some limitation of all the existing languages. Practicality and common sense have taken a hiatus.
I am a hardware engineer also. And I am here to tell you that load, add, store, branch cpus are not the only way to run programs.
There is a fundamental need for a stack in computation. No, Max, I did not say a “stack machine”.
Hi Karl — where would BASIC have been without its GOTO statement? And where would Reverse Polish Notation (RPN) be without a stack? I just added “all the languages” to my list of things to talk about next time (not in my next article, but the next one in this series).
And along with RPN comes the “shunting yard algorithm”. And that leads to the Abstract Syntax Tree(AST) which just might be the key to building a state of the art computer.
Python is a very popular interpreted language and C# is a newer compiled language(my favorite!), both based on an AST which is based on, you guessed it, RPN/ stack.
Looking on the hardware side of things, there are embedded memory blocks, especially on FPGAS, just sitting there wasting power because nobody has figured out how to use them(of course, I have).
Of course you have LOL
Hi Max, YAY! Very interesting. Now what if you were to inadvertently wander into a timeslip and found yourself transported forward to the late 2030s or early 2040s?
Would you be defining ALUs with qubits? Would Quantum entanglement zero delay time enable a 1-bit ALU to calculate 64 bit numbers as fast as a 64-bit current ALU, thus simplifying CPU architecture?
Hi John — first of all, I doubt that quantum computing is ever going to “simplify computer architecture” LOL I actually plan on checking the 2030s and 2040s out, but since I don’t currently have a functional time machine (you simply cannot get the parts in Huntsville, Alabama), I’m taking the scenic route to get there (or “then,” I suppose). Now you’re reminded me of the episode of The Big Bang Theory when Sheldon starts to talk about the tenses you need to use to talk about event sequences in time (https://youtu.be/oNDbl3nZdyA) This is a classic 🙂
Hi Max, the first computer I worked on had toggle switches and LEDs. I kind of missed the blinking lights… During the COVID lock-down, I thought I’d try to build a simple computer the old fashion way with SSI/MSI logic (and lots of LEDs), but new MSI logic has become as hard to find as unicorns. Simple PLDs, just about as rare, but still in production. Then I considered all the things you did, assembler, editor, high level language, file system. Egads, there isn’t enough life left in this body to do all of that.
What if I took the venerable 8080/8085 and put it into an FPGA? All the above tools are available to run on a PC, either native, or in a CP/M simulator – wait! why not just write an 8080 emulator to run on an ARM host? Could an ARM Cortex-M7 running several hundred time faster than the original 8080 be able to emulate to the bus level (what’s the point of a computer if you can’t interface it to something)? The answer is yes. Not wishing to program in ARM assembly language (finite life issue again) I sat down with an ancient Intel data book and began constructing state machines as described by Intel into “C” code. I soon settled on emulating the 2nd generation 8085 bus, it required less support hardware and I could use the Altaid-8085 SBC as a test platform (thanks to David Hunter).
Initial bus timing tests indicated that using switch() statements for state machines involved too much overhead (even when optimized for maximum speed). Converting to tables of function pointers did the trick (something I never thought I would do). A SPI port was used to read the switches and dump the virtual registers out to an LED array in almost real time. That’s ALL the registers! Not just address & data as the old Altair/IMSAI boxes.
So now I have a computer like the old days, toggle in the machine code directly, or load the bootstrap address in ROM from the switches, toggle RUN and start it up. It’s mesmerizing watching the blinking LEDs as Palo Alto Basic searches for prime numbers, or I hunt Klingons.
Hi Neil — this sounds AWESOME — I want to see a picture (or a video) — can you email it to me at firstname.lastname@example.org? One of my friends has one of the original Altair 8800 (he says the aluminum case could withstand a house falling on it) — I really want that machine! I would love to get my hands on an IBM 360 control panel (http://infolab.stanford.edu/pub/voy/museum/pictures/display/3-1.htm) — I’d be happy just with the parts with the switches and lights — and then use an Arduino to control everything LOL
Are you sure those lights were LEDs? I thought toggle switches disappeared before LEDs existed.
I’m guessing the lights on the IBM panel were small incandescent bulbs. If I ever lay my hands on one of these panels, I’ll have to decide whether to keep the bulbs or replace them with LEDs.