A couple of days ago as I pen these words, I received a message on LinkedIn from someone asking, “Can you advise me about what books a beginner can peruse for learning NASM?” To be honest, this was a bit of a tricky one, not least because I didn’t have a clue what NASM was, so I did what I usually do in a time of crisis, which is to have a surreptitious Google.
You can only imagine my surprise and delight to discover that NASM stands for “Netwide Assembler,” which is an assembler and disassembler for the Intel x86 architecture. NASM can be used to write 16-bit, 32-bit and 64-bit programs and — according to Wikipedia — is considered to be one of the most popular assemblers for Linux.
My first thought was, “Wow, people are still writing in assembly language!” My second thought was, “Why am I surprised?” I remember writing a couple of columns — Is Anyone Still Using Assembly Language? You Betcha! (Part 1 and Part 2) — back in the days of yore we used to call 2007. The answer at that time was a resounding “Yes,” and, on reflection, I see no reason why things should have changed.
These musings on assembly language caused me to play one of my favorite games, which is to daydream about what would happen if I were to fall headfirst through a timeslip and arrive back in the early 20th century (don’t judge me).
It’s strange how things change. When I was a young adult in the early days of the microprocessor in the 1970s, very few people had a clue as to how computers worked and how to use them. This is why books of that time on topics like “Assembly Language for the XXX” started off by explaining things like logic gates and the binary number system and what computers were and how they performed their magic. As a result, the relatively small number of people who did use computers at that time ended up knowing an awful lot about them.
These days, by comparison, almost everyone uses computers on a daily basis, but very few people have a clue as to what actually goes on inside. Today’s computer books typically have titles along the lines of, “Learn Visual Gobbledygook 6.0 in Only 21 days” (you have only 21 days because that’s when version 6.1 will hit the streets with a completely revamped — a.k.a. obfuscated — user interface). Even world-leading scientists performing mind-numbing calculations on esoteric problems typically have no idea as to how their floating-point numbers are represented, stored, and manipulated inside their computers, which is one of the reasons why errors so often creep into their algorithms, but that’s a topic for another day.
As an aside, I was just reading Jim Turley’s column — Where Do Programming Languages Go to Die? — in which he said: “…with each new generation of languages, we leave an old generation behind […] Are we losing touch with what a computer really is and how it works? […] I think all programmers, regardless of product focus or chosen language(s), should be trained in the assembly language of at least one processor. It doesn’t even have to be the processor they’re using — any one will do…” I have to say that I agree wholeheartedly with Jim on this one.
I have a friend (stop laughing, it’s true) who — as the result of an unfortunate wager made at a Christmas party after imbibing a little too much punch — is currently working on building a computer using only those technologies that were available in 1900. In this case, he’s predominantly using logic gates created out of tiny neon bulbs and light dependent resistors (LDRs).
The thing is that, if I were to find myself back in the early 1900s, after exclaiming “Oh no, not again,” I would be sorely tempted to make my fortune by “inventing” a digital computer, and paradoxes be damned! In this case, I would probably employ relays for the task because (a) these little rascals have existed since around the 1840s and (b) I’ve wanted to build a relay computer ever since I saw Harry Porter’s bodacious beauty.
Let’s perform a little thought experiment. Suppose you and I were transported back in time as we just discussed and, together, we constructed a relay-based digital computer. Can you imagine it sitting there in front of us? If I close my eyes, I can easily envisage this magnificent beast; in fact, I can practically smell it. All we need to make our lives complete is a program to run on our computer, which is where our troubles really start.
The phrase “pulling yourself up by your bootstraps” refers to the concept of improving your situation by your own unaided efforts. (As James Joyce wrote in Ulysses, “There were others who had forced their way to the top from the lowest rung by the aid of their bootstraps.”) These days we talk about “booting up” in the context of computers when they are first turned on. This terminology originated in the early days of computers, when a small amount of code — sometimes loaded by hand using toggle switches — was used to load a slightly more complex piece of code, which was used to load yet more complex code, with the process continuing until the machine was ready for use.
Well, this is sort of where we are with our virtual computer. What we need is some way to capture and load our programs, but we don’t yet have a computer language, and we wouldn’t be able to do anything with it if we did. Also, remember that this is the first computer on the planet, so we don’t have any existing computers we can use to help us in this process.
The lowest level in the computer language hierarchy is that of machine code. These are the binary instruction codes (opcodes) that instruct the central processing unit (CPU) what we want it to do along with any associated data (operand) values. Our very first programs would be written in machine code using pencil and paper and then entered into the computer’s memory using toggle switches on its front panel.
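To make the idea of opcodes and operands a little more concrete, here is a minimal sketch in Python of what our machine would be doing with the bytes we toggle in. Everything about it is invented for illustration: the four opcodes, their numeric values, the 16-byte memory, and the single accumulator are all assumptions of mine, not a description of any real relay computer.

```python
# Hypothetical opcodes for an imaginary 8-bit accumulator machine.
LOAD_IMM = 0x01  # load the operand value into the accumulator
ADD_IMM = 0x02   # add the operand value to the accumulator
STORE = 0x03     # store the accumulator at the operand address
HALT = 0xFF      # stop; takes no operand


def run(memory):
    """Fetch, decode, and execute instructions until HALT; return the accumulator."""
    acc = 0
    pc = 0  # program counter: address of the next opcode
    while True:
        opcode = memory[pc]
        if opcode == HALT:
            return acc
        operand = memory[pc + 1]
        if opcode == LOAD_IMM:
            acc = operand
        elif opcode == ADD_IMM:
            acc = (acc + operand) & 0xFF  # keep the result to 8 bits
        elif opcode == STORE:
            memory[operand] = acc
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at address {pc}")
        pc += 2  # every non-HALT instruction is one opcode plus one operand


# The program as we would write it on paper: load 2, add 3, store at address 15, halt.
program = [0x01, 0x02, 0x02, 0x03, 0x03, 0x0F, 0xFF]
memory = program + [0] * (16 - len(program))
result = run(memory)
```

The list called `program` is the pencil-and-paper part; each number in it is what we would set on the front-panel switches, one memory location at a time.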
Trust me when I say that it wouldn’t be long before we grew tired of writing programs in machine code, because this is time-consuming and prone to error. The next step up is to devise a simple assembly language that employs mnemonics to represent instructions accompanied by a simple syntax to specify things like operand values and addressing modes.
An assembler is a software application (i.e., an executable program in machine code running on the computer) that takes a program written in assembly language and translates it into corresponding machine code. Of course, in our current “back-in-time” scenario, we don’t actually have an assembler yet. Instead, we would use our trusty pencil to capture our assembly programs on paper, hand-assemble them to machine code on paper, and then load the machine code into the computer using our toggle switches.
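The translation an assembler performs (whether by hand or by machine) is largely a table lookup, which the following Python sketch tries to show. The mnemonics `LDI`, `ADI`, `STA`, and `HLT`, their opcode values, and the `;` comment syntax are all hypothetical, chosen only to keep the example short; a real assembler would also handle labels, addressing modes, and much more.

```python
# Hypothetical mnemonics and opcode values for an imaginary 8-bit machine.
OPCODES = {"LDI": 0x01, "ADI": 0x02, "STA": 0x03, "HLT": 0xFF}
NO_OPERAND = {"HLT"}  # mnemonics that stand alone


def assemble(source):
    """Translate assembly text into a flat list of machine-code bytes."""
    machine_code = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        line = line.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        if mnemonic not in OPCODES:
            raise ValueError(f"line {line_no}: unknown mnemonic {mnemonic!r}")
        expected_parts = 1 if mnemonic in NO_OPERAND else 2
        if len(parts) != expected_parts:
            raise ValueError(f"line {line_no}: wrong number of operands")
        machine_code.append(OPCODES[mnemonic])
        if expected_parts == 2:
            # int(text, 0) accepts decimal (3) and hexadecimal (0x0F) operands
            machine_code.append(int(parts[1], 0) & 0xFF)
    return machine_code


source = """
    LDI 2      ; put 2 in the accumulator
    ADI 3      ; add 3 to it
    STA 0x0F   ; store the result at address 15
    HLT
"""
code = assemble(source)
```

Doing this lookup by hand for every line of every program is exactly the chore that makes us want the machine to do it for us.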
Somewhere along the way, we would devise a method for storing programs in machine code and automatically loading them into the computer — perhaps via paper tapes, for example.
Assembling programs by hand isn’t as much fun as it may sound to the uninitiated. What we need is an assembler, but how are we going to obtain such a beast? This is where the concept of pulling oneself up by one’s bootstraps really starts to kick in. What we do is use our pencil and paper to capture a simple assembler program in our assembly language. We then assemble this program into machine code by hand and punch this machine code representation onto a paper tape so we can load it into our computer whenever we wish.
Now we are in a position to capture a new program in our assembly language and punch this source program onto a paper tape. We would then use the machine code version of our assembler program stored in the computer’s memory to read the assembly source code for our new program off its paper tape, translate it into machine code, and store the machine code version on a new paper tape.
What do you think would be the first program we would run through this process (that is, using the computer to assemble it)? For myself, I’m pretty sure this would be a really simple “Hello World” equivalent — perhaps reading the states of some switches and controlling the states of some lights.
What do you think would be the second program to run through the process? Well, if I were doing it, I think I would take the source code for our simple assembler program (the one that’s currently captured using pencil and paper and that we assembled by hand), punch this source code onto a paper tape, and then use our simple assembler running in the computer to assemble this source code version of itself. I’d then compare the results to ensure that the assembler generated the same machine code that I had obtained via my hand-assembly.
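The comparison step is simple enough to sketch in Python. The helper below is hypothetical (the name `first_mismatch` and the idea of representing each tape as a list of byte values are my own assumptions); it reports the first address at which the hand-assembled and machine-assembled versions disagree.

```python
def first_mismatch(hand_assembled, machine_assembled):
    """Return the address of the first differing byte, or None if the two match."""
    for address, (expected, actual) in enumerate(
        zip(hand_assembled, machine_assembled)
    ):
        if expected != actual:
            return address
    # All shared bytes agree, so any difference is one tape being longer.
    if len(hand_assembled) != len(machine_assembled):
        return min(len(hand_assembled), len(machine_assembled))
    return None


# Made-up byte values standing in for the two paper tapes.
by_hand = [0x01, 0x02, 0x02, 0x03, 0xFF]
by_machine = [0x01, 0x02, 0x02, 0x03, 0xFF]
mismatch = first_mismatch(by_hand, by_machine)
```

A result of `None` tells us the two versions agree byte for byte; any other result tells us where to start looking, although not whether the slip was in the hand-assembly or in the assembler itself.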
This is where things start to get a little more interesting, because we can now design and capture a slightly more sophisticated assembler, and then use our original simple assembler to assemble the more sophisticated version into machine code. It may be that this more sophisticated version includes some cunning optimizations that weren’t available to the original assembler, so the next step would be to use this new version to reassemble itself, if you see what I mean.
And so it goes, with us cycling around developing incrementally more sophisticated assemblers, each of which is first assembled by the previous version, after which it is reassembled by itself.
But wait, there’s more, because it won’t be long before we are tempted by the thought of writing our programs in higher level languages like C (remember that you and I are living back in time before C was a twinkle in Dennis Ritchie’s eye).
Tighten up your bootstraps, because here we go again. First, we would define the syntax for our C language. Next, we would identify a subset of the language that we could use to get things up and running. We would then use our assembly language to capture our first simple C compiler targeted at our C subset, after which we would assemble this utility into machine code and load it into the computer. Once again, the first program to be compiled would probably be of the “Hello World” variety.
What do you think will be the output from our C compiler? Newcomers to computers often assume that C compilers generate machine code directly, but this is frequently not the case. In the traditional arrangement, the compiler takes the C source code and compiles it into assembly source code. This assembly source code is then passed to the existing assembler, which uses it to generate the corresponding machine code. This intermediate assembly step takes place “behind the scenes,” but users can instruct the compiler to show them the assembly source code if they wish (GCC’s -S option does this, for example).
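Here is a deliberately tiny Python sketch of that two-stage arrangement. The “compiler” handles nothing more than a sum of integer literals, and the mnemonics `LDI`, `ADI`, and `HLT` and their opcode values are hypothetical; the point is only that the first stage emits assembly text and the second stage turns that text into machine code.

```python
# Hypothetical mnemonics and opcode values for an imaginary accumulator machine.
OPCODES = {"LDI": 0x01, "ADI": 0x02, "HLT": 0xFF}


def compile_sum(expression):
    """Stage 1: 'compile' a sum of integer literals into assembly source text."""
    terms = [int(term) for term in expression.split("+")]
    lines = [f"LDI {terms[0]}"]                     # load the first term
    lines += [f"ADI {term}" for term in terms[1:]]  # add each remaining term
    lines.append("HLT")
    return "\n".join(lines)


def assemble(assembly_text):
    """Stage 2: translate the assembly text into machine-code bytes."""
    machine_code = []
    for line in assembly_text.splitlines():
        parts = line.split()
        machine_code.append(OPCODES[parts[0]])
        machine_code.extend(int(operand) & 0xFF for operand in parts[1:])
    return machine_code


assembly = compile_sum("2 + 3 + 4")  # what the compiler hands to the assembler
machine_code = assemble(assembly)    # what eventually runs on the machine
```

Keeping the two stages separate means the compiler writer never has to think about opcode values at all, which is one reason the arrangement is attractive when you already have a working assembler.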
At this point, we would recreate our original C compiler in our C subset (remember we currently have only the assembly language version) and use the compiler to compile itself. As before, we can now create a slightly more sophisticated version of our compiler that accepts a larger subset of our C language, use the original C compiler to compile this new version, and then use the new version to compile itself so as to take advantage of any optimizations we’ve added. And so it goes…
There’s an old programming joke that goes, “In order to understand recursion, you must first understand recursion” (I didn’t say that it was a good joke). The thing is that, once you start to wrap your brain around the problems associated with building a computer and associated programming languages and tools from the ground up, you really start to get a feel for the concept of recursion.
Just talking about this has stirred my creative juices. One day when I have some free time, I would really like to go through this entire process. Alternatively, I might just have a cold beer or three and tell myself how magnificent this machine would have been had I had the time to create it. Meanwhile, I will continue to hone my ideas in case I inadvertently run into another one of those pesky timeslips.