feature article
Subscribe Now

Just Call Me an Indexing Fool

I love words. I wish I knew more of them. I like to believe that I have a reasonably extensive vocabulary at my disposal, but I also recognize that I know only a subset of all the words that are out there. The problem is exacerbated by the fact that new words are popping up like mushrooms, while the meaning of existing words can evolve over time.

There are many different aspects to words, including the way in which we write them down. As I mentioned in my recent column — Using USB4? Require ReTiming? Kandou Can-Do! I’m currently working on a book: Wroting Inglish: The Essential Guide to Writing English for Anyone Who Doesn’t Want to be Thought a Dingbat. One of the tidbits of trivia I share in this book is as follows:

When we come to written language, at one end of the spectrum we have pictographs/pictograms (a pictorial symbol representing a word or phrase) and logographs/logograms (a sign or character representing a word or phrase). Chinese characters provide a familiar example of logograms (a handful derive from pictograms). Imagine the problems associated with having tens of thousands of different characters to work with. Being encumbered with such a system has many ramifications, such as the fact that you are somewhat unlikely to go on to invent things like the Morse telegraph or conceive the concept of a typewriter. How would you set about creating a dictionary for such a language? Constructing a crossword puzzle would be problematic at best, and it would take at least six strong men to carry the box of tokens used in the Chinese equivalent of a game of Scrabble.

Toward the other end of the spectrum, we have alphabet-based languages in which a standard set of letters (symbols or graphemes) are used to capture the written word.  There are dozens of alphabets in use today, the most popular being the Latin alphabet, which was itself derived from the Greek. Many languages, including English, use modified forms of the Latin alphabet.

One interesting point to ponder is that we currently have only 26 letters in the English language (I say “currently because this was not always the case — also, the sounds associated with certain letters and letter combinations have changed over time). One issue here is that there are more sounds in the English language than there are letters to represent them. We call these distinctive sounds “phonemes,” and, depending on one’s regional accent, most people employ around forty-four.

Now, your knee-jerk reaction might be to simply add more letters such that we had one letter for each phoneme, but this wouldn’t actually solve the spelling problem because everyone would write things down phonetically (i.e., the way they said them using their own accent and/or dialect), which basically means everyone would have their own way of spelling things. In order to address this problem, we came up with a variety of techniques, like adding a “silent” ‘e’ to the end of a word to indicate that the preceding vowel should be pronounced in an elongated way; for example, “hop” is said with a short ‘o’ sound, while “hope” is pronounced with a long ‘o’ sound (similarly for the ‘a’ in “hat” and “hate”). An alternative approach is to double up on vowels, like ‘ee’ in “seek” and ‘oo’ in “boot.”

But we digress… A little earlier, while talking about Chinese characters in the form of logograms (and pictograms), we posed the question: “How would you set about creating a dictionary for such a language?” This task is vastly simplified when using an alphabet-based language, especially if we all agree to use the same ordering from ‘a’ to ‘z’, which means that we know that we’ll find the ‘g’ words between the ‘f’ words and the ‘h’ words, and so forth.

As an aside, I had no idea as to how complex it was to create the first “Full Monty” dictionaries until I read The Professor and the Madman: A Tale of Murder, Insanity, and the Making of the Oxford English Dictionary by Simon Winchester. This is one of the books that I would wholeheartedly recommend to anyone who is interested in learning more about “stuff” (see also my review: A Professor, a Madman, and a Love of Words).

In addition to enabling the creation of dictionaries, the use of an alphabet-based language in which we all agree to a defined ordering facilitates the creation of indexes at the back of books. As the Wikipedia tells us: “An index (plural: usually indexes, more rarely indices) is a list of words or phrases (‘headings’) and associated pointers (‘locators’) to where useful material relating to that heading can be found in a document or collection of documents.”

Indexes are obviously a jolly useful idea, so it’s unfortunate that they are so poorly executed in Oh so many books. It may be that authors are so exhausted by the time they finish their works that they cannot rouse the energy or enthusiasm to devote the time required to create a good index, or it may be that they assume this is a job for their publishers. In this latter case, I fear that the task is, oftentimes, either performed by a computer program that indexes everything exhaustively but ineffectively, or it’s handed over to a junior employee who lacks sufficient subject matter expertise to do the job justice.

Even in the case of a book with a relatively robust index — like A Crack in Creation: Gene Editing and the Unthinkable Power to Control Evolution by Jennifer A. Doudna and Samuel H. Sternberg, for example (see also my review: Life as We Knew It) — I find things that annoy me. For example, I recently used this index to track down the word “phages,” only to be instructed to, “see bacteriophages.” Arrggghhh (and I mean that most sincerely), so I bounced around as ordered to discover “bacteriophages, 46-50”.

Why couldn’t the index simply say, “phages (bacteriophages), 46-50” and also “bacteriophages (phages), 46-50”? Actually, there is a reason, in that this entry has a number of indented sub-entries, so I can see why the people creating the index and printing the book would like to keep things as concise as possible. From a user’s perspective, however, this is a pain in the nether regions. I don’t want to spend the rest of my life trapped in an index being redirected from one entry to another — all I want is to track down the information I’m looking for in the most efficient and timely manner possible.

As another aside, the worst example I’ve ever experienced of indexorial redirection (yes, I made “indexorial” up — sue me), is in Being and Nothingness by Jean-Paul Sartre. Although many regard this book to be a philosophical classic and major cornerstone of modern existentialism, I’m more tempted to describe it as the ravings of a deranged man, but perhaps this is because I once got trapped in its index and spent weeks trying to get out. The problem is that everything is defined in a circular manner. As you are reading, you are presented with a factoid like, “The ‘For Itself’ is inextricably intertwined with the ‘Is Itself’.” So, you go to the index looking up ‘Is Itself,’ which points you at another part of the book, where you find that ‘Is Itself’ is predicated by the ‘Of Itself.’ You return to the index to track down ‘Of Itself,’ which points to yet another part of the book where you read that ‘Of Itself’ is an integral aspect of the ‘For Itself.’ Sad to relate, your next trip to the index to locate ‘For Itself’ returns you to your original starting point (and people wonder why I drink).

The reason for my rambling musings here is that my chums Adam Taylor, Dan Binnun, and Saket Srivistava recently finished writing a book called A Hands-On Guide to Designing Embedded Systems. They just shared an early PDF copy with your humble narrator and asked me to read it and share my thoughts before it goes to press (I’ll be writing a full review here on EEJournal in the not-so-distant future).

First, I took a superfast skim through the PDF to get a high-level view of everything, which led me to a page titled “Index” with an accompanying note saying, “Index goes here.” Do you remember my saying that they asked me to share my thoughts? Well, my first thought was, “Who is going to create your index?”

I’ve spent an inordinate amount of time creating the indexes for my own books, so I scanned the first page of the index from Bebop Bytes Back: An Unconventional Guide to Computers (now sadly out of print), annotated it as shown below, and sent it to Adam for him to peruse and ponder.

The first page of the index from Bebop Bytes Back (Image source: Max Maxfield)

One thing I forgot to mention to Adam (I’ll point him to this column) is that when I’m reading a book myself, all sorts of weird and wonderful things will stick in my mind. Later, I may not be able to recall the exact name of the topic I’m looking for, but I may remember that it was associated with something tangential like a “sea cucumber,” so I go to look up “sea cucumber” in the index in the hope that it will place me in the ballpark of what I’m searching for.

It may not surprise you to learn that I’m usually disappointed. This is why, in my own books, even though they might be on subjects like electronics or math or computers, you’ll find unusual words in the index like “parrots,” “peas,” “pebbles,” and “pianos” (I just pulled these examples out of one of my books).

Looking at the image above, I see a prime example in the form of “accordion, invention of” (my index also tells me that this will be found in a footnote, which will save me time when I visit that page). Why would this be in a computer book? Well, the first British electric telegraph was created by the physicist and inventor Sir Charles Wheatstone, who also invented a form of accordion called the concertina. Six months after someone has read this book, they may remember “accordion” or “concertina,” but not recall the name “Sir Charles Wheatstone.” In the case of my index, they can come at things from multiple directions to more quickly track down what they are looking for.

I also index the same things in multiple ways, like “binary addition” and “addition, binary,” each pointing at the same page(s). In fact, I’m quite proud of everything illustrated in the image above, like the use of bold font to indicate the primary definition of an entry in a bunch of page numbers (the main definition may not be on the first reference). How about you? Do you have any thoughts you’d care to share on indexes in general and the format I’ve developed for my own indexes in particular?

One thought on “Just Call Me an Indexing Fool”

  1. my theory is that there is a general association of intelligent order with regimentation, fascism, tyranny and hence an aversion to it….the day-to-day world given over to otherly structuring is indeed reason enough to shuck some pointless burden….ah, the days when I could chug it back without much care…

Leave a Reply

featured blogs
Jan 21, 2022
Here are a few teasers for what you'll find in this week's round-up of CFD news and notes. How AI can be trained to identify more objects than are in its learning dataset. Will GPUs really... [[ Click on the title to access the full blog on the Cadence Community si...
Jan 20, 2022
High performance computing continues to expand & evolve; our team shares their 2022 HPC predictions including new HPC applications and processor architectures. The post The Future of High-Performance Computing (HPC): Key Predictions for 2022 appeared first on From Silico...
Jan 20, 2022
As Josh Wardle famously said about his creation: "It's not trying to do anything shady with your data or your eyeballs ... It's just a game that's fun.'...

featured video

Synopsys & Samtec: Successful 112G PAM-4 System Interoperability

Sponsored by Synopsys

This Supercomputing Conference demo shows a seamless interoperability between Synopsys' DesignWare 112G Ethernet PHY IP and Samtec's NovaRay IO and cable assembly. The demo shows excellent performance, BER at 1e-08 and total insertion loss of 37dB. Synopsys and Samtec are enabling the industry with a complete 112G PAM-4 system, which is essential for high-performance computing.

Click here for more information about DesignWare Ethernet IP Solutions

featured paper

Add Authentication Security to Automotive Endpoints Using the 1-Wire Interface

Sponsored by Analog Devices

By adding a single authentication IC, automotive designers can authenticate a component with only one signal between an ECU and endpoint component. This is particularly important as counterfeit and theft are increasingly problems in automotive applications. This application note describes how to implement the DS28E40 Deep Cover 1-Wire Authenticator in a system to provide authentication for optical cameras, headlamps, EV Batteries, occupancy sensors, and even steering wheels, and more.

Click here to read more

featured chalk talk

Complete Packaging for IIoT Devices

Sponsored by Mouser Electronics and Phoenix Contact

Industrial Internet of Things (IIoT) design brings a new level of demands to the engineering team, particularly in areas like thermal performance, reliability, and scalability. And, packaging has a key role to play. In this episode of Chalk Talk, Amelia Dalton chats with Joel Boone of Phoenix Contact about challenges and solutions in IIoT design packaging.

Click here for more information about Phoenix Contact ICS 50 Enclosure System