Information Monoculture

“When two opposite points of view are expressed with equal intensity, the truth does not necessarily lie exactly halfway between them. It is possible for one side to be simply wrong.” – Richard Dawkins

A reference book about reference books. It doesn’t sound like page-turning summertime beach reading, but Jack Lynch’s book, “You Could Look It Up,” is actually pretty interesting. In it, he describes the historical attempts to create dictionaries, encyclopedias, atlases, codices, and every form of reference work, catalog, compendium, list, and litany that you could think of.

The overarching theme of Lynch’s book is that all such attempts at creating a definitive reference work are, and always have been, doomed. It’s impossible to write down “everything worth knowing,” in part because knowledge keeps moving and changing. At best, you can capture a snapshot of your local culture’s view of the world at a certain point in time. But such works are often out of date even before they’re published (the first Oxford English Dictionary took 44 years to compile its 15,487 pages). Old dictionaries, medical references, and schoolbooks instead become time capsules, of more value to historians and anthropologists than to their intended audience.

In the last chapter [spoiler alert!], Lynch describes how the market for printed reference materials has collapsed in the Internet Era. It’s tough to sell physical dictionaries and reference books these days. Once a sign of middle-class status, the multi-volume set of encyclopedias is now a quaint novelty, replaced by Google and Wikipedia.

Lynch then makes an interesting and counterintuitive point: That ready access to online information may be making us dumber, not smarter. More information isn’t necessarily better information. He’s not just being a sentimental Luddite, pining for the days of letterpresses, hot lead, and parchment. On the contrary, he has a real point. And the same problem applies to our engineering careers.

Take Wikipedia. It is famously compiled by volunteers, and absolutely anyone can create, update, or edit any article. This could – should – naturally lead to a massive reference work with a reasonably balanced and even-handed approach to most topics. Crowdsourced articles shouldn’t betray any one author’s particular biases, right? Okay, sure, you’ll still find the occasional obscure article about a little-known football club that’s clearly been written by an avid fan, but for the most part, Wikipedia is self-correcting. It’s unbiased by design, right?

Not so fast. Despite its democratic origins, Wikipedia itself is still just one source. E pluribus unum – from many, one. It may have thousands of authors but it’s still one work. There’s only one article on photolithography, forgery, or the Ford Motor Company. Moreover, volunteers tend to write about topics they like, not necessarily what’s important in the broader sense. Hence, the article about Michael Jackson is five times larger than the one on Thomas Aquinas. O.J. Simpson gets more coverage than Mother Teresa and Florence Nightingale combined. Nintendo’s Legend of Zelda gets 160,000 words, far more than does Shakespeare’s Hamlet – and longer than Hamlet itself. One suspects that any professional editor, compiler, or lexicographer working on a “real” reference work would have applied a bit more editorial rigor than that.

When we research new components for a board design, we typically rely on the chip vendor’s datasheet specifications. The vendor is, after all, the canonical source for all vital statistics on their own chip. But whom do we reference before that? What sources of information do we consult when we’re still comparing Chip A to Chip B (and C, and D…)?

More and more, we rely on the vendors for that, too. We expect each vendor to put their best face forward and to present Chip A in a flattering light. If we’re lucky, that vendor might also offer a few comparisons with their competitors and give us a glimpse of how Chip A compares to Chip B. Such comparisons are always made under ideal circumstances, of course: measured with a tailwind and best taken with a pinch of salt. But we all knew that going in.

But that’s still just one source. As complete and authoritative as the vendor’s information might be, it’s all coming from one direction. It might not be consciously biased, but that doesn’t matter. As Lynch points out, even the most even-handed compiler of facts winds up cataloging only the information he can find (or can verify), which may not correlate with what the reader wants to learn. For example, a chip vendor might provide 100 pages of information on how to program its UART but say nothing about how well the chip works in high-ESD environments. It’s not because they don’t know; they just didn’t think to mention it.

Multiple independent sources of information become important, not only to provide different points of view, but also to avoid an information monoculture. In agriculture and ecology, a monoculture is a system with little or no genetic diversity, such as a field planted with a single crop variety. Monocultures are susceptible to disease, infection, abrupt changes in environment, and genetically inherited defects. Entire crops can be wiped out by a single pathogen when there’s no genetic diversity. Anything that kills one plant will kill them all. Diversity breeds strength and resilience. Too much similarity exposes avenues for corruption and attack. (Broadly speaking, the same is true of computer viruses. A technical monoculture – Windows, for example – means that a single attack vector will work on many millions of similar computers.)

Even Google isn’t immune to the monoculture effect. Its search algorithms are notoriously secret, but we generally trust them to highlight the pertinent information we’re looking for without outward bias. (There are some exceptions.) But even Google is just one source. Type identical search terms into Bing, Yahoo!, DuckDuckGo, and other search providers and you’ll get links to sites that Google didn’t find, or that it relegated to the dreaded fourth page.

Back in the “antegoogluvian” era (Nicholson Baker’s term), researchers had to hunt and scramble for tidbits of printed information, conduct interviews, or perform their own research. It was slow, tedious work. But the information came from multiple sources (some more reliable than others). Just as important, the resulting reference work itself was one among many. Competing dictionaries contained different words, or differing definitions for the same words. Encyclopedias varied wildly in their coverage and depth. Medical references disagreed on diagnoses and treatments. In most fields of study, there was no single acknowledged gold-standard reference: no Google to query, no Wikipedia to which everyone turned. The variety kept debate alive and kept lexicographers, librarians, encyclopedists, and editors honest.

As engineers, developers, programmers, or managers, we need to keep our eyes and ears open and beware the trap of information monoculture. Just because you Google different technical specs on different days doesn’t mean you’re getting all the information. It’s true that scanning posts on the support forum will expose some dirty laundry that the vendor wouldn’t have told you about, but even that’s biased. People don’t post about problems they don’t have, so good experiences go unrecorded. And the anonymous users who have the time to post, and answer, hundreds of support questions probably aren’t your best source of information, anyway. The good engineers won’t be represented on the support forum at all.

You can’t draw a trend line without (at least) two data points, and it takes three non-collinear points to define a plane. The Nyquist rate applies to research, too. Without enough data samples, we get aliasing and noise, not information.