The third hard problem
✑✂glasperlenspielprogrammingAccording to a classic joke attributed to Phil Karlton, computer science has only two hard problems: naming things and cache invalidation. They are hard because no algorithm can solve them: good names need empathy, and cache invalidation requires systems thinking and careful analysis. I often come across another such problem, pervasive and insidious yet rarely noticed: embedding a general graph into a hierarchy. I call it tree mapping.
Spaces, trees, and webs
Our brains excel at dealing with physical space. We can intuitively orient ourselves in an unfamiliar city and draw an accurate map of a garden we visited decades ago.
We perceive space as hierarchical and localized: each region interacts only with its neighbors. Objects combine into structures we grasp as units: particles form atoms, atoms form molecules, molecules form bodies, and so forth, up to galaxies and superclusters.
Hierarchies are so natural to us that they have become our universal organizing tool. We sort things and ideas into labeled boxes and put these into larger boxes. For physical objects, we have no choice: a book can only be on one shelf in one library at a time. Ideas and information, however, resist taxonomies. They form intricate, far-reaching webs.
Trees formalize hierarchies. They are universal space organizers: B-trees slice keys in databases, k-d trees partition space in graphics, and abstract syntax trees group token sequences in compilers.
Trees are ubiquitous, but they can’t model webs. Thus, tree mapping is an imperfect embedding of a web into a tree.
Let’s look at a few examples.
File systems
The digital world thereby allows us to transcend the most fundamental rule of ordering the real world: Instead of everything having its place, it’s better if things can get assigned multiple places simultaneously.
Imagine receiving a bill from your dentist.
In which folder do you file it?
Archive
? Medical
?
In the 2026 taxes
project folder for future tax returns?
Do you photocopy the bill and put it in all these folders?
The question might seem moot in the age of Dropbox and Google Drive. Yet Victorian-era gentlemen might have faced the same questions sorting letters that we face sorting virtual paper. The user-facing design of our state-of-the-art distributed systems inherited most of the restrictions of the physical world.
Operating systems face the same dilemma:
should files be organized by the application they belong to or by type?
Windows and macOS traditionally preferred the former option, packaging applications into (mostly) self-contained bundles.
Linux systems embraced the other approach,
scattering packages all over the filesystem:
libraries go to /usr/lib, documentation—to /usr/man, configuration—to /etc.
This choice comes with a trade-off:
shredding packages simplifies sharing and tooling
(man only looks in a handful of directories),
but complicates software management.
The pain of packaging on Linux gave rise to macOS-inspired bundle formats,
such as Snap and Flatpak.
If you’ve ever tried to organize a non-trivial code repository,
you’ve likely run into the same issue.
Modern projects have components implemented in different languages, for example,
TypeScript on the frontend and Rust on the backend.
You can group files by the component they implement (/acme/payments/index.ts and /acme/payments/main.rs) or by language (/acme/ts/payments.ts and /acme/rs/payments.rs).
Slicing by component is easier for humans because it reflects the organizational structure.
Most tools don’t support this setup and force a technology-centric approach.
Google and a few other companies bit the bullet and organized their repositories by project (/search, /shopping, /maps),
developing a plethora of language-agnostic build tools along the way
Google’s Blaze was among the first such tools.
It inspired Pants, Buck, and Please, and was later open-sourced as Bazel.
.
These dilemmas are mostly self-inflicted. There is no inherent reason for a digital file system to be a skeuomorph of a file cabinet. Several projects experimented with web-like filesystems (BeFS and WinFS, for example), but none changed the status quo. Now that the Web has accustomed us to tags and links, our filesystems may finally become webs.
Writing
the writer’s goal is to encode a web of ideas into a string of words using a tree of phrases.
Books are the epitome of hierarchy. They have chapters that contain paragraphs with sentences composed of words made of letters, all sliced into neat numbered pages. The ideas we express on these pages are anything but hierarchical.
Many stories seem linear. Events appear in the order they occurred. Chapter boundaries mark scene changes and time jumps. But beneath every story lies a web of ideas. Untangling this web into a sequence of words that recreates the web in another mind is the primordial struggle of every writer.
In fiction, the web involves relationships between characters and the emotional connections that the reader forms with those characters. Good fiction uses sequential constraints to its advantage, presenting events in an order that engages the reader most. People feel joy when they connect the dots and assemble the web.
When the web gets dense and abstract,
the writer’s job becomes daunting,
and so does the reader’s.
Math textbooks are a perfect example.
Math might seem like a Lego brick tower:
you lay simple concepts at the foundation
and stack increasingly elaborate ones on top.
Indeed, each coherent presentation of math since Euclid’s Elements
follows this pattern.
Yet the choice of foundational blocks is often arbitrary.
Mathematical concepts shift shape and gain nuance as you learn how they relate to each other and to other subjects
I remember waking up a few months after finishing my real analysis course
and realizing that I finally understood limits,
thanks to the ordinary differential equations course I was taking at the time.
Even my understanding of natural numbers seems to change with every textbook I read..
The selection of ideas and the order of their presentation mold the reader’s experience,
so no two math books have the same table of contents.
The notion of a category itself arises at all levels of abstraction, and so does the concept of a monoid and a monad. Which one is the most basic? As it turns out they are all interrelated, one leading to another in a never-ending cycle of abstractions.
The next time you sit down to an empty design doc and don’t know where to start, be kind to yourself. You’re solving a hard problem.
Writing is easy. All you do is stare at a blank sheet of paper until drops of blood form on your forehead.
Architecture
The units of which an artificial city is made up are always organized to form a tree.
In his 1965 essay, A City is not a Tree, architect and mathematician Christopher Alexander classified cities into artificial and naturally grown. He argued that artificial cities such as Levittown and Chandigarh feel stifled, and natural cities such as Siena and Kyoto feel alive.
According to Christopher Alexander, the difference lies in the mathematical structure underpinning the city design. Artificial cities have a tree structure. They consist of isolated neighborhoods, each with a housing complex, a school, and a shopping mall, along with a few specialized areas, such as a cultural center and an industrial zone. Natural cities have a semilattice structure. The boundaries between aspects of life are blurred. Work, leisure, and play coincide and interact.
A city design embeds a web of human relationships into a tree of districts and buildings. No canonical mapping exists, so it’s natural that Christopher Alexander couldn’t provide a blueprint for a natural city In his later works, The Timeless Way of Building and A Pattern Language, he cataloged patterns that seem to work well. .
You are no doubt wondering by now what a city looks like which is a semilattice, but not a tree. I must confess that I cannot yet show you plans or sketches. It is not enough merely to make a demonstration of overlap—the overlap must be the right overlap.
Biology
Biological taxonomy classifies living organisms. It was originally based on easily observable traits, such as whether an animal has a spine or feeds its offspring milk. This approach was error-prone because useful traits often evolve independently.
For example, cephalopod eyes are strikingly similar to those of vertebrates, even though the common ancestor likely resembled a blind worm. If camera-type eyes defined a biological group, its members would be a weird bunch.
History is full of actual misclassifications. Fungi were classified as plants until they were given a separate kingdom in the mid-20th century. Crocodiles and birds used to belong to sibling classes (Reptilia and Aves), even though crocodiles are more closely related to birds than to other reptiles.
Taxonomy based on traits is another case of tree mapping.
Traits form concepts
whose inclusions form lattices, not trees.
Jorge Luis Borges mocked such taxonomies in his essay, The Analytical Language of John Wilkins,
quoting a fictional ancient Chinese encyclopedia,
The Celestial Emporium of Benevolent Knowledge
:
the animals are divided into: (a) belonging to the Emperor (b) embalmed (c) trained (d) piglets (e) sirens (f) fabulous (g) stray dogs (h) included in this classification (i) trembling like crazy (j) innumerables (k) drawn with a very fine camelhair brush (l) et cetera (m) just broke the vase (n) from a distance look like flies
Cladistics is a modern system that classifies organisms based on their common ancestry. Even though this classification is imperfect due to horizontal gene transfer, it’s more accurate than the traditional one. It doesn’t impose artificial connections but reveals existing ones.
Conclusion
Now that you know about tree mapping,
you’ll see it everywhere.
It’s responsible for the size of your node_modules directory and the layout of your cookbook.
It lurks in MongoDB schemas.
It dooms class hierarchies.
It is the crux of many struggles with Rust’s borrow checker:
object ownership graphs are trees,
but object interactions are webs.
The primary strategy for dealing with the problem is to be intentional. Hierarchical thinking is so ingrained in us that we’re often unaware that we’re making a choice. We must stop and ask: Which web do we flatten? Which links do we sacrifice? And, most importantly, must the target structure be a tree in the first place?