Universal domain types
✏ 2024-02-26 ✂ 2024-02-26
in strong typing we trust
Introduction
Skillful use of a strong static type system can eliminate certain classes of bugs. Using custom application-specific types instead of raw integers or strings is a powerful technique that will save you hours of debugging. The following quote from one of my favorite books on software design illustrates this point:
It took six months, but I eventually found and fixed the bug. […] At one point in the code there was a block variable containing a logical block number, but it was accidentally used in a context where a physical block number was needed.
The author attributes the bug to poor variable naming, but this blame is misplaced. If the programmer had defined distinct types for logical and physical block numbers, the compiler would have caught this mistake immediately. In this article, I call such definitions domain types. They serve as documentation, help catch bugs at compile time, and make the code more secure. (The book Secure by Design by Dan Bergh Johnsson et al. provides many examples of using domain types for building a secure system from the ground up.)
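As a minimal sketch of the idea, assuming a file system that tracks both kinds of block numbers, distinct types for the two could look like this in Rust. The names LogicalBlock, PhysicalBlock, to_physical, and byte_offset are hypothetical and do not come from the quoted book; the fixed offset stands in for a real block map lookup.

```rust
/// A block number in a file's logical address space (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LogicalBlock(pub u64);

/// A block number on the physical device (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PhysicalBlock(pub u64);

/// Maps a logical block to a physical one.
/// The fixed offset is a placeholder for a real block map lookup.
pub fn to_physical(block: LogicalBlock, first_physical: u64) -> PhysicalBlock {
    PhysicalBlock(first_physical + block.0)
}

/// Accepts only physical block numbers and returns the byte offset on the device.
pub fn byte_offset(block: PhysicalBlock, block_size: u64) -> u64 {
    block.0 * block_size
}

// The mistake from the quote no longer compiles:
//
//     let logical = LogicalBlock(3);
//     byte_offset(logical, 4096); // error[E0308]: mismatched types
```

The only way to get a PhysicalBlock from a LogicalBlock in this sketch is to go through to_physical, so the conversion is visible at every call site.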
This article shows a systematic approach to domain types, provides examples of domain types applicable to most applications, and contains hints on how to implement them effectively.
Language features
Many languages provide syntax for simplifying domain type definitions. Such definitions create a new distinct type sharing representation with a chosen underlying type (e.g., a 64-bit integer). The semantics of such definitions vary across languages, but they usually fall into one of two categories: newtypes and typedefs.
Newtypes
Newtypes wrap an existing type, allow the programmer to inherit some of the operations from the underlying type, and add new operations. Newtypes are flexible but may need boilerplate code to implement all features required in a real-world application.
Haskell and Rust are examples of languages supporting newtypes.
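A small Rust sketch of a newtype, under the assumption that we want an employee number that can be compared and hashed but not used in arithmetic. The EmployeeNumber type and its E-prefixed display format are made up for illustration.

```rust
use std::fmt;

/// A newtype over `u32`. The derive list selects which operations
/// are inherited from the underlying integer: equality and hashing,
/// but no arithmetic and no ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EmployeeNumber(u32);

impl EmployeeNumber {
    /// Wraps a raw number.
    pub fn new(raw: u32) -> Self {
        EmployeeNumber(raw)
    }

    /// Exposes the representation, e.g., for serialization.
    pub fn get(self) -> u32 {
        self.0
    }
}

/// A new operation the underlying type does not have:
/// formatting in a (made-up) badge style such as `E-000042`.
impl fmt::Display for EmployeeNumber {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "E-{:06}", self.0)
    }
}
```

The boilerplate mentioned above shows up as soon as more operations are needed: every trait that is not derivable has to be implemented by hand.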
Typedefs
Typedefs introduce a new name for an existing type, inheriting all underlying type operations.
Go, D and Ada provide typedefs (Ada calls typedefs derived types).
The Boost project for C++ implements typedefs as a library. (C’s typedef declarations are weak typedefs: they introduce an alias for an existing type, not a new type.)
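Rust's type keyword behaves like a weak typedef in this sense, which makes the difference easy to demonstrate. In the sketch below, the CustomerId and OrderId aliases and the describe_order function are hypothetical names chosen for illustration.

```rust
/// A type alias: a second name for `u64`, not a new type.
pub type CustomerId = u64;

/// Another alias for the same underlying type.
pub type OrderId = u64;

/// Expects an order identifier.
pub fn describe_order(id: OrderId) -> String {
    format!("order #{id}")
}

/// Passes a customer identifier where an order identifier is expected.
/// The compiler accepts this because both aliases denote `u64`.
pub fn alias_mix_up() -> String {
    let customer: CustomerId = 42;
    describe_order(customer)
}
```

A weak alias documents intent but gives the compiler nothing to check.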
Newtypes and typedefs are versatile and practical, but they approach the problem in a way that’s too simplistic and mechanical. There is a more systematic way to think about domain types.
Domain type classes
A constraint on component design leads to freedom and power when putting those components together into systems.
Over the years, I found that specific classes of domain types appear repeatedly in most applications I work on. This section is an overview of these categories.
I use pseudo-Rust syntax to illustrate the concepts, but the ideas should easily translate to any statically typed language.
The code snippets present minimal interfaces for each type class. Practical concerns often require adding more operations. For example, using identifiers as keys in a dictionary requires exposing a hash function (for hash maps) or imposing an ordering (for search trees), and serializing values requires accessing their internal representation.
Identifiers
One of the most common uses of domain types is an opaque handle for an entity or an asset in the real world, such as a customer identifier in an online store or an employee number in a payroll application. I call these types identifiers.
Identifiers have no structure, i.e., we don’t care about their internal representation. The only fundamental requirement is the ability to compare values of those types for equality. This lack of structure suggests an appropriate mathematical model for such types: a set, a collection of distinct objects.
Newtypes are a perfect fit for identifiers thanks to their ability to hide structure. Typedefs, on the other hand, impose too much structure, allowing the programmer to add and subtract numeric identifiers accidentally. But given the choice, typedefs are safer than raw integers or strings.
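A minimal interface for identifiers could be sketched as follows. This is one possible encoding, assuming a 64-bit representation and a phantom type parameter to tell entity kinds apart; the names Id, Customer, and Order are illustrative.

```rust
use std::marker::PhantomData;

/// An identifier tagged with the kind of entity it refers to.
pub struct Id<Entity> {
    raw: u64,
    // `fn() -> Entity` makes the marker independent of the properties of `Entity`.
    entity: PhantomData<fn() -> Entity>,
}

impl<Entity> Id<Entity> {
    /// Wraps a raw value, e.g., one loaded from a database.
    pub fn from_raw(raw: u64) -> Self {
        Id { raw, entity: PhantomData }
    }
}

// Manual impls avoid the `Entity: Clone` and `Entity: PartialEq`
// bounds that `#[derive]` would add.
impl<Entity> Clone for Id<Entity> {
    fn clone(&self) -> Self {
        *self
    }
}

impl<Entity> Copy for Id<Entity> {}

/// Equality is the only fundamental operation on identifiers.
impl<Entity> PartialEq for Id<Entity> {
    fn eq(&self, other: &Self) -> bool {
        self.raw == other.raw
    }
}

impl<Entity> Eq for Id<Entity> {}

/// Marker types for two kinds of entities.
pub struct Customer;
pub struct Order;

pub type CustomerId = Id<Customer>;
pub type OrderId = Id<Order>;

// Comparing identifiers of different entities does not compile:
//
//     CustomerId::from_raw(1) == OrderId::from_raw(1)
```

There is deliberately no Add or Sub implementation, so adding two customer identifiers is a compile-time error.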
Amounts
Another typical use of domain types is representing quantities, such as the amount of money in USD in a bank account or the file size in bytes. Being able to compare, add, and subtract amounts is essential.
Generally, we cannot multiply or divide two compatible amounts and expect to get an amount of the same type back (unless we’re modeling mathematical entities, such as probabilities or points on an elliptic curve). Multiplying two dollars by two dollars gives four squared dollars. I don’t know about you, but I have yet to find a practical use for squared dollars.
Multiplying amounts by a dimensionless number, however, is meaningful. There is nothing wrong with a banking app increasing a dollar amount by ten percent or a disk utility dividing the total number of allocated bytes by the file count.
The appropriate mathematical abstraction for amounts is a vector space: a set with additional operations defined on its elements (addition, subtraction, and scalar multiplication) whose behavior satisfies a few natural axioms.
Newtypes allow us to implement amounts, but they might need some tedious code to get the multiplication and division right. Typedefs are handy, but get multiplication and division wrong, confusing dollars and dollars squared.
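The tedious code in question could look like the sketch below. It assumes a dollar amount stored as whole cents and an integer scalar type; both choices, and the name Usd, are simplifications for illustration.

```rust
use std::ops::{Add, Div, Mul, Sub};

/// An amount of US dollars stored as whole cents.
/// Comparison and ordering come from the derive list.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Usd {
    cents: i64,
}

impl Usd {
    pub fn from_cents(cents: i64) -> Self {
        Usd { cents }
    }
}

/// amount + amount = amount
impl Add for Usd {
    type Output = Usd;
    fn add(self, rhs: Usd) -> Usd {
        Usd { cents: self.cents + rhs.cents }
    }
}

/// amount - amount = amount
impl Sub for Usd {
    type Output = Usd;
    fn sub(self, rhs: Usd) -> Usd {
        Usd { cents: self.cents - rhs.cents }
    }
}

/// amount × scalar = amount
impl Mul<i64> for Usd {
    type Output = Usd;
    fn mul(self, factor: i64) -> Usd {
        Usd { cents: self.cents * factor }
    }
}

/// amount / scalar = amount (integer division truncates toward zero)
impl Div<i64> for Usd {
    type Output = Usd;
    fn div(self, divisor: i64) -> Usd {
        Usd { cents: self.cents / divisor }
    }
}

// There is no `impl Mul<Usd> for Usd`, so squared dollars do not type-check.
```

A production version would also have to decide how to handle overflow and rounding, which this sketch ignores.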
Loci
Working with space-like structures, such as time and space, poses an interesting challenge. Spaces have two types of values: absolute positions and relative distances.
Positions refer to points in space, such as timestamps or geographical coordinates. Distances represent a difference between two such points.
Some natural languages acknowledge the distinction and offer different words for these concepts, such as o’clock vs. hours in English or Uhr vs. Stunden in German.
While distances behave the same way as amounts, positions are trickier. We can compare, order, and subtract them to compute the distance between two points. For example, subtracting 5 am on Friday from 3 am on Saturday gives us twenty-two hours. Adding or multiplying these dates makes no sense, however. This semantic demands a new class of types, loci (plural of locus).
One example of the locus/distance dichotomy from systems programming is memory address arithmetic.
Low-level programming languages differentiate pointers (memory addresses) and offsets (distances between addresses).
In the C programming language, the void* type represents a memory address, and the ptrdiff_t type represents an offset.
Subtracting two pointers gives an offset, but adding or multiplying pointers is meaningless.
We can view each position as a distance from a fixed origin point. Changing the origin or the distance type calls for a new locus type.
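A minimal locus interface could be sketched like this, assuming time measured in whole seconds from an origin chosen by the application. The names Seconds and Timestamp are illustrative.

```rust
use std::ops::{Add, Sub};

/// A distance between two points in time, in whole seconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Seconds(pub i64);

/// A position on the timeline, stored as a distance from the origin.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timestamp {
    since_origin: Seconds,
}

impl Timestamp {
    /// The fixed origin point of this locus type.
    pub const ORIGIN: Timestamp = Timestamp { since_origin: Seconds(0) };
}

/// locus - locus = distance
impl Sub for Timestamp {
    type Output = Seconds;
    fn sub(self, rhs: Timestamp) -> Seconds {
        Seconds(self.since_origin.0 - rhs.since_origin.0)
    }
}

/// locus + distance = locus
impl Add<Seconds> for Timestamp {
    type Output = Timestamp;
    fn add(self, rhs: Seconds) -> Timestamp {
        Timestamp { since_origin: Seconds(self.since_origin.0 + rhs.0) }
    }
}

/// locus - distance = locus
impl Sub<Seconds> for Timestamp {
    type Output = Timestamp;
    fn sub(self, rhs: Seconds) -> Timestamp {
        Timestamp { since_origin: Seconds(self.since_origin.0 - rhs.0) }
    }
}

// There is no `impl Add<Timestamp> for Timestamp`: adding two positions is meaningless.
```

Taking midnight on Friday as the origin, 5 am on Friday is ORIGIN + Seconds(5 * 3600), 3 am on Saturday is ORIGIN + Seconds(27 * 3600), and their difference is Seconds(22 * 3600), the twenty-two hours from the example above.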
Timestamps offer an excellent demonstration of the “distance type + the origin” concept.
Go and Rust represent timestamps as a number of nanoseconds elapsed since the unix epoch (midnight of January 1st, 1970).
The C programming language defines the time_t type, which is almost always the number of seconds from the unix epoch.
The q programming language also uses nanoseconds, but chose the millennium (midnight of January 1st, 2000) as its origin point.
Changing the distance type (e.g., seconds to nanoseconds) or the origin (e.g., unix epoch to the millennium) calls for a different timestamp type.
The Go standard library employs the locus type design for its time package, differentiating the time instant (time.Time) and time duration (time.Duration).
The Rust standard module std::time is a more evolved example.
It defines the SystemTime type for wall clock time (the origin is the unix epoch), Instant for monotonic clocks (the origin is some unspecified point in the past, usually the system boot time), and the Duration type for distances between two clock measurements.
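A short sketch of how these three types interact, using only the standard library. The helper names time_it, since_unix_epoch, and one_hour_after are made up for this example.

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Measures how long `work` takes using the monotonic clock.
pub fn time_it<F: FnOnce()>(work: F) -> Duration {
    let start: Instant = Instant::now();
    work();
    // locus - locus = distance
    Instant::now() - start
}

/// Returns the distance from the unix epoch to `t`,
/// or `None` if `t` lies before the epoch.
pub fn since_unix_epoch(t: SystemTime) -> Option<Duration> {
    t.duration_since(UNIX_EPOCH).ok()
}

/// locus + distance = locus
pub fn one_hour_after(t: SystemTime) -> SystemTime {
    t + Duration::from_secs(3600)
}

// Adding two positions does not compile:
//
//     SystemTime::now() + SystemTime::now()
```

Note that Duration is unsigned, which is why the distance to a wall clock time before the epoch cannot be expressed and duration_since returns an error in that case.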
Quantities
So far, we considered applications where domain types barely interact with one another. Many applications require combining values of different domain types in a single expression. A physics simulation might need to multiply a time interval by a velocity to compute the distance travelled. A financial application might need to multiply the dollar amount by the conversion rate to get an amount in euros.
We can model complex type interactions using methods of dimensional analysis. If we view amounts as values with an attached label identifying their unit, then our new types are a natural extension demanding a more structured label equivalent to a vector of base units raised to rational powers. For example, acceleration would have label (distance × time⁻²), and the USD/EUR pair exchange rate would have label (eur × usd⁻¹). I call types with such rich label structure quantities.
Quantities are a proper extension of amounts: addition, subtraction, and scalar multiplication work the same way, leaving the label structure untouched. The additional label structure gives meaning to multiplication and division.
The result of multiplication will have a base unit vector with the component-wise sum of the power vectors of the unit factors. For example, car fuel consumption computation could use an expression like 2 (km) × 0.05 (liter × km⁻¹) = 0.1 (liter).
Dividing values produces a label that’s a component-wise difference between the dividend and divisor power vectors. For example, running pace computation could use an expression like 10 (min) / 2 (km) = 5 (min × km⁻¹).
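The two computations above can be checked at compile time even without computing labels generically. The sketch below is a deliberately simplified encoding: it assumes one marker type per unit and spells out each allowed multiplication and division by hand, instead of deriving the resulting label from power vectors. All names are illustrative.

```rust
use std::marker::PhantomData;
use std::ops::{Div, Mul};

/// A value tagged with a unit marker type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Qty<Unit> {
    value: f64,
    unit: PhantomData<Unit>,
}

impl<Unit> Qty<Unit> {
    pub fn new(value: f64) -> Self {
        Qty { value, unit: PhantomData }
    }

    pub fn value(self) -> f64 {
        self.value
    }
}

// Unit markers; the derives satisfy the bounds added by the derives on `Qty`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Km;
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Liter;
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Min;
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LiterPerKm;
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MinPerKm;

/// km × (liter × km⁻¹) = liter
impl Mul<Qty<LiterPerKm>> for Qty<Km> {
    type Output = Qty<Liter>;
    fn mul(self, rhs: Qty<LiterPerKm>) -> Qty<Liter> {
        Qty::new(self.value * rhs.value)
    }
}

/// min / km = min × km⁻¹
impl Div<Qty<Km>> for Qty<Min> {
    type Output = Qty<MinPerKm>;
    fn div(self, rhs: Qty<Km>) -> Qty<MinPerKm> {
        Qty::new(self.value / rhs.value)
    }
}

// Any combination without an explicit impl is rejected by the compiler:
//
//     Qty::<Km>::new(2.0) * Qty::<Min>::new(10.0)
```

Writing one impl per unit pair does not scale, which is the gap that generic type-level arithmetic on power vectors closes.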
Quantities require complex type-level machinery, which makes them hard to implement in most languages. Boost.Units is one of the first libraries to provide comprehensive implementations of quantity types in C++. The Rust ecosystem offers the dimensioned package. The units package is a popular choice in the Haskell ecosystem.
If your language doesn’t support advanced type-level programming or using rigid types is impractical in your application, you can do unit checks at runtime. Python’s quantities package is an example of this approach.
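To show what a runtime check involves, here is a sketch in Rust rather than Python. It is not the API of the quantities package; it assumes three base units (km, liter, min) and integer powers only, and all names are illustrative.

```rust
use std::ops::{Div, Mul};

/// Powers of the base units (km, liter, min), in that order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Dim(pub [i8; 3]);

pub const KM: Dim = Dim([1, 0, 0]);
pub const LITER: Dim = Dim([0, 1, 0]);
pub const MIN: Dim = Dim([0, 0, 1]);

/// Computes `a + sign * b` component-wise.
fn combine(a: Dim, b: Dim, sign: i8) -> Dim {
    let mut out = [0i8; 3];
    for i in 0..3 {
        out[i] = a.0[i] + sign * b.0[i];
    }
    Dim(out)
}

/// A value carrying its label at runtime.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Quantity {
    pub value: f64,
    pub dim: Dim,
}

impl Quantity {
    pub fn new(value: f64, dim: Dim) -> Self {
        Quantity { value, dim }
    }

    /// Addition is defined only for equal labels; a mismatch yields `None`.
    pub fn checked_add(self, rhs: Quantity) -> Option<Quantity> {
        if self.dim == rhs.dim {
            Some(Quantity::new(self.value + rhs.value, self.dim))
        } else {
            None
        }
    }
}

/// Multiplication sums the power vectors.
impl Mul for Quantity {
    type Output = Quantity;
    fn mul(self, rhs: Quantity) -> Quantity {
        Quantity::new(self.value * rhs.value, combine(self.dim, rhs.dim, 1))
    }
}

/// Division subtracts the divisor's power vector from the dividend's.
impl Div for Quantity {
    type Output = Quantity;
    fn div(self, rhs: Quantity) -> Quantity {
        Quantity::new(self.value / rhs.value, combine(self.dim, rhs.dim, -1))
    }
}
```

The price of this flexibility is that a unit mismatch surfaces only when the offending code path runs, not when the program is compiled.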
Conclusion
This article shows a systematic approach to designing domain types: we identify the minimal interface the type must satisfy to address practical needs and find suitable mathematical machinery.
We discussed four classes of domain types that fit naturally in almost any application: identifiers, amounts, loci, and quantities. Each type class is a little gem, a universal structure akin to a design pattern. Unlike design patterns, the universal domain types are based on mathematical abstractions and can be specified precisely.
I’m sure you will identify a few places in your application where one of these type classes fits perfectly. If you use Rust, you might find my phantom_newtype package helpful. Maybe you’ll discover some gems of your own. Have fun!
Exercises
The following questions and exercises will help you understand and apply the material in this article.
- Think of bugs you found in your career that were embarrassingly hard to find but trivial to fix. Could using more precise types prevent those bugs?
- Are there any areas in the application you’re working on where one of the universal domain types could improve type safety?
- How do universal domain types relate to one another? Draw a line diagram.
- Non-spatial quantities, such as mass and electrical charge, don’t seem to require corresponding locus types. Why is that?
- Did you find any other domain types in your work that might be universal? If so, comment on GitHub issue roman-kashitsyn/mmapped.blog#50. I’ll gladly add your gem to this article with a proper attribution.