The numeric tower fiasco

  2023-11-22



Object-oriented programming is an exceptionally bad idea which could only have originated in California.

Edsger Dijkstra

Introduction

Object-oriented programming Object-oriented is a vague term that means different things to different people. This article focuses on the class-based flavor that the most popular programming languages (Java, Python, C++, C#) implement. is still the dominant paradigm in the software industry. The typical advice on modeling the problem domain is to fit your entities into hierarchies where specific types inherit common attributes and behaviors from base classes.

That idea looked intuitive and attractive to me when I first encountered it. I spent countless hours learning about the good object-oriented design and trying to apply these ideas in practice.

The practice showed that real engineering problems rarely fit into rigid class hierarchies. I grew more frustrated with object-oriented programming and eventually entirely gave up on it. This article describes one example that stuck with me from the start of my programming career: modeling the hierarchy of number types.

The numeric tower

God made the integers, all the rest is the work of man.

Leopold Kronecker

I like math, so one of the first structures I tried to model with classes was the most precisely specified and well-studied hierarchy in human history: the numerical tower.

Mathematicians discovered The primary motivation for introducing integers was to create a numeric space where equations of the form a + x = c, where a and c are naturals, always have a solution. Rationals play the same role for equations of the form a × x = c. quite a few types of numbers; usually, each new type is the most natural extension of the previous one:

A few floors of the numeric tower represented as a Venn diagram.

The tower goes up and includes real numbers, complex numbers, quaternions, etc.

The OOP design

In this section, we’ll design a class hierarchy modeling the numbers using an imaginary object-oriented programming language. We’ll limit the tower to only two number types to keep the example simple: Naturals and Integers.

Most people (me included) will instinctively reach for the class structure where Natural is the base class, and Integer extends that base.

Pseudocode of a class hierarchy in which the Natural class is the base and the Integer class extends it.
class Natural
  value : UInt

  def make(value : UInt) : Natural = …
  def +(other : Natural) : Natural = Natural.make(self.value + other.value)
end

class Integer <: Natural
  negative? : Bool

  def make(value : Natural, negative? : Bool) : Integer = …
  def magnitude() : Natural = self.value
  def +(other : Integer) : Integer = …
end

I’m sure you immediately spotted a problem with this design: we inverted the hierarchy, breaking the is a relationship! We claim that all integers are also naturals, but that’s false. The design violates the Liskov substitution principle: passing an Integer to a function that expects a Natural can produce incorrect results:

> Natural.make(5) + Integer.make(10, true) // 5 + -10
15 // ouch

Enlightened by this failure, we reverse the inheritance hierarchy. Integer becomes our base class, and Natural extends it.

Pseudocode of a class hierarchy in which the Integer class is the base and the Natural class extends it.
class Integer
  value : UInt
  negative? : Bool

  def make(value : Natural, negative? : Bool) = …
  def magnitude() : Natural = self.value
  def +(other : Integer) : Integer = …
end

class Natural <: Integer
  def make(value : UInt) : Natural = super.make(value, false)
  def +(other : Natural) : Natural = …
end

Once the initial excitement dissipates, we face a new issue. The simplest case, the Natural numbers, must have the same representation as the more complex type, Integers. The problem becomes more apparent if we add more types to the tower, such as Rational. Instead of starting with the simplest case and building from it, we started with the most complicated one and made everything else a special case. That’s why the original incorrect design was so compelling: we moved from simpler to more complex types, not the other way around!

Another design option is to give up concrete type hierarchies and clamp all number types under a single interface.

Pseudocode of a type hierarchy where all numeric types implement the same interface.
interface Number
  def +(n : Natural) : Number
  def -(n : Natural) : Number
  // …
  def +(i : Integer) : Number
  def -(i : Integer) : Number
  // …
end

class Natural <: Number
  // a lot of boring code …
end

class Integer <: Number
  // a lot of boring code …
end

This approach is the most versatile, but it has many drawbacks:

It’s time to abandon our blunt OOP tools and start from the first principles.

The functional design

The object-oriented tradition obsesses with encoding is a relations among types as a class hierarchy. In this section, we’ll use the approach mathematicians employ: we start with simple structures and combine them to form more complex objects. This approach is similar to using composition instead of inheritance in the OOP world.

Our tool of choice will be the Haskell programming language, but the same principles will work in all languages supporting algebraic data types and functions (even Rust would do, though it would be less pretty).

In our first attempt, we will model each numeric type with a separate Haskell type.

Modeling the Natural and Integer numeric types in Haskell.
-- Each natural is either zero or a successor of a smaller number.
data Natural = Zero | Succ Natural -- 

-- There are two disjoint classes of integers:
-- n and -1 - n for all natural values of n.
data Integer = NonNegative Natural | MinusOneMinus Natural -- 

-- We model the `is a' relation as a pure total function
-- mapping naturals to integers.
int_of_nat :: Natural -> Integer -- 
int_of_nat = NonNegative

-- We can model the reverse relation as a partial function
-- mapping integers to naturals.
nat_of_int :: Integer -> Maybe Natural
nat_of_int (NonNegative n) = Just n
nat_of_int (MinusOneMinus _) = Nothing

plus_nat :: Natural -> Natural -> Natural
plus_nat n Zero = n
plus_nat n (Succ m) = Succ (plus_nat n m)

plus_int :: Integer -> Integer -> Integer
plus_int = …

minus_int :: Integer -> Integer -> Integer
minus_int = …
  1. According to Peano, a natural number is either a zero or a successor of a smaller natural number. This definition is equivalent to the unary numeral system, which is inefficient for computation but convenient for demonstration.
  2. An integer number is either a non-negative natural number n or a negative number -1 - n, where n is a natural number. Note that with this definition, each integer has a unique representation.
  3. We encode the is a relation between numeric types as a pure total function. OOP languages usually generate these type-casting functions automatically and apply them implicitly.

Note how modular this implementation is. Naturals can live in a separate module and don’t need to know about the existence of integers. We can easily change the implementation of naturals (e.g., use the more efficient binary representation) or add more numeric types (e.g., Rationals) without breaking other code.

However, the design is not yet satisfactory: we lost the ability to operate uniformly on numbers of different types. We can add a universal Number type to reclaim this property.

Modeling the numeric tower in Haskell with a sum type.
-- The sum type encoding all number types.
data Number = Nat Natural | Int Integer

-- Adds arbitrary numbers.
plus :: Number -> Number -> Number
plus = …

-- Subtracts arbitrary numbers.
minus :: Number -> Number -> Number
minus = …

This design might seem like a re-iteration of the Number interface story from the OO world, but it’s not:

To address the boilerplate issue, we’ll introduce the numeric type promotion operation. When we add two numbers of different types, we convert the simpler type to the more complex one using the previously discussed type conversion functions. We then apply the binary operator dealing with numbers of the same promoted type. Finally, we demote the result to the simplest type that can hold the value.

Reducing the boilerplate with type promotion.
data Number = Nat Natural | Int Integer

-- Promotes the arguments to the largest common type.
promote :: (Number, Number) -> (Number, Number)
promote (Nat n, Int m) = (Int (int_of_nat n), Int m) 
promote (Int n, Nat m) = (Int n, Int (int_of_nat m))
promote x = x

-- Reduces the number to its most canonical form.
simplify :: Number -> Number
simplify (Int (NonNegative n)) = Nat n
simplify x = x

type BinaryOp = Number -> Number -> Number

-- Implements a total binary operation given a kernel defined
-- only on numbers of the same type.
impl_binary_op :: BinaryOp -> BinaryOp
impl_binary_op kernel x y = let (x', y') = promote (x, y)
                            in simplify (kernel x' y')

-- Adds arbitrary numbers.
plus :: Number -> Number -> Number
plus x y = impl_binary_op plus_simple x y
  where plus_simple (Nat n) (Nat m) = Nat (plus_nat n m)
        plus_simple (Int n) (Int m) = Int (plus_int n m)
        plus_simple _ _ = undefined

-- Subtracts arbitrary numbers.
minus :: Number -> Number -> Number
minus x y = impl_binary_op minus_simple x y
  where minus_simple (Nat n) (Nat m) = Int (minus_int (int_of_nat n) (int_of_nat m))
        minus_simple (Int n) (Int m) = Int (minus_int n m)
        minus_simple _ _ = undefined

The impl_binary_op function captures the promote/execute/simplify pattern. Concrete binary operators (plus, minus, etc.) must deal only with numbers of the same type.

With this, we get a clean and concise model of numbers on a computer. The code is easy to extend to support more numeric types, which is left as an exercise for the reader I find adding the Complex type particularly interesting. Hint: you might need two different generic Number types. .

Conclusion

I find OOP technically unsound… I find OOP philosophically unsound… I find OOP methodologically wrong.

When I use a tool and get unsatisfactory results, I blame myself for not using the tool correctly. At first, I felt the same about the numeric tower case, but now I’m sure the problem is not me; it’s the class hierarchies.

The approach of piling classes on top of one another and trying to make this stack coherent is fundamentally flawed. It fails spectacularly even on tiny examples where the problem domain is mathematically specified. The numeric tower is one example, but there are more. The circle-ellipse problem is a good one, but my favorite is how inheriting ColorPoint from Point breaks the transitivity property of the equals method:

There is no way to extend an instantiable class and add a value component while preserving the equals contract, unless you’re willing to forgo the benefits of object-oriented abstraction.

Joshua Bloch, Effective Java, 3rd edition, item 10, p. 42

If something doesn’t work on small examples, why do we expect it to work in large codebases? When was the last time you looked in a complex class hierarchy and admired how easy to understand the code was?

Experts often recommend favor composition over inheritance E.g., Joshua Bloch, Effective Java, third edition, item 18, p. 87. . I agree with them and favor algebraic data types and simple functions as the ultimate form of composition.