Square joy: pre-order

  2022-06-05

In this article, we will solve another programming puzzle using an array programming language, J. We will play with a data structure that is rarely associated with arrays: a binary tree. Our goal is to solve the problem using only flat arrays and high-level array transformations.

The problem: recover a binary tree

Given two sequences of distinct values, inorder and preorder, where inorder is an in-order traversal of a binary tree, and preorder is a pre-order traversal of the same tree, recover the tree structure.

For example, if our inputs are the following sequences

inorder A P L J K Q F U N
preorder Q P A J L K U F N

then the corresponding tree has the shape depicted in the figure below.

A tree corresponding to inputs inorder = A P L J K Q F U N and preorder = Q P A J L K U F N. We will use this tree as our primary example throughout the article.

Vectorizing the problem

A problem well stated is a problem half solved.

John Dewey

The way we represent the inputs and outputs of our problem shapes our approach to a solution.

Our inputs, inorder and preorder, are already arrays. However, there is something peculiar about these arrays: they contain the same values, or, in math terms, preorder is a permutation of inorder. We can eliminate this redundancy by compressing the inputs to a single numeric array: the permutation that turns inorder into preorder. A primitive J operation, dyadic i. (index of), transforms the original inputs to the more convenient and compact permutation form.

    'APLJKQFUN' i. 'QPAJLKUFN'
5 1 0 3 2 4 7 6 8
The example tree after we changed node labels to be integers corresponding to the in-order traversal.
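
As a quick sanity check, indexing the in-order labels with this permutation should give back the pre-order sequence ({~ is { with the arguments swapped):

    NB. applying the permutation to the in-order labels gives the pre-order traversal
    'APLJKQFUN' {~ 5 1 0 3 2 4 7 6 8
QPAJLKUFN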

The next step is to decide on the tree representation. There are two popular representations in the array community: depth vector and parent vector.

Depth vector

The depth vector is a tree representation that you often see in graphical user interfaces, such as text editors displaying the project file structure. This representation consists of the vector of nodes as they appear in the pre-order traversal and the vector of corresponding node depths.

    T =. 5 1 0 3 2 4 7 6 8
    D =. 0 1 2 2 3 3 1 2 2

The pre-order vector specifies the vertical order of the nodes; the depth vector specifies how far these nodes lie horizontally.

   ((D>:/H) + (T+1) * D=/ H =. i.1+>./D) { ' .APLJKQFUN'
Q
.P
..A
..J
...L
...K
.U
..F
..N

This representation has many benefits, but it is not convenient for our problem.

Parent vector

The parent vector is an array mapping nodes to their parents. Technically, the root node does not have a parent, but many algorithms become more elegant if we extend the definition of parent to work for the root node as well: we declare that the parent of the root node is the root node itself.

Array 1 5 3 1 3 5 7 5 7 is the parent vector encoding for our example tree.

    P =. 1 5 3 1 3 5 7 5 7

Let us play with this representation a bit to understand it better. We will use a couple of J primitives: I. (indices) returning the indices of set bits in a boolean mask, and e. (member in) returning a boolean mask that has the shape of the left argument and has bits set for items that appear in the right argument.
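
Here are two tiny standalone examples of these primitives before we apply them to P (the operands are arbitrary values chosen just for illustration):

    NB. indices of the set bits in a boolean mask
    I. 0 1 0 1 1
1 3 4
    NB. which items of the left argument appear in the right argument
    3 1 4 1 5 e. 1 2 3
1 1 0 1 0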

Let us find the tree root, the node that is its own parent. (The parent vector representation also works for forests.)

    P = i.#P
0 0 0 0 0 1 0 0 0

    I. (P = i.#P)
5
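
As noted above, the parent vector representation also works for forests: every root is its own parent. A tiny made-up example, unrelated to our puzzle:

    NB. a forest of two trees: node 1 is the parent of 0 and 2, node 4 is the parent of 3
    ] F =. 1 1 1 4 4
1 1 1 4 4
    NB. both roots are their own parents
    I. F = i.#F
1 4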

How about finding direct children of node 3? Those are nodes having 3 as their parent.

    I. P = 3
2 4

How about finding all leaves? Those are nodes that do not appear in the parent vector.

    -. (i.#P) e. P
1 0 1 0 1 0 1 0 1

    I. -. (i.#P) e. P
0 2 4 6 8

The parent vector representation does not distinguish right children from left children. However, we label our nodes in-order, so there is a simple way to tell the direction: nodes with indices lower than their parents are left children, and nodes with higher indices are right children.

   NB. all left children
   I. (i.#P) < P
0 1 2 6

   NB. all right children
   I. (i.#P) > P
3 4 7 8
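
Combining the two tricks, we can also split the children of a particular node into the left and the right child (a small extra illustration, not needed for the solution):

    NB. children of node 3
    ] C =. I. P = 3
2 4
    NB. the left child has a smaller index than its parent
    C #~ C < 3
2
    NB. the right child has a larger index than its parent
    C #~ C > 3
4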

The parent vector gives us a simple, elegant, and self-contained tree representation. It is perfect for solving the problem.

We have decided on the form of our solution: a monadic function that accepts a permutation and returns a parent vector of the same shape.

A solution

Finding left children

Where should we start? I like to start by finding a way to visualize the data. One way to visualize a permutation is to draw a table showing how it scrambles items in space.

    T =/ i.#T
0 0 0 0 0 1 0 0 0
0 1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1

Look at the ones! If you squint a bit, you will see that they form a picture of the tree we started with. (The reason is obvious in hindsight, but I was quite surprised when I drew the picture for the first time.) Let us use the >: (greater or equal) verb instead of equality in the table and multiply each column by T.

    T * T >:/ i.#T
5 5 5 5 5 5 0 0 0
1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
3 3 3 3 0 0 0 0 0
2 2 2 0 0 0 0 0 0
4 4 4 4 4 0 0 0 0
7 7 7 7 7 7 7 7 0
6 6 6 6 6 6 6 0 0
8 8 8 8 8 8 8 8 8

This table has an interesting property: if we look at the places where the ones used to be in the equality table, the value directly above each such place is either zero (for all right children) or the node's parent in the tree (for all left children). Why is that?

Let us inspect the structure of a pre-order permutation vector. After adding a node index to the vector, we descend into the left child (if there is one). That left child always immediately follows the parent in the pre-order vector, and left children have indices lower than the index of their parent because we assign indices according to in-order traversal. The converse also holds: if the node following a in the vector has a smaller index than a, it can be neither the right child of a nor a node outside of a's subtree (both would have larger indices), so it must be a's left child. A simple pattern emerges: if we see numbers a and b next to each other (a first), and a > b, then a is the parent of b.

The left children follow a pattern in a pre-order vector: whenever a smaller node index follows a larger one, the larger index must be the parent of the smaller.

Let us put this observation into code. We shift the input vector to the right, filling the space on the left with the first value in the vector (the root node). We then compute Shifted × (T ≤ Shifted).

    NB. get the head of the vector
    {.T
5
    NB. shift to the right, filling the space with the head
    _1 |.!.({.T) T
5 5 1 0 3 2 4 7 6
    NB. compute T <= Shifted
    T <: _1 |.!.({.T) T
1 1 1 0 1 0 0 1 0
    NB. L = Shifted * (T <= Shifted)
    ] L =. T (<: * ]) _1 |.!.({.T) T
5 5 1 0 3 0 0 7 0
    NB. the parent of node 5 is 5, the parent of node 1 is 5, etc.
    T,:L
5 1 0 3 2 4 7 6 8
5 5 1 0 3 0 0 7 0

Array L is not the correct answer yet: the parents we computed for the right children are zero.

Finding right children

To find the parents of the right children, we have to look closely at the pre-order vector again. When we construct the vector, we first append the parent, then the entire left subtree, and only then the right child. All nodes in the left subtree have indices smaller than the parent, and the right child has an index larger than the parent.

The right children follow a pattern in a pre-order vector: their parent is the largest index that appears earlier in the vector and is smaller than their own index.

This observation gives us an algorithm: for each position in the array, we need to find the largest value to its left that is smaller than the value at that position. The simplest way to express this algorithm in J is to use the \ adverb (prefix scan): we look at each prefix and find the largest value in that prefix that is smaller than the last element of the prefix. Let us take the prefix 5 1 0 3 as an example.

    NB. {: computes the last element of an array
    {: 5 1 0 3
3

    NB. compute the boolean mask of items smaller than the last element
    (] < {:) 5 1 0 3
0 1 1 0

    NB. filter out elements according to the boolean mask
    ((] < {:) # ]) 5 1 0 3
1 0

    NB. find the maximum
    (>./ @ ((] < {:) # ])) 5 1 0 3
1

Let us now apply our composite verb to each successive prefix of the pre-order array.

    NB. __ means "negative infinity".
    ] R =. (>./ @ ((] < {:) # ]))\ T
__ __ __ 1 1 3 5 5 7
    NB. node 5 has no parent, the parent of node 3 is 1, etc.
    T,:R
 5  1  0 3 2 4 7 6 8
__ __ __ 1 1 3 5 5 7

Array R has incorrect values for parents of all left children. We shall deal with this issue in the next section.

Computing the parent vector

We have two ways to compute the parents: one works for left children (let us call it the L-algorithm), and another works for right children (the R-algorithm). How do we combine these results?

Note that array L has zeros for all right children. Also note that for left children, the L-algorithm produces larger values than the R-algorithm: L holds the true parent, which is larger than the node itself, while R holds a value smaller than the node (or negative infinity). For right children, L is zero, while R holds the true parent, which is never negative. (For the root, L already holds the root itself, which is exactly what we want.) So taking the maximum of arrays L and R gives us the correct answer at every position.

    L >. R
5 5 1 1 3 3 5 7 7

    NB. node 5 has parent 5, node 1 has parent 5, etc.
    T ,: L >. R
5 1 0 3 2 4 7 6 8
5 5 1 1 3 3 5 7 7

We are not there yet: item k of the result is the parent of node T[k], so the parents are listed in pre-order, not in node order. To get the final answer, we need to rearrange them according to the in-order node labels by applying the inverse of permutation T.

    NB. Compute the inverse permutation for T
    T i. i.#T
2 1 4 3 5 0 7 6 8
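
As a quick check, indexing T with this inverse permutation should give back the identity permutation:

    NB. applying the inverse permutation to T gives the identity
    (T i. i.#T) { T
0 1 2 3 4 5 6 7 8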

To apply a permutation, we use it as an index of the array we want to permute. (Applying permutations is a common operation in array programming, so J has a primitive function for that, C. (permute).)

    NB. apply the inverse permutation to the scrambled parents
    (T i. i.#T) { L >. R
1 5 3 1 3 5 7 5 7

We got our answer. Let us extract our little exploration into a function.

recover_tree =. {{
  NB. parents of left children: the preceding pre-order item, if it is larger
  L =. y (<: * ]) _1 |.!.({.y) y
  NB. parents of right children: the largest smaller value to the left
  R =. (>./ @ ((] < {:) # ]))\y
  NB. combine both and reorder the parents according to the in-order labels
  (y i. i.#y) { L >. R
}}

Same function in tacit form:

recover_tree =. (i. i.@#) { ((<: * ]) ({. , }:))~ >. >./@((] < {:) # ])\
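
Assuming both definitions are transcribed correctly, applying recover_tree to our example permutation reproduces the parent vector we derived above:

    recover_tree 5 1 0 3 2 4 7 6 8
1 5 3 1 3 5 7 5 7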

Bonus: stinking loop solution

While playing with J, we learned that we need only one permutation as input, and that we can represent the result as a flat array of parents. Let us apply this knowledge to get an efficient and short solution in C.

We will use recursion to solve the problem. Let us consider a traversal of a subtree spanning from node L to node R (R excluded). The first item in the pre-order array gives the root P of that subtree. The P - L nodes preceding P in in-order (indices L..P-1) form the left subtree, and the R - P - 1 nodes following it (indices P+1..R-1) constitute the right subtree. We handle those subtrees recursively, passing P as the parent of their roots.

Let us look at the example, 5 1 0 3 2 4 7 6 8. The first node in the array is the root, 5. It divides the rest of the vector into two sections forming the left and the right subtrees, respectively: nodes 0 1 2 3 4 (five nodes in total) and nodes 6 7 8 (three nodes in total). We can process these subtrees recursively, recording 5 as the parent of their roots.

The run of the recursive parent search algorithm for the first two layers of the example tree.

A C implementation of the recursive algorithm for computing the parent vector. Pre is the pre-order traversal array, P is the output array of parents, N is the number of tree nodes.
/* Recovers the parents of the subtree occupying in-order labels [L, R).
   Pre points at the pre-order traversal of that subtree; V is the parent
   assigned to the subtree's root. */
static void restore_range(int Pre[], int P[], int L, int R, int V) {
    if (L >= R) return;  /* empty range */
    P[Pre[0]] = V;       /* Pre[0] is the root of this subtree */
    /* Left subtree: labels [L, Pre[0]); its pre-order segment starts right after the root. */
    restore_range(Pre + 1, P, L, Pre[0], Pre[0]);
    /* Right subtree: labels [Pre[0]+1, R); its pre-order segment starts after the left subtree. */
    restore_range(Pre + Pre[0] - L + 1, P, Pre[0] + 1, R, Pre[0]);
}

void restore_tree(int Pre[], int P[], int N) {
    /* The root is defined to be its own parent. */
    if (N > 0) restore_range(Pre, P, 0, N, Pre[0]);
}

Note that this code is inherently polymorphic: it operates only on the index permutation, so it does not care about the type of the values we store in the tree.
