Functional Sets, Part 3: Balancing
Last time, we covered how rotation could keep our trees balanced. But how do we automatically apply our rotations?
When we rotate a tree, we move the subtrees and heads around so that the tree has a different root, but the same order. It looks like this:
Last time we created two functions,
rotateRight, to do this rotation for us.
So, OK, we can rotate our trees.
That’s a nice first step!
But we don’t want to have to rotate left or right manually, we just want
insert to take care of it for us.
So how do we get from here to there?
Fortunately, tons of smart people have thought about this problem, and we have tons of options! Red-black trees, scapegoat trees, splay trees, treaps… but we’re going to use AVL trees (which take their name from their inventors: Adelson-Velsky and Landis.)
AVL trees balance by making sure the height of each subtree is about the same. In our example from before, the balanced tree had a height of 2, while the unbalanced tree had a height of 5.
Take note that we’re using the number of levels of nodes to define the height, not how many nodes there are (this is called the weight.) This means that:
- An empty set has a height of 0.
- A singleton set has a height of 1.
- Anything else is the height of it’s tallest subtree, plus 1.
insert, we’ll look at the new heights of the trees after insertion.
If the difference between them is greater than 1 (so if the left is taller than the right, or the reverse), we’ll rotate!
We want to keep track of these heights without recomputing them every time, so we’ll change our implementation of
Set to look like this:
type Set comparable = Tree Int comparable (Set comparable) (Set comparable) | Empty
Int as the new first field of
Tree will hold the tree’s height.
We’ll need to refer to this field often, let’s write a quick function to get the height of any tree:
height : Set comparable -> Int height set = case set of Empty -> 0 Tree height _ _ _ -> height
Now we change our constructor functions.
empty can stay the same, but singleton will change to add a height:
singleton : comparable -> Set comparable singleton item = Tree 1 item empty empty
…and we’ll need an entirely new constructor,
tree : comparable -> Set comparable -> Set comparable -> Set comparable tree head left right = Tree (max (height left) (height right) + 1) head left right
We’ll use this in
insert in place of direct calls to
insert : comparable -> Set comparable -> Set comparable insert item set = case set of Empty -> singleton item Tree _ head left right -> if item < head then tree head (insert item left) right else if item > head then tree head left (insert item right) else set
And finally, we need to change
rotateLeft : Set comparable -> Set comparable rotateLeft set = case set of Tree _ head lessThans (Tree _ subHead betweens greaterThans) -> tree subHead (tree head lessThans betweens) greaterThans _ -> set rotateRight : Set comparable -> Set comparable rotateRight set = case set of Tree _ head (Tree _ subHead lessThans betweens) greaterThans -> tree subHead lessThans (tree head betweens greaterThans) _ -> set
That’s all we need to do to set the stage! Now that we track of balance, we can rotate during insertion to keep the tree balanced. This takes less work that you might think!
Our Trigger: Height Difference
Remember that we measure the balance of a tree as the height of the right subtree minus the height of the left subtree. A tree is a valid AVL tree if the height is -1, 0, or 1. So, to make our trees valid AVL trees, we’ll rotate if the balance is outside that range.
We can get the height difference for a set with a new function:
diff : Set comparable -> Int diff set = case set of Empty -> 0 Tree _ _ left right -> height right - height left
This means that if the tree leans to the right the value will be positive, and to the left the value will be negative.
The Simplest Case: 2 or More
To balance our tree, we’ll first need to handle if the height difference is greater than one.
Say we insert the letters A through C, in order.
This is not a valid AVL tree, since the balance is 2 (a height of two on the right minus 0 on the left.) Since the tree is leaning right we’ll rotate left, so that B is at the root.
Now the height of the left and the right subtrees is the same, so the difference is 0. This will make our tree valid again.
We’ll implement our initial balancing function like this:
balance : Set comparable -> Set comparable balance set = case set of Empty -> set Tree _ head left right -> if diff set < -1 then rotateRight set else if diff set > 1 then rotateLeft set else set
An empty set is already balanced, so we return it immediately. Same for a set whose difference falls between the valid range of -1 and 1. Otherwise, we rotate!
We’ll also change
insert to call
balance after inserting:
insert : comparable -> Set comparable -> Set comparable insert item set = case set of Empty -> singleton item Tree _ head left right -> if item < head then tree head (insert item left) right |> balance else if item > head then tree head left (insert item right) |> balance else set
Notice that we’re only balancing the tree after insertion. This will make sure that each level of our tree is balanced from the bottom up, and only if any levels need it.
Left- and Right-Leaning Sets
This seems great so far, but there’s one more tree shape we need to consider. What if we inserted A, C, then B?
The difference would be 2, but if we rotated right it would swap to -2!
In this case, we need to rotate the tree at C right, then the tree at A left. The right rotation at C will create our unbalanced A-B-C set from above, then the left rotation at A will move B to the root, balancing the tree. If the tree were leaning the other way, we’d just reverse the operation.
The code looks like this:
balance : Set comparable -> Set comparable balance set = case set of Empty -> set Tree _ head left right -> if diff set == -2 && diff left == 1 then -- left leaning tree with right-leaning left subtree. tree head (rotateLeft left) right |> rotateRight else if diff set < -1 then rotateRight set else if diff set == 2 && diff right == -1 then -- right leaning tree with left-leaning right subtree. tree head left (rotateRight right) |> rotateLeft else if diff set > 1 then rotateLeft set else set
Whew, that function has quite a few conditionals!
Generally, we’re examining the difference in each tree, and the difference in the relevant subtrees.
If we find that the difference is 2 (or -2) and the difference in the subtree is -1 (or 1) we’ve hit our problem.
We send the subtree to
rotateRight before rotating the whole tree in question.
It looks sort of messy, but these 9 lines of code will maintain a valid AVL tree (and fast insert/lookup operations) in every situation.
We’re done with balancing and
Make Your Own Trees, Again
Now you can try this out for yourself. This is the same visualization from last time, but with balancing turned on. Insert values to see how the balancing code reacts and keeps everything tuned up.
Things to play around with:
- Insert three values in order.
- Insert three values so that you would get a leaning tree (like A C B.) Check out how the tree balances itself.
- Create some trees with the same values in different insertion order. Notice how the composition of the tree will change with different insertion orders.