Published on

Implementing Union in C++

Authors

Introduction

Union data structure is also called disjoint-set data structure. If the operation is required to partition a set into disjoint subsets, then using union is preferred. A popular implementation is disjoint-set forest

Operations

  1. Add new set
  2. Merge two sets
  3. Find the representative member of a set

Initialize sets

You have to initialize elements to sets that only contains itself, in order for them to be merged.

void MakeSet(Node* a) {
    if (a->parent != nullptr) return;
    a->parent = a;
    a->rank = 0; // if union uses rank 
    a->size = 1; // if union uses size
}

Merge operation (Union)

When deciding if set A is merged into set B or the opposite way, we can check its rank(height of tree) or the size of the set. The set with smaller rank or smaller size is merged into the other set.

Example: merge using rank

void Union(Node* a, Node* b) {
    Node* setA = Find(a);
    Node* setB = Find(b);
    
    if (setA == setB) { // already in the same union
        return;
    }
    
    if (setA->rank > setB->rank) {
        setB->parent = setA;
    }
    else if (setA->rank < setB->rank) {
        setA->parent = setB;
    }
    else {
        setB->parent = setA;
        setA->rank++;
    }
}

Find operation

The representative member (or the root) of the set has parent of itself. For a node in the set, follow the chain of parent pointers until reach the representative member.

Node* Find(Node* node) {
    if (node != node->parent) {
        return Find(node->parent);
    }
    else {
        return node;
    }
}

There are a few efficient implementations, here is an example of Path Splitting:

Node* Find(Node* node) {
    while (node->parent != node) {
        Node* tmp = node->parent;
        node->parent = node->parent->parent;
        node = tmp;
    }
    return node;
}

And there is Path Halfing:

Node* Find(Node* node) {
    while (node->parent != node) {
        node->parent = node->parent->parent;
        node = node->parent;
    }
    return node;
}

Why path halfing works is that we know the parent of the root is also itself. Therefore, we can always jump two steps further when finding the root.

Further Reading

Kruskal's algorithm uses union to find minimum spanning tree.

(Leetcode) Regions Cut by slashes

Reference

Wikipedia