# Set data structures & Implementation

Sets are an important mathematical concept. A set can be defined as a collection of distinct elements. In mathematical literature, curly braces are used to denote a set. So, a set S containing the elements 1, 2 and 3 is denoted by S = {1, 2, 3}.

A set supports the following basic operations,

**Union**- The union of two sets is a set that contains elements from both sets. For instance, given,- S
_{1}= {1, 2, 3} and - S
_{2}= {3, 4, 5} - S
_{1 ∩ 2}= {1, 2, 3, 4, 5}

- S
**Intersection**- The intersection of two sets is a set that contains elements common to both sets. For instance, given,- S
_{1}= {1, 2, 3} and - S
_{2}= {3, 4, 5} - S
_{1 ∪ 2}= {3}

- S
**Difference**- The difference of two sets is a set that contains elements from the first set with the elements common to both sets removed. Given,- S
_{1}= {1, 2, 3} and - S
_{2}= {3, 4, 5} - S
_{1 − 2}= {1, 2}

- S

We also have the following additional operations,

**IsMember**- Tells you if an element is a member of the set. It is denoted by the ∈ symbol. So,- 1 ∈ {1, 2, 3} but
- 4 ∉ {1, 2, 3}

## Implementing sets

Let’s look at the different ways we can implement the Set data structure and analyse each implementation.

### Bit-vectors

Bit vectors are an array of bits (x_{1}, x_{2}, … x_{n}) such that x_{i} = 1 if i is a member of the set. For example,

S_{1} = {1, 2, 3} will be represented as `11100`

while

S_{2} = {3, 4, 5} will be represented as `00111`

.

#### Advantages

`IsMember()`

,`Add()`

and`Delete()`

are`O(1)`

operations.`Union()`

,`Intersection()`

and`Difference()`

are at most`O(N)`

where`N`

is the size of the universe.- For non-sparse sets, bit-vectors can represent the set quite tightly.
- Bit operations are typically faster.

#### Disadvantages

- Takes too much memory for large universes.
- Needs a mapping function for non-integer members.

### Linked-list

A linked list implementation keeps a sorted list of elements for every set. This means,

S_{1} = {1, 2, 3} will be stored as `1 -> 2 -> 3`

while

S_{2} = {3, 4, 5} will be represented as `3 -> 4 -> 5`

.

#### Properties

`IsMember()`

is at most`O(n)`

operation where`n`

is the size of the set.`Union()`

,`Intersection()`

,`Difference()`

,`Add()`

and`Delete()`

are take at most`O(mn)`

where`m`

is the size of the first set and`n`

is the size of the second set.

#### Advantages

- For very large universes, this outperforms a Bit-vector implementation.

#### Disadvantages

- Typically the slowest implementation compared to the rest.

### Hashtables

A set can also be implemented as a hashmap. The time complexity of the various functions is dependent on the particular implementation of a hashtable.

#### Properties

`IsMember()`

is typically a`O(1)`

operation.`Add()`

and`Difference()`

are typically`O(n)`

where`n`

is the size of the second set.`Union()`

and`Intersection()`

are typically`O(mn)`

where`m`

is the size of the first set and`n`

is the size of the second one.

### Ordered trees

Sets can also be implemented as ordered trees. Priority Queues are an example of an ordered tree. A Binary search tree implementation of a set would give you the following properties,

#### Properties

`IsMember()`

is an`O(log n)`

operation.`Add()`

,`Difference()`

,`Union()`

and`Intersection()`

are`O(log n)`

.

## Applications

- Information retrieval - Inverted index.
- AND & OR queries.