Creating a generic set type in Go

tl;dr: map[interface{}]struct{} gives us the properties we want with a low memory overhead. There’s a great implementation of this idea (including thread safety) at dekarep/golang-set.

First things first, if you’re not familiar, a set is a data structure that maintains a unique collection of elements. They work similarly to other collection types like vectors, but with a few special properties: In order to maintain uniqueness, inserting an element that’s already in the set does nothing, and in order to make this check reasonable to do often, the Contains(el) algorithm is as fast as possible. This is typically achieved by arranging the elements in a way that they can be looked up, rather than searching the list from start to finish, which yields a complexity of O(log n) or even O(1) rather than O(n).

So how do we create this type in Go? It doesn’t provide a native set type like python or include one in it’s STL like C++. It also has fairly clumsy generic programming through interface{} and unsafe.Pointer, so creating this by yourself could be overly complex or error-prone. The answer is that the desired functionality is already hidden in plain sight in Go’s map type. Go’s maps are hash tables, which gives them a constant time average key lookup complexity. They also only allow one copy of each key, enforcing uniqueness. This makes them a great foundation for building our set.

There is one more thing to consider here, though, which is memory usage. We only care about the keys, but we also need a value for each one. What is the best value to use when we don’t care about it? If you’re coming from a C background, you might chose 1 because that’s a small truthy value often used as a sentinel. If you’re more familiar with Go, you might choose true because you know it’s actually 1 bit rather than 8 bits for a small int. If you’re even more familiar with go, you’d choose an anonymous struct because it actually allocates 0 bits while still representing a value in the Go runtime.

import (
	"fmt"
	"unsafe"
)

func main() {
	fmt.Println("Sizeof int:", unsafe.Sizeof(1))
	fmt.Println("Sizeof bool:", unsafe.Sizeof(true))
	fmt.Println("Sizeof struct:", unsafe.Sizeof(struct{}{}))
}

// Sizeof int: 8
// Sizeof bool: 1
// Sizeof struct: 0

We now have everything we need to create our generic set.

First, we create a map which can hold anything and has an empty presence sentinel.

type Set map[interface{}]struct{}
var sentinel = struct{}{}

Then we can implement our set functions. One thing to note is the nice map access function which returns 2 values: the thing you searched for, and a bool indicating whether there was anything there. That makes these functions pretty simple.

func (s *Set) Contains(el interface{}) bool {
	_, found := (*s)[el]
	return found
}

func (s *Set) Add(el interface{}) bool {
	if s.Contains(el) {
		return false
	}
	(*s)[el] = sentinel
	return true
}

func (s *Set) Remove(el interface{}) {
	delete(*s, el)
}

So these are the very basics of a set, but we also need a way to fetch elements back out. The simplest way would be to collect the keys into a slice.

func (s *Set) ToSlice() []interface{} {
	keys := make([]interface{}, 0, len(*s))
	for elem, _ := range *s {
		keys = append(keys, elem)
	}
	return keys
}

This is where we see the downside to this type of generic programming. When we get the elements back out, they lose their types, and we the programmer have to keep track of that ourselves and assert what they are. To actually use these elements, we have to perform a type assertion.

set := make(Set)
set.Add("Hydrogen")
set.Add("Helium")
set.Add("Lithium")

for i, elem := range set.ToSlice() {
	fmt.Printf("Element %d: %s\n", i + 1, elem.(string))
}

Additionally, this assertion will cause a panic if used on any element not of that type.

set := make(Set)
set.Add("Hydrogen")
set.Add("Helium")
set.Add("Lithium")
set.Add(4)

for i, elem := range set.ToSlice() {
	fmt.Printf("Element %d: %s\n", i + 1, elem.(string))
}
// panic: interface conversion: interface {} is int, not string

So this leads us to a (potential) benefit of this approach, which is that you actually can mix types within this container, you just need to make sure you handle them appropriately when you pull them out.

set := make(Set)
set.Add("Hydrogen")
set.Add("Helium")
set.Add("Lithium")
set.Add(4)

for i, elem := range set.ToSlice() {
	switch elem.(type) {
	case string:
		fmt.Printf("Element %d: %s\n", i+1, elem)
	case int:
		fmt.Printf("Element %d: atomic number %d\n", i+1, elem)
	default:
		fmt.Println("I don't know what this is")
	}
}
// Element 1: Lithium
// Element 2: atomic number 4
// Element 3: Hydrogen
// Element 4: Helium

Notice it handles the different types, but there’s something else wrong now. This brings us to the final point I wanted to talk about, which is that the order of keys is not guaranteed. If you ever use a map and the order of keys is important, you’ll need to make sure to explicitly sort them on your own. Go’s sort.Sort may be useful, but isn’t guaranteed to be the answer, especially for cases like this.

As mentioned in the tl;dr at the beginning of this post, the library deckarep/golang-set has implemented this idea to it’s fullest extent. They have support for a bunch of mathematical set operations like Difference and Intersection as well as a thread safe version of all the code. They also don’t have any external libraries, so it’s a pretty light option. I would recommend using this library in basically all cases. This article exists to explain the concept.

Join the discussion
Support the author
  • I'm actually doing fine. There are plenty of others who need help more than me. I'll post a list of causes I endorse here soon.