Don’t Underestimate the Elegant Power of Python Sets 💣 | by Eirik Berge, PhD | Aug, 2022

In Python there are four basic container types:

  • Lists: Lists in Python are mutable sequences that can contain data of different types. An example is dates = ["19-12-1922", "08-02-2005"].
  • Tuples: Tuples in Python are similar to lists, except that they are not mutable. Their internal state does not change over time. An example is seasons = ("Spring", "Summer", "Autumn", "Winter").
  • Dictionaries: Dictionaries in Python are key-value storage where the keys point to a specific value. An example is age = {"Eirik": 27, "Grandma": 86}.
  • Sets: Sets in Python are collections that are unordered and mutable. Most importantly, sets contain only unique values. An example is emails = {"toy_email@gmail.com", "not_a_real_email@hotmail.com"}.

Everyone learns about lists, tuples, and dictionaries in an introductory course on Python. Yet, sets in Python are often neglected. Some junior data professionals assume that sets are more advanced or deep. This is not true!

Sets in Python are super practical and not that hard to learn. In this blog post, I will show you more or less everything you need to know to get started with sets in Python.

As an example before we begin, consider the problem of extracting the unique values from the following list:

languages = ["Python", "R", "R", "SQL", "Python", "C", "SQL"]

You can see that Python, R, and SQL are each listed twice. To extract the unique values manually, you could do the following:

languages = ["Python", "R", "R", "SQL", "Python", "C", "SQL"]

# Manually:
unique_languages = []
for language in languages:
    if language not in unique_languages:
        unique_languages.append(language)

This works fine, but it is way more lines of code than needed. With sets you can easily do this in a single readable line:

languages = ["Python", "R", "R", "SQL", "Python", "C", "SQL"]

# With Sets
unique_languages = list(set(languages))
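One caveat worth knowing: a set has no ordering, so list(set(languages)) may return the languages in any order, while the manual loop keeps the order of first appearance. If that order matters, a common alternative is dict.fromkeys(). Here is a minimal sketch (the name ordered_unique is only for illustration):

```python
languages = ["Python", "R", "R", "SQL", "Python", "C", "SQL"]

# Dictionaries keep insertion order (guaranteed since Python 3.7),
# so their keys behave like an ordered collection of unique values.
ordered_unique = list(dict.fromkeys(languages))
print(ordered_unique)  # ['Python', 'R', 'SQL', 'C']

# The set-based version holds the same values, just in arbitrary order.
print(set(ordered_unique) == set(languages))  # True
```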

Awesome, right? If you are more of a visual learner, I’ve made a companion video that covers more or less the same topics 🔥

Note: There are many container types beyond the four mentioned here. I have written a blog post on dataclasses that you can check out afterwards if you want to!

The first way to create a set in Python is to use curly brackets as follows:

# Basic Syntax for Creating Sets
favorite_foods = {"Pizza", "Pasta", "Taco", "Ice Cream"}

You have now created a set with four strings. Sets can contain different datatypes, but often in applications, the datatypes are the same. Another way you can create sets in Python is to use the set constructor:

# Set Constructor
aweful_foods = set(["Tuna", "Tomatoes", "Beef", "Ice Cream"])

Here you first create a list with four strings and then pass that list into the set constructor. You can imagine that the data for both examples above comes from a survey about favorite and least favorite foods.

In general, if you already have the data in a list (or another iterable), then using the set constructor is probably the easiest.
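To illustrate what "another iterable" can mean, here is a small sketch with a tuple, a string, and a range. The variable names are made up for this example:

```python
# A tuple works just like a list.
seasons = set(("Spring", "Summer", "Autumn", "Winter", "Summer"))
print(len(seasons))  # 4

# A string is an iterable of characters, so you get its unique characters.
letters = set("banana")
print(sorted(letters))  # ['a', 'b', 'n']

# A range works too.
small_numbers = set(range(5))
print(small_numbers == {0, 1, 2, 3, 4})  # True
```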

It is essential to understand that sets remove duplicates automatically:

i_love = {"You", "You", "You", "You"}
print(i_love)
Output:
{'You'}

Sweet😍

Note: You cannot create an empty set by assigning {} to a variable, because {} creates an empty dictionary. Instead, use set() to create an empty set.
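You can check the difference yourself with type(). A quick sketch:

```python
empty_dict = {}
empty_set = set()

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>
print(len(empty_set))    # 0
```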

There are a bunch of methods on sets that are useful. I want to tell you about the four most important ones:

You can add an element to a set by using the method add():

favorite_foods.add("Chocolate")
print(favorite_foods)
Output:
{'Ice Cream', 'Pasta', 'Pizza', 'Chocolate', 'Taco'}

Recall that lists have the method append() for adding an element to the end of the list. Sets don’t have an ordering, so there is no end to append to, and the name add() reflects this.
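Since sets only hold unique values, adding an element that is already present simply does nothing. A short sketch, using a made-up set called snacks:

```python
snacks = {"Pizza", "Taco"}

snacks.add("Chocolate")  # new element: the set grows
snacks.add("Pizza")      # already present: nothing changes, no error

print(len(snacks))            # 3
print("Chocolate" in snacks)  # True
```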

If you have two sets, then you can combine them by using the union() method:

all_foods = favorite_foods.union(aweful_foods)
print(all_foods)
Output:
{'Tomatoes', 'Ice Cream', 'Tuna', 'Pasta', 'Pizza', 'Chocolate', 'Beef', 'Taco'}

Here all the elements from both sets have been combined into a new set called all_foods. Notice that "Ice Cream" only appears once in all_foods, even though it is in both favorite_foods and aweful_foods. Python sets remove duplicates!
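Two details that are easy to miss: union() returns a new set and leaves the originals untouched, and it accepts more than one argument. The sketch below assumes a hypothetical third set called untried_foods:

```python
favorite_foods = {"Pizza", "Pasta", "Taco", "Ice Cream", "Chocolate"}
aweful_foods = {"Tuna", "Tomatoes", "Beef", "Ice Cream"}
untried_foods = {"Sushi", "Pasta"}  # hypothetical third set

# Combine three sets in one call.
every_food = favorite_foods.union(aweful_foods, untried_foods)
print(len(every_food))      # 9

# The original sets are unchanged.
print(len(favorite_foods))  # 5
```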

If you want to find the elements common to two sets, then you can use the intersection() method:

dividing_foods = favorite_foods.intersection(aweful_foods)
print(dividing_foods)
Output:
{'Ice Cream'}

As you can see, the intersection() method picks out the elements that are in both sets. This method is really useful if you have two groups of votes and want to find what both groups have voted for.
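The voting scenario could look roughly like this. The vote lists are invented for illustration, and note that intersection() accepts any iterable as its argument, so only the first list needs converting:

```python
# Hypothetical votes from two groups (duplicates are fine in the lists).
group_a_votes = ["Pizza", "Taco", "Pizza", "Sushi"]
group_b_votes = ["Sushi", "Pasta", "Pizza"]

voted_by_both = set(group_a_votes).intersection(group_b_votes)
print(sorted(voted_by_both))  # ['Pizza', 'Sushi']
```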

Given two sets, you often want to find the elements in the first set that are not in the second set. To do this, you can use the difference() method:

likable_foods = favorite_foods.difference(aweful_foods)
print(likable_foods)
Output:
{'Pizza', 'Pasta', 'Taco', 'Chocolate'}

In the code example above, difference() returns a new set containing every element of favorite_foods that is not in aweful_foods. The set favorite_foods itself is left unchanged.
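Unlike union() and intersection(), the order of the two sets matters for difference(). Here is a small sketch that redefines the two food sets so it runs on its own:

```python
favorite_foods = {"Pizza", "Pasta", "Taco", "Ice Cream", "Chocolate"}
aweful_foods = {"Tuna", "Tomatoes", "Beef", "Ice Cream"}

only_favorite = favorite_foods.difference(aweful_foods)
only_aweful = aweful_foods.difference(favorite_foods)

print(sorted(only_favorite))  # ['Chocolate', 'Pasta', 'Pizza', 'Taco']
print(sorted(only_aweful))    # ['Beef', 'Tomatoes', 'Tuna']

# The original set still contains "Ice Cream".
print("Ice Cream" in favorite_foods)  # True
```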

Let’s do another example so that you can see how useful sets are. Say that you have the following two lists of programming learning platforms:

platforms = ["Udemy", "Coursera", "YouTube", "DataCamp", "YouTube"]
paid_platforms = ["Udemy", "Coursera", "Skillshare", "DataCamp"]

Say now that you want to find platforms that are free. How would you do that? With sets, you can simply write:

print("Free:", set(platforms).difference(set(paid_platforms)))
Output:
Free: {'YouTube'}

I showed you a few cool methods in the previous section. But did you know that there are also shorthands for union(), intersection(), and difference()? The shorthand operator syntax is a lot easier to remember and read. Here you can see a code example with all three operators:

print("Union:", favorite_foods | aweful_foods)
Output:
Union: {'Tomatoes', 'Ice Cream', 'Tuna', 'Pasta', 'Pizza', 'Chocolate', 'Beef', 'Taco'}

print("Intersection:", favorite_foods & aweful_foods)
Output:
Intersection: {'Ice Cream'}

print("Difference:", favorite_foods - aweful_foods)
Output:
Difference: {'Pizza', 'Pasta', 'Taco', 'Chocolate'}

Personally, I prefer the shorthand operator syntax. It is also super intuitive! For instance, the symbol & often represents the logical AND operator in programming languages, and an element is in the intersection exactly when it is in the first set and in the second set. Similarly, | often represents logical OR, which matches the union, and - is subtraction, which matches removing the elements of one set from another.
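There is one practical difference between the two styles: the methods accept any iterable as an argument, while the operators require a set on both sides. A minimal sketch, where more_foods is a made-up list:

```python
favorite_foods = {"Pizza", "Pasta", "Taco"}
more_foods = ["Taco", "Sushi"]  # a list, not a set

# The method accepts any iterable.
combined = favorite_foods.union(more_foods)
print(sorted(combined))  # ['Pasta', 'Pizza', 'Sushi', 'Taco']

# The operator raises a TypeError when one side is not a set.
try:
    favorite_foods | more_foods
    operator_worked = True
except TypeError:
    operator_worked = False
print(operator_worked)  # False
```

If you prefer the operators, convert the list first: favorite_foods | set(more_foods).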

Let me ask you a simple question. What kind of objects can you put in a set? You might think that the answer is anything. However, this is not true. Consider what happens if you try to run the following code:

invalid_set = {[1, 2, 3], "Good try, though"}
Output:
TypeError: unhashable type: 'list'

As you can see, Python raises a TypeError if you try to put a list inside a set. The reason is that a list is not hashable. The most precise (but least helpful) way of explaining hashable is that you can call the hash() function on hashable objects. What you usually need to know in practice is that Python’s built-in immutable types, such as integers and strings, are hashable. The converse does not hold: not all hashable objects are immutable.

So you can be sure that immutable built-in values (e.g. integers, strings, and tuples whose elements are themselves hashable) can be put into sets. However, placing lists and dictionaries into sets is not possible. You also cannot put a set inside another set:
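A quick way to check whether something can go into a set is to call hash() on it. The helper is_hashable below is a hypothetical convenience function written for this post, not part of Python. One subtlety it reveals: a tuple is only hashable if everything inside it is hashable.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable(42))           # True
print(is_hashable("text"))       # True
print(is_hashable((1, 2)))       # True
print(is_hashable([1, 2]))       # False
print(is_hashable({"a": 1}))     # False
print(is_hashable((1, [2, 3])))  # False: the tuple contains a list
```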

nested_set = {1, set([2]), 3}
Output:
TypeError: unhashable type: 'set'
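If you really do need a set of sets, Python has a built-in immutable variant called frozenset, which is hashable. This goes slightly beyond the basics covered here, so take it as a small sketch rather than a full treatment:

```python
# frozenset is the built-in immutable (and hashable) counterpart of set.
nested_set = {1, frozenset([2]), 3}

print(len(nested_set))               # 3
print(frozenset([2]) in nested_set)  # True
```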

Hopefully, you feel more comfortable with sets in Python. If you are interested in data science, programming, or anything in between, then feel free to add me on LinkedIn and say hi ✋

