Everything You Should Know about Python Zip Function | by Grzegorz Sikora | Aug, 2022

Code examples included

Although I am a mathematician working in data analysis rather than a professional programmer, I use Python a lot. In this article, I'd like to share some advice on the built-in zip() function, which I use quite often. zip() is typically used to work with data from several iterables at once. It joins their elements together in the order in which they appear, which lets us iterate over multiple iterables in parallel. That is why it shows up so often in for loops. Whenever you need to iterate over several iterables (lists, dictionaries, iterators, and so on) side by side, zip() is usually the right tool. In this short story, I describe the properties of zip() with simple examples.

1. Definition of zip()

The Python zip() function iterates over several iterables in parallel (most often two, but any number is allowed) and produces tuples containing one item from each. It returns a zip object, which is an iterator of tuples. The i-th tuple contains the i-th element of every input iterable, in the order in which the iterables were passed to zip(). zip() is lazy, so no elements are processed until the result is iterated over, for example by a for loop or by wrapping it in list(). Let us check a simple example:

n = [1, 2, 3, 4]
l = ['a', 'b', 'c', 'd']
zip(n, l)
# result:
<zip at 0x7f6fb6b0bd20>

Indeed, zip() is lazy: we only see the zip object, not its contents. To see the zipped elements, let us wrap it in the built-in list() function:

n = [1, 2, 3, 4]
l = ['a', 'b', 'c', 'd']
list(zip(n, l))
# result:
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

Equivalently, using a for loop:

n = [1, 2, 3, 4]
l = ['a', 'b', 'c', 'd']
for element in zip(n, l):
    print(element)
# result:
(1, 'a')
(2, 'b')
(3, 'c')
(4, 'd')

or, unpacking each tuple into two variables directly in the loop header:

n = [1, 2, 3, 4]
l = ['a', 'b', 'c', 'd']
for element_1, element_2 in zip(n, l):
    print(element_1, element_2)
# result:
1 a
2 b
3 c
4 d
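zip() is not limited to two arguments, and the inputs do not have to be lists. Here is a small sketch with three iterables of different types (a list, a string, and a range). It also shows the common dict(zip(keys, values)) idiom for building a dictionary. The variable names are only illustrative:

```python
numbers = [1, 2, 3]
letters = "abc"             # a string is an iterable of characters
tens = range(10, 40, 10)    # any iterable works, not only lists

# Three iterables give 3-tuples
triples = list(zip(numbers, letters, tens))
print(triples)  # [(1, 'a', 10), (2, 'b', 20), (3, 'c', 30)]

# Unpacking works the same way with three names in the loop header
for number, letter, ten in zip(numbers, letters, tens):
    print(number, letter, ten)

# Pairing keys with values builds a dictionary
lookup = dict(zip(letters, numbers))
print(lookup)  # {'a': 1, 'b': 2, 'c': 3}
```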

Now that we know what zip() produces, let us look at how it behaves as an iterator:

n = [1, 2, 3, 4]
l = ['a', 'b', 'c', 'd']
zip_object = zip(n, l)
for element in zip_object:
    print(element)
list(zip_object)
# result:
(1, 'a')
(2, 'b')
(3, 'c')
(4, 'd')
[]

This is the same code as before with one extra line, list(zip_object), at the end. That line gives an empty list because zip_object is an iterator, and the for loop already exhausted it while printing all of its elements. Let us check the behavior with the built-in next() function:

n = [1, 2, 3, 4]
l = ['a', 'b', 'c', 'd']
zip_object = zip(n, l)
print(next(zip_object))
print(next(zip_object))
list(zip_object)
# result:
(1, 'a')
(2, 'b')
[(3, 'c'), (4, 'd')]

The two next() calls consumed and printed the first two tuples, so list(zip_object) returns only the last two. Now let us call next() more times than there are elements:

n = [1, 2, 3, 4]
l = ['a', 'b', 'c', 'd']
zip_object = zip(n, l)
print(next(zip_object))
print(next(zip_object))
print(next(zip_object))
print(next(zip_object))
next(zip_object)
# result:
(1, 'a')
(2, 'b')
(3, 'c')
(4, 'd')
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-2-b10ae380d83b> in <module>()
      6 print(next(zip_object))
      7 print(next(zip_object))
----> 8 next(zip_object)

StopIteration:

Once the iterator is exhausted, the next call to next() raises StopIteration. This is the standard exception iterators use to signal that no elements are left.
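Sometimes running off the end should not raise an exception. For that case, the built-in next() accepts a second argument, which it returns instead of raising StopIteration. If the pairs are needed more than once, materialize them with list() first (or call zip() again), because a zip object can only be consumed once. A short sketch:

```python
n = [1, 2]
l = ['a', 'b']

zip_object = zip(n, l)
print(next(zip_object, None))  # (1, 'a')
print(next(zip_object, None))  # (2, 'b')
print(next(zip_object, None))  # None: exhausted, but no StopIteration

# A list of tuples, unlike the zip object, can be traversed repeatedly
pairs = list(zip(n, l))
for _ in range(2):
    print(pairs)  # [(1, 'a'), (2, 'b')] both times
```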

2. Length cases

The iterables passed to zip() can vary both in number and in length. Two special cases concern the number of arguments:

  • Called with no arguments, zip() returns an empty iterator:
list(zip())
# result:
[]
  • With a single iterable argument, zip() returns an iterator of 1-tuples:
list(zip([1, 2, 3, 4]))
# result:
[(1,), (2,), (3,), (4,)]

Iterables of different lengths may be intentional, or they may come from a bug in the code that prepared them. There are three ways to deal with this:

  • By default, zip() stops when the shortest input iterable is exhausted. The remaining items of the longer iterables are silently dropped, so the result has the length of the shortest iterable:
n = [1, 2, 3, 4, 5, 6]
l = ['a', 'b', 'c', 'd']
list(zip(n, l))
# result:
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
  • zip() is often used where the iterables are assumed to have equal length. In such cases, I recommend passing strict=True (available since Python 3.10). Unlike the default behavior, it checks that the iterables have the same length and raises a ValueError if they don't. Without strict=True, a bug that produces iterables of different lengths goes unnoticed and may later surface as a hard-to-find bug in another part of the code. Let us check an example:
n = [1, 2, 3, 4, 5, 6]
l = ['a', 'b', 'c', 'd']
list(zip(n, l, strict=True))
# result:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-b10ae380d83b> in <module>
ValueError: zip() argument 2 is shorter than argument 1
  • The shorter iterables can be padded with a constant value so that all iterables have the same length. This is what zip_longest() from the itertools module does. Its fillvalue parameter is the value used to extend the shorter iterables. Let us check:
from itertools import zip_longest
n = [1, 2, 3, 4, 5, 6]
l = ['a', 'b', 'c', 'd']
list(zip_longest(n, l, fillvalue='_'))
# result:
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, '_'), (6, '_')]

3. Sorting a zip object

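sorted() consumes the zip object and returns a list of tuples. Tuples are compared element by element, so the result is ordered by the first input iterable, and any ties are broken by the second. Let us see an example: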

n = [3, 2, 1, 4]
l = ['a', 'b', 'c', 'd']
sorted(zip(n, l))
# result:
[(1, 'c'), (2, 'b'), (3, 'a'), (4, 'd')]
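To see the tie-breaking, and to sort by a different input iterable, pass a key function. Here operator.itemgetter(1) picks the second element of each pair. A small sketch with made-up data:

```python
from operator import itemgetter

n = [3, 1, 1, 2]
l = ['a', 'd', 'b', 'c']

# Default: whole tuples are compared, so the tie on 1 is broken by the letter
by_tuple = sorted(zip(n, l))
print(by_tuple)   # [(1, 'b'), (1, 'd'), (2, 'c'), (3, 'a')]

# key=itemgetter(1): sort by the second iterable instead
by_letter = sorted(zip(n, l), key=itemgetter(1))
print(by_letter)  # [(3, 'a'), (1, 'b'), (2, 'c'), (1, 'd')]
```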

4. Unzipping trick

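Combined with the * unpacking operator, zip() can also unzip an iterable of tuples. zip(*zip_object) passes each tuple as a separate argument, so it is the same as calling zip((1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')). That call groups all first elements together and all second elements together. Let us see an example: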

n = [1, 2, 3, 4]
l = ['a', 'b', 'c', 'd']
zip_object = zip(n, l)
first, second = zip(*zip_object)
print('First iterable in zip:', list(first))
print('Second iterable in zip:', list(second))
# result:
First iterable in zip: [1, 2, 3, 4]
Second iterable in zip: ['a', 'b', 'c', 'd']
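Note that first and second are tuples, which is why they are wrapped in list() above. Combining zip(), sorted(), and the unzipping trick gives a compact way to sort two parallel lists by the values of the first one. A sketch with illustrative data:

```python
n = [3, 2, 1, 4]
l = ['a', 'b', 'c', 'd']

# zip the lists, sort the pairs by n, then unzip back into two sequences
n_sorted, l_sorted = zip(*sorted(zip(n, l)))
print(n_sorted)  # (1, 2, 3, 4)
print(l_sorted)  # ('c', 'b', 'a', 'd')

# Caveat: with no pairs, zip(*[]) yields nothing,
# so unpacking into two names raises ValueError
try:
    a, b = zip(*[])
except ValueError as error:
    print("Cannot unzip an empty list:", error)
```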

5. Bonus: chunking trick

zip() guarantees that the input iterables are evaluated from left to right. This makes it possible to split a data series into groups of equal length. Let us start with a list:

n = [1, 2, 3, 4, 5, 6, 7, 8]
list(zip(*[n]*2))
# result:
[(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8)]

First, [n]*2 creates a list that contains the list n twice, which is equivalent to [n, n]. Next, the * operator unpacks these two identical lists and passes them as arguments to zip(). Therefore, the result is the same as for list(zip(n, n)).

Now for the tricky part. Instead of the list n, we pass the iterator iter(n). [iter(n)]*2 holds two references to one and the same iterator, so each output tuple is filled by two consecutive calls to that iterator. Every call consumes the next element, and we finally get:

n = [1, 2, 3, 4, 5, 6, 7, 8]
list(zip(*[iter(n)]*2))
# result:
[(1, 2), (3, 4), (5, 6), (7, 8)]
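The key point is that [iter(n)]*2 does not create two iterators. It holds two references to a single one, and writing the same thing out explicitly makes this visible:

```python
n = [1, 2, 3, 4, 5, 6, 7, 8]

it = iter(n)
args = [it] * 2
print(args[0] is args[1])  # True: one iterator, referenced twice

# Equivalent to list(zip(*[iter(n)]*2)): each tuple takes
# two consecutive items from the same iterator
pairs = list(zip(it, it))
print(pairs)  # [(1, 2), (3, 4), (5, 6), (7, 8)]
```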

This method shows its power when the list is long and we want to split the data into chunks of any size. Let us see an example below:

n = list(range(100))
chunk_size = 5
list(zip(*[iter(n)]*chunk_size))
# result:
[(0, 1, 2, 3, 4),
(5, 6, 7, 8, 9),
...,
(95, 96, 97, 98, 99)]
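One caveat: when the length is not a multiple of the chunk size, zip() silently drops the incomplete last chunk. One alternative is to pad the last chunk with itertools.zip_longest(), sketched below with a hypothetical helper named chunks(). Another is itertools.batched() on Python 3.12+, which keeps a shorter final tuple instead. It is not used here so that the sketch also runs on older versions:

```python
from itertools import zip_longest


def chunks(data, size, fillvalue=None):
    """Hypothetical helper: split data into tuples of length size,
    padding the last one with fillvalue instead of dropping it."""
    it = iter(data)
    return list(zip_longest(*[it] * size, fillvalue=fillvalue))


n = list(range(7))
print(list(zip(*[iter(n)] * 3)))  # [(0, 1, 2), (3, 4, 5)]  (6 is lost)
print(chunks(n, 3))               # [(0, 1, 2), (3, 4, 5), (6, None, None)]
```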

That is all for my short story about the zip() function. I hope you will apply some of the presented tricks in your own code. Try playing with zip() using different numbers and types of iterables.

If you are interested in Data Science topics and you think my article is valuable, you can follow me on LinkedIn or Medium. I would be more than happy to discuss any Data Science, Stats, Maths, or ML topic with you. Thanks, Greg!
