This website uses cookies to collect usage information in order to offer a better browsing experience. By browsing this site or by clicking on the "ACCEPT COOKIES" button you accept our Cookie Policy.

Python Collections Tutorial

TheAutomatic.net

Contributor:
TheAutomatic.net
Visit: TheAutomatic.net

By:

Blogger, TheAutomatic.net, and Senior Data Scientist

In this post, we’ll discuss the underrated Python collections package, which is part of the standard library. Collections allows you to utilize several data structures beyond base Python.

How to get a count of all the elements in a list

One very useful function in collections is the Counter method, which you can use to return a count of all the elements in a list.

nums = [3, 3, 4, 1, 10, 10, 10, 10, 5]
collections.Counter(nums)

The Counter object that gets returned is also modifiable. Let’s define a variable equal to the result above.

counts = collections.Counter(nums)
 
counts[20] += 1

Notice how we can add the number 20 to our Counter object without having to initialize it with a 0 value.

Counter can also be used if you have a list of tuples. This can be handy if you want to count pairs of numbers or other types of objects.

temp = [(2, 3), (2, 3), (4, 5), (2, 3), (1, 1), (4, 5)]
 
collections.Counter(temp)

How to create default values for Python dictionaries

In a previous post, we discussed Python dictionariesdefaultdict provides a modified dictionary structure that lets you assign a default value for any key. This means that if you try referencing a key that you haven’t defined, defaultdict will return some default value rather than a “KeyError”. The default value can also be a data structure. For example, suppose you want to create a dictionary where you expect each element to be a set. defaultdict can automatically create this for you.

sample_map = collections.defaultdict(set)
 
sample_map["a"].add(5)
sample_map["a"].add(10)
sample_map["b"].add(2)
 
sample_map

If you try to reference a non-existent key, Python will return an empty set since that’s our default value.

sample_map["x"] # returns empty set

Also, note how we don’t need to initialize sample_map[“a”] to an empty set since defaultdict takes care of it for us. This works for other data types as well. For example, the code below will create a default dictionary structure for lists.

list_container = collections.defaultdict(list)
 
list_container["x"].append(5)
list_container["x"].append(3)
 
list_container["y"].append(1)

Using defaultdict is one way of creating adjacency lists in Python to represent graphs. It can be useful in situations where you need to build out the adjacency list based off some prior logic (like running DFS, for example).

In addition to sets and lists, defaultdict also supports ints (integers) as a default value.

collections.defaultdict(int)

# create a dictionary where the default value is 5
def_five = collections.defaultdict(lambda: 5)
 
def_five["a"] = 10
 
# "b" is not a key, but running this line
# will return the default value of 5
def_five["b"]

deque

The deque data structure in collections is a double-ended queue. deque is pronounced like “deck”. A deque works similarly to a list, but is more optimized for appending or popping off elements from either the front of the queue or the end. For instance, if you have a very large list in Python and you want to append an element to the start of the list, this operation takes O(n) time since each element in the list needs to be shifted over by one. In other words, if your list has 100,000 elements and you append an element to the front of the list, this requires 100,000 shift operations to move each element over by one in the list. This could become computationally expensive, especially if you need to do this within a loop, for example.

# create a list with 100,000 elements
nums = list(range(100000))
 
# append -10 to the front of the list
# note that this takes O(n) time, or 100,000 operations since 
# each number if shifted over by one in the list
nums.insert(0, -10)

Doubled-ended queues, on the other hand, can append elements to the front or back in approximately O(1) time. Appending elements to the front of the queue can be done using the appendleft method.

queue = collections.deque()
 
queue.appendleft(5)
queue.appendleft(4)
queue.appendleft(3)

Similarly, to pop elements off from the left side, we can use the popleft method. Again, normally if you were to pop an element off from the left side of of a list, it would take O(n) operations, but with a doubled-ended queue, you can do this in O(1) time.

queue.popleft()

Double-ended queues also comes with operations to append multiple elements at once to either the front or back of the queue. For example, Python lists have an extend method that adds a list of other elements to the current list.

nums = [2, 3, 4]
 
nums.extend([5, 6, 7])
 
# nums = [2, 3, 4, 5, 6, 7]

Double-ended queues have an extendleft method and an extend method to be able to do this for the left and right sides of the list, respectively.

queue.extendleft([1, 2, 3, 4, 5])

Named tuples

Another data structure available is the named tuple. If you’ve used R, you may be familiar with the concept of named vectors where you can reference elements in a vector not only by index, but only by a name. Similarly, named tuples offered by collections let you reference tuple elements by name or index.

Info = collections.namedtuple("info", ["a", "b", "c"])
test = Info(a = 10, b = 20, c = 30)
 
# reference elements by index
test[0] + test[1]
 
# reference elements by name
test.a + test.b

Named tuples can also be useful when working with sqlite, as seen here.

Visit TheAutomatic.net to learn more about this topic: http://theautomatic.net/2021/04/14/python-collections-tutorial/

Disclosure: Interactive Brokers

Information posted on IBKR Traders’ Insight that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Traders’ Insight are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from TheAutomatic.net and is being posted with permission from TheAutomatic.net. The views expressed in this material are solely those of the author and/or TheAutomatic.net and IBKR is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.

In accordance with EU regulation: The statements in this document shall not be considered as an objective or independent explanation of the matters. Please note that this document (a) has not been prepared in accordance with legal requirements designed to promote the independence of investment research, and (b) is not subject to any prohibition on dealing ahead of the dissemination or publication of investment research.

Any trading symbols displayed are for illustrative purposes only and are not intended to portray recommendations.

trading top