Counting the number of occurrences of each element in a list with Python’s Counter


In Python, the total number of elements in a list or tuple can be obtained with the built-in function len(), and the number of occurrences of a given element can be obtained with the count() method.

In addition, the Counter class of the Python standard library collections can be used to get the elements in order of the number of occurrences.

This article covers the following:

  • Count the total number of elements: len()
  • Count the number of each element (the number of occurrences of each element): count()
  • How to use collections.Counter
  • Elements are retrieved in order of frequency of occurrence: most_common()
  • Count the number (type) of non-overlapping elements (unique elements).
  • Count the number of elements that satisfy the condition.

The following concrete examples are also explained with sample code:

  • Count the number of occurrences of a word in a string.
  • Count the number of occurrences of a character in a string.

The sample is a list, but the same processing can be done with tuples.

Count the total number of elements: len()

To count the total number of elements in a list or tuple, use the built-in function len().

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']

print(len(l))
# 7

Counting the number of each element (the number of occurrences of each element): count() method

To count the number of occurrences of a particular element, use the count() method of lists, tuples, etc.

If a value that does not exist as an element is passed as an argument, 0 is returned.

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']

print(l.count('a'))
# 4

print(l.count('b'))
# 1

print(l.count('c'))
# 2

print(l.count('d'))
# 0
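
As noted earlier, the same processing works for tuples. A minimal sketch, where the variable names t, total, count_a and count_d are chosen only for illustration:

```python
# The same built-in function and method work on a tuple.
t = ('a', 'a', 'a', 'a', 'b', 'c', 'c')

total = len(t)          # total number of elements
count_a = t.count('a')  # occurrences of 'a'
count_d = t.count('d')  # a value that is not present gives 0, not an error

print(total, count_a, count_d)
# 7 4 0
```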

If you want to get the number of occurrences of every element at once, collections.Counter, described next, is useful.

How to use collections.Counter

The Python standard library collections has a Counter class.

Counter is a subclass of the dictionary type dict. It stores elements as keys and their numbers of occurrences as values.

import collections

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']

c = collections.Counter(l)
print(c)
# Counter({'a': 4, 'c': 2, 'b': 1})

print(type(c))
# <class 'collections.Counter'>

print(issubclass(type(c), dict))
# True
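
As a rough illustration of what Counter computes, the sketch below builds the same mapping by calling count() once per distinct element. The name counts_by_hand is hypothetical. Each count() call scans the whole list, so Counter is likely the better choice when all counts are needed.

```python
import collections

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']

# One count() call (one full scan of the list) per distinct element.
counts_by_hand = {x: l.count(x) for x in set(l)}

# Counter builds the same element -> occurrences mapping.
c = collections.Counter(l)

print(dict(c) == counts_by_hand)
# True
```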

If an element is specified as a key, its number of occurrences is returned. If a value that does not exist as an element is specified, 0 is returned.

print(c['a'])
# 4

print(c['b'])
# 1

print(c['c'])
# 2

print(c['d'])
# 0
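
This differs from a plain dict, which raises KeyError for a missing key. A small sketch of the contrast, using illustrative names (d, missing_from_counter, key_was_added, plain_dict_raised) that are not part of the examples above:

```python
import collections

c = collections.Counter(['a', 'a', 'b'])
d = {'a': 2, 'b': 1}  # plain dict with the same contents

missing_from_counter = c['z']  # 0, no exception
key_was_added = 'z' in c       # False: the lookup does not insert 'z'

try:
    d['z']
    plain_dict_raised = False
except KeyError:
    plain_dict_raised = True   # a plain dict raises KeyError instead

print(missing_from_counter, key_was_added, plain_dict_raised)
# 0 False True
```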

You can also use dictionary type methods such as keys(), values(), items(), etc.

print(c.keys())
# dict_keys(['a', 'b', 'c'])

print(c.values())
# dict_values([4, 1, 2])

print(c.items())
# dict_items([('a', 4), ('b', 1), ('c', 2)])

These methods return objects of type dict_keys, etc. They can be used as is if you want to run a for statement. If you want to convert it to a list, use list().
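
A short sketch of both uses, assuming Python 3.7 or later, where keys keep the order in which they were first encountered:

```python
import collections

c = collections.Counter(['a', 'a', 'a', 'a', 'b', 'c', 'c'])

# The view returned by items() can be used directly in a for statement.
for element, count in c.items():
    print(element, count)
# a 4
# b 1
# c 2

# Convert a view to a list when list operations such as indexing are needed.
keys_list = list(c.keys())
print(keys_list)
# ['a', 'b', 'c']
```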

Obtaining elements in order of frequency of appearance: most_common() method

Counter has the most_common() method, which returns a list of tuples of the form (element, number of occurrences), sorted in descending order of the number of occurrences.

print(c.most_common())
# [('a', 4), ('c', 2), ('b', 1)]

Because the result is a list, individual entries can be obtained by index: [0] gives the element with the highest number of occurrences and [-1] the one with the lowest. To get only the element or only the number of occurrences, index further into the tuple.

print(c.most_common()[0])
# ('a', 4)

print(c.most_common()[-1])
# ('b', 1)

print(c.most_common()[0][0])
# a

print(c.most_common()[0][1])
# 4

If you want them in order of increasing number of occurrences (least frequent first), use a slice with the step set to -1.

print(c.most_common()[::-1])
# [('b', 1), ('c', 2), ('a', 4)]
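
An alternative sketch uses the built-in sorted() with a key on the count. This is not the method shown above, only another way to get the least frequent elements first. The name ascending is illustrative.

```python
import collections

c = collections.Counter(['a', 'a', 'a', 'a', 'b', 'c', 'c'])

# Sort the (element, count) pairs by count, smallest first.
ascending = sorted(c.items(), key=lambda pair: pair[1])

print(ascending)
# [('b', 1), ('c', 2), ('a', 4)]
```

When several elements share the same count, the two approaches can order those tied elements differently, because reversing a list also reverses the order of ties.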

If the argument n is specified for the most_common() method, only the n elements with the highest number of occurrences are returned. If it is omitted, all elements are returned.

print(c.most_common(2))
# [('a', 4), ('c', 2)]
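
When counts are tied, most_common() keeps the tied elements in the order they were first encountered (Python 3.7 or later). A brief sketch with a different sample string, chosen here only because it contains ties:

```python
import collections

c = collections.Counter('abracadabra')

# 'b' and 'r' both occur twice; 'b' appears first in the string.
top3 = c.most_common(3)

print(top3)
# [('a', 5), ('b', 2), ('r', 2)]
```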

If you want the elements and the numbers of occurrences as two separate sequences ordered by the number of occurrences, rather than as tuples of (element, number of occurrences), you can decompose the result as follows:

values, counts = zip(*c.most_common())

print(values)
# ('a', 'c', 'b')

print(counts)
# (4, 2, 1)

The built-in function zip() is used to transpose a two-dimensional list (in this case, a list of tuples), and the result is unpacked into two variables.
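
One caveat: with an empty Counter there is nothing to unpack, so the two-variable assignment raises ValueError. The sketch below wraps the idea in a hypothetical helper, split_counts(), that handles this case. It is an illustration, not part of collections.

```python
import collections

def split_counts(counter):
    """Hypothetical helper: return (elements, counts), most common first."""
    pairs = counter.most_common()
    if not pairs:
        # zip(*[]) yields nothing, so return two empty tuples explicitly.
        return (), ()
    elements, counts = zip(*pairs)
    return elements, counts

print(split_counts(collections.Counter('aab')))
# (('a', 'b'), (2, 1))

print(split_counts(collections.Counter()))
# ((), ())
```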

Count the number (type) of non-overlapping elements (unique elements).

To count how many non-overlapping elements (unique elements) there are in a list or tuple (how many types there are), use the Counter described above or set().

The number of elements in the Counter object is equal to the number of non-overlapping elements in the original list, which can be obtained with len().

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']
c = collections.Counter(l)

print(len(c))
# 3

You can also use the set() constructor, which is easier if you don't need a Counter object.

The set type is a data type that does not have duplicate elements. Passing a list to set() ignores duplicate values and returns a set object with only the unique values as elements. The number of elements in the set is obtained with len(). Note that a set is unordered, so the order shown by print() may differ from the output below.

print(set(l))
# {'a', 'c', 'b'}

print(len(set(l)))
# 3
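
If the unique elements themselves are needed in a reproducible order, two possible approaches are sketched below, using a different sample list so that the orders differ. The names unique_sorted and unique_first_seen are illustrative.

```python
import collections

l = ['c', 'a', 'c', 'b']

# A set has no defined order, so sort it for a stable result.
unique_sorted = sorted(set(l))
print(unique_sorted)
# ['a', 'b', 'c']

# Counter keys keep the order of first appearance (Python 3.7 or later).
unique_first_seen = list(collections.Counter(l))
print(unique_first_seen)
# ['c', 'a', 'b']
```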

Count the number of elements that satisfy the condition.

To count the number of elements in a list or tuple that satisfy a certain condition, use a list comprehension or a generator expression.

As an example, count the number of elements with negative values in the following list of numbers:

l = list(range(-5, 6))
print(l)
# [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]

Applying a conditional expression to each element in a list comprehension yields a list whose elements are of the Boolean type bool (True or False). bool is a subclass of the integer type int, where True is treated as 1 and False as 0. Therefore, the number of True values (the number of elements that satisfy the condition) can be counted by calculating the sum with sum().

print([i < 0 for i in l])
# [True, True, True, True, True, False, False, False, False, False, False]

print(sum([i < 0 for i in l]))
# 5

If we replace [] in the list comprehension with (), we get a generator expression. A list comprehension builds a list of all the processed elements, while a generator expression processes the elements one at a time and is therefore more memory efficient.

When a generator expression is the only argument of a function call, its own () can be omitted, so it can be written as in the second example below.

print(sum((i < 0 for i in l)))
# 5

print(sum(i < 0 for i in l))
# 5
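
The memory difference can be observed roughly with sys.getsizeof(). This is a sketch for CPython; exact sizes vary by version and platform, so only the comparison is shown. The names numbers, as_list and as_gen are illustrative.

```python
import sys

numbers = range(-50000, 50001)

as_list = [i < 0 for i in numbers]  # materializes 100001 bool references
as_gen = (i < 0 for i in numbers)   # produces values on demand

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(gen_size < list_size)
# True

# Both give the same count of negative numbers.
list_total = sum(as_list)
gen_total = sum(as_gen)
print(list_total, gen_total)
# 50000 50000
```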

If you want to count the number of False values (the number of elements that do not satisfy the condition), use not. Note that comparison operators such as < have higher precedence than not (they are evaluated first), so the parentheses () in (i < 0) in the following example are not strictly necessary; they only make the grouping explicit.

print([not (i < 0) for i in l])
# [False, False, False, False, False, True, True, True, True, True, True]

print(sum(not (i < 0) for i in l))
# 6
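
A quick check of the precedence rule: the expression gives the same result with or without the parentheses, while grouping not with i first changes the meaning. The variable names are illustrative.

```python
l = list(range(-5, 6))

with_parens = [not (i < 0) for i in l]
without_parens = [not i < 0 for i in l]  # parsed as not (i < 0)
print(with_parens == without_parens)
# True

# Grouping the other way compares a bool with 0, which is never less than 0.
other_grouping = [(not i) < 0 for i in l]
print(sum(other_grouping))
# 0
```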

Of course, the conditions themselves can be changed.

print(sum(i >= 0 for i in l))
# 6

Some other examples are shown below.

Example of getting the number of odd elements for a list of numbers.

print([i % 2 == 1 for i in l])
# [True, False, True, False, True, False, True, False, True, False, True]

print(sum(i % 2 == 1 for i in l))
# 6
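
The condition i % 2 == 1 also works for negative numbers because, in Python, the result of % takes the sign of the divisor, so -5 % 2 is 1 rather than -1. A short check, with illustrative names odd_count and even_count:

```python
l = list(range(-5, 6))

print(-5 % 2)
# 1

odd_count = sum(i % 2 == 1 for i in l)   # -5, -3, -1, 1, 3, 5
even_count = sum(i % 2 == 0 for i in l)  # -4, -2, 0, 2, 4
print(odd_count, even_count)
# 6 5
```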

Example of a condition for a list of strings.

l = ['apple', 'orange', 'banana']

print([s.endswith('e') for s in l])
# [True, True, False]

print(sum(s.endswith('e') for s in l))
# 2

Counter can be used to count based on the number of occurrences. items() yields tuples of (element, number of occurrences), and the condition is applied to the number of occurrences.

The following is an example of extracting elements with two or more occurrences and counting the total number of occurrences. In this example, there are four a's and two c's, for a total of six.

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']
c = collections.Counter(l)

print(c.items())
# dict_items([('a', 4), ('b', 1), ('c', 2)])

print([i for i in l if c[i] >= 2])
# ['a', 'a', 'a', 'a', 'c', 'c']

print([i[1] for i in c.items() if i[1] >= 2])
# [4, 2]

print(sum(i[1] for i in c.items() if i[1] >= 2))
# 6

The following is an example of extracting the types of elements with two or more occurrences and counting how many such types there are. In this example, there are two types, a and c.

print([i[0] for i in c.items() if i[1] >= 2])
# ['a', 'c']

print([i[1] >= 2 for i in c.items()])
# [True, False, True]

print(sum(i[1] >= 2 for i in c.items()))
# 2
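
The same counts can be written without index access, by unpacking each tuple or iterating over values() directly. This is only a stylistic variant of the code above, and the variable names are illustrative.

```python
import collections

l = ['a', 'a', 'a', 'a', 'b', 'c', 'c']
c = collections.Counter(l)

# Total occurrences of elements that appear at least twice.
total_occurrences = sum(count for count in c.values() if count >= 2)

# The elements (types) that appear at least twice, and how many there are.
repeated_elements = [element for element, count in c.items() if count >= 2]
number_of_types = sum(count >= 2 for count in c.values())

print(total_occurrences, repeated_elements, number_of_types)
# 6 ['a', 'c'] 2
```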

Count the number of occurrences of a word in a string.

As a concrete example, let's count the number of occurrences of a word in a string.

First, remove the unnecessary commas and periods by replacing them with an empty string using the replace() method. Then use the split() method to create a list of words separated by whitespace.

s = 'government of the people, by the people, for the people.'

s_remove = s.replace(',', '').replace('.', '')

print(s_remove)
# government of the people by the people for the people

word_list = s_remove.split()

print(word_list)
# ['government', 'of', 'the', 'people', 'by', 'the', 'people', 'for', 'the', 'people']

Once you have the list, you can get the number of times each word appears and the number of distinct words, and you can use most_common() of collections.Counter to get the word that appears most often.

print(word_list.count('people'))
# 3

print(len(set(word_list)))
# 6

c = collections.Counter(word_list)

print(c)
# Counter({'the': 3, 'people': 3, 'government': 1, 'of': 1, 'by': 1, 'for': 1})

print(c.most_common()[0][0])
# the
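
The steps above can be combined into a small function. The sketch below is one possible variant, not the method used above: it lowercases the text and strips all ASCII punctuation with str.translate() instead of chained replace() calls. The name count_words is hypothetical.

```python
import collections
import string

def count_words(text):
    """Hypothetical helper: lowercase, strip ASCII punctuation, split on whitespace."""
    table = str.maketrans('', '', string.punctuation)
    words = text.lower().translate(table).split()
    return collections.Counter(words)

c = count_words('Government of the people, by the people, for the people.')

print(c.most_common(2))
# [('the', 3), ('people', 3)]
```

Note that this also removes apostrophes and hyphens inside words, which may or may not be what you want.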

The above is a very simple process, so it is better to use libraries such as NLTK for more complex natural language processing.

Also, in the case of Japanese text, split() cannot be used to split the text because there is no clear word separation. For example, you can use the Janome library to achieve this.

Count the number of occurrences of a character in a string.

Since strings are also a sequence type, you can call the count() method on them or pass them to the collections.Counter() constructor.

s = 'supercalifragilisticexpialidocious'

print(s.count('p'))
# 2

c = collections.Counter(s)

print(c)
# Counter({'i': 7, 's': 3, 'c': 3, 'a': 3, 'l': 3, 'u': 2, 'p': 2, 'e': 2, 'r': 2, 'o': 2, 'f': 1, 'g': 1, 't': 1, 'x': 1, 'd': 1})

Example of retrieving the top 5 most frequently occurring characters.

print(c.most_common(5))
# [('i', 7), ('s', 3), ('c', 3), ('a', 3), ('l', 3)]

values, counts = zip(*c.most_common(5))

print(values)
# ('i', 's', 'c', 'a', 'l')
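
Counter counts every character, including spaces and punctuation. If only letters should be counted regardless of case, one possible approach is to filter with a generator expression, as in this sketch (the sample string and the name letters are illustrative):

```python
import collections

s = 'Hello, World!'

# Lowercase the string and keep only alphabetic characters.
letters = collections.Counter(ch for ch in s.lower() if ch.isalpha())

print(letters.most_common(2))
# [('l', 3), ('o', 2)]
```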