Be careful when reading a csv with a comma followed by a space in Python

Money and Business

In Python, you can easily read and write csv files using the standard csv module.

For example, suppose you have the following csv, sample.csv.

11,12,13,14
21,22,23,24
31,32,33,34

This can be read as follows.

import csv

with open('data/src/sample.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# ['11', '12', '13', '14']
# ['21', '22', '23', '24']
# ['31', '32', '33', '34']

What you need to be careful of here is when there is a space after the comma. Normally, there should be no unnecessary spaces after the comma, but sometimes I see files with spaces in them.

In such cases, by default, whitespace is not ignored and the file is read as is.

11, 12, 13, 14
21, 22, 23, 24
31, 32, 33, 34

In other words, if you read the above file with a comma followed by a space, the output will be as follows

with open('data/src/sample_space.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# ['11', ' 12', ' 13', ' 14']
# ['21', ' 22', ' 23', ' 24']
# ['31', ' 32', ' 33', ' 34']

If you specify the following in csv.reader, the spaces after the comma will be skipped.
skipinitialspace=True

with open('data/src/sample_space.csv', 'r') as f:
    reader = csv.reader(f, skipinitialspace=True)
    for row in reader:
        print(row)
# ['11', '12', '13', '14']
# ['21', '22', '23', '24']
# ['31', '32', '33', '34']

In a simple example like the one above, you can use strip() to remove the whitespace. The problem is when it is surrounded by double quotation marks like the following.

"one,one", "two,two", "three,three"

The part surrounded by double quotation marks should be regarded as a single element, but if skipinitialspace=False (the default), it will look like the following.

with open('data/src/sample_double_quotation.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# ['one,one', ' "two', 'two"', ' "three', 'three"']

This can be done by setting skipinitialspace=True.

with open('data/src/sample_double_quotation.csv', 'r') as f:
    reader = csv.reader(f, skipinitialspace=True)
    for row in reader:
        print(row)
# ['one,one', 'two,two', 'three,three']

The same is true when reading a csv file with read_csv() in pandas. If the csv file has a space after the comma, you can do the following.
read_csv(skipinitialspace=True)