Python: Split a comma-separated string, remove whitespace, and convert it to a list

To split a comma-separated string into a list in Python, split() alone is enough if there are no spaces around the commas. If there are spaces, combine it with strip() to remove them. A list comprehension lets you write this concisely.

In this section, we first explain the following.

  • split(): Split a string with a specified delimiter and return it as a list.
  • strip(): Remove extra characters from the beginning and end of a string.
  • List comprehension notation: Apply functions and methods to list elements.

We then show how to turn a string separated by commas and spaces, such as the following, into a list with the spaces removed.
'one, two, three'

In addition, we discuss the following.

  • How to get it as a list of numbers
  • How to use join() to join a list and make it a string again

split(): Split a string with a specified delimiter and return it as a list

The split() string method splits a string at a specified delimiter and returns the result as a list. The delimiter is specified with the sep argument.

If the argument sep is omitted, the string is split on whitespace and a list is returned. Runs of consecutive spaces, as well as tabs, are treated as a single separator, so you can also use split() without an argument to make a list from a tab-delimited string.

s = 'one two three'
l = s.split()
print(l)
# ['one', 'two', 'three']

s = 'one two        three'
l = s.split()
print(l)
# ['one', 'two', 'three']

s = 'one\ttwo\tthree'
l = s.split()
print(l)
# ['one', 'two', 'three']

If a delimiter is specified in the sep argument, the string is split at that delimiter and a list is returned.

s = 'one::two::three'
l = s.split('::')
print(l)
# ['one', 'two', 'three']
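One detail worth keeping in mind, shown here as a minimal sketch: split() with no argument and split(' ') with a single space as sep behave differently when spaces repeat. With an explicit sep, every occurrence of the delimiter counts, so consecutive delimiters produce empty strings. The variable names no_sep and with_sep are only for illustration.

```python
s = 'one  two'  # two spaces between the words

# No argument: a run of whitespace is treated as a single separator.
no_sep = s.split()
print(no_sep)
# ['one', 'two']

# Explicit sep: each space is a delimiter, so an empty string appears.
with_sep = s.split(' ')
print(with_sep)
# ['one', '', 'two']
```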

A comma-separated string with no extra whitespace poses no problem. However, if you run split() with a comma as the delimiter on a string separated by a comma plus whitespace, the resulting strings keep the whitespace at the beginning.

s = 'one,two,three'
l = s.split(',')
print(l)
# ['one', 'two', 'three']

s = 'one, two, three'
l = s.split(',')
print(l)
# ['one', ' two', ' three']

You can use a comma plus a space (', ') as the delimiter as follows, but this does not work if the number of spaces in the original string varies.

s = 'one, two, three'
l = s.split(', ')
print(l)
# ['one', 'two', 'three']

s = 'one, two,  three'
l = s.split(', ')
print(l)
# ['one', 'two', ' three']

The string method strip(), explained next, handles any number of spaces, including the two spaces in this example.
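Separately from strip(), a regular expression can absorb a variable number of spaces after each comma. The following is a minimal sketch using re.split() from the standard library, a technique this article does not otherwise cover. Note that this pattern does not remove whitespace before a comma or at either end of the string.

```python
import re

s = 'one, two,  three'

# Split on a comma followed by any amount of whitespace (including none).
l = re.split(r',\s*', s)
print(l)
# ['one', 'two', 'three']
```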

strip(): Remove extra characters from the beginning and end of a string.

strip() is a method to remove extra characters from the beginning and end of a string.

If the argument is omitted, a new string is returned with leading and trailing whitespace characters removed. The original string itself is not changed.

s = '  one  '
print(s.strip())
# one

print(s)
#   one  

If a string is specified as an argument, any of the characters contained in that string are removed from both ends.

s = '-+-one-+-'
print(s.strip('-+'))
# one

In this case, spaces are not removed. Therefore, if you want to remove whitespace as well, pass a string that includes a space, such as '-+ ', as the argument.

s = '-+- one -+-'
print(s.strip('-+'))
#  one 

s = '-+- one -+-'
print(s.strip('-+ '))
# one
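The argument to strip() is treated as a set of characters, not as a substring to match. The sketch below uses a made-up input to illustrate this: characters are removed from each end until one that is not in the set is reached.

```python
s = 'xyxone-yx'

# 'xy' means "any of the characters x and y", not the substring 'xy'.
# Stripping stops at '-' on the right because '-' is not in the set.
stripped = s.strip('xy')
print(stripped)
# one-
```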

strip() handles both ends, but the following methods are also available.

  • lstrip(): Process only the beginning.
  • rstrip(): Process only the end.
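A short sketch of the two one-sided methods follows. repr() is used only to make the remaining spaces visible in the output, and the names left and right are illustrative.

```python
s = '  one  '

# lstrip() removes whitespace from the beginning only.
left = s.lstrip()
print(repr(left))
# 'one  '

# rstrip() removes whitespace from the end only.
right = s.rstrip()
print(repr(right))
# '  one'
```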

List comprehension notation: apply functions and methods to list elements

If you want to apply a function or method to every element of a list and get a new list as the result, a list comprehension is more concise than a for loop.

Here, we apply strip() to the list obtained by splitting the string with split(). The extra whitespace in a comma-separated string containing whitespace can be removed to make a list.

s = 'one, two, three'
l = [x.strip() for x in s.split(',')]
print(l)
# ['one', 'two', 'three']
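For comparison, the same result can be written with an explicit for loop. This is only a sketch to show what the list comprehension does step by step.

```python
s = 'one, two, three'

# Equivalent to: [x.strip() for x in s.split(',')]
l = []
for x in s.split(','):
    l.append(x.strip())

print(l)
# ['one', 'two', 'three']
```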

When this is applied to an empty string, a list with a single empty string as an element can be obtained.

s = ''
l = [x.strip() for x in s.split(',')]
print(l)
print(len(l))
# ['']
# 1

If you want to get an empty list for an empty string, you can set up a conditional branch in the list comprehension notation.

s = ''
l = [x.strip() for x in s.split(',') if s != '']
print(l)
print(len(l))
# []
# 0

Also, if a comma-separated element is missing, as in 'one, , three', the first method returns it as an empty string element.

s = 'one, , three'
l = [x.strip() for x in s.split(',')]
print(l)
print(len(l))
# ['one', '', 'three']
# 3

If you want to ignore the missing parts, you can set up a conditional branch in the list comprehension notation.

s = 'one, ,three'
l = [x.strip() for x in s.split(',') if x.strip() != '']
print(l)
print(len(l))
# ['one', 'three']
# 2
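The element-level condition also covers the empty-string case, so both behaviors can be combined in one place. The sketch below wraps this in a small function. The name split_csv is hypothetical, not a built-in, and the condition relies on an empty string being treated as false.

```python
def split_csv(s):
    """Split s on commas, strip each element, and drop empty elements.

    split_csv is a hypothetical helper name used for illustration.
    """
    return [x.strip() for x in s.split(',') if x.strip()]


print(split_csv('one, , three'))
# ['one', 'three']

print(split_csv(''))
# []
```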

Get as a list of numbers

If you want to get a comma-separated string of numbers as a list of numbers rather than strings, apply int() or float() in the list comprehension to convert each string to a number.

s = '1, 2, 3, 4'
l = [x.strip() for x in s.split(',')]
print(l)
print(type(l[0]))
# ['1', '2', '3', '4']
# <class 'str'>

s = '1, 2, 3, 4'
l = [int(x.strip()) for x in s.split(',')]
print(l)
print(type(l[0]))
# [1, 2, 3, 4]
# <class 'int'>
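The float() case looks the same, as the sketch below shows with a made-up input. Keep in mind that int() and float() raise ValueError for an empty string, so a missing element such as the one in '1, , 3' needs to be filtered out first.

```python
s = '1.5, 2, 3.25'

# float() tolerates surrounding whitespace, but strip() is kept for clarity.
l = [float(x.strip()) for x in s.split(',')]
print(l)
print(type(l[0]))
# [1.5, 2.0, 3.25]
# <class 'float'>
```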

join(): Merge a list and get it as a string

Conversely, if you want to join a list into a single string separated by a specific delimiter, use the join() method.

It is an easy mistake to make, but note that join() is a method of the string, not of the list. Call it on the delimiter string and pass the list as the argument.

s = 'one, two,  three'
l = [x.strip() for x in s.split(',')]
print(l)
# ['one', 'two', 'three']

print(','.join(l))
# one,two,three

print('::'.join(l))
# one::two::three

You can write it in one line as follows.

s = 'one, two,  three'
s_new = '-'.join([x.strip() for x in s.split(',')])
print(s_new)
# one-two-three
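join() only accepts strings, so a list of numbers, such as one produced with int() earlier, has to be converted back with str() first. A minimal sketch, using an illustrative list named nums:

```python
nums = [1, 2, 3, 4]

# ','.join(nums) would raise TypeError because the elements are int.
s_new = ','.join(str(n) for n in nums)
print(s_new)
# 1,2,3,4
```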

If you just want to change a fixed delimiter, it is easier to replace it with the replace() method.

s = 'one,two,three'
s_new = s.replace(',', '+')
print(s_new)
# one+two+three