Getting the size of a file or directory (folder) in Python

Money and Business

Using the Python standard library os, you can get the size (capacity) of a file or the total size of the files contained in a directory.

The following three methods are explained. The units of the sizes that can be obtained are all bytes.

  • Get the size of the file:os.path.getsize()
  • Get the size of a directory by combining the following functions (Python 3.5 or later):os.scandir()
  • Combine the following functions to get the size of the directory (Python 3.4 and earlier):os.listdir()

Get the size of the file: os.path.getsize()

The size (capacity) of the file can be obtained with os.path.getsize().

Give the path of the file whose size you want to get as an argument.

import os

print(os.path.getsize('data/src/lena_square.png'))
# 473831

Get the size of a directory (folder): os.scandir()

To calculate the total size of the files contained in a directory (folder), use os.scandir().

This function was added in Python 3.5, so earlier versions use os.listdir(). os.listdir() example is described later.

Define a function as follows.

def get_dir_size(path='.'):
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file():
                total += entry.stat().st_size
            elif entry.is_dir():
                total += get_dir_size(entry.path)
    return total

print(get_dir_size('data/src'))
# 56130856

os.scandir() returns an iterator of os.DirEntry object.

DirEntry object, use the is_file() and is_dir() methods to determine whether it is a file or a directory. If it is a file, the size is obtained from the st_size attribute of the stat_result object. In the case of a directory, this function is called recursively to add up all the sizes and return the total size.

In addition, by default, is_file() returns TRUE for symbolic links to files. Also, is_dir() returns true for symbolic links to directories. If you want to ignore symbolic links, set the follow_symlinks argument of is_file() and is_dir() to false.

Also, if you don't need to traverse the subdirectories, you can just delete the following part.

            elif entry.is_dir():
                total += get_dir_size(entry.path)

The above function will fail if the path of the file is passed as an argument. If you need a function to return the size of a file or a directory, you can write the following.

def get_size(path='.'):
    if os.path.isfile(path):
        return os.path.getsize(path)
    elif os.path.isdir(path):
        return get_dir_size(path)

print(get_size('data/src'))
# 56130856

print(get_size('data/src/lena_square.png'))
# 473831

Get the size of a directory (folder): os.listdir()

There is no os.scandir() in Python 3.4 or earlier, so use os.listdir().

Define a function as follows.

def get_dir_size_old(path='.'):
    total = 0
    for p in os.listdir(path):
        full_path = os.path.join(path, p)
        if os.path.isfile(full_path):
            total += os.path.getsize(full_path)
        elif os.path.isdir(full_path):
            total += get_dir_size_old(full_path)
    return total

print(get_dir_size_old('data/src'))
# 56130856

The basic idea is the same as in the case of os.scandir().

What can be obtained with os.listdir() is a list of file names (directory names). Each file name or directory name is joined with the path of the parent directory with os.path.join() to create the full path.

If the target is a symbolic link, os.path.isfile() and os.path.isdir() will judge the entity. So, if you want to ignore symbolic links, use conditional judgment in combination with os.path.islink(), which returns true for symbolic links.

As in the case of os.scandir(), if you don't need to traverse the subdirectories, just delete the following part.

        elif os.path.isdir(full_path):
            total += get_dir_size_old(full_path)

The above function will fail if the path of the file is passed as an argument. If you need a function to return the size of a file or a directory, you can write the following.

def get_size_old(path='.'):
    if os.path.isfile(path):
        return os.path.getsize(path)
    elif os.path.isdir(path):
        return get_dir_size_old(path)

print(get_size_old('data/src'))
# 56130856

print(get_size_old('data/src/lena_square.png'))
# 473831