Refactoring machine learning code - namedtuple :: Päpper's Machine Learning Blog — This blog features state of the art applications in machine learning with a lot of PyTorch samples and deep learning code. You will learn about neural network optimization and potential insights for artificial intelligence for example in the medical domain.

Rather than relying on sometimes confusing index access in your code, use a namedtuple. It’s backwards compatible, so indexing still works, but named fields make your code much more readable.

This is especially helpful when you convert between PIL-based and numpy-based code, because PIL addresses pixels in (column, row) order while numpy uses (row, column) order.
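To make the difference concrete, here is a minimal sketch that reads the same pixel through both libraries. The tiny 2x3 array and the variable names are made up for illustration, and the snippet assumes both numpy and Pillow are installed.

```python
import numpy as np
from PIL import Image

# Hypothetical 2-row x 3-column grayscale image, built only for this example.
arr = np.array([[10, 20, 30],
                [40, 50, 60]], dtype=np.uint8)
img = Image.fromarray(arr)

row, column = 1, 2

# numpy indexes as [row, column].
numpy_value = int(arr[row, column])

# PIL indexes as (x, y), which is (column, row).
pil_value = img.getpixel((column, row))

# PIL also reports its size as (width, height), not (rows, columns).
pil_size = img.size
```

Both lookups address the same pixel, yet the indices appear in opposite order. That swap is exactly what is easy to get wrong when points are stored as anonymous pairs.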

Let’s consider this piece of code, where we want to get the pixel values at several locations that are given in numpy format:

from PIL import Image

def get_image_values(image_path, locations):
    img = Image.open(image_path).load()
    return [img[point[1], point[0]] for point in locations]

values = get_image_values('my_image.png', [[2, 3], [5, 3], [7, 9]])

To me this code is hard to read: it’s not clear what point[1] and point[0] refer to, or why point[1] comes before point[0].

Let’s use a namedtuple to make this much easier to read:

from collections import namedtuple

from PIL import Image

# Points are stored numpy-style: row first, then column.
Point = namedtuple('Point', ['row', 'column'])

def get_image_values(image_path, locations):
    img = Image.open(image_path).load()
    # PIL's pixel access expects (column, row), i.e. (x, y).
    return [img[point.column, point.row] for point in locations]

values = get_image_values('my_image.png', [
    Point(row=2, column=3),
    Point(row=5, column=3),
    Point(row=7, column=9),
])

Note that index access still works: point[0] == point.row and point[1] == point.column.
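The following sketch illustrates that backwards compatibility using only the standard library. The variable names and the legacy_locations list are hypothetical, chosen to mirror the plain lists from the first example.

```python
from collections import namedtuple

Point = namedtuple('Point', ['row', 'column'])
point = Point(row=2, column=3)

# Index access and attribute access refer to the same fields.
same_row = point[0] == point.row
same_column = point[1] == point.column

# A namedtuple unpacks and compares like a regular tuple.
row, column = point
equals_plain_tuple = point == (2, 3)

# Existing [row, column] lists can be converted with Point._make.
legacy_locations = [[2, 3], [5, 3], [7, 9]]
points = [Point._make(location) for location in legacy_locations]

# namedtuples are immutable; _replace returns a modified copy instead.
moved = point._replace(column=4)
```

Because a Point is still a tuple, code that indexes or unpacks it keeps working, so you can migrate call sites one at a time.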
