At the Forge: Python's Mypy

In my last article, I introduced Mypy, a package that enforces type checking in Python programs. Python itself is, and always will remain, a dynamically typed language. However, Python 3 supports "annotations", a feature that allows you to attach an object to variables, function parameters and function return values. These annotations are ignored by Python itself, but they can be used by external tools.

Mypy is one such tool, and it's an increasingly popular one. The idea is that you run Mypy on your code before running it. Mypy looks at your code and makes sure that your annotations correspond with actual usage. In that sense, it's far stricter than Python itself, but that's the whole point.

In my last article, I covered some basic uses for Mypy. Here, I want to expand upon those basics and show how Mypy really digs deeply into type definitions, allowing you to describe your code in a way that lets you be more confident of its stability.

Type Inference

Consider a short program that first defines the variable x, giving it a type annotation of int and assigning it the integer 5. On the next line, it assigns the string 'abc' to x. On the third line, it prints the value of x.
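Here is a minimal reconstruction of that three-line program. The comment paraphrases Mypy's complaint and does not quote its exact output:

```python
# Declare x as an int and give it an integer value.
x: int = 5

# Python lets you rebind x to a string at runtime, but Mypy
# reports incompatible types in assignment (str vs. int) here.
x = 'abc'

print(x)
```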

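Python itself has no problem running this program. But if you run Mypy against it, you'll get an error message reporting incompatible types in the assignment on the second line: the expression has type str, but the variable has type int.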

As the message says, the code declared the variable to have type int, but then assigned a string to it. Mypy can figure this out because, despite what many people believe, Python is a strongly typed language. That is, every object has one clearly defined type. Mypy notices this and then warns that the code is assigning values that are contrary to what the declarations said.
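To illustrate what "strongly typed" means at runtime, here is a small sketch in plain Python (this is not something Mypy does; it simply shows the property Mypy relies on). Every object reports exactly one type, and Python refuses to silently combine unrelated types:

```python
# Every object has exactly one concrete type.
values = [5, 'abc', 8.5, None]
type_names = [type(v).__name__ for v in values]
print(type_names)  # ['int', 'str', 'float', 'NoneType']

# Python won't silently combine unrelated types:
# adding an int to a str raises TypeError at runtime.
n, s = 5, 'abc'
try:
    n + s
    combined = True
except TypeError:
    combined = False

print(combined)  # False
```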

In that program, I declared x to be of type int when defining it, then assigned a string to it, and got an error. What if I don't add the annotation at all? That is, what if I run Mypy on the same three lines, but with x = 5 instead of x: int = 5?

You might think that Mypy would ignore it, because I didn't add any annotation. But actually, Mypy infers the type of value a variable should contain from the first value assigned to it. Because I assigned an integer to x in the first line, Mypy assumed that x should always contain an integer, and it reports the same incompatible-types error on the second line.
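The unannotated version looks like this. As before, the comment paraphrases Mypy's complaint rather than quoting it:

```python
# No annotation: Mypy infers that x is an int from its first value.
x = 5

# Python accepts this at runtime, but Mypy still flags it, because
# a str is being assigned to a variable it has inferred to be an int.
x = 'abc'

print(x)
```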

This means that although you can annotate variables, you typically don't need to, unless a variable may legitimately hold more than one type and you want Mypy to accept all of them.
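A minimal sketch of that case uses a Union annotation (covered again below in the context of dictionaries), so Mypy accepts both an int and a str for the same variable:

```python
from typing import Union

# Explicitly allow both types, so Mypy accepts either assignment.
x: Union[int, str] = 5
x = 'abc'

print(x)
```

On Python 3.10 and later, the same annotation can also be written as int | str.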

Defining Dictionaries

Python's dict ("dictionary") type is probably the most important in the entire language. It would seem, at first glance, that name-value pairs aren't very exciting or important. But when you think about how often programs use name-value pairs (for variables, namespaces and associations between user names and IDs), it becomes clear just how essential dictionaries are.

Dictionaries also are used as small databases, or structures, for keeping track of data. For many people new to Python, it seems natural to define a new class whenever they need a new data type. But many experienced Python users find it more natural to use a dictionary or, if they need a collection of records, a list of dicts.

For example, assume that I want to keep track of prices on various items in a store. I can define the store's price list as a dictionary, called menu, in which the keys are the item names (strings) and the values are the item prices (integers).
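Such a price list might look like the following. The specific items and prices are illustrative assumptions; a soup price of 8 matches the 8.5 price mentioned later in this article:

```python
# Keys are item names (str); values are prices (int).
# The specific items and prices are illustrative.
menu = {'coffee': 5, 'sandwich': 7, 'soup': 8}
```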

What happens if I accidentally try to add a new item, say a muffin, to the menu, but mix up the name and the price, using the price as the key and the name as the value?
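Here's a sketch of that mistake, reusing the illustrative menu from above. Plain Python runs it without complaint:

```python
menu = {'coffee': 5, 'sandwich': 7, 'soup': 8}

# Oops: the price ended up as the key and the name as the value.
menu[5] = 'muffin'

print(menu)
```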

Python doesn't care; as far as it's concerned, you can have any hashable type as a key and absolutely any type as a value. But of course, you do care, and it might be nice to tighten up the code to ensure you don't make this mistake.

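Here's a great thing about Mypy: it'll do this for you automatically, without you saying anything else. If I take the dictionary definition and the mistaken assignment, put them into a Python file and then check the program with Mypy, it reports errors on the assignment line: the key has an invalid index type (int rather than str), and the value is a str where an int is expected.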

In other words, Mypy noticed that the dictionary was (implicitly) set to have strings as keys and ints as values, simply because the initial definition was set that way. It then noticed that the code was trying to assign a new key-value pair with different types and pointed out the problem.

Let's say, however, that you want to be explicit. You can do that by using the typing module, which defines annotation-friendly versions of many built-in types, as well as many new types designed for this purpose. Thus, I can import Dict from typing and annotate menu as Dict[str, int].
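A sketch of the explicit version, again with the illustrative menu items; the comment on the last line paraphrases Mypy's complaints:

```python
from typing import Dict

# Spell out what Mypy would otherwise infer: str keys, int values.
menu: Dict[str, int] = {'coffee': 5, 'sandwich': 7, 'soup': 8}

# Still runs, but Mypy rejects both the int key and the str value.
menu[5] = 'muffin'
```

On Python 3.9 and later, the built-in dict[str, int] also works as an annotation, without importing Dict.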

In other words, when I define my menu variable, I also give it a type annotation. This annotation makes explicit what Mypy inferred from the dict's definition, namely that keys should be strings and values should be ints. So, Mypy reports the same errors as before for the reversed muffin assignment.

What if I want to raise the price of the soup by 0.5, to 8.5? Mypy complains again, because 8.5 is a float, but the annotation says that the dict's values must be ints.
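Here's a sketch of the price change under the Dict[str, int] annotation. Python runs it fine; only Mypy objects:

```python
from typing import Dict

menu: Dict[str, int] = {'coffee': 5, 'sandwich': 7, 'soup': 8}

# Raise the soup's price by 0.5. Python doesn't mind, but Mypy
# complains: 8.5 is a float, and the annotation only allows int values.
menu['soup'] = 8.5
```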

As I explained in my last article, you can use a Union to allow several different types. Here, that means annotating menu as Dict[str, Union[int, float]].
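A sketch of the Union-based annotation, with comments noting which lines Mypy accepts and which it still rejects:

```python
from typing import Dict, Union

# Keys must be str; values may be either int or float.
menu: Dict[str, Union[int, float]] = {'coffee': 5, 'sandwich': 7, 'soup': 8}

menu['soup'] = 8.5    # accepted by Mypy now
menu[5] = 'muffin'    # still rejected by Mypy: wrong key and value types
```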

With this in place, Mypy knows that the keys must be strings, but the values can be either ints or floats. So, this silences the complaint about the soup's price being 8.5, but retains the warning about the reversed assignment regarding muffins.

Optional Values

In my last article, I showed how, when you define a function, you can annotate not only the parameters, but also the return type. For example, let's say I want to implement a function, doubleget, that takes two arguments: a dictionary and a key. It returns the value associated with the key, doubled, and its return type is annotated as int.
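A minimal sketch of doubleget follows. The parameter names (d, key) and the Dict[str, int] parameter annotation are assumptions made for illustration; only the int return type comes from the discussion here:

```python
from typing import Dict

def doubleget(d: Dict[str, int], key: str) -> int:
    """Return the value stored under key, doubled."""
    return d[key] * 2

menu = {'coffee': 5, 'sandwich': 7, 'soup': 8}
print(doubleget(menu, 'sandwich'))  # 14
```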

This is fine, but what happens if the user passes a key that isn't in the dict? The function will raise a KeyError exception. I'd rather mimic the dict.get method, which returns None if the key is unknown. So, my implementation checks for the key first and returns None if it's missing.
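Here's a sketch of that version, with the same assumed parameter names. The return annotation still says int, which is exactly what Mypy will object to:

```python
from typing import Dict

def doubleget(d: Dict[str, int], key: str) -> int:
    """Like dict.get: return the doubled value, or None for unknown keys."""
    if key not in d:
        # Fine for Python, but Mypy reports an incompatible return
        # value type, since the annotation promises an int.
        return None
    return d[key] * 2

menu = {'coffee': 5, 'sandwich': 7, 'soup': 8}
print(doubleget(menu, 'sandwich'))  # 14
print(doubleget(menu, 'pizza'))     # None
```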

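From Python's perspective, this is totally fine: the function returns the doubled value when the key exists (for example, 14 for an item priced at 7) and None when it doesn't. But from Mypy's perspective, there is a problem. The annotation says that the function will always return an integer, yet it can now return None, so Mypy reports an incompatible return value type (got None, expected int).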

I should note that Mypy doesn't flag this problem when you call the function. Rather, it notices that you're allowing the function to return a None value in the function definition itself.

One solution is to use a Union type, as I showed earlier, allowing an integer or None to be returned. But that doesn't quite express what the goal is here. What I would like to do is say that it might return an integer, but it might not—meaning, more or less, that the returned integer is optional.

Annotating the function's return type with Optional[int] says exactly that: if something is returned, it will be an integer, but it's also okay to return None.
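A sketch of doubleget with an Optional return type (parameter names are still assumptions). Optional[int] is shorthand for Union[int, None]. The caller at the bottom shows why this matters: before using the result as an int, you check for None, and Mypy narrows the type inside that branch:

```python
from typing import Dict, Optional

def doubleget(d: Dict[str, int], key: str) -> Optional[int]:
    """Return the doubled value for key, or None if key is missing."""
    if key not in d:
        return None          # allowed: the return type is Optional[int]
    return d[key] * 2

menu = {'coffee': 5, 'sandwich': 7, 'soup': 8}

result = doubleget(menu, 'pizza')
if result is not None:
    # Inside this branch, Mypy narrows result from Optional[int] to int.
    print(result + 1)
else:
    print('not on the menu')
```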

Optional is useful not only when you're returning values from a function, but also when you're defining variables or object attributes. It's pretty common, for example, for a class's __init__ method to define all of an object's attributes, including those whose values won't be known until later. Since you don't yet know what values you want to set, you use the None value. But of course, that then means the attribute might be equal to None, or it might be equal to (for example) an integer. By using Optional when setting the attribute, you signal that it can be either an integer or a None value.
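Here's a minimal, unannotated sketch of such a class. The class name Foo is hypothetical; the object name f matches the one used below:

```python
class Foo:
    """A hypothetical class whose y attribute is set later."""

    def __init__(self, x):
        self.x = x
        # The value isn't known yet, so start with None.
        self.y = None

f = Foo(10)
f.y = 20   # later, once the value is known
print(f.x, f.y)
```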

If __init__ simply stores its x parameter in self.x and sets self.y to None, Python itself sees no issue. But you might like to say that both x and y must be integers, except when y is initialized and set to None. You can do that by annotating the parameter and both attributes.
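A sketch of the annotated version, using the same hypothetical class name. The comments note which assignments Mypy accepts; all of them run in plain Python:

```python
from typing import Optional

class Foo:
    """Hypothetical class: x is always an int; y is an int or None."""

    def __init__(self, x: int) -> None:
        self.x: int = x
        self.y: Optional[int] = None

f = Foo(10)
f.y = 20      # OK for Mypy: an int
f.y = None    # OK for Mypy: None
f.y = 'abc'   # runs, but Mypy flags a str assigned to Optional[int]
```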

That gives three type annotations: on the parameter x (int), on the attribute self.x (also int) and on the attribute self.y (Optional[int]). Python won't complain if you break these rules, but if you create an object f from this class and then assign values to f.y, Mypy checks each assignment.

Sure enough, you now can assign either None or an integer to f.y. But if you try to set any other type, you'll get a warning from Mypy.

Conclusion

Mypy is a huge step forward for large-scale Python applications. It promises to keep Python the way you've known it for years, but with added reliability. If your team is working on a large Python project, it might well make sense to start incorporating Mypy into your integration tests. The fact that it runs outside the language means you can add Mypy slowly over time, making your code increasingly robust.

Resources

You can read more about Mypy on its official website, which has documentation, tutorials and even information for people using Python 2 who want to introduce Mypy via type comments (rather than annotations).

About the Author

Reuven Lerner teaches Python, data science and Git to companies around the world. You can subscribe to his free, weekly "better developers" e-mail list, and learn from his books and courses at http://lerner.co.il. Reuven lives with his wife and children in Modi'in, Israel.