mypy
By Reuven M. Lerner
I've been using dynamic languages (Perl, Ruby and Python) for many years. I love the flexibility and expressiveness that such languages provide. For example, I can define a function that sums numbers:
def mysum(numbers):
    total = 0
    for one_number in numbers:
        total += one_number
    return total
The above function will work on any iterable that returns numbers. So I can run the above on a list, tuple or set of numbers. I can even run it on a dictionary whose keys are all numbers. Pretty great, right?
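As a quick check of that claim, here's a small sketch that calls the function on each of those container types. The sample values are my own, chosen only for illustration.

```python
def mysum(numbers):
    total = 0
    for one_number in numbers:
        total += one_number
    return total

# The same function works on any iterable that yields numbers.
print(mysum([10, 20, 30]))          # list
print(mysum((10, 20, 30)))          # tuple
print(mysum({10, 20, 30}))          # set
print(mysum({10: 'a', 20: 'b'}))    # dict: iterating over it yields the keys
```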
Yes, but for my students who are used to static, compiled languages, this is a very hard thing to get used to. After all, how can you make sure that no one passes you a string, or a number of strings? What if you get a list in which some, but not all, of the elements are numeric?
For a number of years, I used to dismiss such worries. After all, dynamic languages have been around for a long time, and they have done a good job. And really, if people are having these sorts of type mismatch errors, then maybe they should be paying closer attention. Plus, if you have enough testing, you'll probably be fine.
But as Python (and other dynamic languages) have been making inroads into large companies, I've become increasingly convinced that there's something to be said for type checking. In particular, the fact that many newcomers to Python are working on large projects, in which many parts need to interoperate, has made it clear to me that some sort of type checking can be useful.
How can you balance these needs? That is, how can you enjoy Python as a dynamically typed language, while simultaneously getting some added sense of static-typing stability?
One of the most popular answers is a system known as mypy, which takes advantage of Python 3's type annotations for its own purposes. Using mypy means that you can write and run Python in the normal way, gradually adding static type checking over time and checking it outside your program's execution.
In this article, I start exploring mypy and how you can use it to check for problems in your programs. I've been impressed by mypy, and I believe you're likely to see it deployed in a growing number of places, in no small part because it's optional, and thus allows developers to use it to whatever degree they deem necessary, tightening things up over time, as well.
In Python, users enjoy not only dynamic typing, but also strong typing. "Dynamic" means that variables don't have types, but that values do. So you can say:
>>> x = 100
>>> print(type(x))
<class 'int'>
>>> x = 'abcd'
>>> print(type(x))
<class 'str'>
>>> x = [10, 20, 30]
>>> print(type(x))
<class 'list'>
As you can see, I can run the above code, and it'll work just fine. It's not particularly useful, per se, but it never would pass even a first-pass compilation in a statically compiled language. That's because in such languages, variables have types—meaning that if you try to assign an integer to a string variable, you'll get an error.
In a dynamic language, by contrast, variables don't have types at all. Running the type function, as I did above, doesn't actually return the variable's type, but rather the type of data to which the variable currently points.
Just because a language is dynamically typed doesn't mean that it's totally loosey-goosey, letting you do whatever you want. (And yes, that is the technical term.) For example, I can try this:
>>> x = 1
>>> y = '1'
>>> print(x+y)
That code will result in an error, because Python doesn't know how to add integers and strings together. It can add two integers (and get an integer result) or two strings (and get a string result), but not a combination of the two.
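Here's a short sketch of what that looks like in practice. It catches the exception so that you can see its message, and then shows the two explicit conversions that make the intent unambiguous.

```python
x = 1
y = '1'

try:
    print(x + y)
except TypeError as e:
    # Python refuses to guess how to combine an int and a str.
    print(f"TypeError: {e}")

# Converting explicitly tells Python which kind of "+" you want.
print(x + int(y))    # integer addition: 2
print(str(x) + y)    # string concatenation: 11
```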
The mysum function that you saw earlier assigns 0 to the local "total" variable, and then adds each of the elements of numbers to it. This means that if numbers contains any non-numbers, you're going to be in trouble. Fortunately, mypy will be able to solve this problem for you.
Python 3 introduced the idea of "type annotations," and as of Python 3.6, you can annotate variables, not just function parameters and return values. The idea is that you can put a colon (:) and then a type following parameter names. For example:
def hello(name:str):
    return f'Hello, {name}'
Here, I've given the name parameter a type annotation of str. If you've used a statically typed language, you might believe that this will add an element of type safety. That is, you might think that if I try to execute:
hello(5)
I will get an error. But in actuality, Python will ignore these type annotations completely. Moreover, you can use any object you want in an annotation; although it's typical to use a type, you actually can use anything.
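To make both points concrete, here's a minimal sketch. The describe function and its odd annotations are made up purely for illustration; you wouldn't write annotations like these in real code.

```python
def hello(name: str):
    return f'Hello, {name}'

# Python does not enforce the annotation, so this call runs without complaint.
print(hello(5))    # Hello, 5

# An annotation can be any expression, not only a type.
# (describe is a hypothetical function, used only for this example.)
def describe(thing: "any string works here", count: 42 = 1):
    return f'{count} x {thing}'

print(describe('widget'))    # 1 x widget
```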
This might strike you as completely ridiculous. Why introduce such annotations, if you're never going to use them? The basic idea is that coding tools and extensions will be able to use the annotations for their own purposes, including (as you'll see in just a bit) for the purposes of type checking.
This is important, so I'll repeat and stress it: type annotations are ignored by the Python language, although it does store them in an attribute called __annotations__. For example, after defining the above hello function, you can look at its annotations, which are stored as a dictionary:
>>> hello.__annotations__
{'name': <class 'str'>}
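To give a sense of how a tool might put that dictionary to work, here's a minimal sketch of a runtime check built on __annotations__. The enforce_simple_types decorator is a hypothetical helper of my own, not part of Python or mypy, and it handles only ordinary named parameters whose annotations are plain classes. Keep in mind that mypy works differently: it analyzes your source code without running it.

```python
import functools
import inspect

def enforce_simple_types(func):
    """Hypothetical decorator: check arguments against plain-class annotations."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Match the passed values to parameter names.
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            # Skip anything that isn't an ordinary class (str, int, ...).
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper

@enforce_simple_types
def hello(name: str):
    return f"Hello, {name}"

print(hello('world'))
try:
    hello(5)
except TypeError as e:
    print(f"TypeError: {e}")
```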
The mypy type checker can be downloaded and installed with the standard Python pip package installer. On my system, in a terminal window, I ran:
$ pip3 install -U mypy
The pip3 reflects that I'm using Python 3, rather than Python 2. And the -U option indicates that I'd like to upgrade my installation of mypy, if the package has been updated since I last installed it on my computer. If you're installing this package globally and for all users, you might well need to run this as root, using sudo.
Once mypy is installed, you can run it, naming your file. For example, let's assume that hello.py looks like this:
def hello(name:str):
    return f"Hello, {name}"
print(hello('world'))
print(hello(5))
print(hello([10, 20, 30]))
If I run this program, it'll actually work fine. But I'd like to use that type annotation to ensure that I'm only invoking the function with a string argument. I can thus run, on the command line:
$ mypy ./hello.py
And I get the following output:
hello.py:7: error: Argument 1 to "hello" has incompatible type "int"; expected "str"
hello.py:8: error: Argument 1 to "hello" has incompatible type "List[int]"; expected "str"
Sure enough, mypy has identified two places in which the expectation that I've expressed with the type annotation (namely, that only strings will be passed as arguments to "hello") has been violated. This doesn't bother Python, but it should bother you, either because the type annotation needs to be loosened up, or because (as in this case) the code is calling the function with the wrong type of argument.
In other words, mypy won't tell you what to do or stop you from running your program. But it will try to give you warnings, and if you hook this together with a Git hook and/or with an integration and testing system, you'll have a better sense of where your program might be having problems.
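As one way of wiring that up, here's a rough sketch of a script that a Git pre-commit hook or a CI job could call. It assumes that mypy is installed for the same interpreter and can be run as python -m mypy. The function names and the default file name are placeholders of my own.

```python
import subprocess
import sys

def build_mypy_command(paths):
    """Return the command line for running mypy on the given paths."""
    return [sys.executable, "-m", "mypy", *paths]

def run_mypy(paths):
    """Run mypy and return its exit status (0 means no errors were reported)."""
    completed = subprocess.run(build_mypy_command(paths))
    return completed.returncode

def main(argv):
    # Placeholder default; a real hook would pass the files being committed.
    targets = argv or ["hello.py"]
    return run_mypy(targets)

# In a real hook script, finish with:
#     sys.exit(main(sys.argv[1:]))
```

Because a pre-commit hook that exits with a nonzero status aborts the commit, passing mypy's exit status straight through is enough to block commits that contain type errors.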
Of course, mypy will check only where there are annotations. If you fail to annotate something, mypy won't be able to check it.
For example, I didn't annotate the function's return value. I can fix that, indicating that it returns a string, with:
def hello(name:str) -> str:
    return f"Hello, {name}"
Notice that Python introduced a new syntax (the -> arrow), and allowed me to stick an annotation before the end-of-line colon, in order for annotations to work. The annotation dictionary has now expanded too:
>>> hello.__annotations__
{'name': <class 'str'>, 'return': <class 'str'>}
And in case you're wondering what Python will do if you have a parameter named return that conflicts with the return value's annotation...well, "return" is a reserved word and cannot be used as a parameter name.
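If you'd like to confirm that for yourself, here's a quick sketch. It uses the standard keyword module and the built-in compile function, so that the syntax error can be caught rather than stopping the program.

```python
import keyword

# "return" is one of Python's reserved words...
print(keyword.iskeyword('return'))    # True

# ...so a function definition that uses it as a parameter name won't compile.
source = "def f(return): pass"
try:
    compile(source, "<example>", "exec")
except SyntaxError as e:
    print(f"SyntaxError: {e.msg}")
```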
Let's go back to the mysum function. What will (and won't) mypy be able to check? For example, assume the following file:
def mysum(numbers:list) -> int:
    output = 0
    for one_number in numbers:
        output += one_number
    return output
print(mysum([10, 20, 30, 40, 50]))
print(mysum((10, 20, 30, 40, 50)))
print(mysum([10, 20, 'abc', 'def', 50]))
print(mysum('abcd'))
As you can see, I've annotated the numbers parameter to take only lists and to indicate that the function will always return integers. And sure enough, mypy catches the problems:
mysum.py:10: error: Argument 1 to "mysum" has incompatible type "Tuple[int, int, int, int, int]"; expected "List[Any]"
mysum.py:12: error: Argument 1 to "mysum" has incompatible type "str"; expected "List[Any]"
The good news is that I've identified some problems. But in one case, I'm calling mysum with a tuple of numbers, which should be fine, but is flagged as a problem. And in another case, I'm calling it with a list of both integers and strings, but that's seen as just fine.
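Both cases are worth seeing at runtime. A bare list annotation says nothing about the list's elements (mypy reports it as List[Any] in the messages above), so the mixed list passes the static check and fails only when the program runs. Here's a short sketch using the same function:

```python
def mysum(numbers: list) -> int:
    output = 0
    for one_number in numbers:
        output += one_number
    return output

# A tuple of integers is flagged by mypy, yet works fine at runtime.
print(mysum((10, 20, 30, 40, 50)))    # 150

# A mixed list is accepted by mypy, yet fails at runtime.
try:
    mysum([10, 20, 'abc', 'def', 50])
except TypeError as e:
    print(f"TypeError: {e}")
```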
I'm going to need to tell mypy that I'm willing to accept not just a list, but any sequence, such as a tuple. Fortunately, Python now has a typing module that provides you with objects designed for use in such circumstances. For example, I can say:
from typing import Sequence
def mysum(numbers:Sequence) -> int:
    output = 0
    for one_number in numbers:
        output += one_number
    return output
I've grabbed Sequence from the typing module. It covers all three of Python's basic sequence types: strings, lists and tuples. Once I do that, all of the mypy problems disappear, because all of the arguments are sequences.
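One way to see which built-in types count as sequences is to ask at runtime, using the abstract base class collections.abc.Sequence, to which typing.Sequence corresponds. This is only a runtime illustration with sample values of my own; mypy makes its decision statically, without running anything.

```python
from collections.abc import Sequence

samples = [[10, 20], (10, 20), 'abcd', {10, 20}, {10: 'a'}]

for value in samples:
    # Lists, tuples and strings are sequences; sets and dicts are not.
    print(type(value).__name__, isinstance(value, Sequence))
```

Note that a string qualifies, which is why the call with 'abcd' no longer produces a complaint.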
That went a bit overboard, admittedly. What I really want to say is that I'll accept any sequence whose elements are integers. I can state that by changing my function's annotations to be:
from typing import Sequence
def mysum(numbers:Sequence[int]) -> int:
    output = 0
    for one_number in numbers:
        output += one_number
    return output
Notice that I've modified the annotation to be Sequence[int]. In the wake of that change, mypy has now found lots of problems:
mysum.py:13: error: List item 2 has incompatible type "str"; expected "int"
mysum.py:13: error: List item 3 has incompatible type "str"; expected "int"
mysum.py:14: error: Argument 1 to "mysum" has incompatible type "str"; expected "Sequence[int]"
I'd call this a big success. If someone now tries to use my function with the wrong type of value, mypy will call them out on it.
But wait: do I really only want to allow for lists and tuples? What about sets, which also are iterable and can contain integers? And besides, what's this obsession with integers—shouldn't I also allow for floats?
I can solve the first problem by saying that I'll take not a Sequence[int], but Iterable[int], meaning anything that is iterable and returns integers. In other words, I can say:
from typing import Iterable
def mysum(numbers:Iterable[int]) -> int:
    output = 0
    for one_number in numbers:
        output += one_number
    return output
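With that annotation, the function should accept sets, dictionary keys and even generators, since all of them are iterable. Here's a short sketch, with sample values of my own:

```python
from typing import Iterable

def mysum(numbers: Iterable[int]) -> int:
    output = 0
    for one_number in numbers:
        output += one_number
    return output

print(mysum({10, 20, 30}))                 # set
print(mysum({10: 'a', 20: 'b'}.keys()))    # dictionary keys
print(mysum(n * n for n in range(4)))      # generator expression
```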
Finally, how can I allow for either integers or floats? I use the special Union type, which lets you combine types together in square brackets:
from typing import Iterable, Union
def mysum(numbers:Iterable[Union[int, float]]) -> Union[int,float]:
    output = 0
    for one_number in numbers:
        output += one_number
    return output
But if I run mypy against this code, and try to call mysum with an iterable containing at least one float, I'll get an error:
mysum.py:9: error: Incompatible types in assignment (expression has type "float", variable has type "int")
What's the problem? Simply put, when I create output as a variable, I'm giving it an integer value, so mypy infers that its type is int. And then, when I try to add a floating-point value to it, I get a warning from mypy. So, I can silence that by annotating the variable:
from typing import Iterable, Union
def mysum(numbers:Iterable[Union[int, float]]) -> Union[int,float]:
    output : Union[int,float] = 0
    for one_number in numbers:
        output += one_number
    return output
Sure enough, the function is now pretty well annotated. I'm too experienced to believe that this will catch and solve all problems, but if others on my team, who want to use my function, use mypy to check the types, they'll get warnings. And that's the whole point here, to catch problems before they're even close to production.
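There's also a shorter alternative, which is not what I did above. PEP 484 says that an argument of type int is acceptable wherever float is expected, so type checkers such as mypy treat a plain float annotation as covering integers too. Here's a sketch of the same function written that way:

```python
from typing import Iterable

def mysum(numbers: Iterable[float]) -> float:
    # Annotating the variable as float lets it start life as the integer 0.
    output: float = 0
    for one_number in numbers:
        output += one_number
    return output

print(mysum([10, 20, 30]))      # all integers
print(mysum([10, 20.5, 30]))    # integers and floats
```

The trade-off is that the return annotation now promises only a float, even when every input is an integer.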
You can read more about mypy on the project's website. That site has documentation, tutorials and even information for people using Python 2 who want to introduce mypy via comments (rather than annotations).