At the Forge: Introducing PyInstaller

Want to distribute Python programs to your Python-less clients? PyInstaller is the answer. By Reuven M. Lerner

If you're used to working with a compiled language, the notion that you would need to have a programming language around, not just for development but also for running an application, seems a bit weird. Just because a program was written in C doesn't mean you need a C compiler in order to run it, right?

But of course, interpreted and byte-compiled languages do require the original language, or a version of it, in order to run. True, Java programs are compiled, but they're compiled into bytecodes then executed by the JVM. Similarly, .NET programs cannot run unless the CLR is present.

Even so, many of the students in my Python courses are surprised to discover that if you want to run a Python program, you need to have the Python language installed. If you're running Linux, this isn't a problem. Python has come with every distribution I've used since 1995. Sometimes the Python version isn't as modern as I'd like, but the notion of "this computer can't run Python programs" isn't something I've had to deal with very often.

However, not everyone runs Linux, and not everyone's computer has Python on it. What can you do about that? More specifically, what can you do when your clients don't have Python and aren't interested in installing it? Or what if you just want to write and distribute an application in Python, without bothering your users with additional installation requirements?

In this article, I discuss PyInstaller, a cross-platform tool that lets you take a Python program and distribute it to your users, such that they can treat it as a standalone app. I also discuss what it doesn't do, because many people who think about using PyInstaller don't fully understand what it does and doesn't do.

Running Python Code

Like Java and .NET, Python programs are compiled into bytecodes, high-level commands that don't correspond to the instructions of any actual computer, but that reference something known as a "virtual machine". There are a number of substantial differences between Java and Python though. Python doesn't have an explicit compilation phase; its bytecodes are pretty high level and connected to the Python language itself, and the compiler doesn't do that much in terms of optimization. The correspondence between Python source code and the resulting bytecodes is basically one-to-one; you won't find the bytecode compiler doing fancy things like inlining code or optimizing loops.

However, there's no doubt that Python runs bytecode, rather than your source code. You can see this in a number of different ways, the easiest of which is to create a Python module and then import that module. The module is translated into Python bytecodes and then saved to a file with a .pyc suffix. (In Python 3, this is under a directory called __pycache__, with separate byte-compiled versions for different Python versions and architectures.)

What does this all have to do with PyInstaller? Well, if you want to distribute a Python program, it's not enough to provide the byte-compiled output. You also need to provide a copy of Python, and that turns out to be a pain under certain circumstances, as I mentioned previously.

PyInstaller takes your Python code and byte-compiles it. But, then it also creates an executable application that basically loads Python and runs your program. In other words, each application you distribute with PyInstaller has a complete copy of Python within it, including the libraries needed to run your program. Normally, Python includes the entire standard library, but PyInstaller is smart enough to include only those modules it really needs, thus keeping the distribution size within reason.

Note that the copy of Python you have when using PyInstaller is used to create the distributable package. This means if you are running Python 3.4 on Linux, it's that copy of Python 3.4 for Linux that'll be included in your package. In other words, PyInstaller works across platforms, in that you can run it on Linux, Windows, macOS and other systems, but the resulting package is specifically for one architecture. It also means you need to be a bit careful when using PyInstaller on a computer that has multiple Python versions installed.

Installing PyInstaller

PyInstaller is most easily installed on a computer running Python with the standard pip command:


pip install -U --user pyinstaller

The -U flag indicates that you would like to upgrade PyInstaller, in case you already have installed it and the version on PyPI is newer. The --user flag indicates that you don't want to install it in the system's directories, but rather under your own home directory. Recently, I've become a fan of installing things with --user, largely because it avoids the need to think about permissions. However, it does mean that you need to add the "bin" directory from the --user location to your PATH.

If you're on a computer that has more than one Python version installed, it sometimes can be hard to know just which version is connected to pip. (Although pip --version will tell you which version of Python it's using.) For this reason, I sometimes do things the long way, as follows:


python3.6 -m pip install -U --user pyinstaller

The -m flag is sort of like the import statement in Python; running things in this way ensures that you're using the version you want.

Now that you've installed PyInstaller, let's use it to create a distributable Python application. I've created a new program called (very creatively) myapp.py. Here's the source code:


#!/usr/bin/env python3.6

import sys

print("Hello, and welcome to my app!")

print(f"We're running Python {sys.version}")

for i in range(10):
    print(f"{i} ** 2 = {i**2}, {i} ** 3 = {i**3}")

As you can see, this program imports the sys module, which provides access to the Python environment, as well as its variables and settings. I do this so that I can grab sys.version and ensure that the correct version is really running.

Next, I execute a "for" loop, for no reason other than that it gives me some output that I can see on the screen when the program runs.

In both cases, I use one of my favorite features from Python 3.6, f-strings, which allows me to interpolate expressions inside curly braces. This is, in my mind, far better than the previous ways this was done in Python, using the "%" operator on strings or (more recently) the str.format method.

So, let's assume you want to run this program on a colleague's machine. (Remember that your colleague needs to run the same operating system as you do, because the output from PyInstaller is going to be a binary based on the Python version you've installed.) You can type:


pyinstaller myapp.py

And, you'll get a lot of output. I'm not going to review all of it, but here are some highlights:


468 INFO: PyInstaller: 3.3.1
468 INFO: Python: 3.6.3
470 INFO: Platform:
 ↪Linux-4.4.0-119-generic-x86_64-with-Ubuntu-16.04-xenial
475 INFO: wrote /root/myapp1/myapp.spec

The file "myapp.spec" describes the application you're creating with PyInstaller. You'll find that this file is created automatically when you run PyInstaller. Normally, PyInstaller is smart enough to figure out what files must be included in the resulting distribution, but in some cases, such as data files and shared libraries, you might have to edit the specfile and add them yourself:


491 INFO: Extending PYTHONPATH with paths
['/root/myapp1', '/root/myapp1']

When you say import xyz in Python, the language looks (for starters) in the current directory for "xyz.py". If it doesn't find that (or a bytecoded variation), it looks through the elements of sys.path, one by one, looking for "xyz.py". If you want to tell Python to look in some additional directories, you can set the PYTHONPATH environment variable. Here, PyInstaller is saying that it's modifying PYTHONPATH so that the program can find modules and packages defined in the current directory:


491 INFO: checking Analysis

PyInstaller analyzes your code in order to figure out which modules and packages you want to use. Used modules are included in the final distribution, while unused ones are ignored.

The "dist" Directory

There's a lot more to the output, but after running PyInstaller, you'll find that there's a "dist" directory, and that in that directory is another subdirectory with the name of your new application.

This directory contains your new Python application. Now, you can't just run it like that; it's still a bit more complex than your average executable. The idea is that you'll turn the directory into a zipfile, distribute the software to wherever you need it, unzip it on the destination machine, and then run the top-level program.

But what if you use a module that has not only a Python component, but also a compiled C component? PyInstaller handles that automatically. For example, say you're going to use NumPy in your program, how does PyInstaller handle the C portion, which is compiled?

In this case, PyInstaller noticed that you were using a module with a C component. And if you look in the "dist" directory, you'll now see a bunch of additional shared libraries (*.so files).

PyInstaller can't promise to work with all complex packages, but the authors have tried hard to provide a large degree of compatibility. For example, if you use the Cython package (for implementing Python modules in C or providing type hints), you'll find that PyInstaller handles it fine, including the appropriate files in the "dist" directory.

Conclusion

For years, many of my students and consulting clients have wanted to distribute Python code without needing to run the language itself. That's not possible, but PyInstaller does the next best thing, letting you distribute software in a fairly straightforward way.

About the Author

Reuven Lerner teaches Python, data science and Git to companies around the world. His free, weekly "better developers" email list reaches thousands of developers each week; subscribe here. Reuven lives with his wife and children in Modi'in, Israel.

Reuven Lerner