Do you know my favorite fact about programming? In the end, everything is built from code, and you can understand all of it: there is absolutely no magic. With enough effort, almost everything you interact with can be dug into, demystified, and explained. I often treat the tools I use as black boxes, whether for lack of time, lack of interest, or a fear that I wouldn’t understand them if I tried. But let’s fight back against that.
So for this post, let’s understand what’s going on with Python’s virtualenv package.
Let’s start out simple – the purpose and use of virtualenv. Stealing directly from the project’s homepage, “virtualenv is a tool to create isolated Python environments.” Well great…what good is that?
A little backstory
I’ll explain by way of a story about my travails with scipy. If you don’t know, installing scipy/numpy on OS X has historically been…challenging. Numerous system-level dependencies and old versions of numpy pre-installed on the machine both complicate what is already a non-trivial installation procedure. This has led to countless help-forum posts and even extensive step-by-step guides. Although scipy suffers from some complications surrounding required non-Python bits (like Fortran compilers), the most frequent problem I’ve had is simply having conflicting versions of numpy installed.
That brings us to virtualenv and its use case. Macs come helpfully pre-installed with an old and unhelpful version of numpy.
That won’t do, since I was hoping to use the (very cool) data analysis library pandas. In fact, the pandas installation page kindly points out that it requires numpy 1.6.1 or higher! How can we install pandas without changing the system-installed version of numpy? Enter virtualenv.
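The requirement is easy to state in code. Below is a minimal sketch of that kind of version check. parse_version and meets_minimum are hypothetical helpers written for illustration; they are not pandas’ or pip’s actual logic. They also assume plain dotted-integer versions with no suffixes such as "rc1".

```python
# Hypothetical helpers: a simplified version comparison, assuming plain
# dotted-integer versions such as "1.6.1" (no "rc"/"dev" suffixes).
def parse_version(version: str) -> tuple:
    """Turn "1.6.1" into (1, 6, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """Return True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    # A hypothetical older numpy fails a "1.6.1 or higher" requirement...
    print(meets_minimum("1.5.1", "1.6.1"))   # False
    # ...while a newer one passes, even though "1.10.0" < "1.6.1" as strings.
    print(meets_minimum("1.10.0", "1.6.1"))  # True
```

Comparing tuples of integers rather than raw strings is the important design choice: as strings, "1.10.0" sorts before "1.6.1".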
Our very first virtualenv
I promised I’d start out with a quick example, so let’s show how virtualenv solves our little scipy snafu in a pinch.
Virtualenvs are their own little world: by default they are entirely isolated from your system-installed Python packages. First, let’s install virtualenv (the last thing we’ll need to install globally!) and set up a sample env.
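If you’d like to experiment without installing anything, Python 3 ships a standard-library cousin of virtualenv: the venv module. This is a swapped-in stand-in, not the virtualenv tool used in this post. make_env is a hypothetical helper. The sketch assumes Python 3 and skips bootstrapping pip (with_pip=False) so that it runs quickly and offline.

```python
import os
import tempfile
import venv
from pathlib import Path


def make_env(env_dir: Path) -> Path:
    """Create an isolated environment and return the path to its interpreter."""
    # with_pip=False keeps this fast and offline; pass True to bootstrap pip.
    venv.create(env_dir, system_site_packages=False, with_pip=False)
    # Interpreters live in bin/ on POSIX and Scripts/ on Windows.
    bin_name = "Scripts" if os.name == "nt" else "bin"
    exe_name = "python.exe" if os.name == "nt" else "python"
    return env_dir / bin_name / exe_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "my_first_env"
        python = make_env(env_dir)
        print(python, python.exists())
        # pyvenv.cfg is the marker file the stdlib venv module writes.
        print((env_dir / "pyvenv.cfg").read_text())
```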
Excellent! We installed virtualenv on our system using pip, created a virtualenv called my_first_env, and finally activated it. This means that the python we run is now entirely isolated from the system packages. Let’s prove it to ourselves by trying to import some packages we know are installed on this machine.
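You can also check the isolation from inside Python itself. The helper names below are hypothetical. The check relies on my understanding of how each tool marks an environment. Classic virtualenv (before version 20) set sys.real_prefix. The stdlib venv module instead leaves sys.base_prefix pointing at the original installation, so a mismatch with sys.prefix means we are inside an environment.

```python
import importlib.util
import sys


def in_virtual_environment() -> bool:
    """Best-effort check for whether this interpreter runs inside an env."""
    # Assumption: classic virtualenv set sys.real_prefix; venv-style envs make
    # sys.prefix differ from sys.base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def can_import(name: str) -> bool:
    """Report whether a top-level module is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("inside an env:", in_virtual_environment())
    for package in ("numpy", "scipy", "pandas"):
        print(package, "importable:", can_import(package))
```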
And we can also test the same commands outside our virtualenv to confirm they work!
Interesting. And how about installing that fresh new package we had our eyes on?
And there we go! We can now happily play around with all of the installed packages inside of our virtual environment and we did it without affecting any other users of this computer, or requiring global install privileges.
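Why are no global privileges needed? Roughly, packages installed with an environment’s own interpreter land in a site-packages directory under that environment’s prefix, which you own. The sketch below asks the standard library’s sysconfig module where that directory would be for a given prefix. site_packages_for is a hypothetical helper. Forcing the stock posix_prefix/nt schemes is a simplifying assumption, since some distributions patch their default scheme.

```python
import os
import sys
import sysconfig


def site_packages_for(prefix: str) -> str:
    """Hypothetical helper: the pure-Python install dir rooted at prefix."""
    # Assumption: use the stock schemes rather than a distro-patched default.
    scheme = "nt" if os.name == "nt" else "posix_prefix"
    # Overriding base/platbase re-roots every path in the scheme at prefix.
    paths = sysconfig.get_paths(scheme=scheme, vars={"base": prefix, "platbase": prefix})
    return paths["purelib"]


if __name__ == "__main__":
    # The running interpreter's own site-packages, per the stock scheme...
    print(site_packages_for(sys.prefix))
    # ...versus where an env living in ./my_first_env would keep its packages.
    print(site_packages_for(os.path.abspath("my_first_env")))
```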
Well that was magical
If you’re anything like me, your natural first reaction to a new tool like this is to feel a little uncomfortable. You can go through the motions (perhaps copying from some tutorial you found on a blog) and hope things keep working, but you have no real understanding of how this new tool works. Maybe you even resign yourself to never understanding it and just keep using it the way you were taught, effectively becoming a cargo cult programmer.
Avoiding that behavior is exactly the point of this blog post, and I think it’s perhaps the most powerful skill a programmer can have. Digging into an unfamiliar project and building a mental model of how it works is the essence of programming! If you truly understand how something is put together, you can modify it, improve it, or explain it with ease.
So for the rest of this post, let’s dig in and prove virtualenv isn’t magic. We will both be taking this journey together, as I’ve never dug into its guts either. So let’s see what we can figure out.
Going to the source
So let’s crack open virtualenv and understand what’s actually happening.
There are three main commands we keep calling to create, activate, and deactivate a virtualenv: virtualenv, activate, and deactivate. Let’s deal with creating the environment with virtualenv before jumping into activation and deactivation.
Creating a virtualenv
Running which virtualenv in the shell tells us where the script lives, and opening that file drops me into its source.
Well, that’s not very helpful. It looks like running virtualenv actually just executes a tiny stub that hands off to a function called load_entry_point. Time to see if we can figure out what load_entry_point is and what it actually calls.
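Conceptually, load_entry_point takes a "module:attribute" target, imports the module, and hands back the attribute to call. In setuptools-generated stubs of that era, this function came from pkg_resources. The sketch below reproduces the idea with importlib.metadata.EntryPoint from the standard library (Python 3.8+) instead. resolve is a hypothetical helper, and json:dumps is just a convenient stdlib target.

```python
from importlib.metadata import EntryPoint


def resolve(target: str, name: str = "example"):
    """Import the module in a 'module:attribute' target and return the attribute."""
    entry_point = EntryPoint(name=name, value=target, group="console_scripts")
    # load() imports the module part and then looks up the attribute part.
    return entry_point.load()


if __name__ == "__main__":
    dumps = resolve("json:dumps")
    print(dumps({"answer": 42}))  # prints {"answer": 42}
```

On Python 3.10+, importlib.metadata.entry_points(group="console_scripts") will also list the entry points that installed packages actually declare.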
A little googling later, we find a Stack Overflow post on the subject. It looks like in well-behaved Python packages, entry points are defined in setup.py and automatically linked to runnable scripts at installation time. We can download the source from PyPI and take a look ourselves.
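Here is a hypothetical sketch of what "linked to runnable scripts" means at installation time. It parses a console_scripts line of the form name = module:function and renders a small launcher that imports the function and calls it. Real installers such as setuptools and pip generate their own, somewhat different wrappers. The function names, the #!/usr/bin/env python shebang, and the example spec are assumptions for illustration. The sketch also ignores dotted attributes and extras.

```python
def parse_console_script(line: str):
    """Split 'name = package.module:function' into its three parts."""
    name, target = (part.strip() for part in line.split("=", 1))
    module, func = target.split(":", 1)
    return name, module.strip(), func.strip()


def render_wrapper(line: str, python: str = "/usr/bin/env python") -> str:
    """Render a minimal launcher script for one console_scripts entry."""
    name, module, func = parse_console_script(line)
    return (
        f"#!{python}\n"
        f"# launcher for '{name}', generated from its entry point spec\n"
        "import sys\n"
        f"from {module} import {func}\n"
        "\n"
        "if __name__ == '__main__':\n"
        f"    sys.exit({func}())\n"
    )


if __name__ == "__main__":
    print(render_wrapper("virtualenv = virtualenv:main"))
```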
Sure enough, when we open up setup.py we see the entry_points dict we were promised.
Indeed, they seem to be linking the runnable virtualenv script to the main function of virtualenv.py. Let’s see what that looks like. The main function appears to roughly follow these steps:
- Build an option parser
- Check whether the script was called with the --python option (which selects a different interpreter) and possibly exit
- Take actions based on various command line options
- Call the create_environment function
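Here is a rough, hypothetical sketch of that control flow. It uses argparse as a stand-in for whatever option parser virtualenv.py builds and defines only a few illustrative options. It also takes create_environment as an injected callable so the sketch can run on its own. My understanding is that the real tool, when asked for a different interpreter, re-runs itself under that interpreter and exits. The sketch only prints a message at that point.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """A few illustrative options; argparse stands in for the real parser."""
    parser = argparse.ArgumentParser(prog="virtualenv-sketch")
    parser.add_argument("dest_dir", help="directory for the new environment")
    parser.add_argument("-p", "--python", help="interpreter to build the env with")
    parser.add_argument("--system-site-packages", action="store_true",
                        help="let the env see globally installed packages")
    return parser


def main(argv, create_environment):
    """Parse options, possibly bail out, then delegate to create_environment."""
    options = build_parser().parse_args(argv)
    if options.python and options.python != sys.executable:
        # Assumed hand-off point: the original re-runs itself under
        # options.python and exits; this sketch just reports it.
        print(f"would re-run under {options.python} and exit")
        return None
    return create_environment(options.dest_dir,
                              site_packages=options.system_site_packages)


if __name__ == "__main__":
    def fake_create(dest_dir, site_packages):
        print(f"creating {dest_dir} (site_packages={site_packages})")
        return dest_dir

    main(["my_first_env"], fake_create)
    main(["my_first_env", "--python", "/opt/other/python"], fake_create)
```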
Well, now that last one sounds quite relevant! Let’s read through the create_environment function.
Now we’re getting somewhere! It looks like the basic steps are:
- Get a bunch of path locations for the new environment (interpreter, libraries, and binaries)
- Install Python inside our environment and return a path to the executable
- Install some subset of the installation-related Python packages
- Install the activate scripts into this new virtualenv
And that’s the essence of what running virtualenv does: it defines paths for the interpreter, libraries, and binaries; installs the interpreter and installation-related Python packages; and installs the activate script so you can activate the environment. We now understand what goes into creating a new virtualenv.
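Those steps can be sketched in plain Python. Everything here is hypothetical and heavily simplified. path_locations, install_python, and install_activate are stand-ins loosely named after the steps above, not virtualenv’s real implementations. The interpreter is symlinked (falling back to a copy), and the packaging-tool step is skipped. The generated activate script is a bare-bones bash fragment, not the real one.

```python
import os
import shutil
import sys
import tempfile
from pathlib import Path


def path_locations(home_dir: Path):
    """Hypothetical: library, include, and binary dirs for an env at home_dir."""
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    if os.name == "nt":
        return home_dir / "Lib", home_dir / "Include", home_dir / "Scripts"
    return home_dir / "lib" / version, home_dir / "include" / version, home_dir / "bin"


def install_python(bin_dir: Path) -> Path:
    """Hypothetical: place an interpreter in bin_dir (symlink, else copy)."""
    target = bin_dir / Path(sys.executable).name
    try:
        target.symlink_to(sys.executable)
    except OSError:
        # Symlinks may be unavailable (e.g. on Windows without privileges).
        shutil.copy2(sys.executable, target)
    return target


def install_activate(home_dir: Path, bin_dir: Path) -> Path:
    """Hypothetical: write a bare-bones bash activate script into bin_dir."""
    script = bin_dir / "activate"
    script.write_text(
        f'VIRTUAL_ENV="{home_dir}"\n'
        "export VIRTUAL_ENV\n"
        '_OLD_VIRTUAL_PATH="$PATH"\n'
        f'PATH="$VIRTUAL_ENV/{bin_dir.name}:$PATH"\n'
        "export PATH\n"
    )
    return script


def create_environment(home_dir: Path) -> Path:
    """Hypothetical: run the steps in order and return the env's interpreter."""
    lib_dir, inc_dir, bin_dir = path_locations(home_dir)
    for directory in (lib_dir / "site-packages", inc_dir, bin_dir):
        directory.mkdir(parents=True, exist_ok=True)
    py_executable = install_python(bin_dir)
    # Step skipped here: installing packaging tools such as pip.
    install_activate(home_dir, bin_dir)
    return py_executable


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp) / "my_first_env"
        print("interpreter:", create_environment(home))
        for path in sorted(home.rglob("*")):
            print(path.relative_to(home))
```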
Activating and Deactivating
That leaves the question of what activate and deactivate are up to. We can inspect the activate script easily enough by reading it; it lives in the environment’s bin directory.
The first thing we notice is that a bash function called deactivate is defined right at the top. We’ll come back to it later in this section, but this is the definition of the deactivate command we call to leave the virtualenv. Past that, the lines that actually activate the environment are so brief you might miss them entirely.
Note that we’re saving the old PATH and building a new one with our virtualenv’s bin directory prepended! This means that the next time we run python, the shell finds the interpreter we installed into our virtualenv, which is pointed at our own libraries, instead of the default system-installed interpreter. With that in mind, let’s look at the deactivate function.
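The shell walks PATH from left to right and runs the first matching executable it finds. The sketch below demonstrates this with shutil.which and two throwaway directories, each holding a fake python executable. It is POSIX-only: on Windows, which() also consults PATHEXT, so a bare file named python would not match. The directory names are hypothetical.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path
from typing import Optional


def make_fake_python(directory: Path) -> Path:
    """Create a dummy executable file named 'python' in directory (POSIX only)."""
    exe = directory / "python"
    exe.write_text("#!/bin/sh\necho fake python\n")
    exe.chmod(exe.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return exe


def which_python(path_dirs) -> Optional[str]:
    """Resolve 'python' against an explicit PATH, searching left to right."""
    search_path = os.pathsep.join(str(d) for d in path_dirs)
    return shutil.which("python", path=search_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        system_bin = Path(tmp, "usr_bin")
        env_bin = Path(tmp, "my_first_env", "bin")
        system_bin.mkdir()
        env_bin.mkdir(parents=True)
        make_fake_python(system_bin)
        make_fake_python(env_bin)
        # Before activation: only the "system" directory is searched.
        print(which_python([system_bin]))
        # After activation: the env's bin dir is prepended, so it wins.
        print(which_python([env_bin, system_bin]))
```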
The important part here is the resetting of old environment variables (most notably PATH, which goes back to the value saved during activation). You’ll also notice other details in this file, like the setting and unsetting of your shell prompt to include the name of the currently active virtualenv.
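To tie activation and deactivation together, here is a Python model of what the bash functions do to the environment. Activation saves the old values, prepends the env’s bin directory to PATH, and tags the prompt; deactivation later puts everything back. The model works on any mutable mapping, such as a plain dict or os.environ. The bookkeeping names _OLD_VIRTUAL_PATH, _OLD_VIRTUAL_PS1, and VIRTUAL_ENV mirror the ones I believe the bash script uses, but treat them as assumptions. The real script also handles more cases than this sketch; for example, a prompt that was originally unset comes back here as an empty string.

```python
import os
from typing import MutableMapping


def activate(environ: MutableMapping[str, str], env_dir: str, env_name: str) -> None:
    """Model of sourcing activate: save old values, then modify them."""
    old_path = environ.get("PATH", "")
    old_prompt = environ.get("PS1", "")
    environ["_OLD_VIRTUAL_PATH"] = old_path
    environ["_OLD_VIRTUAL_PS1"] = old_prompt
    environ["VIRTUAL_ENV"] = env_dir
    # Prepend the env's bin dir so its python is found first.
    environ["PATH"] = os.path.join(env_dir, "bin") + os.pathsep + old_path
    # Show the active env's name in the prompt.
    environ["PS1"] = f"({env_name}) {old_prompt}"


def deactivate(environ: MutableMapping[str, str]) -> None:
    """Model of deactivate: restore saved values and drop the bookkeeping."""
    if "_OLD_VIRTUAL_PATH" in environ:
        environ["PATH"] = environ.pop("_OLD_VIRTUAL_PATH")
    if "_OLD_VIRTUAL_PS1" in environ:
        environ["PS1"] = environ.pop("_OLD_VIRTUAL_PS1")
    environ.pop("VIRTUAL_ENV", None)


if __name__ == "__main__":
    shell_env = {"PATH": "/usr/local/bin:/usr/bin", "PS1": "$ "}
    activate(shell_env, "/home/me/my_first_env", "my_first_env")
    print(shell_env["PS1"], shell_env["PATH"])
    deactivate(shell_env)
    print(shell_env)
```

Running this model on os.environ would only affect the current Python process and its children, never the parent shell. That is exactly why the real activate script has to be sourced rather than executed.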
And that’s it – you’ve uncovered the basics of how virtualenv works!
Just the beginning
Just like that, we’ve taken a nontrivial tool and pulled it apart into understandable pieces. I certainly didn’t understand every part of what we found immediately (and that’s perfectly okay and expected!), but through some persistent searching and effort, it all makes sense. And every time I go through this process with a new tool, I find myself understanding more and more of what is going on, and gaining greater familiarity with various python tools.
On the subject of virtualenv in particular, I’ve since discovered an excellent overview of its guts from PyCon 2011. Take a look if you’re interested in even more detail on the subject (like why using a particular Python interpreter changes where system packages are looked up).
So go find something you don’t understand! I’ve been elbow deep in learning the various horrors of python packaging lately, so perhaps I’ll continue this series with a look into some aspect of that. Either way, I hope I’ve encouraged you to not be afraid of jumping into unfamiliar territory and transforming code from mysterious to understood.