Verifying PyPI and Conda Packages

Before we get started here, I must confess something, I know very little about package verification, cryptography or security. This is really a brain dump of everything I have learned from browsing the internet over the last few months.

The Question

The first thing to think about when considering packaging verification is not the technical implementation of anything, it's what are we trying to achieve by using crypto to do anything?

Is the package I am installing the one the package maintainer uploaded?

This I think is the question, I am the maintainer of the sunpy package, and if you are installing SunPy and you are security conscious you probably want to be able to check that the thing you are installing on your machine (potentially with root access) is indeed the code I uploaded. This presents an inherit trust in me the maintainer that I am not deliberately going to mess with your computer, but this is probably as good as you are going to get.

So taking this a little further, what does trusting me actually mean? When using PGP (we will get on to this later) people often talk about a web of trust, in that you trust Bob who trusts me so then you can trust me. This model has many flaws, mainly that not enough people use PGP to make it work. It also misses the point, what you are actually trusting is the git repo I am building the source tarball from. You are trusting that I trust the source I am releasing in the first place.

My GitHub Account is what you are Trusting.

This I think is the core point I am making here, what you are trusting is effectively my GitHub account. Someone with my GitHub account could make changes to the SunPy repo and do something malicious that I may not notice when I do a release. (You are also trust all the other people with merge rights, but one thing at a time.)

My GitHub Account is Linked to my PGP key

My Keybase Profile

So you can know that the person who has control over the PGP key with the fingerprint 60BC5C03E6276769 has access to my GitHub account, and therefore can change the source code you are installing. I also sign my commits with the same PGP key, so you know that they all came from the person with control of that key, and that that person has control of the Cadair account.

My Trust Model

I think this can be summarised as: "You trust me to write your software, so you trust me to run it on your computer". Which in terms of PGP equates to: "This key signed these commits and is linked to this GitHub account, I therefore trust a package made with it."

This is subtly but importantly different from the normal trust model of PGP where you are trying to verify that they key belongs to and is controlled by me the real person Stuart Mumford. In this model knowing that it's the same person who has control of the git repo and GitHub account is sufficient.

Using the Trust Model

This is the second stage. We have determined that we are happy to trust a key that we can associate with a GitHub account and some of the commits in a repo, so how do we use this to verify that the same key is the one that signed the package we are about to install?

PyPI (pip install)

Currently when I upload the SunPy release to PyPI I can sign it with my PGP key and upload the signature alongside the source. The problem is that you would have to run the following commands:

curl | gpg --import
pip download --no-deps sunpy
gpg --verify sunpy-0.7.2.tar.gz.asc sunpy-0.7.2.tar.gz
pip install sunpy-0.7.2.tar.gz

To verify that the package you had downloaded was indeed signed by the correct pgp key. Which is silly. There is a little work going on to make this easier e.g. pypa/pip#1035. Various efforts like this seem to be stalling over worries about the trust model and lulling users into a false sense of security.

Conda Package Builds

Recently I started using the excellent conda forge project to start building conda binaries for SunPy. This makes it much easier for people to install and use SunPy, but now you not only have to trust the source file, but you have to trust the binary built package as well.

What does trusting the binary look like? Do you trust that the build bot has been honest and built the package from source correctly? Should conda-forge have a PGP key with which the build bot signs the packages so you know it was indeed built by the build bot? How can conda forge verify the integrity of the source file, and how does it pass that trust onto the end user?

I don't know the answers to any of these questions, and the trust model gets much more tricky when you start considering a build bot being run on CI services where even more people could interfere with the process. My current opinion is that conda forge could do gpg verification on the source from PyPI and then sign the binaries, so that you know that conda forge trusted the original source and that you have downloaded a binary built on the conda-forge build bots. Is that enough?


TL;DR: Python package signing is hard, but I think possible from the pip perspective, once you quantify the minimum you want from a trust model. Conda binaries are a whole different problem, and I can't see an easy solution.

Please direct comments to @stuartmumford.


Contents © 2018 Stuart Mumford Creative Commons License - Powered by Nikola