author | Matthew Sotoudeh <matthewsot@outlook.com> | 2020-08-19 19:39:48 -0700 |
---|---|---|
committer | Matthew Sotoudeh <matthewsot@outlook.com> | 2020-08-19 19:39:48 -0700 |
commit | 5edcf3b97c4c77b654af177bfa27558d9b88b52f (patch) | |
tree | 8c2cb868709b422894b16cb83135eefb2a390a13 | |
parent | 98afe57511c41b28be6b128bf78e2bc5e780f450 (diff) |
Add Bazel_Python source
-rw-r--r-- | BUILD | 10 |
-rw-r--r-- | README.md | 143 |
-rw-r--r-- | WORKSPACE | 1 |
-rw-r--r-- | bazel_python.bzl | 120 |
-rwxr-xr-x | pywrapper.sh | 4 |
-rwxr-xr-x | setup_python.sh | 43 |
6 files changed, 320 insertions, 1 deletions
@@ -0,0 +1,10 @@
+exports_files([
+    "._dummy_.py",
+    "pywrapper.sh",
+])
+
+sh_library(
+    name = "pywrapper",
+    srcs = ["pywrapper.sh"],
+    visibility = ["//:__subpackages__"],
+)
@@ -1,2 +1,143 @@
 # bazel_python
-Support for reproducibly running Python scripts using Bazel.
+A simple way to use Python reproducibly within Bazel.
+
+## One-Time Setup
+First, install the packages necessary to build Python with commonly-used
+modules. On Ubuntu to get `pip`, `zlib`, and `bz2` modules, this looks like:
+```bash
+sudo apt install build-essential zlib1g-dev libssl-dev libbz2-dev
+```
+
+**NOTE:** if you do not have OpenSSL/`libssl-dev` installed, `pip` package
+installation will **not** work and you **will** get unexplained errors about
+missing Python dependencies.
+
+Use the `setup_python.sh` script to install a global copy of Python. DARG uses
+Python 3.7.4, so you can execute:
+```bash
+./setup_python.sh 3.7.4 $HOME/.bazel_python
+```
+You may append `--enable-optimizations` to enable Python build-time
+optimizations, however be warned that this can add significantly to the install
+time. You may run this script multiple times to install different versions of
+Python, however you should always use the same install target directory (e.g.,
+`$HOME/.bazel_python` above). Each version will be placed in its own
+subdirectory of that target.
+
+## Per-Project Usage
+1. Add a `requirements.txt` with the pip requirements you need.
+2. In your `WORKSPACE` add:
+```python
+load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")
+
+git_repository(
+    name = "bazel_python",
+    commit = "{COMMIT_GOES_HERE}",
+    remote = "https://github.com/95616ARG/bazel_python.git",
+)
+
+load("@bazel_python//:bazel_python.bzl", "bazel_python")
+
+bazel_python()
+```
+3. In your root `BUILD` file add:
+```python
+load("@bazel_python//:bazel_python.bzl", "bazel_python_interpreter")
+
+bazel_python_interpreter(
+    python_version = "3.7.4",
+    requirements_file = "requirements.txt",
+)
+```
+
+## Known Issues
+### Missing Modules
+If you get errors about missing modules (e.g., `pytest not found`), please
+triple-check that you have installed OpenSSL libraries. On Ubuntu this looks
+like `apt install libssl-dev`.
+
+### Breaking The Sandbox
+Even if you don't use these `bazel_python` rules, you may notice that
+`py_binary` rules can include Python libraries that are not explicitly depended
+on. This is due to the fact that Bazel creates its sandbox using symbolic
+links, and Python will _follow symlinks_ when looking for a package.
+
+### Bazel-Provided Python Packages
+Many Bazel packages come "helpfully" pre-packaged with relevant Python code,
+which Bazel will then add to the `PYTHONPATH`. For example, when you depend on
+a Python GRPC-Protobuf rule, it will automatically add a copy of the GRPC
+Python library to your `PYTHONPATH`. This is normally fine, except that GRPC
+Python library is likely outdated and for the wrong Python version. The way to
+fix this is to depend on `grpc` in your `requirements.txt`, then remove the
+offending parts of `sys.path` before importing `grpc` like so:
+```python
+import sys
+sys.path = [path for path in sys.path if "/com_github_grpc_grpc/" not in path]
+import grpc
+```
+Note this might cause problems if the path to the current repository contains
+`/com_github_grpc_grpc/`. We are on the lookout for a better solution
+long-term.
+
+### Non-Hermetic Builds
+Although this process ensures everyone is using the same _version_ of Python,
+it does not make assurances about the _configuration_ of each of those Python
+instances. For example, someone who ran the `setup_python.sh` script with
+`--enable-optimizations` might see different performance numbers. You can
+check the output of `setup_python.sh` to see which optional modules were not
+installed.
+
+### Duplicates in `~/.bazelrc`
+After building Python, `setup_python.sh` will append to your `~/.bazelrc` file
+a pointer to the path to the python parent directory provided. If you
+call `setup_python.sh` multiple times (e.g. to install multiple versions or
+re-install a single version), then multiple copies of that will be added to
+`~/.bazelrc`. These duplicates can be removed safely.
+
+### `:` Characters in Path
+Python's venv hard-codes a number of paths in a way that Bazel violates by
+moving everything around all the time. We resolve this by replacing those
+hard-coded paths with a relative one that should work at run time in the Bazel
+sandbox. However, this find-and-replace is currently done with `sed` using a
+`:` character as the delimiter. This means that *If the path to Bazel's
+internal sandbox directory has a `:` character, our find and replace will
+fail.* If you notice errors that are otherwise unexplained, it may be worth
+double-checking that you don't have paths with `:` characters in them.
+
+### Installs Twice
+For some reason, Bazel seems to enjoy running the pip-installation script
+twice, an extra time with the note "for host." I'm not entirely sure why this
+is, but it doesn't seem to cause any problems other than slowing down the first
+build.
+
+### Custom Name
+Need to support custom directory naming in pywrapper.
+
+## Tips
+### Using Python in a Genrule
+To use the interpreter in a genrule, depend on it in the tools and make sure to
+source the venv before calling `python3`:
+```python
+genrule(
+    cmd = """
+    PYTHON_VENV=$(location //:bazel_python_venv)
+    pushd $$PYTHON_VENV/..
+    source bazel_python_venv_installed/bin/activate
+    popd
+
+    python3 ...
+    """,
+    tools = ["//:bazel_python_venv"],
+)
+```
+
+Note that the `activate` script currently assumes you are calling it from right
+above `bazel_python_venv_installed`, hence you must change to that directory
+first.
+
+## Tested Operating Systems
+We have tested these rules on the following operating systems:
+* Ubuntu 20.04
+* Ubuntu 18.04
+* Ubuntu 16.04
+* macOS Catalina
diff --git a/WORKSPACE b/WORKSPACE
new file mode 100644
index 0000000..167651d
--- /dev/null
+++ b/WORKSPACE
@@ -0,0 +1 @@
+workspace(name = "bazel_python")
diff --git a/bazel_python.bzl b/bazel_python.bzl
new file mode 100644
index 0000000..53e67b0
--- /dev/null
+++ b/bazel_python.bzl
@@ -0,0 +1,120 @@
+load("@bazel_tools//tools/python:toolchain.bzl", "py_runtime_pair")
+
+def bazel_python(venv_name = "bazel_python_venv"):
+    """Workspace rule setting up bazel_python for a repository.
+
+    Arguments
+    =========
+    @venv_name should match the 'name' argument given to the
+        bazel_python_interpreter call in the BUILD file.
+    """
+    native.register_toolchains("//:" + venv_name + "_toolchain")
+
+def bazel_python_interpreter(
+        python_version,
+        name = "bazel_python_venv",
+        requirements_file = None,
+        **kwargs):
+    """BUILD rule setting up a bazel_python interpreter (venv).
+
+    Arguments
+    =========
+    @python_version should be the Python version string to use (e.g. 3.7.4 is
+        the standard for DARG projects). You must run the setup_python.sh
+        script with this version number.
+    @name is your preferred Bazel name for referencing this. The default should
+        work unless you run into a name conflict.
+    @requirements_file should be the name of a file in the repository to use as
+        the pip requirements.
+    @kwargs are passed to bazel_python_venv.
+    """
+    bazel_python_venv(
+        name = name,
+        python_version = python_version,
+        requirements_file = requirements_file,
+        **kwargs
+    )
+
+    # https://stackoverflow.com/questions/47036855
+    native.py_runtime(
+        name = name + "_runtime",
+        files = ["//:" + name],
+        interpreter = "@bazel_python//:pywrapper.sh",
+        python_version = "PY3",
+    )
+
+    # https://github.com/bazelbuild/rules_python/blob/master/proposals/2019-02-12-design-for-a-python-toolchain.md
+    native.constraint_value(
+        name = name + "_constraint",
+        constraint_setting = "@bazel_tools//tools/python:py3_interpreter_path",
+    )
+
+    native.platform(
+        name = name + "_platform",
+        constraint_values = [
+            ":" + name + "_constraint",
+        ],
+    )
+
+    py_runtime_pair(
+        name = name + "_runtime_pair",
+        py3_runtime = name + "_runtime",
+    )
+
+    native.toolchain(
+        name = name + "_toolchain",
+        target_compatible_with = [],
+        toolchain = "//:" + name + "_runtime_pair",
+        toolchain_type = "@bazel_tools//tools/python:toolchain_type",
+    )
+
+def _bazel_python_venv_impl(ctx):
+    """A Bazel rule to set up a Python virtual environment.
+
+    Also installs requirements specified by @ctx.attr.requirements_file.
+    """
+    if "BAZEL_PYTHON_DIR" not in ctx.var:
+        fail("You must run setup_python.sh for " + ctx.attr.python_version)
+    python_parent_dir = ctx.var.get("BAZEL_PYTHON_DIR")
+    python_version = ctx.attr.python_version
+    python_dir = python_parent_dir + "/" + python_version
+
+    # TODO: Fail if python_dir does not exist.
+    venv_dir = ctx.actions.declare_directory("bazel_python_venv_installed")
+    inputs = []
+    command = """
+        export PATH={py_dir}/bin:$PATH
+        export PATH={py_dir}/include:$PATH
+        export PATH={py_dir}/lib:$PATH
+        export PATH={py_dir}/share:$PATH
+        export PYTHON_PATH={py_dir}:{py_dir}/bin:{py_dir}/include:{py_dir}/lib:{py_dir}/share
+        python3 -m venv {out_dir}
+        source {out_dir}/bin/activate
+    """
+    if ctx.attr.requirements_file:
+        command += "pip3 install -r " + ctx.file.requirements_file.path
+        inputs.append(ctx.file.requirements_file)
+    command += ctx.attr.run_after_pip
+    command += """
+        REPLACEME=$PWD/'{out_dir}'
+        REPLACEWITH='$PWD/bazel_python_venv_installed'
+        # This prevents sed from trying to modify the directory. We may want to
+        # do a more targeted sed in the future.
+        rm -rf {out_dir}/bin/__pycache__
+        sed -i'' -e s:$REPLACEME:$REPLACEWITH:g {out_dir}/bin/*
+    """
+    ctx.actions.run_shell(
+        command = command.format(py_dir = python_dir, out_dir = venv_dir.path),
+        inputs = inputs,
+        outputs = [venv_dir],
+    )
+    return [DefaultInfo(files = depset([venv_dir]))]
+
+bazel_python_venv = rule(
+    implementation = _bazel_python_venv_impl,
+    attrs = {
+        "python_version": attr.string(),
+        "requirements_file": attr.label(allow_single_file = True),
+        "run_after_pip": attr.string(),
+    },
+)
diff --git a/pywrapper.sh b/pywrapper.sh
new file mode 100755
index 0000000..a1ea558
--- /dev/null
+++ b/pywrapper.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+
+source bazel_python_venv_installed/bin/activate
+python "$@"
diff --git a/setup_python.sh b/setup_python.sh
new file mode 100755
index 0000000..80f2d37
--- /dev/null
+++ b/setup_python.sh
@@ -0,0 +1,43 @@
+#!/bin/bash
+
+if [ $# -lt 2 ]; then
+    echo "This is $(basename $0). Usage:"
+    echo "$(basename $0) [version] [/path/to/install/parent/directory] [python configure flags]"
+    echo "Example:"
+    echo "$(basename $0) 3.7.4 $HOME/.bazel_python --enable-optimizations"
+    exit 1
+fi
+
+version=$1
+shift
+install_parent_dir=$1
+shift
+install_dir=$install_parent_dir/$version
+
+read -p "Installing Python $version. This will *OVERWRITE* $install_dir. Continue? [y/N] " -r
+if [[ $REPLY =~ ^[Yy]$ ]]
+then
+    rm -rf $install_dir
+    mkdir -p $install_dir
+    cd $install_dir
+
+    curl -OL https://www.python.org/ftp/python/$version/Python-$version.tgz
+    tar -xzf Python-$version.tgz
+    cd Python-$version
+
+    ./configure --prefix=$install_dir $@
+
+    make -j
+    make install
+    cd $install_dir
+    rm -rf Python-$version
+    rm -rf Python-$version.tgz
+
+    echo "Success!"
+    echo "Writing Installation Directory to $HOME/.bazelrc"
+    echo "If you have run this script multiple times, you may safely remove duplicate lines from $HOME/.bazelrc"
+    echo "build --define BAZEL_PYTHON_DIR=$install_parent_dir" >> $HOME/.bazelrc
+    echo "run --define BAZEL_PYTHON_DIR=$install_parent_dir" >> $HOME/.bazelrc
+else
+    echo "Aborting."
+fi
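The `:` Characters in Path issue described in the README above comes from using `:` as the `sed` delimiter in the venv rule. One possible way around it, sketched here in Python and not taken from this commit, is to rewrite the hard-coded prefix with plain string replacement, which has no delimiter to collide with. The function name `rewrite_prefix` and its parameters are hypothetical and not part of `bazel_python`; the sketch also assumes the files of interest are UTF-8 text.

```python
from pathlib import Path


def rewrite_prefix(bin_dir: Path, old_prefix: str, new_prefix: str) -> int:
    """Replace old_prefix with new_prefix in the text files directly in bin_dir.

    Hypothetical helper (not part of bazel_python). Returns the number of
    files that were modified.
    """
    changed = 0
    for path in sorted(bin_dir.iterdir()):
        # Skip directories (e.g. __pycache__) and symlinks (e.g. the
        # interpreter links that venv places in bin/).
        if path.is_symlink() or not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # not UTF-8 text, most likely a binary; leave untouched
        if old_prefix not in text:
            continue
        # str.replace treats both arguments literally, so ':' (or any other
        # character) in either path is harmless.
        path.write_text(text.replace(old_prefix, new_prefix), encoding="utf-8")
        changed += 1
    return changed
```

Under these assumptions, a call such as `rewrite_prefix(Path(out_dir) / "bin", old, new)` would play the role of the `sed` line in `_bazel_python_venv_impl`. Wiring it into the rule would still require invoking Python from the shell command, which this sketch does not attempt.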