author    Matthew Sotoudeh <matthewsot@outlook.com>  2020-08-19 19:39:48 -0700
committer Matthew Sotoudeh <matthewsot@outlook.com>  2020-08-19 19:39:48 -0700
commit    5edcf3b97c4c77b654af177bfa27558d9b88b52f
tree      8c2cb868709b422894b16cb83135eefb2a390a13
parent    98afe57511c41b28be6b128bf78e2bc5e780f450
Add Bazel_Python source
-rw-r--r--  BUILD              10
-rw-r--r--  README.md         143
-rw-r--r--  WORKSPACE           1
-rw-r--r--  bazel_python.bzl  120
-rwxr-xr-x  pywrapper.sh        4
-rwxr-xr-x  setup_python.sh    43
6 files changed, 320 insertions, 1 deletions
diff --git a/BUILD b/BUILD
new file mode 100644
index 0000000..dbb524c
--- /dev/null
+++ b/BUILD
@@ -0,0 +1,10 @@
+exports_files([
+    "._dummy_.py",
+    "pywrapper.sh",
+])
+
+sh_library(
+    name = "pywrapper",
+    srcs = ["pywrapper.sh"],
+    visibility = ["//:__subpackages__"],
+)
diff --git a/README.md b/README.md
index df8168f..daad245 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,143 @@
# bazel_python
-Support for reproducibly running Python scripts using Bazel.
+A simple way to use Python reproducibly within Bazel.
+
+## One-Time Setup
+First, install the packages necessary to build Python with commonly-used
+modules. On Ubuntu to get `pip`, `zlib`, and `bz2` modules, this looks like:
+```bash
+sudo apt install build-essential zlib1g-dev libssl-dev libbz2-dev
+```
+
+**NOTE:** if you do not have OpenSSL/`libssl-dev` installed, `pip` package
+installation will **not** work and you **will** get unexplained errors about
+missing Python dependencies.
+
+Use the `setup_python.sh` script to install a global copy of Python. DARG uses
+Python 3.7.4, so you can execute:
+```bash
+./setup_python.sh 3.7.4 $HOME/.bazel_python
+```
+You may append `--enable-optimizations` to enable Python build-time
+optimizations, but be warned that this can add significantly to the install
+time. You may run this script multiple times to install different versions of
+Python, but you should always use the same install target directory (e.g.,
+`$HOME/.bazel_python` above). Each version will be placed in its own
+subdirectory of that target.
+
+## Per-Project Usage
+1. Add a `requirements.txt` with the pip requirements you need.
+2. In your `WORKSPACE` add:
+```python
+load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")
+
+git_repository(
+    name = "bazel_python",
+    commit = "{COMMIT_GOES_HERE}",
+    remote = "https://github.com/95616ARG/bazel_python.git",
+)
+
+load("@bazel_python//:bazel_python.bzl", "bazel_python")
+
+bazel_python()
+```
+3. In your root `BUILD` file add:
+```python
+load("@bazel_python//:bazel_python.bzl", "bazel_python_interpreter")
+
+bazel_python_interpreter(
+    python_version = "3.7.4",
+    requirements_file = "requirements.txt",
+)
+```
+
+## Known Issues
+### Missing Modules
+If you get errors about missing modules (e.g., `pytest not found`), please
+triple-check that you have installed OpenSSL libraries. On Ubuntu this looks
+like `apt install libssl-dev`.
+
+### Breaking The Sandbox
+Even if you don't use these `bazel_python` rules, you may notice that
+`py_binary` rules can include Python libraries that are not explicitly depended
+on. This is due to the fact that Bazel creates its sandbox using symbolic
+links, and Python will _follow symlinks_ when looking for a package.
+
+### Bazel-Provided Python Packages
+Many Bazel packages come "helpfully" pre-packaged with relevant Python code,
+which Bazel will then add to the `PYTHONPATH`. For example, when you depend on
+a Python GRPC-Protobuf rule, it will automatically add a copy of the GRPC
+Python library to your `PYTHONPATH`. This is normally fine, except that GRPC
+Python library is likely outdated and for the wrong Python version. The way to
+fix this is to depend on `grpc` in your `requirements.txt`, then remove the
+offending parts of `sys.path` before importing `grpc` like so:
+```python
+import sys
+sys.path = [path for path in sys.path if "/com_github_grpc_grpc/" not in path]
+import grpc
+```
+Note this might cause problems if the path to the current repository contains
+`/com_github_grpc_grpc/`. We are on the lookout for a better solution
+long-term.
+
+### Non-Hermetic Builds
+Although this process ensures everyone is using the same _version_ of Python,
+it does not make assurances about the _configuration_ of each of those Python
+instances. For example, someone who ran the `setup_python.sh` script with
+`--enable-optimizations` might see different performance numbers. You can
+check the output of `setup_python.sh` to see which optional modules were not
+installed.
+
+### Duplicates in `~/.bazelrc`
+After building Python, `setup_python.sh` will append to your `~/.bazelrc` file
+a pointer to the path to the python parent directory provided. If you
+call `setup_python.sh` multiple times (e.g. to install multiple versions or
+re-install a single version), then multiple copies of that will be added to
+`~/.bazelrc`. These duplicates can be removed safely.
+
+### `:` Characters in Path
+Python's venv hard-codes a number of paths in a way that Bazel violates by
+moving everything around all the time. We resolve this by replacing those
+hard-coded paths with a relative one that should work at run time in the Bazel
+sandbox. However, this find-and-replace is currently done with `sed` using a
+`:` character as the delimiter. This means that *If the path to Bazel's
+internal sandbox directory has a `:` character, our find and replace will
+fail.* If you notice errors that are otherwise unexplained, it may be worth
+double-checking that you don't have paths with `:` characters in them.
+
+### Installs Twice
+For some reason, Bazel seems to enjoy running the pip-installation script
+twice, an extra time with the note "for host." I'm not entirely sure why this
+is, but it doesn't seem to cause any problems other than slowing down the first
+build.
+
+### Custom Name
+`pywrapper.sh` hard-codes `bazel_python_venv_installed`; custom directory names are not yet supported.
+
+## Tips
+### Using Python in a Genrule
+To use the interpreter in a genrule, depend on it in the tools and make sure to
+source the venv before calling `python3`:
+```python
+genrule(
+    cmd = """
+    PYTHON_VENV=$(location //:bazel_python_venv)
+    pushd $$PYTHON_VENV/..
+    source bazel_python_venv_installed/bin/activate
+    popd
+
+    python3 ...
+    """,
+    tools = ["//:bazel_python_venv"],
+)
+```
+
+Note that the `activate` script currently assumes you are calling it from right
+above `bazel_python_venv_installed`, hence you must change to that directory
+first.
+
+## Tested Operating Systems
+We have tested these rules on the following operating systems:
+* Ubuntu 20.04
+* Ubuntu 18.04
+* Ubuntu 16.04
+* macOS Catalina
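
Review note on the README's "Bazel-Provided Python Packages" workaround: the inline list comprehension can be wrapped in a small helper so the filtering rule is stated once and can be unit-tested. This is a minimal sketch, not part of this commit. The name `drop_repo_paths` is hypothetical, and it assumes the unwanted entries contain the external repository name as a whole path component. It shares the README's caveat: if the path to your own repository contains that component, those entries are dropped too.

```python
def drop_repo_paths(paths, repo_name):
    """Return the entries of `paths` that do not have `repo_name` as a
    whole path component (hypothetical helper, not part of bazel_python)."""
    kept = []
    for entry in paths:
        # Normalize separators so the check also works on Windows-style paths.
        components = entry.replace("\\", "/").split("/")
        if repo_name not in components:
            kept.append(entry)
    return kept
```

In a script this would be called as `sys.path = drop_repo_paths(sys.path, "com_github_grpc_grpc")` before `import grpc`. Matching whole components rather than substrings keeps directories whose names merely contain the repository name.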
diff --git a/WORKSPACE b/WORKSPACE
new file mode 100644
index 0000000..167651d
--- /dev/null
+++ b/WORKSPACE
@@ -0,0 +1 @@
+workspace(name = "bazel_python")
diff --git a/bazel_python.bzl b/bazel_python.bzl
new file mode 100644
index 0000000..53e67b0
--- /dev/null
+++ b/bazel_python.bzl
@@ -0,0 +1,120 @@
+load("@bazel_tools//tools/python:toolchain.bzl", "py_runtime_pair")
+
+def bazel_python(venv_name = "bazel_python_venv"):
+    """Workspace rule setting up bazel_python for a repository.
+
+    Arguments
+    =========
+    @venv_name should match the 'name' argument given to the
+        bazel_python_interpreter call in the BUILD file.
+    """
+    native.register_toolchains("//:" + venv_name + "_toolchain")
+
+def bazel_python_interpreter(
+        python_version,
+        name = "bazel_python_venv",
+        requirements_file = None,
+        **kwargs):
+    """BUILD rule setting up a bazel_python interpreter (venv).
+
+    Arguments
+    =========
+    @python_version should be the Python version string to use (e.g. 3.7.4 is
+        the standard for DARG projects). You must run the setup_python.sh
+        script with this version number.
+    @name is your preferred Bazel name for referencing this. The default should
+        work unless you run into a name conflict.
+    @requirements_file should be the name of a file in the repository to use as
+        the pip requirements.
+    @kwargs are passed to bazel_python_venv.
+    """
+    bazel_python_venv(
+        name = name,
+        python_version = python_version,
+        requirements_file = requirements_file,
+        **kwargs
+    )
+
+    # https://stackoverflow.com/questions/47036855
+    native.py_runtime(
+        name = name + "_runtime",
+        files = ["//:" + name],
+        interpreter = "@bazel_python//:pywrapper.sh",
+        python_version = "PY3",
+    )
+
+    # https://github.com/bazelbuild/rules_python/blob/master/proposals/2019-02-12-design-for-a-python-toolchain.md
+    native.constraint_value(
+        name = name + "_constraint",
+        constraint_setting = "@bazel_tools//tools/python:py3_interpreter_path",
+    )
+
+    native.platform(
+        name = name + "_platform",
+        constraint_values = [
+            ":" + name + "_constraint",
+        ],
+    )
+
+    py_runtime_pair(
+        name = name + "_runtime_pair",
+        py3_runtime = name + "_runtime",
+    )
+
+    native.toolchain(
+        name = name + "_toolchain",
+        target_compatible_with = [],
+        toolchain = "//:" + name + "_runtime_pair",
+        toolchain_type = "@bazel_tools//tools/python:toolchain_type",
+    )
+
+def _bazel_python_venv_impl(ctx):
+    """A Bazel rule to set up a Python virtual environment.
+
+    Also installs requirements specified by @ctx.attr.requirements_file.
+    """
+    if "BAZEL_PYTHON_DIR" not in ctx.var:
+        fail("You must run setup_python.sh for " + ctx.attr.python_version)
+    python_parent_dir = ctx.var.get("BAZEL_PYTHON_DIR")
+    python_version = ctx.attr.python_version
+    python_dir = python_parent_dir + "/" + python_version
+
+    # TODO: Fail if python_dir does not exist.
+    venv_dir = ctx.actions.declare_directory("bazel_python_venv_installed")
+    inputs = []
+    command = """
+        export PATH={py_dir}/bin:$PATH
+        export PATH={py_dir}/include:$PATH
+        export PATH={py_dir}/lib:$PATH
+        export PATH={py_dir}/share:$PATH
+        export PYTHON_PATH={py_dir}:{py_dir}/bin:{py_dir}/include:{py_dir}/lib:{py_dir}/share
+        python3 -m venv {out_dir}
+        source {out_dir}/bin/activate
+    """
+    if ctx.attr.requirements_file:
+        command += "pip3 install -r " + ctx.file.requirements_file.path
+        inputs.append(ctx.file.requirements_file)
+    command += ctx.attr.run_after_pip
+    command += """
+        REPLACEME=$PWD/'{out_dir}'
+        REPLACEWITH='$PWD/bazel_python_venv_installed'
+        # This prevents sed from trying to modify the directory. We may want to
+        # do a more targeted sed in the future.
+        rm -rf {out_dir}/bin/__pycache__
+        sed -i'' -e s:$REPLACEME:$REPLACEWITH:g {out_dir}/bin/*
+    """
+    ctx.actions.run_shell(
+        command = command.format(py_dir = python_dir, out_dir = venv_dir.path),
+        inputs = inputs,
+        outputs = [venv_dir],
+    )
+    return [DefaultInfo(files = depset([venv_dir]))]
+
+bazel_python_venv = rule(
+    implementation = _bazel_python_venv_impl,
+    attrs = {
+        "python_version": attr.string(),
+        "requirements_file": attr.label(allow_single_file = True),
+        "run_after_pip": attr.string(),
+    },
+)
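
Review note on the `sed` step in `_bazel_python_venv_impl`: because `:` is used as the `sed` delimiter, a sandbox path containing `:` breaks the substitution, which is the issue the README lists under "`:` Characters in Path". The sketch below shows one possible alternative, assuming a Python helper could be run at that point in the action: a literal find-and-replace that needs no delimiter at all. The function name `rewrite_paths` and its arguments are hypothetical and not part of this commit. It skips directories, symlinks, and files that are not valid UTF-8, which is an assumption about what should be left untouched in `bin/`.

```python
import os


def rewrite_paths(bin_dir, old, new):
    """Replace every literal occurrence of `old` with `new` in the regular
    text files directly under `bin_dir`; return the names of changed files.
    (Hypothetical helper, sketched as an alternative to the sed call.)"""
    changed = []
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        # Skip directories (e.g. __pycache__) and symlinks (e.g. bin/python3).
        if os.path.islink(path) or not os.path.isfile(path):
            continue
        try:
            with open(path, encoding="utf-8", newline="") as handle:
                text = handle.read()
        except UnicodeDecodeError:
            continue  # Not a text file; leave it alone.
        if old in text:
            # Plain string replacement: no regex and no delimiter, so ':'
            # (or any other character) in `old` is handled literally.
            with open(path, "w", encoding="utf-8", newline="") as handle:
                handle.write(text.replace(old, new))
            changed.append(name)
    return changed
```

Writing back to the same path leaves the file mode untouched, so scripts such as `activate` and `pip3` stay executable.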
diff --git a/pywrapper.sh b/pywrapper.sh
new file mode 100755
index 0000000..a1ea558
--- /dev/null
+++ b/pywrapper.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+
+source bazel_python_venv_installed/bin/activate
+python "$@"
diff --git a/setup_python.sh b/setup_python.sh
new file mode 100755
index 0000000..80f2d37
--- /dev/null
+++ b/setup_python.sh
@@ -0,0 +1,43 @@
+#!/bin/bash
+
+if [ $# -lt 2 ]; then
+    echo "This is $(basename $0). Usage:"
+    echo "$(basename $0) [version] [/path/to/install/parent/directory] [python configure flags]"
+    echo "Example:"
+    echo "$(basename $0) 3.7.4 $HOME/.bazel_python --enable-optimizations"
+    exit 1
+fi
+
+version=$1
+shift
+install_parent_dir=$1
+shift
+install_dir=$install_parent_dir/$version
+
+read -p "Installing Python $version. This will *OVERWRITE* $install_dir. Continue? [y/N] " -r
+if [[ $REPLY =~ ^[Yy]$ ]]
+then
+    rm -rf $install_dir
+    mkdir -p $install_dir
+    cd $install_dir
+
+    curl -OL https://www.python.org/ftp/python/$version/Python-$version.tgz
+    tar -xzf Python-$version.tgz
+    cd Python-$version
+
+    ./configure --prefix=$install_dir "$@"
+
+    make -j
+    make install
+    cd $install_dir
+    rm -rf Python-$version
+    rm -rf Python-$version.tgz
+
+    echo "Success!"
+    echo "Writing Installation Directory to $HOME/.bazelrc"
+    echo "If you have run this script multiple times, you may safely remove duplicate lines from $HOME/.bazelrc"
+    echo "build --define BAZEL_PYTHON_DIR=$install_parent_dir" >> $HOME/.bazelrc
+    echo "run --define BAZEL_PYTHON_DIR=$install_parent_dir" >> $HOME/.bazelrc
+else
+    echo "Aborting."
+fi
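
Review note on the `~/.bazelrc` handling: `setup_python.sh` appends its two `--define` lines unconditionally, which is why the README lists "Duplicates in `~/.bazelrc`" as a known issue. The sketch below shows one way an append could be made idempotent. The helper `append_once` is hypothetical and not part of this commit; it assumes that an exact, whole-line match is the right test for "already present".

```python
import os


def append_once(rc_path, line):
    """Append `line` to `rc_path` unless an identical line already exists.
    Returns True if the file was modified. (Hypothetical helper.)"""
    content = ""
    if os.path.exists(rc_path):
        with open(rc_path, encoding="utf-8") as handle:
            content = handle.read()
    if line in content.splitlines():
        return False
    with open(rc_path, "a", encoding="utf-8") as handle:
        # Start on a fresh line if the file does not end with a newline.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True
```

Called once for the `build --define BAZEL_PYTHON_DIR=...` line and once for the `run --define BAZEL_PYTHON_DIR=...` line, this would leave the file unchanged on repeat installs into the same parent directory.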