
Fetching External Dependencies

Before we can work with the code in this repository, we need the toolchains and third-party dependencies that it relies on.

By the end of this section, you should be able to run bazel fetch to download these for the language you pick (Java, JavaScript, Go, or Python).

If you're stuck, ask the instructor or teaching assistants for help!

Concepts

Definitions

A "Bazel module" is a Bazel project that can have multiple versions, each of which publishes metadata about other modules that it depends on.

"Starlark" is a dialect of Python used to configure Bazel, as well as some other tools.

A "Starlark module" is a different concept: a .bzl file that we can load symbols from.

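"bzlmod" is the package manager for Bazel modules. It was experimental in Bazel 5 and became generally available in Bazel 6.0, where it is turned on with the --enable_bzlmod flag. Read more in the documentation: https://bazel.build/build/bzlmod#modules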

Try it: bazel fetch

Create a MODULE.bazel file at the root of the repository.

Let's find a dependency to add. Go to the Web UI for the Bazel Central Registry: https://registry.bazel.build

Search for "bazel-lib", which is a basic library with some simple building blocks.

bazel-lib on the registry

Use the button to copy the text from the "Install" code block, and paste it into MODULE.bazel.
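The pasted snippet is a single bazel_dep call. Here is a sketch of what MODULE.bazel looks like afterwards; the version number is only an example, so keep whichever version the registry gave you.

```starlark
# MODULE.bazel at the repository root.
# The version is an example only; use the one shown on registry.bazel.build.
bazel_dep(name = "aspect_bazel_lib", version = "1.28.0")
```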

Now you can ask Bazel to fetch that package:

% bazel fetch @aspect_bazel_lib//lib:all

This tutorial lets you pick the languages you want to work with. Repeat this for other dependencies you'd like to use. You can search for "jvm", "go", "python", "ts", etc.

Language dependencies

Bzlmod delegates to other language-specific package managers, such as pip for Python, pnpm for JavaScript, Coursier for Java, and so on. One exception: C++ doesn't really have a popular "package manager", so the Bazel Central Registry is accidentally becoming one.

In most cases, we'll still declare these dependencies in the language's canonical files, though under Bazel they should always be "pinned" for reproducible builds. We want to preserve interoperability with existing tools such as editors and static analyzers as much as possible, and those tools understand each language's idiomatic files.

note
  • "pinned" means the versions of both direct and transitive dependencies are exactly specified
  • a pinned file can also include integrity hashes, for supply-chain security

The steps to do this vary a bit between languages, but they all have the following rough outline:

  1. You may leave the developer's constraints alone
    • Use semver ranges as appropriate. For example our frontend/package.json allows any version of http-server and our requirements.txt allows any version of requests.
  2. Pin transitive dependencies to a constant version
    • These are generally written to a separate "lock" file.
  3. Mirror that dependency list into Starlark
    • This allows Bazel to manage the dependencies itself.
  4. Add code to expose external repositories for use by BUILD targets
    • The instructions for each language should tell you how to do this.
caution

In practice, you'll find that not all rulesets document bzlmod usage well yet. You can get a hint by finding a ruleset's tests.

On https://registry.bazel.build, click the "View registry source" link for a module and open its presubmit.yml file. It points to a subfolder containing a test module. These tests are executable examples, so they show how the module is meant to be used.

For example,

bcr_test_module:
  module_path: 'e2e/bzlmod'

Then navigate to the e2e/bzlmod folder in the ruleset's repository. The registry's presubmit checks build that folder, so you'll find an example there that is known to work.

Try it: pin the transitive dependencies

Bazel's reproducibility can only be as good as the information it's given. Each external package manager has a feature to pin the dependencies.

Your goal is to produce the following files, for the languages you care about:

  • go.mod -> go.sum
  • package.json -> pnpm-lock.yaml
  • requirements.txt -> requirements_lock.txt
  • Java sources -> maven_install.json

You'll have to read the documentation for the ruleset you use to find out how to do this.
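As one example, rules_python provides a compile_pip_requirements macro that runs pip-compile under Bazel to produce requirements_lock.txt. This is a minimal sketch, assuming both requirements files live in the root package; check the rules_python documentation for your version.

```starlark
# BUILD.bazel (repository root)
load("@rules_python//python:pip.bzl", "compile_pip_requirements")

# Defines ":requirements.update", which regenerates the lock file from
# requirements.txt, and ":requirements_test", which fails if the lock file
# is stale. In some rules_python versions requirements_lock.txt must already
# exist (it may be empty) before the first update.
compile_pip_requirements(
    name = "requirements",
    requirements_in = "requirements.txt",
    requirements_txt = "requirements_lock.txt",
)
```

Running bazel run //:requirements.update then writes the pinned requirements_lock.txt.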

Try it: Mirror dependency list into Starlark

danger

As of January 2023, these rulesets don't yet document usage with bzlmod well. Welcome to the bleeding edge.

Go

See https://github.com/bazelbuild/bazel-gazelle#update-repos.

However, instead of adding go_repository rules to WORKSPACE, we need to add go_deps.module calls to MODULE.bazel.
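A sketch of the bzlmod form, with example versions. Recent Gazelle releases can read go.mod (and the checksums in go.sum) directly with go_deps.from_file, which saves writing a go_deps.module(path = ..., sum = ..., version = ...) call for each dependency. Whether your Gazelle version supports from_file is an assumption to check against its documentation.

```starlark
# MODULE.bazel
# Versions are examples; pick current ones from registry.bazel.build.
bazel_dep(name = "rules_go", version = "0.39.1")
bazel_dep(name = "gazelle", version = "0.30.0")

go_deps = use_extension("@gazelle//:extensions.bzl", "go_deps")

# Reads the dependency list from go.mod instead of
# hand-written go_deps.module(...) calls.
go_deps.from_file(go_mod = "//:go.mod")

# Only repositories imported here are visible to this module's BUILD files.
use_repo(
    go_deps,
    "org_golang_google_grpc",
    "org_golang_google_protobuf",
)
```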

Java

See @maven//:pin
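Under bzlmod, rules_jvm_external is configured through its maven module extension rather than maven_install in WORKSPACE. This is a minimal sketch; the version and artifact coordinates are examples only. The pin target's name has varied between releases (@maven//:pin in some, @unpinned_maven//:pin in others), so check the README for the version you use.

```starlark
# MODULE.bazel
# Version and artifact coordinates are examples only.
bazel_dep(name = "rules_jvm_external", version = "5.3")

maven = use_extension("@rules_jvm_external//:extensions.bzl", "maven")
maven.install(
    artifacts = ["com.google.guava:guava:31.1-jre"],
    # The file written by the pin target. Add this line only after the
    # first pin, because the referenced file must exist.
    lock_file = "//:maven_install.json",
    repositories = ["https://repo1.maven.org/maven2"],
)
use_repo(maven, "maven")
```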

Python

See pip_parse
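With bzlmod, pip_parse is exposed as the parse tag of rules_python's pip extension. This is a minimal sketch, assuming rules_python 0.31.0 as an example version. Older releases used name instead of hub_name and didn't take python_version, so match the attributes to the documentation for your version.

```starlark
# MODULE.bazel
# Version is an example; attribute names changed across rules_python releases.
bazel_dep(name = "rules_python", version = "0.31.0")

# Hermetic interpreter used to install the wheels.
python = use_extension("@rules_python//python/extensions:python.bzl", "python")
python.toolchain(python_version = "3.11")

pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    hub_name = "pip",
    python_version = "3.11",
    requirements_lock = "//:requirements_lock.txt",
)
use_repo(pip, "pip")
```

Recent rules_python releases expose packages with labels like @pip//requests, while older ones used labels like @pip//:requests_pkg.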

JavaScript

See npm_translate_lock

note

rules_js can read the pnpm-lock.yaml file directly and mirror it into Starlark while fetching, so there is no generated Starlark file to maintain. If you like, you can also have Bazel update pnpm-lock.yaml for you; in that case you'll check in an extra generated file as well.
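A minimal sketch of the rules_js bzlmod setup, following the pattern in its README; the version number is an example. npm_translate_lock reads pnpm-lock.yaml while fetching and generates the @npm repository from it.

```starlark
# MODULE.bazel
# Version is an example only.
bazel_dep(name = "aspect_rules_js", version = "1.29.2")

npm = use_extension("@aspect_rules_js//npm:extensions.bzl", "npm", dev_dependency = True)

# Reads pnpm-lock.yaml during the fetch and creates the @npm repository.
npm.npm_translate_lock(
    name = "npm",
    pnpm_lock = "//:pnpm-lock.yaml",
)
use_repo(npm, "npm")
```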

Caching fetches: the repository cache

Fetching external dependencies can be slow. In a big monorepo, you'll download many large files for hermetic toolchains.

Bazel caches these in the $(bazel info repository_cache) folder.

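  • Only downloaded files are cached.
  • Always give the integrity hash (for example sha256): it's the cache key, and without it the cache isn't consulted.
  • There's no cache for the external repositories that repository rules create, so that work is redone whenever a repository has to be recreated (for example on a fresh machine or after bazel clean --expunge). This is a frequent de-optimization.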
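To illustrate the integrity-hash point, here is a WORKSPACE sketch for a hypothetical archive. The name, URL, and checksum are placeholders rather than a real dependency; the sha256 value must be replaced with the archive's actual checksum before this will work.

```starlark
# WORKSPACE.bazel
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Hypothetical dependency, for illustration only.
http_archive(
    name = "some_library",
    # The checksum is the repository cache key: with it, Bazel can reuse a
    # previously downloaded copy instead of going to the network.
    # Placeholder: replace with the real sha256 of the archive.
    sha256 = "<sha256 of some_library-1.0.tar.gz>",
    strip_prefix = "some_library-1.0",
    urls = ["https://example.com/some_library-1.0.tar.gz"],
)
```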

Configuring the downloader

Bazel's downloader is full-featured, and you can use it to block undesired network access, fetch via your corporate proxy or artifact repository, and more.

Eager fetches

Developers shouldn't need to fetch things they don't use. For example, a developer in one language shouldn't be blocked waiting to download toolchains for some other language.

load is eager

A load statement in Starlark is evaluated eagerly during the loading phase, so the repository it loads from is fetched right away, whether or not any requested target needs it.

This de-optimization is easily introduced, and is typically only diagnosed when developers complain about "slow initial builds".

In WORKSPACE/MODULE.bazel

Loads in these files happen for every single build, regardless of the dependency graph or which targets the user requests, because Bazel must evaluate the complete WORKSPACE and MODULE.bazel files to understand what third-party dependencies exist. Let's say the WORKSPACE file contains this content:

WORKSPACE.bazel
load("@rules_python//python:pip.bzl", "pip_parse")

pip_parse(
    name = "my_deps",
    requirements_lock = "//path/to:requirements_lock.txt",
)

load("@my_deps//:requirements.bzl", "install_deps")
install_deps()

Because the second load statement reads from @my_deps, the my_deps repository is requested at loading time, and so the pip_parse implementation will run. If it uses a hermetic Python interpreter, then that interpreter must be built or fetched for every build.

In BUILD.bazel

In this example, a BUILD file loads from @npm:

BUILD.bazel
load("@npm//@bazel/typescript:index.bzl", "ts_project")

package(default_visibility = ["//visibility:public"])

ts_project(
    name = "a",
    srcs = glob(["*.ts"]),
    declaration = True,
    tsconfig = "//:tsconfig.json",
    deps = [
        "@npm//@types/node",
        "@npm//tslib",
    ],
)

filegroup(name = "b")

Even if a developer only asks Bazel to build the filegroup b, the load statement means that the @npm repository must be fetched.
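One way to avoid this, sketched with a hypothetical package path, is to move targets that don't need @npm into a package whose BUILD file has no load from it:

```starlark
# other/BUILD.bazel (hypothetical package path)
# There is no load() from @npm in this file, so building //other:b
# does not require fetching the @npm repository.
package(default_visibility = ["//visibility:public"])

filegroup(name = "b")
```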

Try it: fetch 3p packages

Let's add one of our language-specific package files and fetch again. Bazel's dependency graph should always determine which dependencies are fetched, lazily and only as needed. For example, a third-party package that isn't used anywhere in the repository should never be fetched.

note

Bazel also has a sync command, but this is rarely useful and not covered here.

Go

% bazel fetch @org_golang_google_grpc//... @org_golang_google_protobuf//...

Python

% bazel fetch @pip//:requests_pkg

Java

% bazel fetch @maven//...

JavaScript

The release notes for a rules_js release say how to fetch the npm packages.

This requires a BUILD.bazel file in the project root, since labels such as //:pnpm-lock.yaml only resolve when the root directory is a Bazel package.

% touch BUILD.bazel
% bazel fetch @npm//:all
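Once @npm exists, rules_js projects commonly link the npm packages from that root BUILD.bazel file. This is a minimal sketch, assuming the repository is named npm as in the fetch command.

```starlark
# BUILD.bazel (repository root)
load("@npm//:defs.bzl", "npm_link_all_packages")

# Creates a :node_modules/<package> target for each package in pnpm-lock.yaml.
npm_link_all_packages(name = "node_modules")
```

Because this file loads from @npm, building any target in the root package fetches @npm. That's the same eager-fetch trade-off as with the ts_project example, so it's worth keeping unrelated targets in other packages.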