Version: 5.10.x

Remote Cache & Execution

Remote resources

Every Workflows deployment includes a remote cache compliant with the Bazel Remote Execution Protocol v2. This means that the CI runners instantiated by Workflows can take advantage of work done by previous builds, leading to significantly faster warm builds. All traffic between the CI runners and the remote cache stays within the VPC of the cloud provider, meaning there is no data leakage or egress, and only builds running on the provisioned CI runners can read from or write to the cache.

Additionally, remote execution is available with minimal additional configuration needed. This allows for the creation of specially tailored runners for individual jobs, configured throughout the build tree. This also means that jobs can be parallelized effectively, where pending actions can unblock many parallel runners at once without repeated work. As a result, cost can be effectively managed by only provisioning large workers for large actions, which can then be spun down more readily while the smaller actions continue on smaller, less powerful workers. Remote execution supports both arm64 and amd64 architectures simultaneously, meaning that a remote execution cluster can be provisioned that supports cross-compilation/publishing or any other cross-platform use case, with minimal additional configuration.

Finally, Workflows supports the provisioning of external-facing resources for use in an office/team environment. With a secure OpenID Connect (OIDC)-based or HTTP Basic Auth-based authentication scheme, only authenticated users will be able to access the externally facing remote resource cluster. With an external cluster, a team can make better use of off-site resources to work faster and more effectively, reducing thrash caused by individual machine settings and resource allocation. Configuration of the external cluster is discussed in detail below.

Enable remote execution

By default, remote execution is not enabled in Workflows remote resource clusters. This is because remote execution requires a special focus on sandboxing and reproducibility beyond what may be required on an ephemeral runner or a user's machine. That said, remote execution can be enabled simply by adding the following to the Workflows remote cluster configuration.

From within your Aspect Workflows module definition, add the following code:

remote = {
  remote_execution = {
    default = {
      image = "<some Docker image>"
    }
  }
}

This configuration spins up a new set of runners that pick up work using the Docker image specified. All the scaling and provisioning is otherwise handled automatically, and the endpoint is the same as the one for the remote cache. To use this in Bazel, a new platform needs to be added that looks something like the following.

In a BUILD file, e.g. <WORKSPACE root>/platforms/BUILD.bazel

platform(
    name = "my_remote_platform",
    exec_properties = {
        "OSFamily": "linux",
        "container-image": "docker://<some Docker image>",
    },
)

Then, the .bazelrc file needs to be updated to point to the new platform for a given configuration, e.g. --config rbe:

build:rbe --extra_execution_platforms=@<WORKSPACE name>//<path to platform>:my_remote_platform
# local fallback allows genrule to be executed locally if requested explicitly
build:rbe --genrule_strategy=remote,local
build:rbe --host_platform=@<WORKSPACE name>//<path to platform>:my_remote_platform
build:rbe --jobs=32
build:rbe --remote_timeout=3600

Note: the above is simply an example. Please adjust as needed for a given use case.
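
For example, with the --config rbe settings above in place, a build can be sent to the remote executors from the command line:

bazel build --config rbe //...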

Once completed, jobs should be able to run on the remote executors seamlessly, provided the configuration points to the right place. Additional build failures may be encountered initially that did not occur before; as stated previously, remote execution requires greater attention to detail in the structure of the build tree.

Further configuration of the remote executors, including but not limited to direct provisioning of the underlying compute, can be found in the Workflows configuration documentation.

Enable an external remote resource cluster

For customers that have team workflows that require an externalized remote cluster, one can be bootstrapped with minimal additional configuration. Please note that this does not externalize the remote cluster used by the CI runners. This creates a separate cluster explicitly for external use cases, with the necessary authentication to permit use by the Bazel command-line tool on individual machines.

Before enabling the external remote cache, a Route53 public hosted zone is required for the domain that fronts the external cache. If another DNS provider such as Cloudflare is used to provision DNS, you must point your external DNS at Route53 after the zone is created. For Cloudflare, follow this guide. Provisioning instructions for Route53 and the external cache follow in the next section.

Also, pass vpc_subnets_public to the Aspect Workflows module, in order for the remote cluster to be exposed to the public Internet.
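
For example, a minimal sketch of passing public subnets to the module (assuming vpc_subnets_public accepts a list of public subnet IDs; the IDs shown are placeholders for the public subnets of your VPC):

module "aspect_workflows" {
  ...
  vpc_subnets_public = ["<public subnet ID 1>", "<public subnet ID 2>"]
  ...
}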

Set up Route53 hosted zone

To provision a new Route53 public hosted zone, add the following Terraform to your existing code and apply. You do not need to create any additional records inside this zone. The Aspect Workflows module adds all required A records to make the external remote cache functional and discoverable.

module "remote-cache-dns" {
source = "terraform-aws-modules/route53/aws//modules/zones"
version = "2.10.2"

zones = {
"<DNS name from your provider, e.g. remote-cache.aspect.build>" = {
domain_name = "<DNS name>"
comment = "<DNS name>"
tags = {
Name = "<DNS name>"
}
}
}
}

Enable the external remote cluster

From within your Aspect Workflows module definition, add the following code:

module "aspect_workflows" {
...
external_remote = {
public_hosted_zone_id = module.remote-cache-dns.route53_zone_zone_id["<DNS name>"]
}
...
}

As with the CI runner remote cluster, you can customize the external remote cluster to suit your team's needs. When you apply the workflows module, a new, Internet-facing load balancer is spun up with either an HTTP Basic Auth or an OpenID Connect (OIDC) scheme over HTTPS/TLS. Instructions for invocation and use follow in the next section.

Enabling OIDC for the external remote cluster

If organizational rules require it, the external remote cluster can be configured to use OIDC as the authentication scheme. This is considered more secure than HTTP Basic Auth, which has a single shared key that is rotated every month. To enable this functionality, a customer must provide all the OIDC configuration options. Some guides for setting up OIDC with popular IdPs are included below.

oidc = {
  issuer                  = "https://<endpoint>"          // example
  auth_endpoint           = "https://<endpoint>/auth"     // example
  token_endpoint          = "https://<endpoint>/token"    // example
  user_info_endpoint      = "https://<endpoint>/userInfo" // example
  client_id               = "<id>"
  client_secret           = "<sensitive-secret>"          // this should be stored in a sensitive Terraform value
  session_timeout_seconds = 604800                        // 7 days in seconds, the default
}

Identity Provider Guides:

There are some important caveats to consider when using OIDC. First, all authentication concerns are outsourced to the specific IdP. This means that if the IdP sets controls over timeouts of credentials, the remote cluster has no control over those settings, and cannot override them. The remote cluster also has no concept of "log out", and so will continue to allow access until credentials expire. Finally, because of how OIDC works, the cluster will cache the access token for a user on sign in, and then use the refresh token to refresh it until the session token expires (as configured above). This means that if a user's access to the IdP is revoked while the access token is still active, their session will still be valid, but their refresh event will fail as soon as the access token expires. This window is typically small, but is wholly at the discretion of the IdP.

Connecting Bazel to the external remote cluster

Depending on the authentication scheme used, there are different processes for connecting Bazel to the external remote cluster. Once configured, the external remote cluster should be just as performant as the CI runner cluster, and can be tuned to meet any team workload requirements.

Common settings

Regardless of underlying authentication scheme, the following settings need to be added to a user's .bazelrc file to enable connectivity to the remote resource cluster.

# if using the remote cache
build --remote_cache="grpcs://aw-cache-ext.<DNS name>:8980" --remote_accept_cached --remote_upload_local_results
# if using the remote executor
build --remote_executor="grpcs://aw-cache-ext.<DNS name>:8980"

The above lines can be collapsed into one at the user's discretion. The last two flags for the remote cache default to true and can be omitted if no other configuration overrides those defaults.
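
For example, when both the remote cache and the remote executor are in use, the flags can be combined on a single line:

build --remote_executor="grpcs://aw-cache-ext.<DNS name>:8980" --remote_cache="grpcs://aw-cache-ext.<DNS name>:8980"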

OIDC

In order for Bazel to obtain up-to-the-second valid credentials for a given IdP and OIDC configuration, a special utility called a credential helper is used. For Workflows, Aspect has developed a purpose-built credential helper designed to work with Workflows-instantiated remote clusters. It will not work with remote clusters from any other provider. First, download the correct credential helper for a given platform.

Once downloaded, unzip the file, which should provide the credential-helper binary. Move the binary somewhere accessible from the user's $PATH (meaning it can be invoked directly from a terminal). It can optionally be renamed, e.g. to aspect-credential-helper if there are other helpers on the user's machine; if renamed, use the new name in the commands below. This binary can be reused for all Aspect external remote clusters, independent of the underlying OIDC provider per cluster.
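
For example, on a Linux or macOS machine the binary might be installed like so (the archive name and destination directory are illustrative):

unzip credential-helper-<platform>.zip
chmod +x credential-helper
mv credential-helper /usr/local/bin/   # or any other directory on $PATH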

Once the binary is downloaded and placed on the $PATH, the following line must be added to the .bazelrc file to point Bazel at the credential helper for the remote cluster:

build --credential_helper="aw-cache-ext.<DNS name>"=credential-helper

Once the configuration is complete, a user must log in to their IdP by running the following command on the command line:

credential-helper login aw-cache-ext.<DNS name>

This will save the user's credentials in a local keychain for retrieval on each Bazel build. When the underlying session token expires, the user will have to run the same command again. So long as they are continually signed in to their IdP in the background, they will not need to sign in more frequently than that, as the refresh token will retrieve up-to-date credentials in the backend.

HTTP Basic Auth

The Workflows module stores the HTTP Basic Auth username and password combination in AWS Systems Manager Parameter Store.

The SecureString parameter name is aw_external_cache_auth_header. A quick link to the parameter is: https://<AWS_REGION>.console.aws.amazon.com/systems-manager/parameters/aw_external_cache_auth_header/description?tab=Table.
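
For example, the value can be retrieved with the AWS CLI (assuming credentials with read access to SSM):

aws ssm get-parameter --name aw_external_cache_auth_header --with-decryption --query 'Parameter.Value' --output text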

Once the parameter value is retrieved, it can be added to the Bazel command as follows:

--remote_header="Authorization=Basic <INSERT AUTH KEY FROM SSM HERE>"
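
For convenience, the header can instead be stored in the user's home ~/.bazelrc rather than passed on every invocation; since the value is a credential, it should not be committed to a repository .bazelrc:

# in ~/.bazelrc -- this value is a credential, keep it out of version control
build --remote_header="Authorization=Basic <INSERT AUTH KEY FROM SSM HERE>"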

Optional: accessing the CI runner cache from a non-Aspect runner instance

It may be desirable to access the CI runner remote cache from a runner that is not managed by Aspect, and thus outside the VPC in which the remote cache is contained. For instance, there may be pre-existing hardware that needs to share the cache with the CI runners.

To accomplish this, peer your VPC to the Aspect VPC, then allow ingress from your VPC to the CI runner remote cache by adding a new rule to its security group. An example of the security group rule follows; a sketch of the peering connection itself appears after it:

data "aws_vpc" "vpc" {
id = "<Aspect VPC ID>"
}

data "aws_security_group" "aw_cache_alb_sg" {
vpc_id = data.aws_vpc.vpc.id

filter {
name = "tag:Name"
values = ["aw-cache-service"]
}
}

resource "aws_security_group_rule" "legacy_vpc_ingress" {
type = "ingress"
description = "gRPC from legacy VPC"
security_group_id = data.aws_security_group.aw_cache_alb_sg.id
cidr_blocks = ["<Your VPC CIDR blocks>"]
protocol = "TCP"
from_port = 8980
to_port = 8980
}
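
The peering connection itself is left to the reader. A minimal Terraform sketch follows, assuming both VPCs are in the same account and region (cross-account or cross-region peering additionally requires accepting the connection on the peer side); the route table ID is a placeholder:

# Peering between your VPC and the Aspect VPC (same account and region assumed)
resource "aws_vpc_peering_connection" "to_aspect" {
  vpc_id      = "<Your VPC ID>"
  peer_vpc_id = data.aws_vpc.vpc.id
  auto_accept = true
}

# Route traffic destined for the Aspect VPC CIDR through the peering connection
resource "aws_route" "to_aspect" {
  route_table_id            = "<Your route table ID>"
  destination_cidr_block    = data.aws_vpc.vpc.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.to_aspect.id
}

Depending on the setup, a return route from the Aspect VPC to your CIDR block may also be required.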

Once the security group has been updated, download the self-signed certificate tied to the load balancer for the CI runner cache and attach it to Bazel calls. First, retrieve the cache endpoint and its certificate, either from AWS SSM, or from two outputs from the Aspect Workflows module:

  • external_remote_cache_endpoint: the CI runner remote cache endpoint
  • internal_remote_cache_certificate: the certificate tied to the CI runner remote cache load balancer
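
If these outputs are re-exported from your root Terraform module, the certificate can, for example, be written to a file for later use with Bazel's --tls_certificate flag (the file name is arbitrary):

terraform output -raw internal_remote_cache_certificate > aspect-cache-ca.pem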

When using BuildBarn's bb-clientd, add to the configuration file as follows:

grpc: {
  tls: {
    server_certificate_authorities: "<internal_remote_cache_certificate contents>"
  }
}

Note that this block will need to be added wherever gRPC configuration is present in the configuration file.

When using the Bazel command-line tool directly, add the following flags:

--remote_cache=grpcs://<external_remote_cache_endpoint>:8980 --tls_certificate=<file containing internal_remote_cache_certificate>