Remote Cache & Execution
Remote resources
Every Workflows deployment includes a remote cache compliant with the Bazel Remote Execution API v2. This means that the CI runners instantiated by Workflows can take advantage of previous work, leading to significantly faster warm builds. All traffic between the CI runners and the remote cache takes place solely within the VPC of the cloud provider, meaning there is no data leakage or egress, and only the builds the CI runners provision can read from or write to the cache.
Additionally, remote execution is available with minimal additional configuration needed. This allows for the creation of specially tailored runners for individual jobs, configured throughout the build tree. This also means that jobs can be parallelized effectively, where pending actions can unblock many parallel runners at once without repeated work. As a result, cost can be effectively managed by only provisioning large workers for large actions, which can then be spun down more readily while the smaller actions continue on smaller, less powerful workers. Remote execution supports both arm64 and amd64 architectures simultaneously, meaning that a remote execution cluster can be provisioned that supports cross-compilation/publishing or any other cross-platform use case, with minimal additional configuration.
Finally, Workflows supports the provisioning of external-facing resources for use in an office/team environment. With a secure OpenID Connect (OIDC)-based or HTTP Basic Auth-based authentication scheme, only authenticated users will be able to access the externally facing remote resource cluster. With an external cluster, a team can make better use of off-site resources to work faster and more effectively, reducing thrash caused by individual machine settings and resource allocation. Configuration of the external cluster is discussed in detail below.
Enable remote execution
By default, remote execution is not enabled in Workflows remote resource clusters. This is because remote execution requires a special focus on sandboxing and reproducibility beyond what may be required on an ephemeral runner or a user's machine. That said, remote execution can be enabled simply by adding the following to the Workflows remote cluster configuration.
From within your Aspect Workflows module definition, add the following code:
remote = {
remote_execution = {
default = {
image = "<some Docker image>"
}
}
}
This configuration spins up a new set of runners that pick up work using the Docker image specified. All the scaling and provisioning is otherwise handled automatically, and the endpoint is the same as the one for the remote cache. To use this in Bazel, a new platform needs to be added that looks something like the following.
In a BUILD file, e.g. <WORKSPACE root>/platforms/BUILD.bazel
platform(
name = "my_remote_platform",
exec_properties = {
"OSFamily": "linux",
"container-image": "docker://<some Docker image>",
},
)
Then, the .bazelrc file needs to be updated to point to the new platform for a given configuration, e.g. --config rbe:
build:rbe --extra_execution_platforms=@<WORKSPACE name>//<path to platform>:my_remote_platform
# local fallback allows genrule to be executed locally if requested explicitly
build:rbe --genrule_strategy=remote,local
build:rbe --host_platform=@<WORKSPACE name>//<path to platform>:my_remote_platform
build:rbe --jobs=32
build:rbe --remote_timeout=3600
Note: the above is simply an example. Please adjust as needed for a given use case.
Once completed, jobs should be able to run on the remote executors seamlessly, provided the configuration points to the right place. Additional build failures may be encountered initially that did not occur before; as stated previously, remote execution requires greater attention to detail in the structure of the build tree.
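Execution requirements can also be set per target, which is how large actions can be routed to larger workers while everything else stays on the default pool. The following is a hedged sketch: the target name, the "Pool" key, and the pool name are illustrative assumptions and must be matched to whatever worker groups your remote_execution configuration actually defines.

```starlark
# BUILD.bazel -- target name and pool name are illustrative only
cc_binary(
    name = "large_link",
    srcs = ["main.cc"],
    # exec_properties on a target override those of the execution
    # platform, steering just this action to a larger worker pool.
    exec_properties = {
        "Pool": "large",
    },
)
```

Everything not annotated this way continues to run on the default pool from the platform definition.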
Further configuration of the remote executors, including but not limited to direct provisioning of the underlying compute, can be found in the Workflows configuration documentation.
Enable an external remote resource cluster
For customers whose team workflows require an externalized remote cluster, one can be bootstrapped with minimal additional configuration. Please note that this does not externalize the remote cluster used by the CI runners. This creates a separate cluster explicitly for external use cases, with the necessary authentication to permit use by the Bazel command-line tool on individual machines.
Before enabling the external remote cache, a Route53 public hosted zone is required for the domain that fronts the external cache. If another DNS provider such as Cloudflare manages your DNS, you must point your external DNS to Route53 after the zone is created. For Cloudflare, follow this guide. Provisioning instructions for Route53 and the external cache follow in the next section.
Also, pass vpc_subnets_public to the Aspect Workflows module so that the remote cluster is exposed to the public Internet.
Set up Route53 hosted zone
To provision a new Route53 public hosted zone, add the following Terraform to your existing code and apply. You do not need to create any additional records inside this zone. The Aspect Workflows module adds all required A records to make the external remote cache functional and discoverable.
module "remote-cache-dns" {
source = "terraform-aws-modules/route53/aws//modules/zones"
version = "2.10.2"
zones = {
"<DNS name from your provider, e.g. remote-cache.aspect.build>" = {
domain_name = "<DNS name>"
comment = "<DNS name>"
tags = {
Name = "<DNS name>"
}
}
}
}
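If DNS is delegated from another provider as described above, the nameservers Route53 assigns to the new zone must be copied to that provider as NS records. A minimal sketch, assuming this version of the module exposes a route53_zone_name_servers output (check the module documentation for your pinned version):

```hcl
# Surface the delegated nameservers so they can be copied to the
# upstream DNS provider (e.g. Cloudflare) after `terraform apply`.
output "remote_cache_nameservers" {
  value = module.remote-cache-dns.route53_zone_name_servers["<DNS name>"]
}
```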
Enable the external remote cluster
From within your Aspect Workflows module definition, add the following code:
module "aspect_workflows" {
...
external_remote = {
public_hosted_zone_id = module.remote-cache-dns.route53_zone_zone_id["<DNS name>"]
}
...
}
As with the CI runner remote cluster, you can customize the external remote cluster to suit your team's needs. When you apply the Workflows module, a new Internet-facing load balancer is spun up with either an HTTP Basic Auth or an OpenID Connect (OIDC) scheme over HTTPS/TLS. Instructions for invocation and use follow in the next section.
Enabling OIDC for the external remote cluster
If organizational rules require it, the external remote cluster can be configured to use OIDC as the authentication scheme. This is considered more secure than HTTP Basic Auth, which has a single shared key that is rotated every month. To enable this functionality, a customer must provide all the OIDC configuration options. Some guides for setting up OIDC with popular IdPs are included below.
oidc = {
issuer = "https://<endpoint>" // example
auth_endpoint = "https://<endpoint>/auth" // example
token_endpoint = "https://<endpoint>/token" // example
user_info_endpoint = "https://<endpoint>/userInfo" // example
client_id = "<id>"
client_secret = "<sensitive-secret>" // this should be stored in a sensitive Terraform value
session_timeout_seconds = 604800 // 7 days in seconds, the default
}
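The client_secret should never be committed in plain text. A minimal sketch of the standard Terraform pattern, assuming the secret is supplied at apply time (e.g. via the TF_VAR_oidc_client_secret environment variable or a secrets-manager data source):

```hcl
# Declare the secret as a sensitive input so Terraform redacts it
# from plan/apply output.
variable "oidc_client_secret" {
  type      = string
  sensitive = true
}
```

The oidc block would then reference it as client_secret = var.oidc_client_secret.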
Identity Provider Guides:
There are some important caveats to consider when using OIDC. First, all authentication concerns are outsourced to the specific IdP. This means that if the IdP sets controls over credential timeouts, the remote cluster has no control over those settings and cannot override them. The remote cluster also has no concept of "log out", and so will continue to allow access until credentials expire. Finally, because of how OIDC works, the cluster caches the access token for a user on sign-in, then uses the refresh token to refresh it until the session token expires (as configured above). This means that if a user's access to the IdP is revoked while an access token is still active, their session remains valid until that token expires, at which point the refresh will fail. This window is typically small, but is wholly at the discretion of the IdP.
Connecting bazel to the external remote cluster
Depending on the authentication scheme used, there are different processes for connecting bazel to the external remote cluster. Once configured, the external remote cluster should be just as performant as the CI runner cluster, and can be tuned to meet any team workload requirements.
Common settings
Regardless of the underlying authentication scheme, the following settings need to be added to a user's .bazelrc file to enable connectivity to the remote resource cluster.
# if using the remote cache
build --remote_cache="grpcs://aw-cache-ext.<DNS name>:8980" --remote_accept_cached --remote_upload_local_results
# if using the remote executor
build --remote_executor="grpcs://aw-cache-ext.<DNS name>:8980"
The above lines can be collapsed into one at the user's discretion. The last two flags for the remote cache default to true, and can be omitted if no configuration overrides those defaults.
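These settings can also be grouped under a named config so that the external cluster is only used when explicitly requested. A sketch, where the config name team-remote is illustrative:

```ini
# Invoke with: bazel build --config=team-remote //...
build:team-remote --remote_cache=grpcs://aw-cache-ext.<DNS name>:8980
build:team-remote --remote_executor=grpcs://aw-cache-ext.<DNS name>:8980
```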
OIDC
In order for bazel to obtain up-to-the-second valid credentials for a given IdP and OIDC configuration, a special utility called a credential helper is used. For Workflows, Aspect has developed a purpose-built credential helper designed to work with Workflows-instantiated remote clusters; it will not work with remote clusters from other providers. First, download the correct credential helper for your platform.
Once downloaded, unzip the file, which should yield the credential-helper binary. Move this binary somewhere on the user's $PATH (so it can be invoked directly from a terminal). It can optionally be renamed, e.g. to aspect-credential-helper, if there are other helpers on the user's machine. The binary can be reused for all Aspect external remote clusters, independent of the OIDC provider behind each cluster.
Once the binary is in place, the following line must be added to the .bazelrc file to point Bazel at the credential helper for the remote cluster:
build --credential_helper="aw-cache-ext.<DNS name>"=credential-helper
Once the configuration is complete, a user must log in to their IdP by running the following command on the command line:
credential-helper login aw-cache-ext.<DNS name>
This saves the user's credentials in a local keychain for retrieval on each Bazel build. When the underlying session token expires, the user will have to run the same command again. So long as they remain signed in to their IdP, they will not need to sign in more frequently than that, as the refresh token retrieves up-to-date credentials in the background.
HTTP Basic Auth
The Workflows module stores the HTTP Basic Auth username and password combination in AWS Systems Manager Parameter Store. The SecureString parameter name is aw_external_cache_auth_header. A quick link to the parameter: https://<AWS_REGION>.console.aws.amazon.com/systems-manager/parameters/aw_external_cache_auth_header/description?tab=Table
Once the parameter value is retrieved, it can be added to the bazel command as follows:
--remote_header="Authorization=Basic <INSERT AUTH KEY FROM SSM HERE>"
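Because the header value is a secret, it is best kept out of any checked-in .bazelrc. As a sketch, assuming the local AWS credentials have ssm:GetParameter permission on the parameter, the value can be fetched with the AWS CLI and appended to a user-local bazelrc:

```shell
# Fetch the SecureString value from SSM Parameter Store.
AUTH_KEY="$(aws ssm get-parameter \
  --name aw_external_cache_auth_header \
  --with-decryption \
  --query Parameter.Value \
  --output text)"

# Append to the user-local bazelrc so the secret stays out of version control.
echo "build --remote_header=\"Authorization=Basic ${AUTH_KEY}\"" >> "$HOME/.bazelrc"
```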
Optional: accessing the CI runner cache from a non-Aspect runner instance
It may be desirable to access the CI runner remote cache from a runner that is not managed by Aspect, and thus outside the VPC in which the remote cache is contained. For instance, there may be pre-existing hardware that needs to share the cache with the CI runners.
To accomplish this, peer your VPC to the Aspect VPC. Then, allow ingress from the VPC to the CI runner remote cache by adding a new rule to its security group. An example of this follows:
data "aws_vpc" "vpc" {
id = "<Aspect VPC ID>"
}
data "aws_security_group" "aw_cache_alb_sg" {
vpc_id = data.aws_vpc.vpc.id
filter {
name = "tag:Name"
values = ["aw-cache-service"]
}
}
resource "aws_security_group_rule" "legacy_vpc_ingress" {
type = "ingress"
description = "gRPC from legacy VPC"
security_group_id = data.aws_security_group.aw_cache_alb_sg.id
cidr_blocks = ["<Your VPC CIDR blocks>"]
protocol = "tcp"
from_port = 8980
to_port = 8980
}
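The peering step itself is environment-specific. A minimal sketch, reusing the aws_vpc data source defined above — the resource name and auto_accept are illustrative, and the route table entries required on both sides are omitted:

```hcl
# Peer your existing VPC with the Aspect VPC so traffic can reach
# the cache's load balancer.
resource "aws_vpc_peering_connection" "to_aspect" {
  vpc_id      = "<Your VPC ID>"
  peer_vpc_id = data.aws_vpc.vpc.id
  auto_accept = true
}
```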
Once the security group has been updated, download the self-signed certificate tied to the load balancer for the CI runner cache and attach it to Bazel invocations. First, retrieve the cache endpoint and its certificate, either from AWS SSM or from two outputs of the Aspect Workflows module:
- external_remote_cache_endpoint: the CI runner remote cache endpoint
- internal_remote_cache_certificate: the certificate tied to the CI runner remote cache load balancer
When using BuildBarn's bb-clientd, add to the configuration file as follows:
grpc: {
tls: {
server_certificate_authorities: "<internal_remote_cache_certificate contents>"
}
}
Note that this block will need to be added wherever gRPC configuration is present in the configuration file.
When using the Bazel command-line tool directly, add the following flags:
--remote_cache=grpcs://<external_remote_cache_endpoint>:8980 --tls_certificate=<file containing internal_remote_cache_certificate>
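As a sketch of wiring these outputs together from the shell (output names are taken from the list above; the certificate file path is illustrative):

```shell
# Write the certificate from the Terraform output to a local file.
terraform output -raw internal_remote_cache_certificate > /tmp/aw-cache.crt

# Point Bazel at the CI runner cache over TLS with that certificate.
bazel build //... \
  --remote_cache="grpcs://$(terraform output -raw external_remote_cache_endpoint):8980" \
  --tls_certificate=/tmp/aw-cache.crt
```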