workflows
Requirements
The following requirements are needed by this module:
Providers
The following providers are used by this module:
- aws (>= 4.67.0, < 6.0.0)
Modules
The following Modules are called:
alerting
Source: ./alerting
Version:
api
Source: ./api
Version:
bessie
Source: ./services/bessie
Version:
bk
Source: ./bk
Version:
cci
Source: ./cci
Version:
configuration
Source: ./utils/configuration_properties
Version:
core
Source: ./core
Version:
dashboards
Source: ./alerting/alarms/cloudwatch_dashboards
Version:
delivery
Source: ./delivery
Version:
external_remote
Source: ./remote
Version:
gh_api_token_secret
Source: ./utils/aws_secret
Version:
gha
Source: ./gha
Version:
gl
Source: ./gl
Version:
grafana
Source: ./telemetry/grafana
Version:
kilgore
Source: ./services/kilgore
Version:
logging
Source: ./logging
Version:
remote
Source: ./remote
Version:
telemetry
Source: ./telemetry
Version:
token_auth
Source: ./services/token_auth
Version:
warming
Source: ./warming
Version:
webapp
Source: ./webapp
Version:
Resources
The following resources are used by this module:
- aws_s3_object.runner_otel_config (resource)
- aws_caller_identity.default (data source)
- aws_ecr_authorization_token.token (data source)
- aws_partition.default (data source)
- aws_region.default (data source)
Required Inputs
The following input variables are required:
aspect_artifacts_bucket
Description: S3 bucket where Aspect delivers workflows assets
Type: string
customer_id
Description: Unique, human-readable customer identifier provided by Aspect
Type: string
hosts
Description: CI host configuration options
Type: list(string)
remote
Description: Configuration for the Bazel remote endpoint (cache and execution), specifically the ALB.
Type:
object({
debug_tools = optional(bool, false)
storage = object({
# Number of shards for the remote cache storage service
num_shards = optional(number, 3)
instance_type = optional(string)
instance_image_id = optional(string)
mirror = optional(bool)
ecs_agent_memory_mb = optional(number, 256)
})
frontend = optional(object({
cpu = optional(number, 1024)
memory = optional(number, 2048)
max_scaling = optional(number, 20)
min_scaling = optional(number)
}), {
cpu = 1024
memory = 2048
max_scaling = 20
})
downloader = optional(object({
frontend = optional(object({
cpu = optional(number, 1024)
memory = optional(number, 2048)
max_scaling = optional(number, 5)
min_scaling = optional(number, 1)
}), {
cpu = 1024
memory = 2048
max_scaling = 10
min_scaling = 1
})
}))
remote_execution = optional(object({
executors = map(object({
platform = optional(string)
image = string
additional_platform_properties = optional(map(string), {})
workers = optional(list(object({
scaling = optional(object({
minimum = optional(number)
maximum = optional(number)
warm = optional(number)
fast = optional(object({
scale_in = optional(object({
target = optional(number)
magnitude = optional(number)
minimum = optional(number)
cooldown = optional(number)
}))
scale_out = optional(object({
target = optional(number)
magnitude = optional(number)
cooldown = optional(number)
}))
}))
policy = optional(object({
target = optional(number, 100)
scale_in_cooldown = optional(number, 60)
scale_out_cooldown = optional(number, 60)
}), {
target = 100
scale_in_cooldown = 60
scale_out_cooldown = 60
})
rules = optional(map(object({
schedule = string
timezone = optional(string)
minimum = optional(number)
maximum = optional(number)
})))
}), {})
isolated_actions = optional(object({
cpu = optional(number, 1024)
memory = optional(number)
}))
network = optional(bool, true)
max_concurrency = optional(number)
max_download_concurrency = optional(number, 1000000)
max_upload_concurrency = optional(number, 1000000)
ec2 = optional(object({
instance_type = optional(string)
instance_image = optional(string)
docker_group = optional(string)
}))
ecs = optional(object({
architecture = optional(string)
cpu = optional(number, 1024)
memory = optional(number, 2048)
}))
runner = optional(object({
kill_processes = optional(bool)
set_tmp_dir_env_variable = optional(bool)
clean_tmp_directories = optional(list(string))
}))
docker_user = optional(string, null)
})), [{}])
}))
use_size_class_cache = optional(bool, false)
worker_cloudwatch_agent = optional(bool, true)
ecs_agent_memory_mb = optional(number, 256)
storage = optional(object({
num_shards = optional(number)
instance_type = optional(string)
instance_image_id = optional(string)
upstream_fallback = optional(bool, true)
mirror = optional(bool)
}))
}))
})
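As a rough illustration of the type above, a `remote` value might look like the following when written as an argument inside the module block. Only `storage` is required by the type. The executor name `default`, the image reference, and the scaling numbers are hypothetical placeholders, not recommended values.

```hcl
remote = {
  # storage is the only required attribute; an empty object accepts
  # the declared defaults (for example num_shards = 3).
  storage = {}

  # Optional: override frontend scaling. cpu and memory fall back to
  # their declared defaults (1024 and 2048).
  frontend = {
    min_scaling = 2
    max_scaling = 20
  }

  # Optional: configure remote execution with a single executor.
  remote_execution = {
    executors = {
      default = {
        image = "registry.example.com/rbe/ubuntu:latest" # hypothetical image
        workers = [{
          scaling = {
            minimum = 0
            maximum = 10
          }
        }]
      }
    }
  }
}
```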
support
Description: Set of properties that allow Aspect to provide oncall support for Workflows
Type:
object({
# If true, alerts generated by Workflows will be reported back to Aspect.
# Depending on the severity of the alert and the level of support included
# with this Workflows install, this may result in an oncall engineer being
# paged.
alert_aspect = optional(bool, true)
# A set of secret IDs that can be overridden if required.
secrets = optional(object({
# Override the secret ID used for fetching the PagerDuty routing key from Aspect's AWS account.
aspect_pagerduty_routing_key_id = optional(string)
# Override the secret ID used for fetching the Slack token from Aspect's AWS account.
aspect_slack_token_secret_id = optional(string)
# Override the KMS ID used to decrypt the support secrets from Aspect.
aspect_support_secret_kms_id = optional(string)
}), {})
# Role ARN that allows support level access for Aspect.
support_role_name = optional(string, null)
# Role ARN that allows extended support access for Aspect.
# This role will have write access to various areas of Workflows infrastructure,
# however it can only be assumed by a subset of Aspect oncall engineers.
operator_role_name = optional(string, null)
# Add policies that allow access to CI infrastructure instances via SSM
enable_ssm_access = optional(bool, false)
})
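Because every attribute of `support` is optional, an empty object is accepted by the type. The sketch below spells out two of the declared defaults explicitly, which may be useful when reviewing what is being opted into.

```hcl
support = {
  alert_aspect      = true  # declared default: report alerts back to Aspect
  enable_ssm_access = false # declared default: no SSM access policies added
}
```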
vpc_id
Description: ID of the VPC in which to deploy
Type: string
vpc_subnets
Description: List of subnet IDs to use for VM infrastructure
Type: list(string)
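Putting the required inputs together, a minimal module call could be sketched as follows. The module label `workflows`, the `source` path, and all IDs and names are placeholders. The value `"bk"` for `hosts` is an assumption based on the submodule names listed above, so check the accepted values with Aspect.

```hcl
module "workflows" {
  # Placeholder: replace with the source address of this module.
  source = "./workflows"

  customer_id             = "example-co"               # provided by Aspect
  aspect_artifacts_bucket = "example-aspect-artifacts" # provided by Aspect

  # Assumed value; the accepted host identifiers are not listed in this document.
  hosts = ["bk"]

  vpc_id      = "vpc-0123456789abcdef0"
  vpc_subnets = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]

  # Both objects accept their declared defaults.
  remote  = { storage = {} }
  support = {}
}
```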
Optional Inputs
The following input variables are optional (have default values):
account_id
Description: Account ID of the AWS Account where CloudWatch alarms reside
Type: string
Default: null
aw_grafana_org_role
Description: The role to assign to users in the Aspect Grafana organization
Type: string
Default: "Viewer"
bk_runner_groups
Description: Mapping of Buildkite runner group name to settings for that runner group
Type:
map(object({
# Common settings for all CI hosts
agent_idle_timeout_min = number
max_runners = number
min_runners = optional(number, 0)
min_free_runners = optional(number, 0)
policy_documents = optional(map(object({ json : string })), {})
policies = optional(map(string), {})
queue = string
resource_type = string
scale_out_factor = optional(number, 1)
scaling_polling_frequency = optional(number, 1) # polls per minute
reaper_sleep_minutes = optional(number, 10)
security_groups = optional(map(string), {})
warming = optional(bool, false)
warming_set = optional(string, "default")
exclude_oncall_alerts = optional(list(string), [])
tags = optional(map(string), {})
build_logs_bucket = optional(string, "BUCKET_PLACEHOLDER")
}))
Default: {}
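A minimal Buildkite runner group, shown as an argument fragment, might look like this. The group name, queue name, and numbers are hypothetical. `resource_type` is expected to name an entry in `resource_types`.

```hcl
bk_runner_groups = {
  default = {
    agent_idle_timeout_min = 5
    max_runners            = 10
    min_runners            = 0                # declared default
    queue                  = "aspect-default" # hypothetical Buildkite queue name
    resource_type          = "default"        # name of an entry in resource_types
  }
}
```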
cci_runner_groups
Description: Mapping of CircleCI runner group name to settings for that runner group
Type:
map(object({
# Common settings for all CI hosts
agent_idle_timeout_min = number
max_runners = number
min_runners = optional(number, 0)
min_free_runners = optional(number, 0)
policy_documents = optional(map(object({ json : string })), {})
policies = optional(map(string), {})
resource_type = string
scale_out_factor = optional(number, 1)
scaling_polling_frequency = optional(number, 1)
reaper_sleep_minutes = optional(number, 10)
security_groups = optional(map(string), {})
warming = optional(bool, false)
warming_set = optional(string, "default")
exclude_oncall_alerts = optional(list(string), [])
tags = optional(map(string), {})
# Settings specific to CircleCI
circleci_api_url = optional(string, "https://circleci.com")
circleci_runner_api_url = optional(string, "https://runner.circleci.com")
job_max_run_time_min = optional(number, 360)
}))
Default: {}
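For CircleCI the shape is similar, except that the type has no `queue` attribute and adds CircleCI-specific settings. A sketch with hypothetical values:

```hcl
cci_runner_groups = {
  default = {
    agent_idle_timeout_min = 5
    max_runners            = 10
    resource_type          = "default" # name of an entry in resource_types
    job_max_run_time_min   = 120       # overrides the declared default of 360
  }
}
```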
cost_allocation_tag
Description: (deprecated) The tag name used for cost tagging
Type: string
Default: "CreatedBy"
cost_allocation_tag_value
Description: (deprecated) The value of the cost tag
Type: string
Default: null
create_security_groups
Description: Whether to create security groups automatically for all resources. Set to false to use the values set in security_group_ids.
Type: bool
Default: true
create_vpc_endpoints
Description: Whether to create VPC endpoints automatically.
Type: bool
Default: true
delivery_enabled
Description: Whether delivery infrastructure is enabled for Aspect Workflows
Type: bool
Default: true
deployment_id
Description: Unique, human-readable deployment identifier provided by Aspect
Type: string
Default: "default"
external_remote
Description: Configuration for the externalized Bazel remote endpoint (cache and execution), specifically the ALB.
Type:
object({
debug_tools = optional(bool, false)
dns = object({
hosted_zone_id = optional(string, null)
hosted_zone_name = optional(string, null)
})
storage = object({
# Number of shards for the remote cache storage service
num_shards = optional(number, 3)
instance_type = optional(string)
instance_image_id = optional(string)
mirror = optional(bool)
ecs_agent_memory_mb = optional(number, 256)
})
frontend = optional(object({
cpu = optional(number, 1024)
memory = optional(number, 2048)
max_scaling = optional(number, 20)
min_scaling = optional(number)
}), {
cpu = 1024
memory = 2048
max_scaling = 20
})
downloader = optional(object({
frontend = optional(object({
cpu = optional(number, 1024)
memory = optional(number, 2048)
max_scaling = optional(number, 5)
min_scaling = optional(number, 1)
}), {
cpu = 1024
memory = 2048
max_scaling = 10
min_scaling = 1
})
}))
oidc = optional(object({
issuer = string
auth_endpoint = string
token_endpoint = string
user_info_endpoint = string
client_id = string
client_secret = string
session_timeout_seconds = optional(number, null)
}), null)
remote_execution = optional(object({
executors = map(object({
platform = optional(string)
image = string
additional_platform_properties = optional(map(string), {})
workers = optional(list(object({
scaling = optional(object({
minimum = optional(number)
maximum = optional(number)
warm = optional(number)
fast = optional(object({
scale_in = optional(object({
target = optional(number)
magnitude = optional(number)
minimum = optional(number)
cooldown = optional(number)
}))
scale_out = optional(object({
target = optional(number)
magnitude = optional(number)
cooldown = optional(number)
}))
}))
policy = optional(object({
target = optional(number, 100)
scale_in_cooldown = optional(number, 60)
scale_out_cooldown = optional(number, 60)
}), {
target = 100
scale_in_cooldown = 60
scale_out_cooldown = 60
})
rules = optional(map(object({
schedule = string
timezone = optional(string)
minimum = optional(number)
maximum = optional(number)
})))
}), {})
isolated_actions = optional(object({
cpu = optional(number, 1024)
memory = optional(number)
}))
network = optional(bool, true)
max_concurrency = optional(number)
max_download_concurrency = optional(number, 1000000)
max_upload_concurrency = optional(number, 1000000)
ec2 = optional(object({
instance_type = optional(string)
instance_image = optional(string)
docker_group = optional(string)
}))
ecs = optional(object({
architecture = optional(string)
cpu = optional(number, 1024)
memory = optional(number, 2048)
}))
runner = optional(object({
kill_processes = optional(bool)
set_tmp_dir_env_variable = optional(bool)
clean_tmp_directories = optional(list(string))
}))
docker_user = optional(string, null)
})), [{}])
}))
use_size_class_cache = optional(bool, false)
worker_cloudwatch_agent = optional(bool, true)
ecs_agent_memory_mb = optional(number, 256)
storage = optional(object({
num_shards = optional(number)
instance_type = optional(string)
instance_image_id = optional(string)
upstream_fallback = optional(bool, true)
mirror = optional(bool)
}))
}))
})
Default: null
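Compared with `remote`, the `external_remote` type additionally requires a `dns` object and accepts an optional `oidc` block. A minimal sketch, assuming a hypothetical Route 53 hosted zone ID:

```hcl
external_remote = {
  dns = {
    hosted_zone_id = "Z0123456789ABCDEFGHIJ" # hypothetical hosted zone ID
  }

  # Required by the type; an empty object accepts the declared defaults.
  storage = {}
}
```

Whether `hosted_zone_id`, `hosted_zone_name`, or both must be set is not stated here, so treat the choice above as an assumption.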
gha_runner_groups
Description: Mapping of GitHub Actions runner group name to settings for that runner group
Type:
map(object({
# Common settings for all CI hosts
agent_idle_timeout_min = number
max_runners = number
min_runners = optional(number, 0)
min_free_runners = optional(number, 0)
policy_documents = optional(map(object({ json : string })), {})
policies = optional(map(string), {})
queue = string
resource_type = string
scale_out_factor = optional(number, 1)
scaling_polling_frequency = optional(number, 1)
reaper_sleep_minutes = optional(number, 10)
security_groups = optional(map(string), {})
warming = optional(bool, false)
warming_set = optional(string, "default")
exclude_oncall_alerts = optional(list(string), [])
tags = optional(map(string), {})
# Settings specific to GitHub Actions
gh_repo = string
gha_workflow_ids = optional(list(string), [])
}))
Default: {}
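A GitHub Actions runner group additionally requires `gh_repo`. The sketch below uses hypothetical values. The `owner/repo` format for `gh_repo` is an assumption, since the expected format is not documented here.

```hcl
gha_runner_groups = {
  default = {
    agent_idle_timeout_min = 5
    max_runners            = 10
    queue                  = "aspect-default"           # hypothetical queue name
    resource_type          = "default"                  # name of an entry in resource_types
    gh_repo                = "example-org/example-repo" # assumed format
  }
}
```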
gl_runner_groups
Description: Mapping of GitLab runner group name to settings for that runner group
Type:
map(object({
# Common settings for all CI hosts
agent_idle_timeout_min = number
max_runners = number
min_runners = optional(number, 0)
min_free_runners = optional(number, 0)
policy_documents = optional(map(object({ json : string })), {})
policies = optional(map(string), {})
queue = string
resource_type = string
scale_out_factor = optional(number, 1)
scaling_polling_frequency = optional(number, 1)
reaper_sleep_minutes = optional(number, 10)
security_groups = optional(map(string), {})
warming = optional(bool, false)
warming_set = optional(string, "default")
exclude_oncall_alerts = optional(list(string), [])
tags = optional(map(string), {})
# Settings specific to GitLab
gitlab_url = optional(string, "https://gitlab.com")
project_id = string
}))
Default: {}
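A GitLab runner group requires `project_id`, and `gitlab_url` only needs to be set for self-managed instances. A sketch with hypothetical values:

```hcl
gl_runner_groups = {
  default = {
    agent_idle_timeout_min = 5
    max_runners            = 10
    queue                  = "aspect-default" # hypothetical queue name
    resource_type          = "default"        # name of an entry in resource_types
    project_id             = "12345678"       # hypothetical GitLab project ID
  }
}
```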
internal_use_only
Description: For Aspect internal development use only
Type:
object({
# Whether to disable deletion protection on resources
disable_deletion_protection = optional(bool, false)
# Custom install bucket
install_bucket = optional(object({
id = string
arn = string
bucket = string
}), null)
# Use dev service images at the specified tag
dev_image_tag = optional(string, "unspecified")
})
Default: {}
partition
Description: The partition to configure services in, if not commercial
Type: string
Default: null
region
Description: The default region to set up services in
Type: string
Default: null
repository_urls
Description: The repository URLs for the Docker images used by this module. Meant to be used in concert with the ecr_images submodule.
Type: map(string)
Default:
{
"adot_exporter": "public.ecr.aws/aws-observability/aws-otel-collector",
"alert_manager": "quay.io/prometheus/alertmanager:v0.27.0",
"aws_cli": "public.ecr.aws/aws-cli/aws-cli",
"bash": "public.ecr.aws/docker/library/bash",
"bb_browser": "ghcr.io/buildbarn/bb-browser:20240613T055327Z-f0fbe96",
"bb_remote_asset": "ghcr.io/buildbarn/bb-remote-asset:20241014t200011z-5a41232",
"bb_replicator": "ghcr.io/buildbarn/bb-replicator:20241120T153311Z-7c63b92",
"bb_runner_installer": "ghcr.io/buildbarn/bb-runner-installer:20240716t044555z-9850e82",
"bb_scheduler": "ghcr.io/buildbarn/bb-scheduler:20240716t044555z-9850e82",
"bb_storage": "ghcr.io/buildbarn/bb-storage:20250226t091209z-85aafcb",
"bb_worker": "ghcr.io/buildbarn/bb-worker:20240716t044555z-9850e82",
"busybox": "public.ecr.aws/docker/library/busybox",
"curl_jq": "registry.gitlab.com/gitlab-ci-utils/curl-jq:3.0.0",
"ecs_exporter": "quay.io/prometheuscommunity/ecs-exporter:v0.4.0",
"grafana": "docker.io/grafana/grafana-enterprise:11.6.0-ubuntu",
"otel_collector_contrib": "ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.115.1",
"prometheus": "quay.io/prometheus/prometheus:v2.52.0"
}
resource_types
Description: Mapping of resource type name to settings for that type. Reference the name of a resource type in the resource_type fields of runner groups.
Type:
map(object({
# The ID of the AMI to use for this resource
image_id = string
# A list of instance types that are acceptable in the ASG
instance_types = list(string)
# The size of the root EBS volume in GB
root_volume_size_gb = optional(number, 64)
# Tags to apply to this resource
tags = optional(map(string), {})
# Defines if spot instances should be used for this resource
use_spot = optional(bool, false)
# When using spot instances, allows further customization over the spot vs on-demand allocation
instance_policy = optional(object({
on_demand_base_capacity = optional(number, 0)
on_demand_percentage_above_base_capacity = optional(number, 0)
spot_allocation_strategy = optional(string, "price-capacity-optimized")
spot_max_price = optional(string, "")
spot_instance_pools = optional(number, 2)
}), {})
}))
Default: {}
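To make the spot options concrete, the following sketch defines one on-demand resource type and one spot-backed type. The AMI ID, instance types, and capacity numbers are placeholders chosen for illustration only.

```hcl
resource_types = {
  # On-demand instances with the declared defaults (64 GB root volume).
  default = {
    image_id       = "ami-0123456789abcdef0" # placeholder AMI ID
    instance_types = ["c5.2xlarge", "c5a.2xlarge"]
  }

  # Spot instances with an on-demand base of one instance.
  spot = {
    image_id            = "ami-0123456789abcdef0" # placeholder AMI ID
    instance_types      = ["c5.2xlarge", "c5a.2xlarge"]
    root_volume_size_gb = 128
    use_spot            = true
    instance_policy = {
      on_demand_base_capacity                  = 1
      on_demand_percentage_above_base_capacity = 25
    }
  }
}
```

A runner group would then select one of these by setting `resource_type = "default"` or `resource_type = "spot"`.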
scaling_function_memory_mb
Description: The amount of memory to assign to scaling functions in MB
Type: number
Default: 512
security_group_ids
Description: Optional security group ID substitutions for Workflows resources.
Type:
object({
remote = optional(object({
adot = optional(string)
frontend = optional(string)
scheduler = optional(string)
storage = optional(string)
asg = optional(string)
asg_worker = optional(string)
alb = optional(string)
vpce = optional(string)
browser = optional(string)
downloader = optional(string)
}))
external_remote = optional(object({
auth = optional(string)
adot = optional(string)
frontend = optional(string)
scheduler = optional(string)
storage = optional(string)
asg = optional(string)
asg_worker = optional(string)
alb = optional(string)
vpce = optional(string)
browser = optional(string)
downloader = optional(string)
}))
workflows_services = optional(object({
bessie = optional(string)
bessie-rds = optional(string)
kilgore = optional(string)
grafana = optional(string)
otel = optional(string)
prometheus = optional(string)
alert-manager = optional(string)
alb = optional(string)
}))
delivery = optional(map(string))
bk = optional(object({
scaler = optional(string)
}))
cci = optional(object({
scaler = optional(string)
}))
gha = optional(object({
scaler = optional(string)
}))
gl = optional(object({
scaler = optional(string)
}))
})
Default: {}
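When security groups are managed outside the module, the substitutions could be supplied roughly as below. The group IDs are placeholders. Whether every key must be provided when `create_security_groups` is false is not stated in this document, so this partial example is an assumption.

```hcl
create_security_groups = false

security_group_ids = {
  remote = {
    frontend = "sg-0123456789abcdef0" # placeholder
    storage  = "sg-0fedcba9876543210" # placeholder
  }
  workflows_services = {
    # Keys containing hyphens are quoted for clarity.
    "bessie-rds"    = "sg-0a1b2c3d4e5f67890" # placeholder
    "alert-manager" = "sg-09876f5e4d3c2b1a0" # placeholder
  }
  bk = {
    scaler = "sg-0aaaabbbbccccdddd" # placeholder
  }
}
```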
tags
Description: Tags to add to every resource Aspect Workflows creates
Type: map(string)
Default: {}
telemetry
Description: Configuration options for Workflows telemetry
Type:
object({
# Configuration for where Workflows telemetry data gets exported
destinations = optional(object({
# Which exporters to set up.
honeycomb = optional(object({
# Honeycomb dataset to set for exports to this destination.
dataset = optional(string)
# Honeycomb Team secret reference, used for authentication.
team_secret = object({
id = string
arn = string
})
}))
datadog = optional(object({
# Datadog agent ingest site.
site = string
# Datadog API key secret reference, used for authentication.
key_secret = object({
id = string
arn = string
})
}))
generic_otlp = optional(object({
# Endpoint to export telemetry to.
endpoint = string
}))
}), {})
})
Default: {}
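As an example of the destination shape, a Datadog export might be configured as sketched here. The site value and the secret ID and ARN are placeholders. That `key_secret` refers to an AWS Secrets Manager secret is an assumption drawn from the `id`/`arn` pair.

```hcl
telemetry = {
  destinations = {
    datadog = {
      site = "datadoghq.com" # placeholder ingest site
      key_secret = {
        # Placeholder secret reference; assumed to be a Secrets Manager secret.
        id  = "datadog-api-key"
        arn = "arn:aws:secretsmanager:us-west-2:111122223333:secret:datadog-api-key-AbCdEf"
      }
    }
  }
}
```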
token
Description: The token auth configuration for use with Workflows.
Type:
object({
admin_users = list(string)
prefix = string
header = string
expiry = number
cognito_user_pool_arn = string
cognito_user_pool_id = string
cognito_client_id = string
amazon_verified_permissions_policy_store_id = string
amazon_verified_permissions_policy_store_arn = string
clusters = map(object({
base_permissions = object({
read = bool
write = bool
execute = bool
})
}))
eventbridge_scheduler_group_arn = string
eventbridge_scheduler_group_name = string
kms_key_arn = string
kms_key_id = string
})
Default: null
vpc_subnets_public
Description: List of subnet IDs to use for public facing VM infrastructure
Type: list(string)
Default: []
warming_sets
Description: Mapping of warming set to settings for that set
Type:
map(object({
additional_paths = optional(string)
}))
Default: {}
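The sketch below pairs a warming set with a runner group that opts into warming. It assumes that a runner group's `warming_set` names a key in `warming_sets`, which matches the shared `"default"` name but is not confirmed by this document. All other values are hypothetical.

```hcl
warming_sets = {
  default = {} # additional_paths is optional and left unset here
}

bk_runner_groups = {
  warmed = {
    agent_idle_timeout_min = 5
    max_runners            = 10
    queue                  = "aspect-warmed" # hypothetical queue name
    resource_type          = "default"       # name of an entry in resource_types
    warming                = true
    warming_set            = "default"       # assumed to match a key in warming_sets
  }
}
```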
webapp
Description: n/a
Type:
object({
dns = object({
hosted_zone_id = optional(string, null)
hosted_zone_name = optional(string, null)
hosted_zone_record_name = optional(string, "app")
})
oidc = optional(object({
issuer = string
auth_endpoint = string
token_endpoint = string
user_info_endpoint = string
client_id = string
client_secret = string
session_timeout_seconds = optional(number, null)
}), null)
})
Default: null
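A minimal `webapp` value only needs the `dns` object. In this sketch the hosted zone name is a placeholder and the record name repeats the declared default.

```hcl
webapp = {
  dns = {
    hosted_zone_name        = "workflows.example.com" # placeholder zone
    hosted_zone_record_name = "app"                   # declared default
  }
}
```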
Outputs
The following outputs are exported:
alarms_sns_topic_arn
Description: SNS topic ARN that provides notifications of all Workflows alarms
api_secret_id
Description: Secret for the Aspect API
bk_agent_token_secret_ids
Description: Mapping of Buildkite runner name to Buildkite agent token secret ID
bk_api_token_secret_ids
Description: Mapping of Buildkite runner name to Buildkite API token secret ID
bk_git_ssh_key_secret_ids
Description: Mapping of Buildkite runner name to SSH key secret ID
buildkite_agent_hooks_buckets
Description: Name of the bucket for storing custom Buildkite Agent hooks
cost_allocation_tag
Description: (deprecated) Name of the cost allocation tag to use
cost_allocation_tag_value
Description: (deprecated) The value of the cost allocation tag
external_remote_cache_endpoint
Description: The endpoint of the Internet-facing remote cache, if enabled.
external_remote_certificate
Description: The ACM certificate in use by the external remote cluster, if present.
external_remote_load_balancer
Description: The load balancer configuration for the external remote cluster, if present.
gha_lambda_webhook_secret_ids
Description: Mapping of GitHub Actions runner name and repo key to the IDs of the secrets containing the webhook token that the scaling lambda uses to verify that an event came from GitHub
gha_runner_secret_ids
Description: Optional mapping of GitHub Actions runner name and repo key to runner secret ID
gha_secret_ids
Description: Mapping of GitHub Actions runner name and repo key to secret ID
github_token_secret_id
Description: Secret ID for a GitHub token used for making read-only calls to GitHub during a build
gl_secret_ids
Description: Mapping of GitLab runner name and repo key to secret ID
grafana_ecs_task_exec_role
Description: The IAM role ARN for the ECS task execution
grafana_ecs_task_role
Description: The IAM role ARN for the ECS task
internal_remote_cache_certificate
Description: The CA certificate for the VPC-facing remote cache.
internal_remote_cache_endpoint
Description: The endpoint of the VPC-facing remote cache.
internal_remote_cache_private_ca_cert
Description: The CA Certificate used for the VPC-facing remote cache. This should only be needed when wiring up legacy runners to your cache.
internal_remote_load_balancer
Description: The load balancer configuration for the internal remote cluster, if present.
managed_prometheus_endpoint
Description: The endpoint of the Amazon Managed Service for Prometheus (AMP) workspace
runner_secret_ids
Description: Mapping of CircleCI runner name to secret ID
security_group_rules
Description: Security group rules for the Workflows module
tags
Description: Tags that will be added on all resources
warming_management_policies
Description: n/a
webapp_certificate
Description: The ACM certificate in use by the webapp, if present.
webapp_load_balancer
Description: The load balancer configuration for the webapp, if present.
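Outputs can be consumed from the calling configuration in the usual Terraform way. The sketch below assumes the module was instantiated with the hypothetical label `workflows` and re-exports two of the outputs listed above.

```hcl
# Re-export the VPC-facing remote cache endpoint for use in Bazel configuration.
output "remote_cache_endpoint" {
  value = module.workflows.internal_remote_cache_endpoint
}

# Expose the SNS topic so additional alarm subscribers can be attached.
output "workflows_alarms_topic" {
  value = module.workflows.alarms_sns_topic_arn
}
```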