Setup on Amazon Web Services
We recommend creating a new subaccount, so that AWS billing gives clear insight into the costs, and so our engineers don't have overly broad access to your other infrastructure.
Amazon's documentation for subaccounts is at https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_accounts_create.html.
Aspect will need to share our terraform module and base AMI with your org. We'll need
- your AWS account ID
- the region you plan to deploy to
- the ARN of the role(s) you use to run terraform operations like `plan` and `apply`
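If it helps to gather these values, a minimal Terraform sketch (the output names are illustrative) can surface the first two from the account you run terraform in:

```hcl
# Illustrative: expose the values Aspect asks for as Terraform outputs.
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

output "aws_account_id" {
  value = data.aws_caller_identity.current.account_id
}

output "deploy_region" {
  value = data.aws_region.current.name
}
```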
Granting permissions
First, we need to grant permission for Aspect engineers to perform setup and maintenance.
Create a role to hold our policies. You can do this in Terraform, or in the AWS console:
Navigate to IAM > Roles > Create role
- Select trusted entity
- Trusted entity type: AWS account
- Another AWS account: `302232432727`
- Require MFA: enable (our engineers are required to use multi-factor auth)
- Name, review, and create
- One choice for the name is `aspect-workflows-comaintainers`. We'll need to know the name in order to assume the role.
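The console steps above can also be expressed in Terraform. A minimal sketch, assuming the `aspect-workflows-comaintainers` name suggested above:

```hcl
# Trust policy allowing Aspect's AWS account to assume the role, with MFA required.
data "aws_iam_policy_document" "aspect_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["302232432727"] # Aspect's AWS account
    }

    # Require multi-factor auth, matching the console's "Require MFA" option.
    condition {
      test     = "Bool"
      variable = "aws:MultiFactorAuthPresent"
      values   = ["true"]
    }
  }
}

resource "aws_iam_role" "aspect_comaintainers" {
  name               = "aspect-workflows-comaintainers"
  assume_role_policy = data.aws_iam_policy_document.aspect_trust.json
}
```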
If Aspect engineers will perform the `terraform apply`, then we need more permissions:
Permissions required for terraform apply

```
autoscaling:CreateAutoScalingGroup
autoscaling:DeleteAutoScalingGroup
autoscaling:DescribeAutoScalingGroups
autoscaling:DescribeScalingActivities
autoscaling:SetInstanceProtection
autoscaling:UpdateAutoScalingGroup
cloudformation:CreateStack
cloudformation:DeleteStack
cloudformation:DescribeStacks
cloudformation:GetTemplate
cloudwatch:DeleteAlarms
cloudwatch:DescribeAlarms
cloudwatch:ListTagsForResource
cloudwatch:PutMetricAlarm
ec2:AuthorizeSecurityGroupEgress
ec2:AuthorizeSecurityGroupIngress
ec2:CreateLaunchTemplate
ec2:CreateSecurityGroup
ec2:DeleteLaunchTemplate
ec2:DeleteSecurityGroup
ec2:DescribeImages
ec2:DescribeLaunchTemplates
ec2:DescribeLaunchTemplateVersions
ec2:DescribeNetworkAcls
ec2:DescribeNetworkInterfaces
ec2:DescribeRouteTables
ec2:DescribeSecurityGroups
ec2:DescribeSubnets
ec2:DescribeVpcAttribute
ec2:DescribeVpcClassicLink
ec2:DescribeVpcClassicLinkDnsSupport
ec2:DescribeVpcs
ec2:RevokeSecurityGroupEgress
elasticloadbalancing:CreateListener
elasticloadbalancing:CreateLoadBalancer
elasticloadbalancing:CreateTargetGroup
elasticloadbalancing:DeleteListener
elasticloadbalancing:DeleteLoadBalancer
elasticloadbalancing:DeleteTargetGroup
elasticloadbalancing:DescribeListeners
elasticloadbalancing:DescribeLoadBalancerAttributes
elasticloadbalancing:DescribeLoadBalancers
elasticloadbalancing:DescribeTags
elasticloadbalancing:DescribeTargetGroupAttributes
elasticloadbalancing:DescribeTargetGroups
elasticloadbalancing:DescribeTargetHealth
elasticloadbalancing:ModifyLoadBalancerAttributes
elasticloadbalancing:ModifyTargetGroupAttributes
events:DeleteRule
events:DescribeRule
events:ListTagsForResource
events:ListTargetsByRule
events:PutRule
events:PutTargets
events:RemoveTargets
iam:AddRoleToInstanceProfile
iam:AttachRolePolicy
iam:CreateInstanceProfile
iam:CreatePolicy
iam:CreateRole
iam:DeleteInstanceProfile
iam:DeletePolicy
iam:DeleteRole
iam:DetachRolePolicy
iam:GetInstanceProfile
iam:GetPolicy
iam:GetPolicyVersion
iam:GetRole
iam:ListAttachedRolePolicies
iam:ListInstanceProfilesForRole
iam:ListPolicyVersions
iam:ListRolePolicies
iam:RemoveRoleFromInstanceProfile
lambda:AddPermission
lambda:CreateFunction
lambda:DeleteFunction
lambda:GetFunction
lambda:GetFunctionCodeSigningConfig
lambda:GetPolicy
lambda:ListVersionsByFunction
lambda:RemovePermission
logs:CreateLogGroup
logs:DeleteLogGroup
logs:DescribeLogGroups
logs:ListTagsLogGroup
logs:PutRetentionPolicy
memorydb:CreateCluster
memorydb:CreateSubnetGroup
memorydb:DeleteCluster
memorydb:DeleteSubnetGroup
memorydb:DescribeClusters
memorydb:DescribeSubnetGroups
memorydb:ListTags
s3:CreateBucket
s3:DeleteBucket
s3:DeleteBucketPolicy
s3:DeleteObject
s3:GetAccelerateConfiguration
s3:GetBucketAcl
s3:GetBucketCors
s3:GetBucketLogging
s3:GetBucketPolicy
s3:GetBucketPublicAccessBlock
s3:GetBucketRequestPayment
s3:GetBucketTagging
s3:GetBucketVersioning
s3:GetBucketWebsite
s3:GetEncryptionConfiguration
s3:GetLifecycleConfiguration
s3:GetObject
s3:GetObjectAttributes
s3:GetObjectTagging
s3:GetObjectVersion
s3:GetObjectVersionAttributes
s3:GetReplicationConfiguration
s3:ListAllMyBuckets
s3:ListBucket
s3:ListObjects
s3:PutBucketLogging
s3:PutBucketPolicy
s3:PutBucketPublicAccessBlock
s3:PutEncryptionConfiguration
s3:PutLifecycleConfiguration
s3:PutObject
ssm:DeleteParameter
ssm:DescribeParameters
ssm:GetParameter
ssm:ListTagsForResource
ssm:PutParameter
sts:AssumeRole
sts:GetCallerIdentity
```
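One way to grant these is to attach them as an inline policy on the role created above. A sketch, where the policy name and the `Resource = "*"` scoping are illustrative (tighten the resource scoping as your security posture requires):

```hcl
# Illustrative: attach the apply permissions to the co-maintainers role.
resource "aws_iam_role_policy" "aspect_apply" {
  name = "aspect-workflows-apply" # illustrative name
  role = "aspect-workflows-comaintainers"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "autoscaling:CreateAutoScalingGroup",
        "cloudformation:CreateStack",
        "ec2:CreateLaunchTemplate",
        # ...continue with the remaining actions from the list above
      ]
      Resource = "*"
    }]
  })
}
```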
Create an Amazon Machine Image (AMI) (optional)
Aspect Workflows runs on EC2 instances, not Kubernetes pods. Therefore, the base image uses an AMI. We provide one that has our dependencies, and if your build is fully hermetic, this will work fine.
This bit of terraform can be used to locate our AMI at `plan`/`apply` time:

```hcl
# Find the AMI shared from the Aspect account.
data "aws_ami" "aspect_worker_ami" {
  most_recent = true

  # Owner is the aspect-build AWS org
  owners = ["302232432727"]

  filter {
    name = "name"
    # Replace 5-X-X with the Aspect Workflows version
    # NB: for Buildkite, use aspect-ci-bk-runner-5-X-X
    values = ["aspect-ci-runner-5-X-X"]
  }
}
```
If your build is not hermetic, for example if some dynamically linked C++ library needs to be present on the machine, you can use Packer to make a reproducible build of a custom AMI. See Building Machine Images.
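As a sketch of what such a Packer build might look like, assuming the Aspect base AMI as the source and `ec2-user` as the SSH user (both assumptions; see Building Machine Images for specifics):

```hcl
# Hypothetical Packer template layering an extra system library onto the Aspect base AMI.
source "amazon-ebs" "aspect_custom" {
  ami_name      = "my-aspect-ci-runner-{{timestamp}}" # illustrative name
  instance_type = "t3.large"
  region        = "us-east-2"
  ssh_username  = "ec2-user" # assumption; depends on the base image

  source_ami_filter {
    filters = {
      name = "aspect-ci-runner-5-X-X" # replace 5-X-X as above
    }
    owners      = ["302232432727"]
    most_recent = true
  }
}

build {
  sources = ["source.amazon-ebs.aspect_custom"]

  provisioner "shell" {
    # Example non-hermetic dependency: a dynamically linked library.
    inline = ["sudo yum install -y libstdc++"]
  }
}
```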
Add the terraform module
Our terraform module is currently delivered in an S3 bucket. You add it to your existing Terraform setup.
Here's an example:
```hcl
module "aspect_workflows" {
  # Replace 5.x.x with an actual version:
  source = "s3::https://aspect-artifacts.s3.us-east-2.amazonaws.com/5.x.x/workflows/terraform-aws-aspect-workflows.zip"

  customer_id = "MyCorp"
  vpc_id      = data.terraform_state.outputs.circleci_vpc.vpc_id
  vpc_subnets = [data.terraform_state.outputs.circleci_vpc.private_subnets[0]]

  # Replace XXX with one of gha, cci, bk
  hosts = ["XXX"]

  # Define Bazel states we know how to warm up
  warming_sets = {
    default = {}
  }

  resource_types = {
    "default" = {
      # One or more instance types can be specified via 'instance_types'.
      # Specifying multiple instance types allows Workflows to scale out when demand
      # for a particular instance is high, or cannot be fulfilled.
      instance_types = ["i4i.xlarge", "i4i.2xlarge"]
      image_id       = data.aws_ami.aspect_worker_ami.id
    }
  }

  # Replace XXX with one of gha, cci, bk
  XXX_runner_groups = {
    default = {
      max_runners   = 10
      min_runners   = 0
      resource_type = "default" # Corresponds to a resource_types entry above
      warming       = true
      warming_set   = "default" # Corresponds to a warming_sets entry above
      ...
    }
    default-warming = {
      max_runners   = 1
      min_runners   = 0
      resource_type = "default" # Corresponds to a resource_types entry above
      policies = {
        # "default" key in warming_management_policies corresponds to a warming_sets entry above
        warming_manage : module.aspect_workflows.warming_management_policies["default"].arn
      }
      warming_set = "default" # Corresponds to a warming_sets entry above
      ...
    }
  }
}
```
Applying custom security groups to runners
It might be required to add custom security groups to the runners that are managed by Aspect Workflows. This can be achieved by setting the `security_groups` attribute on the runners or queue configuration object. This is a map from string -> AWS Security Group ID (the name is not currently used by Workflows).
```hcl
runners = {
  default = {
    ...
    security_groups = {
      vpn_access : aws_security_group.vpc_access.id
    }
    ...
  }
}
```
Allowing Aspect read-only support access
The Workflows module takes an optional `support.support_role_name` configuration option that Workflows will attach policies to, which provide read-only access to key logs, metrics, and configuration values. The policies are only created and attached if the role is given; Workflows will not create a role automatically to add these policies to.
Specifically, the policy defined in this document allows:
- Read/List on all `/aw` SSM parameter store keys
- Describe on all ASGs and their associated instances and scaling activity
- Get on log streams and log events with the `aw_` prefix
- SSM access to running instances and port forwarding for Grafana
For example:
```hcl
resource "aws_iam_role" "support" {
  name = "AspectWorkflowsSupport"
  ...
}

module "workflows" {
  ...
  support = {
    support_role_name = aws_iam_role.support.id
  }
  ...
}
```
Allowing Aspect privileged support access
The Workflows module takes an optional `support.operator_role_name` configuration option that Workflows will attach policies to, which provide privileged access to key resources. This role is a superset of the read-only support access role above. The policies are only created and attached if the role is given; Workflows will not create a role automatically to add these policies to.
Specifically, the policy defined in this document allows:
- Manage Aspect build runner EC2 hosts, specifically by rebooting, stopping, and terminating.
- Delete S3 objects, only in specific Aspect-managed buckets.
- Manage the Redis cache, including updating/deleting the cluster, and creating snapshots.
For example, it could be extended via:
```hcl
resource "aws_iam_role" "operator" {
  name = "AspectWorkflowsOperator"
  ...
}

module "aspect_workflows" {
  ...
  support = {
    operator_role_name = aws_iam_role.operator.id
  }
  ...
}
```
In addition, Workflows can enable SSM access to key resources, available via the operator role only. To enable SSM access, set the following property in the `support` configuration. By default, SSM access is disabled.
```hcl
module "aspect_workflows" {
  ...
  support = {
    enable_ssm_access = true
  }
  ...
}
```
Alerting
When configuring the aspect_workflows terraform module, set the `support.pagerduty_integration_key` key to the value that has been provided by Aspect.
```hcl
module "aspect_workflows" {
  ...
  support = {
    pagerduty_integration_key = "123abc"
  }
  ...
}
```
It's possible to exclude certain alarms for the various configured runner groups from being monitored by Aspect's on-call Workflows engineers. Note that excluded alarms will still show as "in alarm" in the CloudWatch dashboard, but they will not notify Aspect's on-call engineers of an issue. This can be useful if a particular runner group is used for canarying runners, or for running experiments.
To exclude an alarm, set the `exclude_oncall_alerts` attribute on the runner group:

```hcl
default = {
  exclude_oncall_alerts = ["Runner Alarms"]
}
```
Possible values for the exclusion list:
- Runner Alarms: Excludes alarms generated by runners, such as from bootstrap.
Cost allocation tagging
To tag all resources created by the Workflows module with cost allocation tags, default tags can be set on the AWS provider that is passed to the module. Workflows also supports overriding the default cost allocation tag, and its value.
```hcl
provider "aws" {
  alias = "workflows"

  default_tags {
    tags = {
      (module.workflows.cost_allocation_tag) = module.workflows.cost_allocation_tag_value
    }
  }
}

module "workflows" {
  providers = {
    aws = aws.workflows
  }

  # To override the values of the cost allocation tag and tag value
  cost_allocation_tag       = "MyCustomCostAllocationTag"
  cost_allocation_tag_value = "MyCustomCostAllocationTagValue"
}
```
To apply additional tags to build resources, see the section below.
Adding custom tags to build resources
It may be required to add custom tags to the instances that are running builds, for example for security auditing or cost tracking.
To add additional tags to any EC2 resources Workflows creates from the ASGs to run builds, set the `tags` attribute on the resource:
```hcl
resource_types = {
  "default" = {
    ...
    tags = {
      CustomTag : "CustomValue"
    }
  }
}
```
Adding additional tags to build resources is not yet supported on Buildkite agents.
These tags will always propagate to the runners, in addition to the existing cost allocation tag settings.
Apply
Run `terraform apply`, or use whatever automation you already use for your infrastructure-as-code, such as Atlantis.
You'll get a resulting infrastructure like the following:
Next steps
Continue by choosing which CI platform you plan to interact with, and follow the corresponding installation steps.