Version: 5.14.x

Infrastructure Alerting

Workflows integrates directly with FireHydrant and Slack to deliver alarms and notifications generated by the deployed infrastructure.

Each alert is routed to the appropriate service based on its severity. For critical issues, Workflows routes the alert to FireHydrant, which in turn notifies Aspect's on-call engineers.

Setup for alerting differs between Workflows versions and cloud providers.

Workflows automatically sets up the required credentials and routing during the Terraform apply; no further action is required.

To opt out of sending alerts to Aspect, set the alert_aspect property to false in the support block of the Workflows Terraform module:

module "aspect_workflows" {
  support = {
    # HCL booleans are lowercase; false stops routing alerts to Aspect.
    alert_aspect = false
  }
}
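
If you run more than one Workflows installation (for example, a staging and a production deployment), you may want to opt out in only some of them. A minimal sketch, assuming a hypothetical root-module variable named alert_aspect that you define yourself (it is not part of the Workflows module), keeps the choice per environment:

```hcl
# Hypothetical variable defined in your own root module, not by Workflows.
variable "alert_aspect" {
  type        = bool
  description = "Whether to route critical Workflows alerts to Aspect's on-call engineers."
  default     = true
}

module "aspect_workflows" {
  # ... other Workflows module arguments ...
  support = {
    # Set the variable to false in environments where Aspect should not be paged.
    alert_aspect = var.alert_aspect
  }
}
```

Setting alert_aspect = false in, for example, a staging variables file then opts out only that environment while production keeps the default.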

Excluding runner groups

You can exclude certain alarms for individual runner groups from being monitored by FireHydrant, and therefore from notifying Aspect's on-call Workflows engineers. This can be useful if a particular runner group is used for canary runners or for running experiments.

note

Excluded alarms still appear as "in alarm" in the CloudWatch or Google Cloud Alerts dashboard, but they do not notify FireHydrant of an issue.

To exclude an alarm, set the exclude_oncall_alerts attribute on the runner group. For example, for a runner group named default:

default = {
  exclude_oncall_alerts = ["Runner Alarms"]
}
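
Because exclusions are set per runner group, you can keep full on-call coverage for production runners while silencing runner alarms for an experimental group. A sketch, assuming a hypothetical second runner group named canary defined alongside default in the same runner group configuration, and assuming that leaving exclude_oncall_alerts unset means no alarms are excluded:

```hcl
# "canary" is a hypothetical runner group name used for illustration.
default = {
  # exclude_oncall_alerts is not set, so runner alarms from this group
  # continue to notify FireHydrant. Other runner group attributes omitted.
}

canary = {
  # Runner alarms from this group stay visible in the dashboard
  # but do not notify FireHydrant or Aspect's on-call engineers.
  exclude_oncall_alerts = ["Runner Alarms"]
}
```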

Possible values for the exclusion list:

  • Runner Alarms: Excludes alarms generated by runners, such as those raised during bootstrap.

On-call support

To provide better support during incidents, Workflows can attach permissions to an IAM role you provide, allowing Aspect's on-call engineers to access the Workflows infrastructure deployed in your account.

This access is scoped to the resources that Workflows creates and owns; no access is granted to any other resources in the account.

note

Granting Aspect any of these roles is optional. However, granting the support role significantly speeds up investigations and support during incidents and outages.

Workflows creates and attaches the policies only when a role is provided; it does not create a role automatically to attach them to.

Access levels

Support

Provides read-only access to Workflows resources such as logs, metrics, and configuration values.

The support policy allows:

  • Read and list access on all SSM Parameter Store keys under /aw.
  • Describe access on all Auto Scaling groups (ASGs), their associated instances, and their scaling activity.
  • Get access on log streams and log events with the aw_ prefix.

To allow support-level access, pass the name of an IAM role to the support_role_name property in the support block of the Terraform module.

resource "aws_iam_role" "support" {
  name = "AspectWorkflowsSupport"
  ...
}

module "aspect_workflows" {
  support = {
    support_role_name = aws_iam_role.support.name
  }
}
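
The role above elides its arguments. AWS requires every IAM role to carry a trust policy (assume_role_policy) naming who may assume it. A minimal sketch, assuming a hypothetical variable aspect_support_principal_arn that holds the principal ARN Aspect supplies to you (this page does not specify that value):

```hcl
# Hypothetical input: the ARN of the Aspect principal allowed to assume the role.
variable "aspect_support_principal_arn" {
  type        = string
  description = "ARN of the Aspect principal allowed to assume the support role."
}

resource "aws_iam_role" "support" {
  name = "AspectWorkflowsSupport"

  # Trust policy: only the named principal may assume this role.
  # It grants no permissions by itself; Workflows attaches its scoped policy.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { AWS = var.aspect_support_principal_arn }
    }]
  })
}
```

The trust policy only controls who can assume the role; what the role can do comes from the policy Workflows attaches once the role name is passed to the module.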

Operator

This role is a superset of the preceding read-only support role.

The operator policy allows:

  • SSM access to running instances, and port forwarding for Grafana.
  • Manage Aspect build runner EC2 hosts, limited to rebooting, stopping, and terminating them.
  • Delete S3 objects and tags, only in specific Aspect-managed buckets.
  • Manage the Redis cache, including updating or deleting the cluster and creating snapshots.

To allow operator-level access, pass the name of an IAM role to the operator_role_name property in the support block of the Terraform module.

resource "aws_iam_role" "operator" {
  name = "AspectWorkflowsOperator"
  ...
}

module "aspect_workflows" {
  support = {
    operator_role_name = aws_iam_role.operator.name
  }
}

SSM access

Workflows can also enable SSM access to key resources. This access is available only through the operator role and is turned off by default. To enable it, set the following property in the support configuration:

module "aspect_workflows" {
  support = {
    enable_ssm_access = true
  }
}
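
Because SSM access is available only through the operator role, enabling it is useful only when an operator role is also provided. A sketch that combines the settings from this page in a single support block, assuming the attributes can be set together (each is shown separately above) and that the support and operator roles are defined as in the preceding examples:

```hcl
module "aspect_workflows" {
  # ... other Workflows module arguments ...
  support = {
    # Keep routing critical alerts to Aspect's on-call engineers.
    alert_aspect = true

    # Read-only access to Workflows logs, metrics, and configuration.
    support_role_name = aws_iam_role.support.name

    # Superset of support: manage runner hosts, S3 objects, and the Redis cache.
    operator_role_name = aws_iam_role.operator.name

    # Off by default; only usable through the operator role.
    enable_ssm_access = true
  }
}
```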

Co-Maintainer

This role is a superset of the preceding support and operator roles, and additionally allows Aspect engineers to apply the Terraform workspace for your installation.

note

This scoped role is currently available only for Google Cloud installations. AWS support is coming soon.

To allow co-maintainer access, create a google_project_iam_member resource that grants the co-maintainer role to Aspect's co-maintainer group, as follows.

Membership in the group is managed through access request approval for on-call engineers.

resource "google_project_iam_member" "workflows_comaintainer_access" {
  project = local.project
  role    = module.aspect_workflows.aspect_comaintainer.comaintainer_role
  member  = "group:workflows-comaintainer@aspect.build"
}