Creating CloudWatch Alarms for Redshift using Terraform

Monitor your data warehouse utilization with IAC

Published in

Data Engineer Things

3 min readMar 6, 2024

“Help us! None of our dashboards are working and we have a client meeting in 30 minutes!” This was the start of my day.

A quick look at our redshift cluster showed CPU usage at nearly 100% for the past 12 hours. After identifying and cancelling an erroneous query responsible for throttling the cluster, adequate performance resumed.

After review of the incident, it became immediately clear that sufficient alarms on our redshift cluster would’ve mitigated (and perhaps prevented) the emergency.

CloudWatch Alarms for Redshift

AWS allows you to configure alarms for redshift through the redshift console. You can also create alarms for redshift directly through Cloudwatch: Monitoring Amazon Redshift using CloudWatch metrics.

You can also configure metric alarms through their terraform provider — which I’ll be focusing on in this article.

🧠 Their terraform documentation doesn’t offer great examples for redshift alarms, hence me writing this guide.

CloudWatch Alarm for High CPU Usage in Redshift Nodes

To create an alarm for high CPU Utilization among compute nodes, you will need to create a custom metric query using aws’ math syntax.

💡 If you select CPUUtilization without filtering based on NodeID, you will receive data for the average of all nodes, including the leader. Usually, the leader should be ignored — since an overwhelmed cluster will have ~100% utilization on the compute nodes only, and the leader will be okay.

The query we’ll be using is:

SELECT
  AVG(CPUUtilization)
FROM
  SCHEMA("AWS/Redshift", ClusterIdentifier, NodeID)
WHERE
  ClusterIdentifier = 'my_cluster'
  AND NodeID != 'Leader'

A few notes here:

The expression may look like SQL but does not support most SQL expressions. For example, I wanted to use NodeId IS LIKE "Compute-%" but IS, LIKE, and wildcard expressions are not supported.
Double quotes are used for AWS entities
Single quotes are used for identifiers (i.e. “name of things”). In other words, use single quotes in your WHERE clause.

All available metrics and dimensions for redshift can be found here.

Here’s how you create the alarm in terraform:

# Assuming you already have this
resource "aws_redshift_cluster" "my_cluster" {
  ...
}

# This is the entity you'll want to create
resource "aws_cloudwatch_metric_alarm" "cpu-utilization-98" {
  alarm_name                = "Redshift-CPU-Utilization-above-98"
  comparison_operator       = "GreaterThanOrEqualToThreshold"
  evaluation_periods        = 2
  threshold                 = 98
  alarm_description         = "CPU usage is greater than 90% for redshift cluster '${aws_redshift_cluster.my_cluster.cluster_identifier}'"
  insufficient_data_actions = []

  alarm_actions = [aws_sns_topic.sns.arn]  # The SNS topic for your alarm notifs

  metric_query {
    id          = "m1"  # doesn't really matter what you put here
    expression  = <<EOH
      SELECT
        AVG(CPUUtilization)
      FROM SCHEMA("AWS/Redshift", ClusterIdentifier, NodeID)
        WHERE
          ClusterIdentifier = '${aws_redshift_cluster.my_cluster.cluster_identifier}'
          AND NodeID != 'Leader'
      EOH
    label       = "CPUUtilization for redshift nodes (excluding leader)"
    period      = 600 # 10 minutes
    return_data = "true"
  }

}

There you have it!

Conclusion:

Creating CloudWatch alarms for Redshift is a recommended way to monitor your data warehouse utilization. Now you have an easy way to do it in Terraform!

Resources:

Terraform documentation: aws_redshift_cluster
Terraform documentation: aws_cloudwatch_metric_alarm
All available metrics and dimensions for redshift can be found here.