Setting Up Monitoring and Alerts in OCI for Your Resources

 


Oracle Cloud Infrastructure (OCI) provides powerful Monitoring and Alarms features to help you gain visibility into the health and performance of your cloud resources. In this blog post, we’ll walk through the step-by-step process of setting up monitoring and alerts for CPU, Memory, and other important metrics using OCI’s native tools.

✅ Whether you’re managing Compute Instances, Autonomous Databases, Load Balancers, or other services, OCI Monitoring helps you stay ahead of performance issues by automatically collecting and analyzing metrics.


🎯 What We Will Cover

  1. Understanding OCI Monitoring and Metrics

  2. Enabling Monitoring for Resources

  3. Setting Up Alarms (CPU, Memory, etc.)

  4. Viewing Metrics in the OCI Console

  5. Notifications Using OCI Notifications Service

  6. Bonus: Monitoring Using the CLI and Terraform (optional, for automation)

 

🧠 1. What is OCI Monitoring?

OCI Monitoring allows you to collect, view, and analyze metrics for your cloud resources in near real-time. It includes:

  • Metrics: Time-series data points related to a resource's performance (CPU, memory, IOPS, etc.)

  • Alarms: Rules that trigger when a metric crosses a threshold.

  • Namespaces: Logical groups of metrics. E.g., oci_computeagent, oci_autonomous_database, etc.
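These concepts come together in a Monitoring Query Language (MQL) expression: a metric name, an interval in brackets, optional dimension filters in braces, and a statistic. The function below is a hypothetical helper (`mql_query` is our name, not an OCI tool) that only assembles such an expression as a string, so each part is easy to see.

```shell
#!/bin/sh
# Hypothetical helper: build an MQL expression of the form
#   MetricName[interval]{dimension = "value"}.statistic()
# Usage: mql_query METRIC INTERVAL STATISTIC [DIMENSION VALUE]
mql_query() {
  local metric="$1" interval="$2" statistic="$3"
  local dim_key="${4:-}" dim_val="${5:-}"
  local filter=""
  # Add a {dimension = "value"} filter only when both parts are given
  if [ -n "$dim_key" ] && [ -n "$dim_val" ]; then
    filter="{${dim_key} = \"${dim_val}\"}"
  fi
  printf '%s[%s]%s.%s()\n' "$metric" "$interval" "$filter" "$statistic"
}
```

For example, `mql_query CpuUtilization 1m mean` prints `CpuUtilization[1m].mean()`, and adding `resourceId ocid1.instance.oc1..xxxxx` narrows the same query to a single instance.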

🛠️ 2. Enabling Monitoring for Your Resources

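Most OCI services emit metrics by default (e.g., Autonomous Database, Block Volume, Load Balancer). Compute instance metrics are collected by the Oracle Cloud Agent, which is enabled by default on supported platform images. For custom applications, you can publish custom metrics using the OCI SDKs or CLI.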

✅ Compute Instance Monitoring

To monitor compute metrics like CPU, memory, and disk, ensure that the Oracle Cloud Agent is running on the instance and that its Compute Instance Monitoring plugin is enabled.

Steps:

  1. Log in to the OCI Console.

  2. Navigate to Compute > Instances.

  3. Click on your instance.

  4. In the Resources section, click Metrics.

  5. If no metrics appear:

    • On the instance's Oracle Cloud Agent tab, make sure the Compute Instance Monitoring plugin is enabled.

    • SSH into the instance and run:

sudo systemctl status oracle-cloud-agent
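If you prefer to check from your workstation instead of over SSH, the instance's agent configuration can be read with the OCI CLI. This is a sketch that assumes a configured CLI and that the instance's `agent-config` object exposes an `is-monitoring-disabled` field (verify against your CLI version's output); `monitoring_disabled` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "true" if monitoring is disabled in the
# instance's Oracle Cloud Agent configuration, "false" otherwise.
# Assumes a configured OCI CLI and the agent-config field name noted above.
monitoring_disabled() {
  local instance_ocid="$1"
  oci compute instance get \
    --instance-id "$instance_ocid" \
    --query 'data."agent-config"."is-monitoring-disabled"' \
    --raw-output
}
```

If it prints `true`, re-enable monitoring in the instance's Oracle Cloud Agent settings before troubleshooting further.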

✅ Autonomous Database Monitoring

Metrics like CPU, Storage, Sessions, etc., are automatically collected.

  • Navigate to Autonomous Database > [Your DB] > Metrics.

  • Choose from metrics like CpuUtilization, StorageUtilization, etc.
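To find the exact metric names a namespace publishes in your compartment before picking one in the console, you can query the Monitoring service directly. This sketch assumes a configured OCI CLI; `list_metric_names` is a hypothetical helper, and grouping by `name` is our way of collapsing the duplicate entries returned for each resource.

```shell
#!/bin/sh
# Hypothetical helper: list distinct metric names in one namespace.
# Example: list_metric_names ocid1.compartment.oc1..xxxxx oci_autonomous_database
list_metric_names() {
  local compartment_id="$1" namespace="$2"
  oci monitoring metric list \
    --compartment-id "$compartment_id" \
    --namespace "$namespace" \
    --group-by '["name"]' \
    --query 'data[].name'
}
```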

🔔 3. Setting Up Alarms in OCI

Let’s walk through creating an Alarm for CPU Utilization on a compute instance.

📝 Step-by-Step: Alarm for CPU Utilization

  1. Go to Observability & Management > Monitoring > Alarm Definitions.

  2. Click Create Alarm.

  3. Basic Info:

    • Name: High-CPU-Alarm

    • Compartment: Choose your compartment

    • Metric namespace: oci_computeagent

  4. Metric Details:

    • Metric name: CpuUtilization

    • Resource group: (Optional)

    • Dimensions:

      • resourceId: Select your compute instance OCID.

    • Statistic: mean

    • Interval: 1 minute

  5. Alarm Trigger Rule:

    • Trigger rule: CpuUtilization mean greater than 80 (percent)

    • Trigger delay minutes: 3 (the condition must persist for 3 minutes before the alarm fires)

  6. Notification:

    • Select a Notification Topic (create one if needed).

    • Example: HighCPUAlertTopic

  7. Message format: choose formatted messages, pretty JSON, or raw messages.

  8. Click Create Alarm.
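The same alarm can also be created from the OCI CLI, which helps when you need identical alarms on many instances. This is a sketch assuming a configured CLI and an existing Notifications topic; `create_cpu_alarm` is a hypothetical helper. The 80% threshold matches the console example, and `PT3M` (a 3-minute pending duration) is our choice for how long the condition must hold.

```shell
#!/bin/sh
# Hypothetical helper: create a CPU alarm for one compute instance.
# Args: compartment OCID, Notifications topic OCID, instance OCID.
create_cpu_alarm() {
  local compartment_id="$1" topic_id="$2" instance_ocid="$3"
  # --metric-compartment-id is where the metrics are emitted (assumed to be
  # the alarm's own compartment); --destinations is a JSON list of topic OCIDs.
  oci monitoring alarm create \
    --compartment-id "$compartment_id" \
    --metric-compartment-id "$compartment_id" \
    --display-name "High-CPU-Alarm" \
    --namespace oci_computeagent \
    --query-text "CpuUtilization[1m]{resourceId = \"${instance_ocid}\"}.mean() > 80" \
    --pending-duration PT3M \
    --severity CRITICAL \
    --destinations "[\"${topic_id}\"]" \
    --is-enabled true
}
```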

Repeat similar steps for:

  • MemoryUtilization

  • DiskIopsRead, DiskIopsWritten (or DiskBytesRead, DiskBytesWritten for throughput)

  • NetworksBytesIn, NetworksBytesOut (the metric names are spelled "Networks")
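When you repeat the steps for several metrics, it helps to keep the metric and threshold pairs in one place. The sketch below only prints the alarm query text for each pair; `print_alarm_queries` is a hypothetical helper, and any thresholds you pass are your own placeholders, not recommendations.

```shell
#!/bin/sh
# Hypothetical helper: print one alarm query per METRIC:THRESHOLD argument.
print_alarm_queries() {
  local rule metric threshold
  for rule in "$@"; do
    case "$rule" in
      *:*) ;;
      *) echo "skipping malformed rule: $rule" >&2; continue ;;
    esac
    metric="${rule%%:*}"      # text before the first colon
    threshold="${rule##*:}"   # text after the last colon
    printf '%s[1m].mean() > %s\n' "$metric" "$threshold"
  done
}
```

Each printed line can be pasted into the console's query editor or passed to the CLI's `--query-text` option.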



📈 4. Viewing Metrics in the OCI Console

  1. Navigate to Observability & Management > Monitoring > Metrics Explorer.

  2. Choose the:

    • Compartment

    • Namespace (e.g., oci_computeagent)

    • Metric name (e.g., CpuUtilization)

  3. Set dimensions (like instance ID).

  4. Select visualization type (line, bar, etc.)

  5. Use filters and time range for specific insights.

📣 5. Setting Up Notifications

📝 Steps to Create a Notification Topic

  1. Go to Developer Services > Application Integration > Notifications.

  2. Click Create Topic.

    • Name: HighCPUAlertTopic

    • Compartment: Choose the same as your resource

  3. After creating the topic, click Create Subscription.

    • Protocol: Email

    • Endpoint: Your email address

  4. Confirm the email subscription (check your inbox).

Your alarms will now notify you whenever thresholds are crossed!
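The topic and email subscription can also be scripted. This is a sketch assuming a configured OCI CLI; `create_email_alert_topic` is a hypothetical helper, and it assumes the new topic's OCID appears in the `topic-id` field of the CLI's JSON output.

```shell
#!/bin/sh
# Hypothetical helper: create a Notifications topic plus an EMAIL
# subscription, then print the topic OCID (for an alarm's destinations).
# Topic names must be unique within the tenancy.
create_email_alert_topic() {
  local compartment_id="$1" email="$2" topic_id
  topic_id="$(oci ons topic create \
    --compartment-id "$compartment_id" \
    --name HighCPUAlertTopic \
    --query 'data."topic-id"' \
    --raw-output)" || return 1
  # The subscription stays pending until the recipient confirms the email.
  oci ons subscription create \
    --compartment-id "$compartment_id" \
    --topic-id "$topic_id" \
    --protocol EMAIL \
    --subscription-endpoint "$email" >/dev/null || return 1
  printf '%s\n' "$topic_id"
}
```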


🧪 Bonus: Monitoring with CLI (Optional for Automation)

You can also create alarms and monitor metrics using the OCI CLI for automation purposes.

Sample CLI to Get CPU Metrics:


# resourceId goes in the query's dimension filter; there is no --resource-id option
oci monitoring metric-data summarize-metrics-data \
  --namespace oci_computeagent \
  --query-text 'CpuUtilization[1m]{resourceId = "ocid1.instance.oc1.iad.xxxxx"}.mean()' \
  --start-time 2025-05-25T00:00:00Z \
  --end-time 2025-05-25T23:59:59Z \
  --compartment-id ocid1.compartment.oc1..xxxxx
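Hard-coded timestamps go stale quickly. For a rolling window, you can compute the start and end times instead. This sketch assumes GNU `date` (as on most Linux distributions) and a configured OCI CLI; `utc_hours_ago` and `summarize_cpu_last_hours` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers; utc_hours_ago relies on GNU date's -d option.
# Print the UTC time N hours ago in RFC 3339 form (e.g. 2025-05-25T00:00:00Z).
utc_hours_ago() {
  date -u -d "${1} hours ago" +%Y-%m-%dT%H:%M:%SZ
}

# Summarize one instance's mean CPU over the last N hours.
summarize_cpu_last_hours() {
  local hours="$1" compartment_id="$2" instance_ocid="$3"
  oci monitoring metric-data summarize-metrics-data \
    --namespace oci_computeagent \
    --compartment-id "$compartment_id" \
    --query-text "CpuUtilization[1m]{resourceId = \"${instance_ocid}\"}.mean()" \
    --start-time "$(utc_hours_ago "$hours")" \
    --end-time "$(utc_hours_ago 0)"
}
```

On macOS or BSD, `date` uses a different syntax (for example `date -u -v-1H`), so adjust `utc_hours_ago` accordingly.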

🔍 Understanding Metric Namespaces and Dimensions

OCI organizes monitoring data into namespaces and dimensions:

  • Namespace: A logical group of metrics for a service.

    • Examples:

      • oci_computeagent – Compute instance agent metrics

      • oci_autonomous_database – Autonomous DB metrics

      • oci_blockstore – Block volume metrics

  • Dimensions: Key-value pairs that help filter metric data.

    • Examples: resourceId, resourceDisplayName, availabilityDomain (the available dimensions vary by namespace).

💡 Tip: Use dimensions effectively to narrow down metrics for a specific resource or group.
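Dimensions can also aggregate rather than filter. The query below groups CPU utilization by availability domain, a quick way to compare load across ADs. It is a sketch assuming a configured OCI CLI and MQL's `groupBy()` grouping function; `cpu_by_availability_domain` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: mean CPU per availability domain over a time range.
# Args: compartment OCID, start time, end time (e.g. 2025-05-25T00:00:00Z).
cpu_by_availability_domain() {
  local compartment_id="$1" start="$2" end="$3"
  oci monitoring metric-data summarize-metrics-data \
    --namespace oci_computeagent \
    --compartment-id "$compartment_id" \
    --query-text 'CpuUtilization[5m].groupBy(availabilityDomain).mean()' \
    --start-time "$start" \
    --end-time "$end"
}
```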


📊 Common Metrics to Monitor by Resource Type

🖥️ Compute Instances

Metric Name                          Description
CpuUtilization                       % of CPU in use
MemoryUtilization                    % of RAM in use (requires agent)
DiskIopsRead / DiskIopsWritten       Disk read/write I/O operations
NetworksBytesIn / NetworksBytesOut   Incoming/outgoing network traffic

🗄️ Block Volumes

Metric Name       Description
VolumeReadOps     Read operations
VolumeWriteOps    Write operations

🧠 Autonomous Databases

Metric Name          Description
CpuUtilization       % of CPU in use
StorageUtilization   % of storage in use
Sessions             Number of sessions in the database

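🛡️ Best Practices for OCI Monitoring & Alerts

  1. Base Thresholds on Historical Data: Review historical trends in Metrics Explorer before setting thresholds, so alarms fire on real anomalies rather than normal peaks.

  2. Group Alerts by Environment: Keep separate alarms for dev, test, and prod so non-production noise does not trigger false alarms.

  3. Integrate with Incident Management Tools: Notifications subscriptions support PagerDuty and Slack directly, and tools such as Opsgenie can receive alerts through an HTTPS (custom URL) subscription.

  4. Audit Your Alarms: Periodically review active alarms and remove unnecessary ones.

  5. Tag Resources: Use tags to organize resources and alarms by owner, environment, or application so they are easy to filter and manage.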

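🔁 Automatically Respond to Alerts

OCI Alarms can do more than send notifications. An alarm publishes to a Notifications topic, and a topic can have an OCI Functions subscription, so a firing alarm can invoke a function that starts an automation workflow.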

Example use cases:

  • Scale up compute instances when CPU > 80%

  • Restart a DB if storage utilization exceeds a threshold

  • Send logs to an external system (like Splunk)
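The usual wiring for these use cases is alarm, then Notifications topic, then a subscription whose protocol is an OCI Function. This sketch assumes a configured OCI CLI and an already-deployed function; `subscribe_function_to_topic` is a hypothetical helper, and the function's remediation logic (scaling, restarts, forwarding) is up to you.

```shell
#!/bin/sh
# Hypothetical helper: subscribe an existing OCI Function to a topic so that
# every alarm message published to the topic invokes the function.
# Args: compartment OCID, topic OCID, function OCID.
subscribe_function_to_topic() {
  local compartment_id="$1" topic_id="$2" function_ocid="$3"
  oci ons subscription create \
    --compartment-id "$compartment_id" \
    --topic-id "$topic_id" \
    --protocol ORACLE_FUNCTIONS \
    --subscription-endpoint "$function_ocid"
}
```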

⚙️ Automation with Terraform (Optional Section)

If you're managing OCI infrastructure using Terraform, you can create alarms as Infrastructure-as-Code.

Example Terraform Snippet for CPU Alarm


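variable "compartment_ocid" {
  type = string
}

# Notification topic the alarm publishes to (names must be unique in the tenancy)
resource "oci_ons_notification_topic" "cpu_topic" {
  compartment_id = var.compartment_ocid
  name           = "HighCPUAlertTopic"
}

resource "oci_monitoring_alarm" "cpu_alarm" {
  compartment_id        = var.compartment_ocid # compartment that holds the alarm
  metric_compartment_id = var.compartment_ocid # compartment whose metrics are evaluated
  display_name          = "High CPU Alarm"
  namespace             = "oci_computeagent"
  query                 = "CpuUtilization[1m].mean() > 80"
  pending_duration      = "PT3M" # condition must persist 3 minutes before firing
  severity              = "CRITICAL"
  body                  = "CPU usage is above threshold"
  message_format        = "ONS_OPTIMIZED" # valid values: RAW, PRETTY_JSON, ONS_OPTIMIZED
  is_enabled            = true
  repeat_notification_duration = "PT10M" # re-notify every 10 minutes while firing
  destinations          = [oci_ons_notification_topic.cpu_topic.id]
}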
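To apply the snippet, the standard Terraform workflow is init, plan, apply. This sketch assumes the OCI Terraform provider is already configured in the same directory as the snippet; `apply_alarm_config` and the plan file name `alarm.tfplan` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run init, plan, and apply for the alarm config in the
# current directory, passing the compartment OCID as a Terraform variable.
apply_alarm_config() {
  local compartment_ocid="$1"
  terraform init -input=false &&
    terraform plan -input=false -var "compartment_ocid=${compartment_ocid}" -out=alarm.tfplan &&
    terraform apply -input=false alarm.tfplan
}
```

Saving the plan with `-out` and applying that file ensures Terraform applies exactly the changes you reviewed.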

🔍 Integration with Logging and Events

Combine Monitoring, Logging, and Events for complete observability.

  • Use Logging Search to investigate issues flagged by Alarms.

  • Use Event Rules to trigger remediation or audits.
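On the Logging side, the OCI CLI can run searches similar to those on the console's Logging Search page. This is a sketch under assumptions: a configured OCI CLI and a search query that simply scopes to one compartment and sorts newest first; `search_recent_logs` is a hypothetical helper name, and you should check the query syntax against the Logging Search documentation for your log sources.

```shell
#!/bin/sh
# Hypothetical helper: search a compartment's logs in a time window,
# newest entries first. Args: compartment OCID, start time, end time.
search_recent_logs() {
  local compartment_id="$1" start="$2" end="$3"
  oci logging-search search-logs \
    --search-query "search \"${compartment_id}\" | sort by datetime desc" \
    --time-start "$start" \
    --time-end "$end"
}
```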

🧵 Wrapping Up

With the power of OCI Monitoring and Alarms, you can:

✅ Stay ahead of system issues
✅ Get real-time notifications
✅ Automate incident response
✅ Maintain high availability and performance

By leveraging the steps above, you’ll have a robust observability setup for your OCI workloads. Don't forget to regularly audit your alarms and notification settings to align with evolving performance requirements.

