Setting Up Monitoring and Alerts in OCI for Your Resources
Oracle Cloud Infrastructure (OCI) provides powerful Monitoring and Alarms features to help you gain visibility into the health and performance of your cloud resources. In this blog post, we’ll walk through the step-by-step process of setting up monitoring and alerts for CPU, Memory, and other important metrics using OCI’s native tools.
✅ Whether you’re managing Compute Instances, Autonomous Databases, Load Balancers, or other services, OCI Monitoring helps you stay ahead of performance issues by automatically collecting and analyzing metrics.
What We Will Cover
- Understanding OCI Monitoring and Metrics
- Enabling Monitoring for Resources
- Setting Up Alarms (CPU, Memory, etc.)
- Viewing Metrics in the OCI Console
- Notifications Using OCI Notifications Service
- Bonus: Monitoring Using CLI & SDK (Optional for automation)
1. What is OCI Monitoring?
OCI Monitoring allows you to collect, view, and analyze metrics for your cloud resources in near real-time. It includes:
- Metrics: Time-series data points related to a resource's performance (CPU, memory, IOPS, etc.).
- Alarms: Rules that trigger when a metric crosses a threshold.
- Namespaces: Logical groups of metrics, e.g., oci_computeagent, oci_autonomous_database, etc.
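These three concepts meet in the Monitoring Query Language (MQL): charts and alarms are defined by an expression of the form metric[interval]{dimension filters}.statistic(), with a condition such as > 80 appended for alarms, while the namespace is chosen separately. The helper below is a minimal sketch that only assembles such strings; build_mql is a hypothetical name, and it does not validate metric names, intervals, or statistics against OCI. The instance OCID is a placeholder.

```python
from typing import Mapping, Optional


def build_mql(
    metric: str,
    interval: str,
    statistic: str,
    dimensions: Optional[Mapping[str, str]] = None,
    condition: Optional[str] = None,
) -> str:
    """Assemble an MQL expression such as CpuUtilization[1m].mean() > 80.

    Hypothetical helper: it formats a string only and performs no validation.
    """
    query = f"{metric}[{interval}]"
    if dimensions:
        # Each dimension becomes a key = "value" filter; filters are comma-separated.
        filters = ", ".join(f'{key} = "{value}"' for key, value in dimensions.items())
        query += "{" + filters + "}"
    query += f".{statistic}()"
    if condition:
        # A trailing condition (e.g. "> 80") turns the query into an alarm rule.
        query += f" {condition}"
    return query


# Placeholder OCID; substitute your own instance.
print(build_mql("CpuUtilization", "1m", "mean",
                {"resourceId": "ocid1.instance.oc1..example"}, "> 80"))
```

Keeping the namespace out of the string mirrors the console, where the namespace is a separate field from the query.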
2. Enabling Monitoring for Your Resources
Most OCI services have monitoring enabled by default (e.g., Compute, Autonomous DB). For custom applications, you can publish custom metrics using the OCI SDK.
✅ Compute Instance Monitoring
To monitor compute metrics such as CPU, memory, and disk, make sure the Oracle Cloud Agent is running on the instance with its Compute Instance Monitoring plugin enabled.
Steps:
- Log in to the OCI Console.
- Navigate to Compute > Instances.
- Click on your instance.
- In the Resources section, click Metrics.
- If no metrics appear:
  - Make sure the Compute Instance Monitoring plugin of the Oracle Cloud Agent is enabled for the instance.
  - SSH into the instance and verify that the Oracle Cloud Agent service is running.
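When the Metrics tab is empty or incomplete, it can help to compare the metric names an instance actually reports against the ones you expect. A minimal sketch, assuming you have already obtained the reported names (for example from the Metrics tab or the Monitoring API); the EXPECTED_COMPUTE_METRICS set is an assumption based on common oci_computeagent metric names, and diagnose is a hypothetical helper.

```python
from typing import Iterable

# Assumed set of core oci_computeagent metrics; adjust to what you monitor.
EXPECTED_COMPUTE_METRICS = {
    "CpuUtilization",
    "MemoryUtilization",
    "DiskBytesRead",
    "DiskBytesWritten",
    "DiskIopsRead",
    "DiskIopsWritten",
    "NetworksBytesIn",
    "NetworksBytesOut",
}


def diagnose(reported: Iterable[str]) -> str:
    """Summarize which expected metrics an instance is (not) reporting."""
    reported_set = set(reported)
    if not reported_set:
        # Nothing at all usually points at the agent or its monitoring plugin.
        return ("No metrics reported: check that the Compute Instance Monitoring "
                "plugin is enabled and the Oracle Cloud Agent is running.")
    gaps = sorted(EXPECTED_COMPUTE_METRICS - reported_set)
    if gaps:
        return "Missing metrics: " + ", ".join(gaps)
    return "All expected metrics are reported."


print(diagnose(["CpuUtilization", "DiskIopsRead"]))
```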
✅ Autonomous Database Monitoring
Metrics like CPU, Storage, Sessions, etc., are automatically collected.
- Navigate to Autonomous Database > [Your DB] > Metrics.
- Choose from metrics like CpuUtilization, StorageUtilization, etc.
3. Setting Up Alarms in OCI
Let’s walk through creating an Alarm for CPU Utilization on a compute instance.
Step-by-Step: Alarm for CPU Utilization
- Go to Observability & Management > Monitoring > Alarm Definitions.
- Click Create Alarm.
- Basic Info:
  - Name: High-CPU-Alarm
  - Compartment: Choose your compartment
  - Metric namespace: oci_computeagent
- Metric Details:
  - Metric name: CpuUtilization
  - Resource group: (Optional)
  - Dimensions:
    - resourceId: Select your compute instance OCID.
  - Statistic: mean
  - Interval: 1 minute
- Alarm Trigger Rule:
  - Trigger when: CpuUtilization > 80%
  - Trigger delay: 3 minutes (the condition must hold this long before the alarm fires, which filters out brief spikes)
- Notification:
  - Select a Notification Topic (create one if needed).
    - Example: HighCPUAlertTopic
  - Message Format: JSON or Raw Text
- Click Create Alarm.
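OCI alarms evaluate their query once per interval and, when a trigger delay is configured, fire only after the condition has held for that long. The function below is a simplified local model of that behavior, assuming one mean value per minute and that the alarm returns to OK as soon as the condition is false; OCI's server-side evaluation may differ in detail, and firing_minutes is a hypothetical name.

```python
from typing import List, Sequence


def firing_minutes(
    values: Sequence[float], threshold: float = 80.0, trigger_delay: int = 3
) -> List[int]:
    """Return the minute indexes at which a simplified alarm would be FIRING.

    Simplified model: one aggregated (mean) value per minute; the alarm fires
    once value > threshold has held for `trigger_delay` consecutive minutes
    and resets as soon as the condition is false.
    """
    firing = []
    streak = 0
    for minute, value in enumerate(values):
        # Count consecutive minutes above the threshold; reset on any dip.
        streak = streak + 1 if value > threshold else 0
        if streak >= trigger_delay:
            firing.append(minute)
    return firing


cpu_means = [45.0, 85.0, 90.0, 70.0, 82.0, 88.0, 95.0, 60.0]
print(firing_minutes(cpu_means))  # prints [6]
```

In the sample, the brief spike at minutes 1 and 2 never fires; only the sustained run that reaches three consecutive minutes at minute 6 does.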
Repeat similar steps for:
- MemoryUtilization
- DiskIopsRead, DiskIopsWritten (or DiskBytesRead, DiskBytesWritten for throughput)
- NetworksBytesIn, NetworksBytesOut
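Repeating the console steps for each metric is easy to script. A sketch that builds one alarm definition per metric, assuming field names that follow the Monitoring API's CreateAlarmDetails (verify them against the API reference before use); the thresholds, severity, trigger delay, and OCIDs are placeholders, and alarm_definitions is a hypothetical helper.

```python
import json
from typing import Dict, List, Tuple

# Illustrative (metric, threshold) pairs; tune thresholds for your workload.
ALARM_SPECS: List[Tuple[str, float]] = [
    ("CpuUtilization", 80.0),
    ("MemoryUtilization", 85.0),
]


def alarm_definitions(
    compartment_id: str,
    instance_id: str,
    topic_id: str,
    specs: List[Tuple[str, float]] = ALARM_SPECS,
) -> List[Dict[str, object]]:
    """Build one alarm definition per (metric, threshold) pair.

    Field names are assumed to follow the Monitoring API's CreateAlarmDetails.
    """
    definitions = []
    for metric, threshold in specs:
        definitions.append({
            "displayName": f"High-{metric}-Alarm",
            "compartmentId": compartment_id,        # compartment that owns the alarm
            "metricCompartmentId": compartment_id,  # compartment that emits the metric
            "namespace": "oci_computeagent",
            # MQL: 1-minute mean for one instance, compared with the threshold.
            "query": f'{metric}[1m]{{resourceId = "{instance_id}"}}.mean() > {threshold:g}',
            "severity": "CRITICAL",
            "destinations": [topic_id],             # Notifications topic OCID
            "isEnabled": True,
            "pendingDuration": "PT3M",              # trigger delay as an ISO 8601 duration
        })
    return definitions


definitions = alarm_definitions(
    "ocid1.compartment.oc1..example",  # placeholder OCIDs
    "ocid1.instance.oc1..example",
    "ocid1.onstopic.oc1..example",
)
print(json.dumps(definitions, indent=2))
```

With the OCI Python SDK, the same values would map onto oci.monitoring.models.CreateAlarmDetails using snake_case names (display_name, pending_duration, and so on) passed to MonitoringClient.create_alarm; treat that mapping as something to confirm in the SDK reference.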
4. Viewing Metrics in the OCI Console
- Navigate to Observability & Management > Monitoring > Metrics Explorer.
- Choose the:
  - Compartment
  - Namespace (e.g., oci_computeagent)
  - Metric name (e.g., CpuUtilization)
- Set dimensions (e.g., resourceId for a specific instance).
- Select the visualization type (line, bar, etc.).
- Use filters and the time range for specific insights.
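Each chart point in Metrics Explorer is an aggregate: raw data points are grouped per interval and reduced with the chosen statistic, the same Interval and Statistic settings used when creating an alarm. A local illustration of 1-minute aggregation with a few statistics (OCI supports more and does this server-side); aggregate_per_minute and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, List, Tuple


def aggregate_per_minute(
    samples: List[Tuple[datetime, float]], statistic: str = "mean"
) -> Dict[datetime, float]:
    """Group raw samples into 1-minute buckets and apply a statistic."""
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for timestamp, value in samples:
        # Truncate each timestamp to the start of its minute.
        buckets[timestamp.replace(second=0, microsecond=0)].append(value)
    reducers = {"mean": mean, "max": max, "min": min}
    reducer = reducers[statistic]
    return {minute: reducer(values) for minute, values in sorted(buckets.items())}


samples = [
    (datetime(2024, 1, 1, 12, 0, 5), 40.0),
    (datetime(2024, 1, 1, 12, 0, 35), 60.0),
    (datetime(2024, 1, 1, 12, 1, 10), 90.0),
]
print(aggregate_per_minute(samples))         # 12:00 -> 50.0, 12:01 -> 90.0
print(aggregate_per_minute(samples, "max"))  # 12:00 -> 60.0, 12:01 -> 90.0
```

The choice of statistic matters: a mean can hide a short burst that max would reveal.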
5. Setting Up Notifications
Steps to Create a Notification Topic
- Go to Developer Services > Application Integration > Notifications and open Topics.
- Click Create Topic.
  - Name: HighCPUAlertTopic
  - Compartment: Choose the same as your resource
- After creating the topic, click Create Subscription.
  - Protocol: Email
  - Endpoint: Your email address
- Confirm the email subscription (check your inbox).
Your alarms will now notify you whenever thresholds are crossed!
Bonus: Monitoring with CLI (Optional for Automation)
You can also create alarms and monitor metrics using the OCI CLI for automation purposes.
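Sample CLI to get CPU metrics: the oci monitoring metric-data summarize-metrics-data command returns aggregated data for an MQL query, for example with --compartment-id set to your compartment OCID, --namespace oci_computeagent, and --query-text "CpuUtilization[1m].mean()".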
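The same query can also be run from Python. A sketch using the OCI Python SDK, assuming the oci package is installed and a valid config file exists at the default location (~/.oci/config); the OCIDs are placeholders and the helper names are hypothetical. The SDK import sits inside fetch_cpu_means so the time-window helper works without it.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


def last_window(minutes: int, now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return a (start, end) UTC window covering the last `minutes` minutes."""
    end = now or datetime.now(timezone.utc)
    return end - timedelta(minutes=minutes), end


def fetch_cpu_means(
    compartment_id: str, instance_id: str, minutes: int = 60
) -> List[Tuple[datetime, float]]:
    """Fetch 1-minute mean CpuUtilization for one instance via the OCI Python SDK."""
    import oci  # assumes `pip install oci` and a valid ~/.oci/config

    config = oci.config.from_file()
    client = oci.monitoring.MonitoringClient(config)
    start, end = last_window(minutes)
    details = oci.monitoring.models.SummarizeMetricsDataDetails(
        namespace="oci_computeagent",
        query=f'CpuUtilization[1m]{{resourceId = "{instance_id}"}}.mean()',
        start_time=start,
        end_time=end,
    )
    response = client.summarize_metrics_data(
        compartment_id=compartment_id,
        summarize_metrics_data_details=details,
    )
    # Each MetricData item holds the aggregated points for one metric stream.
    points = []
    for metric_data in response.data:
        for datapoint in metric_data.aggregated_datapoints:
            points.append((datapoint.timestamp, datapoint.value))
    return points
```

For example, fetch_cpu_means("ocid1.compartment.oc1..example", "ocid1.instance.oc1..example", minutes=30) would return (timestamp, value) pairs for your own reporting, once the placeholders are replaced with real OCIDs.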
Understanding Metric Namespaces and Dimensions
OCI organizes monitoring data into namespaces and dimensions:
- Namespace: A logical group of metrics for a service. Examples:
  - oci_computeagent – Compute instance agent metrics
  - oci_autonomous_database – Autonomous DB metrics
  - oci_blockstore – Block volume metrics
- Dimensions: Key-value pairs that help filter metric data. Examples: resourceId, availabilityDomain, resourceDisplayName, etc.

Tip: Use dimensions effectively to narrow down metrics for a specific resource or group.
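To see how dimensions narrow results, here is a local illustration: each data point carries dimension key-value pairs, and a filter keeps only the points that match every requested pair. This is not how OCI stores metrics internally; the Datapoint type, filter_by_dimensions, and the sample OCIDs and availability domain names are hypothetical.

```python
from typing import Dict, List, Mapping, NamedTuple


class Datapoint(NamedTuple):
    name: str
    value: float
    dimensions: Dict[str, str]


def filter_by_dimensions(points: List[Datapoint], wanted: Mapping[str, str]) -> List[Datapoint]:
    """Keep points whose dimensions contain every key/value pair in `wanted`."""
    return [p for p in points if all(p.dimensions.get(k) == v for k, v in wanted.items())]


points = [
    Datapoint("CpuUtilization", 72.0, {"resourceId": "ocid1.instance.oc1..a", "availabilityDomain": "AD-1"}),
    Datapoint("CpuUtilization", 15.0, {"resourceId": "ocid1.instance.oc1..b", "availabilityDomain": "AD-2"}),
    Datapoint("CpuUtilization", 64.0, {"resourceId": "ocid1.instance.oc1..c", "availabilityDomain": "AD-1"}),
]
ad1 = filter_by_dimensions(points, {"availabilityDomain": "AD-1"})
print([p.value for p in ad1])  # prints [72.0, 64.0]
```

Filtering on resourceId isolates one instance, while a broader dimension such as availabilityDomain selects a group.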
Common Metrics to Monitor by Resource Type
Compute Instances (all require the Oracle Cloud Agent)
Metric Name | Description |
---|---|
CpuUtilization | % CPU in use |
MemoryUtilization | % RAM in use |
DiskIopsRead/DiskIopsWritten | Disk read/write I/O operations |
NetworksBytesIn/NetworksBytesOut | Bytes received/sent on network interfaces |
Block Volumes
Metric | Description |
---|---|
VolumeReadOps | Number of read operations |
VolumeWriteOps | Number of write operations |
Autonomous Databases
Metric Name | Description |
---|---|
CpuUtilization | % CPU used |
StorageUtilization | % storage used |
Sessions | Number of database sessions |
Best Practices for OCI Monitoring & Alerts
- Use Dynamic Thresholds: Rather than guessing static values, derive thresholds from historical trends so alerts stay meaningful.
- Group Alerts by Environment: Separate alerts for dev/test/prod so non-production noise does not trigger production alerts.
- Integrate with Incident Management Tools: Notifications topics support PagerDuty and Slack subscriptions, plus HTTPS (custom URL) endpoints for tools such as Opsgenie.
- Audit Your Alarms: Periodically review active alarms and remove unnecessary ones.
- Tag Resources: Use tagging to easily filter and manage metrics.
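One simple way to derive a threshold from historical trends is to take the historical mean plus a few standard deviations. A sketch of that heuristic, computed offline from values you have exported; this is not a built-in OCI feature, and k = 3, the 100% cap for percentage metrics, and baseline_threshold itself are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def baseline_threshold(history: Sequence[float], k: float = 3.0, cap: float = 100.0) -> float:
    """Suggest an alarm threshold of mean + k * sample standard deviation, capped.

    Heuristic: values more than k standard deviations above the historical
    mean are treated as unusual. The cap keeps percentage metrics <= 100.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    return min(mean(history) + k * stdev(history), cap)


history = [30.0, 35.0, 40.0, 45.0, 50.0]
print(round(baseline_threshold(history), 1))  # prints 63.7
```

A steady workload gets a tighter threshold than a volatile one, which is the point of basing it on history; recompute it periodically as usage changes.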
Automatically Respond to Alerts
OCI Alarms can do more than send notifications: by subscribing a function to the alarm's Notifications topic, an alarm can invoke OCI Functions or start other automation workflows.
Example use cases:
- Scale up compute instances when CPU > 80%
- Expand Autonomous Database storage when storage utilization exceeds a threshold
- Send logs to an external system (like Splunk)
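A function subscribed to an alarm's topic receives the alarm message and can decide what to do. A minimal dispatch sketch, assuming the message arrives as JSON with a title field (the alarm name) and a type field describing the transition (such as OK_TO_FIRING); check a real message from your topic before relying on those names. The action functions are hypothetical placeholders, and wiring this into an OCI Functions handler is not shown.

```python
import json
from typing import Callable, Dict, Optional


def scale_up_compute(_message: dict) -> str:
    # Hypothetical placeholder: a real action would call the OCI SDK here.
    return "scale-up requested"


def handle_storage_alarm(_message: dict) -> str:
    # Hypothetical placeholder for storage remediation.
    return "storage remediation requested"


# Map alarm names (assumed to arrive in "title") to remediation actions.
ACTIONS: Dict[str, Callable[[dict], str]] = {
    "High-CPU-Alarm": scale_up_compute,
    "High-Storage-Alarm": handle_storage_alarm,
}


def handle_alarm(raw: str) -> Optional[str]:
    """Dispatch an alarm message to a remediation action, if one is registered.

    Assumes fields "title" (alarm name) and "type" (e.g. "OK_TO_FIRING").
    """
    message = json.loads(raw)
    if message.get("type") != "OK_TO_FIRING":
        return None  # this sketch ignores resolutions and repeats
    action = ACTIONS.get(message.get("title", ""))
    return action(message) if action else None


print(handle_alarm(json.dumps({"title": "High-CPU-Alarm", "type": "OK_TO_FIRING"})))
```

Acting only on the OK-to-firing transition keeps a repeated notification from triggering the same remediation twice.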
⚙️ Automation with Terraform (Optional Section)
If you're managing OCI infrastructure with Terraform, you can define alarms as Infrastructure-as-Code using the OCI provider's oci_monitoring_alarm resource.
Example Terraform Snippet for CPU Alarm
Integration with Logging and Events
Combine Monitoring, Logging, and Events for complete observability.
- Use Logging Search to investigate issues flagged by Alarms.
- Use Event Rules to trigger remediation or audits.
Wrapping Up
With the power of OCI Monitoring and Alarms, you can:
✅ Stay ahead of system issues
✅ Get real-time notifications
✅ Automate incident response
✅ Maintain high availability and performance
By leveraging the steps above, you’ll have a robust observability setup for your OCI workloads. Don't forget to regularly audit your alarms and notification settings to align with evolving performance requirements.