Monitoring Azure AI APIs is critical for performance, usage tracking, quota management, and troubleshooting. Azure provides multiple built-in and extensible options to monitor its AI services (like Azure OpenAI, Cognitive Services, and Azure Machine Learning). Here's a breakdown of the available monitoring options:
Azure Monitor provides a centralized platform for collecting, analyzing, and acting on telemetry from Azure resources.
Each AI service exposes its own set of metrics in Azure Monitor:
| Metric | Description |
|---|---|
| Total Calls | Total number of API calls |
| Successful Calls | Count of successful (HTTP 2xx) responses |
| Failed Calls | Count of 4xx/5xx errors |
| Latency | Response time percentiles (P50, P90, P95, etc.) |
| Throttled Calls | Requests blocked due to quota limits |
You can find these under:
Azure Portal → Monitor → Metrics → select your AI resource
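The latency percentiles in the table above (P50, P90, P95) can be reproduced locally from raw response times, which is useful for sanity-checking what the portal reports. A minimal sketch using only the standard library; the sample data is hypothetical:

```python
import statistics

def latency_percentiles(latencies_ms):
    """Compute the P50/P90/P95 latency percentiles, as reported by
    Azure Monitor, from a raw list of response times in milliseconds."""
    # quantiles(n=100) returns the 99 cut points P1..P99
    cuts = statistics.quantiles(latencies_ms, n=100)
    return {"P50": cuts[49], "P90": cuts[89], "P95": cuts[94]}

# Hypothetical response times collected from API calls (ms)
samples = [120, 95, 300, 110, 480, 105, 130, 95, 2200, 140]
print(latency_percentiles(samples))
```

Note how a single slow outlier (the 2200 ms call) inflates P95 far more than P50, which is why dashboards track several percentiles rather than a mean.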
You can configure Diagnostic Settings on each Azure AI resource to send logs and metrics to:

- a Log Analytics workspace (for querying with KQL)
- a Storage account (for archival)
- an Event Hub (for streaming to external systems)

Logs may include audit events, request/response records, and trace information, depending on the service.
Enable via:
Resource → Monitoring → Diagnostic settings
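Once diagnostic logs land in a Log Analytics workspace, you can query them with KQL. A sketch of a failed-calls query; the table and column names (`AzureDiagnostics`, `resultCode_s`) vary by service and log schema, so treat them as assumptions to verify against your own workspace:

```kusto
// Failed calls in the last 24 hours, grouped by operation
AzureDiagnostics
| where TimeGenerated > ago(24h)
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where toint(resultCode_s) >= 400
| summarize failures = count() by OperationName, resultCode_s
| order by failures desc
```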
If you're calling Azure AI APIs from your own application, you can use Application Insights to:

- track end-to-end request latency
- record outbound dependency calls to AI endpoints
- capture exceptions and failed requests
- correlate telemetry across distributed components

It integrates well with web apps, Azure Functions, and APIs.
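The kind of per-call telemetry Application Insights captures can be sketched locally with a decorator. This is a self-contained illustration, not the Application Insights SDK: the in-memory `TELEMETRY` list and the `call_model` function are hypothetical stand-ins (in production you would export these records via a package such as `azure-monitor-opentelemetry`):

```python
import time
from functools import wraps

# In-memory stand-in for a telemetry sink
TELEMETRY = []

def track_call(name):
    """Record duration and success/failure of each wrapped API call."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            success = False
            try:
                result = fn(*args, **kwargs)
                success = True
                return result
            finally:
                TELEMETRY.append({
                    "name": name,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                    "success": success,
                })
        return wrapper
    return decorator

@track_call("openai.completions")
def call_model(prompt):
    # Hypothetical stand-in for a real Azure OpenAI call
    return f"echo: {prompt}"

call_model("hello")
print(TELEMETRY[0]["name"], TELEMETRY[0]["success"])
```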
For services like Azure OpenAI and Cognitive Services, you can track quota and rate-limit consumption, and set up alerts that fire when usage approaches or exceeds thresholds.
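The threshold logic behind such an alert is simple to sketch. A minimal local example; the 100k-tokens-per-minute quota and the 80% warning level are hypothetical values, not service defaults:

```python
def quota_alerts(used, limit, warn_at=0.8):
    """Classify quota consumption: 'ok', 'warning' once usage crosses
    the warning threshold, or 'exceeded' once it passes the limit."""
    ratio = used / limit
    if ratio >= 1.0:
        return "exceeded"
    if ratio >= warn_at:
        return "warning"
    return "ok"

# Hypothetical quota of 100k tokens per minute
print(quota_alerts(50_000, 100_000))   # ok
print(quota_alerts(85_000, 100_000))   # warning
print(quota_alerts(120_000, 100_000))  # exceeded
```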
If you're deploying models via Azure ML:
Studio → Endpoints → Monitoring tab
You can build wrappers or proxies around API calls to:

- log request/response metadata and token usage
- add retry logic and record throttling events
- emit custom metrics not covered by built-in telemetry

Use Azure API Management as a gateway in front of the APIs, or lightweight middleware and logging in your own code.
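A wrapper that logs each attempt and retries on throttling (HTTP 429, the failure mode the Throttled Calls metric counts) can be sketched as follows. The simulated endpoint and the linear backoff are illustrative assumptions, not a prescription:

```python
import time

def call_with_retry(fn, retries=3, backoff_s=0.01):
    """Wrap an API call: log each attempt and retry when the
    service throttles the request with HTTP 429."""
    for attempt in range(1, retries + 1):
        status, body = fn()
        print(f"attempt {attempt}: status {status}")
        if status != 429:
            return status, body
        time.sleep(backoff_s * attempt)  # simple linear backoff
    return status, body

# Hypothetical endpoint that throttles twice, then succeeds
responses = iter([(429, ""), (429, ""), (200, "ok")])
status, body = call_with_retry(lambda: next(responses))
print(status, body)  # 200 ok
```

A real wrapper would also honor the `Retry-After` header the service returns on 429 responses rather than using a fixed backoff.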