Understanding Missing Logs in Log Analytics
Missing logs in an Azure Log Analytics workspace are a common issue because resource logs are not collected by default: you must create a diagnostic setting for each resource. This guide covers the most common reasons logs go missing and how to fix each one.
Why This Problem Matters in Production
In enterprise Azure environments, missing-log issues in a Log Analytics workspace rarely occur in isolation. They typically surface during peak usage periods, complex deployments, or when multiple services interact under load. Understanding the underlying architecture helps you move beyond symptom-level fixes to root-cause resolution.
Before diving into the diagnostic commands below, it is important to understand the service’s operational model. Azure distributes workloads across multiple fault domains and update domains. When problems arise, they often stem from configuration drift between what was deployed and what the service runtime expects. This mismatch can result from ARM template changes that were not propagated, manual portal modifications that bypassed your infrastructure-as-code pipeline, or service-side updates that changed default behaviors.
Production incidents involving missing workspace logs typically follow a pattern: an initial trigger event causes a cascading failure that affects dependent services. The key to efficient troubleshooting is isolating the blast radius early. Start by confirming whether the issue is isolated to a single resource instance, affects an entire resource group, or spans the subscription. This scoping exercise determines whether you are dealing with a configuration error, a regional service degradation, or a platform-level incident.
The troubleshooting approach in this guide follows the industry-standard OODA loop: Observe the symptoms through metrics and logs, Orient by correlating findings with known failure patterns, Decide on the most likely root cause and remediation path, and Act by applying targeted fixes. This structured methodology prevents the common anti-pattern of random configuration changes that can make the situation worse.
Service Architecture Background
To troubleshoot missing logs effectively, you need a mental model of how the service operates internally. Azure services are built on a multi-tenant platform where your resources share physical infrastructure with other customers. Resource isolation is enforced through virtualization, network segmentation, and quota management. When you experience performance degradation or connectivity issues, understanding which layer is affected helps you target your diagnostics.
The control plane handles resource management operations such as creating, updating, and deleting resources. The data plane handles the runtime operations that your application performs, such as reading data, processing messages, or serving requests. Control plane and data plane often have separate endpoints, separate authentication requirements, and separate rate limits. A common troubleshooting mistake is diagnosing a data plane issue using control plane metrics, or vice versa.
Azure Resource Manager (ARM) orchestrates all control plane operations. When you create or modify a resource, the request flows through ARM to the resource provider, which then provisions or configures the underlying infrastructure. Each step in this chain has its own timeout, retry policy, and error reporting mechanism. Understanding this chain helps you interpret error messages and identify which component is failing.
Why Logs Are Missing
| Cause | How to Check | Resolution |
|---|---|---|
| No diagnostic setting created | az monitor diagnostic-settings list | Create a diagnostic setting |
| Ingestion latency (up to 90 min) | Check the _LogOperation table | Wait for the data pipeline |
| Resource not generating events | Is there activity on the resource? | Generate test traffic |
| Wrong log category selected | Check the diagnostic setting configuration | Enable the correct categories |
| Workspace daily cap reached | Check workspace settings | Increase or remove the daily cap |
| Data export/routing error | Check _LogOperation | Re-create the diagnostic setting |
| Log Analytics workspace unhealthy | Check Resource Health | Check the Azure status page |
| Inactive resource backoff | Logs delayed up to 2 hours after 7 days of inactivity | Generate activity on the resource |
Step 1: Check Diagnostic Settings
```shell
# List diagnostic settings for a resource
az monitor diagnostic-settings list \
  --resource "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Web/sites/{app}"

# If the result is empty, no logs are being collected — create a setting
az monitor diagnostic-settings create \
  --name "send-to-workspace" \
  --resource "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Web/sites/{app}" \
  --workspace "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.OperationalInsights/workspaces/{workspace}" \
  --logs '[{"categoryGroup":"allLogs","enabled":true}]' \
  --metrics '[{"category":"AllMetrics","enabled":true}]'
```
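If you manage many resources, checking each setting by hand is tedious. As a rough local illustration (not an official tool), the following Python sketch parses the JSON emitted by az monitor diagnostic-settings list and reports which settings actually route enabled log categories to a Log Analytics workspace; the sample payload and field shapes below are based on typical CLI output but should be treated as assumptions.

```python
import json

def settings_routing_logs_to_workspace(settings_json: str) -> list:
    """Return names of diagnostic settings that send at least one
    enabled log category (or category group) to a Log Analytics workspace."""
    settings = json.loads(settings_json)
    # Depending on CLI version, the output may be a bare list or an
    # object wrapped in a "value" key.
    if isinstance(settings, dict):
        settings = settings.get("value", [])
    names = []
    for s in settings:
        has_workspace = bool(s.get("workspaceId"))
        has_enabled_logs = any(log.get("enabled") for log in s.get("logs", []))
        if has_workspace and has_enabled_logs:
            names.append(s["name"])
    return names

# Invented sample payload for illustration only.
sample = json.dumps([
    {"name": "send-to-workspace",
     "workspaceId": "/subscriptions/xxx/.../workspaces/my-workspace",
     "logs": [{"categoryGroup": "allLogs", "enabled": True}]},
    {"name": "archive-only", "workspaceId": None,
     "logs": [{"category": "AuditEvent", "enabled": True}]},
])
print(settings_routing_logs_to_workspace(sample))  # ['send-to-workspace']
```

A setting that only archives to storage (like "archive-only" above) explains why a resource can have diagnostic settings and still show nothing in the workspace.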
Common Resource Types and Log Categories
| Resource Type | Key Log Categories |
|---|---|
| App Service | AppServiceHTTPLogs, AppServiceConsoleLogs, AppServiceAppLogs |
| Azure SQL | SQLInsights, Errors, Timeouts, Blocks, Deadlocks |
| Key Vault | AuditEvent |
| Storage | StorageRead, StorageWrite, StorageDelete |
| Function App | FunctionAppLogs |
| API Management | GatewayLogs, WebSocketConnectionLogs |
| Cosmos DB | DataPlaneRequests, QueryRuntimeStatistics |
| Event Hubs | ArchiveLogs, OperationalLogs, AutoScaleLogs |
```shell
# Use category groups (allLogs captures everything)
az monitor diagnostic-settings create \
  --name "all-logs" \
  --resource "{resource-id}" \
  --workspace "{workspace-id}" \
  --logs '[{"categoryGroup":"allLogs","enabled":true}]'

# Or select specific categories; note that AllMetrics is a metric
# category and belongs under --metrics, not --logs
az monitor diagnostic-settings create \
  --name "specific-logs" \
  --resource "{resource-id}" \
  --workspace "{workspace-id}" \
  --logs '[{"category":"AuditEvent","enabled":true}]' \
  --metrics '[{"category":"AllMetrics","enabled":true}]'
```
Step 2: Check Ingestion Latency
```kusto
// Check when data was last ingested, per table
union *
| where TimeGenerated > ago(24h)
| summarize LastLog = max(TimeGenerated) by Type
| order by LastLog desc

// Check for ingestion delays and errors
_LogOperation
| where TimeGenerated > ago(24h)
| where Level == "Warning" or Level == "Error"
| project TimeGenerated, Operation, Detail, _ResourceId
| order by TimeGenerated desc

// Ingestion latency analysis
let startTime = ago(1h);
let endTime = now();
union *
| where TimeGenerated between (startTime .. endTime)
| extend IngestionDelay = ingestion_time() - TimeGenerated
| summarize P50 = percentile(IngestionDelay, 50),
            P95 = percentile(IngestionDelay, 95),
            P99 = percentile(IngestionDelay, 99) by Type
| order by P95 desc
```
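If you prefer to post-process exported records locally, the same delay analysis can be sketched in a few lines of Python. This is an illustration only: it assumes you have already pulled (table, TimeGenerated, ingestion time) tuples out of the workspace, for example via the Log Analytics query API, and the sample data is invented.

```python
from datetime import datetime, timedelta

def delay_percentiles(records):
    """records: (table, time_generated, ingestion_time) tuples.
    Returns {table: (p50_seconds, p95_seconds)} of ingestion delay."""
    by_table = {}
    for table, generated, ingested in records:
        by_table.setdefault(table, []).append((ingested - generated).total_seconds())
    result = {}
    for table, delays in by_table.items():
        delays.sort()
        # Nearest-rank percentile over the sorted delays
        def pct(p):
            return delays[min(len(delays) - 1, round(p / 100 * (len(delays) - 1)))]
        result[table] = (pct(50), pct(95))
    return result

# Invented sample: five records with delays from 30 seconds to 5 minutes.
t0 = datetime(2024, 1, 1, 12, 0)
records = [("AppServiceHTTPLogs", t0, t0 + timedelta(seconds=s))
           for s in (30, 60, 90, 120, 300)]
print(delay_percentiles(records))  # {'AppServiceHTTPLogs': (90.0, 300.0)}
```

A P95 delay under a few minutes is normal; delays approaching 90 minutes usually mean the pipeline is still warming up for a newly created setting.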
Step 3: Check Workspace Health
```shell
# Check workspace status
az monitor log-analytics workspace show \
  --resource-group "my-rg" \
  --workspace-name "my-workspace" \
  --query "{name:name, sku:sku.name, retentionInDays:retentionInDays, dailyQuotaGb:workspaceCapping.dailyQuotaGb}"

# Check if a daily cap is enabled and reached
az monitor log-analytics workspace show \
  --resource-group "my-rg" \
  --workspace-name "my-workspace" \
  --query "workspaceCapping"
```
```kusto
// Check workspace daily volume
Usage
| where TimeGenerated > ago(7d)
| summarize DataGB = sum(Quantity) / 1024 by bin(TimeGenerated, 1d), DataType
| order by TimeGenerated desc

// Check if the daily cap was hit
_LogOperation
| where TimeGenerated > ago(7d)
| where Operation == "Data collection stopped"
| project TimeGenerated, Detail
```
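To turn the Usage query above into an automated check, a minimal local sketch (the helper and thresholds are invented for illustration, not an Azure API) that flags days approaching the configured daily cap:

```python
def days_over_cap(daily_gb: dict, cap_gb: float, warn_ratio: float = 0.9):
    """daily_gb: {date string: GB ingested that day}.
    Returns the days at or above warn_ratio of the daily cap, worst first."""
    flagged = [(day, gb) for day, gb in daily_gb.items() if gb >= cap_gb * warn_ratio]
    return sorted(flagged, key=lambda item: -item[1])

# Invented sample usage data against a 10 GB/day cap.
usage = {"2024-05-01": 4.2, "2024-05-02": 9.8, "2024-05-03": 10.0}
print(days_over_cap(usage, cap_gb=10.0))  # [('2024-05-03', 10.0), ('2024-05-02', 9.8)]
```

Any day that reaches the cap means ingestion stopped for the rest of that day, which shows up as an apparent gap in the logs.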
Correlation and Cross-Service Diagnostics
Modern Azure architectures involve multiple services working together, so missing logs may actually originate from a problem in a dependent service. For example, a database timeout might be caused by a network security group rule change, a DNS resolution failure, or a Key Vault access policy that prevents secret retrieval for the connection string.
Use Azure Resource Graph to query the current state of all related resources in a single query. This gives you a snapshot of the configuration across your entire environment without navigating between multiple portal blades. Combine this with Activity Log queries to build a timeline of changes that correlates with your incident window.
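The timeline-building step can be sketched as a simple filter over Activity Log events. This is a local illustration with invented event data; in practice you would pull the events from the Activity Log API or an exported AzureActivity table.

```python
from datetime import datetime

def changes_in_window(events, start, end):
    """events: (timestamp, resource_id, operation) tuples from the Activity Log.
    Returns the changes inside the incident window, oldest first."""
    return sorted((e for e in events if start <= e[0] <= end), key=lambda e: e[0])

# Invented sample events around a hypothetical incident window.
events = [
    (datetime(2024, 5, 1, 9, 58), "nsg-prod", "Microsoft.Network/networkSecurityGroups/write"),
    (datetime(2024, 5, 1, 10, 5), "kv-prod", "Microsoft.KeyVault/vaults/accessPolicies/write"),
    (datetime(2024, 4, 30, 8, 0), "vm-01", "Microsoft.Compute/virtualMachines/restart"),
]
window = changes_in_window(events, datetime(2024, 5, 1, 9, 30), datetime(2024, 5, 1, 11, 0))
print([e[1] for e in window])  # ['nsg-prod', 'kv-prod']
```

Changes that land just before the incident window opens (like the NSG write above) are the first suspects to investigate.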
Application Insights and Azure Monitor provide distributed tracing capabilities that follow a request across service boundaries. When a user request touches multiple Azure services, each service adds its span to the trace. By examining the full trace, you can see exactly where latency spikes or errors occur. This visibility is essential for troubleshooting in microservices architectures where a single user action triggers operations across dozens of services.
For complex incidents, consider creating a war room dashboard in Azure Monitor Workbooks. This dashboard should display the key metrics for all services involved in the affected workflow, organized in the order that a request flows through them. Having this visual representation during an incident allows the team to quickly identify which service is the bottleneck or failure point.
Step 4: Verify Agent-Based Collection
```shell
# For VMs using Azure Monitor Agent (AMA):
# check if the agent is installed and healthy
az vm extension show \
  --vm-name "my-vm" \
  --resource-group "my-rg" \
  --name "AzureMonitorLinuxAgent"  # or AzureMonitorWindowsAgent

# Check data collection rules (DCR)
az monitor data-collection rule list \
  --resource-group "my-rg" \
  -o table

# Check the DCR association
az monitor data-collection rule association list \
  --resource "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Compute/virtualMachines/{vm}"
```
```kusto
// Check Heartbeat from monitored VMs
Heartbeat
| where TimeGenerated > ago(1h)
| summarize LastHeartbeat = max(TimeGenerated) by Computer, OSType
| order by LastHeartbeat desc

// Missing computers (no heartbeat in the last 30 minutes)
Heartbeat
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| where LastHeartbeat < ago(30m)
| order by LastHeartbeat asc
```
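The missing-heartbeat check above can also run locally against query-API results, for instance inside an automation runbook. A minimal sketch with invented computer names and timestamps:

```python
from datetime import datetime, timedelta

def stale_computers(last_heartbeats, now, max_age=timedelta(minutes=30)):
    """last_heartbeats: {computer: last heartbeat datetime}.
    Returns computers whose last heartbeat is older than max_age, stalest first."""
    stale = [(c, t) for c, t in last_heartbeats.items() if now - t > max_age]
    return sorted(stale, key=lambda item: item[1])

# Invented sample data.
now = datetime(2024, 5, 1, 12, 0)
beats = {"web-01": now - timedelta(minutes=5),
         "web-02": now - timedelta(hours=3),
         "db-01": now - timedelta(minutes=45)}
print([c for c, _ in stale_computers(beats, now)])  # ['web-02', 'db-01']
```

A VM that has stopped heartbeating entirely points at the agent or its network path, whereas a VM that heartbeats but sends no other data points at the DCR configuration.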
Step 5: Common Pitfalls
Storage Account Behind VNet
```shell
# Diagnostic settings can't write to a storage account behind a VNet
# unless "Allow trusted Microsoft services" is enabled
az storage account update \
  --name "mydiagnosticsstorage" \
  --resource-group "my-rg" \
  --bypass "AzureServices"
```
Multiple Diagnostic Settings
Each resource can have up to 5 diagnostic settings. If you have multiple settings sending to different destinations, verify each one is configured correctly:
```shell
# List all diagnostic settings for a resource
az monitor diagnostic-settings list --resource "{resource-id}" -o table

# Delete and recreate a broken setting
az monitor diagnostic-settings delete \
  --name "broken-setting" \
  --resource "{resource-id}"
```
Non-ASCII Resource IDs
Resources with non-ASCII characters in their IDs may cause diagnostic settings to silently fail. Rename resources to use only ASCII characters.
Step 6: Key Metrics to Monitor
```shell
# Monitor workspace health metrics
az monitor metrics list \
  --resource "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.OperationalInsights/workspaces/{workspace}" \
  --metric "AvailabilityRate_Query" \
  --interval PT1H

# Available metrics:
#   AvailabilityRate_Query — query success rate
#   Ingestion Time         — time to ingest data
#   Ingestion Volume       — data volume ingested
#   Query failure count    — failed queries
```
Troubleshooting Decision Tree
- Does the resource have a diagnostic setting? → If not, create one
- Was the setting created recently (< 90 min ago)? → Wait for initial ingestion
- Is the resource generating traffic? → Test with actual operations
- Check _LogOperation for errors → Fix any reported routing issues
- Is the workspace daily cap reached? → Increase the cap or wait for the reset
- Check Resource Health → The workspace might be degraded
- Disable and re-enable the diagnostic setting → This often fixes stuck routing
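The decision tree above is simple enough to encode directly, which is useful if you automate first-pass triage in a runbook. A sketch with invented state keys, each corresponding to one of the checks from Steps 1–4:

```python
def diagnose(state: dict) -> str:
    """Walk the troubleshooting decision tree. `state` holds the results
    of the earlier checks; the key names here are invented for this sketch."""
    if not state.get("has_diagnostic_setting"):
        return "Create a diagnostic setting"
    if state.get("setting_age_minutes", 999) < 90:
        return "Wait for initial ingestion (up to 90 min)"
    if not state.get("resource_has_traffic"):
        return "Generate test traffic on the resource"
    if state.get("logoperation_errors"):
        return "Fix routing issues reported in _LogOperation"
    if state.get("daily_cap_reached"):
        return "Increase the daily cap or wait for the reset"
    if not state.get("workspace_healthy", True):
        return "Check Resource Health and the Azure status page"
    return "Disable and re-enable the diagnostic setting"

print(diagnose({"has_diagnostic_setting": False}))  # Create a diagnostic setting
```

Ordering matters: the cheap configuration checks come first, and the re-create step is the fallback once everything else looks healthy.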
Monitoring and Alerting Strategy
Reactive troubleshooting is expensive. For every hour spent diagnosing a production issue, organizations lose revenue, customer trust, and engineering productivity. A proactive monitoring strategy for log collection should include three layers of observability.
The first layer is metric-based alerting. Configure Azure Monitor alerts on the key performance indicators specific to this service. Set warning thresholds at 70 percent of your limits and critical thresholds at 90 percent. Use dynamic thresholds when baseline patterns are predictable, and static thresholds when you need hard ceilings. Dynamic thresholds use machine learning to adapt to your workload's natural patterns, reducing false positives from expected daily or weekly traffic variations.
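The 70/90 threshold rule is easy to encode if you evaluate metrics yourself, for example in an Automation runbook that supplements built-in alerts. A minimal sketch (function name and defaults are invented):

```python
def classify(value: float, limit: float, warn: float = 0.70, crit: float = 0.90) -> str:
    """Map a metric reading against its limit to an alert severity,
    using warning at 70% and critical at 90% of the limit."""
    ratio = value / limit
    if ratio >= crit:
        return "critical"
    if ratio >= warn:
        return "warning"
    return "ok"

print(classify(7.5, 10))  # warning
print(classify(9.5, 10))  # critical
```

Static thresholds like these suit hard ceilings such as a daily ingestion cap; for workloads with strong daily patterns, prefer dynamic thresholds as described above.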
The second layer is log-based diagnostics. Enable diagnostic settings to route resource logs to a Log Analytics workspace. Write KQL queries that surface anomalies in error rates, latency percentiles, and connection patterns. Schedule these queries as alert rules so they fire before customers report problems. Consider implementing a log retention strategy that balances diagnostic capability with storage costs, keeping hot data for 30 days and archiving to cold storage for compliance.
The third layer is distributed tracing. When the affected service participates in a multi-service transaction chain, distributed tracing via Application Insights or OpenTelemetry provides end-to-end visibility. Correlate trace IDs across services to pinpoint exactly where latency or errors originate. Without this correlation, troubleshooting multi-service failures becomes a manual, time-consuming process of comparing timestamps across different log streams.
Beyond alerting, implement synthetic monitoring that continuously tests critical user journeys even when no real users are active. Azure Application Insights availability tests can probe your endpoints from multiple global locations, detecting outages before your users do. Create synthetic tests that exercise the most business-critical operations, and set alerts with a response time threshold appropriate for your SLA.
Operational Runbook Recommendations
Document the troubleshooting steps from this guide into your team's operational runbook. Include the specific diagnostic commands, expected output patterns for healthy versus degraded states, and escalation criteria for each severity level. When an on-call engineer receives a page at 2 AM, they should be able to follow a structured decision tree rather than improvising under pressure.
Consider automating the initial diagnostic steps using Azure Automation runbooks or Logic Apps. When an alert fires, an automated workflow can gather the relevant metrics, logs, and configuration state, package them into a structured incident report, and post it to your incident management channel. This reduces mean time to diagnosis (MTTD) by eliminating the manual data-gathering phase that often consumes the first 15 to 30 minutes of an incident response.
Implement a post-incident review process that captures lessons learned and feeds them back into your monitoring and runbook systems. Each incident should result in at least one improvement to your alerting rules, runbook procedures, or service configuration. Over time, this continuous improvement cycle transforms your operations from reactive fire-fighting to proactive incident prevention.
Finally, schedule regular game day exercises where the team practices responding to simulated incidents. Azure Chaos Studio can inject controlled faults into your environment to test your monitoring, alerting, and runbook effectiveness under realistic conditions. These exercises build muscle memory and identify gaps in your incident response process before real incidents expose them.
Summary
Missing logs in Log Analytics are almost always caused by missing diagnostic settings — resource logs are not collected by default. Create a diagnostic setting with categoryGroup: allLogs for each resource you want to monitor. After creation, expect up to 90 minutes for initial data flow. Check _LogOperation for ingestion errors, verify the workspace daily cap hasn't been reached, and ensure agent-based VMs have the Azure Monitor Agent installed with an associated data collection rule.
For more details, refer to the official documentation: Overview of Log Analytics in Azure Monitor, Design a Log Analytics workspace architecture.