How to fix Azure Search indexing failures and data source errors

Understanding Azure AI Search Indexing Failures

Azure AI Search (formerly Azure Cognitive Search) provides full-text search, vector search, and AI enrichment capabilities over your data. The indexing pipeline — which reads data from sources, transforms it through skillsets, and writes it to the search index — is where most failures occur. Data source connectivity errors, document parsing failures, field mapping mismatches, and skill execution timeouts can all prevent your data from being indexed.

This guide covers the major indexing failure categories with exact error messages, diagnostic steps, and fixes.

Diagnostic Context

When encountering Azure Search indexing failures and data source errors, the first step is understanding what changed. In most production environments, errors do not appear spontaneously. They are triggered by a change in configuration, code, traffic patterns, or the platform itself. Review your deployment history, recent configuration changes, and Azure Service Health notifications to identify potential triggers.

Azure maintains detailed activity logs for every resource operation. These logs capture who made a change, what was changed, when it happened, and from which IP address. Cross-reference the timeline of your error reports with the activity log entries to establish a causal relationship. Often, the fix is simply reverting the most recent change that correlates with the error onset.

If no recent changes are apparent, consider external factors. Azure platform updates, regional capacity changes, and dependent service modifications can all affect your resources. Check the Azure Status page and your subscription’s Service Health blade for any ongoing incidents or planned maintenance that coincides with your issue timeline.

Common Pitfalls to Avoid

When fixing Azure service errors under pressure, engineers sometimes make the situation worse by applying changes too broadly or too quickly. Here are critical pitfalls to avoid during your remediation process.

First, avoid making multiple changes simultaneously. If you change the firewall rules, the connection string, and the service tier all at once, you cannot determine which change actually resolved the issue. Apply one change at a time, verify the result, and document what worked. This disciplined approach builds reliable operational knowledge for your team.

Second, do not disable security controls to bypass errors. Opening all firewall rules, granting overly broad RBAC permissions, or disabling SSL enforcement might eliminate the error message, but it creates security vulnerabilities that are far more dangerous than the original issue. Always find the targeted fix that resolves the error while maintaining your security posture.

Third, test your fix in a non-production environment first when possible. Azure resource configurations can be exported as ARM or Bicep templates and deployed to a test resource group for validation. This extra step takes minutes but can prevent a failed fix from escalating the production incident.

Fourth, document the error message exactly as it appears, including correlation IDs, timestamps, and request IDs. If you need to open a support case with Microsoft, this information dramatically speeds up the investigation. Azure support engineers can use correlation IDs to trace the exact request through Microsoft’s internal logging systems.

Indexer Execution Flow

  1. Data source — Connects to your data (Blob, SQL, Cosmos DB, etc.)
  2. Document cracking — Extracts content from document formats (PDF, DOCX, etc.)
  3. Field mappings — Maps source fields to index fields
  4. Skillset — AI enrichment (OCR, entity extraction, custom skills)
  5. Output field mappings — Maps skillset outputs to index fields
  6. Index — Writes documents to the search index
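Conceptually, this flow is a staged pipeline: a document either survives every stage or is recorded as a failure at the stage that rejected it. A hypothetical Python model of that behavior (an illustration of the error-tolerance semantics, not SDK code):

```python
# Hypothetical model of the indexer pipeline: each stage may raise, and
# failures are recorded per stage instead of aborting the whole run.
def run_pipeline(documents, stages):
    indexed, failures = [], []
    for doc in documents:
        try:
            for stage_name, stage in stages:
                doc = stage(doc)
            indexed.append(doc)
        except ValueError as exc:
            failures.append({"doc": doc, "stage": stage_name, "error": str(exc)})
    return indexed, failures

# Two toy stages: document cracking and field mapping.
def crack(doc):
    if not doc.get("content"):
        raise ValueError("could not extract content")
    return doc

def map_fields(doc):
    return {"id": doc["name"], "content": doc["content"]}

docs = [{"name": "a", "content": "hello"}, {"name": "b", "content": ""}]
indexed, failures = run_pipeline(docs, [("cracking", crack), ("mapping", map_fields)])
```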

Checking Indexer Status

# Check indexer status (indexer management has no dedicated az CLI command
# group, so call the REST API through az rest)
az rest --method GET \
  --url "https://mySearchService.search.windows.net/indexers/myIndexer?api-version=2025-09-01" \
  --headers "api-key=YOUR_ADMIN_KEY"

# Show indexer execution history
az rest --method GET \
  --url "https://mySearchService.search.windows.net/indexers/myIndexer/status?api-version=2025-09-01" \
  --headers "api-key=YOUR_ADMIN_KEY"
// REST API: Get indexer status
GET https://[service].search.windows.net/indexers/[indexer]/status?api-version=2025-09-01
api-key: [admin-key]
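Once you have the JSON status payload, summarizing it is straightforward. A Python sketch (field names follow the documented status response shape; the payload below is a fabricated example, not real output):

```python
# Summarize an indexer status payload (the parsed JSON from the /status call).
def summarize(payload):
    last = payload.get("lastResult") or {}
    return {
        "status": last.get("status"),
        "processed": last.get("itemsProcessed", 0),
        "failed": last.get("itemsFailed", 0),
        # Per-document errors carry the failing document key and a message.
        "errors": [(e.get("key"), e.get("errorMessage")) for e in last.get("errors", [])],
    }

payload = {
    "lastResult": {
        "status": "transientFailure",
        "itemsProcessed": 98,
        "itemsFailed": 2,
        "errors": [{"key": "doc-17", "errorMessage": "Document key cannot be missing"}],
    }
}
summary = summarize(payload)
```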

Data Source Connectivity Errors

Firewall Blocking: HTTP 403 Forbidden

Error: "Access denied to Virtual Network/Firewall rules configured on the data source"

Fix

# For Azure SQL: Add search service IP to SQL firewall
az sql server firewall-rule create \
  --server mysqlserver \
  --resource-group myRG \
  --name AllowSearchService \
  --start-ip-address SEARCH_SERVICE_IP \
  --end-ip-address SEARCH_SERVICE_IP

# For Blob Storage: Add the AzureCognitiveSearch service tag
# OR use trusted service exception (preferred)
az storage account update \
  --name mystorageaccount \
  --resource-group myRG \
  --bypass AzureServices

Invalid Credentials

Error: "Credentials provided in the connection string are invalid or have expired"

Verify the connection string in the data source definition. For managed identity authentication:

# Grant search service managed identity access to data source
SEARCH_IDENTITY=$(az search service show \
  --name mySearchService \
  --resource-group myRG \
  --query identity.principalId -o tsv)

# For Blob Storage
az role assignment create \
  --assignee-object-id $SEARCH_IDENTITY \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Reader" \
  --scope /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/myStorageAccount

Cosmos DB Throttling

Error: {"Errors":["Request rate is large"]}

Your Cosmos DB container doesn’t have enough RU/s to serve both the application and the indexer. Solutions:

  • Increase RU/s temporarily during indexing
  • Schedule indexer during off-peak hours
  • Use Cosmos DB change feed with autoscale
  • Ensure Cosmos DB uses Consistent indexing policy (not Lazy)
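If raising RU/s is not an option, clients that share the container with the indexer should back off on 429 responses. A hedged Python sketch (Cosmos DB suggests a wait via the x-ms-retry-after-ms response header; `send` is a placeholder for your request function):

```python
import time

def call_with_backoff(send, max_retries=5):
    """Retry a throttled call, honoring the server's suggested delay."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Honor x-ms-retry-after-ms when present; otherwise fall back to
        # exponential backoff.
        delay_ms = int(headers.get("x-ms-retry-after-ms", 100 * 2 ** attempt))
        time.sleep(delay_ms / 1000)
    raise RuntimeError("request rate is large: retries exhausted")

# Simulated endpoint that throttles twice, then succeeds.
responses = iter([(429, {"x-ms-retry-after-ms": "1"}, None),
                  (429, {}, None),
                  (200, {}, {"ok": True})])
status, body = call_with_backoff(lambda: next(responses))
```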

Document Parsing Errors

Missing Document Key

Error: "Could not read document: Document key property 'id' cannot be missing or empty"

Every document in the index must have a unique key field: the single Edm.String field marked "key": true in the index definition. Ensure your data source provides a non-empty value for that field.

Invalid Document Key Characters

Error: "Document key contains invalid characters"
Valid characters: letters, digits, underscore (_), hyphen (-), equals (=)
Maximum length: 1024 characters
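These rules are easy to check before upload. A minimal Python validator derived from the character set and length limit above:

```python
import re

# Valid key: letters, digits, underscore, hyphen, equals; at most 1024 chars.
KEY_PATTERN = re.compile(r"^[A-Za-z0-9_\-=]{1,1024}$")

def is_valid_key(key):
    return bool(key) and bool(KEY_PATTERN.fullmatch(key))

assert is_valid_key("doc-42_v2=")       # allowed characters only
assert not is_valid_key("docs/report")  # slash is not allowed
assert not is_valid_key("")             # empty keys are rejected
```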

Fix: Use Field Mappings for Key

// Map a different source field to the index key
{
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_path",
      "targetFieldName": "id",
      "mappingFunction": {
        "name": "base64Encode"
      }
    }
  ]
}
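The base64Encode mapping function applies URL-safe Base64 so that arbitrary blob paths become valid keys. A local Python approximation (exact padding handling varies by api-version and the mapping function's parameters, so verify against keys your indexer actually produces):

```python
import base64

def encode_key(path):
    """URL-safe Base64 with padding stripped: an approximation of the
    base64Encode mapping function; verify against real indexer output."""
    return base64.urlsafe_b64encode(path.encode("utf-8")).decode("ascii").rstrip("=")

def decode_key(key):
    # Restore padding before decoding.
    padded = key + "=" * (-len(key) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")

path = "https://mystorageaccount.blob.core.windows.net/docs/report 2024.pdf"
key = encode_key(path)
```

The encoded key contains only letters, digits, hyphen, and underscore, all of which are valid key characters.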

Content Extraction Limits

Text extraction from documents has tier-specific limits:

Tier Max characters extracted
Free 32,000
Basic 64,000
Standard S1 4,000,000
Standard S2 8,000,000
Standard S3 16,000,000

Documents exceeding these limits are truncated, not rejected. If you need the full content, upgrade your tier.

Index Write Errors

Term Too Large

Error: "The term is too large. Maximum allowed length is 32766 bytes."

A single field value exceeds the maximum term size for filterable, facetable, or sortable fields.

Fix

// Remove filterable/facetable/sortable from large text fields
{
  "name": "content",
  "type": "Edm.String",
  "searchable": true,
  "filterable": false,   // Don't make large fields filterable
  "facetable": false,     // Don't make large fields facetable
  "sortable": false       // Don't make large fields sortable
}
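A preflight check can flag oversized values before you submit the batch. A Python sketch using the 32,766-byte limit from the error message (the limit applies to the UTF-8 byte length of the value):

```python
MAX_TERM_BYTES = 32766

def oversized_fields(doc, checked_fields):
    """Return the fields whose UTF-8 size would exceed the term limit
    if they were marked filterable, facetable, or sortable."""
    return [f for f in checked_fields
            if len(str(doc.get(f, "")).encode("utf-8")) > MAX_TERM_BYTES]

doc = {"id": "1", "title": "short", "content": "x" * 40000}
bad = oversized_fields(doc, ["title", "content"])
```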

Complex Collection Limit

Error: "Too many objects in collection. Max 3000 objects across all complex collections per document."

Reduce the number of nested objects in your document, or split complex collections across multiple documents.
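To find offending documents ahead of time, count the objects yourself. A Python sketch that counts nested objects recursively across all collections (treat the exact counting rules as an assumption to verify against your service):

```python
def count_collection_objects(doc):
    """Count objects across all complex collections in a document
    (lists of dicts, recursively), mirroring the 3000-object limit."""
    total = 0
    def walk(value):
        nonlocal total
        if isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    total += 1
                walk(item)
        elif isinstance(value, dict):
            for v in value.values():
                walk(v)
    walk(doc)
    return total

doc = {"id": "1",
       "authors": [{"name": "a"}, {"name": "b"}],
       "sections": [{"paras": [{"text": "t"}]}]}
n = count_collection_objects(doc)
```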

Root Cause Analysis Framework

After applying the immediate fix, invest time in a structured root cause analysis. The Five Whys technique is a simple but effective method: start with the error symptom and ask “why” five times to drill down from the surface-level cause to the fundamental issue.

For example, consider an indexing failure traced back to its data source: Why did the service fail? Because the connection timed out. Why did the connection time out? Because the DNS lookup returned a stale record. Why was the DNS record stale? Because the TTL was set to 24 hours during a migration and never reduced. Why was it not reduced? Because there was no checklist for post-migration cleanup. Why was there no checklist? Because the migration process was ad hoc rather than documented.

This analysis reveals that the root cause is not a technical configuration issue but a process gap that allowed undocumented changes. The preventive action is creating a migration checklist and review process, not just fixing the DNS TTL. Without this depth of analysis, the team will continue to encounter similar issues from different undocumented changes.

Categorize your root causes into buckets: configuration errors, capacity limits, code defects, external dependencies, and process gaps. Track the distribution over time. If most of your incidents fall into the configuration error bucket, invest in infrastructure-as-code validation and policy enforcement. If they fall into capacity limits, improve your monitoring and forecasting. This data-driven approach focuses your improvement efforts where they will have the most impact.

Skillset Execution Errors

Custom Skill Timeout

Error: "Skill did not execute within the time limit"

Default timeout is 30 seconds, maximum is 230 seconds.

// Increase custom skill timeout
{
  "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
  "uri": "https://my-skill-function.azurewebsites.net/api/process",
  "batchSize": 1,
  "timeout": "PT230S",
  "context": "/document",
  "inputs": [{"name": "text", "source": "/document/content"}],
  "outputs": [{"name": "result", "targetName": "enriched"}]
}
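Timeouts aside, a custom skill must also honor the Web API skill contract: it receives {"values": [...]} and must return one output record per input, echoing each recordId. A minimal Python handler sketch (the word-count enrichment is a stand-in for real logic):

```python
def handle_skill_request(payload):
    """Process a custom skill payload: one output record per input record,
    echoing recordId, with results under "data" and problems under "errors"."""
    out = []
    for record in payload["values"]:
        text = record.get("data", {}).get("text")
        if text is None:
            out.append({"recordId": record["recordId"], "data": {},
                        "errors": [{"message": "missing 'text' input"}]})
            continue
        # Stand-in enrichment: word count.
        out.append({"recordId": record["recordId"],
                    "data": {"result": len(text.split())}})
    return {"values": out}

resp = handle_skill_request(
    {"values": [{"recordId": "r1", "data": {"text": "two words"}},
                {"recordId": "r2", "data": {}}]})
```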

Built-in Skill Errors

Error: "Could not extract content or metadata from document"
Causes:
- Blob encrypted with customer key
- Unsupported content type
- File size exceeds tier limit
- Password-protected document

Error Tolerance Configuration

Configure the indexer to continue processing even when individual documents fail:

{
  "name": "myIndexer",
  "parameters": {
    "maxFailedItems": 100,
    "maxFailedItemsPerBatch": 10,
    "configuration": {
      "failOnUnsupportedContentType": false,
      "failOnUnprocessableDocument": false,
      "indexStorageMetadataOnlyForOversizedDocuments": true
    }
  }
}

Change Detection and Missing Updates

SQL Change Detection

-- Enable change tracking on SQL table
ALTER DATABASE myDatabase SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS);
ALTER TABLE myTable ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);

-- Or use high-water mark with a non-key column
-- Ensure the column updates on every modification
ALTER TABLE myTable ADD LastModified DATETIME DEFAULT GETDATE();
CREATE TRIGGER trg_UpdateLastModified ON myTable AFTER UPDATE AS
    UPDATE t SET LastModified = GETDATE()
    FROM myTable t INNER JOIN inserted i ON t.Id = i.Id;
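The high-water mark policy behaves like this incremental fetch, with the indexer persisting the watermark between runs. A local Python sketch of the logic:

```python
from datetime import datetime

def fetch_changed(rows, watermark):
    """Return rows modified after the watermark plus the new watermark,
    emulating one high-water-mark change detection pass."""
    changed = [r for r in rows if r["LastModified"] > watermark]
    new_watermark = max((r["LastModified"] for r in changed), default=watermark)
    return changed, new_watermark

rows = [
    {"Id": 1, "LastModified": datetime(2025, 1, 1)},
    {"Id": 2, "LastModified": datetime(2025, 1, 3)},
]
changed, wm = fetch_changed(rows, datetime(2025, 1, 2))
```

This is why the LastModified column must update on every modification: a row whose timestamp never moves past the watermark is never re-indexed.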

Blob Change Detection

Blob indexer uses metadata_storage_last_modified by default. Ensure blobs are re-uploaded (not modified in-place) for the timestamp to change.

Error Classification and Severity Assessment

Not all errors require the same response urgency. Classify errors into severity levels based on their impact on users and business operations. A severity 1 error causes complete service unavailability for all users. A severity 2 error degrades functionality for a subset of users. A severity 3 error causes intermittent issues that affect individual operations. A severity 4 error is a cosmetic or minor issue with a known workaround.

For Azure Search indexing and data source failures, map the specific error codes and messages to these severity levels. Create a classification matrix that your on-call team can reference when triaging incoming alerts. This prevents over-escalation of minor issues and under-escalation of critical ones. Include the expected resolution time for each severity level and the communication protocol (who to notify, how frequently to update stakeholders).
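Such a matrix can live in code next to your alerting rules. A hypothetical Python mapping (the patterns and levels here are illustrative, not an official taxonomy):

```python
# Illustrative severity matrix: substring of the error message -> severity.
SEVERITY_MATRIX = [
    ("Credentials provided in the connection string are invalid", 1),
    ("Access denied to Virtual Network/Firewall", 1),
    ("Request rate is large", 2),
    ("Skill did not execute within the time limit", 3),
    ("The term is too large", 4),
]

def classify(error_message, default=3):
    """Return the severity for a known error pattern, or a default."""
    for pattern, severity in SEVERITY_MATRIX:
        if pattern in error_message:
            return severity
    return default

sev = classify('{"Errors":["Request rate is large"]}')
```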

Track your error rates over time using Azure Monitor metrics and Log Analytics queries. Establish baseline error rates for healthy operation so you can distinguish between normal background error levels and genuine incidents. A service that normally experiences 0.1 percent error rate might not need investigation when errors spike to 0.2 percent, but a jump to 5 percent warrants immediate attention. Without this baseline context, every alert becomes equally urgent, leading to alert fatigue.

Implement error budgets as part of your SLO framework. An error budget defines the maximum amount of unreliability your service can tolerate over a measurement window (typically monthly or quarterly). When the error budget is exhausted, the team shifts focus from feature development to reliability improvements. This mechanism creates a structured trade-off between innovation velocity and operational stability.
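The error budget arithmetic is simple. For example, a 99.9 percent availability SLO over a 30-day window allows roughly 43 minutes of unreliability:

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of allowed unreliability for an availability SLO."""
    return (1 - slo) * window_days * 24 * 60

def budget_remaining(slo, observed_bad_minutes, window_days=30):
    return error_budget_minutes(slo, window_days) - observed_bad_minutes

budget = error_budget_minutes(0.999)  # about 43.2 minutes per 30 days
left = budget_remaining(0.999, 30)    # about 13.2 minutes remaining
```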

Dependency Management and Service Health

Azure services depend on other Azure services internally, and your application adds additional dependency chains on top. When diagnosing Azure Search indexing and data source failures, map out the complete dependency tree including network dependencies (DNS, load balancers, firewalls), identity dependencies (Azure AD, managed identity endpoints), and data dependencies (storage accounts, databases, key vaults).

Check Azure Service Health for any ongoing incidents or planned maintenance affecting the services in your dependency tree. Azure Service Health provides personalized notifications specific to the services and regions you use. Subscribe to Service Health alerts so your team is notified proactively when Microsoft identifies an issue that might affect your workload.

For each critical dependency, implement a health check endpoint that verifies connectivity and basic functionality. Your application’s readiness probe should verify not just that the application process is running, but that it can successfully reach all of its dependencies. When a dependency health check fails, the application should stop accepting new requests and return a 503 status until the dependency recovers. This prevents requests from queuing up and timing out, which would waste resources and degrade the user experience.
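Aggregating dependency checks into one readiness answer can be sketched in a few lines (each check is a callable that returns True when the dependency is reachable; a check that raises counts as down):

```python
def _safe(check):
    try:
        return bool(check())
    except Exception:
        return False  # a check that raises counts as a failed dependency

def readiness(checks):
    """Run all dependency checks; return (http_status, failing_names).
    200 when everything passes, 503 when any dependency is down."""
    failing = [name for name, check in checks.items() if not _safe(check)]
    return (200 if not failing else 503), failing

def failing_search():
    raise TimeoutError("search endpoint unreachable")

status, failing = readiness({"storage": lambda: True, "search": failing_search})
```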

Scheduling Indexers

# Schedule indexer to run every 5 minutes
az rest --method PUT \
  --url "https://mySearchService.search.windows.net/indexers/myIndexer?api-version=2025-09-01" \
  --headers "Content-Type=application/json" "api-key=YOUR_KEY" \
  --body '{
    "dataSourceName": "myDataSource",
    "targetIndexName": "myIndex",
    "schedule": {
      "interval": "PT5M"
    }
  }'

Running indexers on a schedule automatically recovers from transient errors — if a blob or SQL row fails once, it will be retried on the next run.

Diagnostic Checklist

# Quick indexer diagnostic
SERVICE="mySearchService"
RG="myRG"
INDEXER="myIndexer"

echo "=== Indexer Status ==="
az rest --method GET \
  --url "https://$SERVICE.search.windows.net/indexers/$INDEXER/status?api-version=2025-09-01" \
  --headers "api-key=YOUR_KEY" \
  --query "{status:status, lastResult:lastResult.status, itemsProcessed:lastResult.itemsProcessed, itemsFailed:lastResult.itemsFailed}"

echo "=== Data Source ==="
az rest --method GET \
  --url "https://$SERVICE.search.windows.net/datasources/myDataSource?api-version=2025-09-01" \
  --headers "api-key=YOUR_KEY"

echo "=== Index Document Count ==="
curl -s "https://$SERVICE.search.windows.net/indexes/myIndex/docs/\$count?api-version=2025-09-01" \
  -H "api-key: YOUR_KEY"

Prevention Best Practices

  • Schedule indexers — Running every 5 minutes automatically recovers from transient failures
  • Set error tolerance — Configure maxFailedItems to prevent single document failures from stopping the entire batch
  • Use managed identity for data source authentication when possible
  • Use base64Encode for document keys derived from file paths
  • Don’t make large text fields filterable/sortable — Causes “term too large” errors
  • Remove sensitivity labels from documents before indexing
  • Monitor indexer status — Check execution history regularly for silent failures
  • Use Consistent indexing on Cosmos DB (not Lazy) for reliable change detection

Post-Resolution Validation and Hardening

After applying the fix, perform a structured validation to confirm the issue is fully resolved. Do not rely solely on the absence of error messages. Actively verify that the service is functioning correctly by running health checks, executing test transactions, and monitoring key metrics for at least 30 minutes after the change.

Validate from multiple perspectives. Check the Azure resource health status, run your application’s integration tests, verify that dependent services are receiving data correctly, and confirm that end users can complete their workflows. A fix that resolves the immediate error but breaks a downstream integration is not a complete resolution.

Implement defensive monitoring to detect if the issue recurs. Create an Azure Monitor alert rule that triggers on the specific error condition you just fixed. Set the alert to fire within minutes of recurrence so you can respond before the issue impacts users. Include the remediation steps in the alert’s action group notification so that any on-call engineer can apply the fix quickly.

Finally, conduct a brief post-incident review. Document the root cause, the fix applied, the time to detect, diagnose, and resolve the issue, and any preventive measures that should be implemented. Share this documentation with the broader engineering team through a blameless post-mortem process. This transparency transforms individual incidents into organizational learning that raises the entire team’s operational capability.

Consider adding the error scenario to your integration test suite. Automated tests that verify the service behaves correctly under the conditions that triggered the original error provide a safety net against regression. If a future change inadvertently reintroduces the problem, the test will catch it before it reaches production.

Summary

Azure AI Search indexing failures fall into four categories: data source connectivity (firewall rules, credentials, throttling), document parsing (missing keys, invalid characters), index write errors (term size, collection limits), and skill execution timeouts. The error tolerance settings let you control whether the indexer stops or continues on failures. Schedule indexers every 5 minutes for automatic transient error recovery, and always check the indexer execution history for details on which documents failed and why.

For more details, refer to the official documentation: Indexers in Azure AI Search, Data, privacy, and built-in protections in Azure AI Search.
