How to fix manifest unknown errors when pulling images from ACR

Understanding “Manifest Unknown” Errors in ACR

The error “manifest unknown” when pulling images from Azure Container Registry (ACR) means the registry cannot find the manifest for the requested reference. The manifest is the metadata document that describes an image’s layers and configuration. The error typically appears as:

Error response from daemon: manifest for myregistry.azurecr.io/myimage:latest manifest unknown: 
manifest tagged by "latest" is not found

This guide covers the common causes of this error and how to fix each one.
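Before diving into diagnostics, it helps to break the failing reference into its parts, since most manifest unknown errors come down to one wrong component. A minimal sketch (`parse_image_ref` is a hypothetical helper, not part of any CLI; it assumes the reference includes the registry prefix, as in host/repo[:tag]):

```shell
# Hypothetical helper: split a full image reference into registry, repository,
# and tag so each part can be verified against the registry separately.
# Assumes the reference carries the registry prefix, e.g. host/repo[:tag].
parse_image_ref() {
  ref="$1"
  registry="${ref%%/*}"        # everything before the first slash
  rest="${ref#*/}"             # repository[:tag]
  tag="latest"                 # Docker's default when no tag is given
  case "$rest" in
    *:*) tag="${rest##*:}"; rest="${rest%:*}" ;;
  esac
  printf '%s %s %s\n' "$registry" "$rest" "$tag"
}

parse_image_ref "myregistry.azurecr.io/myimage:latest"
```

Check each printed component against the real registry name, repository list, and tag list from the commands later in this guide.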

Diagnostic Context

When a “manifest unknown” error starts appearing, the first step is to work out what changed. In most production environments, errors do not appear spontaneously: they are triggered by a change in configuration, code, traffic patterns, or the platform itself. Review your deployment history, recent configuration changes, and Azure Service Health notifications to identify potential triggers.

Azure maintains detailed activity logs for every resource operation. These logs capture who made a change, what was changed, when it happened, and from which IP address. Cross-reference the timeline of your error reports with the activity log entries to establish a causal relationship. Often, the fix is simply reverting the most recent change that correlates with the error onset.
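The cross-referencing step can be scripted. A minimal sketch, assuming you have exported activity-log entries (for example via az monitor activity-log list) into a file of "ISO-timestamp description" lines; `changes_before` and the file format are illustrative, not a standard tool:

```shell
# Hypothetical helper: given a file of "ISO-timestamp description" lines and
# the time the errors began, list the most recent changes at or before onset.
# ISO 8601 timestamps compare correctly as strings, so awk can filter them.
changes_before() {
  logfile="$1"; onset="$2"
  awk -v onset="$onset" '$1 <= onset' "$logfile" | tail -n 3
}
```

For example, `changes_before changes.txt 2024-05-01T13:00:00Z` surfaces the last few changes that landed before the errors began, which are the first candidates for a revert.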

If no recent changes are apparent, consider external factors. Azure platform updates, regional capacity changes, and dependent service modifications can all affect your resources. Check the Azure Status page and your subscription’s Service Health blade for any ongoing incidents or planned maintenance that coincides with your issue timeline.

Common Pitfalls to Avoid

When fixing Azure service errors under pressure, engineers sometimes make the situation worse by applying changes too broadly or too quickly. Here are critical pitfalls to avoid during your remediation process.

First, avoid making multiple changes simultaneously. If you change the firewall rules, the connection string, and the service tier all at once, you cannot determine which change actually resolved the issue. Apply one change at a time, verify the result, and document what worked. This disciplined approach builds reliable operational knowledge for your team.

Second, do not disable security controls to bypass errors. Opening all firewall rules, granting overly broad RBAC permissions, or disabling SSL enforcement might eliminate the error message, but it creates security vulnerabilities that are far more dangerous than the original issue. Always find the targeted fix that resolves the error while maintaining your security posture.

Third, test your fix in a non-production environment first when possible. Azure resource configurations can be exported as ARM or Bicep templates and deployed to a test resource group for validation. This extra step takes minutes but can prevent a failed fix from escalating the production incident.

Fourth, document the error message exactly as it appears, including correlation IDs, timestamps, and request IDs. If you need to open a support case with Microsoft, this information dramatically speeds up the investigation. Azure support engineers can use correlation IDs to trace the exact request through Microsoft’s internal logging systems.
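Capturing the exact error text can itself be automated so nothing is lost in the heat of an incident. A minimal sketch (`log_failure` and the log format are hypothetical, not an Azure tool):

```shell
# Hypothetical wrapper: run a command and, on failure, append a UTC timestamp,
# the command line, and the exact error output to an incident log, preserving
# the raw message for a later support case.
log_failure() {
  logfile="$1"; shift
  if output=$("$@" 2>&1); then
    return 0
  fi
  printf '%s | %s | %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" "$output" >> "$logfile"
  return 1
}

# Example:
# log_failure pull-errors.log docker pull myregistry.azurecr.io/myimage:latest
```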

Common Causes

| Cause | Frequency | Fix |
| --- | --- | --- |
| Wrong image name or tag | Most common | Verify exact repository and tag |
| Image not pushed yet | Common | Push the image first |
| Tag overwritten or deleted | Common | Re-push or use digest |
| Wrong registry name | Common | Check registry FQDN |
| Purge policy deleted the image | Moderate | Re-push, adjust purge rules |
| Architecture mismatch | Less common | Pull correct platform manifest |
| Authentication failure (misleading error) | Less common | Re-authenticate |

Verifying the Image Exists

# List repositories in the registry
az acr repository list --name myregistry --output table

# List tags for a specific repository
az acr repository show-tags \
  --name myregistry \
  --repository myimage \
  --output table \
  --orderby time_desc

# Show manifest details for a specific tag
# (show-manifests is deprecated in recent CLI versions;
#  az acr manifest list-metadata replaces it)
az acr repository show-manifests \
  --name myregistry \
  --repository myimage \
  --query "[?tags[?@ == 'latest']]" \
  --output table

# Show full manifest metadata
az acr manifest show \
  --registry myregistry \
  --name myimage:latest

Wrong Image Name or Tag

# Common mistakes in image references:
# WRONG: Missing registry prefix
docker pull myimage:latest

# WRONG: Wrong registry name
docker pull myregistry.azurecr.io/myimage:latest
# when the actual registry is myregistry2.azurecr.io

# WRONG: Wrong tag (case-sensitive!)
docker pull myregistry.azurecr.io/myimage:Latest  # Tags are case-sensitive

# CORRECT:
docker pull myregistry.azurecr.io/myimage:latest

# Check exact repository name and tags
az acr repository list --name myregistry --output table
az acr repository show-tags --name myregistry --repository myimage

Image Not Pushed

# Build and push the image
# Step 1: Tag the image with your registry
docker tag myimage:latest myregistry.azurecr.io/myimage:latest

# Step 2: Login to ACR
az acr login --name myregistry

# Step 3: Push
docker push myregistry.azurecr.io/myimage:latest

# Or use ACR Tasks for cloud-based builds
az acr build \
  --registry myregistry \
  --image myimage:latest \
  --file Dockerfile \
  .

Authentication Issues

Some authentication failures manifest as “manifest unknown” rather than a clear 401 error, especially with token-based authentication.

# Re-authenticate to ACR
az acr login --name myregistry

# Check login status
docker login myregistry.azurecr.io

# For service principal authentication
docker login myregistry.azurecr.io \
  --username $SP_APP_ID \
  --password $SP_PASSWORD

# For token-based login (e.g. from a managed identity context where the
# Docker CLI cannot use az credentials directly)
az acr login --name myregistry --expose-token
# Use the returned accessToken as the docker login password with the
# well-known username 00000000-0000-0000-0000-000000000000

# Run ACR health check
az acr check-health \
  --name myregistry \
  --yes

Admin User vs Service Principal

# Check if admin user is enabled
az acr show --name myregistry --query "adminUserEnabled"

# Enable admin user (not recommended for production)
az acr update --name myregistry --admin-enabled true

# Get admin credentials
az acr credential show --name myregistry

# Better: Create a service principal with AcrPull role
az ad sp create-for-rbac \
  --name myregistry-pull \
  --role AcrPull \
  --scopes $(az acr show --name myregistry --query id -o tsv)

Token Expiration

ACR access tokens expire after 3 hours by default. Long-running CI/CD pipelines may encounter “manifest unknown” (or 401) failures after the token expires:

# Re-login before pulling in CI pipelines
az acr login --name myregistry
docker pull myregistry.azurecr.io/myimage:latest

# In Kubernetes, use imagePullSecrets with a service principal
kubectl create secret docker-registry acr-secret \
  --docker-server=myregistry.azurecr.io \
  --docker-username=$SP_APP_ID \
  --docker-password=$SP_PASSWORD

Root Cause Analysis Framework

After applying the immediate fix, invest time in a structured root cause analysis. The Five Whys technique is a simple but effective method: start with the error symptom and ask “why” five times to drill down from the surface-level cause to the fundamental issue.

For example, applied to a manifest unknown error when pulling images from ACR: Why did the pull fail? Because the tag no longer exists in the repository. Why does the tag no longer exist? Because a scheduled purge task deleted it. Why did the purge task delete a tag that was still in use? Because its filter matched production tags. Why did the filter match production tags? Because there was no tag-locking or naming convention separating production images from short-lived ones. Why was there no such convention? Because image lifecycle management was ad hoc rather than documented.

This analysis reveals that the root cause is not a single bad purge filter but a process gap in image lifecycle management. The preventive action is defining a tagging convention and locking production tags, not just re-pushing the deleted image. Without this depth of analysis, the team will keep losing images to well-intentioned cleanup tasks.

Categorize your root causes into buckets: configuration errors, capacity limits, code defects, external dependencies, and process gaps. Track the distribution over time. If most of your incidents fall into the configuration error bucket, invest in infrastructure-as-code validation and policy enforcement. If they fall into capacity limits, improve your monitoring and forecasting. This data-driven approach focuses your improvement efforts where they will have the most impact.
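Tracking that distribution takes very little tooling. A minimal sketch, assuming incidents are logged as "date,category" CSV lines (the file layout and `tally_causes` name are illustrative):

```shell
# Hypothetical tally: count incidents per root-cause bucket from a
# "date,category" CSV so the team can see where to invest, most frequent first.
tally_causes() {
  cut -d, -f2 "$1" | sort | uniq -c | sort -rn
}
```

Run it monthly against the incident log; if one bucket dominates, that is where the next round of preventive work belongs.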

Purge Policy Deleted the Image

# Check if a purge task exists
az acr task list --registry myregistry --output table

# Show purge task details
az acr task show --registry myregistry --name purge-task

# Example: Purge policy that deletes images older than 30 days
# This might have deleted the image you're trying to pull
az acr run \
  --cmd "acr purge --filter 'myimage:.*' --ago 30d --untagged" \
  --registry myregistry \
  /dev/null

# To prevent accidental deletion, lock important tags
az acr repository update \
  --name myregistry \
  --image myimage:production \
  --write-enabled false \
  --delete-enabled false

Multi-Architecture Images

# Check image architecture
az acr manifest show \
  --registry myregistry \
  --name myimage:latest \
  --query "references[].{Platform:platform, Digest:digest}" \
  --output table

# Pull specific platform
docker pull --platform linux/amd64 myregistry.azurecr.io/myimage:latest
docker pull --platform linux/arm64 myregistry.azurecr.io/myimage:latest

# Build multi-arch image
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag myregistry.azurecr.io/myimage:latest \
  --push \
  .

Using Image Digests Instead of Tags

# Tags can be overwritten or deleted — digests are immutable
# Get the digest for a specific tag
az acr manifest show \
  --registry myregistry \
  --name myimage:latest \
  --query "digest" -o tsv

# Pull by digest (guaranteed to get exact image)
docker pull myregistry.azurecr.io/myimage@sha256:abc123def456...

# In Kubernetes, use digest references
# image: myregistry.azurecr.io/myimage@sha256:abc123def456...

Geo-Replication Issues

# If using geo-replicated registry, images may not be synced yet
# Check replication status
az acr replication list --registry myregistry --output table

# Confirm the manifest exists in the home region
az acr manifest show \
  --registry myregistry \
  --name myimage:latest

# Pulls are routed to the nearest replica; if that replica has not finished
# syncing, wait for replication to complete (check az acr replication list)
docker pull myregistry.azurecr.io/myimage:latest

Error Classification and Severity Assessment

Not all errors require the same response urgency. Classify errors into severity levels based on their impact on users and business operations. A severity 1 error causes complete service unavailability for all users. A severity 2 error degrades functionality for a subset of users. A severity 3 error causes intermittent issues that affect individual operations. A severity 4 error is a cosmetic or minor issue with a known workaround.

For “manifest unknown” pull failures, map the specific error codes and messages to these severity levels. Create a classification matrix that your on-call team can reference when triaging incoming alerts. This prevents over-escalation of minor issues and under-escalation of critical ones. Include the expected resolution time for each severity level and the communication protocol (who to notify, how frequently to update stakeholders).
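Such a matrix can be encoded directly in the triage tooling. A minimal sketch; the buckets, messages, and `classify_pull_error` name are illustrative, not an official classification:

```shell
# Hypothetical triage helper: map registry error text to a severity bucket.
# The severity assignments here are examples, tune them to your own matrix.
classify_pull_error() {
  case "$1" in
    *"manifest unknown"*)  echo "sev3: verify image name, tag, and push status" ;;
    *unauthorized*|*401*)  echo "sev2: re-authenticate or check RBAC" ;;
    *"connection refused"*|*timeout*) echo "sev1: registry unreachable" ;;
    *) echo "sev4: unclassified, inspect manually" ;;
  esac
}
```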

Track your error rates over time using Azure Monitor metrics and Log Analytics queries. Establish baseline error rates for healthy operation so you can distinguish between normal background error levels and genuine incidents. A service that normally experiences 0.1 percent error rate might not need investigation when errors spike to 0.2 percent, but a jump to 5 percent warrants immediate attention. Without this baseline context, every alert becomes equally urgent, leading to alert fatigue.
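The baseline comparison is simple arithmetic that can live in an alert script. A minimal sketch (`error_rate_alert` and the 5x multiplier are illustrative choices, not a standard threshold):

```shell
# Hypothetical baseline check: report an error rate as anomalous only when it
# exceeds the healthy baseline percentage by a chosen multiplier (5x here).
error_rate_alert() {
  errors="$1"; total="$2"; baseline_pct="$3"
  awk -v e="$errors" -v t="$total" -v b="$baseline_pct" 'BEGIN {
    rate = (e / t) * 100
    verdict = (rate > b * 5) ? "ALERT" : "OK"
    printf "%s: %.2f%% (baseline %s%%)\n", verdict, rate, b
  }'
}
```

With a 0.1 percent baseline, 2 errors in 1000 pulls stays quiet while 5 errors in 100 pulls trips the alert, matching the intuition in the paragraph above.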

Implement error budgets as part of your SLO framework. An error budget defines the maximum amount of unreliability your service can tolerate over a measurement window (typically monthly or quarterly). When the error budget is exhausted, the team shifts focus from feature development to reliability improvements. This mechanism creates a structured trade-off between innovation velocity and operational stability.

Dependency Management and Service Health

Azure services depend on other Azure services internally, and your application adds further dependency chains on top. When diagnosing a “manifest unknown” pull failure, map out the complete dependency tree, including network dependencies (DNS, load balancers, firewalls), identity dependencies (Azure AD, managed identity endpoints), and data dependencies (storage accounts, databases, key vaults).

Check Azure Service Health for any ongoing incidents or planned maintenance affecting the services in your dependency tree. Azure Service Health provides personalized notifications specific to the services and regions you use. Subscribe to Service Health alerts so your team is notified proactively when Microsoft identifies an issue that might affect your workload.

For each critical dependency, implement a health check endpoint that verifies connectivity and basic functionality. Your application’s readiness probe should verify not just that the application process is running, but that it can successfully reach all of its dependencies. When a dependency health check fails, the application should stop accepting new requests and return a 503 status until the dependency recovers. This prevents requests from queuing up and timing out, which would waste resources and degrade the user experience.
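The per-dependency check can be sketched as a small script: one probe command per dependency, with any failure failing the whole check (the non-zero exit is what a readiness probe would surface as a 503). `check_dependencies` and the example probes are hypothetical:

```shell
# Hypothetical readiness check: run one probe command per dependency and
# fail overall (non-zero exit) if any probe fails.
check_dependencies() {
  failed=0
  for probe in "$@"; do
    if sh -c "$probe" >/dev/null 2>&1; then
      echo "UP: $probe"
    else
      echo "DOWN: $probe"
      failed=1
    fi
  done
  return "$failed"
}

# Example probes (reachability only; an unauthenticated /v2/ returns 401,
# so do not use curl -f here):
# check_dependencies \
#   "nslookup myregistry.azurecr.io" \
#   "curl -s -o /dev/null https://myregistry.azurecr.io/v2/"
```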

AKS Integration

# Attach ACR to AKS cluster (recommended approach)
az aks update \
  --name myAKS \
  --resource-group myRG \
  --attach-acr myregistry

# Verify the AKS identity has AcrPull role
az role assignment list \
  --scope $(az acr show --name myregistry --query id -o tsv) \
  --query "[?roleDefinitionName=='AcrPull'].{Principal:principalName, Role:roleDefinitionName}" \
  --output table

# Debug image pull failures in AKS
kubectl describe pod mypod | grep -A5 "Events"
# Look for: "Failed to pull image" or "manifest unknown"

# Check imagePullSecrets in pod spec
kubectl get pod mypod -o jsonpath='{.spec.imagePullSecrets}'

Container Apps and App Service

# Container Apps: Configure ACR access with managed identity
az containerapp update \
  --name myApp \
  --resource-group myRG \
  --registry-server myregistry.azurecr.io \
  --registry-identity system

# App Service: Configure ACR access
az webapp config container set \
  --name myWebApp \
  --resource-group myRG \
  --container-image-name myregistry.azurecr.io/myimage:latest \
  --container-registry-url https://myregistry.azurecr.io \
  --container-registry-user $SP_APP_ID \
  --container-registry-password $SP_PASSWORD

Diagnostic Commands

# Comprehensive ACR health check
az acr check-health \
  --name myregistry \
  --yes

# Check connectivity from current machine
az acr check-health \
  --name myregistry \
  --vnet myVNet  # If using private endpoint

# View ACR diagnostic logs
az monitor diagnostic-settings create \
  --name myDiag \
  --resource $(az acr show --name myregistry --query id -o tsv) \
  --workspace myLogAnalytics \
  --logs '[
    {"category": "ContainerRegistryRepositoryEvents", "enabled": true},
    {"category": "ContainerRegistryLoginEvents", "enabled": true}
  ]'
// KQL: Find failed pull attempts (run in Log Analytics)
ContainerRegistryRepositoryEvents
| where OperationName == "Pull"
| where ResultType == "Failed"
| project TimeGenerated, Repository, Tag, CallerIpAddress, ResultDescription
| order by TimeGenerated desc

Post-Resolution Validation and Hardening

After applying the fix, perform a structured validation to confirm the issue is fully resolved. Do not rely solely on the absence of error messages. Actively verify that the service is functioning correctly by running health checks, executing test transactions, and monitoring key metrics for at least 30 minutes after the change.

Validate from multiple perspectives. Check the Azure resource health status, run your application’s integration tests, verify that dependent services are receiving data correctly, and confirm that end users can complete their workflows. A fix that resolves the immediate error but breaks a downstream integration is not a complete resolution.
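Repeating the verification over a window, rather than checking once, catches intermittent recurrence. A minimal sketch (`validate_fix` is a hypothetical wrapper; the attempt count and pause are examples):

```shell
# Hypothetical validation loop: re-run a health probe several times with a
# pause between runs; a single failed run fails the whole validation.
validate_fix() {
  attempts="$1"; pause="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if ! "$@"; then
      echo "validation failed on attempt $i"
      return 1
    fi
    sleep "$pause"
    i=$((i + 1))
  done
  echo "validation passed ($attempts/$attempts)"
}

# Example: pull every 5 minutes for half an hour
# validate_fix 6 300 docker pull myregistry.azurecr.io/myimage:latest
```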

Implement defensive monitoring to detect if the issue recurs. Create an Azure Monitor alert rule that triggers on the specific error condition you just fixed. Set the alert to fire within minutes of recurrence so you can respond before the issue impacts users. Include the remediation steps in the alert’s action group notification so that any on-call engineer can apply the fix quickly.

Finally, conduct a brief post-incident review. Document the root cause, the fix applied, the time to detect, diagnose, and resolve the issue, and any preventive measures that should be implemented. Share this documentation with the broader engineering team through a blameless post-mortem process. This transparency transforms individual incidents into organizational learning that raises the entire team’s operational capability.

Consider adding the error scenario to your integration test suite. Automated tests that verify the service behaves correctly under the conditions that triggered the original error provide a safety net against regression. If a future change inadvertently reintroduces the problem, the test will catch it before it reaches production.

Summary

“Manifest unknown” errors when pulling from ACR are most commonly caused by referencing a non-existent image name or tag (typos, case sensitivity), the image not being pushed yet, or authentication failures that present misleadingly as manifest errors. Always verify the exact repository name and tag with az acr repository show-tags, run az acr check-health to diagnose connectivity and auth issues, re-authenticate with az acr login before pulling, and consider using image digests instead of mutable tags for production deployments.

For more details, refer to the official documentation: Introduction to Azure Container Registry, Troubleshoot registry login, Troubleshoot registry performance.
