How to fix Azure Blob upload failures due to size or timeout limits

Understanding Azure Blob Upload Size and Timeout Limits

Azure Blob Storage is designed to handle massive amounts of unstructured data, supporting individual blobs up to 190.7 TiB. However, uploading data to Blob Storage involves specific size limits, timeout constraints, and throttling thresholds that can cause upload failures if not properly understood and handled. Whether you’re uploading small configuration files or multi-terabyte datasets, knowing these limits — and how to work within them — is essential.

This guide covers the major upload failure scenarios, from single-blob size limits to storage account throttling, with tested solutions and production-ready code examples.

Diagnostic Context

When encountering Azure Blob upload failures due to size or timeout, the first step is understanding what changed. In most production environments, errors do not appear spontaneously. They are triggered by a change in configuration, code, traffic patterns, or the platform itself. Review your deployment history, recent configuration changes, and Azure Service Health notifications to identify potential triggers.

Azure maintains detailed activity logs for every resource operation. These logs capture who made a change, what was changed, when it happened, and from which IP address. Cross-reference the timeline of your error reports with the activity log entries to establish a causal relationship. Often, the fix is simply reverting the most recent change that correlates with the error onset.

If no recent changes are apparent, consider external factors. Azure platform updates, regional capacity changes, and dependent service modifications can all affect your resources. Check the Azure Status page and your subscription’s Service Health blade for any ongoing incidents or planned maintenance that coincides with your issue timeline.

Common Pitfalls to Avoid

When fixing Azure service errors under pressure, engineers sometimes make the situation worse by applying changes too broadly or too quickly. Here are critical pitfalls to avoid during your remediation process.

First, avoid making multiple changes simultaneously. If you change the firewall rules, the connection string, and the service tier all at once, you cannot determine which change actually resolved the issue. Apply one change at a time, verify the result, and document what worked. This disciplined approach builds reliable operational knowledge for your team.

Second, do not disable security controls to bypass errors. Opening all firewall rules, granting overly broad RBAC permissions, or disabling SSL enforcement might eliminate the error message, but it creates security vulnerabilities that are far more dangerous than the original issue. Always find the targeted fix that resolves the error while maintaining your security posture.

Third, test your fix in a non-production environment first when possible. Azure resource configurations can be exported as ARM or Bicep templates and deployed to a test resource group for validation. This extra step takes minutes but can prevent a failed fix from escalating the production incident.

Fourth, document the error message exactly as it appears, including correlation IDs, timestamps, and request IDs. If you need to open a support case with Microsoft, this information dramatically speeds up the investigation. Azure support engineers can use correlation IDs to trace the exact request through Microsoft’s internal logging systems.

Blob Upload Size Limits Reference

Azure Blob Storage supports three blob types, each with different size constraints:

Blob Type                       | Max Single Upload (Put Blob) | Max Block/Page Size | Max Total Blob Size
Block Blob (API v2019-12-12+)   | 5,000 MiB                    | 4,000 MiB per block | ~190.7 TiB (50,000 blocks)
Block Blob (older API versions) | 256 MiB                      | 100 MiB per block   | ~4.75 TiB
Append Blob                     | N/A                          | 4 MiB per block     | ~195 GiB (50,000 blocks)
Page Blob                       | N/A                          | 4 MiB per page      | 8 TiB
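The maximum total sizes in the table follow directly from the per-block limits and the 50,000-block cap; a quick arithmetic check:

```python
MIB = 1024 * 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_BLOCKS = 50_000

# Modern block blob: 50,000 blocks x 4,000 MiB per block
modern_max = MAX_BLOCKS * 4_000 * MIB
print(f"{modern_max / TIB:.1f} TiB")   # 190.7 TiB

# Older API versions: 50,000 blocks x 100 MiB per block
legacy_max = MAX_BLOCKS * 100 * MIB
print(f"{legacy_max / TIB:.2f} TiB")   # 4.77 TiB

# Append blob: 50,000 blocks x 4 MiB per block
append_max = MAX_BLOCKS * 4 * MIB
print(f"{append_max / GIB:.1f} GiB")   # 195.3 GiB
```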

Storage Account-Level Limits

Beyond individual blob limits, the storage account itself has throughput and request rate constraints:

Limit                | Value (Major Regions) | Value (Other Regions)
Max account capacity | 5 PiB (default)       | 5 PiB (default)
Max request rate     | 40,000 requests/sec   | 20,000 requests/sec
Max ingress          | 60 Gbps               | 25 Gbps
Max egress           | 200 Gbps              | 50 Gbps

Single Upload Exceeds Put Blob Limit

Error: 413 Request Entity Too Large

When you attempt to upload a file larger than the Put Blob maximum size (5,000 MiB on modern API versions) in a single request, the server rejects the upload immediately.

Fix: Use Block Upload

For files larger than 256 MiB, always use the block upload pattern: split the file into blocks, upload each block individually with Put Block, then commit all blocks with Put Block List.
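The mechanics can be sketched as follows. This is an illustrative helper (the `block_ids` function is our own naming, not an SDK API); the SDKs and AzCopy handle this for you. The one hard service requirement worth knowing is that all block IDs within a blob must be Base64-encoded strings of equal length:

```python
import base64

BLOCK_SIZE = 100 * 1024 * 1024  # 100 MiB per block

def block_ids(total_size, block_size=BLOCK_SIZE):
    """Generate equal-length Base64 block IDs, one per chunk of the file."""
    count = (total_size + block_size - 1) // block_size  # ceiling division
    # Zero-pad the index so every encoded ID has identical length
    return [base64.b64encode(f"{i:08d}".encode()).decode() for i in range(count)]

ids = block_ids(450 * 1024 * 1024)  # 450 MiB file -> 5 blocks
print(len(ids))                                  # 5
print(all(len(b) == len(ids[0]) for b in ids))   # True
```

Each chunk would then be sent with Put Block using its ID, and the full ID list committed with Put Block List.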

# PowerShell: Upload large file using block upload
$storageAccount = "mystorageaccount"
$containerName = "uploads"
$blobName = "largefile.dat"
$filePath = "C:\data\largefile.dat"
$blockSizeMB = 100  # 100 MB blocks

$context = New-AzStorageContext -StorageAccountName $storageAccount -StorageAccountKey $key

# Set-AzStorageBlobContent handles block upload automatically for large files
Set-AzStorageBlobContent `
  -File $filePath `
  -Container $containerName `
  -Blob $blobName `
  -Context $context `
  -Force

# Python SDK: Upload with automatic chunking
from azure.storage.blob import BlobServiceClient

# max_single_put_size controls when the SDK switches to block upload;
# max_block_size controls the individual block size. Both are client
# configuration options, not upload_blob() parameters.
blob_service = BlobServiceClient.from_connection_string(
    conn_str,
    max_single_put_size=64 * 1024 * 1024,  # 64 MB threshold
    max_block_size=100 * 1024 * 1024,      # 100 MB blocks
)
blob_client = blob_service.get_blob_client(container="uploads", blob="largefile.dat")

with open("largefile.dat", "rb") as data:
    blob_client.upload_blob(
        data,
        overwrite=True,
        max_concurrency=8,  # Parallel block uploads
    )

Using AzCopy for Large Uploads

AzCopy is Microsoft’s recommended tool for large file transfers. It handles block uploads, parallelism, retries, and resume automatically.

# Upload single large file
azcopy copy "largefile.dat" \
  "https://mystorageaccount.blob.core.windows.net/uploads/largefile.dat?sv=..." \
  --block-size-mb 100 \
  --cap-mbps 1000

# Upload entire directory
azcopy copy "/data/uploads/*" \
  "https://mystorageaccount.blob.core.windows.net/uploads/?sv=..." \
  --recursive \
  --put-md5

# Resume a failed transfer
azcopy jobs resume <job-id>

Storage Account Throttling — 503 Server Busy

Error: HTTP 503 Server Busy

When your upload rate exceeds the storage account’s partition server capacity, Azure returns HTTP 503. This is not a hard limit but a dynamic throttling response based on current load.

Diagnosis

# Check storage account metrics
az monitor metrics list \
  --resource /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{account} \
  --metric "Transactions" \
  --dimension "ResponseType" \
  --interval PT1H \
  --start-time 2024-01-01T00:00:00Z

Fix: Implement Exponential Backoff

# Python: Upload with exponential backoff retry
from azure.storage.blob import BlobServiceClient
from azure.core.exceptions import HttpResponseError
import random
import time

def upload_with_retry(blob_client, data, max_retries=5):
    for attempt in range(max_retries):
        try:
            blob_client.upload_blob(data, overwrite=True)
            return True
        except HttpResponseError as e:
            if e.status_code == 503:  # Server Busy
                wait_time = (2 ** attempt) + (random.random() * 0.5)
                print(f"Throttled. Retrying in {wait_time:.1f}s...")
                time.sleep(wait_time)
                data.seek(0)  # Reset stream position before retrying
            else:
                raise
    raise Exception("Max retries exceeded")

// C#: BlobClientOptions with retry policy
var options = new BlobClientOptions();
options.Retry.MaxRetries = 5;
options.Retry.Mode = RetryMode.Exponential;
options.Retry.Delay = TimeSpan.FromSeconds(1);
options.Retry.MaxDelay = TimeSpan.FromSeconds(60);
options.Retry.NetworkTimeout = TimeSpan.FromMinutes(10);

var blobServiceClient = new BlobServiceClient(connectionString, options);

Operation Timeout — 500 Internal Server Error

Error: HTTP 500 Operation Timeout

This error occurs when a single upload operation takes longer than the server-side timeout allows. It’s common with large single-block uploads over slow connections.

Fix

  1. Reduce block size — Smaller blocks upload faster per-request
  2. Increase client-side timeout — Match the timeout to your expected upload duration
  3. Use parallel uploads — Upload multiple blocks simultaneously
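As a rough way to size the client-side timeout (step 2), estimate the transfer time from your uplink bandwidth and apply a safety factor; the numbers below are illustrative, and the helper is our own, not an SDK function:

```python
def suggested_timeout_seconds(size_bytes, uplink_mbps, safety_factor=3):
    """Estimate a per-request timeout from transfer size and bandwidth."""
    bits = size_bytes * 8
    expected = bits / (uplink_mbps * 1_000_000)  # seconds at full line rate
    return int(expected * safety_factor)

# A 100 MiB block over a 100 Mbps uplink
block = 100 * 1024 * 1024
print(suggested_timeout_seconds(block, uplink_mbps=100))  # 25
```

If the suggested value greatly exceeds your current timeout, either raise the timeout or shrink the block size.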
# Azure CLI: Set longer timeout
az storage blob upload \
  --account-name mystorageaccount \
  --container-name uploads \
  --name largefile.dat \
  --file ./largefile.dat \
  --timeout 600 \
  --max-connections 8
# PowerShell: Configure the per-request timeout on the upload cmdlet
$context = New-AzStorageContext `
  -StorageAccountName $accountName `
  -StorageAccountKey $key

# ServerTimeoutPerRequest in seconds
Set-AzStorageBlobContent `
  -File "largefile.dat" `
  -Container "uploads" `
  -Blob "largefile.dat" `
  -Context $context `
  -ServerTimeoutPerRequest 600 `
  -ConcurrentTaskCount 8

SAS Token Issues

Error: 403 Forbidden

Upload failures with 403 errors typically indicate SAS token problems rather than true permission issues.

Common SAS Token Issues

  • Token expired — Check the se (expiry) parameter in the SAS URL
  • Missing write permission — SAS must include sp=w (write) or sp=c (create)
  • IP restriction — Check sip parameter in the SAS URL
  • Protocol restriction — SAS may require HTTPS only (spr=https)
  • Wrong service version — Old sv parameter may not support your operation
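A quick way to triage these issues is to parse the SAS URL's query parameters and check expiry and permissions locally before suspecting the service. A minimal sketch (the `check_sas` helper and the sample URL are illustrative, not part of any SDK):

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

def check_sas(url, needed_permissions="w"):
    """Return a list of problems found in a SAS URL's query parameters."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    problems = []

    expiry = params.get("se")  # se = signed expiry
    if expiry:
        exp = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if exp < datetime.now(timezone.utc):
            problems.append(f"token expired at {expiry}")

    perms = params.get("sp", "")  # sp = signed permissions
    missing = [p for p in needed_permissions if p not in perms]
    if missing:
        problems.append(f"missing permissions: {''.join(missing)}")

    return problems

url = "https://acct.blob.core.windows.net/uploads/f.dat?sv=2022-11-02&se=2020-01-01T00:00Z&sp=r&sig=x"
print(check_sas(url))
# ['token expired at 2020-01-01T00:00Z', 'missing permissions: w']
```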
# Generate a SAS token with proper permissions for upload
az storage container generate-sas \
  --account-name mystorageaccount \
  --name uploads \
  --permissions cwl \
  --expiry $(date -u -d "+1 day" '+%Y-%m-%dT%H:%MZ') \
  --https-only

# Generate account-level SAS
az storage account generate-sas \
  --account-name mystorageaccount \
  --services b \
  --resource-types co \
  --permissions cwl \
  --expiry $(date -u -d "+1 day" '+%Y-%m-%dT%H:%MZ') \
  --https-only

Root Cause Analysis Framework

After applying the immediate fix, invest time in a structured root cause analysis. The Five Whys technique is a simple but effective method: start with the error symptom and ask “why” five times to drill down from the surface-level cause to the fundamental issue.

For example, considering Azure Blob upload failures due to size or timeout: Why did the service fail? Because the connection timed out. Why did the connection time out? Because the DNS lookup returned a stale record. Why was the DNS record stale? Because the TTL was set to 24 hours during a migration and never reduced. Why was it not reduced? Because there was no checklist for post-migration cleanup. Why was there no checklist? Because the migration process was ad hoc rather than documented.

This analysis reveals that the root cause is not a technical configuration issue but a process gap that allowed undocumented changes. The preventive action is creating a migration checklist and review process, not just fixing the DNS TTL. Without this depth of analysis, the team will continue to encounter similar issues from different undocumented changes.

Categorize your root causes into buckets: configuration errors, capacity limits, code defects, external dependencies, and process gaps. Track the distribution over time. If most of your incidents fall into the configuration error bucket, invest in infrastructure-as-code validation and policy enforcement. If they fall into capacity limits, improve your monitoring and forecasting. This data-driven approach focuses your improvement efforts where they will have the most impact.

Lease Conflicts — 409 Conflict

Error: HTTP 409 Conflict — There is currently a lease on the blob

Active leases on a blob prevent overwriting. This commonly happens when a previous upload was interrupted or when another process holds a lease.

# Break the lease on a blob
az storage blob lease break \
  --account-name mystorageaccount \
  --container-name uploads \
  --blob-name myfile.dat

# Then retry the upload
az storage blob upload \
  --account-name mystorageaccount \
  --container-name uploads \
  --name myfile.dat \
  --file ./myfile.dat \
  --overwrite

Hot Partition Throttling

Azure Blob Storage automatically partitions data. When many blobs with similar names are uploaded rapidly, they may land on the same partition server, causing localized throttling even when account-level limits are not reached.

Symptoms

  • Intermittent 503 errors during bulk uploads
  • Performance degrades as upload batch size increases
  • Some uploads succeed while others in the same batch fail

Fix: Distribute Blob Names

# Bad: Sequential names create hot partitions
# logs/2024-01-01.log, logs/2024-01-02.log, logs/2024-01-03.log

# Good: Prefix with hash for distribution
import hashlib

def distributed_blob_name(original_name):
    hash_prefix = hashlib.md5(original_name.encode()).hexdigest()[:4]
    return f"{hash_prefix}/{original_name}"

# Result: a1b2/logs/2024-01-01.log, c3d4/logs/2024-01-02.log

Append Blob Specific Limits

Append blobs have stricter limits than block blobs and are commonly used for logging. Each append operation is limited to 4 MiB, and the total blob size cannot exceed approximately 195 GiB.

Error: Append Block Size Exceeded

# Python: Safely append data with size checking
from azure.storage.blob import BlobServiceClient

MAX_APPEND_BLOCK = 4 * 1024 * 1024  # 4 MiB
MAX_APPEND_BLOB = 195 * 1024 * 1024 * 1024  # ~195 GiB

def safe_append(blob_client, data):
    # Check current blob size
    props = blob_client.get_blob_properties()
    current_size = props.size
    
    if current_size + len(data) > MAX_APPEND_BLOB:
        raise ValueError("Append would exceed maximum blob size")
    
    # Split into chunks if data exceeds max block size
    for i in range(0, len(data), MAX_APPEND_BLOCK):
        chunk = data[i:i + MAX_APPEND_BLOCK]
        blob_client.append_block(chunk)

Network-Level Upload Failures

Connection Reset and Socket Errors

Network instability between the client and Azure Storage can cause upload failures, especially for long-running transfers.

# Python: Robust upload with connection retry
from azure.storage.blob import BlobServiceClient, ExponentialRetry

# The blob SDK ships its own storage-aware retry policies;
# ExponentialRetry also retries connection and read failures
retry_policy = ExponentialRetry(
    initial_backoff=2,   # seconds before the first retry
    increment_base=2,    # exponential growth base between retries
    retry_total=10       # maximum number of retry attempts
)

blob_service = BlobServiceClient.from_connection_string(
    conn_str,
    retry_policy=retry_policy,
    connection_timeout=300,   # seconds to establish a connection
    read_timeout=600          # seconds to wait for a response
)

Error Classification and Severity Assessment

Not all errors require the same response urgency. Classify errors into severity levels based on their impact on users and business operations. A severity 1 error causes complete service unavailability for all users. A severity 2 error degrades functionality for a subset of users. A severity 3 error causes intermittent issues that affect individual operations. A severity 4 error is a cosmetic or minor issue with a known workaround.
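One illustrative way to encode such a matrix for the upload errors covered in this guide (the mapping below is an example to tune for your own workload, not an official classification):

```python
# Hypothetical severity mapping for blob upload error codes
SEVERITY_MATRIX = {
    403: 1,  # auth/SAS broken: all uploads fail
    503: 2,  # throttling: degrades a subset of traffic
    500: 2,  # server-side timeout on large transfers
    413: 3,  # single oversized request: affects one operation
    409: 3,  # lease conflict on one blob
}

def classify(status_code):
    """Map an HTTP status code to a severity level; unknowns triage as sev 3."""
    return SEVERITY_MATRIX.get(status_code, 3)

print(classify(503))  # 2
print(classify(418))  # 3 (unknown code -> default severity)
```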

For Azure Blob upload failures due to size or timeout, map the specific error codes and messages to these severity levels. Create a classification matrix that your on-call team can reference when triaging incoming alerts. This prevents over-escalation of minor issues and under-escalation of critical ones. Include the expected resolution time for each severity level and the communication protocol (who to notify, how frequently to update stakeholders).

Track your error rates over time using Azure Monitor metrics and Log Analytics queries. Establish baseline error rates for healthy operation so you can distinguish between normal background error levels and genuine incidents. A service that normally experiences 0.1 percent error rate might not need investigation when errors spike to 0.2 percent, but a jump to 5 percent warrants immediate attention. Without this baseline context, every alert becomes equally urgent, leading to alert fatigue.

Implement error budgets as part of your SLO framework. An error budget defines the maximum amount of unreliability your service can tolerate over a measurement window (typically monthly or quarterly). When the error budget is exhausted, the team shifts focus from feature development to reliability improvements. This mechanism creates a structured trade-off between innovation velocity and operational stability.
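The error budget arithmetic is straightforward; for example, at a 99.9 percent success SLO over a monthly window (the request volume below is illustrative):

```python
def error_budget(slo, total_requests):
    """Number of failed requests tolerable in a window under a given SLO."""
    return int(total_requests * (1 - slo))

# 10 million upload requests per month at a 99.9% success SLO
budget = error_budget(0.999, 10_000_000)
print(budget)  # 10000 failed requests allowed

# Burn rate: fraction of the budget consumed so far this window
consumed = 2_500
print(f"{consumed / budget:.0%} of budget consumed")  # prints "25% of budget consumed"
```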

Dependency Management and Service Health

Azure services depend on other Azure services internally, and your application adds additional dependency chains on top. When diagnosing Azure Blob upload failures due to size or timeout, map out the complete dependency tree including network dependencies (DNS, load balancers, firewalls), identity dependencies (Azure AD, managed identity endpoints), and data dependencies (storage accounts, databases, key vaults).

Check Azure Service Health for any ongoing incidents or planned maintenance affecting the services in your dependency tree. Azure Service Health provides personalized notifications specific to the services and regions you use. Subscribe to Service Health alerts so your team is notified proactively when Microsoft identifies an issue that might affect your workload.

For each critical dependency, implement a health check endpoint that verifies connectivity and basic functionality. Your application’s readiness probe should verify not just that the application process is running, but that it can successfully reach all of its dependencies. When a dependency health check fails, the application should stop accepting new requests and return a 503 status until the dependency recovers. This prevents requests from queuing up and timing out, which would waste resources and degrade the user experience.
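A minimal sketch of that readiness pattern (the dependency checks here are stand-ins; wire in real connectivity tests for storage, identity, and data dependencies):

```python
def readiness(checks):
    """Aggregate dependency health checks into an HTTP status.

    checks: mapping of dependency name -> zero-argument callable
    that returns True when the dependency is healthy.
    """
    failures = [name for name, check in checks.items() if not check()]
    if failures:
        # Stop accepting traffic until the dependency recovers
        return 503, {"status": "not ready", "failed": failures}
    return 200, {"status": "ready"}

# Stand-in checks; replace with real probes against your dependencies
checks = {
    "blob_storage": lambda: True,
    "key_vault": lambda: False,   # simulated outage
}
status, body = readiness(checks)
print(status, body)  # 503 {'status': 'not ready', 'failed': ['key_vault']}
```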

Monitoring Upload Performance

# Monitor storage account metrics
az monitor metrics list \
  --resource /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{account} \
  --metric "BlobCount,BlobCapacity,Ingress,Egress,SuccessServerLatency,Transactions" \
  --interval PT5M \
  --start-time $(date -u -d "-1 hour" '+%Y-%m-%dT%H:%M:%SZ')

# Check for throttled transactions
az monitor metrics list \
  --resource /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{account} \
  --metric "Transactions" \
  --dimension "ResponseType" \
  --filter "ResponseType eq 'ServerBusy' or ResponseType eq 'ServerTimeoutError'" \
  --interval PT1H

Premium Block Blob Accounts

For high-transaction workloads, consider Premium Block Blob storage accounts, which offer:

  • Consistent low-latency performance
  • Higher transaction rates per partition
  • SSD-backed storage
# Create premium block blob account
az storage account create \
  --name mypremiumaccount \
  --resource-group myRG \
  --location eastus \
  --kind BlockBlobStorage \
  --sku Premium_LRS \
  --min-tls-version TLS1_2

Prevention Best Practices

  • Use AzCopy for large transfers — It handles parallelism, retry, and resume automatically
  • Use block uploads for files over 256 MiB — Never attempt single-request uploads for large files
  • Implement exponential backoff — All storage SDKs support configurable retry policies
  • Distribute blob names — Use hash prefixes to avoid hot partitions during bulk uploads
  • Monitor ingress metrics — Set alerts when approaching account throughput limits
  • Set appropriate timeouts — Match client-side timeouts to expected upload duration
  • Use managed identities instead of SAS tokens where possible to avoid expiration issues
  • Choose the right blob type — Block blobs for general uploads, append blobs for logging, page blobs for random access

Post-Resolution Validation and Hardening

After applying the fix, perform a structured validation to confirm the issue is fully resolved. Do not rely solely on the absence of error messages. Actively verify that the service is functioning correctly by running health checks, executing test transactions, and monitoring key metrics for at least 30 minutes after the change.

Validate from multiple perspectives. Check the Azure resource health status, run your application’s integration tests, verify that dependent services are receiving data correctly, and confirm that end users can complete their workflows. A fix that resolves the immediate error but breaks a downstream integration is not a complete resolution.

Implement defensive monitoring to detect if the issue recurs. Create an Azure Monitor alert rule that triggers on the specific error condition you just fixed. Set the alert to fire within minutes of recurrence so you can respond before the issue impacts users. Include the remediation steps in the alert’s action group notification so that any on-call engineer can apply the fix quickly.

Finally, conduct a brief post-incident review. Document the root cause, the fix applied, the time to detect, diagnose, and resolve the issue, and any preventive measures that should be implemented. Share this documentation with the broader engineering team through a blameless post-mortem process. This transparency transforms individual incidents into organizational learning that raises the entire team’s operational capability.

Consider adding the error scenario to your integration test suite. Automated tests that verify the service behaves correctly under the conditions that triggered the original error provide a safety net against regression. If a future change inadvertently reintroduces the problem, the test will catch it before it reaches production.

Summary

Azure Blob Storage upload failures come down to three categories: size limit violations, account-level throttling, and network or authentication issues. For size limits, always use block upload patterns for files over 256 MiB. For throttling, implement exponential backoff retries and distribute blob naming to avoid hot partitions. For authentication failures, validate SAS token permissions and expiry. AzCopy remains the most robust tool for bulk uploads, handling all of these concerns automatically.

For more details, refer to the official documentation: What is Azure Blob storage?, Introduction to Azure Blob Storage, Scalability and performance targets for Blob storage.
