Understanding Request Units and Throttling in Cosmos DB
Azure Cosmos DB uses Request Units (RUs) as its unified currency for database operations. Every read, write, query, and stored procedure execution consumes a specific number of RUs. When your operations consume more RUs per second than your provisioned throughput allows, Cosmos DB returns HTTP 429 (RequestRateTooLarge) errors — commonly known as throttling. While the SDKs automatically retry throttled requests (up to 9 times by default), excessive throttling degrades application performance and user experience.
This guide covers how to diagnose throttling, reduce RU consumption through query and data modeling optimization, and configure throughput settings that balance cost against performance requirements.
Strategic Context
Implementing RU optimization for Azure Cosmos DB in a production environment requires careful planning that goes beyond the technical configuration. Consider the operational model, the team skills required to manage the solution, the integration points with existing systems, and the long-term maintenance burden.
A solution that works brilliantly in a proof-of-concept may become an operational liability if it requires specialized expertise that your team does not possess or if it creates tight coupling with services that have different lifecycle management requirements. Evaluate the total cost of ownership including development, deployment, monitoring, incident response, and ongoing maintenance.
Azure’s shared responsibility model means that while Microsoft manages the underlying infrastructure, you are responsible for the application architecture, data management, access configuration, and operational procedures built on top of that infrastructure. This guide provides the implementation details for a robust solution, but adapt the patterns to fit your organization’s specific requirements, compliance constraints, and operational maturity level.
How RU Consumption Works
Every Cosmos DB operation has a deterministic RU cost based on several factors:
| Factor | Impact on RU Cost |
|---|---|
| Item size | Larger items consume more RUs (roughly proportional) |
| Indexing | More indexed properties = higher write RU cost |
| Consistency level | Strong/Bounded Staleness costs 2x compared to Session/Eventual |
| Query complexity | Cross-partition queries, aggregations, and JOINs cost more |
| Operation type | Writes cost more than reads; replaces cost ~2x inserts |
Baseline RU Costs
- Point read (1 KB item): ~1 RU
- Point read (100 KB item): ~10 RUs
- Insert (1 KB item, default indexing): ~5.5 RUs
- Replace (1 KB item): ~10 RUs (approximately 2x insert)
- Delete (1 KB item): ~5.5 RUs
- Cross-partition query: Variable, significantly more than single-partition
Every API response includes an `x-ms-request-charge` header that reports the actual RU cost of the operation. Always check this header (or the SDK's `RequestCharge` property) when optimizing.
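As a rough illustration, the baseline numbers above can be folded into a back-of-the-envelope estimator. The interpolation used here (read cost growing roughly with size, write cost scaling roughly linearly) is an assumption for sketching purposes only; the authoritative figure is always the `x-ms-request-charge` returned by the service.

```javascript
// Rule-of-thumb RU estimator based on the baseline costs above.
// Illustrative approximations only -- always verify against the
// actual x-ms-request-charge returned by the service.
const BASE_RU = { read: 1, insert: 5.5, replace: 10, delete: 5.5 };

function estimateRU(operation, itemSizeKB = 1) {
  const base = BASE_RU[operation];
  if (base === undefined) throw new Error(`unknown operation: ${operation}`);
  // Point-read cost grows with size: ~1 RU at 1 KB, ~10 RUs at 100 KB.
  if (operation === "read") return Math.max(1, itemSizeKB / 10);
  // Write costs scale roughly linearly with item size.
  return base * Math.max(1, itemSizeKB);
}

console.log(estimateRU("read", 1));    // ~1 RU
console.log(estimateRU("read", 100));  // ~10 RUs
console.log(estimateRU("replace", 1)); // ~10 RUs
```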
Diagnosing Throttling Issues
Step 1: Check 429 Metrics in Azure Monitor
- Navigate to your Cosmos DB account in the Azure portal.
- Click Insights → Requests tab.
- Look at Total Requests by Status Code — filter for status code 429.
- Microsoft considers 1-5% of requests being 429s as acceptable if latency remains within SLA.
Step 2: Identify Hot Partitions
Throttling often affects only one or a few physical partitions, even when overall throughput appears sufficient. This is the hot partition problem.
- Go to Insights → Throughput tab.
- Look at Normalized RU Consumption (%) By PartitionKeyRangeID.
- If one PartitionKeyRangeId is at 100% while others are at 20-30%, you have a hot partition.
Hot partitions mean one physical partition is receiving disproportionate traffic. Increasing total throughput does not help — the bottleneck is on a single partition. The solution is to redesign the partition key.
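A minimal simulation makes the arithmetic concrete: each physical partition only ever receives its share of the provisioned RU/s, so a skewed key distribution throttles even when total capacity looks ample. The hash function and the partition/RU figures below are illustrative stand-ins, not the service's actual internals.

```javascript
// Simulate the hot-partition problem: every physical partition gets an
// equal slice of the total provisioned RU/s, and requests over a
// partition's slice are rate-limited (HTTP 429).
function throttledRequests(requestsPerKey, partitions, totalRUs, ruPerRequest = 5) {
  const perPartitionBudget = totalRUs / partitions; // RU/s per physical partition
  const load = new Array(partitions).fill(0);
  for (const [key, count] of Object.entries(requestsPerKey)) {
    // Simple stand-in for the service's hash partitioning of the key.
    const p = [...key].reduce((h, c) => (h * 31 + c.charCodeAt(0)) >>> 0, 0) % partitions;
    load[p] += count * ruPerRequest;
  }
  return load.reduce((t, ru) => t + Math.max(0, ru - perPartitionBudget) / ruPerRequest, 0);
}

// 90% of traffic hits one key: total capacity (4,000 RU/s) exceeds total
// demand (1,000 requests * 5 RU), yet the hot partition still throttles.
const skewed = { "tenant-big": 900, "tenant-a": 50, "tenant-b": 50 };
console.log(throttledRequests(skewed, 4, 4000)); // > 0 despite spare total capacity
```

Doubling the total throughput shrinks the overflow but the same partition stays the bottleneck, which is why the fix is key redesign, not more RU/s.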
Step 3: Identify Expensive Operations
Enable Azure Diagnostics and send logs to a Log Analytics workspace, then query for the most RU-intensive operations:
```kusto
// Top operations by RU consumption in the last 24 hours
CDBDataPlaneRequests
| where TimeGenerated >= ago(24h)
| summarize
    TotalRU = sum(RequestCharge),
    AvgRU = avg(RequestCharge),
    Count = count(),
    ThrottledCount = countif(StatusCode == 429)
    by DatabaseName, CollectionName, OperationName
| extend ThrottleRate = round(100.0 * ThrottledCount / Count, 2)
| order by TotalRU desc
```
```kusto
// Find the most expensive individual queries
CDBDataPlaneRequests
| where TimeGenerated >= ago(24h)
| where OperationName == "Query"
| project TimeGenerated, DatabaseName, CollectionName,
    RequestCharge, DurationMs, ResponseLength,
    QueryText = tostring(RequestBody)
| order by RequestCharge desc
| take 20
```
Step 4: Identify Hot Partition Keys
```kusto
// Top partition keys consuming the most RUs
CDBPartitionKeyRUConsumption
| where TimeGenerated >= ago(24h)
| where CollectionName == "myContainer"
| where isnotempty(PartitionKey)
| summarize TotalRU = sum(RequestCharge) by PartitionKey, OperationName
| order by TotalRU desc
| take 20
```
Optimizing Read Operations
Use Point Reads Instead of Queries
A point read retrieves a document by its id and partition key — this is the cheapest possible read operation at ~1 RU for a 1 KB item. SQL queries against the same data can cost 5-50x more depending on complexity.
```csharp
// Point read — ~1 RU for a 1 KB item
ItemResponse<Product> response = await container.ReadItemAsync<Product>(
    id: "product-123",
    partitionKey: new PartitionKey("electronics")
);
Console.WriteLine($"RU charge: {response.RequestCharge}");

// Query for the same item — typically 3-10+ RUs
var query = container.GetItemQueryIterator<Product>(
    new QueryDefinition("SELECT * FROM c WHERE c.id = @id")
        .WithParameter("@id", "product-123")
);
// RU cost shown in response.RequestCharge
```
Choose the Right Consistency Level
Strong and Bounded Staleness consistency levels cost 2x the RUs of Session, Consistent Prefix, or Eventual consistency. For most applications, Session consistency provides the right balance — a user always sees their own writes while reads from other sessions are eventually consistent.
You can also override the account-level consistency on a per-request basis:
```csharp
var options = new ItemRequestOptions
{
    ConsistencyLevel = ConsistencyLevel.Eventual
};

var response = await container.ReadItemAsync<Product>(
    id: "product-123",
    partitionKey: new PartitionKey("electronics"),
    requestOptions: options
);
```
Optimize Query Patterns
- Add partition key to queries — Queries without a partition key filter become cross-partition queries, which are significantly more expensive and slower.
- Use projections — Select only the fields you need instead of `SELECT *`. Smaller result sets consume fewer RUs.
- Avoid cross-partition ORDER BY — Sorting across partitions requires reading and merging data from all partitions.
- Use pagination — Use `MaxItemCount` in query options to limit the number of items per page, reducing per-request RU consumption.
```csharp
// Optimized query with partition key and projection
var query = container.GetItemQueryIterator<dynamic>(
    new QueryDefinition(
        "SELECT c.id, c.name, c.price FROM c WHERE c.category = @category AND c.price < @maxPrice"
    )
    .WithParameter("@category", "electronics")
    .WithParameter("@maxPrice", 100),
    requestOptions: new QueryRequestOptions
    {
        PartitionKey = new PartitionKey("electronics"),
        MaxItemCount = 50 // Limit items per page
    }
);
```
Optimizing Write Operations
Reduce Item Size
RU cost for writes scales with item size. Reducing your document size directly reduces write RU consumption.
- Avoid storing large binary data — Store images, documents, and files in Azure Blob Storage; store only the blob URL in Cosmos DB.
- Use short property names — In high-throughput scenarios, shorter property names reduce document size. `"n"` instead of `"productName"` can matter at millions of writes per day.
- Remove unused properties — Audit your document schema for properties that are written but never read.
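As a quick sanity check on the property-name point, serialized size can be measured directly, since items are stored as JSON and property names count toward item size. The documents below are made-up examples:

```javascript
// Compare serialized sizes of the same data with verbose vs. short
// property names (illustrative documents, not a real schema).
const verbose = { productName: "Mechanical keyboard", productCategory: "electronics", productPrice: 89.99 };
const compact = { n: "Mechanical keyboard", c: "electronics", p: 89.99 };

const sizeBytes = (doc) => Buffer.byteLength(JSON.stringify(doc), "utf8");

const saved = sizeBytes(verbose) - sizeBytes(compact);
console.log(`verbose: ${sizeBytes(verbose)} B, compact: ${sizeBytes(compact)} B, saved: ${saved} B`);
// A few dozen bytes per item is negligible alone, but it compounds across
// millions of writes per day and across every indexed path.
```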
Customize the Indexing Policy
By default, Cosmos DB indexes every property in every document. For write-heavy workloads, excluding properties that are never queried from the index significantly reduces write RU costs.
```json
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    { "path": "/category/*" },
    { "path": "/price/*" },
    { "path": "/createdDate/*" }
  ],
  "excludedPaths": [
    { "path": "/description/*" },
    { "path": "/metadata/*" },
    { "path": "/*" }
  ]
}
```
This policy only indexes category, price, and createdDate — properties used in queries. All other properties are excluded, reducing write RU costs by 30-50% for documents with many properties.
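The precedence at work in this policy (a more precise path beats the catch-all `/*` exclusion) can be sketched with a simplified matcher. This is an approximation for building intuition, not the service's exact path-matching algorithm:

```javascript
// Simplified index-path resolution: the most precise (longest) matching
// pattern wins between includedPaths and excludedPaths.
function isIndexed(propertyPath, policy) {
  const matches = (pattern) => {
    const prefix = pattern.replace(/\/\*$/, ""); // "/category/*" -> "/category"
    return prefix === "" || propertyPath === prefix || propertyPath.startsWith(prefix + "/");
  };
  const best = (paths) =>
    Math.max(-1, ...paths.filter((p) => matches(p.path)).map((p) => p.path.length));
  return best(policy.includedPaths) > best(policy.excludedPaths);
}

const policy = {
  includedPaths: [{ path: "/category/*" }, { path: "/price/*" }, { path: "/createdDate/*" }],
  excludedPaths: [{ path: "/description/*" }, { path: "/metadata/*" }, { path: "/*" }],
};

console.log(isIndexed("/category/type", policy)); // true  -- covered by /category/*
console.log(isIndexed("/description/x", policy)); // false -- explicitly excluded
console.log(isIndexed("/weight", policy));        // false -- falls through to /*
```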
Use Bulk Operations for Batch Writes
```csharp
// Enable bulk execution for high-throughput writes
var options = new CosmosClientOptions
{
    AllowBulkExecution = true
};
var client = new CosmosClient(connectionString, options);
var container = client.GetContainer("mydb", "mycontainer");

// Create items concurrently — the SDK batches them efficiently
var tasks = items.Select(item =>
    container.CreateItemAsync(item, new PartitionKey(item.PartitionKey))
);
var responses = await Task.WhenAll(tasks);

var totalRU = responses.Sum(r => r.RequestCharge);
Console.WriteLine($"Total RU for bulk insert: {totalRU}");
```
Throughput Configuration Strategies
Manual vs. Autoscale Throughput
| Mode | Behavior | Best For | Minimum |
|---|---|---|---|
| Manual (Provisioned) | Fixed RU/s, always available | Stable, predictable workloads | 400 RU/s |
| Autoscale | Scales between 10-100% of max RU/s | Variable workloads with unpredictable spikes | 1,000 RU/s max (100 RU/s min) |
| Serverless | Per-request billing, no provisioned throughput | Low-traffic or spiky workloads (<5,000 RU/s) | No minimum |
Enabling Autoscale
```bash
# Migrate a container from manual to autoscale throughput
az cosmosdb sql container throughput migrate \
  --account-name myCosmosAccount \
  --database-name myDatabase \
  --name myContainer \
  --throughput-type autoscale

# Set the maximum autoscale throughput
az cosmosdb sql container throughput update \
  --account-name myCosmosAccount \
  --database-name myDatabase \
  --name myContainer \
  --max-throughput 10000
```
With autoscale set to 10,000 RU/s maximum, Cosmos DB scales from 1,000 to 10,000 RU/s based on demand. You only pay for the throughput actually used (billed hourly at the highest RU/s reached in that hour).
Important: Autoscale does not eliminate hot partition throttling. If a single partition consumes more than its share of the provisioned RU/s, it will still be throttled regardless of the total autoscale capacity. Fix the partition key design to resolve hot partitions.
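The hourly billing model described above can be sketched numerically. The 10% floor matches autoscale's documented scaling range; the traffic shape and the choice to count cost in RU-hours rather than currency are illustrative:

```javascript
// Autoscale billing sketch: each hour is billed at the highest RU/s
// reached in that hour, never below 10% of the configured maximum.
function autoscaleBilledRUHours(maxRU, hourlyPeaks) {
  const floor = maxRU * 0.1; // autoscale never scales below 10% of max
  return hourlyPeaks.reduce((total, peak) => total + Math.max(floor, Math.min(peak, maxRU)), 0);
}

// A day with a 3-hour spike to the 10,000 RU/s maximum and quiet traffic
// otherwise: 3 * 10,000 + 21 * 1,000 = 51,000 RU-hours, versus 240,000
// if you had provisioned 10,000 RU/s manually around the clock.
const peaks = Array.from({ length: 24 }, (_, h) => (h >= 9 && h < 12 ? 10000 : 800));
console.log(autoscaleBilledRUHours(10000, peaks)); // 51000
```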
Database-Level vs. Container-Level Throughput
- Database-level throughput — Shared across all containers in the database. Cost-effective when you have many containers (10+) with variable workloads. Minimum 400 RU/s shared.
- Container-level throughput — Dedicated to a single container with guaranteed SLA. Required for containers that need predictable performance.
For cost optimization, use database-level throughput for containers with low or unpredictable usage, and container-level throughput for your busiest containers.
Architecture Considerations
When implementing this solution, align with the Azure Well-Architected Framework pillars: reliability, security, cost optimization, operational excellence, and performance efficiency. Each pillar provides design principles and patterns that help you build a solution that meets your quality requirements.
For reliability, design for failure by implementing retry logic, circuit breakers, and graceful degradation. For security, follow the principle of least privilege and encrypt data at rest and in transit. For cost optimization, right-size your resources and leverage Azure’s pricing models (Reserved Instances, Savings Plans, Spot VMs) for predictable workloads. For operational excellence, implement infrastructure as code, automated testing, and comprehensive monitoring. For performance efficiency, test under realistic load conditions and optimize the critical path.
Consider the scaling characteristics of your solution. Will it need to handle 10x traffic during peak periods? Can the Azure services you are using autoscale to meet demand? What happens when a dependent service is unavailable? Answering these questions during the design phase prevents costly re-architecture later.
Partition Key Design for Even Distribution
The partition key is the single most important design decision for Cosmos DB performance and cost. A good partition key distributes both storage and throughput evenly across partitions.
Characteristics of a Good Partition Key
- High cardinality — Many distinct values (e.g., userId, deviceId, orderId)
- Even access distribution — No single value dominates read or write traffic
- Frequently used in queries — Appears in WHERE clauses of your most common queries
Anti-Patterns to Avoid
| Anti-Pattern | Problem | Better Alternative |
|---|---|---|
| Date as partition key | All writes go to today’s partition (hot) | userId or deviceId + synthetic suffix |
| Boolean value | Only 2 partitions, one is always hot | Compound key combining the boolean with another property |
| Tenant ID (one large tenant) | Large tenant creates a hot partition | Dedicated container for the large tenant |
| Country | Uneven distribution (US dominates) | userId or orderId |
Synthetic Partition Keys
When no single property provides ideal distribution, create a synthetic partition key that combines multiple properties:
```javascript
// Create a synthetic partition key for even distribution
function createItem(item) {
  // Combine tenant and a hash suffix for distribution
  item.partitionKey = `${item.tenantId}-${hashToRange(item.id, 10)}`;
  return container.items.create(item);
}

// Deterministically map a string to one of `buckets` suffixes
function hashToRange(value, buckets) {
  let hash = 0;
  for (let i = 0; i < value.length; i++) {
    hash = ((hash << 5) - hash) + value.charCodeAt(i);
    hash |= 0; // keep within 32-bit integer range
  }
  return Math.abs(hash) % buckets;
}
```
SDK Configuration for Throttling Resilience
```csharp
// .NET SDK retry configuration
var clientOptions = new CosmosClientOptions
{
    // Increase max retries from the default of 9
    MaxRetryAttemptsOnRateLimitedRequests = 15,

    // Increase max cumulative wait time from the default of 30 seconds
    MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(60),

    // Use Direct mode for lower latency
    ConnectionMode = ConnectionMode.Direct,

    // Cap on TCP connections per backend endpoint (Direct mode)
    MaxTcpConnectionsPerEndpoint = 65535
};

// IMPORTANT: use a single static client instance per account
private static readonly CosmosClient _client = new CosmosClient(
    connectionString, clientOptions
);
```
Monitoring and Alerting
```bash
# Create an alert for rate-limited requests
az monitor metrics alert create \
  --name "CosmosDB-Throttling-Alert" \
  --resource-group rg-production \
  --scopes "/subscriptions/{sub}/resourceGroups/rg-production/providers/Microsoft.DocumentDB/databaseAccounts/myCosmosAccount" \
  --condition "total TotalRequests > 10 where StatusCode includes 429" \
  --window-size 15m \
  --evaluation-frequency 5m \
  --action "/subscriptions/{sub}/resourceGroups/rg-monitoring/providers/microsoft.insights/actionGroups/ag-ops"
```
Cost Optimization with Reserved Capacity
For workloads with predictable throughput needs, Cosmos DB reserved capacity provides significant discounts:
- 1-year term: ~20% discount
- 3-year term: up to ~65% discount
Reserved capacity applies to provisioned throughput (RU/s) and storage. Calculate your baseline RU/s usage over the past 3-6 months to determine the optimal reservation size. Use the Azure Cosmos DB Capacity Planner to estimate RU requirements for new workloads before deployment.
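The savings arithmetic is straightforward to sketch. The hourly rate per 100 RU/s used below is a placeholder, not a real price; look up current pricing for your region and API before committing to a reservation:

```javascript
// Illustrative reserved-capacity savings calculation. The hourly rate
// per 100 RU/s is a placeholder, not actual Azure pricing.
function monthlySavings(baselineRUs, discount, hourlyRatePer100RU = 0.008) {
  const hours = 730; // average hours per month
  const payAsYouGo = (baselineRUs / 100) * hourlyRatePer100RU * hours;
  return { payAsYouGo, reserved: payAsYouGo * (1 - discount), saved: payAsYouGo * discount };
}

// A steady 10,000 RU/s baseline with the ~65% three-year discount above.
const { payAsYouGo, saved } = monthlySavings(10000, 0.65);
console.log(`pay-as-you-go: $${payAsYouGo.toFixed(2)}/mo, saved: $${saved.toFixed(2)}/mo`);
```

Reservations only pay off for throughput you actually keep provisioned, so size them to your measured baseline and cover spikes with autoscale.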
Common Pitfalls and Best Practices
- Single client instance — Always use a single, static `CosmosClient` instance per account. Creating new client instances for each request wastes connections and increases latency.
- Metadata operation limits — Operations like listing databases and containers have a separate system RU limit. Cache database and container references rather than looking them up repeatedly.
- Scale-up timing — Instantaneous scale-up is limited to 10,000 RU/s per physical partition. Scaling beyond that triggers asynchronous partition splits that can take several hours. Plan throughput increases well before peak load periods.
- Billing granularity — Throughput is billed hourly at the highest RU/s provisioned during that hour. Changing throughput more frequently than hourly does not save money.
- Hot partition masking — A common mistake is increasing total throughput to fix throttling without checking for hot partitions. If one partition is at 100% and others are at 10%, doubling total throughput wastes money while only marginally improving the hot partition.
- TTL for data lifecycle — Enable Time-to-Live (TTL) on containers with temporal data. Expired documents are automatically deleted, reducing storage costs and improving query performance.
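Expanding on the TTL point above: a container-level default TTL is set in seconds via the `defaultTtl` property on the container resource. The container name and 30-day value below are illustrative. Individual items can override the default with their own `ttl` property, and `"defaultTtl": -1` enables TTL without expiring items unless they opt in.

```json
{
  "id": "sessionEvents",
  "partitionKey": { "paths": ["/userId"], "kind": "Hash" },
  "defaultTtl": 2592000
}
```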
Operational Excellence
After implementing this solution, invest in operational excellence practices that ensure long-term reliability. Document the architecture decisions, configuration rationale, and operational procedures for your team. Set up monitoring dashboards that provide at-a-glance visibility into the solution's health and performance. Create runbooks for common operational scenarios such as scaling, failover, and incident response.
Schedule periodic reviews to assess whether the solution continues to meet your requirements as your workload evolves. Azure services release new features and capabilities regularly, and what was the optimal configuration six months ago may not be the best approach today. Stay current with Azure update announcements and evaluate new capabilities for potential improvements to your implementation.
Implement a feedback loop that captures operational insights and feeds them back into your deployment pipeline. If a monitoring gap is identified during an incident, add the necessary metrics and alerts to your infrastructure-as-code templates so they are deployed consistently across environments. This continuous improvement cycle ensures that your operational capability grows over time.
Conclusion
Optimizing RU consumption in Azure Cosmos DB is a multi-layered effort: use point reads over queries, select the right consistency level, customize indexing policies to exclude unqueried properties, and design partition keys for even distribution. When throttling occurs, diagnose whether it is a hot partition problem (requires partition key redesign) or an overall throughput shortfall (requires scaling). Configure autoscale for variable workloads to avoid paying for peak capacity during off-peak hours, and invest in reserved capacity for steady-state throughput to capture significant discounts. The combination of efficient data modeling, right-sized throughput, and proactive monitoring creates a Cosmos DB deployment that delivers consistent performance without unnecessary cost.
For more details, refer to the official documentation: Azure Cosmos DB overview, Best practices for Azure Cosmos DB .NET SDK.