How to fix Azure Cache for Redis connection refused errors

Understanding Azure Cache for Redis Connection Refused Errors

Azure Cache for Redis is a fully managed, in-memory data store used for caching, session management, message brokering, and real-time analytics. Connection refused errors are among the most disruptive issues you can encounter — they prevent your application from accessing cached data entirely, often causing cascading performance degradation or outages across your application tier.

This guide covers the major root causes of connection refused errors, from firewall rules and network configuration to client-side issues and server-side failovers, with exact diagnostic commands and fixes.

Important: Microsoft has announced the retirement of Azure Cache for Redis. The recommended replacement is Azure Managed Redis. New deployments should consider Azure Managed Redis, though the troubleshooting concepts in this guide apply to both services.

Diagnostic Context

When you encounter a connection refused error with Azure Cache for Redis, the first step is understanding what changed. In most production environments, errors do not appear spontaneously. They are triggered by a change in configuration, code, traffic patterns, or the platform itself. Review your deployment history, recent configuration changes, and Azure Service Health notifications to identify potential triggers.

Azure maintains detailed activity logs for every resource operation. These logs capture who made a change, what was changed, when it happened, and from which IP address. Cross-reference the timeline of your error reports with the activity log entries to establish a causal relationship. Often, the fix is simply reverting the most recent change that correlates with the error onset.

If no recent changes are apparent, consider external factors. Azure platform updates, regional capacity changes, and dependent service modifications can all affect your resources. Check the Azure Status page and your subscription’s Service Health blade for any ongoing incidents or planned maintenance that coincides with your issue timeline.

Common Pitfalls to Avoid

When fixing Azure service errors under pressure, engineers sometimes make the situation worse by applying changes too broadly or too quickly. Here are critical pitfalls to avoid during your remediation process.

First, avoid making multiple changes simultaneously. If you change the firewall rules, the connection string, and the service tier all at once, you cannot determine which change actually resolved the issue. Apply one change at a time, verify the result, and document what worked. This disciplined approach builds reliable operational knowledge for your team.

Second, do not disable security controls to bypass errors. Opening all firewall rules, granting overly broad RBAC permissions, or disabling SSL enforcement might eliminate the error message, but it creates security vulnerabilities that are far more dangerous than the original issue. Always find the targeted fix that resolves the error while maintaining your security posture.

Third, test your fix in a non-production environment first when possible. Azure resource configurations can be exported as ARM or Bicep templates and deployed to a test resource group for validation. This extra step takes minutes but can prevent a failed fix from escalating the production incident.

Fourth, document the error message exactly as it appears, including correlation IDs, timestamps, and request IDs. If you need to open a support case with Microsoft, this information dramatically speeds up the investigation. Azure support engineers can use correlation IDs to trace the exact request through Microsoft’s internal logging systems.

Connection Requirements

Before troubleshooting, understand the baseline connectivity requirements:

Setting         | Value                          | Notes
Non-SSL port    | 6379                           | Disabled by default on Azure Cache for Redis
SSL/TLS port    | 6380                           | Default and recommended port
Clustered ports | 13000-13019, 15000-15019       | Required for clustered cache instances
Minimum TLS     | TLS 1.2                        | Older TLS versions are rejected
Hostname format | <name>.redis.cache.windows.net | Always use the hostname, never an IP address
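The baseline requirements above can be checked programmatically before troubleshooting deeper. A minimal sketch, assuming a plain "host:port" endpoint string (the check_endpoint helper is illustrative, not part of any SDK):

```python
# Sanity-check a Redis endpoint against the baseline requirements above.
# The "host:port" endpoint format and this helper are illustrative only.

def check_endpoint(endpoint: str) -> list[str]:
    """Return a list of problems found with a host:port endpoint."""
    problems = []
    host, _, port = endpoint.rpartition(":")
    if not host.endswith(".redis.cache.windows.net"):
        problems.append("use the *.redis.cache.windows.net hostname, not an IP or alias")
    if port == "6379":
        problems.append("port 6379 is the non-SSL port, disabled by default; use 6380")
    elif port != "6380":
        problems.append("expected TLS port 6380")
    return problems

print(check_endpoint("myRedisCache.redis.cache.windows.net:6380"))  # []
print(check_endpoint("10.0.0.4:6379"))
```

Running this against every connection string in your configuration catches the two most common mistakes (raw IP addresses and the disabled non-SSL port) before they reach production.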

Firewall Rules Blocking Client IP

Symptoms

  • Connection attempts fail immediately or time out
  • Error: No connection is available to service this operation
  • Error: It was not possible to connect to the redis server

Diagnosis

# List current firewall rules
az redis firewall-rules list \
  --name myRedisCache \
  --resource-group myRG \
  --output table

# Check if your IP is allowed
# First, find your public IP
curl -s https://api.ipify.org

Fix

# Add a firewall rule for your IP
az redis firewall-rules create \
  --name myRedisCache \
  --resource-group myRG \
  --rule-name AllowMyIP \
  --start-ip 203.0.113.50 \
  --end-ip 203.0.113.50

# Add a range for your office network
az redis firewall-rules create \
  --name myRedisCache \
  --resource-group myRG \
  --rule-name AllowOffice \
  --start-ip 203.0.113.0 \
  --end-ip 203.0.113.255

Navigate to Portal > Azure Cache for Redis > Settings > Firewall to verify rules visually.

Private Endpoint Misconfiguration

If your cache uses Private Link, public access is typically disabled. Clients must connect through the private endpoint, and DNS resolution must return the private IP address.

Diagnosis

# Check if public access is enabled
az redis show \
  --name myRedisCache \
  --resource-group myRG \
  --query "publicNetworkAccess" -o tsv

# Verify DNS resolves to private IP
nslookup myRedisCache.redis.cache.windows.net

# Expected: Should resolve to a private IP (10.x.x.x, 172.16.x.x-172.31.x.x, or 192.168.x.x)
# Problem: If it resolves to a public IP, DNS is not configured correctly
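The private-versus-public distinction above can be checked with Python's standard ipaddress module. A sketch with hardcoded sample addresses (a live check would resolve the hostname with socket.getaddrinfo first):

```python
# Classify a resolved address as private or public, mirroring the nslookup
# check above. Sample addresses are hardcoded here for illustration.
import ipaddress

def is_private(addr: str) -> bool:
    return ipaddress.ip_address(addr).is_private

print(is_private("10.1.2.4"))      # True  -> private endpoint DNS is working
print(is_private("52.161.8.20"))   # False -> name resolved to a public IP
```

Note that only 172.16.0.0/12 is private, so a 172.x address is not automatically a private-endpoint IP.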

Fix

# Verify private endpoint exists
az network private-endpoint list \
  --resource-group myRG \
  --query "[?contains(privateLinkServiceConnections[0].privateLinkServiceId, 'myRedisCache')]" \
  --output table

# Check private DNS zone
az network private-dns zone list \
  --resource-group myRG \
  --query "[?name=='privatelink.redis.cache.windows.net']" \
  --output table

# Verify DNS zone is linked to VNet
az network private-dns link vnet list \
  --resource-group myRG \
  --zone-name privatelink.redis.cache.windows.net \
  --output table

Important: Always connect to <cachename>.redis.cache.windows.net on port 6380, even with private endpoints. Do not use *.privatelink.redis.cache.windows.net directly in your connection string.

VNet Injection Configuration Issues

For caches deployed inside a virtual network (VNet injection), network security groups and routing must allow the required traffic.

Required NSG Rules for VNet-Injected Cache

Direction | Port        | Protocol | Purpose
Inbound   | 6379, 6380  | TCP      | Client communication
Inbound   | 8443        | TCP      | Internal management
Inbound   | 10221-10231 | TCP      | Cluster communication
Inbound   | 13000-13999 | TCP      | Cluster client ports
Inbound   | 15000-15999 | TCP      | Cluster management
Inbound   | 20226       | TCP      | Redis management
Outbound  | 443         | TCP      | Azure dependencies
Outbound  | 53          | TCP/UDP  | DNS

# List NSG rules for the cache subnet
az network nsg rule list \
  --nsg-name myCacheNSG \
  --resource-group myRG \
  --output table

# Verify the internal load balancer IP (168.63.129.16) is not blocked
# This IP is used by Azure platform health probes and must be allowed

Migration note: Microsoft recommends migrating from VNet injection to Private Link. VNet injection will be deprecated for new caches.

Maximum Connected Clients Reached

Each cache tier has a maximum number of concurrent connections. When this limit is reached, new connections are refused.

Connection Limits by Tier

Cache Size  | Max Connections
C0 (250 MB) | 256
C1 (1 GB)   | 1,000
C2 (2.5 GB) | 2,000
C3 (6 GB)   | 5,000
C4 (13 GB)  | 10,000
C5 (26 GB)  | 15,000
C6 (53 GB)  | 20,000
P1 (6 GB)   | 7,500
P2 (13 GB)  | 15,000
P3 (26 GB)  | 30,000
P4 (53 GB)  | 40,000
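The 80% alerting margin recommended later in this guide can be expressed directly against these limits. A minimal sketch (the near_limit helper is illustrative; in practice this logic lives in an Azure Monitor alert rule):

```python
# Alerting-threshold sketch: warn when the connectedclients metric approaches
# the tier limit. Limits are taken from the table above; the 80% margin is
# the suggested alerting point, not a hard platform value.
LIMITS = {"C0": 256, "C1": 1_000, "C2": 2_000, "C3": 5_000, "C4": 10_000,
          "C5": 15_000, "C6": 20_000, "P1": 7_500, "P2": 15_000,
          "P3": 30_000, "P4": 40_000}

def near_limit(tier: str, connected: int, margin: float = 0.8) -> bool:
    return connected >= LIMITS[tier] * margin

print(near_limit("C1", 850))   # True: 850 >= 800
print(near_limit("P1", 4000))  # False: 4000 < 6000
```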

Diagnosis

# Check connected clients metric
az monitor metrics list \
  --resource /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Cache/redis/myRedisCache \
  --metric "connectedclients" \
  --interval PT5M \
  --output table

Fix

// C# StackExchange.Redis: Configure connection multiplexing

// IMPORTANT: Reuse a single ConnectionMultiplexer instance.
// Do NOT create a new connection per request.
private static readonly Lazy<ConnectionMultiplexer> lazyConnection =
    new Lazy<ConnectionMultiplexer>(() =>
    {
        var options = ConfigurationOptions.Parse("myRedisCache.redis.cache.windows.net:6380");
        options.Ssl = true;
        options.Password = "your-access-key";
        options.AbortOnConnectFail = false;
        options.ConnectRetry = 3;
        options.ConnectTimeout = 15000;
        return ConnectionMultiplexer.Connect(options);
    });

public static ConnectionMultiplexer Connection => lazyConnection.Value;

Server Maintenance and Failover

Planned maintenance and unplanned failovers cause brief disconnections. Standard and Premium tiers have replicas, but the failover process still causes momentary connection drops.

Diagnosis

# Check for failover events
az monitor metrics list \
  --resource /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Cache/redis/myRedisCache \
  --metric "errors" \
  --dimension "ErrorType" \
  --interval PT1H \
  --output table

Fix: Implement Connection Resilience

// C# StackExchange.Redis: Resilient configuration
var options = new ConfigurationOptions
{
    EndPoints = { "myRedisCache.redis.cache.windows.net:6380" },
    Password = "access-key",
    Ssl = true,
    AbortOnConnectFail = false,         // Don't throw on initial connect failure
    ConnectRetry = 5,                    // Retry connect 5 times
    ConnectTimeout = 15000,              // 15 second connect timeout
    SyncTimeout = 5000,                  // 5 second sync operation timeout
    ReconnectRetryPolicy = new ExponentialRetry(5000),  // Exponential backoff
    KeepAlive = 60                       // Send keepalive every 60 seconds
};

// Handle connection events for monitoring
var connection = ConnectionMultiplexer.Connect(options);
connection.ConnectionFailed += (sender, args) =>
{
    Console.WriteLine($"Connection failed: {args.Exception?.Message}");
};
connection.ConnectionRestored += (sender, args) =>
{
    Console.WriteLine("Connection restored");
};

Root Cause Analysis Framework

After applying the immediate fix, invest time in a structured root cause analysis. The Five Whys technique is a simple but effective method: start with the error symptom and ask “why” five times to drill down from the surface-level cause to the fundamental issue.

For example, applied to an Azure Cache for Redis connection refused error: Why did the service fail? Because the connection timed out. Why did the connection time out? Because the DNS lookup returned a stale record. Why was the DNS record stale? Because the TTL was set to 24 hours during a migration and never reduced. Why was it not reduced? Because there was no checklist for post-migration cleanup. Why was there no checklist? Because the migration process was ad hoc rather than documented.

This analysis reveals that the root cause is not a technical configuration issue but a process gap that allowed undocumented changes. The preventive action is creating a migration checklist and review process, not just fixing the DNS TTL. Without this depth of analysis, the team will continue to encounter similar issues from different undocumented changes.

Categorize your root causes into buckets: configuration errors, capacity limits, code defects, external dependencies, and process gaps. Track the distribution over time. If most of your incidents fall into the configuration error bucket, invest in infrastructure-as-code validation and policy enforcement. If they fall into capacity limits, improve your monitoring and forecasting. This data-driven approach focuses your improvement efforts where they will have the most impact.
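The bucket tracking described above needs nothing more than a tally over incident records. A minimal sketch (the incident list is illustrative data):

```python
# Track root-cause distribution across incidents, as described above.
# The incident records here are illustrative.
from collections import Counter

incidents = ["configuration", "configuration", "capacity", "process",
             "configuration", "external"]
distribution = Counter(incidents)
top_bucket, count = distribution.most_common(1)[0]
print(top_bucket, count)  # configuration 3
```

When "configuration" dominates the tally like this, the data points toward investing in infrastructure-as-code validation and policy enforcement rather than more monitoring.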

Third-Party Firewall and Proxy Issues

Corporate firewalls, web application firewalls, and proxy servers between your application and Azure Cache for Redis can block or interfere with Redis connections.

Required Firewall Allowances

  • Allow outbound TCP to *.redis.cache.windows.net on ports 6379 and 6380
  • For clustered caches, also allow ports 13000-13019 and 15000-15019
  • Ensure the firewall does not perform SSL/TLS inspection on Redis traffic, as this can break the TLS handshake

# Test connectivity through the firewall
# psping is a Windows Sysinternals tool; from the App Service Kudu console,
# use tcpping instead
psping myRedisCache.redis.cache.windows.net:6380

# Test with redis-cli
redis-cli -h myRedisCache.redis.cache.windows.net \
  -p 6380 \
  -a "your-access-key" \
  --tls \
  ping

Kubernetes Service Mesh Conflicts

When running in AKS with a service mesh like Istio, port conflicts can prevent Redis connections.

Problem

Istio and similar service meshes may intercept or block traffic on ports 13000-13019 and 15000-15019, which overlap with Redis cluster ports.

Fix

# Kubernetes: Exclude Redis ports from Istio sidecar
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        traffic.sidecar.istio.io/excludeOutboundPorts: "6379,6380,13000,13001,13002,13003,13004,13005,13006,13007,13008,13009,13010,13011,13012,13013,13014,13015,13016,13017,13018,13019,15000,15001,15002,15003,15004,15005,15006,15007,15008,15009,15010,15011,15012,15013,15014,15015,15016,15017,15018,15019"

Linux TCP Settings

On Linux clients, the default TCP retransmission settings are too optimistic for Azure Cache for Redis connections: a dead connection can take 15 minutes or more to be detected, causing operations to stall during network glitches instead of failing fast and reconnecting.

# Recommended Linux kernel TCP settings for Redis
sudo sysctl -w net.ipv4.tcp_retries2=5
sudo sysctl -w net.ipv4.tcp_keepalive_time=60
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=10
sudo sysctl -w net.ipv4.tcp_keepalive_probes=5

# Persist settings
echo "net.ipv4.tcp_retries2=5" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_time=60" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_intvl=10" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_probes=5" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

TLS Version Mismatch

Azure Cache for Redis requires TLS 1.2 or higher. Clients using older TLS versions will have their connections refused.

# Check minimum TLS version on the cache
az redis show \
  --name myRedisCache \
  --resource-group myRG \
  --query minimumTlsVersion -o tsv

# Set minimum TLS version
az redis update \
  --name myRedisCache \
  --resource-group myRG \
  --set minimumTlsVersion=1.2

# Python redis-py: Force TLS 1.2
import ssl

import certifi  # provides a trusted CA bundle (pip install certifi)
import redis

r = redis.Redis(
    host='myRedisCache.redis.cache.windows.net',
    port=6380,
    password='your-access-key',
    ssl=True,
    ssl_cert_reqs='required',
    ssl_ca_certs=certifi.where(),
    ssl_min_version=ssl.TLSVersion.TLSv1_2
)

print(r.ping())  # True

Error Classification and Severity Assessment

Not all errors require the same response urgency. Classify errors into severity levels based on their impact on users and business operations. A severity 1 error causes complete service unavailability for all users. A severity 2 error degrades functionality for a subset of users. A severity 3 error causes intermittent issues that affect individual operations. A severity 4 error is a cosmetic or minor issue with a known workaround.

For Azure Cache for Redis connection refused errors, map the specific error codes and messages to these severity levels. Create a classification matrix that your on-call team can reference when triaging incoming alerts. This prevents over-escalation of minor issues and under-escalation of critical ones. Include the expected resolution time for each severity level and the communication protocol (who to notify, how frequently to update stakeholders).
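A classification matrix like the one described can be as simple as a lookup table. A sketch (the impact labels and response targets are illustrative assumptions, not standard values):

```python
# Minimal triage matrix mapping observed impact to a severity level, per the
# classification above. Labels and response targets are illustrative.
SEVERITY = {
    "all_users_down":        (1, "page on-call immediately"),
    "subset_degraded":       (2, "respond within 1 hour"),
    "intermittent_failures": (3, "respond next business day"),
    "cosmetic_workaround":   (4, "add to backlog"),
}

def triage(impact: str) -> int:
    severity, _action = SEVERITY[impact]
    return severity

print(triage("subset_degraded"))  # 2
```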

Track your error rates over time using Azure Monitor metrics and Log Analytics queries. Establish baseline error rates for healthy operation so you can distinguish between normal background error levels and genuine incidents. A service that normally experiences 0.1 percent error rate might not need investigation when errors spike to 0.2 percent, but a jump to 5 percent warrants immediate attention. Without this baseline context, every alert becomes equally urgent, leading to alert fatigue.

Implement error budgets as part of your SLO framework. An error budget defines the maximum amount of unreliability your service can tolerate over a measurement window (typically monthly or quarterly). When the error budget is exhausted, the team shifts focus from feature development to reliability improvements. This mechanism creates a structured trade-off between innovation velocity and operational stability.
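The error-budget arithmetic above is straightforward: a 99.9% monthly availability SLO leaves about 43 minutes of permitted downtime. A worked example:

```python
# Error-budget arithmetic for the SLO framework described above: the budget
# is the unreliability fraction (1 - SLO) applied to the measurement window.
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    return (1 - slo) * window_days * 24 * 60

budget = error_budget_minutes(0.999)
print(round(budget, 1))  # 43.2
```

When cumulative downtime in the window exceeds this figure, the budget is exhausted and the team shifts from feature work to reliability work.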

Dependency Management and Service Health

Azure services depend on other Azure services internally, and your application adds additional dependency chains on top. When diagnosing an Azure Cache for Redis connection refused error, map out the complete dependency tree including network dependencies (DNS, load balancers, firewalls), identity dependencies (Azure AD, managed identity endpoints), and data dependencies (storage accounts, databases, key vaults).

Check Azure Service Health for any ongoing incidents or planned maintenance affecting the services in your dependency tree. Azure Service Health provides personalized notifications specific to the services and regions you use. Subscribe to Service Health alerts so your team is notified proactively when Microsoft identifies an issue that might affect your workload.

For each critical dependency, implement a health check endpoint that verifies connectivity and basic functionality. Your application’s readiness probe should verify not just that the application process is running, but that it can successfully reach all of its dependencies. When a dependency health check fails, the application should stop accepting new requests and return a 503 status until the dependency recovers. This prevents requests from queuing up and timing out, which would waste resources and degrade the user experience.
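The readiness-probe pattern above can be sketched as follows. The check functions here are stand-ins for real connectivity tests (a Redis PING, a trivial database query), and the function names are illustrative:

```python
# Readiness-probe sketch: verify each dependency and report 503 while any
# check fails, so the platform stops routing traffic to this instance.

def check_redis() -> bool:
    return True  # stand-in: would issue PING against the cache

def check_database() -> bool:
    return True  # stand-in: would run a trivial query

CHECKS = {"redis": check_redis, "database": check_database}

def readiness() -> tuple[int, dict]:
    results = {name: check() for name, check in CHECKS.items()}
    status = 200 if all(results.values()) else 503
    return status, results

status, results = readiness()
print(status, results)  # 200 {'redis': True, 'database': True}
```

Wiring this into a real HTTP endpoint (and a Kubernetes readinessProbe) is framework-specific and omitted here.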

Public IP Address Changes

Azure Cache for Redis public IP addresses can change during scale operations or maintenance. If your application or firewall rules reference the IP address directly, connections will break.

Fix

  • Always use the hostname (myRedisCache.redis.cache.windows.net) in connection strings, never the IP address
  • Update any security appliance rules that reference cache IP addresses to use hostname-based rules or DNS resolution

Diagnostic Checklist

# Quick diagnostic script
CACHE_NAME="myRedisCache"
RG="myRG"

echo "=== Cache Status ==="
az redis show -n $CACHE_NAME -g $RG --query "provisioningState" -o tsv

echo "=== Public Network Access ==="
az redis show -n $CACHE_NAME -g $RG --query "publicNetworkAccess" -o tsv

echo "=== Minimum TLS ==="
az redis show -n $CACHE_NAME -g $RG --query "minimumTlsVersion" -o tsv

echo "=== Non-SSL Port ==="
az redis show -n $CACHE_NAME -g $RG --query "enableNonSslPort" -o tsv

echo "=== Firewall Rules ==="
az redis firewall-rules list -n $CACHE_NAME -g $RG --output table

echo "=== Connected Clients ==="
az monitor metrics list \
  --resource /subscriptions/$(az account show --query id -o tsv)/resourceGroups/$RG/providers/Microsoft.Cache/redis/$CACHE_NAME \
  --metric "connectedclients" \
  --interval PT5M \
  --query "value[0].timeseries[0].data[-1].average" -o tsv

Prevention Best Practices

  • Always use hostname-based connections, never IP addresses
  • Reuse ConnectionMultiplexer instances — Creating new connections per request is the #1 cause of connection exhaustion
  • Set AbortOnConnectFail to false — Allow the client to retry connections during transient failures
  • Monitor Connected Clients metric — Alert when approaching 80% of the connection limit
  • Implement exponential backoff retry — Handle transient failures gracefully
  • Migrate from VNet injection to Private Link — Private Link is the recommended network isolation approach
  • Use TLS 1.2 or higher — Ensure all client libraries support modern TLS versions
  • Plan for cache migration — Azure Cache for Redis is being retired; plan migration to Azure Managed Redis
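The exponential-backoff recommendation above, as a minimal sketch in Python. The delays, attempt count, and exception type are illustrative; StackExchange.Redis and redis-py ship their own retry policies, which should be preferred where available:

```python
# Generic exponential-backoff retry, the pattern recommended above.
# A production version would also add random jitter to avoid thundering herds.
import time

def with_retry(operation, attempts: int = 5, base_delay: float = 0.5):
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, 4s...
```

Call it as `with_retry(lambda: client.ping())`; transient failures during a failover are absorbed, while a persistent outage still raises after the final attempt.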

Post-Resolution Validation and Hardening

After applying the fix, perform a structured validation to confirm the issue is fully resolved. Do not rely solely on the absence of error messages. Actively verify that the service is functioning correctly by running health checks, executing test transactions, and monitoring key metrics for at least 30 minutes after the change.

Validate from multiple perspectives. Check the Azure resource health status, run your application’s integration tests, verify that dependent services are receiving data correctly, and confirm that end users can complete their workflows. A fix that resolves the immediate error but breaks a downstream integration is not a complete resolution.

Implement defensive monitoring to detect if the issue recurs. Create an Azure Monitor alert rule that triggers on the specific error condition you just fixed. Set the alert to fire within minutes of recurrence so you can respond before the issue impacts users. Include the remediation steps in the alert’s action group notification so that any on-call engineer can apply the fix quickly.

Finally, conduct a brief post-incident review. Document the root cause, the fix applied, the time to detect, diagnose, and resolve the issue, and any preventive measures that should be implemented. Share this documentation with the broader engineering team through a blameless post-mortem process. This transparency transforms individual incidents into organizational learning that raises the entire team’s operational capability.

Consider adding the error scenario to your integration test suite. Automated tests that verify the service behaves correctly under the conditions that triggered the original error provide a safety net against regression. If a future change inadvertently reintroduces the problem, the test will catch it before it reaches production.

Summary

Connection refused errors in Azure Cache for Redis stem from firewall rules blocking client IPs, private endpoint DNS misconfiguration, VNet NSG rules, connection limit exhaustion, server failovers, or TLS version mismatches. The diagnostic checklist above systematically verifies each of these potential causes. For production environments, the most important preventive measures are connection reuse (single ConnectionMultiplexer instance), proper retry policies, and monitoring connection count metrics to prevent exhaustion.

For more details, refer to the official documentation: What is Azure Cache for Redis?, Troubleshoot Azure Cache for Redis latency and timeouts.
