Query Exported Azure Cost Data Using Azure Data Explorer: A Practical Azure FinOps Guide

Why Azure Data Explorer for Cost Analytics

Azure Cost Management’s built-in views handle routine cost analysis well, but they impose constraints: a 13-month history limit, restricted query complexity, rigid grouping options (maximum two dimensions), and rate-limited APIs. When your FinOps practice needs multi-year trend analysis, anomaly detection with machine learning operators, complex joins across cost and operational data, or sub-second query performance over billions of rows, Azure Data Explorer (ADX) is the analytical engine that removes these constraints.

ADX is purpose-built for time-series and log analytics at scale. It ingests cost export data from storage accounts directly, supports KQL (Kusto Query Language) for expressive analytical queries, and integrates with Power BI, Grafana, and custom dashboards. This guide covers the complete pipeline: configuring exports, designing the ADX schema, ingesting data, writing KQL queries for FinOps analysis, and building anomaly detection on top of your cost data.

Why FinOps Maturity Matters

Cloud financial management is not merely about reducing costs. It is about maximizing the business value of every dollar spent on cloud infrastructure. The FinOps Foundation defines three phases of cloud financial management maturity: Inform, Optimize, and Operate. This guide addresses practical implementation techniques that span all three phases.

In the Inform phase, organizations gain visibility into where their cloud spending goes. Azure Cost Management provides the raw data, but transforming that data into actionable insights requires structured approaches to tagging, cost allocation, and reporting. Without consistent resource tagging and cost center mapping, finance teams cannot attribute cloud costs to the business units that generate them, and engineering teams cannot identify which workloads are driving cost growth.
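
The impact of inconsistent tagging is easy to quantify once cost data is queryable. As a sketch in KQL, assuming the CostData table and Tags column defined later in this guide, this measures what share of month-to-date spend carries a CostCenter tag:

// Tag coverage: what fraction of spend can actually be attributed?
CostData
| where ChargePeriodStart >= startofmonth(now())
| extend HasCostCenter = isnotempty(tostring(Tags["CostCenter"]))
| summarize
    TaggedCost = sumif(EffectiveCost, HasCostCenter),
    UntaggedCost = sumif(EffectiveCost, not(HasCostCenter))
| extend TagCoveragePct = round(TaggedCost / (TaggedCost + UntaggedCost) * 100, 2)

Tracking this percentage over time is a simple, concrete metric for Inform-phase progress.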

In the Optimize phase, teams actively reduce waste and improve efficiency. This includes rightsizing underutilized resources, eliminating orphaned resources, leveraging Reserved Instances and Savings Plans for predictable workloads, and implementing auto-scaling to match capacity with demand. The optimization opportunities identified through the Inform phase directly feed the actions in this phase.

In the Operate phase, FinOps practices become embedded in the organization’s standard operating procedures. Cost governance policies are enforced through Azure Policy, budget alerts trigger automated responses, and cost reviews are integrated into sprint planning and architectural decision-making. The goal is continuous financial optimization that happens as a natural part of engineering operations rather than as a periodic cleanup exercise.

Organizational Alignment

Effective cloud cost management requires collaboration between engineering, finance, and business leadership. Engineering teams understand the technical trade-offs between cost and performance. Finance teams understand the budget constraints and reporting requirements. Business leaders understand the revenue impact and strategic priorities that should drive investment decisions.

Establish a FinOps team or practice that brings these perspectives together. This cross-functional team should meet regularly to review spending trends, discuss optimization opportunities, and make joint decisions about investment priorities. The techniques in this guide provide the shared data foundation that enables these cross-functional conversations and ensures that cost decisions are informed by both technical and business context.

Create executive dashboards that translate technical cost data into business language. Instead of showing raw Azure meter costs, show cost per customer, cost per transaction, or cost as a percentage of revenue. These are the metrics that business leaders can act on and that connect cloud spending to business outcomes.
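
As a sketch, unit-cost metrics can be computed by joining cost data with a business metric. The TransactionCounts table and its columns here are hypothetical placeholders for whatever operational data source your organization maintains:

// Cost per transaction by subscription; TransactionCounts is a
// hypothetical table with SubAccountName and MonthlyTransactions columns
CostData
| where ChargePeriodStart >= startofmonth(now())
| summarize MonthlyCost = sum(EffectiveCost) by SubAccountName
| join kind=inner (TransactionCounts) on SubAccountName
| extend CostPerTransaction = round(MonthlyCost / MonthlyTransactions, 4)
| project SubAccountName, MonthlyCost, MonthlyTransactions, CostPerTransaction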

Configuring Cost Data Exports

The pipeline starts with Azure Cost Management scheduled exports that deposit cost data into a storage account. For ADX ingestion, the FOCUS format (FinOps Open Cost and Usage Specification) with Parquet file format is the optimal choice.

Why FOCUS Format

FOCUS combines actual and amortized costs into a single dataset, eliminating the need to manage separate exports and join them later. The schema is standardized across cloud providers (Azure, AWS, GCP), making it the natural choice for multi-cloud FinOps. Azure exports FOCUS data at version 1.0 (generally available) and offers 1.2-preview for EA and MCA billing accounts; check the export dialog for the versions available on your account type.

Export Configuration

  1. Navigate to Cost Management → Exports
  2. Select + Add
  3. Choose Cost and usage details (FOCUS) as the export type
  4. Set schedule: Daily export of month-to-date costs
  5. Select file format: Parquet with Snappy compression
  6. Confirm File partitioning (it is always enabled for FOCUS exports)
  7. Enable Overwrite data so each run replaces the previous data for the same period
  8. Choose the destination storage account and container

The export creates files in the path structure: container/directory/ExportName/YYYYMMDD-YYYYMMDD/RunID/partN.parquet. Each export includes a manifest.json file listing all data partitions, row counts, and byte sizes. Individual partition files are kept under 1 GB uncompressed.

Manifest Structure

{
  "manifestVersion": "2024-04-01",
  "byteCount": 304521,
  "blobCount": 3,
  "dataRowCount": 28547,
  "exportConfig": {
    "exportName": "focus-daily-export",
    "dataVersion": "2023-05-01",
    "type": "CostAndUsageDetails",
    "timeFrame": "MonthToDate",
    "granularity": null
  },
  "deliveryConfig": {
    "partitionData": true,
    "dataOverwriteBehavior": "OverwritePreviousReport",
    "fileFormat": "Parquet",
    "compressionMode": "Snappy"
  },
  "blobs": [
    {
      "blobName": "exports/focus-daily-export/.../part0.parquet",
      "byteCount": 102400,
      "dataRowCount": 9500
    }
  ]
}

Designing the ADX Table Schema

The FOCUS specification defines over 60 columns across billing, pricing, resource, and service categories. For most FinOps workloads, you do not need all of them. Design your ADX table to include the columns your queries will actually use, reducing storage and improving query performance.

Recommended Core Table

.create table CostData (
    BillingAccountId: string,
    BillingAccountName: string,
    BillingPeriodStart: datetime,
    BillingPeriodEnd: datetime,
    ChargePeriodStart: datetime,
    ChargePeriodEnd: datetime,
    BilledCost: decimal,
    EffectiveCost: decimal,
    ListCost: decimal,
    BillingCurrency: string,
    ChargeCategory: string,
    ChargeFrequency: string,
    CommitmentDiscountId: string,
    CommitmentDiscountName: string,
    CommitmentDiscountStatus: string,
    CommitmentDiscountType: string,
    ConsumedQuantity: decimal,
    ConsumedUnit: string,
    PricingCategory: string,
    PricingQuantity: decimal,
    ProviderName: string,
    RegionName: string,
    ResourceId: string,
    ResourceName: string,
    ResourceType: string,
    ServiceCategory: string,
    ServiceName: string,
    SkuId: string,
    SkuMeter: string,
    SubAccountId: string,
    SubAccountName: string,
    Tags: dynamic
)

Parquet Ingestion Mapping

.create table CostData ingestion parquet mapping 'CostParquetMapping'
```
[
  {"Column":"BillingAccountId","Properties":{"Path":"$.BillingAccountId"}},
  {"Column":"BillingAccountName","Properties":{"Path":"$.BillingAccountName"}},
  {"Column":"BillingPeriodStart","Properties":{"Path":"$.BillingPeriodStart"}},
  {"Column":"BillingPeriodEnd","Properties":{"Path":"$.BillingPeriodEnd"}},
  {"Column":"ChargePeriodStart","Properties":{"Path":"$.ChargePeriodStart"}},
  {"Column":"ChargePeriodEnd","Properties":{"Path":"$.ChargePeriodEnd"}},
  {"Column":"BilledCost","Properties":{"Path":"$.BilledCost"}},
  {"Column":"EffectiveCost","Properties":{"Path":"$.EffectiveCost"}},
  {"Column":"ListCost","Properties":{"Path":"$.ListCost"}},
  {"Column":"BillingCurrency","Properties":{"Path":"$.BillingCurrency"}},
  {"Column":"ChargeCategory","Properties":{"Path":"$.ChargeCategory"}},
  {"Column":"ChargeFrequency","Properties":{"Path":"$.ChargeFrequency"}},
  {"Column":"CommitmentDiscountId","Properties":{"Path":"$.CommitmentDiscountId"}},
  {"Column":"CommitmentDiscountName","Properties":{"Path":"$.CommitmentDiscountName"}},
  {"Column":"CommitmentDiscountStatus","Properties":{"Path":"$.CommitmentDiscountStatus"}},
  {"Column":"CommitmentDiscountType","Properties":{"Path":"$.CommitmentDiscountType"}},
  {"Column":"ConsumedQuantity","Properties":{"Path":"$.ConsumedQuantity"}},
  {"Column":"ConsumedUnit","Properties":{"Path":"$.ConsumedUnit"}},
  {"Column":"PricingCategory","Properties":{"Path":"$.PricingCategory"}},
  {"Column":"PricingQuantity","Properties":{"Path":"$.PricingQuantity"}},
  {"Column":"ProviderName","Properties":{"Path":"$.ProviderName"}},
  {"Column":"RegionName","Properties":{"Path":"$.RegionName"}},
  {"Column":"ResourceId","Properties":{"Path":"$.ResourceId"}},
  {"Column":"ResourceName","Properties":{"Path":"$.ResourceName"}},
  {"Column":"ResourceType","Properties":{"Path":"$.ResourceType"}},
  {"Column":"ServiceCategory","Properties":{"Path":"$.ServiceCategory"}},
  {"Column":"ServiceName","Properties":{"Path":"$.ServiceName"}},
  {"Column":"SkuId","Properties":{"Path":"$.SkuId"}},
  {"Column":"SkuMeter","Properties":{"Path":"$.SkuMeter"}},
  {"Column":"SubAccountId","Properties":{"Path":"$.SubAccountId"}},
  {"Column":"SubAccountName","Properties":{"Path":"$.SubAccountName"}},
  {"Column":"Tags","Properties":{"Path":"$.Tags"}}
]
```
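
After creating the table and mapping, a quick sanity check confirms both are registered as expected:

// List the registered Parquet ingestion mappings for the table
.show table CostData ingestion parquet mappings

// Confirm the table schema matches the mapping's target columns
.show table CostData cslschema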

Ingestion Strategies

Three approaches for getting cost data from storage into ADX, each suited to different operational needs.

Option 1: Event Grid Data Connection (Recommended)

An Event Grid data connection automatically ingests new Parquet files as Cost Management exports them. This is the recommended production setup — zero manual intervention after initial configuration.

  1. Navigate to your ADX cluster → Database → Data connections
  2. Select + Add data connection → Event Grid
  3. Configure the storage account, container, and Event Grid subscription
  4. Set event type to Microsoft.Storage.BlobCreated
  5. Set data format to PARQUET and select the ingestion mapping
  6. Add a blob path prefix filter to target only cost export files: /blobServices/default/containers/exports/blobs/cost-export/
  7. Add a suffix filter: .parquet
  8. Enable managed identity authentication (recommended over storage keys)

The connection automatically provisions an Event Grid subscription, Event Hubs namespace, and Event Hub. New files trigger ingestion within minutes of the export completing. The maximum supported file size is 6 GB per ingestion operation — comfortably above the 1 GB partition size that cost exports stay under.
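
Once the connection is live, ingestion health can be checked from the database itself. Two useful checks (the failures command requires appropriate database permissions):

// Recent ingestion failures affecting the cost table
.show ingestion failures
| where Table == "CostData"
| project FailedOn, Details, ErrorCode

// Confirm fresh data is arriving
CostData
| summarize LatestCharge = max(ChargePeriodStart), IngestedRows = count()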

Option 2: Ad-Hoc Ingestion

For initial data loading, testing, or backfilling historical data, use the .ingest command directly:

.ingest into table CostData (
    h'https://storageaccount.blob.core.windows.net/exports/cost-export/20260301-20260331/RunID/part0.parquet;storagekey'
)
with (
    format='parquet',
    ingestionMappingReference='CostParquetMapping'
)
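
After an ad-hoc ingestion, a quick count against the manifest's dataRowCount catches partial loads. A sketch using the same March 2026 export period as the example above:

CostData
| where ChargePeriodStart between (datetime(2026-03-01) .. datetime(2026-03-31))
| summarize IngestedRows = count(), DaysCovered = dcount(startofday(ChargePeriodStart))

Compare IngestedRows to the dataRowCount field in the export's manifest.json; a mismatch usually means a partition file was missed.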

Option 3: Idempotent Batch Ingestion

For automated pipelines that might retry, use ingestIfNotExists tags to prevent duplicate ingestion. This is critical when an Azure Function or Logic App processes exports and might be triggered multiple times for the same file:

.ingest into table CostData (
    h'https://storageaccount.blob.core.windows.net/exports/cost-export/20260301-20260331/RunID/part0.parquet;storagekey'
)
with (
    format='parquet',
    ingestionMappingReference='CostParquetMapping',
    ingestIfNotExists='["2026-03-RunID-part0"]',
    tags='["ingest-by:2026-03-RunID-part0"]',
    creationTime='2026-03-31'
)

The ingestIfNotExists parameter checks whether an extent with a matching ingest-by tag already exists in the table. If it does, the ingestion is silently skipped. This gives you effectively-once semantics for pipeline retries — though the check is best-effort, so two concurrent ingestions of the same file can still race; serialize retries where possible.
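
To see which ingest-by tags are already present — and therefore which files a retry would skip — inspect the table's extents:

// List extents carrying ingest-by tags
.show table CostData extents
| where Tags contains "ingest-by"
| project ExtentId, Tags, RowCount, MinCreatedOn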

Advanced Cost Optimization Techniques

Beyond the basic optimization strategies, consider these advanced techniques that can yield significant additional savings.

Spot Instances and Low-Priority VMs: For fault-tolerant batch processing, machine learning training, dev/test environments, and CI/CD build agents, use Azure Spot VMs that offer up to 90 percent discount compared to pay-as-you-go pricing. Implement graceful shutdown handlers that checkpoint progress when Azure reclaims the capacity, and design your workloads to resume from the last checkpoint on a new instance.

Reserved Instance Exchange and Return: Azure Reservations can be exchanged for different VM families, regions, or terms without penalty. If your workload characteristics change, exchange your existing reservation rather than letting it go unused. This flexibility makes reservations less risky than they might appear, as you can adjust your commitments as your infrastructure evolves.

Hybrid Benefit: If your organization has existing Windows Server or SQL Server licenses with Software Assurance, apply Azure Hybrid Benefit to reduce VM and managed database costs by up to 80 percent when combined with Reserved Instances. Track license utilization to ensure you are maximizing the value of your existing license investments.

Resource Lifecycle Automation: Implement automation that shuts down development and testing environments outside of business hours and weekends. A typical dev/test VM that runs 10 hours per day, 5 days per week costs 70 percent less than one that runs 24/7. Azure Automation schedules, Azure DevTest Labs auto-shutdown, and Azure Functions with timer triggers can all implement this pattern with minimal effort.
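
The savings opportunity from business-hours scheduling can be estimated directly from tagged cost data. A sketch assuming an Environment tag (a hypothetical tag name — substitute your own convention) and the roughly 70 percent reduction cited above:

// Estimated savings if dev/test compute ran only business hours
CostData
| where ChargePeriodStart >= startofmonth(now())
| where tostring(Tags["Environment"]) in ("dev", "test")
| where ServiceCategory == "Compute"
| summarize CurrentCost = sum(EffectiveCost) by SubAccountName
| extend EstimatedSavings = round(CurrentCost * 0.7, 2)  // ~10h x 5d vs 24/7
| order by EstimatedSavings desc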

Right-Sizing Based on Actual Usage: Azure Advisor provides right-sizing recommendations based on CPU and memory utilization over the past 14 days. Review these recommendations weekly and act on them. A VM that consistently uses less than 20 percent of its allocated CPU should be downsized to the next smaller SKU. For databases, review DTU or vCore utilization and adjust the service tier accordingly.

External Tables: Query Without Ingestion

For exploratory analysis or infrequent queries on historical cost data, external tables let you query Parquet files directly in storage without ingesting them into ADX. There is no data duplication and no ingestion cost — but queries run slower because there is no hot cache or index.

.create external table CostDataExternal (
    BillingAccountId: string,
    BillingPeriodStart: datetime,
    ChargePeriodStart: datetime,
    ChargePeriodEnd: datetime,
    BilledCost: decimal,
    EffectiveCost: decimal,
    BillingCurrency: string,
    ServiceName: string,
    ResourceId: string,
    ResourceName: string,
    ResourceType: string,
    RegionName: string,
    SubAccountName: string,
    Tags: dynamic,
    PricingCategory: string,
    CommitmentDiscountName: string
)
kind=storage
partition by (BillingMonth: datetime = startofmonth(ChargePeriodStart))
pathformat=(datetime_pattern("'year='yyyy'/month='MM", BillingMonth))
dataformat=parquet
(
    h'https://storageaccount.blob.core.windows.net/exports/cost-export;storagekey'
)
with (fileExtension=".parquet")

Note that pathformat must match your actual blob layout. Cost Management exports write to YYYYMMDD-YYYYMMDD/RunID/ folders rather than the hive-style year=/month= paths shown here, so adjust the partition definition (or restructure the exported files) accordingly, or omit partitioning entirely for small datasets.

Query the external table the same way you query a regular table, but prefix with external_table():

external_table("CostDataExternal")
| where ChargePeriodStart between (datetime(2025-01-01) .. datetime(2025-12-31))
| summarize AnnualCost = sum(BilledCost) by ServiceName
| top 10 by AnnualCost desc

Essential KQL Queries for FinOps

Daily Cost Trend

CostData
| where ChargePeriodStart >= ago(30d)
| summarize DailyCost = sum(BilledCost) by Day = bin(ChargePeriodStart, 1d)
| order by Day asc
| render timechart

Top 10 Most Expensive Resources

CostData
| where ChargePeriodStart >= startofmonth(now())
| summarize TotalCost = sum(EffectiveCost) by ResourceName, ResourceId, ServiceName
| top 10 by TotalCost desc

Cost by Service and Subscription

CostData
| where ChargePeriodStart >= startofmonth(now())
| summarize 
    BilledTotal = sum(BilledCost), 
    EffectiveTotal = sum(EffectiveCost)
    by ServiceName, SubAccountName
| order by BilledTotal desc

Month-Over-Month Comparison

let currentMonth = startofmonth(now());
let previousMonth = datetime_add('month', -1, currentMonth);
CostData
| where ChargePeriodStart >= previousMonth
| extend Month = iff(ChargePeriodStart >= currentMonth, "Current", "Previous")
| summarize TotalCost = sum(BilledCost) by ServiceName, Month
| evaluate pivot(Month, sum(TotalCost))
| extend ChangePercent = round((Current - Previous) / Previous * 100, 2)
| order by Current desc

Anomaly Detection with Machine Learning

ADX’s built-in series_decompose_anomalies() function fits a seasonal decomposition model to time-series data, detecting points that deviate significantly from the expected pattern:

CostData
| where ChargePeriodStart >= ago(90d)
| summarize DailyCost = sum(BilledCost) by Day = bin(ChargePeriodStart, 1d)
| make-series CostSeries = sum(DailyCost) default=0 on Day from ago(90d) to now() step 1d
| extend (Anomalies, AnomalyScore, ExpectedCost) = series_decompose_anomalies(CostSeries)
| mv-expand Day to typeof(datetime), 
            CostSeries to typeof(decimal), 
            Anomalies to typeof(int), 
            AnomalyScore to typeof(double), 
            ExpectedCost to typeof(decimal)
| where Anomalies != 0
| project Day, ActualCost = CostSeries, ExpectedCost, AnomalyScore,
          Direction = iff(Anomalies > 0, "Spike", "Drop")

This query builds a 90-day time series of daily costs, decomposes it into trend, seasonal, and residual components, then flags days where the residual exceeds the model’s confidence interval. Positive anomalies indicate cost spikes; negative anomalies indicate unexpected drops (possibly deleted resources or export gaps).
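
Once a day is flagged, the natural follow-up is which service drove it. A sketch comparing the flagged day against the trailing seven-day average per service — the AnomalyDay value is a placeholder to substitute with the detected date:

let AnomalyDay = startofday(datetime(2026-03-15));  // placeholder: the flagged day
CostData
| where ChargePeriodStart >= AnomalyDay - 7d and ChargePeriodStart < AnomalyDay + 1d
| extend Period = iff(startofday(ChargePeriodStart) == AnomalyDay, "Flagged", "Baseline")
| summarize Cost = sum(BilledCost) by ServiceName, Period
| evaluate pivot(Period, sum(Cost))
| extend BaselineDailyAvg = Baseline / 7.0
| extend Delta = round(Flagged - BaselineDailyAvg, 2)
| order by Delta desc

The services at the top of the list account for most of the deviation and are where the investigation should start.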

Cost Forecasting

CostData
| where ChargePeriodStart >= ago(90d)
| summarize DailyCost = sum(BilledCost) by Day = bin(ChargePeriodStart, 1d)
| make-series CostSeries = sum(DailyCost) default=0 on Day 
    from ago(90d) to datetime_add('day', 30, now()) step 1d
| extend ForecastedCost = series_decompose_forecast(CostSeries, 30)
| render timechart

The series_decompose_forecast() function extends the time series 30 days into the future using the learned seasonal and trend components. This gives you a projected daily cost for the next month based on 90 days of historical patterns.
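
The same forecast can be reduced to a single projected figure for budget conversations. A sketch that expands the forecast series and sums the predicted spend for the remaining days of the current month:

CostData
| where ChargePeriodStart >= ago(90d)
| make-series CostSeries = sum(BilledCost) default=0 on ChargePeriodStart
    from ago(90d) to datetime_add('day', 30, now()) step 1d
| extend Forecast = series_decompose_forecast(CostSeries, 30)
| mv-expand ChargePeriodStart to typeof(datetime), Forecast to typeof(double)
| where ChargePeriodStart > now()
    and ChargePeriodStart < startofmonth(datetime_add('month', 1, now()))
| summarize ProjectedRemainingSpend = round(sum(Forecast), 2)

Add this to actual month-to-date spend to get a projected month-end total.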

Commitment Discount Utilization

CostData
| where ChargePeriodStart >= startofmonth(now())
| where isnotempty(CommitmentDiscountId)
| summarize 
    UsedCost = sumif(EffectiveCost, CommitmentDiscountStatus == "Used"),
    UnusedCost = sumif(EffectiveCost, CommitmentDiscountStatus == "Unused")
    by CommitmentDiscountName, CommitmentDiscountType
| extend UtilizationPct = round(UsedCost / (UsedCost + UnusedCost) * 100, 2)
| order by UnusedCost desc

Tag-Based Chargeback Report

CostData
| where ChargePeriodStart >= startofmonth(now())
| extend CostCenter = tostring(Tags["CostCenter"])
| summarize 
    TotalCost = sum(EffectiveCost),
    ResourceCount = dcount(ResourceId)
    by CostCenter = iif(isempty(CostCenter), "Untagged", CostCenter)
| order by TotalCost desc
| render piechart

Retention and Caching Policies

Configure retention and caching policies to balance query performance with storage costs:

// Keep 3 years of cost data (retention)
.alter table CostData policy retention 
```
{ "SoftDeletePeriod": "1095.00:00:00", "Recoverability": "Enabled" }
```

// Keep the last 90 days in hot cache for fast queries
.alter table CostData policy caching hot = 90d

Data in hot cache (SSD) delivers sub-second query performance. Data beyond the hot cache window moves to cold storage (cheaper blob storage) and queries run slower but still complete. For FinOps, keeping the last 90 days hot covers most active analysis needs while retaining years of data for trend analysis.
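
If a dashboard query must never fall through to cold storage, you can restrict it to the hot cache explicitly with a query property:

// Only scan data in hot cache; cold extents are excluded from the query
set query_datascope = "hotcache";
CostData
| where ChargePeriodStart >= ago(30d)
| summarize MonthlyCost = sum(BilledCost) by ServiceName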

ADX vs. Other Approaches

Approach | Query Latency | History Depth | Query Complexity | Cost
---|---|---|---|---
Cost Management portal | Seconds | 13 months | Limited (2 group-by max) | Free
Cost Management API | Seconds | 13 months | Limited (2 group-by max) | Free (rate limited)
Power BI + connector | Seconds to minutes | 13 months (connector) or unlimited (exports) | DAX/M (moderate) | Power BI Pro license
Azure Data Explorer | Sub-second to seconds | Unlimited (export-based) | KQL (high: ML, time series, joins) | ADX cluster cost
ADX external tables | Seconds to minutes | Unlimited | KQL (high) | Minimal (query-time only)

ADX is the right choice when you need multi-year history, ML-powered anomaly detection, complex multi-dimensional queries, or integration with operational monitoring data (joining cost data with performance metrics, deployment logs, or incident data in the same KQL query). For teams with moderate analytical needs and fewer than 13 months of history, the built-in Cost Management tools or Power BI connectors may be sufficient.

Governance and Automation

Manual cost management does not scale. As your Azure footprint grows beyond a handful of subscriptions, you need automated governance to maintain cost discipline.

Azure Policy can enforce tagging requirements at deployment time, ensuring that every resource is tagged with the cost center, environment, application name, and owner before it is created. Without consistent tagging, cost allocation becomes a manual, error-prone guessing game. Define a mandatory tag set and use a deny policy effect to prevent untagged resources from being deployed.

Budget alerts with action groups can trigger automated responses when spending thresholds are crossed. At 80 percent of budget, send a notification to the engineering team lead. At 100 percent, notify the engineering manager and finance partner. At 120 percent, trigger an automated workflow that inventories recently created resources and flags potential cost anomalies for immediate review.
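
The same thresholds can be surfaced from ADX for dashboards and scheduled alert queries. A sketch using a hypothetical monthly budget figure:

let MonthlyBudget = 50000.0;  // hypothetical budget, in billing currency
CostData
| where ChargePeriodStart >= startofmonth(now())
| summarize MonthToDateSpend = sum(BilledCost)
| extend PercentOfBudget = round(MonthToDateSpend / MonthlyBudget * 100, 1)
| extend Status = case(
    PercentOfBudget >= 120, "Critical",
    PercentOfBudget >= 100, "Exceeded",
    PercentOfBudget >= 80, "Warning",
    "OK")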

Consider implementing a cost anomaly detection pipeline. Azure Cost Management provides anomaly detection capabilities that flag unusual spending patterns. Supplement this with custom KQL queries in Log Analytics that monitor resource creation events, SKU changes, and scaling operations. When an anomaly is detected, an automated investigation workflow can gather the relevant context (who created the resource, which pipeline deployed it, what business justification was provided) and route it to the responsible team for review.

Regular cost optimization reviews should be scheduled on a monthly cadence. Use the Azure Advisor cost recommendations as a starting point, then layer in your organization-specific optimization criteria. Track optimization actions and their measured impact over time to demonstrate the ROI of your FinOps program to leadership. A well-run FinOps program typically achieves 20 to 30 percent cost reduction in the first year, with ongoing annual optimization of 5 to 10 percent as the program matures.

Production Deployment Checklist

  1. Export configuration: FOCUS format, Parquet with Snappy compression, daily schedule, file overwrite enabled
  2. ADX cluster: Dev/Test SKU for exploration, Standard SKU for production dashboards
  3. Table and mapping: Create the CostData table and Parquet ingestion mapping
  4. Data connection: Event Grid connection with blob path prefix filter and managed identity auth
  5. Retention policy: Set based on your compliance and analytics requirements (1-7 years)
  6. Caching policy: Hot cache for active analysis window (30-90 days typical)
  7. Access control: ADX database viewer role for analysts, admin role for pipeline operators
  8. Monitoring: Set up ingestion failure alerts on the ADX cluster and export failure alerts on the Cost Management export

For more details, refer to the official documentation: What is Microsoft Cost Management.
