Most cloud cost conversations start in the wrong place. Teams look at the bill, see a big number, and ask “how do we reduce this?” The real question is: “do we understand why this number is what it is?”

Understanding your cloud spend isn’t the same as reducing it. You need to understand it before you can make good decisions about it.

Here’s a practical audit checklist — five questions to run through with your cloud team this quarter.


Question 1: Do you know which SLA incidents from the past 60 days are eligible for credit claims?

Why it matters: Every major cloud provider guarantees service uptime in their SLAs. When they miss it, they owe you a credit. The filing windows are:

  • AWS: by the end of the second billing cycle after the incident occurred (roughly 60 days)
  • Azure: 60 days from the billing month end
  • GCP: 30 days from when you become eligible

Most enterprises miss these windows not because they don’t know about SLA credits in theory, but because nobody has a job that is specifically “monitor SLA adherence and file claims.”

What to look for:

  • Pull your AWS Health dashboard and look for events marked SERVICE_ISSUE or SCHEDULED_CHANGE in the past 60 days (a scripted version of this check follows this list)
  • Check Azure Service Health history for your subscriptions
  • Review GCP’s Status Dashboard for services you consume
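
The AWS check is scriptable rather than a dashboard chore. A minimal sketch using boto3's AWS Health API, which assumes configured credentials and a Business-tier or higher support plan (the Health API is gated behind one); dates and limits are illustrative:

    # List AWS Health "issue" events from the past 60 days as SLA-claim candidates.
    import datetime

    import boto3

    # The AWS Health API endpoint lives in us-east-1.
    health = boto3.client("health", region_name="us-east-1")

    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=60)

    resp = health.describe_events(
        filter={
            "eventTypeCategories": ["issue"],   # unplanned service issues
            "startTimes": [{"from": cutoff}],
        },
        maxResults=100,  # paginate via nextToken if you have more
    )

    for event in resp["events"]:
        # Each event is a candidate to cross-check against that service's SLA terms.
        print(event["service"], event["eventTypeCode"], event["startTime"])

Each flagged event still needs to be checked against the relevant SLA's availability maths before a claim is filed.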

The number that matters: What percentage of your monthly cloud bill is attributable to services with a non-trivial SLA? For most enterprises with a significant EC2, Azure VM, or GCP Compute Engine footprint, this is 60–80% of spend. A 0.1% availability miss on that base is real money.
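
To make "real money" concrete, a back-of-the-envelope sketch. Every input below is an illustrative assumption, not a quote from any provider's published SLA terms:

    # Rough value of one qualifying SLA miss; all inputs hypothetical.
    monthly_bill = 1_000_000    # USD
    sla_covered_share = 0.70    # within the 60-80% range above
    credit_tier = 0.10          # e.g. a 10% service credit for an availability miss

    credit = monthly_bill * sla_covered_share * credit_tier
    print(f"Potential credit for one qualifying month: ${credit:,.0f}")  # $70,000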


Question 2: How would you know if your cloud bill was 15% higher than it should be this month?

Why it matters: Cloud billing errors are more common than most CTOs realise. Mistaken reservation allocations, idle resources that were supposed to be terminated, drifting tags that skew cost allocation — these compound quietly.

What to look for:

On AWS:

  • Compare your CUR (Cost and Usage Report) line items this month against the same month last year, normalised for growth (see the sketch after this list)
  • Check for OnDemand charges in services where you expect Reserved Instance coverage
  • Look for stopped EC2 instances that are still incurring charges (compute billing stops, but attached EBS volumes keep accruing)
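
Before going line-by-line through the CUR, the first bullet can start from Cost Explorer. A sketch assuming boto3 credentials with ce:GetCostAndUsage permission; the dates are examples:

    import boto3

    ce = boto3.client("ce", region_name="us-east-1")

    def cost_by_service(start: str, end: str) -> dict[str, float]:
        """Monthly unblended cost per service; dates are YYYY-MM-DD, end exclusive."""
        resp = ce.get_cost_and_usage(
            TimePeriod={"Start": start, "End": end},
            Granularity="MONTHLY",
            Metrics=["UnblendedCost"],
            GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
        )
        return {
            g["Keys"][0]: float(g["Metrics"]["UnblendedCost"]["Amount"])
            for g in resp["ResultsByTime"][0]["Groups"]
        }

    current = cost_by_service("2025-06-01", "2025-07-01")    # example months
    baseline = cost_by_service("2024-06-01", "2024-07-01")

    for service, cost in sorted(current.items(), key=lambda kv: -kv[1]):
        prev = baseline.get(service, 0.0)
        if prev and cost / prev > 1.15:  # the 15% question, before growth normalisation
            print(f"{service}: ${prev:,.0f} -> ${cost:,.0f}")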

On Azure:

  • Review your Cost Analysis by resource group with a 30-day trend
  • Check for “unused” reservations in the Reservations dashboard
  • Look for resources in resource groups that should have been deleted

On GCP:

  • Use the BigQuery billing export and look for services with > 20% MoM change and no corresponding deployment (query sketched below)
  • Check for persistent disk charges on deleted VMs
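
A sketch of that billing-export query, assuming the standard export schema and the google-cloud-bigquery client; the table name is a placeholder for your own export table (they are named gcp_billing_export_v1_<BILLING_ACCOUNT_ID> in whichever dataset you configured):

    from google.cloud import bigquery

    client = bigquery.Client()

    QUERY = """
    SELECT
      service.description AS service,
      invoice.month AS month,
      SUM(cost) AS cost
    FROM `my-project.billing.gcp_billing_export_v1_XXXXXX`  -- placeholder table
    GROUP BY service, month
    ORDER BY service, month
    """

    for row in client.query(QUERY).result():
        print(row.service, row.month, round(row.cost, 2))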

The signal to watch: A service or resource group growing > 20% month-over-month without a corresponding product or team change is a flag, not a certainty. It requires investigation.
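
Turning that signal into a worklist is simple arithmetic once you have per-service monthly costs (for example, from the query above). The helper below is a sketch: it flags, it doesn't judge:

    def mom_flags(costs: dict[str, list[float]], threshold: float = 0.20) -> list[str]:
        """Flag any service whose month-over-month cost growth exceeds the threshold."""
        flagged = []
        for service, monthly in costs.items():
            for prev, curr in zip(monthly, monthly[1:]):
                if prev > 0 and (curr - prev) / prev > threshold:
                    flagged.append(f"{service}: {prev:,.0f} -> {curr:,.0f}")
        return flagged

    print(mom_flags({"BigQuery": [8_000, 8_300, 11_200]}))  # flags the ~35% jump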


Question 3: What’s your Reserved Instance / Committed Use coverage, and where are the gaps?

Why it matters: On-demand pricing on AWS, Azure, and GCP is 30–60% more expensive than 1-year commitment pricing for the same compute. If you’re running steady-state workloads on on-demand, you’re paying a significant premium.

What to look for:

On AWS:

  • In Cost Explorer, look at “Coverage” in the Reserved Instance section (also scriptable, as sketched after this list)
  • Target 70–80% coverage for predictable workloads; the rest on-demand for flexibility
  • Check for convertible RIs that could be exchanged to match your current instance mix
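
The coverage number is available programmatically too. A sketch via Cost Explorer's reservation-coverage API, assuming ce:GetReservationCoverage permission and example dates; note that Savings Plans are tracked by separate Cost Explorer APIs:

    import boto3

    ce = boto3.client("ce", region_name="us-east-1")

    resp = ce.get_reservation_coverage(
        TimePeriod={"Start": "2025-06-01", "End": "2025-07-01"},  # example month
        Granularity="MONTHLY",
    )

    hours = resp["Total"]["CoverageHours"]
    print(f"RI coverage: {hours['CoverageHoursPercentage']}% "
          f"({hours['ReservedHours']}h reserved of {hours['TotalRunningHours']}h running)")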

On Azure:

  • Azure Advisor’s “Cost” recommendations directly flag reservation opportunities
  • Check your VM usage by SKU family — consistent usage of D-series or E-series is an RI candidate

On GCP:

  • Sustained Use Discounts are automatic (they kick in once an instance runs for more than 25% of the month and scale up from there)
  • Committed Use Discounts require opt-in — check your Billing dashboard for CUD utilisation
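
The commitment maths is worth doing explicitly before opting in. A back-of-the-envelope sketch with hypothetical rates (check the actual on-demand price and CUD discount for your machine type; this also ignores sustained use discounts, which lower the effective on-demand rate):

    # All inputs hypothetical -- substitute your own rates.
    on_demand_hourly = 0.10   # USD/h
    cud_discount = 0.37       # a commonly quoted 1-year figure; verify per machine type

    committed_hourly = on_demand_hourly * (1 - cud_discount)
    monthly_commit_cost = committed_hourly * 730  # commitments bill every hour, used or not

    break_even_hours = monthly_commit_cost / on_demand_hourly
    print(f"Break-even: {break_even_hours:.0f}h/month "
          f"(~{break_even_hours / 730:.0%} utilisation)")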

The benchmark: If your RI/CUD coverage is below 50% for workloads that have been running stably for 3+ months, that’s an optimisation conversation worth having.


Question 4: Do you know which resources are idle — and do you have a process for terminating them?

Why it matters: Orphaned and idle resources are the most common source of cloud waste. They accumulate because deprovisioning is nobody’s job, or because engineers are afraid to delete things they don’t understand.

The waste falls into three buckets:

  • Orphaned: EBS volumes unattached to any EC2 instance; disks from deleted VMs; load balancers with no targets
  • Idle: < 5% CPU for 14 days; databases with no active connections; dev environments left running overnight
  • Overprovisioned: an m5.4xlarge running at 8% CPU and 12% memory; Azure SQL databases with < 10 DTUs consumed consistently

What to look for:

  • EC2 instances with < 5% average CPU over the past 30 days (scripted after this list)
  • RDS instances with < 10 active connections per day
  • Stopped EC2 instances older than 30 days (you’re paying for attached EBS)
  • Azure VMs with < 5% CPU and no active sessions
  • GCP disks not attached to any VM
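
The first check is scriptable with CloudWatch; the others follow the same pattern. A sketch assuming boto3 credentials with ec2:DescribeInstances and cloudwatch:GetMetricStatistics permissions (pagination omitted for brevity):

    import datetime

    import boto3

    ec2 = boto3.client("ec2")
    cw = boto3.client("cloudwatch")

    end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(days=30)

    reservations = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]

    for res in reservations:
        for inst in res["Instances"]:
            points = cw.get_metric_statistics(
                Namespace="AWS/EC2",
                MetricName="CPUUtilization",
                Dimensions=[{"Name": "InstanceId", "Value": inst["InstanceId"]}],
                StartTime=start,
                EndTime=end,
                Period=86_400,            # one datapoint per day
                Statistics=["Average"],
            )["Datapoints"]
            if points:
                avg = sum(p["Average"] for p in points) / len(points)
                if avg < 5.0:
                    print(f"{inst['InstanceId']} ({inst['InstanceType']}): {avg:.1f}% avg CPU")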

The process question: How does a resource get terminated at your company? If the answer is “someone has to ask for it,” the friction is probably too high. Idle resource termination should be the default, not the exception.


Question 5: Is your cost allocation accurate enough to make decisions with?

Why it matters: You can’t optimise what you can’t attribute. If your engineering teams don’t know how much their services cost, they’ll make infrastructure decisions without cost as a factor.

What to look for:

  • What percentage of your cloud spend is tagged to a team, product, or cost centre? (Target: > 85%)
  • Do your tags survive across resource lifecycle events — when an AMI is launched from a tagged instance, does the new instance inherit the tags? (Often: no)
  • Can a team lead look at a dashboard and see their team’s cloud spend by service, updated daily?

The metric: “Tagging coverage” — what percentage of billable resources carry the tags you require. Most enterprises starting this exercise are at 40–60%. Getting to 90% typically takes a quarter of focused effort.
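
On AWS, coverage can be measured in dollars rather than resource counts. A sketch via Cost Explorer grouped by tag, where "team" is a placeholder for whichever tag you require (it must be activated as a cost allocation tag first):

    import boto3

    ce = boto3.client("ce", region_name="us-east-1")

    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": "2025-06-01", "End": "2025-07-01"},  # example month
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": "team"}],  # placeholder tag key
    )

    groups = resp["ResultsByTime"][0]["Groups"]
    total = sum(float(g["Metrics"]["UnblendedCost"]["Amount"]) for g in groups)
    untagged = sum(
        float(g["Metrics"]["UnblendedCost"]["Amount"])
        for g in groups
        if g["Keys"][0] == "team$"  # Cost Explorer reports a missing tag as "<key>$"
    )
    print(f"Tagging coverage by spend: {(1 - untagged / total):.0%}")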


What Good Looks Like

Category               | At-Risk        | Acceptable    | Strong
-----------------------|----------------|---------------|-----------------
SLA credit claim rate  | < 20%          | 50–70%        | > 90%
RI/CUD coverage        | < 40%          | 60–75%        | > 80%
Idle resource rate     | > 15% of spend | 5–10%         | < 3%
Tagging coverage       | < 60%          | 75–85%        | > 90%
Cost anomaly detection | Manual/none    | Weekly review | Daily automated

Most teams are in the “At-Risk” column on at least two of these. The goal isn’t perfection — it’s knowing where you stand and having a plan.


Fintropy automates SLA credit detection and filing, cost anomaly detection, and idle resource identification across AWS, Azure, and GCP. We’re in private beta — learn more at nuvikatech.com.