Most people assume SLA breach detection is the hard part. Detect an outage, file a claim, get a credit. Simple enough.
It isn’t. Detection is actually the easiest step. The complexity lives in everything that comes after: calculating the credit accurately, managing per-provider filing windows, handling breaches that span month boundaries, and tracking claims through provider review processes that each speak different status vocabularies.
Here’s how Fintropy handles the complete lifecycle — from the moment an SLA breach is detected to the moment a credit lands in a customer’s account.
The Four Steps
Step 1: Detect
Fintropy monitors uptime snapshots for each cloud subscription. When uptime for a service drops below its SLA threshold — 99.9% for most AWS services, 99.95% for certain Azure tiers — the system opens a breach record:
# A breach starts as "Pending" when we detect the dip
breach = SLABreach(
    tenant_id=tenant_id,
    subscription_id=subscription_id,
    service=service,
    status="Pending",
    start_time=now,
    provider=provider,
    evidence={"tier": sla_tier_id},
)
The breach stays open as long as the service is below threshold. When it recovers, we close it.
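To make the open/close behaviour concrete, here is a minimal sketch of the decision made on each uptime snapshot. The trimmed-down `SLABreach` dataclass, the `evaluate_snapshot` helper, and the choice to mark a closed breach as "Resolved" are illustrative assumptions, not Fintropy's actual model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SLABreach:
    # Hypothetical, trimmed-down stand-in for the real breach model
    service: str
    provider: str
    start_time: datetime
    status: str = "Pending"
    end_time: Optional[datetime] = None
    evidence: dict = field(default_factory=dict)


def evaluate_snapshot(open_breach, service, provider, uptime_pct, threshold_pct, now):
    """Return the breach that should remain open after this snapshot, or None."""
    below = uptime_pct < threshold_pct
    if below and open_breach is None:
        # Dip detected and nothing open yet: open a new record
        return SLABreach(
            service=service,
            provider=provider,
            start_time=now,
            evidence={"uptime_pct": uptime_pct, "threshold_pct": threshold_pct},
        )
    if not below and open_breach is not None:
        # Service recovered: close the open record (status name is an assumption)
        open_breach.end_time = now
        open_breach.status = "Resolved"
        return None
    # Either still below threshold (keep the breach open) or still healthy (nothing to do)
    return open_breach
```

The function is deliberately free of I/O: the caller decides how breaches are loaded and persisted, which keeps the threshold logic easy to test.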
Step 2: Quantify — The Hard Part
This is where the real complexity lives.
Month-boundary splits
SLA credits are calculated as a percentage of the monthly spend for the affected service. If a breach starts on January 28th and ends on February 3rd, it crosses a month boundary. The credit calculation uses January’s spend for January’s downtime, and February’s spend for February’s downtime.
We split cross-month breaches into two records:
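from datetime import timedelta

def resolve_breach(self, breach, end_time):
    start = breach.start_time
    # Compare year as well as month so a December-to-January breach also splits
    if (start.year, start.month) != (end_time.year, end_time.month):
        # Split at the month boundary: midnight on the 1st of the following month
        month_start = start.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        next_month_start = (month_start + timedelta(days=32)).replace(day=1)
        breach.end_time = next_month_start  # original record now covers January only
        spillover = SLABreach(
            start_time=next_month_start,
            end_time=end_time,
            description=breach.description + " (continued from previous month)",
        )
    else:
        breach.end_time = end_time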
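The same idea generalises to breaches that span more than two months. The sketch below is a standalone illustration, not the production code: `first_of_next_month` and `split_by_month` are hypothetical helpers that cut a downtime interval into per-month segments, so each segment can be priced against that month's spend.

```python
from datetime import datetime


def first_of_next_month(moment):
    """Midnight on the 1st of the month after `moment`."""
    if moment.month == 12:
        year, month = moment.year + 1, 1
    else:
        year, month = moment.year, moment.month + 1
    return moment.replace(year=year, month=month, day=1,
                          hour=0, minute=0, second=0, microsecond=0)


def split_by_month(start, end):
    """Split [start, end) into segments that each stay inside one calendar month."""
    segments = []
    cursor = start
    while cursor < end:
        # A segment ends at the month boundary or at the breach end, whichever is first
        segment_end = min(first_of_next_month(cursor), end)
        segments.append((cursor, segment_end))
        cursor = segment_end
    return segments


# The January 28th to February 3rd example yields two segments
segments = split_by_month(datetime(2025, 1, 28, 10), datetime(2025, 2, 3, 12))
```

Segments are half-open intervals, so the boundary instant (February 1st at 00:00) belongs to February and no downtime is counted twice.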
Credit calculation
Each provider publishes credit tiers. AWS EC2, for example:
| Monthly uptime | Credit |
|---|---|
| ≥ 99.0% and < 99.99% | 10% |
| ≥ 95.0% and < 99.0% | 30% |
| < 95.0% | 100% |
We maintain these tiers in a registry and calculate against them:
Monthly EC2 spend: $50,000
Uptime this month: 99.1% (in the 99.0%–99.99% band)
Credit tier: 10%
Credit amount: $5,000
The credit is calculated against actual spend, sourced from billing data (FOCUS 1.2 format internally) or spend snapshots as a fallback.
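A tier lookup of this kind can be sketched in a few lines. The registry shape, the `credit_percent` and `credit_amount` names, and the use of `Decimal` are assumptions made for illustration; only the EC2 tier values come from the table above.

```python
from decimal import Decimal

# EC2 tiers from the table above as (minimum uptime %, credit %), highest band first.
# The first entry encodes "SLA met, no credit".
EC2_TIERS = [
    (Decimal("99.99"), Decimal("0")),
    (Decimal("99.0"), Decimal("10")),
    (Decimal("95.0"), Decimal("30")),
]
EC2_FLOOR_CREDIT = Decimal("100")  # uptime below every listed band


def credit_percent(uptime_pct, tiers=EC2_TIERS, floor_credit=EC2_FLOOR_CREDIT):
    """Return the credit percentage for a monthly uptime percentage."""
    for min_uptime, credit in tiers:
        if uptime_pct >= min_uptime:
            return credit
    return floor_credit


def credit_amount(monthly_spend, uptime_pct):
    """Credit in currency units, rounded to cents."""
    raw = monthly_spend * credit_percent(uptime_pct) / Decimal("100")
    return raw.quantize(Decimal("0.01"))


# Worked example from the text: $50,000 spend at 99.1% uptime
example = credit_amount(Decimal("50000"), Decimal("99.1"))  # Decimal("5000.00")
```

`Decimal` avoids binary floating-point rounding on money and on boundary comparisons such as 99.99 versus 99.990.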
Step 3: File
Once a breach is quantified and resolved, we auto-file the claim — if the tenant has opted into automated filing.
Each provider has a different API and a different filing deadline:
| Provider | API | Filing Window |
|---|---|---|
| AWS | Support API CreateCase | 60 days |
| Azure | Support REST API /supportTickets | 60 days |
| GCP | Cloud Support API v2 cases.create | 30 days |
GCP’s 30-day window is the most dangerous. Miss it and the credit is forfeited — no appeals, no extensions. Our filing system checks the window before every attempt and fires alerts as the deadline approaches.
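The window check can be illustrated with a small helper. This is a sketch under stated assumptions: the window is counted from the moment the breach is resolved (real provider terms may measure from a different reference point, such as the end of the billing month), the 7-day alert lead time is arbitrary, and `filing_deadline` and `filing_state` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Filing windows from the table above, in days
FILING_WINDOW_DAYS = {"aws": 60, "azure": 60, "gcp": 30}
ALERT_LEAD_DAYS = 7  # assumption: start alerting one week before the deadline


def filing_deadline(provider, resolved_at):
    """Deadline for filing, counted from breach resolution (an assumption)."""
    return resolved_at + timedelta(days=FILING_WINDOW_DAYS[provider])


def filing_state(provider, resolved_at, now):
    """Classify a breach as 'ok', 'alert' (deadline near) or 'expired'."""
    deadline = filing_deadline(provider, resolved_at)
    if now > deadline:
        return "expired"
    if deadline - now <= timedelta(days=ALERT_LEAD_DAYS):
        return "alert"
    return "ok"
```

Running this check before every filing attempt means an expired claim is never submitted, and the "alert" state gives the team time to file manually if automation is blocked.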
On successful filing:
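from datetime import datetime, timezone

breach.status = "Filed"
breach.claim_reference = case_id
breach.evidence["claim"] = {
    "case_id": case_id,
    # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
    "submitted_at": datetime.now(timezone.utc).isoformat(),
    "filing_deadline": deadline,
    "credit_status": "pending",
}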
On failure (wrong support plan tier, missing credentials, API error):
breach.status = "Assisted Filing Required"
# Notify user with pre-filled form data and instructions
The failure path never breaks the resolve transaction. A filing failure is logged and surfaced as an alert, but the breach record is intact.
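One way to get that isolation is to wrap the provider call so that no exception escapes into the caller. The sketch below uses a plain dict for the breach and a hypothetical `file_claim` callable; it illustrates the pattern rather than the actual filing code.

```python
import logging

logger = logging.getLogger("sla.filing")


def file_claim_safely(breach, file_claim):
    """Try to file a claim; never raise. Returns True if the claim was filed.

    `breach` is a plain dict here and `file_claim` is any callable that
    returns a case ID or raises. Both are stand-ins for illustration.
    """
    try:
        case_id = file_claim(breach)
    except Exception as exc:  # broad on purpose: filing must not break resolution
        logger.warning("Auto-filing failed for %s: %s", breach.get("service"), exc)
        breach["status"] = "Assisted Filing Required"
        breach.setdefault("evidence", {})["filing_error"] = str(exc)
        return False
    breach["status"] = "Filed"
    breach["claim_reference"] = case_id
    return True
```

Recording the error text in the evidence gives the person doing the assisted filing the reason automation failed, such as a support plan that does not include API access.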
Step 4: Poll
After filing, we need to know when the credit is approved or denied. Providers don’t send webhooks for this — you have to poll.
A Cloud Scheduler job fires every day at 2am and polls all breaches in “Filed” status:
# Each provider speaks a different status language
# We normalise to: approved / denied / pending / unknown
AWS: "resolved" → approved
AWS: "work-in-progress" → pending
Azure: "closed" → approved
Azure: "open" → pending
GCP: "SOLUTION_PROVIDED" → approved
GCP: "CLOSED" → denied
When a status changes, the breach record is updated and an alert fires:
breach.status = "Credit Approved"
# Alert: "Your SLA credit of $5,000 for EC2 has been approved. Case ID: CASE-12345."
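The normalisation step the poller relies on amounts to a per-provider lookup table. Here is a minimal sketch containing only the mappings listed above; the `STATUS_MAP` and `normalise_status` names are hypothetical, and any status not listed falls through to "unknown".

```python
# Provider status → normalised status, limited to the mappings named in this post
STATUS_MAP = {
    "aws": {"resolved": "approved", "work-in-progress": "pending"},
    "azure": {"closed": "approved", "open": "pending"},
    "gcp": {"SOLUTION_PROVIDED": "approved", "CLOSED": "denied"},
}


def normalise_status(provider, raw_status):
    """Map a provider-specific case status to approved/denied/pending/unknown."""
    return STATUS_MAP.get(provider.lower(), {}).get(raw_status, "unknown")
```

Defaulting to "unknown" rather than raising means a new or unexpected provider status leaves the breach in "Filed" for a human to review instead of failing the whole polling run.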
The Lifecycle at a Glance
Pending → Active → Resolved → Filed → Credit Approved
Resolved ↘ Assisted Filing Required
Resolved ↘ Expired (missed filing window)
Filed ↘ Credit Denied
The full state machine covers the edge cases described above: cross-month breaches, provider API failures, support plan tier limitations, missed deadlines, and claim denials that need manual follow-up.
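A lifecycle like this can be enforced with an explicit transition table. The sketch below is inferred from the states named in this post: which state each branch leaves from, and the idea that an assisted filing can still end up "Filed" or "Expired", are assumptions rather than documented behaviour.

```python
# Allowed transitions, inferred from the lifecycle above (branch origins are assumptions)
TRANSITIONS = {
    "Pending": {"Active"},
    "Active": {"Resolved"},
    "Resolved": {"Filed", "Assisted Filing Required", "Expired"},
    # Assumption: a manually assisted filing can still succeed or miss the window
    "Assisted Filing Required": {"Filed", "Expired"},
    "Filed": {"Credit Approved", "Credit Denied"},
    "Credit Approved": set(),
    "Credit Denied": set(),
    "Expired": set(),
}


def transition(current, target):
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

Rejecting illegal moves at one choke point keeps a buggy poller or a retried job from, say, flipping an expired breach back to "Filed".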
Why This Matters
Most enterprises lose their SLA credits not because the credits don’t exist, but because the process of claiming them is manual, time-sensitive, and easy to deprioritise. The average cloud team has more urgent things to do than monitor case statuses in three different vendor support portals.
Automating the lifecycle, from detection through credit confirmation, is the core value proposition of Fintropy’s SLA module. The implementation is complex because it absorbs the complexity that companies would otherwise have to manage by hand.
Fintropy is a multi-cloud FinOps platform in private beta. Learn more at nuvikatech.com