Most people assume SLA breach detection is the hard part. Detect an outage, file a claim, get a credit. Simple enough.

It isn’t. Detection is actually the easiest step. The complexity lives in everything that comes after: calculating the credit accurately, managing per-provider filing windows, handling breaches that span month boundaries, and tracking claims through provider review processes that each speak different status vocabularies.

Here’s how Fintropy handles the complete lifecycle — from the moment an SLA breach is detected to the moment a credit lands in a customer’s account.


The Four Steps

1. Detect: Uptime below SLA
2. Quantify: Credit calculation
3. File: Auto-submit claim
4. Poll: Track approval/denial

Step 1: Detect

Fintropy monitors uptime snapshots for each cloud subscription. When uptime for a service drops below its SLA threshold — 99.9% for most AWS services, 99.95% for certain Azure tiers — the system opens a breach record:

# A breach starts as "Pending" when we detect the dip
breach = SLABreach(
    tenant_id=tenant_id,
    subscription_id=subscription_id,
    service=service,
    status="Pending",
    start_time=now,
    provider=provider,
    evidence={"tier": sla_tier_id},
)

The breach stays open as long as the service is below threshold. When it recovers, we close it.
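
The open-and-close behaviour can be sketched as a single pass over time-ordered snapshots. This is a minimal illustration, not our production code: `UptimeSnapshot`, `breach_windows`, and the field names are hypothetical, and the threshold is passed in rather than looked up per provider.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass
class UptimeSnapshot:
    # Hypothetical shape: one uptime reading for one service at one instant
    taken_at: datetime
    uptime_pct: float


def breach_windows(
    snapshots: Iterable[UptimeSnapshot], threshold_pct: float
) -> List[Tuple[datetime, Optional[datetime]]]:
    """Return (start, end) pairs; end is None while a breach is still open."""
    windows: List[Tuple[datetime, Optional[datetime]]] = []
    open_start: Optional[datetime] = None
    for snap in sorted(snapshots, key=lambda s: s.taken_at):
        below = snap.uptime_pct < threshold_pct
        if below and open_start is None:
            open_start = snap.taken_at  # dip detected: open a breach
        elif not below and open_start is not None:
            windows.append((open_start, snap.taken_at))  # recovered: close it
            open_start = None
    if open_start is not None:
        windows.append((open_start, None))  # still below threshold
    return windows
```

A window whose end is `None` corresponds to a breach record that has been opened but not yet resolved.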


Step 2: Quantify — The Hard Part

This is where the real complexity lives.

Month-boundary splits

SLA credits are calculated as a percentage of the monthly spend for the affected service. If a breach starts on January 28th and ends on February 3rd, it crosses a month boundary. The credit calculation uses January’s spend for January’s downtime, and February’s spend for February’s downtime.

We split cross-month breaches into two records:

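from datetime import timedelta

def resolve_breach(self, breach, end_time):
    start = breach.start_time
    # Compare (year, month) so the same month in a different year still splits
    if (start.year, start.month) != (end_time.year, end_time.month):
        # Split at the month boundary
        next_month_start = (start.replace(day=1) + timedelta(days=32)).replace(
            day=1, hour=0, minute=0, second=0, microsecond=0
        )  # e.g. Feb 1, 00:00
        # The original record ends at the last instant of its own month (Jan 31)
        breach.end_time = next_month_start - timedelta(microseconds=1)
        spillover = SLABreach(
            start_time=next_month_start,
            end_time=end_time,
            description=breach.description + " (continued from previous month)",
        )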
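
The same idea extends to a breach that crosses more than one boundary. The sketch below is a standalone illustration, not the method above: `split_by_month` is a hypothetical helper, and it uses half-open segments (each segment ends at the first instant of the next month) rather than an end-of-month timestamp.

```python
from datetime import datetime
from typing import List, Tuple


def split_by_month(start: datetime, end: datetime) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into segments that each sit inside one calendar month."""
    segments: List[Tuple[datetime, datetime]] = []
    cursor = start
    while cursor < end:
        # First instant of the month after `cursor`, rolling the year in December
        if cursor.month == 12:
            next_month = datetime(cursor.year + 1, 1, 1, tzinfo=cursor.tzinfo)
        else:
            next_month = datetime(cursor.year, cursor.month + 1, 1, tzinfo=cursor.tzinfo)
        segment_end = min(next_month, end)
        segments.append((cursor, segment_end))
        cursor = segment_end
    return segments
```

For the January 28th to February 3rd example this yields two segments, one per month, so each can be priced against its own month's spend.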

Credit calculation

Each provider publishes credit tiers. AWS EC2, for example:

Uptime             Credit
99.0% – 99.99%     10%
95.0% – 99.0%      30%
< 95.0%            100%

We maintain these tiers in a registry and calculate against them:

Monthly EC2 spend: $50,000
Uptime this month: 99.1% (falls in the 99.0% – 99.99% tier)
Credit tier: 10%
Credit amount: $5,000

The credit is calculated against actual spend, sourced from billing data (FOCUS 1.2 format internally) or spend snapshots as a fallback.
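
The worked example can be reproduced with a small tier lookup. This is a sketch under stated assumptions: `EC2_TIERS` and `credit_amount` are hypothetical names, the tier values are copied from the EC2 table above, and treating each lower bound as inclusive and each upper bound as exclusive is our reading of where the bands meet.

```python
from decimal import Decimal
from typing import List, Tuple

# (lower bound inclusive, upper bound exclusive, credit fraction)
# Values copied from the EC2 table above; boundary handling is an assumption.
EC2_TIERS: List[Tuple[Decimal, Decimal, Decimal]] = [
    (Decimal("99.0"), Decimal("99.99"), Decimal("0.10")),
    (Decimal("95.0"), Decimal("99.0"), Decimal("0.30")),
    (Decimal("0"), Decimal("95.0"), Decimal("1.00")),
]


def credit_amount(uptime_pct: Decimal, monthly_spend: Decimal, tiers=EC2_TIERS) -> Decimal:
    """Return the credit owed, or 0.00 when uptime met the SLA."""
    for lower, upper, fraction in tiers:
        if lower <= uptime_pct < upper:
            return (monthly_spend * fraction).quantize(Decimal("0.01"))
    return Decimal("0.00")


print(credit_amount(Decimal("99.1"), Decimal("50000")))  # 5000.00
```

`Decimal` is used instead of `float` so that currency amounts round predictably.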


Step 3: File

Once a breach is quantified and resolved, we auto-file the claim — if the tenant has opted into automated filing.

Each provider has a different API and a different filing deadline:

Provider   API                                   Filing Window
AWS        Support API CreateCase                60 days
Azure      Support REST API /supportTickets      60 days
GCP        Cloud Support API v2 cases.create     30 days

GCP’s 30-day window is the most dangerous. Miss it and the credit is forfeited — no appeals, no extensions. Our filing system checks the window before every attempt and fires alerts as the deadline approaches.
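
A window check of this kind might look like the sketch below. The day counts come from the table above; everything else is an assumption made for illustration: the function names, the 7-day warning threshold, and counting the window from the end of the breach (each provider defines the starting point in its own SLA terms).

```python
from datetime import datetime, timedelta, timezone

# Filing windows in days, taken from the table above
FILING_WINDOW_DAYS = {"AWS": 60, "Azure": 60, "GCP": 30}


def filing_deadline(provider: str, breach_end: datetime) -> datetime:
    """Deadline counted from the end of the breach (an assumption, see lead-in)."""
    return breach_end + timedelta(days=FILING_WINDOW_DAYS[provider])


def window_state(provider: str, breach_end: datetime, now: datetime, warn_days: int = 7) -> str:
    """Classify a claim as 'open', 'closing_soon', or 'expired'."""
    remaining = filing_deadline(provider, breach_end) - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= timedelta(days=warn_days):
        return "closing_soon"
    return "open"


breach_end = datetime(2025, 2, 3, tzinfo=timezone.utc)
print(window_state("GCP", breach_end, datetime(2025, 3, 1, tzinfo=timezone.utc)))  # closing_soon
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the check easy to test.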

On successful filing:

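from datetime import datetime, timezone

breach.status = "Filed"
breach.claim_reference = case_id
breach.evidence["claim"] = {
    "case_id": case_id,
    # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
    "submitted_at": datetime.now(timezone.utc).isoformat(),
    "filing_deadline": deadline,
    "credit_status": "pending",
}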

On failure (wrong support plan tier, missing credentials, API error):

breach.status = "Assisted Filing Required"
# Notify user with pre-filled form data and instructions

The failure path never breaks the resolve transaction. A filing failure is logged and surfaced as an alert, but the breach record is intact.
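
One way to get that isolation is to catch filing errors at the boundary and downgrade the status instead of letting the exception propagate. The sketch below illustrates the pattern only: `Breach` is a simplified, hypothetical stand-in for the real record, and `submit_claim` stands in for whichever provider call is used.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List, Optional

logger = logging.getLogger("sla.filing")


@dataclass
class Breach:
    # Hypothetical, simplified stand-in for the SLABreach record
    status: str = "Resolved"
    claim_reference: Optional[str] = None
    alerts: List[str] = field(default_factory=list)


def try_auto_file(breach: Breach, submit_claim: Callable[[Breach], str]) -> bool:
    """Attempt to file; on any error, fall back to assisted filing instead of raising."""
    try:
        case_id = submit_claim(breach)  # hypothetical provider call
    except Exception as exc:  # broad on purpose: filing must never abort the resolve
        logger.warning("auto-filing failed: %s", exc)
        breach.status = "Assisted Filing Required"
        breach.alerts.append(f"Auto-filing failed: {exc}")
        return False
    breach.status = "Filed"
    breach.claim_reference = case_id
    return True
```

Because the function returns a flag rather than raising, the caller can commit the resolved breach in either case.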


Step 4: Poll

After filing, we need to know when the credit is approved or denied. Providers don’t send webhooks for this — you have to poll.

A Cloud Scheduler job fires every day at 2am and polls all breaches in “Filed” status:

# Each provider speaks a different status language
# We normalise to: approved / denied / pending / unknown

AWS:   "resolved"           →  approved
       "work-in-progress"   →  pending

Azure: "closed"             →  approved
       "open"               →  pending

GCP:   "SOLUTION_PROVIDED"  →  approved
       "CLOSED"             →  denied

When a status changes, the breach record is updated and an alert fires:

breach.status = "Credit Approved"
# Alert: "Your SLA credit of $5,000 for EC2 has been approved. Case ID: CASE-12345."
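
The normalisation step in this section can be sketched as a pair of lookup tables. The provider strings are the ones listed earlier in this step; the function names, the exact-match comparison, and the fallback to "unknown" for unlisted strings are assumptions made for illustration.

```python
# Provider status strings from the mapping in this section;
# anything not listed normalises to "unknown".
STATUS_MAP = {
    "AWS": {"resolved": "approved", "work-in-progress": "pending"},
    "Azure": {"closed": "approved", "open": "pending"},
    "GCP": {"SOLUTION_PROVIDED": "approved", "CLOSED": "denied"},
}

# Normalised outcomes that change the breach record
BREACH_STATUS = {"approved": "Credit Approved", "denied": "Credit Denied"}


def normalise_status(provider: str, raw_status: str) -> str:
    """Map a provider-specific case status onto approved/denied/pending/unknown."""
    return STATUS_MAP.get(provider, {}).get(raw_status, "unknown")


def next_breach_status(provider: str, raw_status: str, current: str = "Filed") -> str:
    """Return the breach status after one poll; unchanged while pending or unknown."""
    return BREACH_STATUS.get(normalise_status(provider, raw_status), current)
```

Leaving the status unchanged on "pending" or "unknown" means an unrecognised provider string never moves a claim to a final state by accident.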

The Lifecycle at a Glance

Pending  →  Active  →  Resolved  →  Filed  →  Credit Approved
                                   ↘  Assisted Filing Required
                                   ↘  Expired (missed filing window)
                                                          ↘  Credit Denied

The full state machine handles every real-world edge case: cross-month breaches, provider API failures, support plan tier limitations, missed deadlines, and claim denials that need manual follow-up.
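
A state machine like this can be enforced with a transition table. The sketch below is our reading of the diagram above, not a dump of the real implementation: we assume the "Assisted Filing Required" and "Expired" branches leave from "Resolved", that "Credit Denied" leaves from "Filed", and that the four end states have no outgoing transitions. `TRANSITIONS` and `advance` are hypothetical names.

```python
from typing import Dict, Set

# Transitions read off the diagram above (branch sources are an assumption);
# terminal states have no outgoing edges in this sketch.
TRANSITIONS: Dict[str, Set[str]] = {
    "Pending": {"Active"},
    "Active": {"Resolved"},
    "Resolved": {"Filed", "Assisted Filing Required", "Expired"},
    "Filed": {"Credit Approved", "Credit Denied"},
    "Assisted Filing Required": set(),
    "Expired": set(),
    "Credit Approved": set(),
    "Credit Denied": set(),
}


def advance(current: str, target: str) -> str:
    """Return the new status, or raise if the diagram does not allow the move."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current!r} -> {target!r}")
    return target
```

Rejecting unlisted moves up front makes it harder for a retry or a late poll result to push a record backwards.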


Why This Matters

Most enterprises lose their SLA credits not because the credits don’t exist, but because the process of claiming them is manual, time-sensitive, and easy to deprioritise. The average cloud team has more urgent things to do than monitor case statuses in three different vendor support portals.

Automating the lifecycle, from detection through credit confirmation, is the core value proposition of Fintropy’s SLA module. The implementation is complex because it absorbs the complexity that companies would otherwise have to manage by hand.


Fintropy is a multi-cloud FinOps platform in private beta. Learn more at nuvikatech.com