# How We Built 433 Cloud Scan Rules Across 6 Provider Categories

The rule engine architecture behind Fintropy's cost optimisation scans — deterministic Tier 1 rules, AI-assisted Tier 2 rules, and a registry that auto-discovers both.

April 28, 2026 · 4 min · Amit Jethva

Fintropy’s core value is finding cloud waste — resources that are over-provisioned, idle, orphaned, or structurally inefficient. To do that reliably at enterprise scale, we need a lot of rules. And those rules need to work consistently across six different provider categories.

Today we ship 433 scan rules:

| Provider | Rules |
|---|---|
| AWS | 193 |
| Azure | 149 |
| GCP | 59 |
| On-Premises (VMware/Hyper-V) | 26 |
| Kubernetes | 17 |
| Multi-cloud | 8 |

Here’s how they’re structured, how they’re discovered, and how we think about the two tiers of rule quality.

---

## The Rule Interface

Every rule — regardless of provider — implements the same base class:

```
class BaseRule:
    rule_id: str        # Unique identifier, e.g. "aws-ec2-idle-instance"
    name: str           # Human-readable name
    description: str    # What it finds
    severity: str       # "LOW", "MEDIUM", "HIGH", "CRITICAL"
    tier: int           # 1 = Deterministic, 2 = AI-assisted

    def evaluate(self, context: ScanContext) -> list[RuleResult]:
        """Run this rule against the scan context. Return findings."""
        raise NotImplementedError

```

A `RuleResult` contains:

- The affected resource ID

- What was found (specific values that triggered the finding)

- Estimated monthly savings

- Remediation steps

- Evidence for audit trails
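
The list above describes what a `RuleResult` carries but not how it is defined. A minimal sketch of one possible shape follows. Every field name and type here is an assumption made for illustration, not Fintropy's actual schema, and the resource ID is made up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RuleResult:
    """Hypothetical shape of a finding; field names are illustrative."""
    resource_id: str                    # the affected resource
    finding: dict[str, object]          # specific values that triggered the finding
    estimated_monthly_savings: float    # in the billing currency
    remediation_steps: tuple[str, ...]  # ordered, human-readable actions
    evidence: dict[str, object] = field(default_factory=dict)  # raw data kept for audit trails


example = RuleResult(
    resource_id="vol-0abc",  # made-up ID
    finding={"state": "available", "attachments": 0},
    estimated_monthly_savings=8.0,
    remediation_steps=("Snapshot the volume", "Delete the volume"),
)
```

Freezing the dataclass is a design choice in this sketch: a finding that cannot be reassigned after creation is easier to trust as audit evidence.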

The interface is simple. The implementations vary enormously.

---

## The Two Tiers

### Tier 1: Deterministic

These rules have objective, binary criteria. Either the resource matches or it doesn’t. There’s no judgment call.

Examples:

- **EC2 instances with < 5% average CPU over 14 days** — idle

- **EBS volumes not attached to any instance** — orphaned

- **Azure SQL databases with < 10 DTUs consumed** — overprovisioned

- **GCP disks not attached and older than 30 days** — waste

Tier 1 rules are used for audits and compliance reporting. When an enterprise customer asks “how many orphaned resources do we have?” they need a number they can report to their CFO. That number has to be accurate and reproducible. No statistical uncertainty.
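
To make "binary criteria" concrete, here is a sketch of what the orphaned-volume check could look like. It is simplified on purpose: `Volume`, `Finding`, and the rule ID are hypothetical stand-ins, and `evaluate` takes a plain list rather than the full scan context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """Stand-in for a fetched resource snapshot (hypothetical fields)."""
    volume_id: str
    attached_instance_ids: tuple[str, ...]
    monthly_cost: float


@dataclass(frozen=True)
class Finding:
    resource_id: str
    estimated_monthly_savings: float


class OrphanedVolumeRule:
    rule_id = "aws-ebs-unattached-volume"  # illustrative ID, not from the source
    tier = 1

    def evaluate(self, volumes: list[Volume]) -> list[Finding]:
        # Binary criterion: a volume with no attachments is orphaned.
        return [
            Finding(v.volume_id, v.monthly_cost)
            for v in volumes
            if not v.attached_instance_ids
        ]


volumes = [
    Volume("vol-1", ("i-1",), 10.0),
    Volume("vol-2", (), 4.0),
]
findings = OrphanedVolumeRule().evaluate(volumes)
```

Running the rule twice on the same input yields the same list, which is the property that makes the count reportable.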

### Tier 2: AI-Assisted (Pattern Discovery)

These rules look for patterns that require context to evaluate. The criteria aren’t binary — they require understanding workload characteristics.

Examples:

- **Instances with bursty CPU that could use a different family** — requires understanding workload shape

- **Reserved Instance coverage gaps** — requires understanding commitment patterns over time

- **Cross-region data transfer optimisation** — requires understanding application topology

Tier 2 rules use Gemini for pattern discovery on historical telemetry. They produce recommendations with confidence scores rather than binary findings. They’re appropriate for cost optimisation conversations, not compliance audits.
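
One way to keep the two tiers apart downstream is to filter on tier before anything reaches an audit report. The sketch below assumes a hypothetical `Finding` record with an optional `confidence` field. The Tier 2 rule IDs and scores are made up, and no Gemini call is shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    rule_id: str
    tier: int                           # 1 = deterministic, 2 = AI-assisted
    resource_id: str
    confidence: Optional[float] = None  # only set for Tier 2 recommendations


def audit_findings(findings: list[Finding]) -> list[Finding]:
    """Keep only deterministic findings, which are reproducible."""
    return [f for f in findings if f.tier == 1]


def recommendations(findings: list[Finding], min_confidence: float) -> list[Finding]:
    """Keep Tier 2 recommendations at or above a confidence threshold."""
    return [
        f for f in findings
        if f.tier == 2 and f.confidence is not None and f.confidence >= min_confidence
    ]


results = [
    Finding("aws-ec2-idle-instance", 1, "i-1"),
    Finding("aws-ec2-bursty-family", 2, "i-2", confidence=0.9),  # made-up rule ID
    Finding("aws-ri-coverage-gap", 2, "acct", confidence=0.4),   # made-up rule ID
]
```

With this input, `audit_findings(results)` keeps only the idle-instance finding, and `recommendations(results, 0.8)` keeps only the 0.9-confidence suggestion.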

---

## The Rule Registry

Registering 433 rules by hand would be tedious and error-prone, so we use an auto-discovery registry that scans the rules directory at startup:

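```
import importlib
import inspect
from pathlib import Path


class RuleRegistry:
    def discover_rules(self, rules_dir: Path) -> dict[str, BaseRule]:
        rules = {}
        for rule_file in rules_dir.rglob("*.py"):
            # Skip __init__.py and other underscore-prefixed modules
            if rule_file.name.startswith("_"):
                continue
            # module_name_from_path is a helper that maps a file path to a dotted module name
            module = importlib.import_module(module_name_from_path(rule_file))
            for name, obj in inspect.getmembers(module, inspect.isclass):
                # Instantiate every BaseRule subclass and key it by rule_id
                if issubclass(obj, BaseRule) and obj is not BaseRule:
                    rule = obj()
                    rules[rule.rule_id] = rule
        return rules
```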

Adding a new rule is one file. Drop it in the right provider directory, give it a unique `rule_id`, implement `evaluate()`. The registry finds it on next startup.

This design means we can iterate quickly: write a rule, test it against a sample context, deploy. No registration step, no central file to update.
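
The registry code calls `module_name_from_path`, which is not defined in this post. One plausible version is sketched below, assuming rule files live under a package root that is already importable. The `package_root` parameter and the example path are assumptions.

```python
from pathlib import Path


def module_name_from_path(rule_file: Path, package_root: Path = Path(".")) -> str:
    """Turn a file path under package_root into a dotted module name."""
    # Drop the ".py" suffix, then express the path relative to the package root.
    relative = rule_file.with_suffix("").relative_to(package_root)
    return ".".join(relative.parts)


name = module_name_from_path(Path("src/rules/aws/ec2_idle.py"), Path("src"))
```

Here `name` is `"rules.aws.ec2_idle"`. For `importlib.import_module` to resolve that name, the package root has to be on `sys.path`.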

---

## The Scan Context

Rules don’t call cloud APIs themselves. They receive a pre-built `ScanContext` that contains already-fetched data:

```
from dataclasses import dataclass


@dataclass
class ScanContext:
    subscription_id: str
    provider: str
    region: str
    resources: list[ResourceSnapshot]
    metrics: dict[str, list[MetricDatapoint]]
    billing: list[FocusBillingRecord]
    tags: dict[str, str]
```

This separation is critical for performance. A scan job fetches all the data once, then runs all applicable rules against it. Rules are pure functions — same context, same output, every time. They can be tested with synthetic context data without any cloud API access.
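
A sketch of what such a test could look like, using the idle-instance threshold from the Tier 1 examples. The `ScanContext` here is a trimmed stand-in: it assumes metrics are keyed by resource ID and hold plain CPU percentages, and it ignores the 14-day window for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ScanContext:
    """Trimmed stand-in: only the field this example needs."""
    metrics: dict[str, list[float]] = field(default_factory=dict)  # resource ID -> CPU % samples


class IdleInstanceRule:
    rule_id = "aws-ec2-idle-instance"
    cpu_threshold = 5.0  # percent, from the Tier 1 example in the text

    def evaluate(self, context: ScanContext) -> list[str]:
        # Returns IDs of instances whose average CPU is below the threshold.
        return sorted(
            resource_id
            for resource_id, samples in context.metrics.items()
            if samples and sum(samples) / len(samples) < self.cpu_threshold
        )


def test_idle_rule_is_deterministic() -> None:
    # Synthetic data only: no cloud credentials or API calls involved.
    context = ScanContext(metrics={"i-idle": [1.0, 2.0, 3.0], "i-busy": [40.0, 60.0]})
    rule = IdleInstanceRule()
    first = rule.evaluate(context)
    assert first == ["i-idle"]
    assert rule.evaluate(context) == first  # same context, same output


test_idle_rule_is_deterministic()
```

Because the rule reads nothing but the context, the test needs no mocks.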

---

## Versioning for Audit Trails

Rules that are used in compliance audits need version tracking. When a customer generates a report showing 47 orphaned resources in Q1, they need to be able to reproduce that number in Q4 to show remediation.

Each rule has a version, and each scan result stores the rule version used. The registry maintains the full history of rule definitions — no rule is ever deleted, only superseded.
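
The "superseded, never deleted" policy can be sketched as an append-only store keyed by rule ID. The class names, the integer version scheme, and the example definitions below are all assumptions, not the actual registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleDefinition:
    rule_id: str
    version: int
    description: str


class VersionedRegistry:
    """Append-only store: definitions are superseded, never deleted."""

    def __init__(self) -> None:
        self._history: dict[str, list[RuleDefinition]] = {}

    def register(self, definition: RuleDefinition) -> None:
        versions = self._history.setdefault(definition.rule_id, [])
        if versions and definition.version <= versions[-1].version:
            raise ValueError("new version must be greater than the latest one")
        versions.append(definition)

    def latest(self, rule_id: str) -> RuleDefinition:
        return self._history[rule_id][-1]

    def get(self, rule_id: str, version: int) -> RuleDefinition:
        # Look up the exact definition that a past scan result recorded.
        for definition in self._history[rule_id]:
            if definition.version == version:
                return definition
        raise KeyError((rule_id, version))


registry = VersionedRegistry()
# Illustrative rule ID and descriptions.
registry.register(RuleDefinition("aws-ebs-unattached-volume", 1, "Unattached volumes"))
registry.register(RuleDefinition("aws-ebs-unattached-volume", 2, "Unattached for 7+ days"))
```

A scan result that stored `("aws-ebs-unattached-volume", 1)` can still be re-evaluated against version 1 after version 2 ships, which is what lets a Q1 number be reproduced in Q4.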

---

## What 433 Rules Taught Us

The two biggest lessons:

**1. Rules need to be lightweight.** A rule that makes its own API calls during evaluation becomes a bottleneck. The context-first design rules this out, because a rule only ever sees data that was fetched before evaluation started.

**2. False positives are worse than missed findings.** A missed idle instance means a customer pays $50 they didn’t need to. A false positive means a customer’s ops team investigates a phantom issue and loses trust in the platform. We tune aggressively for precision over recall.

---

_Fintropy is a multi-cloud FinOps platform in private beta. [Learn more at nuvikatech.com](https://www.nuvikatech.com)_
