Fintropy’s core value is finding cloud waste — resources that are over-provisioned, idle, orphaned, or structurally inefficient. To do that reliably at enterprise scale, we need a lot of rules. And those rules need to work consistently across six different provider categories.

Today we ship 433 scan rules:

Provider                        Rules
AWS                             193
Azure                           149
GCP                             59
On-Premises (VMware/Hyper-V)    26
Kubernetes                      17
Multi-cloud                     8

Here’s how they’re structured, how they’re discovered, and how we think about the two tiers of rule quality.


The Rule Interface

Every rule — regardless of provider — implements the same base class:

class BaseRule:
    rule_id: str        # Unique identifier, e.g. "aws-ec2-idle-instance"
    name: str           # Human-readable name
    description: str    # What it finds
    severity: str       # "LOW", "MEDIUM", "HIGH", "CRITICAL"
    tier: int           # 1 = Deterministic, 2 = AI-assisted

    def evaluate(self, context: ScanContext) -> list[RuleResult]:
        """Run this rule against the scan context. Return findings."""
        raise NotImplementedError

A RuleResult contains:

  • The affected resource ID
  • What was found (specific values that triggered the finding)
  • Estimated monthly savings
  • Remediation steps
  • Evidence for audit trails

The interface is simple. The implementations vary enormously.
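As an illustration, a concrete rule might look like the sketch below. The stand-in types and the OrphanedEbsVolumeRule class (including its flat savings estimate) are assumptions for the example, not Fintropy's actual implementation:

```python
from dataclasses import dataclass

# Minimal stand-ins for the types described above (names are assumptions).
@dataclass
class ResourceSnapshot:
    resource_id: str
    resource_type: str
    attached: bool = True

@dataclass
class ScanContext:
    provider: str
    resources: list

@dataclass
class RuleResult:
    resource_id: str
    finding: str
    estimated_monthly_savings: float
    remediation: str

class BaseRule:
    rule_id = ""
    severity = "LOW"
    tier = 1

    def evaluate(self, context):
        raise NotImplementedError

class OrphanedEbsVolumeRule(BaseRule):
    rule_id = "aws-ebs-orphaned-volume"
    severity = "MEDIUM"
    tier = 1  # deterministic: attachment state is binary

    def evaluate(self, context):
        return [
            RuleResult(
                resource_id=r.resource_id,
                finding="Volume is not attached to any instance",
                estimated_monthly_savings=8.0,  # illustrative flat estimate
                remediation="Snapshot the volume, then delete it",
            )
            for r in context.resources
            if r.resource_type == "ebs-volume" and not r.attached
        ]
```

The rule never touches a cloud API; it only inspects the context it is handed.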


The Two Tiers

Tier 1: Deterministic

These rules have objective, binary criteria. Either the resource matches or it doesn’t. There’s no judgment call.

Examples:

  • EC2 instances with < 5% average CPU over 14 days — idle
  • EBS volumes not attached to any instance — orphaned
  • Azure SQL databases with < 10 DTUs consumed — overprovisioned
  • GCP disks not attached and older than 30 days — waste

Tier 1 rules are used for audits and compliance reporting. When an enterprise customer asks “how many orphaned resources do we have?” they need a number they can report to their CFO. That number has to be accurate and reproducible. No statistical uncertainty.

Tier 2: AI-Assisted (Pattern Discovery)

These rules look for patterns that require context to evaluate. The criteria aren’t binary — they require understanding workload characteristics.

Examples:

  • Instances with bursty CPU that could use a different family — requires understanding workload shape
  • Reserved Instance coverage gaps — requires understanding commitment patterns over time
  • Cross-region data transfer optimisation — requires understanding application topology

Tier 2 rules use Gemini for pattern discovery on historical telemetry. They produce recommendations with confidence scores rather than binary findings. They’re appropriate for cost optimisation conversations, not compliance audits.
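The shape of a Tier 2 rule can be sketched as follows. The classifier here is a simple peak-to-mean heuristic standing in for the model-backed pattern discovery; the class name, thresholds, and Recommendation type are all assumptions for the example:

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    resource_id: str
    recommendation: str
    confidence: float  # 0.0-1.0, surfaced to the user instead of a binary finding

class BurstyWorkloadRule:
    rule_id = "aws-ec2-bursty-family"  # hypothetical rule_id
    tier = 2

    def classify_workload(self, cpu_series):
        # Stand-in for the model-backed classifier: a high peak-to-mean
        # ratio suggests a bursty workload shape.
        mean = sum(cpu_series) / len(cpu_series)
        peak = max(cpu_series)
        ratio = peak / mean if mean else 0.0
        return min(1.0, (ratio - 2.0) / 8.0) if ratio > 2.0 else 0.0

    def evaluate(self, resource_id, cpu_series):
        confidence = self.classify_workload(cpu_series)
        if confidence < 0.5:  # illustrative cutoff
            return []
        return [Recommendation(
            resource_id,
            "Consider a burstable (T-family) instance",
            confidence,
        )]
```

The key design point is the return type: a scored recommendation, not a yes/no finding.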

Tier 1: Deterministic           Tier 2: AI-Assisted
Binary criteria                 Pattern recognition
Reproducible results            Confidence scored
Audit & compliance use          Optimisation use
No confidence score needed      Context-dependent
193 + 149 + 59 + 17 rules       26 + 8 pattern rules

The Rule Registry

433 rules can’t be manually registered. We use an auto-discovery registry that scans the rules directory at startup:

import importlib
import inspect
from pathlib import Path

class RuleRegistry:
    def discover_rules(self, rules_dir: Path) -> dict[str, BaseRule]:
        rules = {}
        for rule_file in rules_dir.rglob("*.py"):
            if rule_file.name.startswith("_"):
                continue  # skip __init__.py and private modules
            # module_name_from_path converts a file path to a dotted module name
            module = importlib.import_module(module_name_from_path(rule_file))
            for name, obj in inspect.getmembers(module, inspect.isclass):
                if issubclass(obj, BaseRule) and obj is not BaseRule:
                    rule = obj()
                    rules[rule.rule_id] = rule
        return rules

Adding a new rule is one file. Drop it in the right provider directory, give it a unique rule_id, implement evaluate(). The registry finds it on next startup.

This design means we can iterate quickly: write a rule, test it against a sample context, deploy. No registration step, no central file to update.
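To show the registry pattern end to end without a filesystem, here is a simplified in-process variant that collects BaseRule subclasses already imported rather than walking a directory, plus a hypothetical scan loop that filters rules by provider prefix (both are assumptions for the example, not the production code):

```python
class BaseRule:
    rule_id = ""

    def evaluate(self, context):
        raise NotImplementedError

class IdleInstanceRule(BaseRule):
    rule_id = "aws-ec2-idle-instance"

    def evaluate(self, context):
        return []

class OrphanedDiskRule(BaseRule):
    rule_id = "gcp-disk-orphaned"

    def evaluate(self, context):
        return []

def discover_in_process():
    # In-process stand-in for RuleRegistry.discover_rules: every imported
    # BaseRule subclass registers itself by rule_id.
    return {cls().rule_id: cls() for cls in BaseRule.__subclasses__()}

def run_scan(rules, context, provider):
    # Hypothetical dispatch: select applicable rules by rule_id prefix.
    findings = []
    for rule_id, rule in rules.items():
        if rule_id.startswith(provider + "-"):
            findings.extend(rule.evaluate(context))
    return findings
```

The production registry does the same thing via directory walking, which is what makes "drop in a file" sufficient.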


The Scan Context

Rules don’t call cloud APIs themselves. They receive a pre-built ScanContext that contains already-fetched data:

@dataclass
class ScanContext:
    subscription_id: str
    provider: str
    region: str
    resources: list[ResourceSnapshot]
    metrics: dict[str, list[MetricDatapoint]]
    billing: list[FocusBillingRecord]
    tags: dict[str, str]

This separation is critical for performance. A scan job fetches all the data once, then runs all applicable rules against it. Rules are pure functions — same context, same output, every time. They can be tested with synthetic context data without any cloud API access.
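The purity property can be demonstrated with a synthetic context. The helper names below (average_cpu, is_idle) and the 5% threshold are illustrative, not Fintropy's real helpers:

```python
from dataclasses import dataclass, field

@dataclass
class MetricDatapoint:
    timestamp: int
    value: float

@dataclass
class ScanContext:
    subscription_id: str
    provider: str
    region: str
    resources: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)
    billing: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)

def average_cpu(context, resource_id):
    points = context.metrics.get(resource_id, [])
    return sum(p.value for p in points) / len(points) if points else 0.0

def is_idle(context, resource_id, threshold=5.0):
    # Pure function of the context: no API calls, so the same synthetic
    # context always produces the same answer.
    return average_cpu(context, resource_id) < threshold
```

Because nothing reaches outside the context, rule tests need no cloud credentials or mocks of provider SDKs.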


Versioning for Audit Trails

Rules that are used in compliance audits need version tracking. When a customer generates a report showing 47 orphaned resources in Q1, they need to be able to reproduce that number in Q4 to show remediation.

Each rule has a version, and each scan result stores the rule version used. The registry maintains the full history of rule definitions — no rule is ever deleted, only superseded.
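An append-only version store can be sketched like this. The types and field names are assumptions for illustration; the point is that a stored finding pins the exact rule definition that produced it:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RuleVersion:
    rule_id: str
    version: int
    threshold: float  # e.g. the idle-CPU cutoff used by this version

class VersionedRegistry:
    def __init__(self):
        self._history = {}  # (rule_id, version) -> RuleVersion

    def register(self, rv):
        # Append-only: superseded versions are never deleted, so any
        # historical scan result can be re-evaluated exactly.
        self._history[(rv.rule_id, rv.version)] = rv

    def get(self, rule_id, version):
        return self._history[(rule_id, version)]

@dataclass
class StoredFinding:
    resource_id: str
    rule_id: str
    rule_version: int  # pinned so the Q1 number is reproducible in Q4
```

Re-running the Q1 report is then a lookup of the old definition, not a guess about what the rule used to mean.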


What 433 Rules Taught Us

The two biggest lessons:

1. Rules need to be lightweight. A rule that makes its own API calls during evaluation becomes a bottleneck. The context-first design rules this out: a rule can only see data the scan job has already fetched.

2. False positives are worse than missed findings. A missed idle instance means a customer pays $50 they didn’t need to. A false positive means a customer’s ops team investigates a phantom issue and loses trust in the platform. We tune aggressively for precision over recall.


Fintropy is a multi-cloud FinOps platform in private beta. Learn more at nuvikatech.com