By Amit & Animesh, Co-founders, Nuvika Technologies

Two people. 50+ years of combined IT experience. No funding. No team beyond us. 470+ cost optimization rules across five platforms.

That’s where Fintropy stands today. But six months ago, it was a blank screen and a shared conviction that cloud cost optimization was broken.

This is the unfiltered story of how we built an enterprise-grade, multi-cloud FinOps platform — the tools we used, the decisions we made, the things that worked, and the things that didn’t.


Why We Built Fintropy

Between us, Animesh and I have spent 50+ years working in IT — much of it helping companies navigate cloud infrastructure, make smart technology decisions, and run efficient operations. The pattern we saw over and over:

Companies were overpaying for cloud. Not by a little — by 25-30%. Idle VMs running 24/7. Premium-tier databases for dev/test workloads. Azure Firewalls deployed during a PoC and never deleted — at $900/month each, with zero traffic. Storage blobs untouched for years but sitting on the most expensive tier.

And nobody was catching it.

The engineering team was busy shipping features. The CFO saw a growing line item but couldn’t pinpoint why. The native tools — Azure Advisor, AWS Cost Explorer, GCP Recommender — showed dashboards but didn’t tell you what to actually change.

Worst of all, many companies relied on managed service partners who earned a margin on cloud spend. Their revenue went up when the customer’s bill went up. The structural incentive to optimize simply didn’t exist. This isn’t malice — it’s a business model. But it means nobody is watching the bill on the customer’s behalf.

Then there was the SLA problem. Every cloud provider guarantees uptime. When they miss it, customers are owed credits. But none of the providers notify you. None of them auto-issue refunds. You have to detect the breach, collect evidence, and file a case within a tight deadline: Azure gives you 2 months, GCP just 30 days, and AWS generally until the end of the second billing cycle after the incident. Almost nobody does this. Millions in credits go unclaimed every year.

We kept thinking: someone should build a tool that fixes all of this. Then we decided that someone was us.


The AI Development Journey

Phase 1: Writing Python by Hand

We started the way every developer starts — writing code line by line. Python, FastAPI, SQLAlchemy. Every function handcrafted.

It was slow. A single cloud cost rule — the detection logic, the remediation recommendation, the API integration — could take a full day. At that pace, building 470+ rules would have taken years.

But this phase was invaluable. We understood every line of the codebase. We made architectural decisions that would hold up later: the rules engine design, the multi-tenant isolation pattern using PostgreSQL Row-Level Security, the async worker architecture. These foundational decisions were human decisions. No AI involved. They came from decades of building and operating enterprise systems.

Phase 2: Copy-Paste AI (ChatGPT + Gemini)

When ChatGPT 4.1 and Gemini became available, we started using them the way most developers do — as a better Stack Overflow. Type a function description into the chat window, get generated code back, copy it into VS Code, fix what was wrong.

It was maybe 2x faster than writing from scratch. But the friction was real. Context switching between a browser tab and the editor. Re-explaining the codebase every time. Generated code that worked in isolation but didn’t fit our architecture.

We learned an important lesson in this phase: AI-generated code without architectural context is just boilerplate. It saves typing, not thinking.

Phase 3: Inline AI (GitHub Copilot)

The next step was bringing AI into the editor. GitHub Copilot in VS Code changed the speed of writing boilerplate — tab-completion for code, with enough context to be useful.

We tested every model available: ChatGPT, Gemini, Grok, DeepSeek. Each had strengths. Grok was surprisingly good at infrastructure code. Gemini was fast for simple utilities. DeepSeek was impressive for the price.

But the fundamental limitation remained. Copilot works at the line or function level. It doesn’t understand your architecture across files. It doesn’t know that this API endpoint connects to that database model through this service layer with those specific multi-tenant isolation rules. For simple code, it was excellent. For complex, cross-cutting changes, we were still doing the heavy lifting ourselves.

Phase 4: AI as Lead Developer (Claude Code)

Then we discovered Claude Code. And everything changed.

This wasn’t autocomplete. This wasn’t copy-paste from a chat window. This was handing Claude an entire module specification — “build the SLA monitoring engine for Azure, here’s our existing patterns for AWS, here’s the database schema, here’s how we handle multi-tenant isolation” — and getting back production-ready code that we’d review and ship.

Claude Code understood context across files. It knew our FastAPI patterns, our SQLAlchemy models, our Pydantic v2 schemas, our async task queue architecture. It caught edge cases we hadn’t thought of. It suggested improvements to error handling that made the code more resilient.

This is where Fintropy’s rule engine exploded from dozens of rules to 470+. We could describe a cloud cost pattern — “detect Azure Firewall instances in non-production environments that cost $900/month even with zero traffic” — and Claude Code would generate the detection logic, the API calls using the correct azure-mgmt-network SDK, the recommendation text, the severity rating, and the savings calculation. What used to take a day took an hour.
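To make the shape of such a rule concrete, here is a minimal sketch in plain Python. It is not Fintropy's actual rule code. The FirewallSnapshot fields, the "environment" tag convention, the rule id, and the flat $900 monthly figure (borrowed from the example above) are all assumptions for illustration. The SDK calls that would populate the snapshot are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed flat monthly cost, taken from the example in the text.
# Real pricing varies by SKU and region.
ASSUMED_MONTHLY_COST_USD = 900.0
# Assumed tag values that mark a non-production environment.
NON_PROD_ENVIRONMENTS = {"dev", "test", "qa", "poc", "sandbox"}


@dataclass(frozen=True)
class FirewallSnapshot:
    """Hypothetical, already-collected view of one firewall."""
    resource_id: str
    tags: dict
    bytes_processed_30d: int


@dataclass(frozen=True)
class Finding:
    rule_id: str
    resource_id: str
    severity: str
    recommendation: str
    monthly_savings_usd: float


def check_idle_nonprod_firewall(fw: FirewallSnapshot) -> Optional[Finding]:
    """Flag a firewall only if it is non-production AND saw zero traffic."""
    env = str(fw.tags.get("environment", "")).lower()
    if env not in NON_PROD_ENVIRONMENTS:
        return None
    if fw.bytes_processed_30d > 0:
        return None
    return Finding(
        rule_id="AZ-NET-IDLE-FIREWALL",  # hypothetical rule id
        resource_id=fw.resource_id,
        severity="high",
        recommendation="Delete or deallocate the firewall; no traffic in 30 days.",
        monthly_savings_usd=ASSUMED_MONTHLY_COST_USD,
    )
```

The same input always produces the same finding, which is the property the rest of the engine depends on.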

Claude Code became our third team member. The one who never sleeps, never forgets the codebase, and never argues about coding style.

Phase 5: AI Across Everything

Building a product isn’t just writing code. There’s research, strategy, positioning, design, content, and a hundred other things that a 2-person team has to handle.

Our AI toolkit expanded:

Perplexity (via the Comet browser) became our research engine. Cloud pricing documentation, SLA fine print across all providers, competitive analysis of existing FinOps tools, market sizing — all researched at a depth that would have taken weeks manually.

Claude (the chat interface, not Claude Code) became our business strategist. We used it for pricing models, positioning frameworks, go-to-market planning, messaging guides, and sales strategy. The business side of building a startup — which we needed to sharpen — became navigable because we could pressure-test every decision with a thinking partner that didn’t get tired.

Canva for design assets. Whisper.ai for transcription. Google Gemini integrated into our frontend for AI-powered features. Vertex AI and Azure AI for backend intelligence. And probably 20 other tools we tried, tested, and either adopted or discarded along the way.

The lesson: AI isn’t one tool. It’s an ecosystem. And the competitive advantage isn’t in knowing which tools exist — it’s in knowing how to use them together for a specific purpose.


The Infrastructure: Tri-Cloud by Design

Here’s a decision that surprises people: Fintropy runs its own infrastructure across all three major clouds. Not for redundancy — for credibility and quality.

GCP — Development (asia-south1)

Our development environment runs on GCP. Cloud Run handles our backend, frontend, and worker services. Cloud Run Jobs manage database migrations and cross-scan bootstrapping. Data lives in Cloud SQL (PostgreSQL 15) with Memorystore (Redis 7) for caching. Async work flows through Cloud Tasks and Cloud Scheduler for cron jobs.

The networking layer uses a Global HTTPS Load Balancer with managed SSL certificates, Serverless NEG, and automatic HTTP-to-HTTPS redirect. Everything runs inside a VPC with a Serverless VPC connector. Secrets live in Secret Manager, encryption keys in Cloud KMS, and container images in Artifact Registry. Authentication for CI/CD uses Workload Identity Federation — completely keyless.

Azure — QA and Web (South India)

Our QA environment and web layer run on Azure. Container Apps host our backend, worker, and frontend services, with Container Apps Jobs for batch operations. Container images are stored in Azure Container Registry.

The database layer uses PostgreSQL Flexible Server 15 with Azure Cache for Redis. The networking is hardened with Application Gateway WAF v2 running OWASP 3.2 rulesets, a full VNet with NAT Gateway, and Private DNS Zones. Secrets and encryption keys live in Key Vault, and logs flow to Log Analytics Workspace. Identity is managed through User-Assigned Managed Identities with RBAC, so no service principal secrets are stored anywhere.

AWS — Production (ap-south-1)

Our production environment runs on AWS. ECS handles the cluster with separate backend, frontend, and worker services. Container images are stored in ECR.

The data layer uses RDS (PostgreSQL) with ElastiCache (Redis). Networking runs through an Application Load Balancer with Route 53 for DNS and ACM for TLS certificates. Secrets live in Secrets Manager, encryption in KMS. The VPC is properly segmented with public and private subnets, NAT gateways, and security groups. CI/CD authenticates via IAM task roles with GitHub OIDC federation.

Why Three Clouds?

Because we help customers optimize across all three. Running our own infrastructure on each means we find bugs, SDK quirks, API inconsistencies, and cost gotchas before our customers do. When Fintropy tells you that your Azure Firewall is wasting money, that recommendation was developed and tested on real Azure infrastructure — not in a simulator.

When we scan a customer’s AWS environment using boto3, we know the SDK’s quirks because we run boto3 in our own production every day. When we query Azure Monitor metrics, we know the latency patterns because our own QA environment is monitored through Azure Monitor.

Tri-cloud by design — not by accident.


The Full Tech Stack

Backend

Our API and rules engine are built on FastAPI with Python 3.11, using Uvicorn for async serving and Pydantic v2 for data validation. Database operations run through SQLAlchemy 2.0 with asyncpg for native PostgreSQL async support. Migrations are managed by Alembic. Background tasks are processed through cloud-native queues (Cloud Tasks, Service Bus, SQS) with APScheduler for scheduled jobs.

Data analysis for billing normalization and anomaly detection uses pandas and NumPy.

The Scanning Engine

The heart of Fintropy is the scanning engine — 470+ deterministic cost rules across four cloud providers. The SDKs powering it:

AWS: boto3 and botocore — 193 rules covering compute, storage, databases, networking, AI/ML, reserved instances, savings plans, and EDP commitments.

Azure: The full azure-mgmt family — compute, network, monitor, cost-management, SQL, advisor, resource-health, support — plus azure-identity, azure-monitor-query, and azure-ai-ml. 149 rules covering everything from idle VMs to overprovisioned Synapse pools to abandoned HDInsight clusters.

GCP: google-cloud asset, billing, monitoring, compute, BigQuery, storage, container, run, KMS, resource-manager, service-health, and Vertex AI. 59 rules and growing.

OCI: Oracle Cloud SDK for OCI environments. Coverage expanding.

Plus 17 Kubernetes rules, 26 VMware on-prem rules, and 8 multi-cloud cross-cutting rules.

Every rule is deterministic — not ML-based guesses. Each rule is versioned, auditable, and reproducible. A rule that flagged a resource yesterday will flag it the same way today. This is critical for enterprise customers who need to explain findings to their auditors.

Frontend

React 19 with TypeScript 5.6, built with Vite 7 for fast development cycles. Styled with Tailwind CSS and PostCSS. Data visualization powered by Recharts. Forms handled by React Hook Form with Zod validation. Tested with Vitest, Playwright, and Testing Library.

Security and Authentication

Authentication uses PyJWT for token management, bcrypt and argon2 for password hashing, and the cryptography library for data encryption. Secret management uses HashiCorp Vault for on-prem deployments.

Every cloud environment authenticates via OIDC:

  • GCP: Workload Identity Federation — completely keyless
  • Azure: User-Assigned Managed Identities with RBAC
  • AWS: IAM task roles with GitHub OIDC federation

No long-lived API keys. No stored credentials. No key rotation overhead.

Observability

Prometheus for metrics collection with OpenTelemetry for distributed tracing — OTLP traces instrumented across FastAPI, SQLAlchemy, Redis, and httpx. Every request can be traced from ingress to database and back across services.

CI/CD Pipeline

GitHub Actions orchestrates everything. Code quality is enforced by ruff, black, and isort (Python) and gts/ESLint with tsc (TypeScript). Security scanning runs Trivy on container images and CodeQL for static analysis with SARIF reporting. Container images are built with Docker Buildx for multi-architecture support. Coverage reports go to Codecov.

Deployment follows branch-based promotion: merge to main deploys to GCP (dev), release/qa branch deploys to Azure (QA), and version tags (v*) deploy to AWS (production) — with manual approval required for production.
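The promotion rule above is simple enough to express as one function. This is a sketch of the mapping only, not our actual workflow files. The function name and the returned dictionary shape are hypothetical; the ref strings follow the standard Git format that GitHub Actions exposes as GITHUB_REF.

```python
import re
from typing import Optional

# Any tag starting with "v", e.g. refs/tags/v1.4.0
_VERSION_TAG = re.compile(r"^refs/tags/v.+$")


def target_environment(git_ref: str) -> Optional[dict]:
    """Map a Git ref to its deployment target, or None if nothing deploys."""
    if git_ref == "refs/heads/main":
        return {"cloud": "gcp", "env": "dev", "manual_approval": False}
    if git_ref == "refs/heads/release/qa":
        return {"cloud": "azure", "env": "qa", "manual_approval": False}
    if _VERSION_TAG.match(git_ref):
        # Production is the only target gated by a human approval step.
        return {"cloud": "aws", "env": "production", "manual_approval": True}
    return None  # feature branches and other refs do not deploy
```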

Notifications

Alerts and notifications flow through Slack (SDK), Microsoft Teams (webhooks), and Microsoft 365 Graph API (email). Customers choose their channel.

Infrastructure as Code

All infrastructure is managed via Terraform with per-cloud modules. Customer onboarding templates are available as Terraform, CloudFormation, and Bicep — meeting customers wherever their IaC practices already live.

Deployment Models

The SaaS version runs as Docker containers (Alpine Linux base) on cloud-native compute — Cloud Run, Container Apps, ECS. For privacy-sensitive customers — banks, government, regulated industries — we offer a private deployment option: an OVA/VHD appliance deployable inside the customer’s own subscription. Their data never leaves their environment.


The Architecture Decisions That Matter

Not every decision was technical. Some of the most important ones were about what NOT to build and how to think about tradeoffs.

PostgreSQL Row-Level Security for multi-tenancy. Instead of separate databases per customer (complex, expensive) or application-level isolation (fragile, error-prone), we use PostgreSQL’s native RLS. The database engine itself enforces that Customer A can never see Customer B’s data. It’s the same approach used by major SaaS platforms at scale, and it means our security isolation doesn’t depend on our application code being bug-free — it’s enforced at the engine level.
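For readers who have not used RLS, the pattern looks roughly like the sketch below. It only builds the SQL strings, so it runs without a database. The setting name app.tenant_id, the tenant_id column, the policy name, and the helper names are assumptions for illustration, not our actual schema. The $1 placeholder is asyncpg's native parameter style.

```python
TENANT_SETTING = "app.tenant_id"  # hypothetical custom setting name


def rls_policy_statements(table: str) -> list[str]:
    """DDL that makes PostgreSQL itself filter rows by tenant."""
    if not table.isidentifier():
        # Table names are interpolated, so accept only plain identifiers.
        raise ValueError(f"unsafe table name: {table!r}")
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # FORCE applies the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING (tenant_id = current_setting('{TENANT_SETTING}')::uuid)",
    ]


def tenant_scope_statement() -> str:
    """Run once at the start of each transaction, passing the tenant id as $1.

    The third argument (true) makes the setting local to the transaction,
    so it cannot leak to the next request on a pooled connection.
    """
    return f"SELECT set_config('{TENANT_SETTING}', $1, true)"
```

With this in place, a query that forgets its WHERE tenant_id = ... clause still returns only the current tenant's rows.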

FOCUS 1.2 as the canonical billing schema. Instead of building our own billing data model, we adopted the FinOps Foundation’s FOCUS open standard. Every cloud’s billing data gets normalized to one schema. This means our analysis, visualization, and anomaly detection code works identically across AWS, Azure, and GCP. One codebase, three clouds, one truth.
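A stripped-down sketch of what that normalization means in practice follows. The six target columns are real FOCUS column names, but this is only a small subset of the specification. The source field names are simplified approximations of each provider's billing export and vary by export type and version, so treat the whole mapping as an assumption rather than our production code. The GCP keys assume nested export fields have already been flattened to dotted names.

```python
from decimal import Decimal

FOCUS_COLUMNS = (
    "ProviderName", "ServiceName", "ResourceId",
    "BilledCost", "BillingCurrency", "ChargePeriodStart",
)

# provider -> {FOCUS column: source field}. Illustrative names only.
FIELD_MAPS = {
    "aws": {
        "ServiceName": "product/ProductName",
        "ResourceId": "lineItem/ResourceId",
        "BilledCost": "lineItem/UnblendedCost",
        "BillingCurrency": "lineItem/CurrencyCode",
        "ChargePeriodStart": "lineItem/UsageStartDate",
    },
    "azure": {
        "ServiceName": "MeterCategory",
        "ResourceId": "ResourceId",
        "BilledCost": "CostInBillingCurrency",
        "BillingCurrency": "BillingCurrency",
        "ChargePeriodStart": "Date",
    },
    "gcp": {
        "ServiceName": "service.description",
        "ResourceId": "resource.name",
        "BilledCost": "cost",
        "BillingCurrency": "currency",
        "ChargePeriodStart": "usage_start_time",
    },
}


def normalize(provider: str, row: dict) -> dict:
    """Project one provider-specific billing row onto the shared columns."""
    out = {"ProviderName": provider}
    for focus_col, source_field in FIELD_MAPS[provider].items():
        out[focus_col] = row.get(source_field)
    # Money goes through Decimal to avoid float rounding in later sums.
    out["BilledCost"] = Decimal(str(out["BilledCost"] or "0"))
    return out
```

Everything downstream of normalize() can then be written once against the shared column names.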

Deterministic rules, not ML. Our 470+ rules are hand-crafted detection logic, not machine learning models. This was deliberate. ML models are black boxes — when a model flags a resource, you can’t explain why in a way that satisfies an auditor. Our rules are auditable, versioned, and reproducible. A finding comes with an explanation a CFO can read and an engineer can verify. The logic is transparent.
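One way to make "reproducible" checkable is to fingerprint each finding from the rule id, the rule version, and the exact evidence it looked at. The sketch below shows the idea under our own assumptions: the function name and the evidence fields are hypothetical, and this is not a description of how Fintropy stores findings.

```python
import hashlib
import json


def finding_fingerprint(rule_id: str, rule_version: str, evidence: dict) -> str:
    """Stable SHA-256 over the rule identity and the inputs it evaluated."""
    payload = json.dumps(
        {"rule_id": rule_id, "rule_version": rule_version, "evidence": evidence},
        sort_keys=True,            # key order must not change the hash
        separators=(",", ":"),     # no whitespace variation
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Same rule version plus same evidence gives the same fingerprint on any machine, on any day. A changed fingerprint means either the inputs changed or the rule did, and the version field tells an auditor which.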

OIDC everywhere. Zero stored credentials in our CI/CD pipeline. Zero long-lived API keys anywhere. Every authentication is a short-lived token obtained via federated identity. GCP uses Workload Identity Federation. Azure uses Managed Identities. AWS uses OIDC IAM roles. This is the security standard we practice and the standard we recommend to our customers.

Lazy imports for cold start optimization. Cloud Run charges from the moment your container starts. Cloud SDKs are massive: boto3, azure-mgmt-*, and google-cloud-* together would add seconds to every cold start. We load SDK imports inside functions, not at module level. The container starts fast; SDKs load only when needed.
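The pattern itself is a few lines. Below is a minimal sketch: lazy_import is a hypothetical helper name, and scan_aws_unattached_volumes is an illustrative function rather than one of our rules. It does use real boto3 calls, and it needs boto3 installed only at the moment it is first called.

```python
import importlib
from functools import lru_cache
from types import ModuleType


@lru_cache(maxsize=None)
def lazy_import(module_name: str) -> ModuleType:
    """Import a module on first use and reuse it afterwards."""
    return importlib.import_module(module_name)


def scan_aws_unattached_volumes(region: str) -> list[str]:
    # boto3 is imported here, not at module level, so a container that
    # only serves Azure or GCP scans never pays the import cost.
    boto3 = lazy_import("boto3")
    ec2 = boto3.client("ec2", region_name=region)
    # First page only; a real scan would paginate.
    response = ec2.describe_volumes(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )
    return [volume["VolumeId"] for volume in response["Volumes"]]
```

A plain import statement inside the function body works just as well, since Python caches modules in sys.modules. The helper only makes the intent explicit.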

WAF in front of QA, not just production. Our Azure QA environment runs behind Application Gateway WAF v2 with OWASP 3.2 rulesets. Most teams only add WAF to production. We test behind WAF so that security rules never surprise us during a prod deployment.

Four cloud SDKs, one scanning interface. Our rules engine abstracts the cloud SDK differences behind a common scanning interface. A rule author writes detection logic once; the engine handles the SDK-specific API calls. This is how we scaled to 470+ rules without the codebase becoming unmanageable.
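In outline, the abstraction looks something like the sketch below. Every name here (Resource, CloudAdapter, StaticAdapter, idle_vm_rule, run_rule, the 5% CPU threshold) is hypothetical and chosen for illustration. A real adapter would wrap boto3, the azure-mgmt packages, and so on, where this one returns canned data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol


@dataclass(frozen=True)
class Resource:
    """Provider-neutral view of a resource that rules operate on."""
    provider: str
    resource_id: str
    kind: str               # normalized type, e.g. "vm"
    avg_cpu_percent: float


class CloudAdapter(Protocol):
    """What the engine needs from each cloud; SDK details stay behind it."""
    def list_resources(self, kind: str) -> Iterable[Resource]: ...


class StaticAdapter:
    """Stand-in adapter with canned data, used here instead of a real SDK."""
    def __init__(self, resources: Iterable[Resource]) -> None:
        self._resources = list(resources)

    def list_resources(self, kind: str) -> Iterable[Resource]:
        return [r for r in self._resources if r.kind == kind]


def idle_vm_rule(resource: Resource) -> bool:
    """Written once; knows nothing about which cloud the VM lives in."""
    return resource.avg_cpu_percent < 5.0  # assumed threshold


def run_rule(
    rule: Callable[[Resource], bool],
    kind: str,
    adapters: Iterable[CloudAdapter],
) -> list[str]:
    """Apply one rule to the matching resources from every adapter."""
    return [
        r.resource_id
        for adapter in adapters
        for r in adapter.list_resources(kind)
        if rule(r)
    ]
```

Adding a provider means writing one adapter, and adding a rule means writing one function. Neither has to know about the other.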


What AI Can’t Do

After months of building with AI, here’s the honest truth: AI multiplied us. Two people now ship what would typically require a team of 15-20.

But AI didn’t make the decisions that matter.

AI didn’t decide to build SLA credit recovery. That came from decades of watching companies leave money on the table because nobody told them they were owed it.

AI didn’t identify the managed partner conflict of interest as our sharpest competitive angle. That came from understanding the cloud ecosystem’s economics from the inside.

AI didn’t design the multi-tenant isolation architecture. That came from knowing what enterprise customers require before they’ll trust you with their billing data.

AI didn’t set the pricing model. That came from understanding what a CFO values versus what an engineer values — and knowing that the buyer is often not the user.

AI didn’t tell us to run three clouds. That came from knowing that credibility with customers requires eating your own cooking.

AI writes the code. 50+ years of combined experience decides what to build. And the gap between “technically possible” and “worth building” is where that experience pays off.


Where We Are Now

Fintropy is in closed beta. 470+ cost optimization rules across AWS, Azure, GCP, Kubernetes, and on-premise VMware. Automated SLA credit recovery — the only FinOps tool that does this. FOCUS 1.2 billing normalization. Real-time anomaly detection. One-click remediation with approval workflows. Available as SaaS or private deployment.

Built by two people with 50+ years of experience and a very capable AI.

We’re selecting companies for a free 2-week pilot. If you’re interested in being considered, reach out at nuvikatech.com or DM either of us on LinkedIn.

Our mission is to end digital waste. Every rupee saved goes back to the people who power the business.


Amit and Animesh are the co-founders of Nuvika Technologies and the creators of Fintropy. With 50+ years of combined IT experience — including customer support, customer success, engineering, and cloud operations — they built the tool they always wished existed.

Learn more: nuvikatech.com/Fintropy_Overview.html