AWS FinOps for Cloud Ops Teams — Week 3: Reserved Instances & Savings Plans

Last week was about visibility. Once you can see what you're spending, the next question is: Why are we paying full price for it? That's what this post is about — committing to long-term usage in exchange for a meaningful discount, without committing yourself into a corner.

The on-demand tax

Running everything on-demand is the easiest path. It's also the most expensive. AWS charges a real premium for the flexibility of on-demand pricing, and for most production workloads you don't need that flexibility — you know roughly how much compute you'll need a year from now, give or take 20%.

The two main ways to pay less for that predictable usage are Reserved Instances and Savings Plans.

Reserved Instances vs. Savings Plans, briefly

Reserved Instances (RIs)

Tied to a specific instance type and region (e.g., m5.xlarge in us-east-1); regional Linux RIs can flex across sizes within the same family.
Best for predictable, stable services where the instance type rarely changes — RDS, ElastiCache, OpenSearch.
Discounts up to ~72% over on-demand.

Savings Plans

Compute Savings Plans apply automatically across EC2, Fargate, and Lambda, regardless of region or instance family. Maximum flexibility, ~66% max discount.
EC2 Instance Savings Plans are family- and region-locked, similar to RIs in scope, but easier to manage.
Both are committed in $/hour rather than instance-hours, which makes them feel more like a budget than an inventory item.
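To make the $/hour framing concrete, here is a minimal sketch of how a commitment plays out in a single hour. It assumes one blended discount rate for all usage, which is a simplification (real rates vary by instance type, region, and term), and the function names and dollar figures are illustrative only.

```python
def on_demand_covered(commitment_per_hour: float, discount: float) -> float:
    """On-demand-equivalent spend ($/hour) that a commitment absorbs.

    `discount` is an assumed blended rate (0.30 means 30% off on-demand).
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return commitment_per_hour / (1 - discount)


def hourly_bill(on_demand_usage: float, commitment_per_hour: float,
                discount: float) -> float:
    """Cost of one hour: the commitment is paid in full, overflow is on-demand."""
    covered = on_demand_covered(commitment_per_hour, discount)
    overflow = max(0.0, on_demand_usage - covered)
    return commitment_per_hour + overflow


if __name__ == "__main__":
    # Hypothetical numbers: a $7/hour commitment at a 30% blended discount.
    print(round(on_demand_covered(7.0, 0.30), 2))  # on-demand spend absorbed
    print(round(hourly_bill(15.0, 7.0, 0.30), 2))  # busy hour: overflow billed on-demand
    print(round(hourly_bill(4.0, 7.0, 0.30), 2))   # quiet hour still costs 7.0
```

The quiet-hour case is the one to keep in mind: the commitment is charged whether or not usage shows up to consume it.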

For most modern workloads, Compute Savings Plans should be your default. RIs still make sense for managed databases and a handful of stable EC2 fleets.

Why this gets harder with multiple accounts

The mechanics are the same as a single account, but four operational issues show up at scale:

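Sharing. Within an AWS Organization, a commitment is applied to the account that bought it first, and only the leftover is shared with other accounts (discount sharing is on by default, but it can be switched off per account). Buy centrally in the management (payer) account so the discount floats across the org instead of being absorbed by one team first.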
Coverage drift. Steady-state usage moves around as teams launch and decommission services. Yesterday's perfect commitment is tomorrow's underused one.
Visibility. Utilization and coverage reports are great in Cost Explorer, but you need to look at them on a cadence. They don't email themselves.
Approval politics. A 1-year, no-upfront Compute Savings Plan worth $200K is a real financial commitment that finance will want to weigh in on. Build that conversation into your process.
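For the finance conversation it helps to translate between the hourly commitment you enter at purchase time and the total dollar figure finance will see. A small sketch of that arithmetic, assuming a 365-day year (the helper names are mine, not anything from AWS):

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours; ignores leap years


def total_commitment(hourly: float, term_years: int = 1) -> float:
    """Total dollars committed over the term for a given $/hour commitment."""
    return hourly * HOURS_PER_YEAR * term_years


def hourly_from_total(total: float, term_years: int = 1) -> float:
    """Hourly commitment implied by a total dollar figure over the term."""
    return total / (HOURS_PER_YEAR * term_years)


if __name__ == "__main__":
    # The $200K, 1-year example works out to roughly $22.83/hour.
    print(round(hourly_from_total(200_000), 2))
    # Going the other way: $10/hour for one year.
    print(total_commitment(10.0))
```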

An ops-team-friendly buying playbook

The framework I use looks like this:

Step 1 — Find your steady-state floor. Pull at least 60 days of compute usage and identify the bottom of the curve. That's your safe-to-commit baseline. Don't commit to peaks.
Step 2 — Cover ~70% of that baseline first. Going from 0% to 70% coverage is the easy money. Going from 70% to 100% is where regret lives.
Step 3 — Default to Compute Savings Plans, 1-year, no upfront. Lower discount than 3-year all-upfront, but dramatically lower risk. You can always layer on more aggressive plans later.
Step 4 — Reserve specific RIs only for stable RDS / ElastiCache. These workloads don't shift instance families casually, so RIs are a clean fit.
Step 5 — Re-evaluate every quarter, not every month. Monthly is too noisy; quarterly matches how usage actually evolves.
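Steps 1 and 2 can be sketched in a few lines. This assumes you have already exported hourly compute spend (in on-demand dollars) as a list of numbers. Using a low percentile instead of the absolute minimum is my own assumption, so that a brief outage or deploy gap doesn't drag the floor to zero; set `percentile=0` if you want the true minimum.

```python
import math


def steady_state_floor(hourly_spend: list[float], percentile: float = 5.0) -> float:
    """Nearest-rank percentile of hourly spend, used as the 'bottom of the curve'."""
    if not hourly_spend:
        raise ValueError("hourly_spend must not be empty")
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    ordered = sorted(hourly_spend)
    # Nearest-rank method: rank 1 is the smallest observation.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def initial_cover_target(hourly_spend: list[float], coverage: float = 0.70,
                         percentile: float = 5.0) -> float:
    """On-demand $/hour to bring under commitment first (Step 2)."""
    return coverage * steady_state_floor(hourly_spend, percentile)


if __name__ == "__main__":
    # Hypothetical data: 97 normal hours at $10 and 3 outage hours at $1.
    spend = [10.0] * 97 + [1.0] * 3
    print(steady_state_floor(spend))    # the outage dips are ignored
    print(initial_cover_target(spend))  # 70% of the floor
```

Note that the result is in on-demand dollars. The hourly commitment you actually purchase to cover it will be lower, by whatever discount the plan carries.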

Use AWS's recommendations, but verify

Cost Explorer's Savings Plan recommendations are the right starting point. They analyze the last 7, 30, or 60 days of usage and propose an hourly commitment ($/hour) for the plan type, term, and payment option you select.

Two caveats:

Recommendations are based on past usage. If you're about to migrate a chunky workload off EC2 onto Fargate, that's information the recommendation engine doesn't have.
Recommendations tend to favor a slightly higher commitment than I'd take. Trim them by 10–15% as a safety margin.
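Here is a rough sketch of pulling the recommendation and applying that trim. The trimming helper is plain arithmetic. The fetch function uses boto3's Cost Explorer client and the request and response field names as I understand them, so verify them against the current boto3 documentation before relying on this. It needs AWS credentials with Cost Explorer access, and the function names themselves are mine.

```python
def trimmed_commitment(recommended_hourly: float, margin: float = 0.15) -> float:
    """Apply a safety margin (0.10 to 0.15 per the rule above) to a recommendation."""
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    return round(recommended_hourly * (1 - margin), 2)


def fetch_recommended_hourly() -> float:
    """Ask Cost Explorer for a 1-year, no-upfront Compute Savings Plan recommendation.

    Returns 0.0 when Cost Explorer has nothing to recommend.
    """
    import boto3  # imported lazily so the helper above works without boto3

    ce = boto3.client("ce")
    resp = ce.get_savings_plans_purchase_recommendation(
        SavingsPlansType="COMPUTE_SP",
        TermInYears="ONE_YEAR",
        PaymentOption="NO_UPFRONT",
        LookbackPeriodInDays="SIXTY_DAYS",
    )
    rec = resp.get("SavingsPlansPurchaseRecommendation", {})
    summary = rec.get("SavingsPlansPurchaseRecommendationSummary")
    if not summary:
        return 0.0
    return float(summary["HourlyCommitmentToPurchase"])


if __name__ == "__main__":
    # Hypothetical recommendation of $20/hour, trimmed by 15%.
    print(trimmed_commitment(20.0))
```

In practice you would call `trimmed_commitment(fetch_recommended_hourly())` and then still sanity-check the number against any migrations you know are coming.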

Track utilization like an SLA

An unused commitment is worse than on-demand: you pay for the hour whether or not anything runs against it. Treat utilization as an SLA: target >90% utilization across your portfolio, and alert when it slips. The data is in Cost Explorer's Utilization & Coverage reports, and you can pull it via the API into whatever dashboarding stack you use (we'll wire this into Grafana in week 6).
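The two metrics are easy to mix up, so here is a small sketch of both plus a naive threshold check. Utilization asks how much of what you committed to was actually used. Coverage asks how much of your eligible spend ran under a commitment. The input shape (a dict of date to used and committed dollars) is a hypothetical format for illustration, not what the Cost Explorer API returns.

```python
def utilization_pct(used: float, committed: float) -> float:
    """Share of the commitment that was actually consumed, as a percentage."""
    if committed <= 0:
        raise ValueError("committed must be positive")
    return 100.0 * used / committed


def coverage_pct(covered: float, eligible: float) -> float:
    """Share of eligible spend that ran under a commitment, as a percentage."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return 100.0 * covered / eligible


def days_below_target(daily: dict[str, tuple[float, float]],
                      target: float = 90.0) -> list[str]:
    """Dates whose utilization fell below the target.

    `daily` maps an ISO date to (used_commitment, total_commitment) in dollars.
    """
    return sorted(
        day for day, (used, committed) in daily.items()
        if utilization_pct(used, committed) < target
    )


if __name__ == "__main__":
    # Hypothetical three days of data against a $100/day commitment.
    sample = {
        "2024-01-01": (95.0, 100.0),
        "2024-01-02": (80.0, 100.0),
        "2024-01-03": (90.0, 100.0),
    }
    print(days_below_target(sample))  # only the 80% day breaches a 90% target
```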

If utilization drops, the failure modes are usually:

A workload was decommissioned and the commitment didn't shrink with it.
A team migrated from EC2 to Fargate or Lambda mid-quarter.
A region change moved usage out of where the RIs were locked.

None of these are emergencies. They just need to be caught quickly enough that you can steer eligible usage back under the commitment, or shift the next purchase toward more flexible Compute Savings Plans, before the unused commitment burns a quarter of value.

What about Spot?

Spot is technically a cost lever, not a commitment, so it doesn't fit neatly into this post. Worth a sentence anyway: for batch, ML training, and stateless services that tolerate interruption, Spot runs alongside Savings Plans (Spot usage isn't itself covered by a plan) and can crush your effective compute rate. It's a great lever after you've covered your steady-state baseline with commitments, not before.
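To see what "effective compute rate" means here, a quick sketch that blends the three pricing models. Every number in it is hypothetical: Spot discounts in particular move with the market, and the 30% and 60% figures are placeholders, not quotes.

```python
def blended_cost(usage: dict[str, float], discount: dict[str, float]) -> float:
    """Actual cost given on-demand-equivalent spend per pricing model."""
    return sum(amount * (1 - discount[model]) for model, amount in usage.items())


def effective_discount(usage: dict[str, float], discount: dict[str, float]) -> float:
    """Overall discount versus running everything on-demand."""
    total = sum(usage.values())
    if total <= 0:
        raise ValueError("usage must sum to a positive amount")
    return 1 - blended_cost(usage, discount) / total


if __name__ == "__main__":
    # Hypothetical mix, expressed in on-demand dollars per hour.
    usage = {"savings_plan": 70.0, "spot": 20.0, "on_demand": 10.0}
    # Hypothetical discounts versus on-demand for each pricing model.
    discount = {"savings_plan": 0.30, "spot": 0.60, "on_demand": 0.0}
    print(round(blended_cost(usage, discount), 2))
    print(round(effective_discount(usage, discount), 2))
```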

Rule of thumb

Cover ~70% of your steady-state compute with 1-year Compute Savings Plans, layer RIs only where instance types are genuinely stable, and watch utilization weekly even though you only re-buy quarterly. Many teams leave 20–30% of cloud spend on the table simply by not committing to compute they're already using.

Next week we shift from "pay less for what you use" to "use less" — rightsizing and waste reduction.