Back to Blog
Data Integration
18 min read

Data Warehouse Schema for SaaS Metrics

Complete guide to data warehouse schema for saas metrics. Learn best practices, implementation strategies, and optimization techniques for SaaS businesses.

Published: September 24, 2025Updated: December 28, 2025By Ben Callahan
Data integration pipeline and infrastructure
BC

Ben Callahan

Financial Operations Lead

Ben specializes in financial operations and reporting for subscription businesses, with deep expertise in revenue recognition and compliance.

Financial Operations
Revenue Recognition
Compliance
11+ years in Finance

The difference between SaaS companies with trustworthy metrics and those constantly reconciling spreadsheets often comes down to data warehouse schema design. A well-designed schema for SaaS metrics enables accurate MRR calculation, cohort analysis, and churn prediction while remaining flexible enough to evolve with your business. Poorly designed schemas create technical debt that makes every analytical question harder than it should be. This guide covers schema patterns, dimensional modeling strategies, and implementation approaches specifically for SaaS subscription metrics.

Schema Design Principles for SaaS

SaaS metrics have unique characteristics that influence schema design: time-series nature of subscriptions, complex event sequences (trials, conversions, upgrades), and the need to track both point-in-time state and changes over time. These principles guide effective schema design.

Dimensional Modeling for Subscriptions

Dimensional modeling separates facts (measurements) from dimensions (descriptive context). SaaS facts: subscription events, payments, MRR values, usage counts. Dimensions: customers, plans, time periods, cohorts. This separation enables flexible analysis—slice MRR by any dimension without restructuring. Star schema connects fact tables to dimension tables through foreign keys. Most SaaS analytics questions can be answered by joining one fact table to relevant dimensions.

Point-in-Time vs Event-Based Models

Two approaches to tracking subscription state. Event-based: store every change (subscription_created, plan_changed, cancelled). Reconstruct state at any point by replaying events. Point-in-time snapshots: store daily/monthly snapshots of current state. Faster queries but larger storage. Most teams use hybrid: event tables for detailed analysis, snapshot tables for standard reports. Event sourcing provides auditability; snapshots provide query performance.

Slowly Changing Dimensions

Customer and plan attributes change over time. Type 1 SCD: overwrite old values—lose history. Type 2 SCD: add new row with effective dates—preserve history but complex joins. Type 3 SCD: add columns for previous values—limited history depth. For SaaS, Type 2 SCD works best for plan pricing (need historical accuracy), Type 1 for customer name (current value usually sufficient). Design based on analytical requirements for each attribute.

Grain Definition

Grain is the level of detail in fact tables—the most fundamental design decision. Customer-day grain: one row per customer per day, enables time-series analysis. Transaction grain: one row per payment/event, enables detailed audit. Subscription grain: one row per subscription version, enables lifecycle analysis. Wrong grain makes queries impossible or inefficient. Start with finest grain needed for analysis; aggregate to coarser grains for performance.

Event-First Design

Store granular events first, aggregate into snapshots second. You can always aggregate events into summaries, but you can't recover detail from aggregates.

Core SaaS Metric Tables

Every SaaS data warehouse needs certain foundational tables. These core tables enable the essential metrics—MRR, churn, LTV, cohorts—that drive business decisions. Design these tables carefully; they're the foundation for all analytics.

Subscription Facts Table

Central table tracking subscription state changes. Columns: subscription_id, customer_id, plan_id, event_type (created, upgraded, downgraded, cancelled, reactivated), event_timestamp, mrr_change, mrr_after, billing_interval, trial_end_date, contract_end_date. One row per subscription state change. Enable reconstruction of subscription timeline and MRR movements. Include both the change (mrr_change) and resulting state (mrr_after) for flexible analysis.

MRR Snapshot Table

Daily or monthly snapshots of MRR state. Columns: snapshot_date, customer_id, subscription_id, mrr, plan_id, status, cohort_month, days_active. One row per active subscription per snapshot period. Enables time-series analysis without event replay. Pre-calculate MRR to avoid repeated calculation in queries. Include cohort assignment for retention analysis. Monthly snapshots reduce storage; daily provides more precision for short-term analysis.

Customer Dimension Table

Master table of customer attributes. Columns: customer_id, stripe_customer_id, company_name, industry, employee_count, acquisition_source, first_subscription_date, current_mrr, lifetime_revenue, health_score. One row per customer. Include both static attributes (acquisition_source) and computed attributes (lifetime_revenue). Update computed attributes on schedule (daily) or via streaming. Links subscription facts to customer context.

Plan Dimension Table

Reference table for pricing plans. Columns: plan_id, plan_name, base_price, billing_interval, feature_tier, is_active, effective_date, end_date. Use Type 2 SCD for plan versions—price changes shouldn't retroactively affect historical analysis. Include plan hierarchy if you have plan families. Map to Stripe price IDs for integration. This table enables analysis by plan without hardcoding plan names in queries.

Single Source of Truth

Define MRR calculation once in your schema. Every report should pull from the same MRR table—multiple calculation methods create confusion and mistrust.

MRR Calculation Schema

Accurate MRR calculation requires careful schema design. The schema must handle annual plans, discounts, multi-currency, and the various MRR movement types (new, expansion, contraction, churn, reactivation). Get this right and all downstream metrics become reliable.

Normalizing Billing Intervals

Store MRR normalized to monthly regardless of billing interval. Annual plan of $1,200/year = $100 MRR. Quarterly plan of $300/quarter = $100 MRR. Store both: original_amount (what customer pays), mrr (normalized monthly). Normalization enables apples-to-apples comparison across billing intervals. Handle edge cases: lifetime deals (typically exclude from MRR or amortize), usage overages (separate from base MRR), one-time fees (exclude entirely).

MRR Movement Classification

Track how MRR changes, not just what it is. Categories: New MRR (first subscription), Expansion MRR (upgrades, add-ons), Contraction MRR (downgrades), Churn MRR (cancellations), Reactivation MRR (returning customers). Schema: mrr_movement table with movement_type, customer_id, subscription_id, mrr_change, movement_date. These categories power growth analysis—where is MRR coming from and going to? Define classification rules clearly and consistently.

Multi-Currency Handling

Store amounts in original currency plus base currency equivalent. Columns: amount_original, currency_original, amount_base, currency_base, exchange_rate, rate_date. Convert at consistent timing: transaction time or period end. Store exchange rate for auditability. Report in base currency for aggregation. Provide currency filters for regional analysis. Consistent currency handling prevents MRR discrepancies from rate fluctuations.

Discount and Coupon Tracking

Track discounts separately from base pricing. Columns in subscription facts: base_mrr (before discount), discount_amount, discount_type (percentage, fixed), coupon_code, net_mrr (after discount). This separation enables analysis: how much revenue are we discounting, which coupons drive conversions, what's our effective discount rate by segment. When discounts expire, track the MRR change as expansion (revenue increased even though plan didn't change).

MRR Reconciliation

Build reconciliation queries comparing calculated MRR to Stripe's reported MRR. Discrepancies indicate schema or calculation issues. Run reconciliation daily.

Cohort Analysis Schema

Cohort analysis reveals how customer behavior varies by when they joined or other defining characteristics. Schema design determines which cohort analyses are possible and how performant they are.

Cohort Assignment Tables

Assign customers to cohorts at signup time. Table: customer_cohorts with customer_id, cohort_month (subscription start month), acquisition_channel, initial_plan, industry. Cohort month never changes—it's when they became a customer. Include multiple cohort dimensions for flexible analysis. Pre-calculate cohort assignment rather than computing in queries. Cohort tables are small (one row per customer) but frequently joined.

Retention Matrix Schema

Pre-compute retention for common time periods. Table: cohort_retention with cohort_month, months_since_signup (0, 1, 2...), customers_remaining, customers_starting, retention_rate, mrr_remaining, mrr_starting, nrr_rate. One row per cohort per month since signup. Pre-computation makes retention dashboards fast. Update monthly after close. Include both logo retention (customer count) and dollar retention (MRR-weighted).

Behavioral Cohort Dimensions

Beyond time-based cohorts, enable behavioral analysis. Columns in cohort tables: activation_status (activated within 7 days?), engagement_tier (high/medium/low), feature_adoption_score, first_plan_tier. These enable analysis like "retention of fast-activating customers vs. slow." Compute behavioral dimensions at cohort assignment time for consistency. Adding new behavioral dimensions requires backfilling historical customers.

LTV by Cohort Schema

Track cumulative revenue by cohort over time. Table: cohort_ltv with cohort_month, months_since_signup, cumulative_revenue, customer_count, ltv_per_customer. Enables LTV curves showing how value accumulates. Compare LTV by acquisition channel, initial plan, or behavioral segment. Include projected LTV from ML models alongside observed LTV. This table powers payback period and LTV:CAC analysis.

Cohort Immutability

Cohort assignment should be immutable—a customer's cohort month never changes. This ensures cohort analysis remains consistent over time.

Churn and Retention Schema

Churn analysis requires tracking not just when customers leave, but the signals preceding churn and the type of churn (voluntary vs. involuntary). Schema design enables both historical analysis and predictive modeling.

Churn Event Table

Detailed record of every churn event. Columns: churn_id, customer_id, subscription_id, churn_date, churn_type (voluntary, involuntary), churn_reason (user-reported or inferred), mrr_lost, tenure_months, last_payment_date, reactivation_eligible. One row per churn event. Enable analysis: churn by reason, churn by tenure, involuntary vs voluntary rates. Link to customer dimension for segment analysis.

Involuntary Churn Tracking

Track payment failures separately from voluntary cancellations. Table: payment_failures with customer_id, subscription_id, failure_date, failure_reason, retry_count, recovery_date (if recovered), final_outcome (recovered, churned, pending). Enables dunning analysis: which failure reasons are recoverable, how does retry timing affect recovery, what's our involuntary churn rate. Critical for optimizing payment recovery processes.

Pre-Churn Signals Schema

Store leading indicators of churn for predictive analysis. Table: customer_health_signals with customer_id, signal_date, engagement_score, support_tickets_30d, login_count_30d, feature_usage_trend, payment_failures_90d, nps_score. Daily or weekly snapshots. Join to churn events to identify predictive features. This data trains churn prediction models and enables proactive intervention.

Net Revenue Retention Schema

Track NRR components monthly. Table: nrr_components with month, starting_mrr, new_mrr, expansion_mrr, contraction_mrr, churn_mrr, ending_mrr, gross_retention, net_retention. One row per month. Include segmented views: NRR by customer size, plan tier, industry. NRR is the single most important SaaS health metric—schema should make it trivially easy to calculate and segment.

Churn Definition Consistency

Define churn precisely and consistently. Is a paused subscription churned? What about 90-day delinquent? Document definitions and apply them uniformly.

Implementation and Optimization

Good schema design is necessary but not sufficient—implementation choices affect query performance, maintenance burden, and accuracy. These practices ensure your schema performs well in production.

Partitioning Strategies

Partition large tables by time for query performance. Subscription events: partition by event_date. MRR snapshots: partition by snapshot_month. Partitioning enables query pruning—scans only relevant partitions. Choose partition grain based on query patterns: monthly for historical analysis, daily for recent data. Most cloud warehouses handle partitioning automatically with proper clustering keys.

Indexing for SaaS Queries

Index columns used in filters and joins. Essential indexes: customer_id on all tables, event_date/snapshot_date for time filtering, subscription_id for lifecycle queries. Composite indexes for common query patterns: (customer_id, event_date) for customer timeline queries. Avoid over-indexing—each index has write overhead. Profile slow queries to identify missing indexes.

Incremental Updates

Design for incremental rather than full refreshes. Track high watermarks (last processed timestamp). Merge new data into existing tables rather than replacing. Incremental updates reduce compute costs and enable fresher data. Handle late-arriving data: events with timestamps before the watermark. Design idempotent updates—processing the same data twice should produce same result.

Data Quality Enforcement

Build quality checks into schema and transformations. Constraints: primary keys, not null on required fields, valid value ranges. Automated tests: MRR balances month-over-month, customer counts match source, no orphaned foreign keys. Run quality checks after every data load. Alert on failures before data reaches dashboards. Document data quality rules and exceptions.

dbt for Schema Management

Use dbt to manage schema transformations. dbt provides version control, testing, documentation, and dependency management—essential for maintaining complex metric schemas.

Frequently Asked Questions

Should I store raw Stripe data or transform it immediately?

Store both. Land raw Stripe data in staging tables exactly as received—this provides audit trail and enables reprocessing. Transform into analytical schema in separate tables. Never transform in place; always preserve raw data. Most teams keep raw data for 90+ days, transformed data indefinitely. The staging-to-analytics transformation is where you apply business logic and calculate metrics.

How do I handle plan changes mid-billing-cycle?

Record plan changes as events with timestamps and MRR impact. For MRR reporting, use the plan active on the snapshot date. For movement analysis, record the change as expansion or contraction with prorated MRR change. Handle prorations explicitly: store both the prorated amount (what was charged) and the full-period MRR (what the new rate implies). Stripe webhook events provide the data; your schema must store it appropriately.

What grain should I use for MRR snapshots?

Daily snapshots provide most flexibility but more storage. Monthly snapshots are sufficient for standard reporting and reduce storage 30x. Consider hybrid: daily snapshots for current month (enables real-time dashboards), monthly for history (enables trend analysis without excessive storage). Most SaaS companies find monthly grain sufficient for historical analysis with daily for operational needs.

How do I calculate MRR for annual plans?

Normalize annual plans to monthly: Annual Amount ÷ 12 = MRR. Store both original amount and normalized MRR. This enables apples-to-apples comparison across billing intervals. For movement classification, the full annual MRR counts as new MRR when the subscription starts—you've contracted for that recurring revenue. Some companies prefer recognizing MRR monthly as it's earned; document your approach clearly.

Should I store computed metrics or calculate on query?

Store frequently-accessed computations; calculate rarely-needed ones. Store: current MRR (accessed constantly), cohort assignments (never change), LTV (complex calculation). Calculate: ad-hoc segmentations, one-time analyses. Pre-computation trades storage for query performance and consistency. When you pre-compute, document the calculation and update schedule. Most teams find pre-computing core metrics (MRR, churn rate, NRR) essential for dashboard performance.

How do I handle free trials in my MRR schema?

Track trials separately from paying subscriptions. Trials have $0 MRR but represent potential. Table: trials with trial_id, customer_id, plan_id, trial_start, trial_end, conversion_date (if converted), converted_subscription_id. This enables: trial conversion rate, time-to-conversion, trial cohort analysis. When trial converts, create subscription event with new MRR. Don't count trial as churn if they never paid—they weren't MRR customers.

Key Takeaways

Data warehouse schema design determines whether your SaaS metrics are trustworthy and accessible or constantly questioned and painful to produce. Invest in dimensional modeling, careful grain decisions, and consistent MRR calculation. Build for both historical accuracy and query performance through appropriate partitioning and pre-computation. Most importantly, define metrics once in your schema so every report tells the same story. QuantLedger provides pre-built SaaS metric schemas on top of your Stripe data—accurate MRR, churn, cohorts, and LTV without designing and maintaining complex warehouse infrastructure.

Transform Your Revenue Analytics

Get ML-powered insights for better business decisions

Related Articles

Explore More Topics