Data Modeling Guide: Inmon, Kimball & Data Vault Compared
Data Modeling Kimball Inmon Data Vault Star Schema
The Three Philosophies
Data modeling methodologies answer: How should we organize data in our warehouse?
Methodology
Creator
Philosophy
Approach
Inmon
Bill Inmon
Enterprise-first
Top-down, normalized
Kimball
Ralph Kimball
Business-first
Bottom-up, dimensional
Data Vault
Dan Linstedt
Flexibility-first
Hub-satellite, historical
1. Inmon: Enterprise Data Warehouse (EDW)
Core Philosophy
“Build one integrated, normalized enterprise data warehouse first, then create departmental data marts.”
Architecture
Sources Enterprise DW (3NF) Data Marts │ │ │ ▼ ▼ ▼┌──────┐ ┌─────────┐ ┌──────────┐│ OLTP │──────────▶│ Single │─────────────▶│ Sales ││ DB │ │ Source │ │ Mart │└──────┘ │ of │ └──────────┘┌──────┐ │ Truth │ ┌──────────┐│ ERP │──────────▶│ (3NF) │─────────────▶│ Finance │└──────┘ │ │ │ Mart │┌──────┐ │ │ └──────────┘│ CRM │──────────▶│ │ ┌──────────┐└──────┘ └─────────┘ │Marketing │ │ Mart │ └──────────┘
Key Characteristics
Aspect
Description
Normalization
3NF (Third Normal Form) - minimize redundancy
Approach
Top-down: design entire enterprise model first
Focus
Data consistency and integration
Data Marts
Created FROM the central warehouse
Development
Longer initial build, more stable long-term
Example Schema (3NF)
-- Normalized: separate tables, linked by foreign keysCREATE TABLE customers ( customer_id INT PRIMARY KEY, name VARCHAR(100), address_id INT REFERENCES addresses(address_id));CREATE TABLE addresses ( address_id INT PRIMARY KEY, street VARCHAR(200), city VARCHAR(100), country_id INT REFERENCES countries(country_id));CREATE TABLE countries ( country_id INT PRIMARY KEY, country_name VARCHAR(100), region VARCHAR(50));CREATE TABLE orders ( order_id INT PRIMARY KEY, customer_id INT REFERENCES customers(customer_id), order_date DATE, status VARCHAR(20));
Pros & Cons
✅ Pros
❌ Cons
Single source of truth
Long development time
Minimal data redundancy
Complex queries (many JOINs)
Easier maintenance
Slower query performance
Strong data governance
Requires upfront enterprise buy-in
Handles complex relationships
Less intuitive for business users
2. Kimball: Dimensional Modeling
Core Philosophy
“Build dimensional data marts focused on business processes first. Integration happens through conformed dimensions.”
Architecture
Sources Staging Dimensional Data Marts │ │ │ ▼ ▼ ▼┌──────┐ ┌─────────┐ ┌──────────────┐│ POS │──────▶│ Stage │──────────▶│ Sales │└──────┘ │ Area │ │ Star Schema │ │ │ └──────────────┘┌──────┐ │ │ ┌──────────────┐│ Web │──────▶│ │──────────▶│ E-commerce │└──────┘ │ │ │ Star Schema │ │ │ └──────────────┘ └─────────┘ ┌──────────────────┐ │ Conformed │ │ Dimensions │ │ (Date, Customer) │ └──────────────────┘
Uses hash of business key as surrogate (parallel loading)
Insert-Only
Never update/delete - always insert new version
Full History
Every change is captured with timestamp
Source Tracking
Every record knows where it came from
Schema Stability
Hubs/Links rarely change, Satellites can extend
Pros & Cons
✅ Pros
❌ Cons
Complete audit history
Complex, many tables
Highly adaptable to change
Query performance (many JOINs)
Parallel loading friendly
Learning curve
Source agnostic
Needs business vault for reporting
Handles late-arriving data
More storage needed
PIT and Bridge Tables (Performance Optimization)
Data Vault’s normalized structure creates a JOIN-heavy query pattern. PIT (Point-In-Time) and Bridge tables solve this:
Table Type
Purpose
How It Works
PIT Table
Speed up time-based queries
Pre-calculates which satellite version is valid at each point in time
Bridge Table
Flatten many-to-many links
Pre-joins complex link paths for faster traversal
-- PIT Table Example-- Instead of complex JOINs with date range logic...CREATE TABLE pit_customer ASSELECT d.snapshot_date, h.customer_hk, s_name.sat_customer_hk as name_hk, -- Which satellite version s_contact.sat_contact_hk as contact_hk -- is valid on this dateFROM dim_date dCROSS JOIN hub_customer hLEFT JOIN sat_customer_name s_name ON h.customer_hk = s_name.customer_hk AND d.snapshot_date BETWEEN s_name.load_date AND s_name.load_end_date-- ... more satellites;-- Query becomes simple:SELECT * FROM pit_customer WHERE snapshot_date = '2024-06-15';
Why PIT Tables Matter: Without PIT tables, every query to Data Vault requires complex date-range JOIN logic across multiple satellites. PIT pre-computes this, making the Business Vault queries performant.
4. Comparison Matrix
High-Level Comparison
Criteria
Inmon
Kimball
Data Vault
Approach
Top-down
Bottom-up
Hybrid
Normalization
3NF (high)
Denormalized (low)
Insert-only (unique)
Time to First Value
Long
Short
Medium
Query Performance
Medium
High
Low (raw) / High (marts)
Flexibility
Medium
Medium
High
Historical Tracking
Via SCD
Via SCD
Native (all data)
Learning Curve
Medium
Low
High
Development Effort
High upfront
Incremental
High upfront
When to Use
Methodology
Best For
Inmon
Large enterprises, complex integrations, long-term vision
Kimball
BI-focused teams, quick wins, business user self-service
-- dbt model: dims and facts as code{{ config( materialized='table', unique_key='customer_key' )}}SELECT {{ dbt_utils.generate_surrogate_key(['customer_id']) }} as customer_key, customer_id, name, email, CURRENT_TIMESTAMP as valid_from, '9999-12-31'::date as valid_to, true as is_currentFROM {{ ref('stg_customers') }}
One Big Table (OBT) Pattern
In modern columnar warehouses (Snowflake/BigQuery), storage is cheap but JOINs are expensive. Many teams flatten Star Schemas into a single wide table:
Star Schema One Big Table (OBT) │ │ ▼ ▼ dim ← fact → dim ┌─────────────────────────────┐ │ │ All dims + facts in │ ▼ │ one pre-joined table │ 4+ JOINs └─────────────────────────────┘ 0 JOINs
Aspect
Star Schema
OBT
JOINs at query time
4-8 JOINs
0 JOINs
Storage
Less (normalized)
More (denormalized)
Query performance
Good
Excellent
BI tool compatibility
Standard
Optimal
Maintenance
More complex
Simpler consumption
-- dbt model: Creating OBT from Star Schema{{ config(materialized='table') }}SELECT -- Fact measures f.order_date, f.quantity, f.revenue, -- All dimension attributes flattened c.customer_name, c.customer_segment, c.customer_region, p.product_name, p.product_category, p.brand, s.store_name, s.store_cityFROM {{ ref('fct_sales') }} fJOIN {{ ref('dim_customer') }} c ON f.customer_key = c.customer_keyJOIN {{ ref('dim_product') }} p ON f.product_key = p.product_keyJOIN {{ ref('dim_store') }} s ON f.store_key = s.store_key
When to use OBT: When Power BI/Tableau performance is critical, and your analysts prefer simple SELECT * FROM obt_sales queries. Trade storage cost for query speed.
6. Summary: The Library Analogy 📚
Methodology
Library Analogy
Inmon
Academic library: carefully cataloged, one classification system, access via trained librarian
Kimball
Public library: books organized by popular sections (Fiction, History), easy browsing
Data Vault
Archive: every document version kept, tagged with source and date, researchers can find anything
7. The Grand Unification: How They Work Together
These four methodologies are not mutually exclusive. In mature modern architectures, we typically combine them:
Architecture Layer
Recommended Approach
Why?
Overall Data Flow
Medallion Architecture
Clear data quality progression (Bronze/Silver/Gold)
Silver Layer
Data Vault 2.0 or Inmon (3NF)
Flexible multi-source integration, complete history preservation
Gold Layer
Kimball (Star Schema)
Fast BI queries, business-user friendly
Quick Decision Guide
Just need fast reports? → Use Kimball (Star Schema) directly. Fastest path to business value.
Building Data Lakehouse (Databricks/Spark)? → Adopt Medallion Architecture for layer management.
Many source systems with frequent changes? → Use Data Vault in Silver layer. Steep learning curve, but low long-term maintenance.
Boss asks “What’s the current standard?” → Answer: “Medallion Architecture with Kimball Marts on top”
The Modern Stack
Sources → Bronze (Raw) → Silver (Data Vault/3NF) → Gold (Kimball Star) → BI Tools ↑ ↑ ↑ Medallion Data Vault Kimball Architecture for history for performance