Oct 7, 2025

Data Modeling Guide: Inmon, Kimball & Data Vault Compared

Data Modeling Kimball Inmon Data Vault Star Schema

The Three Philosophies

Data modeling methodologies answer: How should we organize data in our warehouse?

Methodology	Creator	Philosophy	Approach
Inmon	Bill Inmon	Enterprise-first	Top-down, normalized
Kimball	Ralph Kimball	Business-first	Bottom-up, dimensional
Data Vault	Dan Linstedt	Flexibility-first	Hub-satellite, historical

1. Inmon: Enterprise Data Warehouse (EDW)

Core Philosophy

“Build one integrated, normalized enterprise data warehouse first, then create departmental data marts.”

Architecture

Sources          Enterprise DW (3NF)          Data Marts
   │                    │                         │
   ▼                    ▼                         ▼
┌──────┐           ┌─────────┐              ┌──────────┐
│ OLTP │──────────▶│ Single  │─────────────▶│  Sales   │
│  DB  │           │ Source  │              │   Mart   │
└──────┘           │   of    │              └──────────┘
┌──────┐           │  Truth  │              ┌──────────┐
│ ERP  │──────────▶│ (3NF)   │─────────────▶│ Finance  │
└──────┘           │         │              │   Mart   │
┌──────┐           │         │              └──────────┘
│ CRM  │──────────▶│         │              ┌──────────┐
└──────┘           └─────────┘              │Marketing │
                                            │   Mart   │
                                            └──────────┘

Key Characteristics

Aspect	Description
Normalization	3NF (Third Normal Form) - minimize redundancy
Approach	Top-down: design entire enterprise model first
Focus	Data consistency and integration
Data Marts	Created FROM the central warehouse
Development	Longer initial build, more stable long-term

Example Schema (3NF)

-- Normalized: separate tables, linked by foreign keys
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    address_id INT REFERENCES addresses(address_id)
);

CREATE TABLE addresses (
    address_id INT PRIMARY KEY,
    street VARCHAR(200),
    city VARCHAR(100),
    country_id INT REFERENCES countries(country_id)
);

CREATE TABLE countries (
    country_id INT PRIMARY KEY,
    country_name VARCHAR(100),
    region VARCHAR(50)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date DATE,
    status VARCHAR(20)
);

Pros & Cons

✅ Pros	❌ Cons
Single source of truth	Long development time
Minimal data redundancy	Complex queries (many JOINs)
Easier maintenance	Slower query performance
Strong data governance	Requires upfront enterprise buy-in
Handles complex relationships	Less intuitive for business users

2. Kimball: Dimensional Modeling

Core Philosophy

“Build dimensional data marts focused on business processes first. Integration happens through conformed dimensions.”

Architecture

Sources          Staging          Dimensional Data Marts
   │                │                      │
   ▼                ▼                      ▼
┌──────┐       ┌─────────┐           ┌──────────────┐
│ POS  │──────▶│ Stage   │──────────▶│   Sales      │
└──────┘       │  Area   │           │  Star Schema │
               │         │           └──────────────┘
┌──────┐       │         │           ┌──────────────┐
│ Web  │──────▶│         │──────────▶│  E-commerce  │
└──────┘       │         │           │  Star Schema │
               │         │           └──────────────┘
               └─────────┘
                                   ┌──────────────────┐
                                   │    Conformed     │
                                   │   Dimensions     │
                                   │ (Date, Customer) │
                                   └──────────────────┘

Key Characteristics

Aspect	Description
Denormalization	Star/Snowflake schema - optimized for reads
Approach	Bottom-up: build marts incrementally
Focus	Business user accessibility and performance
Integration	Via conformed dimensions across marts
Development	Fast initial value, iterative expansion

Star Schema Explained

                    ┌─────────────────┐
                    │   dim_date      │
                    │─────────────────│
                    │ date_key (PK)   │
                    │ full_date       │
                    │ day_of_week     │
                    │ month           │
                    │ quarter         │
                    │ year            │
                    └────────┬────────┘
                             │
┌─────────────────┐          │          ┌─────────────────┐
│  dim_product    │          │          │  dim_customer   │
│─────────────────│          │          │─────────────────│
│ product_key(PK) │          │          │customer_key(PK) │
│ product_name    │          │          │ customer_name   │
│ category        │          │          │ segment         │
│ brand           │          │          │ region          │
└────────┬────────┘          │          └────────┬────────┘
         │                   │                   │
         │      ┌────────────┴────────────┐      │
         │      │       fact_sales        │      │
         │      │─────────────────────────│      │
         └──────│ date_key (FK)           │──────┘
                │ product_key (FK)        │
                │ customer_key (FK)       │
                │ store_key (FK)          │
                │─────────────────────────│
                │ quantity (measure)      │
                │ revenue (measure)       │
                │ discount (measure)      │
                └─────────────────────────┘

Fact vs Dimension Tables

Aspect	Fact Table	Dimension Table
Content	Measurements, metrics	Descriptive attributes
Size	Very large (millions/billions)	Smaller (thousands/millions)
Keys	Foreign keys to dimensions	Primary key + attributes
Examples	Sales, Clicks, Transactions	Customer, Product, Date, Store
Updates	Append-only (typically)	Slowly changing (SCD)

Example Schema (Star)

-- Dimension: denormalized for query performance
CREATE TABLE dim_customer (
    customer_key INT PRIMARY KEY,      -- Surrogate key
    customer_id VARCHAR(20),           -- Natural/business key
    customer_name VARCHAR(100),
    street VARCHAR(200),
    city VARCHAR(100),
    country VARCHAR(100),              -- Denormalized!
    region VARCHAR(50),                -- Denormalized!
    segment VARCHAR(50),
    -- SCD Type 2 columns
    valid_from DATE,
    valid_to DATE,
    is_current BOOLEAN
);

-- Fact: measures linked to dimensions
CREATE TABLE fact_sales (
    date_key INT REFERENCES dim_date(date_key),
    product_key INT REFERENCES dim_product(product_key),
    customer_key INT REFERENCES dim_customer(customer_key),
    store_key INT REFERENCES dim_store(store_key),
    -- Measures
    quantity INT,
    unit_price DECIMAL(10,2),
    total_revenue DECIMAL(12,2),
    discount_amount DECIMAL(10,2),
    PRIMARY KEY (date_key, product_key, customer_key, store_key)
);

Types of Fact Tables

Type	Description	Example
Transaction	One row per event	Each sale, each click
Periodic Snapshot	One row per period	Monthly inventory levels
Accumulating Snapshot	One row per lifecycle	Order from created → shipped → delivered
Factless Fact	No measures, just relationships	Student attendance (who attended which class)

Snowflake Schema

A variation where dimensions are further normalized:

┌──────────┐     ┌──────────┐     ┌──────────┐
│ Category │────▶│ Product  │────▶│   Fact   │
└──────────┘     └──────────┘     │  Sales   │
                                  └──────────┘

Star Schema	Snowflake Schema
Dimensions fully denormalized	Dimensions partially normalized
Fewer JOINs	More JOINs
More storage	Less storage
Faster queries	Slightly slower
Simpler to understand	More complex structure

Pros & Cons

✅ Pros	❌ Cons
Fast query performance	Data redundancy
Business user friendly	Harder to maintain dimension updates
Quick time-to-value	Requires careful dimension conforming
Intuitive navigation	Less flexible for new requirements
BI tool optimized	Ad-hoc analysis can be limited

3. Data Vault: Hub-Satellite-Link

Core Philosophy

“Separate business keys (Hubs), relationships (Links), and descriptive context (Satellites) for maximum flexibility and full history.”

Architecture

Sources          Raw Vault                     Business Vault
   │                 │                              │
   ▼                 ▼                              ▼
┌──────┐       ┌──────────────┐              ┌──────────────┐
│ CRM  │──────▶│    Hubs      │─────────────▶│    Star      │
└──────┘       │   Links      │              │   Schema     │
               │  Satellites  │              │    (PIT)     │
┌──────┐       │              │              └──────────────┘
│ ERP  │──────▶│              │                    │
└──────┘       └──────────────┘                    ▼
                                             ┌──────────────┐
                                             │     BI       │
                                             │    Tools     │
                                             └──────────────┘

Core Components

Component	Description	Contains
Hub	Business entity (unique keys)	Surrogate key, business key, load date, source
Link	Relationship between hubs	Surrogate key, hub foreign keys, load date, source
Satellite	Descriptive data (temporal)	Hub/Link FK, attributes, load date, source

Visual Representation

       ┌─────────────────────┐
       │   SAT_CUSTOMER      │
       │─────────────────────│
       │ customer_hk (FK)    │
       │ load_date           │
       │ name                │
       │ email               │
       │ phone               │
       └──────────┬──────────┘
                  │
       ┌──────────▼──────────┐           ┌─────────────────────┐
       │   HUB_CUSTOMER      │           │   SAT_ORDER         │
       │─────────────────────│           │─────────────────────│
       │ customer_hk (PK)    │           │ order_hk (FK)       │
       │ customer_bk         │           │ load_date           │
       │ load_date           │           │ status              │
       │ record_source       │           │ total_amount        │
       └──────────┬──────────┘           └──────────┬──────────┘
                  │                                  │
                  │      ┌─────────────────────┐     │
                  └──────│  LINK_CUST_ORDER    │─────┘
                         │─────────────────────│
                         │ link_hk (PK)        │
                         │ customer_hk (FK)    │──────┐
                         │ order_hk (FK)       │──────┤
                         │ load_date           │      │
                         │ record_source       │      │
                         └─────────────────────┘      │
                                                      │
                                           ┌──────────▼──────────┐
                                           │   HUB_ORDER         │
                                           │─────────────────────│
                                           │ order_hk (PK)       │
                                           │ order_bk            │
                                           │ load_date           │
                                           │ record_source       │
                                           └─────────────────────┘

Example Schema

-- Hub: business keys only
CREATE TABLE hub_customer (
    customer_hk BINARY(32) PRIMARY KEY,  -- Hash key (MD5/SHA)
    customer_bk VARCHAR(50),             -- Business key (original ID)
    load_date TIMESTAMP,
    record_source VARCHAR(50)
);

-- Satellite: descriptive data with history
CREATE TABLE sat_customer (
    customer_hk BINARY(32) REFERENCES hub_customer(customer_hk),
    load_date TIMESTAMP,
    load_end_date TIMESTAMP,             -- For SCD Type 2
    name VARCHAR(100),
    email VARCHAR(100),
    phone VARCHAR(20),
    record_source VARCHAR(50),
    PRIMARY KEY (customer_hk, load_date)
);

-- Link: relationships
CREATE TABLE link_customer_order (
    link_hk BINARY(32) PRIMARY KEY,
    customer_hk BINARY(32) REFERENCES hub_customer(customer_hk),
    order_hk BINARY(32) REFERENCES hub_order(order_hk),
    load_date TIMESTAMP,
    record_source VARCHAR(50)
);

Key Characteristics

Aspect	Description
Hash Keys	Uses hash of business key as surrogate (parallel loading)
Insert-Only	Never update/delete - always insert new version
Full History	Every change is captured with timestamp
Source Tracking	Every record knows where it came from
Schema Stability	Hubs/Links rarely change, Satellites can extend

Pros & Cons

✅ Pros	❌ Cons
Complete audit history	Complex, many tables
Highly adaptable to change	Query performance (many JOINs)
Parallel loading friendly	Learning curve
Source agnostic	Needs business vault for reporting
Handles late-arriving data	More storage needed

PIT and Bridge Tables (Performance Optimization)

Data Vault’s normalized structure creates a JOIN-heavy query pattern. PIT (Point-In-Time) and Bridge tables solve this:

Table Type	Purpose	How It Works
PIT Table	Speed up time-based queries	Pre-calculates which satellite version is valid at each point in time
Bridge Table	Flatten many-to-many links	Pre-joins complex link paths for faster traversal

-- PIT Table Example
-- Instead of complex JOINs with date range logic...
CREATE TABLE pit_customer AS
SELECT 
    d.snapshot_date,
    h.customer_hk,
    s_name.sat_customer_hk as name_hk,        -- Which satellite version
    s_contact.sat_contact_hk as contact_hk    -- is valid on this date
FROM dim_date d
CROSS JOIN hub_customer h
LEFT JOIN sat_customer_name s_name 
    ON h.customer_hk = s_name.customer_hk
    AND d.snapshot_date BETWEEN s_name.load_date AND s_name.load_end_date
-- ... more satellites
;

-- Query becomes simple:
SELECT * FROM pit_customer WHERE snapshot_date = '2024-06-15';

Why PIT Tables Matter: Without PIT tables, every query to Data Vault requires complex date-range JOIN logic across multiple satellites. PIT pre-computes this, making the Business Vault queries performant.

4. Comparison Matrix

High-Level Comparison

Criteria	Inmon	Kimball	Data Vault
Approach	Top-down	Bottom-up	Hybrid
Normalization	3NF (high)	Denormalized (low)	Insert-only (unique)
Time to First Value	Long	Short	Medium
Query Performance	Medium	High	Low (raw) / High (marts)
Flexibility	Medium	Medium	High
Historical Tracking	Via SCD	Via SCD	Native (all data)
Learning Curve	Medium	Low	High
Development Effort	High upfront	Incremental	High upfront

When to Use

Methodology	Best For
Inmon	Large enterprises, complex integrations, long-term vision
Kimball	BI-focused teams, quick wins, business user self-service
Data Vault	Regulatory industries, multiple sources, frequent changes

Decision Framework

┌─────────────────────────────────────────────────────────────────┐
│                    Methodology Decision Tree                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Q1: Primary goal?                                              │
│      │                                                           │
│      ├── Quick BI dashboards ──▶ KIMBALL                        │
│      │                                                           │
│      ├── Enterprise integration ──▶ INMON                       │
│      │                                                           │
│      └── Audit/compliance/flexibility ──▶ DATA VAULT            │
│                                                                  │
│  Q2: Team skills?                                               │
│      │                                                           │
│      ├── SQL-heavy, business-focused ──▶ KIMBALL                │
│      │                                                           │
│      ├── Data modeling experts ──▶ INMON or DATA VAULT          │
│      │                                                           │
│      └── Mixed/growing team ──▶ KIMBALL (start), evolve later   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

5. Modern Hybrid Approaches

In practice, many organizations combine methodologies:

Data Vault + Kimball

Sources → Data Vault (Raw) → Kimball Star Schemas (Business)

Data Vault for raw data integration, audit, history
Kimball for BI-optimized marts consumed by users

Lakehouse + Medallion + Kimball

Sources → Bronze (Raw) → Silver (Cleaned) → Gold (Star Schema)

Medallion layers for data organization
Kimball dimensional modeling in Gold layer

dbt + Kimball

-- dbt model: dims and facts as code
{{
  config(
    materialized='table',
    unique_key='customer_key'
  )
}}

SELECT
    {{ dbt_utils.generate_surrogate_key(['customer_id']) }} as customer_key,
    customer_id,
    name,
    email,
    CURRENT_TIMESTAMP as valid_from,
    '9999-12-31'::date as valid_to,
    true as is_current
FROM {{ ref('stg_customers') }}

One Big Table (OBT) Pattern

In modern columnar warehouses (Snowflake/BigQuery), storage is cheap but JOINs are expensive. Many teams flatten Star Schemas into a single wide table:

           Star Schema                           One Big Table (OBT)
               │                                        │
               ▼                                        ▼
       dim ← fact → dim                    ┌─────────────────────────────┐
            │                              │  All dims + facts in    │
            ▼                              │  one pre-joined table   │
       4+ JOINs                            └─────────────────────────────┘
                                                  0 JOINs

Aspect	Star Schema	OBT
JOINs at query time	4-8 JOINs	0 JOINs
Storage	Less (normalized)	More (denormalized)
Query performance	Good	Excellent
BI tool compatibility	Standard	Optimal
Maintenance	More complex	Simpler consumption

-- dbt model: Creating OBT from Star Schema
{{ config(materialized='table') }}

SELECT
    -- Fact measures
    f.order_date,
    f.quantity,
    f.revenue,
    -- All dimension attributes flattened
    c.customer_name,
    c.customer_segment,
    c.customer_region,
    p.product_name,
    p.product_category,
    p.brand,
    s.store_name,
    s.store_city
FROM {{ ref('fct_sales') }} f
JOIN {{ ref('dim_customer') }} c ON f.customer_key = c.customer_key
JOIN {{ ref('dim_product') }} p ON f.product_key = p.product_key
JOIN {{ ref('dim_store') }} s ON f.store_key = s.store_key

When to use OBT: When Power BI/Tableau performance is critical, and your analysts prefer simple SELECT * FROM obt_sales queries. Trade storage cost for query speed.

6. Summary: The Library Analogy 📚

Methodology	Library Analogy
Inmon	Academic library: carefully cataloged, one classification system, access via trained librarian
Kimball	Public library: books organized by popular sections (Fiction, History), easy browsing
Data Vault	Archive: every document version kept, tagged with source and date, researchers can find anything

7. The Grand Unification: How They Work Together

These four methodologies are not mutually exclusive. In mature modern architectures, we typically combine them:

Architecture Layer	Recommended Approach	Why?
Overall Data Flow	Medallion Architecture	Clear data quality progression (Bronze/Silver/Gold)
Silver Layer	Data Vault 2.0 or Inmon (3NF)	Flexible multi-source integration, complete history preservation
Gold Layer	Kimball (Star Schema)	Fast BI queries, business-user friendly

Quick Decision Guide

Just need fast reports? → Use Kimball (Star Schema) directly. Fastest path to business value.
Building Data Lakehouse (Databricks/Spark)? → Adopt Medallion Architecture for layer management.
Many source systems with frequent changes? → Use Data Vault in Silver layer. Steep learning curve, but low long-term maintenance.
Boss asks “What’s the current standard?” → Answer: “Medallion Architecture with Kimball Marts on top”

The Modern Stack

Sources → Bronze (Raw) → Silver (Data Vault/3NF) → Gold (Kimball Star) → BI Tools
           ↑                    ↑                         ↑
       Medallion            Data Vault              Kimball
      Architecture          for history           for performance

← Previous Data Storage Evolution: From Warehouse to Lake to Lakehouse

Next → OLTP vs. OLAP: A Deep Dive into Data Architecture