
NoSQL Introduction: Types, CAP Theorem & Storage Engines

nosql mongodb database cap-theorem distributed-systems

Prerequisites: Understanding of relational databases. See DB 02 Software Layer for SQL fundamentals.

NoSQL databases were created to solve problems that traditional SQL databases struggle with: massive scale, flexible schemas, and distributed computing.


Part A: The Four Types of NoSQL

1. Document Database

Structure: JSON-like documents (nested, flexible)

{
  "_id": "user123",
  "name": "Alice",
  "orders": [
    { "item": "Laptop", "price": 1200 },
    { "item": "Mouse", "price": 25 }
  ]
}
Product | Use Case
MongoDB | Content management, e-commerce catalogs
Couchbase | Mobile apps with offline sync

Why use it: When your data has variable structure (products with different attributes).
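To show why nesting helps, here is a minimal in-memory sketch of a document collection. The `find` and `valuesAt` helpers are hypothetical and are not the MongoDB driver. They only mimic how a dot-path query such as `{ "orders.item": "Mouse" }` reaches into embedded arrays, and they show that documents in one collection need not share fields.

```javascript
// Hypothetical in-memory "collection": documents may have different fields.
const users = [
  { _id: "user123", name: "Alice", orders: [{ item: "Laptop", price: 1200 }, { item: "Mouse", price: 25 }] },
  { _id: "user456", name: "Bob", tags: ["vip"] }, // no orders field at all
];

// Collect every value reachable at a dot-separated path, descending into arrays
// (a simplified imitation of MongoDB dot notation for equality matches).
function valuesAt(node, parts) {
  if (parts.length === 0) return Array.isArray(node) ? node : [node];
  if (Array.isArray(node)) return node.flatMap((el) => valuesAt(el, parts));
  if (node === null || typeof node !== "object" || !(parts[0] in node)) return [];
  return valuesAt(node[parts[0]], parts.slice(1));
}

// Return the documents where any value at `path` equals `value`.
function find(collection, path, value) {
  const parts = path.split(".");
  return collection.filter((doc) => valuesAt(doc, parts).some((v) => v === value));
}

console.log(find(users, "orders.item", "Mouse").map((u) => u.name)); // [ 'Alice' ]
```

One read of Alice's document returns her orders too, which is the "get user with all their orders" query without a JOIN.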


2. Key-Value Store

Structure: Simple dictionary — key → value

session:abc123 → { userId: 1, expires: "2024-03-15" }
cache:product:99 → { name: "Laptop", price: 1200 }
Product | Use Case
Redis | Caching, session storage, real-time leaderboards
DynamoDB | Serverless apps, high-throughput workloads

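Why use it: Very fast reads and writes for simple lookups. Each access goes straight to one key with no query planning or joins, and stores such as Redis keep the data in memory.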
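The session and cache keys above can be sketched as a plain map with per-key expiry. The `KeyValueStore` class is a hypothetical illustration, not a Redis client; Redis offers the same idea through its `EXPIRE` command. The clock is injectable so expiry can be shown without waiting.

```javascript
// Hypothetical key-value store with per-key time-to-live (TTL).
class KeyValueStore {
  constructor(now = () => Date.now()) {
    this.data = new Map(); // key -> { value, expiresAt }
    this.now = now;
  }
  set(key, value, ttlMs = Infinity) {
    this.data.set(key, { value, expiresAt: this.now() + ttlMs });
  }
  get(key) {
    const entry = this.data.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.data.delete(key); // lazily evict an expired key when it is read
      return undefined;
    }
    return entry.value;
  }
}

let clock = 0; // fake clock in milliseconds
const store = new KeyValueStore(() => clock);
store.set("session:abc123", { userId: 1 }, 30 * 60 * 1000); // 30-minute session
store.set("cache:product:99", { name: "Laptop", price: 1200 }); // no expiry
console.log(store.get("session:abc123")); // { userId: 1 }
clock += 31 * 60 * 1000;
console.log(store.get("session:abc123")); // undefined (expired)
```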


3. Column-Family Store

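Structure: Rows identified by a row key, each holding its own sparse set of columns grouped into column families (a "wide-column" model). Despite the name, this is not the same as a column-oriented analytics database: data is organized and looked up by row key, and different rows can have completely different columns.

graph LR
    subgraph "Relational Table (SQL)"
        R1["Row 1: ID, Name, Age, City"]
        R2["Row 2: ID, Name, Age, City"]
    end
    
    subgraph "Wide-Column Rows (NoSQL)"
        C1["Row key sensor-42: temp@10:00, temp@10:01, temp@10:02"]
        C2["Row key sensor-7: temp@10:00, humidity@10:00"]
    end
    
    style R1 fill:#e74c3c,color:#fff
    style R2 fill:#e74c3c,color:#fff
    style C1 fill:#27ae60,color:#fff
    style C2 fill:#27ae60,color:#fff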
Product | Use Case
Cassandra | Time-series data, IoT sensor logs, high-velocity writes
HBase | Hadoop ecosystem, log data ingestion

Why use it: Excellent write throughput and handling of sparse data. (Note: For pure analytical aggregations, consider Column-Oriented DBs like ClickHouse.)
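The wide-column model can be sketched as a map from row key to that row's own sorted columns. `WideColumnTable` is a hypothetical illustration of the data model only; Cassandra and HBase add partitioning, replication, and on-disk storage, and they keep columns sorted on disk rather than sorting at read time as this sketch does.

```javascript
// Hypothetical wide-column table: each row key owns a sparse set of columns.
class WideColumnTable {
  constructor() {
    this.rows = new Map(); // rowKey -> Map(columnName -> value)
  }
  put(rowKey, column, value) {
    if (!this.rows.has(rowKey)) this.rows.set(rowKey, new Map());
    this.rows.get(rowKey).set(column, value);
  }
  // Columns of one row whose names fall in [from, to], sorted by column name.
  slice(rowKey, from, to) {
    const row = this.rows.get(rowKey) ?? new Map();
    return [...row.entries()]
      .filter(([col]) => col >= from && col <= to)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  }
}

const readings = new WideColumnTable();
readings.put("sensor-42", "2024-03-15T10:00", 21.5);
readings.put("sensor-42", "2024-03-15T10:01", 21.7);
readings.put("sensor-42", "2024-03-16T09:00", 19.0);
readings.put("sensor-7", "humidity", 0.4); // rows need not share columns
// All of sensor-42's readings for one day: a range scan within a single row.
console.log(readings.slice("sensor-42", "2024-03-15T00:00", "2024-03-15T23:59"));
```

Using timestamps as column names is what makes time-series reads cheap here: one row key, one contiguous range of columns.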


4. Graph Database

Structure: Nodes (entities) + Edges (relationships)

graph LR
    A[Alice] -->|FRIENDS_WITH| B[Bob]
    B -->|WORKS_AT| C[Google]
    A -->|LIKES| D[Coffee]
    B -->|LIKES| D
    
    style A fill:#3498db,color:#fff
    style B fill:#3498db,color:#fff
    style C fill:#27ae60,color:#fff
    style D fill:#f39c12,color:#fff
Product | Use Case
Neo4j | Social networks, fraud detection
Amazon Neptune | Knowledge graphs, recommendation engines

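Why use it: Following a relationship costs O(1) per hop thanks to index-free adjacency: each node stores direct references to its neighbors, so traversing an edge needs no index lookup. A SQL JOIN instead looks up related rows through an index (typically O(log n) per lookup), and that cost compounds on multi-hop queries such as “friends of friends”.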
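Index-free adjacency can be sketched by storing the neighbor node object itself on each edge. The `Graph` class and `friendsWhoLike` function are hypothetical illustrations, not the Neo4j API; the data mirrors the diagram above, with directed edges.

```javascript
// Hypothetical property graph: each edge holds a direct reference to its target node.
class Graph {
  constructor() {
    this.nodes = new Map(); // name -> { name, out: [{ type, to }] }
  }
  node(name) {
    if (!this.nodes.has(name)) this.nodes.set(name, { name, out: [] });
    return this.nodes.get(name);
  }
  relate(from, type, to) {
    this.node(from).out.push({ type, to: this.node(to) }); // pointer, not a key
  }
  neighbors(name, type) {
    return this.node(name).out.filter((e) => e.type === type).map((e) => e.to);
  }
}

// "Friends who also like X": hop to each friend, then read the friend's own edges directly.
function friendsWhoLike(graph, person, thing) {
  return graph
    .neighbors(person, "FRIENDS_WITH")
    .filter((friend) => friend.out.some((e) => e.type === "LIKES" && e.to.name === thing))
    .map((friend) => friend.name);
}

const g = new Graph();
g.relate("Alice", "FRIENDS_WITH", "Bob");
g.relate("Bob", "WORKS_AT", "Google");
g.relate("Alice", "LIKES", "Coffee");
g.relate("Bob", "LIKES", "Coffee");
console.log(friendsWhoLike(g, "Alice", "Coffee")); // [ 'Bob' ]
```

No lookup table is consulted after the starting node is found: every further step follows `e.to` directly.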


5. Comparison Summary

Type | Data Model | Best For | Example Query
Document | JSON objects | Flexible schemas | “Get user with all their orders”
Key-Value | Key → Value | Caching | “Get session by ID”
Column-Family | Column families (wide rows) | High-volume writes, time series | “All readings from sensor 42 today”
Graph | Nodes + Edges | Relationships | “Friends who also bought X”

Part B: CAP Theorem

6. The Impossible Triangle

In a distributed system, you can only guarantee two of three properties:

💡 Modern Understanding of CAP

In practice, P (Partition Tolerance) is non-negotiable: networks WILL fail. So when a partition occurs, you must choose between C (Consistency) and A (Availability); while the network is healthy, a system can provide both. “Pick 2” is a simplification; the real choice is C vs A during network failures.

graph TD
    subgraph "CAP Theorem"
        C[Consistency<br/>All nodes see same data]
        A[Availability<br/>Every request gets response]
        P[Partition Tolerance<br/>System works despite network splits]
    end
    
    C --- A
    A --- P
    P --- C
    
    style C fill:#3498db,color:#fff
    style A fill:#27ae60,color:#fff
    style P fill:#e74c3c,color:#fff
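Combination | Sacrifice | Example
CA | Partition Tolerance | Traditional SQL (single server, so there is no network partition to tolerate)
CP | Availability | MongoDB (with majority write concern)
AP | Consistency | Cassandra, DynamoDB (with their default, tunable consistency settings)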

7. Real-World Example

Scenario: A network partition splits your 3 MongoDB servers into two groups: Server 1 (the primary) on one side, Servers 2 and 3 on the other.

sequenceDiagram
    participant Client
    participant Server1 as Server 1 (Primary)
    participant Server2 as Server 2
    participant Server3 as Server 3
    
    Note over Server1,Server3: Network Partition!
    rect rgb(255, 200, 200)
        Server1--xServer2: Cannot reach
        Server1--xServer3: Cannot reach
    end
    
    Client->>Server1: Write order
    
    alt CP Mode (Consistency Priority)
        Server1-->>Client: ERROR - Cannot confirm write
        Note right of Client: Available = NO
    else AP Mode (Availability Priority)
        Server1-->>Client: OK - Written locally
        Note right of Client: Consistent = NO (other servers outdated)
    end
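The two branches of the diagram can be sketched as one decision. `writeOrder` is a hypothetical function, not MongoDB's replication protocol: the CP branch roughly corresponds to requiring acknowledgement from a majority (MongoDB's `w: "majority"` write concern), and the AP branch accepts the write on the one reachable node. A real MongoDB primary that loses contact with the majority would also step down; this toy ignores elections.

```javascript
// Hypothetical model of a write during a partition.
function writeOrder(mode, { totalNodes, reachableNodes }) {
  const majority = Math.floor(totalNodes / 2) + 1;
  if (mode === "CP") {
    // Consistency first: acknowledge only if a majority can hold the write.
    return reachableNodes >= majority
      ? { ok: true, ackedBy: majority }
      : { ok: false, error: `need ${majority} nodes, can reach ${reachableNodes}` };
  }
  // Availability first: accept locally; the other nodes stay stale until the partition heals.
  return { ok: true, ackedBy: 1 };
}

const partition = { totalNodes: 3, reachableNodes: 1 }; // Server 1 cut off from Servers 2 and 3
console.log(writeOrder("CP", partition)); // { ok: false, error: 'need 2 nodes, can reach 1' }
console.log(writeOrder("AP", partition).ok); // true
```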

Part C: BASE vs ACID

8. ACID (SQL Databases)

Property | Meaning | Example
Atomicity | All or nothing | Bank transfer: both debit and credit succeed, or neither
Consistency | Valid state → Valid state | Total money in the system stays the same
Isolation | Transactions don’t interfere | Two users can’t both buy the last item
Durability | Once committed, permanent | Survives power failure
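The Atomicity and Consistency rows can be illustrated with a transfer that is applied to a draft copy and published only if every step succeeds. The `transfer` helper is hypothetical; real databases achieve this with logs and locking or MVCC, not by copying the whole state.

```javascript
// Hypothetical all-or-nothing transfer: build the new state on a copy, return it only on success.
function transfer(balances, from, to, amount) {
  const draft = new Map(balances); // the original map is never modified
  if (!draft.has(from) || !draft.has(to)) throw new Error("unknown account");
  if (!(amount > 0)) throw new Error("amount must be positive");
  draft.set(from, draft.get(from) - amount);
  if (draft.get(from) < 0) throw new Error("insufficient funds"); // abort: nothing applied
  draft.set(to, draft.get(to) + amount);
  return draft; // "commit": the caller swaps in the new state in one step
}

let accounts = new Map([["alice", 100], ["bob", 50]]);
accounts = transfer(accounts, "alice", "bob", 30);
console.log(accounts.get("alice"), accounts.get("bob")); // 70 80
try {
  transfer(accounts, "alice", "bob", 500);
} catch (err) {
  console.log(err.message); // insufficient funds
}
console.log(accounts.get("alice") + accounts.get("bob")); // 150: total money unchanged
```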

9. BASE (NoSQL Databases)

Property | Meaning
Basically Available | System always responds (possibly with stale data)
Soft state | Data may change over time (while replicas sync)
Eventual consistency | Given time without new writes, all nodes will agree
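Eventual consistency can be sketched with three replicas, a last-write-wins rule keyed on a timestamp, and a sync round that spreads the newest version of each key. The `write`, `read`, and `syncAll` helpers are hypothetical; real systems add mechanisms such as hinted handoff and read repair.

```javascript
// Hypothetical replicas: each maps key -> { value, ts }.
const replicas = [new Map(), new Map(), new Map()];

function write(replica, key, value, ts) {
  replica.set(key, { value, ts });
}
function read(replica, key) {
  return replica.get(key)?.value;
}
// One anti-entropy round: every replica adopts the newest version (highest ts) of each key.
function syncAll(reps) {
  const newest = new Map();
  for (const r of reps)
    for (const [k, v] of r)
      if (!newest.has(k) || v.ts > newest.get(k).ts) newest.set(k, v);
  for (const r of reps) for (const [k, v] of newest) r.set(k, v);
}

write(replicas[0], "status", "shipped", 1); // the write lands on one replica only
console.log(replicas.map((r) => read(r, "status"))); // [ 'shipped', undefined, undefined ]
syncAll(replicas);
console.log(replicas.map((r) => read(r, "status"))); // [ 'shipped', 'shipped', 'shipped' ]
```

Between the write and the sync, two of three replicas return stale data (Basically Available, Soft state); after the sync they agree (Eventual consistency).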

10. Comparison

graph LR
    subgraph "ACID (SQL)"
        A1[Strong Consistency]
        A2[Immediate]
        A3[Slower writes]
    end
    
    subgraph "BASE (NoSQL)"
        B1[Eventual Consistency]
        B2[Faster at scale]
        B3[May read stale data]
    end
    
    A1 -->|Trade-off| B2
    
    style A1 fill:#e74c3c,color:#fff
    style B1 fill:#27ae60,color:#fff
Aspect | ACID | BASE
Priority | Correctness | Availability
Scale | Harder to scale out | Built for scale out
Use Case | Banking, inventory | Social feeds, analytics

Part D: MongoDB Storage Engine (WiredTiger)

11. What is WiredTiger?

WiredTiger is MongoDB’s default storage engine since version 3.2. Think of it as the “V8 engine” that powers MongoDB.

graph TD
    subgraph "MongoDB Architecture"
        APP[Application] --> DRIVER[MongoDB Driver]
        DRIVER --> QUERY[Query Engine]
        QUERY --> STORAGE[WiredTiger Engine]
        STORAGE --> DISK[Disk Storage]
    end
    
    style STORAGE fill:#27ae60,color:#fff

12. Document-Level Locking

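The Problem: Early MongoDB versions locked coarsely: a database-level lock in versions 2.2 to 2.6, and a single global lock before that. While one client was writing, every other write to that database had to wait.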

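WiredTiger’s Solution: Document-level concurrency control: only operations that touch the same document conflict. WiredTiger detects such a conflict optimistically, and MongoDB transparently retries one of the operations.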

graph LR
    subgraph "Database-Level Lock (Old)"
        DB1[Entire Database LOCKED]
        U1[User 1 writes Doc A] --> DB1
        U2[User 2 wants Doc B] -->|WAIT| DB1
    end
    
    subgraph "Document-Level Lock (WiredTiger)"
        D1[Doc A LOCKED]
        D2[Doc B FREE]
        U3[User 1 writes Doc A] --> D1
        U4[User 2 writes Doc B] --> D2
    end
    
    style DB1 fill:#e74c3c,color:#fff
    style D1 fill:#f39c12,color:#fff
    style D2 fill:#27ae60,color:#fff
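The optimistic, per-document behavior can be sketched with a version number on each document: a write commits only if the version it read is still current. `DocStore` and `update` are hypothetical illustrations, not WiredTiger or MongoDB APIs.

```javascript
// Hypothetical store with per-document versions for optimistic conflict detection.
class DocStore {
  constructor() {
    this.docs = new Map(); // id -> { version, value }
  }
  begin(id) {
    const doc = this.docs.get(id) ?? { version: 0, value: undefined };
    return { id, readVersion: doc.version, value: doc.value };
  }
  commit(txn, newValue) {
    const current = this.docs.get(txn.id)?.version ?? 0;
    if (current !== txn.readVersion) return false; // write conflict on this document only
    this.docs.set(txn.id, { version: current + 1, value: newValue });
    return true;
  }
}

// Retry loop, analogous to MongoDB retrying an operation after a write conflict.
function update(store, id, fn) {
  for (;;) {
    const txn = store.begin(id);
    if (store.commit(txn, fn(txn.value))) return;
  }
}

const store = new DocStore();
const a = store.begin("docA");
const b = store.begin("docB");
const a2 = store.begin("docA");
console.log(store.commit(a, "alice"), store.commit(b, "bob")); // true true (different documents)
console.log(store.commit(a2, "carol")); // false (docA changed after a2 read it)
update(store, "docA", (v) => v + "!");
console.log(store.docs.get("docA").value); // alice!
```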

💡 MVCC: Why Reads Don’t Block Writes

WiredTiger uses MVCC (Multi-Version Concurrency Control): each reader works from a snapshot taken when its operation starts and keeps seeing the versions that existed then, while writers add new versions. This is why reads and writes don’t block each other.
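Snapshot reads can be sketched by keeping a list of timestamped versions per key and letting each reader pick a snapshot timestamp. `MvccStore` is a hypothetical illustration; a real engine also discards old versions once no open snapshot can still see them.

```javascript
// Hypothetical MVCC store: writes append versions; reads choose a version by snapshot time.
class MvccStore {
  constructor() {
    this.clock = 0;
    this.versions = new Map(); // key -> [{ ts, value }] in commit order
  }
  write(key, value) {
    const ts = ++this.clock;
    if (!this.versions.has(key)) this.versions.set(key, []);
    this.versions.get(key).push({ ts, value }); // older versions are kept, not overwritten
    return ts;
  }
  snapshot() {
    return this.clock;
  }
  // Newest version committed at or before the snapshot timestamp.
  read(key, snapshotTs) {
    const list = this.versions.get(key) ?? [];
    for (let i = list.length - 1; i >= 0; i--) {
      if (list[i].ts <= snapshotTs) return list[i].value;
    }
    return undefined;
  }
}

const mvcc = new MvccStore();
mvcc.write("stock:laptop", 5);
const readerSnap = mvcc.snapshot(); // a long-running read starts here
mvcc.write("stock:laptop", 4); // a writer commits meanwhile, without waiting for the reader
console.log(mvcc.read("stock:laptop", readerSnap)); // 5 (the reader's snapshot)
console.log(mvcc.read("stock:laptop", mvcc.snapshot())); // 4 (new reads see the update)
```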

13. Compression

WiredTiger compresses data on disk:

Compressor | CPU Usage | Space Savings (rough, data-dependent)
snappy (default) | Low | ~50%
zlib | Medium | ~70%
zstd (MongoDB 4.2+) | Low-Medium | ~60%
// Check current engine
db.serverStatus().storageEngine
// { "name": "wiredTiger", ... }
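How much any compressor saves depends on the data. As an illustration only, this sketch measures savings with Node's built-in `zlib` deflate (the same family as the zlib option above) on repetitive JSON versus random bytes. It is not WiredTiger, which compresses on-disk pages with its own settings, so real numbers will differ.

```javascript
// Illustration: space savings from deflate on compressible vs incompressible data.
const zlib = require("zlib");
const crypto = require("crypto");

function savings(buf) {
  const compressed = zlib.deflateSync(buf);
  return 1 - compressed.length / buf.length; // fraction of bytes saved
}

// Repetitive documents: field names and most values repeat across documents.
const docs = Array.from({ length: 1000 }, (_, i) =>
  JSON.stringify({ _id: i, item: "Laptop", price: 1200, status: "shipped" })
).join("\n");
const randomBytes = crypto.randomBytes(docs.length); // random data does not compress

console.log(`JSON docs: ${(savings(Buffer.from(docs)) * 100).toFixed(0)}% saved`);
console.log(`random bytes: ${(savings(randomBytes) * 100).toFixed(0)}% saved`);
```

Document collections with repeated field names tend toward the high end of the table; already-compressed or random payloads (images, encrypted blobs) save almost nothing.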

14. Journaling & Checkpoints

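Feature | Purpose
Journal | Write-ahead log (WAL) for crash recovery: records each change so writes made since the last checkpoint can be replayed
Checkpoint | Periodic consistent snapshot of the data written to disk (every 60 seconds; versions before 3.6 also checkpointed after 2 GB of journal data)

After a crash, MongoDB restarts from the last checkpoint and replays the journal to recover the writes made after it.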

This is similar to SQL Server’s transaction log!
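The journal-plus-checkpoint design can be sketched in a few lines. `TinyEngine` is a hypothetical illustration: every write is appended to the journal before it is applied, a checkpoint saves a copy of the data and truncates the journal, and recovery is "last checkpoint, then replay the journal".

```javascript
// Hypothetical storage engine with a write-ahead journal and checkpoints.
class TinyEngine {
  constructor() {
    this.durable = { checkpoint: new Map(), journal: [] }; // survives a simulated crash
    this.memory = new Map(); // lost on a simulated crash
  }
  put(key, value) {
    this.durable.journal.push({ key, value }); // 1. log first (write-ahead)
    this.memory.set(key, value); // 2. then apply in memory
  }
  checkpoint() {
    this.durable.checkpoint = new Map(this.memory); // persist a snapshot of the data
    this.durable.journal = []; // entries before the checkpoint are no longer needed
  }
  crashAndRecover() {
    this.memory = new Map(this.durable.checkpoint); // start from the last checkpoint
    for (const { key, value } of this.durable.journal) this.memory.set(key, value); // replay
  }
}

const engine = new TinyEngine();
engine.put("order:1", "paid");
engine.checkpoint();
engine.put("order:2", "paid"); // after the checkpoint, so only in the journal
engine.crashAndRecover();
console.log(engine.memory.get("order:1"), engine.memory.get("order:2")); // paid paid
```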


Summary

NoSQL Type Selection Guide

Need | Choose
Flexible product catalog | Document (MongoDB)
Super-fast caching | Key-Value (Redis)
Time-series / IoT data | Column (Cassandra)
Social network relationships | Graph (Neo4j)

CAP/BASE Quick Reference

Concept | Meaning
CAP | During a network partition, choose Consistency or Availability (Partition Tolerance is required)
CP | Strong consistency, sacrifice availability during partition
AP | Always available, sacrifice consistency during partition
BASE | Eventual consistency model for distributed systems

WiredTiger Benefits

  • ✅ Document-level locking (high concurrency)
  • ✅ Compression (50-70% space savings)
  • ✅ Journaling (crash recovery)

💡 Practice Questions

Conceptual

  1. Name the 4 types of NoSQL databases and give one use case for each.

  2. Explain the CAP theorem. Why can’t a distributed system have all three properties?

  3. What is the difference between ACID and BASE consistency models?

  4. Describe two features of MongoDB’s WiredTiger storage engine.

Scenario

  1. Decision: Your company needs to build a social network. Which NoSQL database type would you recommend for storing friend relationships, and why?

  2. Trade-off: An e-commerce site needs both fast reads and strong consistency for inventory counts. According to CAP theorem, what challenges might you face with a distributed database?