Mar 17, 2025

MongoDB Replica Set: High Availability, Elections & Read Preference

nosql mongodb database high-availability replication

Prerequisites: Understanding of MongoDB basics. See NOSQL 02 MongoDB Basics.

A Replica Set is MongoDB’s built-in high availability solution — multiple copies of your data with automatic failover.

Part A: Replica Set Architecture

1. What is a Replica Set?

A group of MongoDB servers that maintain the same data set, providing:

Redundancy (data safety)
High Availability (automatic failover)
Read Scaling (read from secondaries)

graph TD
    subgraph "Replica Set (3 members)"
        P[Primary<br/>Read + Write]
        S1[Secondary 1<br/>Read Only]
        S2[Secondary 2<br/>Read Only]
    end
    
    P -->|Replicates| S1
    P -->|Replicates| S2
    
    APP[Application] -->|Write| P
    APP -.->|Read| S1
    APP -.->|Read| S2
    
    style P fill:#27ae60,color:#fff
    style S1 fill:#3498db,color:#fff
    style S2 fill:#3498db,color:#fff

2. Member Roles

Role	Can Vote?	Can Become Primary?	Purpose
Primary	✅	(current)	Handles all writes
Secondary	✅	✅	Maintains copy, can serve reads
Arbiter	✅	❌	Vote only, no data
Hidden	✅	❌	Backup/reporting, invisible to apps
Delayed	✅	❌	Time-delayed copy (recover from mistakes)

3. Recommended Configurations

Scenario	Configuration	Why
Standard	1 Primary + 2 Secondaries	Tolerates 1 failure
Budget	1 Primary + 1 Secondary + 1 Arbiter	Saves disk space
Disaster Recovery	P + S + S (different datacenters)	Geographic redundancy
Reporting	P + S + Hidden Secondary	Reports don’t affect production

Part B: Replication Process

4. How Data Replicates

sequenceDiagram
    participant C as Client
    participant P as Primary
    participant O as Oplog
    participant S as Secondary
    
    C->>P: Insert document
    P->>P: Write to data files
    P->>O: Write to oplog
    O-->>S: Tail oplog (continuous)
    S->>S: Apply operation
    P-->>C: Acknowledge write

5. The Oplog (Operation Log)

The oplog is a capped collection that records all write operations.

// View oplog on primary
use local
db.oplog.rs.find().sort({ ts: -1 }).limit(5)

// Output:
{
  "ts": Timestamp(1710500000, 1),    // Timestamp
  "op": "i",                           // i=insert, u=update, d=delete
  "ns": "mydb.users",                  // Namespace
  "o": { "_id": ObjectId("..."), "name": "Alice" }  // Document
}

Field	Meaning
`ts`	Timestamp (for ordering)
`op`	Operation type (i/u/d/c/n)
`ns`	Database.collection
`o`	The actual document/operation
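
To make these fields concrete, here is a minimal sketch of how a secondary could replay oplog entries against an in-memory copy of a collection. It is a simplified model, not MongoDB's actual applier: `applyOplogEntry` and `replay` are hypothetical helpers, `ts` is reduced to a plain number, and update entries are assumed to carry a `$set` document in `o` with the target `_id` in `o2` (real update entries use version-specific formats).

```javascript
// Simplified model of oplog replay (hypothetical helpers, not MongoDB internals).
// Assumptions: ts is a plain number; updates carry { $set: ... } in `o`
// and the target _id in `o2`.
function applyOplogEntry(collection, entry) {
  switch (entry.op) {
    case "i": // insert: `o` is the full document
      collection.set(entry.o._id, { ...entry.o });
      break;
    case "u": { // update: `o2` identifies the document, `o` describes the change
      const current = collection.get(entry.o2._id);
      if (current) collection.set(entry.o2._id, { ...current, ...entry.o.$set });
      break;
    }
    case "d": // delete: `o` holds the _id of the removed document
      collection.delete(entry.o._id);
      break;
    default: // "c" (command) and "n" (no-op) are ignored in this sketch
      break;
  }
}

// Replay entries in timestamp order, as a secondary tailing the oplog would.
function replay(entries) {
  const collection = new Map();
  [...entries]
    .sort((a, b) => a.ts - b.ts)
    .forEach(entry => applyOplogEntry(collection, entry));
  return collection;
}

const oplog = [
  { ts: 1, op: "i", ns: "mydb.users", o: { _id: 1, name: "Alice" } },
  { ts: 2, op: "i", ns: "mydb.users", o: { _id: 2, name: "Bob" } },
  { ts: 3, op: "u", ns: "mydb.users", o2: { _id: 1 }, o: { $set: { name: "Alicia" } } },
  { ts: 4, op: "d", ns: "mydb.users", o: { _id: 2 } },
];

const users = replay(oplog);
console.log([...users.values()]); // one document remains: _id 1, renamed to "Alicia"
```

Because entries are applied strictly in `ts` order, every secondary that replays the same oplog ends up with the same data as the primary.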

Part C: Elections & Automatic Failover

6. What Triggers an Election?

Trigger	Scenario
Primary failure	Server crashes, network disconnect
Maintenance	Admin steps down primary
Priority change	Higher priority member joins
Network partition	Primary isolated from majority

7. Election Process

sequenceDiagram
    participant S1 as Secondary 1
    participant S2 as Secondary 2
    participant S3 as Secondary 3
    
    Note over S1,S3: Primary fails!
    
    S1->>S1: Start election timer
    S2->>S2: Start election timer
    S3->>S3: Start election timer
    
    Note over S1: Timer expires first
    
    S1->>S2: Request vote (term 2)
    S1->>S3: Request vote (term 2)
    
    S2-->>S1: Vote YES
    S3-->>S1: Vote YES
    
    Note over S1: Majority reached!
    
    S1->>S1: Become PRIMARY
    S1->>S2: Notify: I am Primary
    S1->>S3: Notify: I am Primary

8. Election Rules

Rule	Explanation
Majority required	2 of 3, 3 of 5, etc.
Higher priority wins	If equally up-to-date
Most recent data wins	Latest oplog timestamp
Term number	Prevents split-brain (like election seasons)
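
The rules above can be sketched as a small vote-counting model. This is an illustration under simplifying assumptions, not MongoDB's election code: `shouldGrantVote` and `runElection` are hypothetical names, oplog positions are plain numbers, and priority is not modeled.

```javascript
// Simplified election model (hypothetical names; not MongoDB's implementation).
// A voter grants its vote only once per term, and only to a candidate whose
// oplog is at least as recent as its own.
function shouldGrantVote(voter, candidate, term) {
  const isNewTerm = term > voter.lastVotedTerm;
  const isUpToDate = candidate.lastOplogTs >= voter.lastOplogTs;
  return isNewTerm && isUpToDate;
}

// totalVotingMembers includes members that are currently unreachable,
// because the majority is computed over the whole configured set.
function runElection(candidate, reachableVoters, term, totalVotingMembers) {
  let votes = 1; // the candidate votes for itself
  candidate.lastVotedTerm = term;
  for (const voter of reachableVoters) {
    if (shouldGrantVote(voter, candidate, term)) {
      voter.lastVotedTerm = term;
      votes += 1;
    }
  }
  const majority = Math.floor(totalVotingMembers / 2) + 1;
  return { votes, majority, elected: votes >= majority };
}

// A 3-member set whose primary just failed: two secondaries remain.
const freshSecondaries = () => ({
  ahead: { name: "S1", lastOplogTs: 105, lastVotedTerm: 1 },
  behind: { name: "S2", lastOplogTs: 100, lastVotedTerm: 1 },
});

const a = freshSecondaries();
const upToDateCandidate = runElection(a.ahead, [a.behind], 2, 3);
console.log(upToDateCandidate); // 2 of 3 votes: elected

const b = freshSecondaries();
const staleCandidate = runElection(b.behind, [b.ahead], 2, 3);
console.log(staleCandidate); // only its own vote: not elected
```

The second run shows why "most recent data wins": the up-to-date member refuses to vote for a candidate that is behind it, so the stale member cannot reach a majority.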

Why Odd Numbers?

graph LR
    subgraph "3 Members (Odd)"
        A1[Server A] 
        A2[Server B]
        A3[Server C]
    end
    
    subgraph "2 Members (Even) - BAD"
        B1[Server X]
        B2[Server Y]
    end
    
    A1 -.->|2 votes = majority| A2
    B1 x-.->|1 vote each = TIE| B2

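Always use an odd number of voting members. An election needs a strict majority, so an even-sized set can split into two equal halves where neither side can elect a primary, and the extra member adds no fault tolerance (4 members tolerate 1 failure, the same as 3).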
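
The arithmetic behind this advice fits in two lines. The helper names `majority` and `faultTolerance` below are just for illustration:

```javascript
// Votes needed to win an election: more than half of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Members that can fail while the rest can still form a majority.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
// 2 members: majority 2, tolerates 0 failure(s)
// 3 members: majority 2, tolerates 1 failure(s)
// 4 members: majority 3, tolerates 1 failure(s)
// 5 members: majority 3, tolerates 2 failure(s)
```

Going from 3 to 4 members raises the majority without raising the number of failures the set survives, which is why the fourth member buys nothing for availability.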

Part D: Write Concern

9. What is Write Concern?

Write Concern specifies how many nodes must acknowledge a write before returning success.

graph LR
    subgraph "Write Concern: 1 (Fast)"
        W1[Write to Primary]
        W1 --> ACK1[Return OK]
    end
    
    subgraph "Write Concern: majority (Safe)"
        W2[Write to Primary]
        W2 --> REP[Replicate to Secondary]
        REP --> ACK2[Return OK]
    end
    
    style ACK1 fill:#f39c12,color:#fff
    style ACK2 fill:#27ae60,color:#fff

10. Write Concern Options

Value	Meaning	Speed	Data Safety
`w: 0`	Fire and forget	⚡⚡⚡	❌ None
`w: 1`	Primary acknowledged	⚡⚡	⚠️ May lose on failure
`w: "majority"`	Majority acknowledged	⚡	✅ High
`w: 3`	3 nodes acknowledged	Slow	✅ Very high
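
One way to read this table is as a threshold on acknowledgments. The sketch below models that idea in plain JavaScript; `requiredAcks` and `isAcknowledged` are hypothetical helpers, and the majority is computed over all voting members as a simplification.

```javascript
// Hypothetical helper: how many members must acknowledge for a given `w`.
function requiredAcks(w, votingMembers) {
  if (w === "majority") return Math.floor(votingMembers / 2) + 1;
  return w; // numeric w: 0, 1, 2, ...
}

// Has the write collected enough acknowledgments to return success?
function isAcknowledged(w, acks, votingMembers) {
  return acks >= requiredAcks(w, votingMembers);
}

const members = 3;

// Only the primary has applied the write so far (1 acknowledgment).
console.log(isAcknowledged(0, 0, members));          // true: fire and forget
console.log(isAcknowledged(1, 1, members));          // true
console.log(isAcknowledged("majority", 1, members)); // false: needs 2 of 3

// One secondary has replicated the write (2 acknowledgments).
console.log(isAcknowledged("majority", 2, members)); // true
console.log(isAcknowledged(3, 2, members));          // false: needs all 3
```

This also shows the availability cost of a numeric `w` equal to the member count: with `w: 3` on a 3-member set, a single unavailable member blocks every write until it returns or `wtimeout` expires.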

Code Examples

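// No explicit write concern: the default applies
// (w: "majority" in most deployments since MongoDB 5.0, w: 1 in earlier versions)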
db.users.insertOne({ name: "Alice" });

// Majority write concern
db.users.insertOne(
  { name: "Bob" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
);

// With journal acknowledgment
db.users.insertOne(
  { name: "Carol" },
  { writeConcern: { w: "majority", j: true } }
);

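[!NOTE] Journal vs Memory: w: "majority" waits until a majority of members have the write. Whether those members must also have written it to the on-disk journal depends on j. With j: true they must; when j is omitted, the replica set setting writeConcernMajorityJournalDefault decides (it defaults to true). In a datacenter-wide power outage, only journaled writes survive, so set j: true explicitly for critical data like financial transactions.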

11. Write Concern Decision Guide

Scenario	Recommended	Reason
Logging/metrics	`w: 1`	Speed over durability
User data	`w: "majority"`	Balanced
Financial transactions	`w: "majority", j: true`	Maximum safety

Part E: Read Preference

12. What is Read Preference?

Read Preference specifies which members can serve read queries.

graph TD
    subgraph "Read Preference Options"
        P[primary<br/>Only Primary]
        PS[primaryPreferred<br/>Primary, fallback to Secondary]
        S[secondary<br/>Only Secondary]
        SP[secondaryPreferred<br/>Secondary, fallback to Primary]
        N[nearest<br/>Lowest latency]
    end
    
    style P fill:#27ae60,color:#fff
    style S fill:#3498db,color:#fff
    style N fill:#9b59b6,color:#fff

13. Read Preference Options

Mode	Reads From	Use Case
`primary`	Primary only	Default, consistent reads
`primaryPreferred`	Primary, then secondary	Consistency with failover
`secondary`	Secondaries only	Offload reads from primary
`secondaryPreferred`	Secondary, then primary	Read scaling with fallback
`nearest`	Lowest network latency	Geo-distributed apps
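
The modes can be thought of as a two-step filter: first pick the members whose role is allowed, then keep the ones close enough in latency. The sketch below is a simplified model of that idea, not driver code. `eligibleMembers` and `selectCandidates` are hypothetical names, the member objects are made up, and the 15 ms window is an assumption that mirrors the drivers' default `localThresholdMS`.

```javascript
// Assumption: mirrors the default latency window (localThresholdMS) used by drivers.
const LOCAL_THRESHOLD_MS = 15;

// Step 1: which members are allowed by the read preference mode?
function eligibleMembers(mode, members) {
  const primaries = members.filter(m => m.state === "PRIMARY");
  const secondaries = members.filter(m => m.state === "SECONDARY");
  switch (mode) {
    case "primary": return primaries;
    case "primaryPreferred": return primaries.length ? primaries : secondaries;
    case "secondary": return secondaries;
    case "secondaryPreferred": return secondaries.length ? secondaries : primaries;
    case "nearest": return [...primaries, ...secondaries];
    default: throw new Error(`Unknown read preference: ${mode}`);
  }
}

// Step 2: keep only members within the latency window of the fastest one.
function selectCandidates(mode, members) {
  const eligible = eligibleMembers(mode, members);
  if (eligible.length === 0) return [];
  const fastest = Math.min(...eligible.map(m => m.rttMs));
  return eligible
    .filter(m => m.rttMs <= fastest + LOCAL_THRESHOLD_MS)
    .map(m => m.host);
}

const replicaSet = [
  { host: "host1", state: "PRIMARY", rttMs: 40 },
  { host: "host2", state: "SECONDARY", rttMs: 5 },
  { host: "host3", state: "SECONDARY", rttMs: 12 },
];

console.log(selectCandidates("primary", replicaSet));   // [ 'host1' ]
console.log(selectCandidates("secondary", replicaSet)); // [ 'host2', 'host3' ]
console.log(selectCandidates("nearest", replicaSet));   // [ 'host2', 'host3' ]

// With both secondaries down, secondaryPreferred falls back to the primary.
const primaryOnly = replicaSet.filter(m => m.state === "PRIMARY");
console.log(selectCandidates("secondaryPreferred", primaryOnly)); // [ 'host1' ]
```

When more than one candidate remains, a real driver picks among them rather than always using the first, which spreads reads across nearby members.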

14. Code Examples

// In MongoDB shell
db.users.find().readPref("secondary");

// In connection string
"mongodb://host1,host2,host3/mydb?readPreference=secondaryPreferred"

// In Node.js driver (uri is a connection string like the one above)
const { MongoClient } = require("mongodb");
const client = new MongoClient(uri, {
  readPreference: 'secondaryPreferred'
});

15. Read Preference Trade-offs

graph LR
    subgraph "Consistency vs Performance"
        C[Strong Consistency<br/>primary]
        P[Read Performance<br/>secondaryPreferred]
    end
    
    C <-->|Trade-off| P
    
    style C fill:#e74c3c,color:#fff
    style P fill:#27ae60,color:#fff

Preference	Consistency	Read Performance	Availability
`primary`	✅ Strong	⚠️ Limited	⚠️ Single point
`secondary`	⚠️ Eventual	✅ Scaled	✅ High
`nearest`	⚠️ Eventual	✅ Low latency	✅ High

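[!WARNING] Read-Your-Own-Writes Problem: If your app writes and then immediately reads with secondary Read Preference, the read may return stale data, because the newly written document may not have replicated yet. For this pattern, either read from the primary or run the write and the read in the same causally consistent session with majority write and read concern, which makes the secondary wait until it has caught up to your write. The second option trades latency for consistency, so use it deliberately.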
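
The stale read, and the way a causally consistent read avoids it, can be shown with a toy model. Everything here is hypothetical (the cluster object, the logical clock, the function names). The only idea borrowed from MongoDB is that a causally consistent read carries the time of the preceding write (`afterClusterTime`) and a secondary will not answer until it has applied operations up to that time. A real server waits; this sketch reports `ready: false` instead.

```javascript
// Toy model (hypothetical): one primary, one lagging secondary, a logical clock.
function createCluster() {
  return { clock: 0, oplog: [], primary: new Map(), secondary: new Map(), secondaryAppliedTs: 0 };
}

// Write to the primary and return the operation time of the write.
function write(cluster, doc) {
  cluster.clock += 1;
  cluster.primary.set(doc._id, doc);
  cluster.oplog.push({ ts: cluster.clock, doc });
  return cluster.clock;
}

// Apply every oplog entry the secondary has not seen yet.
function replicate(cluster) {
  for (const entry of cluster.oplog) {
    if (entry.ts > cluster.secondaryAppliedTs) {
      cluster.secondary.set(entry.doc._id, entry.doc);
      cluster.secondaryAppliedTs = entry.ts;
    }
  }
}

// Read from the secondary; with afterClusterTime set, refuse to answer
// until the secondary has caught up to that point.
function readFromSecondary(cluster, id, afterClusterTime = 0) {
  if (cluster.secondaryAppliedTs < afterClusterTime) {
    return { ready: false, doc: null };
  }
  return { ready: true, doc: cluster.secondary.get(id) ?? null };
}

const cluster = createCluster();
const operationTime = write(cluster, { _id: 1, name: "Alice" });

const plainRead = readFromSecondary(cluster, 1);
console.log(plainRead); // answers immediately, but the document is missing (stale)

const causalRead = readFromSecondary(cluster, 1, operationTime);
console.log(causalRead); // not ready: the secondary is behind the write

replicate(cluster);
const caughtUpRead = readFromSecondary(cluster, 1, operationTime);
console.log(caughtUpRead); // ready, and the document is visible
```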

Part F: Practical Operations

16. Setting Up a Replica Set

// Connect to first server
// (every mongod must already be running with --replSet myReplicaSet)
mongosh --port 27017

// Initiate replica set
rs.initiate({
  _id: "myReplicaSet",
  members: [
    { _id: 0, host: "server1:27017", priority: 2 },  // Higher priority = preferred primary
    { _id: 1, host: "server2:27017", priority: 1 },
    { _id: 2, host: "server3:27017", priority: 1 }
  ]
})

// Check status
rs.status()
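
Before running `rs.initiate()`, it can help to sanity-check the configuration document. The function below is a hypothetical pre-flight check, not part of mongosh. It assumes the MongoDB defaults of `votes: 1` and `priority: 1` for members that omit those fields, and it only covers a few of the rules discussed in this article.

```javascript
// Hypothetical pre-flight check for a replica set config document.
// Assumes members default to votes: 1 and priority: 1 when unspecified.
function validateReplicaSetConfig(config) {
  const problems = [];
  const members = config.members;

  if (new Set(members.map(m => m._id)).size !== members.length) {
    problems.push("member _id values must be unique");
  }
  if (new Set(members.map(m => m.host)).size !== members.length) {
    problems.push("member hosts must be unique");
  }

  const voting = members.filter(m => (m.votes ?? 1) > 0).length;
  if (voting < 3) {
    problems.push("fewer than 3 voting members: no failure can be tolerated");
  }
  if (voting % 2 === 0) {
    problems.push("even number of voting members");
  }
  if (!members.some(m => (m.priority ?? 1) > 0)) {
    problems.push("no member can become primary (all priorities are 0)");
  }
  return problems;
}

const goodConfig = {
  _id: "myReplicaSet",
  members: [
    { _id: 0, host: "server1:27017", priority: 2 },
    { _id: 1, host: "server2:27017", priority: 1 },
    { _id: 2, host: "server3:27017", priority: 1 },
  ],
};

const twoMemberConfig = {
  _id: "myReplicaSet",
  members: [
    { _id: 0, host: "server1:27017" },
    { _id: 1, host: "server2:27017" },
  ],
};

console.log(validateReplicaSetConfig(goodConfig));      // no problems reported
console.log(validateReplicaSetConfig(twoMemberConfig)); // too few voters, even count
```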

17. Common Operations

// Check replica set status
rs.status()

// Check which member is primary (db.hello() replaces the deprecated isMaster)
db.hello().primary

// Step down primary (trigger election)
rs.stepDown()

// Add a new member
rs.add("server4:27017")

// Remove a member
rs.remove("server4:27017")

// Add an arbiter
rs.addArb("arbiter:27017")

// Check replication lag
rs.printSecondaryReplicationInfo()

18. Monitoring Replication Lag

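// Check secondary lag relative to the primary's last applied operation
const members = db.adminCommand({ replSetGetStatus: 1 }).members
const primary = members.find(m => m.stateStr === "PRIMARY")
members.filter(m => m.stateStr === "SECONDARY").forEach(m => {
  // optimeDate is a Date, so the difference is in milliseconds
  const lagSeconds = (primary.optimeDate - m.optimeDate) / 1000
  print(`${m.name}: ${lagSeconds}s behind primary`)
})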

// In production, alert if lag > 10 seconds
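
The alert rule can be written as a pure function over the output of `replSetGetStatus`, which makes it easy to test without a running cluster. `findLaggingSecondaries` is a hypothetical helper, and the sample status object below is made up; only the field names (`members`, `name`, `stateStr`, `optimeDate`) come from the real command output.

```javascript
const LAG_ALERT_SECONDS = 10; // threshold suggested above

// Hypothetical helper: list secondaries whose lag exceeds the threshold.
function findLaggingSecondaries(status, thresholdSeconds = LAG_ALERT_SECONDS) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) return []; // no primary (e.g. mid-election): lag is undefined
  return status.members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => ({ name: m.name, lagSeconds: (primary.optimeDate - m.optimeDate) / 1000 }))
    .filter(m => m.lagSeconds > thresholdSeconds);
}

// Made-up sample shaped like replSetGetStatus output.
const sampleStatus = {
  members: [
    { name: "server1:27017", stateStr: "PRIMARY", optimeDate: new Date("2025-03-17T10:00:30Z") },
    { name: "server2:27017", stateStr: "SECONDARY", optimeDate: new Date("2025-03-17T10:00:28Z") },
    { name: "server3:27017", stateStr: "SECONDARY", optimeDate: new Date("2025-03-17T10:00:05Z") },
  ],
};

const lagging = findLaggingSecondaries(sampleStatus);
console.log(lagging); // only server3:27017, which is 25 seconds behind
```

In mongosh the same function could be called as `findLaggingSecondaries(db.adminCommand({ replSetGetStatus: 1 }))`.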

Summary

Replica Set Checklist

□ Minimum 3 members (odd number)
□ Members in different availability zones
□ Appropriate write concern for workload
□ Read preference configured
□ Monitoring for replication lag
□ Backup strategy (even with replication!)

[!IMPORTANT] Connection String Matters: The power of Replica Set is automatic failover, but your application must connect using a connection string that includes all members or the Replica Set name. If you connect to a single node and it goes down, your app loses connectivity. Use: mongodb://host1,host2,host3/mydb?replicaSet=myReplicaSet
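
To avoid hand-typing that string, an application might assemble it from configuration. The helper below is a hypothetical sketch: `buildReplicaSetUri` is not a driver function, and it ignores credentials and escaping of special characters in host or database names.

```javascript
// Hypothetical helper: build a replica set connection string from its parts.
// Does not handle credentials or percent-encoding of the database name.
function buildReplicaSetUri({ hosts, database, replicaSet, options = {} }) {
  const params = new URLSearchParams({ replicaSet, ...options });
  return `mongodb://${hosts.join(",")}/${database}?${params.toString()}`;
}

const uri = buildReplicaSetUri({
  hosts: ["host1", "host2", "host3"],
  database: "mydb",
  replicaSet: "myReplicaSet",
  options: { readPreference: "secondaryPreferred", w: "majority" },
});

console.log(uri);
// mongodb://host1,host2,host3/mydb?replicaSet=myReplicaSet&readPreference=secondaryPreferred&w=majority
```

Listing several hosts matters at startup: if the only host in the string is down when the application boots, the driver has nowhere to discover the rest of the set from.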

Quick Reference

Concept	Key Points
Replica Set	1 Primary + N Secondaries
Oplog	Capped collection of operations
Election	Automatic, needs majority vote
Write Concern	How many ack before success
Read Preference	Which members serve reads

Comparison with SQL Server Always On

Feature	MongoDB Replica Set	SQL Server Always On AG
Minimum nodes	3	2 (with witness)
Failover	Automatic (seconds)	Automatic (seconds)
Read replicas	Read Preference	Yes (readable secondary)
Write scaling	No (single primary)	No (single primary)
Configuration	Built-in	Requires WSFC

💡 Practice Questions

Conceptual

What is a Replica Set and what are the three main benefits?
Explain the election process in MongoDB. Why do you need an odd number of voting members?
What is the difference between Write Concern w: 1 and w: "majority"?
Describe the five Read Preference options and when to use each.

Hands-on

// You have a 3-node replica set and need to:
// 1. Check replica set status
// 2. Find which member is primary
// 3. Add a new member at server4:27017
// Write the commands.

Answer

// 1. Check replica set status
rs.status()

// 2. Find which member is primary (db.hello() replaces the deprecated isMaster)
db.hello().primary
// Or: rs.status().members.filter(m => m.stateStr === "PRIMARY")

// 3. Add a new member
rs.add("server4:27017")

Scenario

Failover: Your primary goes down. Describe what happens automatically and how long it typically takes for the application to recover.
Trade-off: A developer wants to use Read Preference secondary for all reads. What are the benefits and risks?
