Mapping Knowledge: Global Library Density & National Wealth Analysis

The Challenge

Does a country’s investment in public libraries correlate with its innovation output? To answer this, I needed to map global library density. However, there is no single global database of libraries.

The technical challenge was significant:

API Limitations: Querying OpenStreetMap (OSM) for massive countries like China or the USA caused frequent server timeouts (504 errors) and rate-limiting rejections (429 errors).
Data Fragmentation: Geopolitical data gaps (e.g., missing data for Taiwan) and inconsistent naming conventions (e.g., “Czechia” vs. “Czech Republic”) made merging datasets impossible without intervention.
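
The naming problem can be illustrated with a minimal sketch. It assumes a small hand-maintained alias table; `ALIASES`, `normalize_country`, and `merge_by_country` are hypothetical names used for illustration, not the project's actual code, and the values passed in are placeholders.

```python
# Hypothetical alias table mapping lowercase variants to one canonical name.
# Extend it as mismatches surface during the merge.
ALIASES = {
    "czech republic": "Czechia",
    "republic of korea": "South Korea",
    "korea, rep.": "South Korea",
    "united states of america": "United States",
    "usa": "United States",
}


def normalize_country(name):
    """Collapse whitespace and map known variants to a canonical name."""
    cleaned = " ".join(name.split())
    return ALIASES.get(cleaned.lower(), cleaned)


def merge_by_country(libraries, economics):
    """Join {country: library_count} with {country: indicator-dict}.

    Returns the merged records plus the names that found no partner,
    so gaps (such as a missing economy) are visible instead of silent.
    """
    econ = {normalize_country(k): v for k, v in economics.items()}
    merged, unmatched = {}, []
    for name, count in libraries.items():
        canonical = normalize_country(name)
        if canonical in econ:
            merged[canonical] = {"libraries": count, **econ[canonical]}
        else:
            unmatched.append(canonical)
    return merged, unmatched
```

Returning the unmatched names explicitly is a design choice: it turns a silent row loss into a list that can be patched by hand.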

My Approach

I built a robust “polite scraper” and data pipeline using Python to handle these complexities.

Fault-Tolerant Architecture: I implemented primary/backup redundancy across two API nodes. If the primary node failed, the script automatically switched to the backup. I also added exponential backoff so that rate-limited requests were retried after progressively longer waits.
Resumable Pipeline: By saving data with mode='a' (append), I made the scraper resumable: if it crashed at the 41st country, it could pick up from there without re-scraping the previous 40.
Data Enrichment “Patches”:
- The Taiwan Patch: Manually injected missing economic data for Taiwan using official government statistics to ensure East Asian analysis was complete.
- The Literacy Patch: Imputed a literacy rate of 99.0% for developed nations that no longer report this statistic, which corrected a bias that had skewed the regression model.
Verification: Integrated Folium to generate interactive HTML maps for immediate visual validation of coordinate accuracy.
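
The failover, backoff, and resume ideas above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the endpoints are passed in by the caller (for example, two mirrors of an OSM query API that accept a POSTed `data` field), the CSV has a `country` column, and all function names and the `max_rounds`, `base_delay`, and `jitter` parameters are hypothetical.

```python
import csv
import os
import random
import time
import urllib.error
import urllib.parse
import urllib.request

RETRYABLE = {429, 504}  # rate limit and gateway timeout


def http_send(endpoint, query, timeout=180):
    """POST the query and return (status, body). Assumes a 'data' form field."""
    payload = urllib.parse.urlencode({"data": query}).encode()
    try:
        with urllib.request.urlopen(endpoint, data=payload, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, ""


def fetch_with_failover(query, endpoints, send=http_send, max_rounds=4,
                        base_delay=1.0, jitter=0.5, sleep=time.sleep):
    """Try each endpoint in order; after a full failed round, back off and retry."""
    for attempt in range(max_rounds):
        for endpoint in endpoints:
            status, body = send(endpoint, query)
            if status == 200:
                return body
            if status not in RETRYABLE:
                raise RuntimeError(f"{endpoint} returned HTTP {status}")
        if attempt < max_rounds - 1:
            # Wait 1x, 2x, 4x ... the base delay, plus a little random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, jitter))
    raise RuntimeError(f"all endpoints failed after {max_rounds} rounds")


def completed_countries(path):
    """Return the set of countries already present in the output CSV."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as fh:
        return {row["country"] for row in csv.DictReader(fh)}


def append_rows(path, rows, fieldnames=("country", "lat", "lon")):
    """Append rows with mode='a', writing the header only for a new file."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)
```

Note that append mode on its own only preserves earlier output. In this sketch, the skip-on-restart behavior comes from checking each country against `completed_countries(path)` before fetching it.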

Key Insights

An analysis of over 80,000 data points across 40+ economies revealed distinct cultural models of development.

Strongest Link with HDI ($r=0.67$): Library density tracks a nation’s well-being closely. The “European Cluster” (e.g., Finland, Czechia) suggests that high human development goes hand in hand with decentralized public infrastructure.
The “Asian Anomaly”: Tech leaders like Singapore and South Korea show lower per-capita library density despite high innovation. This may reflect an urban efficiency model where single large institutions serve dense populations, in contrast to the dispersed European model.
The Czech Outlier: Historical context matters. Czechia has the world’s highest library density due to a 1919 law mandating libraries in every community, a legacy that persists today.
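
For reference, the two quantities behind these insights can be computed as in the sketch below. It assumes density is expressed as libraries per 100,000 residents and that $r$ is the Pearson correlation coefficient; the exact normalization used in the study is not stated, and the function names are illustrative.

```python
import math


def libraries_per_100k(library_count, population):
    """Per-capita density: libraries per 100,000 residents (assumed unit)."""
    return library_count * 100_000 / population


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Passing a list of per-country densities and the matching list of HDI values to `pearson_r` yields a coefficient between -1 and 1. A value such as 0.67 indicates a fairly strong positive association, but it does not by itself establish causation.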

Gallery

Figure 1: Interactive global library map. The visual result of scraping 80,000+ library coordinates to verify spatial accuracy.

Figure 2: HDI vs. library density regression. Regression analysis showing a strong positive correlation (r=0.67) between Library Density and the Human Development Index.

Figure 3: The National Library of the Czech Republic. Historical infrastructure plays a major role in the data outliers found in this study.
