AI That Doesn’t Know About World War I
Researchers have built language models that genuinely don’t know about World War I. The History LLMs project, which hit 798 points on Hacker News this week, trains AI exclusively on pre-1913 texts—creating systems that can’t access information about WWI, the Spanish flu, or anything after their temporal cutoff. The goal: eliminate hindsight contamination, a critical flaw that makes modern LLMs unsuitable for historical research and social science.
Modern AI knows how the story ends. When GPT-5 analyzes 1912 political discourse, it already knows the League of Nations will fail. That omniscience corrupts any research that depends on seeing the world as contemporaries saw it, without knowledge of what came next.
The Hindsight Problem
Hindsight bias is the tendency to see an event as more predictable after it has happened than it actually was beforehand. In War and Peace, Tolstoy mocked Russian historians for claiming they had “tricked” Napoleon into marching on Moscow, when the outcome owed more to luck than to foresight. Classic experiments by Fischhoff and Beyth, later popularized by Kahneman, found that people consistently overestimated how accurately they had predicted events such as Nixon’s 1972 visit to China, and later studies documented the same pattern after the 9/11 attacks.
This isn’t just a psychology curiosity. Hindsight bias affects clinical diagnosis, legal adjudication, historical analysis, and safety engineering. When modern LLMs analyze historical documents, they carry the same contamination. They can’t simulate 1912 perspectives because they already know what 1913 brings.
Time-Locked Models: Constraints as Features
History LLMs take a different approach. The Ranke-4B family—4 billion parameter models based on Qwen3 architecture—trains exclusively on 80 billion tokens from a curated 600 billion token historical corpus. The project created five variants with hard cutoffs: 1913, 1929, 1933, 1939, and 1946.
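The mechanics are easy to picture as a filter over dated documents. The sketch below is a toy illustration under assumed field names (“text”, “year”); the project’s actual curation pipeline is not described at this level of detail.

```python
# Minimal sketch of a hard temporal cutoff during corpus curation.
# The record schema ("text", "year") and the filtering step are
# illustrative assumptions, not the project's published pipeline.
CUTOFF_YEAR = 1913

documents = [
    {"text": "Editorial on naval expansion in the North Sea.", "year": 1909},
    {"text": "Report on the armistice negotiations.", "year": 1918},
    {"text": "Commentary on the Balkan crisis.", "year": 1912},
]

def time_locked_corpus(docs, cutoff=CUTOFF_YEAR):
    """Keep only documents written before the cutoff year, so nothing
    from after the cutoff can leak into the training set."""
    return [d for d in docs if d["year"] < cutoff]

training_docs = time_locked_corpus(documents)
# Only the 1909 and 1912 documents survive; the 1918 report is excluded.
```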
The key differentiator: “Time-locked models don’t roleplay; they embody their training data.” Unlike instruction-tuned models that merely simulate a historical perspective, Ranke-4B-1913 genuinely lacks future knowledge: in its textual universe, WWI has not happened yet.
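In practice, querying such a checkpoint would look like any other causal language model call. The sketch below uses the Hugging Face transformers API with a hypothetical model id; `history-llms/Ranke-4B-1913` is a guess at naming, not a confirmed hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint name; the real hub path may differ.
MODEL_ID = "history-llms/Ranke-4B-1913"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Prompt about a post-cutoff event. The model is not refusing or roleplaying
# ignorance: the relevant text simply never appeared in its training data.
prompt = "The great European war that began in 1914 ended with"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```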
This isn’t a limitation—it’s the feature. Similar approaches like ChronoBERT achieve GLUE scores of 84.71-85.54, outperforming standard BERT despite temporal constraints.
Real-World Applications
The applications span multiple domains. In behavioral science, research published in PNAS argues that time-locked LLMs can create “populations of simulated historical participants,” letting scientists apply modern psychological instruments to past societies and study cultural psychology during major historical events.
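As a rough sketch of what “simulated historical participants” could look like, the snippet below builds period-style prompts for a single survey item; the diary framing, personas, and helper names are invented for illustration and are not the published PNAS methodology.

```python
# Illustrative only: administer one survey item to a small "population"
# of simulated respondents via period-style completions.
personas = [
    "a factory worker in Manchester",
    "a schoolteacher in Vienna",
    "a newspaper editor in Chicago",
]
item = "Ordinary people can influence the decisions of their government."

def build_prompt(persona: str, statement: str) -> str:
    # No roleplay instruction is needed: because the model's corpus ends
    # before its cutoff, a period-style completion already reflects a
    # pre-cutoff perspective.
    return (
        f"Diary of {persona}, March 1912.\n"
        f"On the statement '{statement}', my honest view is that I "
    )

prompts = [build_prompt(p, item) for p in personas]
# Each prompt would be completed by a time-locked model; coding the
# completions onto a Likert scale yields a tiny simulated survey.
```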
Finance benefits from eliminating lookahead bias. ChronoBERT delivers investment performance comparable to Llama 3.1 (Sharpe-ratio comparison p-value of 0.315, i.e., no statistically significant difference) while guaranteeing that backtests use only information that was available at the time. Historical research gains tools to study discourse “as it was,” recovering what was “thinkable, predictable, or sayable” at a specific moment without a modern interpretive overlay.
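The lookahead problem is easiest to see as a point-in-time filter inside a backtest. This generic pandas sketch (with invented column names) is not ChronoBERT’s pipeline, but it shows the same constraint that a time-locked model enforces at the training level.

```python
import pandas as pd

# Toy news feed with publication timestamps (columns are illustrative).
news = pd.DataFrame({
    "published": pd.to_datetime(["2001-03-01", "2001-06-15", "2002-01-10"]),
    "headline": ["Earnings beat estimates", "Guidance cut", "Merger announced"],
})

def visible_at(df: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Return only rows published on or before the backtest date, mirroring
    how a time-locked model only 'knows' text written before its cutoff."""
    return df[df["published"] <= pd.Timestamp(as_of)]

# On 2001-07-01 the merger headline must be invisible; otherwise the backtest
# scores a signal with information no investor could have had at the time.
print(visible_at(news, "2001-07-01"))
```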
When Knowing Less Makes AI Better
For developers, this represents a paradigm shift. The “more data equals better AI” assumption breaks down when temporal accuracy matters. Sometimes knowing less makes AI better.
Temporal constraints improve reliability in historical research that needs period-accurate perspectives, financial modeling that must avoid lookahead bias, behavioral science that tracks cultural shifts, and legal contexts with temporal compliance requirements. The industry trend is moving from “collect everything” toward “curate strategically,” with domain-specific models whose intentional limitations become competitive advantages.
The History LLMs project challenges a fundamental assumption: that omniscient AI is always beneficial. When you need to understand the past without the contamination of knowing the future, constraints aren’t bugs. They’re precisely engineered features that make impossible research suddenly possible.











