Insights · May 14th, 2026

When your team asks ChatGPT, Gemini, or Claude a question about politics, geopolitics, or international institutions, you might assume you’re getting a reasonably neutral response shaped by the world’s collective written knowledge. New research published in Nature suggests something more troubling: the answer you receive depends partly on the language you ask in, and that linguistic dependency tracks the degree of press freedom in the countries where that language is spoken.

The paper, State Media Control Influences Large Language Models (Waight, Yang, Yuan, Messing, Roberts, Stewart, and Tucker, 2026), is the first large-scale empirical demonstration that authoritarian information ecosystems are leaving measurable fingerprints on the commercial AI systems used by hundreds of millions of people, including inside enterprises.

Summary of Findings

The authors run six interlocking studies. Taken together, they form a chain of evidence linking state-controlled media to the behavior of frontier LLMs.

1. State-coordinated content is already inside the training data. A five-word n-gram analysis of CulturaX, one of the largest open multilingual training corpora, found that roughly 3.1 million Chinese-language documents (about 1.64% of the Chinese subset) match Chinese state-coordinated media sources. That is roughly 41 times the match rate of Chinese Wikipedia. For documents that mention political leaders or institutions, the match rate climbs as high as 24%.

2. Commercial LLMs have memorized that content. When prompted with the first half of distinctive phrases from state-coordinated outlets, leading commercial models reproduce the expected continuation 3–10% of the time, matching or exceeding the rate at which they complete generic web text.

3. Adding state media to pretraining demonstrably shifts model output. Using the open-weight Llama-2-13B as a testbed, the authors show that with as few as 6,400 additional training documents, the model produces more pro-government responses to questions about Chinese leaders and institutions nearly 80% of the time when prompted in Chinese. The effect also spills over into other languages, with the largest spillover where writing systems (and therefore tokens) overlap.

4. Production models behave the same way. When the same questions are asked of major commercial models in Chinese versus English, human annotators judged the Chinese-language response more favorable to the Chinese government in 75.3% of comparisons. The effect holds with real-world user prompts harvested from WildChat and Chinese Q&A platforms, scales with model size (bigger models, bigger gap), and extends beyond China to prompts about Russia and North Korea.

5. The pattern generalizes globally. Across 37 countries where a single language dominates, models prompted in the local language produce more regime-favorable answers in countries with lower press freedom. In high-press-freedom countries, the local-language effect disappears or even reverses. Media freedom, in other words, is now a statistically significant predictor of LLM behavior.
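For readers who want to see what the first two findings measure, here is a minimal Python sketch of both checks: a five-word n-gram match rate against a reference set of documents, and a "first half in, second half out" completion test. This is an illustration under stated assumptions, not the authors' pipeline. The function names, the whitespace tokenization (Chinese text would need a proper word segmenter), the `min_shared` cutoff, and the `complete` callable (a stand-in for whatever model API you use) are all hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple

NGram = Tuple[str, ...]


def word_ngrams(text: str, n: int = 5) -> Set[NGram]:
    """Return the set of n-word sequences in text.

    Assumption: lowercase + whitespace tokenization, for illustration only.
    """
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_reference_index(reference_docs: Iterable[str], n: int = 5) -> Set[NGram]:
    """Collect every n-gram that appears in the reference documents."""
    index: Set[NGram] = set()
    for doc in reference_docs:
        index |= word_ngrams(doc, n)
    return index


def match_rate(corpus_docs: Iterable[str], index: Set[NGram],
               n: int = 5, min_shared: int = 1) -> float:
    """Share of corpus documents sharing at least min_shared n-grams with the index."""
    docs = list(corpus_docs)
    if not docs:
        return 0.0
    hits = sum(1 for d in docs if len(word_ngrams(d, n) & index) >= min_shared)
    return hits / len(docs)


def completion_rate(phrases: Iterable[str],
                    complete: Callable[[str], str]) -> float:
    """Share of phrases whose second half the model reproduces from the first half.

    `complete` is a hypothetical wrapper: prompt text in, model continuation out.
    """
    items = list(phrases)
    if not items:
        return 0.0
    hits = 0
    for phrase in items:
        tokens = phrase.split()
        half = len(tokens) // 2
        prefix = " ".join(tokens[:half])
        expected = " ".join(tokens[half:])
        if complete(prefix).strip().startswith(expected):
            hits += 1
    return hits / len(items)
```

In practice you would build the index from a list of known outlet articles, run `match_rate` over a sample of the training corpus, and compare the result with the same measurement against a neutral baseline such as an encyclopedia. The exact-prefix test in `completion_rate` is deliberately strict; a looser similarity measure would be a reasonable alternative.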

Why It Matters

For two decades, debates about state media influence have centered on television, search rankings, and social platforms. This research moves the conversation into a far less visible and far harder-to-audit layer of the information stack: the pretraining substrate of general-purpose AI.

Three implications stand out.

First, neutrality is language-dependent. A multinational company querying the same model in English, Mandarin, Russian, and Turkish may receive systematically different framings of the same geopolitical reality — not because anyone configured it to, but because the underlying training data reflects the asymmetries of each country’s media environment. The model is not “biased” in the colloquial sense; it is faithfully reflecting the corpus it was fed.

Second, the attack surface is now upstream. Historically, influence operations targeted the consumer: a Facebook ad, a YouTube recommendation, a trending hashtag. State-coordinated media in pretraining data influences the model itself, which then mediates every downstream interaction: search, customer support agents, internal copilots, research assistants. The authors put it bluntly: states and powerful institutions now have increased strategic incentives to flood the open web with content designed to be ingested by the next generation of training pipelines.

Third, the cost of influence is shockingly low. The pretraining experiment shifted model behavior with 6,400 documents. That is not a national-scale propaganda budget. That is a small team with a coordinated content operation.

What This Means for CEOs

Most enterprise AI strategies today focus on productivity, cost, and data privacy. This research adds a new category to the executive risk register: information provenance in generative AI. Five concrete implications for senior leaders:

1. Treat LLM outputs about geopolitics, regulation, and foreign markets as analytically suspect by default. If your strategy, due diligence, or country-risk briefings are being drafted or summarized by an LLM, the language of the prompt and the country in question now matter. Ask the same question in multiple languages and triangulate. A cross-lingual divergence is itself a useful signal.

2. Build a vendor due-diligence question set around training data and post-training mitigations. Every major model provider should be able to tell you, at least in broad strokes, how they identify and downweight state-coordinated sources, whether they have language-specific safety evaluations for politically sensitive content, and how they audit for the kind of cross-lingual valence gaps documented here. If they cannot answer, that is the answer.

3. Consider linguistic risk in any market-entry or M&A workflow that uses AI assistance. Analysts evaluating a target in a low-press-freedom market may be working with a model that has absorbed that country’s preferred narrative about its own institutions, regulators, and state-owned competitors. This is not hypothetical — the paper shows it across 37 countries.

4. Invest in retrieval-augmented and source-grounded architectures for high-stakes use cases. When the underlying model leans, grounding it in verified, attributed sources at inference time becomes more than a hallucination mitigation. It becomes an editorial control. For any application where the answer could influence a regulatory filing, an investment thesis, a board memo, or a public statement, retrieval over a curated corpus should be the default.

5. Recognize this as the early surface area of a much larger problem. Today the evidence is clearest for Chinese, Russian, and a handful of other languages. But the mechanism (coordinated content, then opportunistic scraping, then downstream model deployment) is general. As more actors realize how cheap it is to influence pretraining, the volume of strategically placed content will rise. The companies that build governance for this now will be better positioned than those that treat it as a content-moderation afterthought.
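The "ask in multiple languages and triangulate" advice in the first implication can be turned into a simple automated check. The Python sketch below is one possible shape for it, not a method from the paper. It assumes two hypothetical callables that you would supply yourself: `ask`, which sends a prompt to your model and returns the answer, and `score_stance`, which maps an answer to a number (for example, a rating from a human reviewer or a classifier). The `threshold` value is also an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DivergenceReport:
    scores: Dict[str, float]  # stance score per language code
    spread: float             # highest score minus lowest score
    flagged: bool             # True when the spread exceeds the threshold


def cross_lingual_check(prompts: Dict[str, str],
                        ask: Callable[[str], str],
                        score_stance: Callable[[str], float],
                        threshold: float = 0.3) -> DivergenceReport:
    """Ask the same question in several languages and compare stance scores.

    prompts maps a language code to that language's version of the question.
    ask and score_stance are hypothetical stand-ins for a model call and a
    stance rating; threshold is an illustrative cutoff, not a calibrated one.
    """
    if not prompts:
        raise ValueError("at least one prompt is required")
    scores = {lang: score_stance(ask(text)) for lang, text in prompts.items()}
    spread = max(scores.values()) - min(scores.values())
    return DivergenceReport(scores=scores, spread=spread, flagged=spread > threshold)
```

A flagged report does not say which answer is right. It only says the framings differ enough across languages that a human analyst should look before the output reaches a briefing or a board memo.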

Final Thoughts

This study does not argue that LLMs are propaganda machines. It argues something more subtle and more durable: that the AI systems we are wiring into the global economy inherit the information asymmetries of the world that built them — and that those asymmetries are now legible, measurable, and exploitable.

The executives who understand that early will make better decisions than those who don’t.

Source: Waight, H., Yang, E., Yuan, Y., Messing, S., Roberts, M. E., Stewart, B. M., & Tucker, J. A. (2026). State Media Control Influences Large Language Models. Nature. https://doi.org/10.1038/s41586-026-10506-7. Interactive companion: https://state-media-influence-llm.github.io/

About Nikolas Badminton

Nikolas Badminton is the Chief Futurist & Hope Engineer at futurist.com. He’s a world-renowned futurist keynote speaker, consultant, author, media producer, and executive advisor who has spoken to, and worked with, over 500 of the world’s most impactful organizations and governments.

Nikolas is an artificial intelligence expert and his 2026 keynote ‘The AI Leader: Create Incredible Productivity, Profit & Growth’ is the level up for the modern CEO and executive leader.

Please contact futurist speaker and consultant Nikolas Badminton to discuss your engagement.

Nikolas is the Chief Futurist of the Futurist Think Tank. He is a world-renowned futurist speaker, a Fellow of The RSA, and has worked with over 300 of the world’s most impactful companies to establish strategic foresight capabilities, identify trends shaping our world, help anticipate unforeseen risks, and design equitable futures for all. In his new book, ‘Facing Our Futures’, he challenges short-term thinking and provides executives and organizations with the foundations for futures design and the tools to ignite curiosity, create a framework for futures exploration, and shift their mindset from what is to WHAT IF…
