Why you can trust this

This section is currently available in English only.

Every summary on this site is generated by a language model. Language models are fluent. They produce confident, well-structured text. They can also make things up ^[1]. Current estimates put hallucination rates at 3–27% depending on the model and task ^[2]. We can't eliminate that entirely. But we can make the credibility mechanisms overt: visible, inspectable, and verifiable by you. Not "trust us." See for yourself.

AI transparency

All summaries on ovr.news are generated by AI using Gemma 3 (27B), an open-weight model running on our own hardware. Translation is currently switched off and the site runs English-first. When enabled, translation uses DeepL and Gemini Flash (Google) as cloud services. Each article shows an AI summary label. AI can make mistakes. That's why we always link to the original article, so you can verify.

Layer 1: Ground the model

The most effective defense against hallucination is also the simplest: constrain what the model may say. Every summary prompt includes explicit grounding instructions:

Use facts from the article. Do not invent statistics, quotes, or claims. Do not add context, background, or interpretation beyond what the source article states.

Research on retrieval-augmented generation shows that anchoring responses in source documents substantially reduces hallucination rates ^[3]. Our prompts also ban words that signal editorializing rather than reporting: "groundbreaking," "innovative," "significant," "highlights," "showcases," "underscores."
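
As an illustration only, a check for the banned words could look like the sketch below. The word list is copied from the paragraph above. The helper name `findBannedWords` and the idea of scanning generated output are assumptions; the text only says that the prompts ban these words.

```typescript
// Words the prompt bans, as listed in the text.
const BANNED_WORDS: readonly string[] = [
  "groundbreaking",
  "innovative",
  "significant",
  "highlights",
  "showcases",
  "underscores",
];

// Hypothetical helper: returns the banned words that occur in a summary.
// Matching is case-insensitive and on whole words only, so "insignificant"
// does not count as "significant".
function findBannedWords(summary: string): string[] {
  const lower = summary.toLowerCase();
  return BANNED_WORDS.filter((word) => new RegExp(`\\b${word}\\b`).test(lower));
}

console.log(findBannedWords("The study highlights a significant drop in emissions."));
```

A non-empty result could be used to retry or flag a summary, though whether the pipeline does so is not stated here.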

Layer 2: Control the temperature

Language models have a parameter called temperature that controls randomness. Higher values produce more creative, more drift-prone output. Lower values keep the model closer to its input. A 2026 study across 172 billion tokens found that hallucination rates increase measurably with temperature ^[4]. Our summaries run at 0.7 — a middle setting, not a low one: constrained enough to stay close to the source material while allowing natural phrasing. The grounding work is done by the prompt constraints and render-time checks, not by the temperature alone.
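
To make the effect of temperature concrete, here is a minimal sketch of the standard temperature-scaled softmax used when sampling tokens. It is not this site's code: the function name `softmaxWithTemperature` and the example logits are made up for illustration.

```typescript
// Temperature-scaled softmax: divide each logit by the temperature,
// then normalize so the probabilities sum to 1.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  if (temperature <= 0) throw new RangeError("temperature must be positive");
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const total = exps.reduce((sum, x) => sum + x, 0);
  return exps.map((x) => x / total);
}

// Made-up scores for three candidate tokens.
const exampleLogits = [2.0, 1.0, 0.5];
for (const t of [0.2, 0.7, 1.5]) {
  const probs = softmaxWithTemperature(exampleLogits, t);
  console.log(`temperature ${t}:`, probs.map((p) => p.toFixed(3)).join(" "));
}
```

At lower temperatures the most likely token takes nearly all of the probability mass. At higher temperatures the distribution flattens, so less likely tokens are sampled more often.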

We also require the model to reason through the prompt constraints before generating output. Rather than jumping straight to fluent text, it first works through the rules: what the article says, what the grounding instructions allow, and what the word limits require.

Layer 3: Clean the input

A model can only be as faithful as its input. Before any article reaches the language model, it passes through a content quality gate — density-based heuristics that check for:

CSS and HTML leakage — if more than 3–5% of the text is markup, the extraction failed. Rejected.
Cookie banners — short articles dominated by consent language. Rejected.
Paywall stubs — articles under 800 characters with phrases like "subscribe to continue." Rejected.
Navigation debris — when most lines are shorter than 30 characters, you're reading a menu, not an article. Rejected.

Articles that fail the quality gate never reach the language model. Better to show nothing than a confident summary of garbage.
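
The four checks above could be sketched roughly as follows. This is a simplified illustration, not the production gate. The 800-character and 30-character limits come from the text. Everything else is an assumption: the function name `rejectionReason`, the 5% markup limit (the upper end of the stated 3–5% range), the consent term list, the three-hit consent rule, and the reading of "most lines" as more than half.

```typescript
// Assumed values; see the note above for which numbers come from the text.
const MARKUP_RATIO_LIMIT = 0.05;
const PAYWALL_PHRASES = ["subscribe to continue"];
const CONSENT_TERMS = ["cookie", "consent", "privacy settings"];

// Returns the reason an extracted article is rejected, or null if it passes.
function rejectionReason(text: string): string | null {
  const trimmed = text.trim();
  if (trimmed.length === 0) return "empty";
  const lower = trimmed.toLowerCase();

  // CSS and HTML leakage: share of characters inside tags or CSS rule bodies.
  const markup: string[] = trimmed.match(/<[^>]+>|\{[^}]*\}/g) ?? [];
  const markupChars = markup.reduce((n, m) => n + m.length, 0);
  if (markupChars / trimmed.length > MARKUP_RATIO_LIMIT) return "markup leakage";

  // Paywall stub: short text containing a subscription prompt.
  if (trimmed.length < 800 && PAYWALL_PHRASES.some((p) => lower.includes(p))) {
    return "paywall stub";
  }

  // Cookie banner: short text in which consent terms occur repeatedly.
  const consentHits = CONSENT_TERMS.reduce(
    (n, term) => n + (lower.split(term).length - 1),
    0,
  );
  if (trimmed.length < 800 && consentHits >= 3) return "cookie banner";

  // Navigation debris: most non-empty lines are shorter than 30 characters.
  const lines = trimmed.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
  const shortLines = lines.filter((l) => l.length < 30).length;
  if (shortLines / lines.length > 0.5) return "navigation debris";

  return null;
}

console.log(rejectionReason("Home\nAbout\nContact\nLogin\nNews"));
```

Returning a reason rather than a plain boolean keeps rejections easy to log and audit.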

Layers 1 through 3 are about prevention. The next three layers are about verification: making it possible for you to check.

Layer 4: Link to the source

Every article on this site has a "Read Original" link. The original URL, the source name, and the publication domain are preserved from the moment an article enters the pipeline to the moment it appears on your screen.

Research on AI transparency shows that source attribution is one of the strongest predictors of user trust ^[5]. When other outlets report the same story, we show those too. Independent corroboration from multiple sources is a stronger trust signal than any single summary.

Layer 5: Show the scores

Every article shows its weighted average score. Open the article, and you can see which dimensions were scored and how. The filter definitions are published on GitHub, and two of the trained filters are available on Hugging Face.

You might disagree with a score. That's fine. The point is that the judgment is inspectable.

Layer 6: Editorial rules

After scoring and summarization, an editorial layer makes final decisions. Most of its rules are deterministic TypeScript checks with configurable thresholds: they ensure scientific and research sources get representation and promote corroborated stories, on top of the pipeline's duplicate removal.

One rule is AI-based, and it matters enough to name: the editorial gate. The same local model that writes summaries reads each summarized article and classifies what kind of content it is: a delivered outcome, a press release, an obituary, or off-topic for this site. By default the gate only observes: its decisions are logged, nothing is removed. A category gains the power to drop articles only after we review its logged decisions against real data and explicitly enable that category. Two are enforced today: obituaries and off-topic items. Every decision, enforced or not, is logged with a reason and the phrase from the article that triggered it.
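
The observe-by-default behavior can be sketched as below. This is a hedged illustration of the pattern described above, not the actual implementation. The type names, field names, category identifiers, and the in-memory `decisionLog` are all hypothetical.

```typescript
// Hypothetical identifiers for the four categories named in the text.
type Category = "delivered_outcome" | "press_release" | "obituary" | "off_topic";

interface GateDecision {
  articleId: string;
  category: Category;
  reason: string;        // why the model chose this category
  triggerPhrase: string; // phrase from the article that triggered it
}

// Categories explicitly enabled to drop articles. Per the text, two are
// enforced today; every other category is observe-only.
const ENFORCED: ReadonlySet<Category> = new Set<Category>(["obituary", "off_topic"]);

// Assumed in-memory log; a real pipeline would persist this somewhere.
const decisionLog: Array<GateDecision & { enforced: boolean }> = [];

// Logs every decision, enforced or not, and returns true only when the
// article should be dropped.
function applyGate(decision: GateDecision): boolean {
  const enforced = ENFORCED.has(decision.category);
  decisionLog.push({ ...decision, enforced });
  return enforced;
}
```

In this sketch, enabling a new category after review is a one-line change to `ENFORCED`, while logging stays unconditional.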

Six layers of credibility scaffolding
Layer	What it does
Grounded prompts	Constrain what the model can say
Controlled temperature	Keep output close to the source while allowing natural phrasing
Content quality gate	Reject junk before it reaches the model
Source links	Make verification one click away
Visible scores	Make the reasoning inspectable
Editorial rules	Deterministic checks plus an audited AI gate, human-enabled

What we're honest about

We can't verify facts. The model summarizes what the article says. If the article contains an error, the summary will too.
Summaries can still drift. Despite grounding prompts and a controlled temperature, subtle distortion happens. A nuance gets lost. An emphasis shifts. This is why the source link exists ^[6].
Quality depends on extraction. Some websites make it hard to extract clean text. The quality gate usually catches poor extraction. Sometimes it doesn't.
Scores reflect our lens definitions. The scoring system encodes what we think "belonging" or "discovery" means. Those definitions are published, but they're still editorial choices.

What AI doesn't do

AI is a tool, not an editor. There are things we deliberately don't leave to AI:

Whether numbers in an article are accurate
Whether a source is reliable
Whether a claim is proven

We leave that judgment to you. We give you the context to decide for yourself.

We'd rather you trust us because you verified, not because we asked you to.

Source quality

Not all news sources are equal. We assess each source for reliability, so you know where the news comes from.

The tiers

Verified

Reliability confirmed by independent databases or reviewed by our editor. These sources have a credibility score from 0 to 10.

Examples: Reuters, BBC, Nature, AP News, The Lancet, public broadcasters

Curated

Deliberately added to our source collection, but not externally verified. These sources were chosen because they fit our lenses, but don't have an independent credibility score.

Examples: specialized publications, regional media, non-profit news services

Unknown

Source is not in our database. This doesn't mean the source is unreliable. It only means we haven't been able to establish its reliability.
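
The three tiers can be read as a simple decision rule. The sketch below assumes a hypothetical `SourceRecord` shape and a `Map` keyed by domain; neither is taken from the actual codebase.

```typescript
type Tier = "verified" | "curated" | "unknown";

// Hypothetical record shape for a tracked source domain.
interface SourceRecord {
  domain: string;
  credibilityScore: number | null; // 0 to 10, or null when no score exists
}

// Maps a domain to its tier: not tracked means unknown, tracked without a
// score means curated, tracked with a score means verified.
function tierFor(domain: string, collection: Map<string, SourceRecord>): Tier {
  const record = collection.get(domain);
  if (record === undefined) return "unknown";
  return record.credibilityScore === null ? "curated" : "verified";
}
```

Note that "unknown" here only means the domain is missing from the collection. It says nothing about the source's reliability.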

Credibility score

Verified sources receive a score from 0 to 10, based on independent assessments:

Credibility score ranges and examples
Score	Rating	Examples
9.0 – 10.0	Very high	Nature, The Lancet, NIH, EU institutions
7.5 – 8.9	High	Reuters, BBC, AP, arXiv, public broadcasters
6.0 – 7.4	Medium	Major newspapers, think tanks
4.0 – 5.9	Neutral	Mixed factual reporting
< 4.0	Low	State media, tabloids. Rarely in our selection.
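
As a small sketch, the table can be expressed as a lookup function. The function name `ratingFor` is hypothetical, and treating each band's lower bound as a threshold (so 8.95 counts as "High") is an assumption, since the table does not say how values between bands are handled.

```typescript
type Rating = "Very high" | "High" | "Medium" | "Neutral" | "Low";

// Maps a 0 to 10 credibility score to the rating bands in the table.
function ratingFor(score: number): Rating {
  if (!Number.isFinite(score) || score < 0 || score > 10) {
    throw new RangeError("score must be between 0 and 10");
  }
  if (score >= 9.0) return "Very high";
  if (score >= 7.5) return "High";
  if (score >= 6.0) return "Medium";
  if (score >= 4.0) return "Neutral";
  return "Low";
}

console.log(ratingFor(8.2));
```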

Where do the scores come from?

Credibility scores are computed as a weighted average across three independent databases:

IDIAP Research Institute: Academic database with NewsGuard scores and reliability labels for ~5,300 domains
Media Bias/Fact Check: Independent assessment of factual reporting and political bias for ~4,400 domains
Wikipedia Perennial Sources: Community-consensus reliability ratings maintained by Wikipedia editors for ~420 domains

For sources not covered by these databases, our editor assigns scores manually — ovr.news is run by one person, so these are one editor's judgment calls, not a committee's. Where a manual score overlaps with an external database, we run automated checks to flag significant disagreements.
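
A weighted average over the available databases, plus a disagreement check for manual scores, might look like the sketch below. The weights, the 1.5-point threshold, and all names are assumptions made for illustration; the real values are not stated in this section. The sketch also assumes each database's rating has already been converted to the 0 to 10 scale.

```typescript
interface DatabaseScore {
  source: "idiap" | "mbfc" | "wikipedia";
  score: number; // assumed already normalized to the 0 to 10 scale
}

// Hypothetical weights; the actual weighting is not published here.
const WEIGHTS: Record<DatabaseScore["source"], number> = {
  idiap: 0.4,
  mbfc: 0.4,
  wikipedia: 0.2,
};

// Weighted average over whichever databases cover the domain.
// Returns null when no database has an entry.
function credibilityScore(scores: DatabaseScore[]): number | null {
  let weightedSum = 0;
  let weightTotal = 0;
  for (const { source, score } of scores) {
    weightedSum += WEIGHTS[source] * score;
    weightTotal += WEIGHTS[source];
  }
  return weightTotal === 0 ? null : weightedSum / weightTotal;
}

// Flags a manual score that differs from the external score by more than
// `threshold` points (the default is an assumed value).
function disagrees(manual: number, external: number, threshold = 1.5): boolean {
  return Math.abs(manual - external) > threshold;
}
```

Dividing by the sum of the weights actually used means a domain covered by only one or two databases still gets a score on the same scale.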

Current coverage

We currently track ~1,000 source domains (the "1,400 sources" elsewhere counts feeds — some domains provide several):

Source coverage by verification method
Method	Domains	What it means
External databases	~270 (27%)	Score backed by IDIAP, MBFC, and/or Wikipedia. Nearly all confirmed by 2+ independent sources.
Editorial review	~650 (65%)	Score assigned by our editor. These are our judgment calls, not independently verified.
Unscored	~80 (8%)	In our collection but no credibility data available. Shown without a score.

Source type

Beyond reliability, we also classify sources by type:

Wire service: Reuters, AP, AFP
Academic: Nature, The Lancet, arXiv
Public broadcaster: BBC, NOS, NPO
NGO / non-profit: Positive News, Solutions Journalism Network
Newspaper: The Guardian, El País, de Volkskrant
Government / institutional: EU, WHO, NIH

Our editorial stance

We show all tiers. We don't hide articles from curated or unknown sources.
No score doesn't mean unreliable. It means we couldn't verify.
A good source can publish a bad article. We assess at the domain level, not per article.

References

[1] Pesaranghader, A. & Li, E. (2026). "Hallucination Detection and Mitigation in Large Language Models." arXiv:2601.09929.
[2] Saxena, H. (2025). "Hallucination in Generative Artificial Intelligence: Challenges, Causes, and Mitigation Strategies." SSRN 5976335.
[3] Li, Y. et al. (2025). "Mitigating Hallucination in Large Language Models: An Application-Oriented Survey on RAG, Reasoning, and Agentic Systems." arXiv:2510.24476.
[4] Roig, J.V. (2026). "How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms." arXiv:2603.08274.
[5] Zerilli, J., Bhatt, U. & Weller, A. (2022). "How transparency modulates trust in artificial intelligence." Patterns, 3(4). doi:10.1016/j.patter.2022.100455.
[6] Dang, A.-H., Tran, V. & Nguyen, L.-M. (2025). "Survey and analysis of hallucinations in large language models: attribution to prompting strategies or model behavior." Frontiers in Artificial Intelligence. doi:10.3389/frai.2025.1622292.

Last updated: July 2026