SourceScore

Content moderation — fact-check LLM outputs before publishing

Editorial AI tools, content-generation platforms, and publishing assistants can ship hallucinated facts to thousands of readers. Add a verification step between draft and publish.

The problem

AI content tools — newsletter generators, blog assistants, report drafters, automated summary tools — generate fluent text fast. Their failure mode at scale is shipping hallucinated facts to thousands of readers.

Examples of damage at scale:

  • Tech newsletter auto-summarizes a paper, gets the author name wrong.
  • Industry-report tool cites a release date that's 2 years off.
  • Blog assistant attributes a quote to the wrong founder.
  • Marketing copy generator invents a non-existent integration.

The reader doesn't know it's wrong. The error compounds — gets reshared, quoted, indexed. Months later you're searching for "Llama 3 released 2025" and seeing your own incorrect content cited back at you.

The pattern

Pre-publish verification gate. Three steps between LLM draft and publish button:

  1. Extract atomic claims. Parse the draft into discrete assertions (dates, names, numbers, attributions).
  2. Verify each claim. Against domain catalogs + cross-reference checks. AI/ML claims via SourceScore VERITAS; other domains via Wikipedia + Wolfram + custom catalogs.
  3. Flag or strip unverified claims. Either route the draft to human review with unverified claims highlighted, or auto-strip and let the LLM regenerate without those claims.

Implementation

# Python — content moderation pipeline
import re
from dataclasses import dataclass
from typing import Literal

import httpx

@dataclass
class ClaimCheck:
    text: str
    status: Literal["verified", "unverified", "refuted"]
    source_url: str | None = None
    confidence: float | None = None

def extract_factual_claims(draft: str) -> list[str]:
    # Naive: extract sentences with proper nouns + numbers + dates
    # Production: use a dedicated claim-extraction model
    sentences = re.split(r'(?<=[.!?])\s+', draft)
    return [
        s for s in sentences
        if re.search(r'\b\d{4}\b|\b[A-Z][a-z]+\s+[A-Z][a-z]+\b', s)
    ]

def verify_aiml(claim: str) -> ClaimCheck:
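    # Check one claim against the SourceScore VERITAS catalog; anything below
    # the confidence threshold comes back as unverified for human review.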
    r = httpx.post(
        "https://sourcescore.org/api/v1/verify",
        json={"claim": claim, "minConfidence": 0.85},
        timeout=2.0,
    )
    result = r.json()
    match = result.get("bestMatch")
    if match and match["confidence"] >= 0.85:
        return ClaimCheck(
            text=claim,
            status="verified",
            source_url=match["detailUrl"],
            confidence=match["confidence"],
        )
    return ClaimCheck(text=claim, status="unverified")

def moderate(draft: str) -> dict:
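    # Full gate: extract claims, verify each, and report whether the draft
    # can auto-publish or needs human review.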
    claims = extract_factual_claims(draft)
    checks = [verify_aiml(c) for c in claims]

    return {
        "draft": draft,
        "claims_checked": len(checks),
        "verified_count": sum(1 for c in checks if c.status == "verified"),
        "unverified_count": sum(1 for c in checks if c.status != "verified"),
        "unverified_claims": [c.text for c in checks if c.status != "verified"],
        "can_auto_publish": all(c.status == "verified" for c in checks),
    }

# In your editorial workflow (publish and route_to_human_review are your own hooks):
result = moderate(llm_draft)
if result["can_auto_publish"]:
    publish(result["draft"])
else:
    route_to_human_review(result["draft"], result["unverified_claims"])
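
verify_aiml only covers AI/ML claims. Step 2 of the pattern also routes other domains to Wikipedia, Wolfram, or custom catalogs; a minimal routing sketch (not part of the SourceScore API; the non-AI/ML verifier below is a stub you would wire up to your own sources, and the keyword heuristic is only a placeholder):

# Python — domain routing for step 2 (sketch; the non-AI/ML path is a stub)
AIML_HINTS = ("model", "llm", "transformer", "dataset", "benchmark", "gpt", "llama")

def verify_other(claim: str) -> ClaimCheck:
    # Stub: wire up Wikipedia / Wolfram / your own catalog here.
    # Until then, non-AI/ML claims fall through to human review.
    return ClaimCheck(text=claim, status="unverified")

def verify_claim(claim: str) -> ClaimCheck:
    # Naive keyword routing; production would use a proper domain classifier.
    if any(hint in claim.lower() for hint in AIML_HINTS):
        return verify_aiml(claim)
    return verify_other(claim)

In moderate(), swap verify_aiml for verify_claim so non-AI/ML claims get routed instead of all going to the AI/ML catalog.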

Use across editorial workflows

  • Newsletter platforms. Pre-flight every AI-generated section. Show editors a list of unverified claims with one-click strike-through.
  • Auto-summary tools. Strip claims with confidence < 0.85; let the LLM rewrite around the strikes (see the sketch after this list).
  • SEO-content platforms. Block publish on any unverified factual assertion. Force the writer to either find a source or rephrase.
  • Internal company comms. Verify before sending all-hands or external comms drafted by AI.
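
For auto-summary tools, the strip step can reuse the same sentence split as extraction: drop the sentences that failed verification, then hand the shorter draft back to the model to smooth. A minimal sketch, assuming claims map one-to-one to sentences as in extract_factual_claims above:

# Python — strip unverified claims before regeneration (sketch)
def strip_unverified(draft: str, checks: list[ClaimCheck]) -> str:
    # Claims are whole sentences (see extract_factual_claims), so stripping
    # is just dropping the sentences that did not verify.
    unverified = {c.text for c in checks if c.status != "verified"}
    sentences = re.split(r'(?<=[.!?])\s+', draft)
    return " ".join(s for s in sentences if s not in unverified)

The stripped draft goes back to the LLM with an instruction like "rewrite for flow; do not add new factual claims", then through moderate() again before publish.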

What this catches vs misses

Catches well:

  • Wrong dates (Llama 3 released 2025 — wrong)
  • Wrong attributions (Transformer paper by Hinton — wrong)
  • Hallucinated specs (32k context window when source says 128k)
  • Made-up citations

Doesn't catch:

  • Plausible-sounding new claims not in any catalog (genuine ambiguity)
  • Style + tone issues
  • Bias + misleading framing of correct facts
  • Plagiarism / verbatim copy from a source

Free-tier viability

The SourceScore VERITAS free tier covers up to 1,000 claim verifications per month. A newsletter publishing 4 issues per week with ~5 verifiable claims per issue runs about 80 verifications a month, comfortably within the free tier. Higher volumes scale to paid tiers (€19-€499/month).
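
If you want to sanity-check your own volume against the cap, the arithmetic is trivial (the issue and claim counts below are assumptions; plug in your own):

# Python — back-of-the-envelope volume check against the free-tier cap
FREE_TIER_MONTHLY_CAP = 1_000

def monthly_verifications(issues_per_week: int, claims_per_issue: int) -> int:
    return issues_per_week * claims_per_issue * 4  # ~4 publishing weeks per month

print(monthly_verifications(4, 5))                           # 80
print(monthly_verifications(4, 5) < FREE_TIER_MONTHLY_CAP)   # True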

Related