SourceScore

Content moderation — fact-check LLM outputs before publishing

Editorial AI tools, content-generation platforms, and publishing assistants can ship hallucinated facts to thousands of readers. Add a verification step between draft and publish.

The problem

AI content tools — newsletter generators, blog assistants, report drafters, automated summary tools — generate fluent text fast. Their failure mode at scale is shipping hallucinated facts to thousands of readers.

Examples of damage at scale:

  • Tech newsletter auto-summarizes a paper, gets the author name wrong.
  • Industry-report tool cites a release date that's 2 years off.
  • Blog assistant attributes a quote to the wrong founder.
  • Marketing copy generator invents a non-existent integration.

The reader doesn't know it's wrong. The error compounds — gets reshared, quoted, indexed. Months later you're searching for "Llama 3 released 2025" and seeing your own incorrect content cited back at you.

The pattern

Pre-publish verification gate. Three steps between LLM draft and publish button:

  1. Extract atomic claims. Parse the draft into discrete assertions (dates, names, numbers, attributions).
  2. Verify each claim. Against domain catalogs + cross-reference checks. AI/ML claims via SourceScore VERITAS; other domains via Wikipedia + Wolfram + custom catalogs.
  3. Flag or strip unverified claims. Either route the draft to human review with unverified claims highlighted, or auto-strip and let the LLM regenerate without those claims.

Implementation

# Python — content moderation pipeline
import re
from dataclasses import dataclass
from typing import Literal

import httpx

@dataclass
class ClaimCheck:
    text: str
    status: Literal["verified", "unverified", "refuted"]
    source_url: str | None = None
    confidence: float | None = None

def extract_factual_claims(draft: str) -> list[str]:
    # Naive: extract sentences with proper nouns + numbers + dates
    # Production: use a dedicated claim-extraction model
    sentences = re.split(r'(?<=[.!?])\s+', draft)
    return [
        s for s in sentences
        if re.search(r'\b\d{4}\b|\b[A-Z][a-z]+\s+[A-Z][a-z]+\b', s)
    ]

def verify_aiml(claim: str) -> ClaimCheck:
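    # Check one claim against the SourceScore VERITAS catalog; anything below
    # the confidence threshold comes back as unverified for human review.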
    r = httpx.post(
        "https://sourcescore.org/api/v1/verify",
        json={"claim": claim, "minConfidence": 0.85},
        timeout=2.0,
    )
    result = r.json()
    match = result.get("bestMatch")
    if match and match["confidence"] >= 0.85:
        return ClaimCheck(
            text=claim,
            status="verified",
            source_url=match["detailUrl"],
            confidence=match["confidence"],
        )
    return ClaimCheck(text=claim, status="unverified")

def moderate(draft: str) -> dict:
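    # Full gate: extract claims, verify each, and report whether the draft
    # can auto-publish or needs human review.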
    claims = extract_factual_claims(draft)
    checks = [verify_aiml(c) for c in claims]

    return {
        "draft": draft,
        "claims_checked": len(checks),
        "verified_count": sum(1 for c in checks if c.status == "verified"),
        "unverified_count": sum(1 for c in checks if c.status != "verified"),
        "unverified_claims": [c.text for c in checks if c.status != "verified"],
        "can_auto_publish": all(c.status == "verified" for c in checks),
    }

# In your editorial workflow (publish and route_to_human_review are your own hooks):
result = moderate(llm_draft)
if result["can_auto_publish"]:
    publish(result["draft"])
else:
    route_to_human_review(result["draft"], result["unverified_claims"])
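
verify_aiml only covers AI/ML claims. Step 2 of the pattern also routes other domains to Wikipedia, Wolfram, or custom catalogs; a minimal routing sketch (not part of the SourceScore API; the non-AI/ML verifier below is a stub you would wire up to your own sources, and the keyword heuristic is only a placeholder):

# Python — domain routing for step 2 (sketch; the non-AI/ML path is a stub)
AIML_HINTS = ("model", "llm", "transformer", "dataset", "benchmark", "gpt", "llama")

def verify_other(claim: str) -> ClaimCheck:
    # Stub: wire up Wikipedia / Wolfram / your own catalog here.
    # Until then, non-AI/ML claims fall through to human review.
    return ClaimCheck(text=claim, status="unverified")

def verify_claim(claim: str) -> ClaimCheck:
    # Naive keyword routing; production would use a proper domain classifier.
    if any(hint in claim.lower() for hint in AIML_HINTS):
        return verify_aiml(claim)
    return verify_other(claim)

In moderate(), swap verify_aiml for verify_claim so non-AI/ML claims get routed instead of all going to the AI/ML catalog.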

Use across editorial workflows

  • Newsletter platforms. Pre-flight every AI-generated section. Show editors a list of unverified claims with one-click strike-through.
  • Auto-summary tools. Strip claims with confidence < 0.85; let the LLM rewrite around the strikes (see the sketch after this list).
  • SEO-content platforms. Block publish on any unverified factual assertion. Force the writer to either find a source or rephrase.
  • Internal company comms. Verify before sending all-hands or external comms drafted by AI.
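
For auto-summary tools, the strip step can reuse the same sentence split as extraction: drop the sentences that failed verification, then hand the shorter draft back to the model to smooth. A minimal sketch, assuming claims map one-to-one to sentences as in extract_factual_claims above:

# Python — strip unverified claims before regeneration (sketch)
def strip_unverified(draft: str, checks: list[ClaimCheck]) -> str:
    # Claims are whole sentences (see extract_factual_claims), so stripping
    # is just dropping the sentences that did not verify.
    unverified = {c.text for c in checks if c.status != "verified"}
    sentences = re.split(r'(?<=[.!?])\s+', draft)
    return " ".join(s for s in sentences if s not in unverified)

The stripped draft goes back to the LLM with an instruction like "rewrite for flow; do not add new factual claims", then through moderate() again before publish.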

What this catches vs misses

Catches well:

  • Wrong dates (Llama 3 released 2025 — wrong)
  • Wrong attributions (Transformer paper by Hinton — wrong)
  • Hallucinated specs (32k context window when source says 128k)
  • Made-up citations

Doesn't catch:

  • Plausible-sounding new claims not in any catalog (genuine ambiguity)
  • Style + tone issues
  • Bias + misleading framing of correct facts
  • Plagiarism / verbatim copy from a source

Free-tier viability

The SourceScore VERITAS free tier covers up to 1,000 claim verifications per month. A newsletter publishing 4 issues per week with ~5 verifiable claims per issue runs about 80 verifications a month, comfortably within the free tier. Higher volumes scale to paid tiers (€19-€499/month).
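
If you want to sanity-check your own volume against the cap, the arithmetic is trivial (the issue and claim counts below are assumptions; plug in your own):

# Python — back-of-the-envelope volume check against the free-tier cap
FREE_TIER_MONTHLY_CAP = 1_000

def monthly_verifications(issues_per_week: int, claims_per_issue: int) -> int:
    return issues_per_week * claims_per_issue * 4  # ~4 publishing weeks per month

print(monthly_verifications(4, 5))                           # 80
print(monthly_verifications(4, 5) < FREE_TIER_MONTHLY_CAP)   # True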

Related