SourceScore

Research citation — programmatic citations for AI/ML research tools

Stable claim IDs, primary sources with verbatim excerpts, HMAC signatures for reproducibility. The verification layer for academic AI assistants, literature-review agents, and citation-required tooling.

The problem

You're building an AI tool for researchers — literature review assistant, citation finder, paper-summary chatbot, academic search. Your users care about citation accuracy far more than a typical LLM-app user does. A wrong date or a fabricated author isn't a UX bug; it's a credibility-destroying incident.

Standard RAG over arXiv produces fluent summaries but hallucinates dates, authors, and methodology details. Researchers notice. Trust collapses fast.

The pattern

Three properties researchers need that SourceScore's VERITAS verification layer provides out of the box:

  1. Stable claim IDs. Every claim has a 16-hex identifier (e.g., a1b2c3d4...) derived from SHA-256 of canonical fields. Cite the ID in a paper and it resolves to the same envelope in 3 years.
  2. Verbatim excerpts. Every source includes a quoted excerpt from the primary source, captured at verification time. Even if the source URL rots, you have the original text.
  3. HMAC signatures. Every envelope is signed with HMAC-SHA256. Tampering detectable. Audit-trail-friendly.
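The ID derivation and signing described above can be sketched in a few lines. This is an illustrative sketch, not the service's actual implementation: the exact canonical-field set and serialization scheme are assumptions (sorted-key JSON here), and the signing key shown is a placeholder.

```python
import hashlib
import hmac
import json

def claim_id(canonical_fields: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) hashed with SHA-256,
    # truncated to a 16-hex identifier. Field order never changes the ID.
    canonical = json.dumps(canonical_fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

def sign_envelope(envelope: dict, key: bytes) -> str:
    # HMAC-SHA256 over the same canonical serialization of the envelope.
    payload = json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()
```

Because the serialization is canonical, the same fields always yield the same 16-hex ID, which is what makes a citation stable across years.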

Example: literature-review assistant

import httpx

# User asks: "What pretraining methods preceded BERT?"
# Your assistant retrieves relevant papers from arXiv.
# Before responding, verify each factual assertion.

assertions_to_check = [
    "BERT was introduced in 2019 by Devlin et al.",
    "T5 was introduced by Raffel et al. in 2020",
    "RoBERTa was introduced by Liu et al. at Facebook AI in 2019",
]

verified_citations = []
for claim in assertions_to_check:
    r = httpx.post(
        "https://sourcescore.org/api/v1/verify",
        json={"claim": claim, "minConfidence": 0.85},
        timeout=30.0,
    )
    r.raise_for_status()
    result = r.json()
    if result.get("bestMatch"):
        verified_citations.append({
            "claim": claim,
            "id": result["bestMatch"]["id"],
            "source_urls": [s["url"] for s in result["bestMatch"]["sources"]],
            "excerpts": [s.get("excerpt") for s in result["bestMatch"]["sources"]],
            "confidence": result["bestMatch"]["confidence"],
            "signature": result["signature"],
        })

# Now your assistant cites:
#   "BERT (Devlin et al., 2019) [^1]"
# Where [^1] resolves to a citation block with:
#   - Stable ID: a1b2c3d4...
#   - Primary source: https://arxiv.org/abs/1810.04805
#   - Verbatim excerpt from the abstract
#   - HMAC signature verifiable against did:web:sourcescore.org
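A client can check the returned signature before trusting the envelope. A minimal sketch, assuming the service HMACs a sorted-key JSON serialization of the envelope and that you hold the verification key (in practice the key material would be resolved via did:web:sourcescore.org; both details are assumptions here):

```python
import hashlib
import hmac
import json

def signature_matches(envelope: dict, signature_hex: str, key: bytes) -> bool:
    # Recompute HMAC-SHA256 over a canonical serialization and compare in
    # constant time; any edit to the envelope flips the result to False.
    payload = json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```

`hmac.compare_digest` avoids timing side channels that a plain `==` comparison would leak.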

Citation export format

For researchers who need machine-readable citations:

# BibTeX-style export for a VERITAS claim
@misc{sourcescore_a1b2c3d4,
  title = {SourceScore VERITAS verified claim a1b2c3d4},
  publisher = {SourceScore},
  year = {2026},
  url = {https://sourcescore.org/claims/a1b2c3d4/},
  note = {Verified against primary sources: [URL1, URL2]. HMAC-SHA256 signature.},
}
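A small helper can render that export from a verified citation. A sketch of a formatter for the template above (the helper name and its parameters are illustrative, not part of any SourceScore API):

```python
def to_bibtex(claim_id: str, source_urls: list[str], year: int = 2026) -> str:
    # Fill the BibTeX-style template with a claim ID, its stable claim URL,
    # and the primary-source URLs recorded in the envelope.
    urls = ", ".join(source_urls)
    return (
        f"@misc{{sourcescore_{claim_id},\n"
        f"  title = {{SourceScore VERITAS verified claim {claim_id}}},\n"
        f"  publisher = {{SourceScore}},\n"
        f"  year = {{{year}}},\n"
        f"  url = {{https://sourcescore.org/claims/{claim_id}/}},\n"
        f"  note = {{Verified against primary sources: [{urls}]. HMAC-SHA256 signature.}},\n"
        f"}}"
    )
```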

What the catalog covers

v0.1 catalog (~206 claims spanning 1997-2025) covers AI/ML research:

  • Foundational papers — Transformer, LSTM, BERT, RLHF, RAG, LoRA, etc.
  • Model releases — GPT family, Claude family, Llama family, Gemini, Mistral, DeepSeek, Phi, etc.
  • Benchmarks + datasets — MMLU, GLUE, ImageNet, C4, The Pile, etc.
  • Frameworks + libraries — PyTorch, TensorFlow, JAX, LangChain, LlamaIndex, etc.
  • Organizations — OpenAI, Anthropic, DeepMind, Mistral, Hugging Face, etc.

Out of scope for v0: papers in scientific computing, cybersecurity, and biology (planned for Y2), and performance comparisons (see why we don't ship those).

License

Verified-claim data is CC-BY 4.0. Cite as: SourceScore Claim <id>, sourcescore.org. You may redistribute, re-publish, and build derivatives, provided you preserve attribution. The methodology is proprietary; the claim data is open.

For academic submissions

When citing VERITAS-verified claims in formal papers, the recommended citation is:

SourceScore VERITAS (2026). Verified claim <id>. https://sourcescore.org/claims/<id>/

The stable URL + verbatim excerpt + HMAC signature mean a reviewer in 2030 can re-verify the claim against the same primary sources you cited.

Integration guides

  • DSPy — for research workflows with optimizers
  • LangChain — retrieve-then-cite pattern
  • LlamaIndex — for paper-corpus RAG
  • Pydantic AI — type-safe structured citation output

Related