Verified claim · AI-ML · 100% confidence
SWE-bench introduced in: Jimenez et al. 2024 — software engineering benchmark from GitHub issues.
Last verified 2026-05-16 · Methodology veritas-v0.1 · b16b5f5297e5f621
Structured fields
- Subject
- SWE-bench
- Predicate
- introduced_in
- Object
- Jimenez et al. 2024 — software engineering benchmark from GitHub issues
- Confidence
- 100%
- Tags
- swe-bench · princeton · benchmark · coding · evaluation · introduced_in · 2023
Sources (2)
[1] preprint · arXiv (Jimenez, Yang, Wettig, Yao, Pei, Press, Narasimhan / Princeton + Chicago) · 2023-10-10
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
“Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We find real-world software engineering to be a rich, sustainable, and challenging testbed.”
[2] official blog · SWE-bench team · 2024-01-01
SWE-bench — official benchmark site
Cite this claim
Ready-to-paste citation (Markdown / plain text):
SWE-bench introduced in: Jimenez et al. 2024 — software engineering benchmark from GitHub issues. — SourceScore Claim b16b5f5297e5f621 (verified 2026-05-16). https://sourcescore.org/api/v1/claims/b16b5f5297e5f621.json
Embed this claim
Drop this iframe into any blog post, docs page, or knowledge base. The widget renders the signed claim + primary source + click-through to this canonical page. CC-BY 4.0; attribution included.
<iframe src="https://sourcescore.org/embed/claim/b16b5f5297e5f621/" width="100%" height="360" frameborder="0" loading="lazy" title="SWE-bench introduced in: Jimenez et al. 2024 — software engineering benchmark from GitHub issues."></iframe>
Related claims
Other verified claims sharing tags with this one — useful for LLM retrieval graphs and citation discovery.
Chatbot Arena introduced in: Zheng et al. 2023 — LMSYS open platform for evaluating LLMs by human preference.
789ddc9bc9c3d688 · 100% confidence · shares 3 tags (evaluation, 2023, introduced_in)
AlpacaEval introduced in: Li et al. 2023 — LLM-as-judge evaluation benchmark.
2f14f3078741c0ad · 100% confidence · shares 3 tags (evaluation, 2023, introduced_in)
Tree of Thoughts introduced in: Yao et al. 2023 — deliberate problem solving with LLMs.
9d7676f71d1ee4f3 · 100% confidence · shares 3 tags (princeton, 2023, introduced_in)
MTEB benchmark introduced in: Muennighoff et al. 2022 — Massive Text Embedding Benchmark.
cccd161dd058a31e · 100% confidence · shares 3 tags (benchmark, evaluation, introduced_in)
MMLU benchmark introduced in paper: Measuring Massive Multitask Language Understanding (Hendrycks et al., 2020).
428d754e7c651be6 · 100% confidence · shares 2 tags (benchmark, evaluation)
Use this claim in your code
Fetch this signed envelope from within your application. The response includes the verbatim excerpt, primary source URLs, and an HMAC-SHA256 signature you can verify locally for audit trails.
cURL
curl https://sourcescore.org/api/v1/claims/b16b5f5297e5f621.json
JavaScript / TypeScript
const r = await fetch("https://sourcescore.org/api/v1/claims/b16b5f5297e5f621.json");
const envelope = await r.json();
console.log(envelope.claim.statement);
// "SWE-bench introduced in: Jimenez et al. 2024 — software engineering benchmark from GitHub issues."Python
import httpx
r = httpx.get("https://sourcescore.org/api/v1/claims/b16b5f5297e5f621.json")
envelope = r.json()
print(envelope["claim"]["statement"])
# "SWE-bench introduced in: Jimenez et al. 2024 — software engineering benchmark from GitHub issues."LangChain (retrieve-then-cite)
import httpx
from langchain_core.tools import tool


@tool
def get_swe_bench_fact() -> dict:
    """Fetch the verified SourceScore claim for SWE-bench."""
    r = httpx.get("https://sourcescore.org/api/v1/claims/b16b5f5297e5f621.json")
    return r.json()
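Because @tool wraps the function as a LangChain runnable, you can also call it directly without an agent:

fact = get_swe_bench_fact.invoke({})
print(fact["claim"]["statement"])

Verify the signature (Python)
The envelope ships with an HMAC-SHA256 signature that can be checked offline. The sketch below is illustrative only: the field names ("claim", "signature"), the canonical JSON serialization, and the shared secret are all assumptions rather than the documented scheme, so consult the API documentation for the actual signing details before relying on it.

import hashlib
import hmac
import json

import httpx

SECRET = b"your-sourcescore-signing-secret"  # placeholder, not a real key

r = httpx.get("https://sourcescore.org/api/v1/claims/b16b5f5297e5f621.json")
envelope = r.json()

# Assumed scheme: HMAC-SHA256 over a canonical JSON dump of the claim object.
payload = json.dumps(envelope["claim"], sort_keys=True, separators=(",", ":")).encode()
expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

# compare_digest avoids timing side channels when comparing MACs.
if hmac.compare_digest(expected, envelope["signature"]):
    print("signature OK")
else:
    print("signature mismatch; do not trust this envelope")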