SWE-bench
The authoritative benchmark for evaluating AI coding agents
Project Information
Category: Competition
Released: 2024
Developer: Princeton University, University of Chicago
License: Open Source
Tags: benchmark, evaluation, coding
About
SWE-bench collects real software engineering problems from GitHub repositories: each task pairs a repository snapshot with an issue, and a candidate solution is a patch that must make the repository's failing tests pass. The benchmark paper was accepted as an oral at ICLR 2024. It ships in multiple subsets: SWE-bench (the full set of 2,294 problems), Lite (300 problems), Verified (500 human-validated problems), and Pro (1,865 enterprise-level problems).
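
Tasks are straightforward to inspect programmatically. Below is a minimal sketch using the Hugging Face datasets library; the dataset ID and field names follow the public princeton-nlp releases and are assumptions that may change between dataset versions.

    # Minimal sketch: inspect SWE-bench tasks via Hugging Face `datasets`.
    # Dataset ID and field names assume the public princeton-nlp releases.
    from datasets import load_dataset

    # Load the human-validated Verified subset (500 problems).
    ds = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

    task = ds[0]
    print(task["instance_id"])        # e.g. "astropy__astropy-12907"
    print(task["repo"])               # source GitHub repository
    print(task["base_commit"])        # commit the candidate patch is applied to
    print(task["problem_statement"])  # the GitHub issue text given to the agent

A submitted patch is scored by applying it at the task's base commit and running the repository's test suite; a task counts as resolved only if the designated failing tests pass without breaking previously passing ones.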
It is now the de facto standard benchmark for AI coding agents: top agents' resolve rates climbed from roughly 20% to over 74% between 2024 and 2025.