SWE-bench

The authoritative benchmark for evaluating AI coding agents

Project Information

Category: Competition
Released: 2024
Developer: Princeton, CMU
License: Open Source
Tags: benchmark, evaluation, coding

About

SWE-bench collects real software engineering problems from GitHub repositories. The paper was accepted as an oral presentation at ICLR 2024. The benchmark includes several subsets: SWE-bench (full), SWE-bench Lite, SWE-bench Verified (500 problems), and SWE-bench Pro (1,865 enterprise-level problems).
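As a rough sketch of what a SWE-bench task instance carries and how a resolution rate is computed (the field names follow the publicly documented dataset schema, but the class, helper function, and concrete numbers here are illustrative assumptions, not the official harness):

```python
from dataclasses import dataclass

# Hypothetical model of a SWE-bench task instance; field names mirror
# the public dataset schema, but this is an illustration, not a loader.
@dataclass
class SWEBenchInstance:
    instance_id: str        # e.g. a repo-qualified issue identifier
    repo: str               # GitHub repository the issue comes from
    base_commit: str        # commit to check out before applying a fix
    problem_statement: str  # the GitHub issue text shown to the agent
    patch: str              # gold reference patch (hidden from the agent)
    test_patch: str         # tests that must pass after the fix

def resolution_rate(resolved: int, total: int) -> float:
    """Fraction of instances whose fail-to-pass tests succeed."""
    return resolved / total

# A hypothetical run resolving 370 of the 500 Verified instances:
print(f"{resolution_rate(370, 500):.0%}")  # prints "74%"
```

An agent is scored by checking out `base_commit`, applying its generated patch, and running the tests from `test_patch`; the headline percentages are resolution rates of this kind.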

SWE-bench is now the de facto standard benchmark for AI coding agents; top agents improved from roughly 20% to over 74% resolution between 2024 and 2025.
