SWE-bench

The authoritative benchmark for evaluating AI coding agents

Project Information

Category: Competition
Released: 2024
Developer: Princeton, CMU
License: Open Source
Tags: benchmark, evaluation, coding

About

SWE-bench collects real software engineering problems from GitHub repositories. The paper was accepted as an oral presentation at ICLR 2024. The benchmark includes several subsets: SWE-bench (full), SWE-bench Lite, SWE-bench Verified (500 problems), and SWE-bench Pro (1,865 enterprise-level problems).
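As a rough sketch of what a SWE-bench task instance carries and how a resolution rate is computed (the field names follow the publicly documented dataset schema, but the class, helper function, and concrete numbers here are illustrative assumptions, not the official harness):

```python
from dataclasses import dataclass

# Hypothetical model of a SWE-bench task instance; field names mirror
# the public dataset schema, but this is an illustration, not a loader.
@dataclass
class SWEBenchInstance:
    instance_id: str        # e.g. a repo-qualified issue identifier
    repo: str               # GitHub repository the issue comes from
    base_commit: str        # commit to check out before applying a fix
    problem_statement: str  # the GitHub issue text shown to the agent
    patch: str              # gold reference patch (hidden from the agent)
    test_patch: str         # tests that must pass after the fix

def resolution_rate(resolved: int, total: int) -> float:
    """Fraction of instances whose fail-to-pass tests succeed."""
    return resolved / total

# A hypothetical run resolving 370 of the 500 Verified instances:
print(f"{resolution_rate(370, 500):.0%}")  # prints "74%"
```

An agent is scored by checking out `base_commit`, applying its generated patch, and running the tests from `test_patch`; the headline percentages are resolution rates of this kind.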

SWE-bench is now the de facto standard benchmark for AI coding agents; top agents improved from roughly 20% to over 74% resolution between 2024 and 2025.
