SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?