davidpomerenke
Upload from GitHub Actions: eval: check runtime budget per-batch so a slow model can't blow the 6h cap
594d28a verified
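
The commit message describes checking the runtime budget once per batch, so that a slow model stops early instead of running past the 6h cap. The commit's actual code is not shown here, so what follows is only a minimal sketch of that idea under stated assumptions: the names `run_batches`, `evaluate_batch`, `budget_seconds`, and `clock` are hypothetical and are not taken from the repository.

```python
import time
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_batches(
    batches: Iterable[Sequence[T]],
    evaluate_batch: Callable[[Sequence[T]], List[R]],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[R], bool]:
    """Evaluate batches until they are exhausted or the time budget is spent.

    Hypothetical helper: returns (results, completed), where `completed`
    is False if the loop stopped early because the budget ran out.
    """
    start = clock()
    results: List[R] = []
    for batch in batches:
        # Check the budget before every batch, not once per model,
        # so a slow model cannot keep running long past the cap.
        if clock() - start >= budget_seconds:
            return results, False
        results.extend(evaluate_batch(batch))
    return results, True


if __name__ == "__main__":
    # Assumed budget somewhat below the 6h cap, leaving headroom for one batch.
    budget = 5.5 * 60 * 60
    scores, completed = run_batches([[1, 2], [3, 4]], lambda b: [x * 2 for x in b], budget)
    print(scores, completed)
```

Because the check runs before each batch, a batch that has already started is allowed to finish, so the overrun is bounded by the duration of one batch. Setting the budget below the hard cap, as assumed in the usage example, leaves room for that last batch.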