# code-reranker-v1
A cross-encoder reranker for code search, trained on CodeSearchNet pairs. Experimental: it degrades retrieval quality in our benchmarks. Published for reproducibility.
## Status: Negative Result
This reranker regresses retrieval quality on our hard eval (55 confusable function pairs):
| Config | Recall@1 | Delta |
|---|---|---|
| No reranker | 90.9% | — |
| Web-trained cross-encoder | 80.0% | -10.9pp |
| This model (code-trained) | 9.1% | -81.8pp |
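Recall@1 here is the fraction of the 55 queries whose correct function is ranked first after retrieval (and optional reranking). A minimal sketch of the metric, using hypothetical toy data rather than the actual eval set:

```python
def recall_at_1(ranked_results, gold):
    """Fraction of queries whose top-ranked candidate is the gold item.

    ranked_results: dict mapping query -> list of candidate ids, best first
    gold: dict mapping query -> correct candidate id
    """
    hits = sum(
        1 for q, ranking in ranked_results.items()
        if ranking and ranking[0] == gold[q]
    )
    return hits / len(ranked_results)

# Hypothetical toy example (not the actual 55-pair eval):
ranked = {"q1": ["f_a", "f_b"], "q2": ["f_c", "f_d"], "q3": ["f_x", "f_e"]}
gold = {"q1": "f_a", "q2": "f_d", "q3": "f_x"}
print(round(recall_at_1(ranked, gold), 3))  # → 0.667
```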
Root cause: Trained with random same-language negatives, which are too easy for cross-encoders. The model learns surface-level language patterns instead of semantic code discrimination. A V2 with BM25 hard negatives may fix this.
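One way a V2 could mine hard negatives is to score each query against same-language candidates with BM25 and keep the top-scoring non-gold functions. A minimal pure-Python sketch with a toy BM25 and hypothetical data; a real pipeline would likely use a library such as `rank_bm25`:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with BM25."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def mine_hard_negatives(query, gold_idx, corpus, n=2):
    """Return indices of the n highest-BM25 non-gold docs for this query."""
    docs_tokens = [doc.split() for doc in corpus]
    scores = bm25_scores(query.split(), docs_tokens)
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if i != gold_idx][:n]

# Hypothetical corpus: the lexically closest non-gold doc becomes the negative.
corpus = [
    "def parse_json path open read json loads",      # gold for the query
    "def parse_yaml path open read yaml safe load",  # lexically similar
    "def sort_list items return sorted items",       # unrelated
]
print(mine_hard_negatives("parse json file from path", 0, corpus, n=1))  # → [1]
```

Unlike a random negative, the mined one shares surface vocabulary with the positive, so the cross-encoder must learn semantic differences rather than language patterns.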
## Training
- Architecture: Cross-encoder (BERT-base)
- Data: 50,000 CodeSearchNet pairs + 7,500 docstring pairs
- Epochs: 3
- Negatives: Random same-language (this was the mistake)
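The negative-sampling mistake is easy to reproduce in a sketch: a randomly drawn function in the same language almost never shares vocabulary with the query, so positives and negatives are separable on surface cues alone. Hypothetical data and helper names:

```python
import random

def random_same_language_negative(gold_idx, corpus_by_lang, lang, rng):
    """The V1 strategy: any other function in the same language is a negative."""
    candidates = [i for i in range(len(corpus_by_lang[lang])) if i != gold_idx]
    return rng.choice(candidates)

rng = random.Random(0)
python_corpus = [
    "def parse_json(path): ...",   # gold for a JSON-parsing query
    "def train_model(x, y): ...",  # trivially distinguishable negatives
    "def send_email(to): ...",
]
neg = random_same_language_negative(0, {"python": python_corpus}, "python", rng)
```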
## Usage (if you want to experiment)

```shell
# In cqs — NOT default, opt-in only
CQS_RERANKER_MODEL=jamie8johnson/code-reranker-v1 cqs "query" --rerank
```
## License
Apache 2.0.