SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
This is the acceptance prediction head for the Llama-2-chat 7B (draft) and Llama-2-chat 70B (target) model pair, trained with weight_mismatch=6 and resnet_num_layers=3. Using it with stop_threshold=0.7 is recommended. See arXiv:2405.19715 for more details.

To load hacky/acchead-llama2-chat-7bx70b directly with Transformers:

```python
# Load the acceptance prediction head directly
from transformers import AutoModel

# dtype="auto" loads the weights in the dtype recorded for the checkpoint instead of float32
model = AutoModel.from_pretrained("hacky/acchead-llama2-chat-7bx70b", dtype="auto")
```
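The recommended stop_threshold can be read as a stopping rule for each speculation round. The sketch below assumes, following the paper's description, that the head outputs a predicted acceptance probability for each drafted token, and that drafting stops once the predicted probability that at least one drafted token will be rejected exceeds the threshold. The function name `candidate_length`, its inputs, and the `max_len` cap are hypothetical illustrations, not the repository's API; see the GitHub repository for the actual implementation.

```python
from typing import Iterable


def candidate_length(
    accept_probs: Iterable[float],
    stop_threshold: float = 0.7,
    max_len: int = 20,  # hypothetical hard cap on the candidate length
) -> int:
    """Return how many drafted tokens to submit to the target model.

    accept_probs[i] is assumed to be the head's predicted probability that
    the i-th drafted token is accepted by the target model.
    """
    prob_all_accepted = 1.0
    length = 0
    for p in accept_probs:
        if length >= max_len:
            break
        # The current token has already been drafted, so it joins the candidate.
        length += 1
        prob_all_accepted *= p
        # Stop once P(at least one rejection so far) exceeds the threshold.
        if 1.0 - prob_all_accepted > stop_threshold:
            break
    return length


# P(no rejection) after three tokens is 0.9 * 0.6 * 0.5 = 0.27, so
# P(at least one rejection) = 0.73 > 0.7 and drafting stops at 3 tokens.
print(candidate_length([0.9, 0.6, 0.5, 0.9]))  # prints 3
```

A higher stop_threshold lets the draft model run longer before verification, trading more wasted draft tokens for fewer target-model calls.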
For usage instructions, see the project's GitHub repository.
@article{huang2024specdec++,
title={SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths},
author={Huang, Kaixuan and Guo, Xudong and Wang, Mengdi},
journal={arXiv preprint arXiv:2405.19715},
year={2024}
}