Using the Inference API with papahawk/keya-560m gives me the error "Error while deserializing header: HeaderTooLarge"
This is why I moved to the larger GPT2-1.5 model for KeyaAI and continued development.
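The "HeaderTooLarge" message matches an error raised by the safetensors library while reading a model file's header. A `.safetensors` file starts with an 8-byte little-endian integer giving the length of a JSON header, and the loader rejects the file when that number is implausibly large. One common way to hit this is a file that is not real weights at all, such as a Git LFS pointer stored in its place. The sketch below inspects those first bytes under these assumptions; `MAX_HEADER_BYTES` is an assumed cap for illustration, not necessarily the value the library uses, and `inspect_safetensors_header` is a hypothetical helper.

```python
import io
import json
import struct
from typing import BinaryIO

# Assumed cap for illustration; the safetensors library enforces its own limit.
MAX_HEADER_BYTES = 100_000_000
LFS_PREFIX = b"version https://git-lfs"


def inspect_safetensors_header(f: BinaryIO) -> dict:
    """Read a safetensors header: an 8-byte little-endian length, then that many bytes of JSON."""
    prefix = f.read(8)
    if len(prefix) < 8:
        return {"ok": False, "reason": "file is shorter than 8 bytes"}
    (header_len,) = struct.unpack("<Q", prefix)
    if header_len > MAX_HEADER_BYTES:
        # Read a few more bytes to recognize a Git LFS pointer stored in place of the weights.
        start = prefix + f.read(len(LFS_PREFIX) - len(prefix))
        if start.startswith(LFS_PREFIX):
            reason = "Git LFS pointer file, not the actual weights"
        else:
            reason = "declared header length is implausibly large"
        return {"ok": False, "reason": reason, "header_len": header_len}
    raw = f.read(header_len)
    if len(raw) < header_len:
        return {"ok": False, "reason": "file ends before the declared header does", "header_len": header_len}
    try:
        header = json.loads(raw)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return {"ok": False, "reason": "header is not valid JSON", "header_len": header_len}
    if not isinstance(header, dict):
        return {"ok": False, "reason": "header is not a JSON object", "header_len": header_len}
    tensors = sorted(name for name in header if name != "__metadata__")
    return {"ok": True, "header_len": header_len, "tensors": tensors}


if __name__ == "__main__":
    # Self-demo with an in-memory Git LFS pointer; for a downloaded file,
    # pass open("path/to/model.safetensors", "rb") instead.
    pointer = b"version https://git-lfs.github.com/spec/v1\noid sha256:abc\nsize 1\n"
    print(inspect_safetensors_header(io.BytesIO(pointer)))
```

Only the first few bytes plus the header are read, so the check stays cheap even for multi-gigabyte weight files.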
If you suspect that the token size is the issue, there are a few steps you can take to address this:
Verify the Token: Ensure that the token you are using is correct. Sometimes, when copying and pasting, additional characters or spaces might be included inadvertently.
Regenerate the Token: If the token seems unusually long or if you suspect it might be corrupted, you can regenerate a new token from the Hugging Face platform. Go to your account settings on the Hugging Face website and generate a new API token.
Token Usage: Make sure you're using the token correctly in your request. The token should be included in the Authorization header as a Bearer token. For example:
Authorization: Bearer YOUR_TOKEN
Test with a Minimal Request: Create a simple, minimal request with only the necessary headers, including the token, and see whether you still encounter the error. This will help isolate whether the token is indeed the problem.
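A minimal request of that kind can be sketched with Python's standard library. The endpoint URL follows the classic serverless Inference API pattern and is an assumption, as is the `{"inputs": ...}` payload shape; `YOUR_TOKEN` is a placeholder and `build_minimal_request` is a hypothetical helper. The sketch only builds and prints the request; the `urlopen` call that would send it is left commented out.

```python
import json
import urllib.request

# Assumption: the classic serverless Inference API URL pattern for this model.
API_URL = "https://api-inference.huggingface.co/models/papahawk/keya-560m"


def build_minimal_request(token: str, prompt: str) -> urllib.request.Request:
    """Build a POST request that carries only the headers the call needs."""
    token = token.strip()  # drop stray spaces or newlines picked up while copying
    body = json.dumps({"inputs": prompt}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_minimal_request("YOUR_TOKEN", "Once upon a time,")
    print(req.get_method(), req.full_url)
    for name, value in req.header_items():
        print(f"{name}: {value}")
    # Uncomment to send it (needs a valid token and network access):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Keeping the request down to two headers means that if the error persists, extra header data in your usual client is unlikely to be the cause.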
Check for Hidden Characters: Sometimes, hidden characters (like newline characters) can sneak into a token when copying from certain interfaces. Inspect the token in a text editor that reveals all characters, or use a script to print each character and its ASCII value.
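The character-by-character check described above might look like this in Python. Reading the token from the `HF_TOKEN` environment variable is an assumption, and so is treating whitespace, non-printable, and non-ASCII characters as suspicious (based on the assumption that a Hugging Face token is plain ASCII starting with `hf_`). The helper names are hypothetical.

```python
import os
import unicodedata


def is_suspicious(ch: str) -> bool:
    """Flag whitespace, control/invisible, and non-ASCII characters."""
    return ch.isspace() or not ch.isprintable() or ord(ch) > 127


def find_suspicious(token: str) -> list[int]:
    """Positions of characters that probably should not be in the token."""
    return [i for i, ch in enumerate(token) if is_suspicious(ch)]


def describe_token(token: str) -> list[str]:
    """One line per character: position, repr, code point (the ASCII value for ASCII), name."""
    lines = []
    for i, ch in enumerate(token):
        name = unicodedata.name(ch, "<no name>")
        flag = "  <-- suspicious" if is_suspicious(ch) else ""
        lines.append(f"{i:3d}  {ch!r:>10}  {ord(ch):6d}  {name}{flag}")
    return lines


if __name__ == "__main__":
    # Assumption: the token is in HF_TOKEN. This prints the token, so don't share the output.
    token = os.environ.get("HF_TOKEN", "")
    for line in describe_token(token):
        print(line)
    print("suspicious positions:", find_suspicious(token) or "none")
```

A trailing newline or a zero-width space copied from a web page shows up here as a flagged line even though it is invisible in most editors.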
Limit Header Data: Aside from the token, ensure that other header data in your request is minimal and necessary. Excessive or large header fields can contribute to the issue.
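To see how much header data a request actually carries, a rough size estimate is enough. The 8 KiB budget below is a hypothetical threshold for illustration, not a documented Hugging Face limit; the example headers, including the oversized cookie, are placeholders, and the helper names are hypothetical.

```python
# Hypothetical threshold for illustration; servers and proxies set their own limits.
HEADER_BUDGET_BYTES = 8 * 1024


def header_size(headers: dict[str, str]) -> int:
    """Approximate wire size of HTTP/1.1 header lines ("Name: value" plus CRLF)."""
    return sum(len(f"{name}: {value}\r\n".encode("utf-8")) for name, value in headers.items())


def largest_headers(headers: dict[str, str], top: int = 3) -> list[str]:
    """Names of the header fields with the longest values, biggest first."""
    return sorted(headers, key=lambda name: len(headers[name]), reverse=True)[:top]


if __name__ == "__main__":
    headers = {
        "Authorization": "Bearer YOUR_TOKEN",
        "Content-Type": "application/json",
        "Cookie": "session=" + "x" * 10_000,  # an oversized field, for demonstration
    }
    total = header_size(headers)
    print(f"total header size: {total} bytes")
    if total > HEADER_BUDGET_BYTES:
        print("over budget; largest fields:", largest_headers(headers))
```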
Use Token Efficiently: If you're making multiple requests, ensure you're reusing the token efficiently and not generating a new token for each request.
If after these checks and adjustments you still face the same issue, it might be helpful to reach out to Hugging Face support for further assistance, as they might provide more context-specific guidance.
