Instructions to use alpindale/landmark-33b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use alpindale/landmark-33b with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="alpindale/landmark-33b", trust_remote_code=True)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("alpindale/landmark-33b", trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained("alpindale/landmark-33b", trust_remote_code=True) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use alpindale/landmark-33b with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "alpindale/landmark-33b" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "alpindale/landmark-33b", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/alpindale/landmark-33b
- SGLang
How to use alpindale/landmark-33b with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "alpindale/landmark-33b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "alpindale/landmark-33b", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "alpindale/landmark-33b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "alpindale/landmark-33b", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use alpindale/landmark-33b with Docker Model Runner:
docker model run hf.co/alpindale/landmark-33b
Landmark Attention LLaMA 33B
This model has been trained using the PEFT LoRA technique with the Landmark Attention method over 200 steps. Model will likely be trained further and updated later on.
Usage
Requires trust_remote_code to be set to True. In oobabooga, you can simply add the --trust_remote_code flag.
You will also need to disable the Add the bos_token to the beginning of prompts option in the settings.
PEFT Checkpoint
You can probably merge the checkpoint with any other LLaMA-based model (provided they're 33B, of course). This repo contains the merged weights, but you can grab the adapter here.
Training Code
You can find the training code here.
- Downloads last month
- 277