Instructions for using alpindale/landmark-33b with libraries and local apps.

Libraries

How to use alpindale/landmark-33b with Transformers:

```python
# Option 1: use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="alpindale/landmark-33b", trust_remote_code=True)

# Option 2: load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("alpindale/landmark-33b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("alpindale/landmark-33b", trust_remote_code=True)
```
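Before loading a 33B model it helps to estimate how much memory the weights alone will take. The sketch below is plain arithmetic (parameters × bytes per parameter). It assumes roughly 32.5e9 parameters for LLaMA-33B and ignores activations, the KV cache, and framework overhead. The `weight_memory_gb` helper is hypothetical and not part of any library.

```python
# Rough, weight-only memory estimate for a model of a given size.
# Assumption: ~32.5e9 parameters for LLaMA-33B. Activations, KV cache and
# framework overhead are NOT included, so real usage will be higher.
BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
    "int8": 1,
}


def weight_memory_gb(num_params: float, dtype: str = "float16") -> float:
    """Return the approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Print the estimate for a few common dtypes.
for name in ("float32", "float16", "int8"):
    print(f"{name:>8}: ~{weight_memory_gb(32.5e9, name):.0f} GB")
```

In half precision this is on the order of 65 GB for the weights, so in practice you would usually pass a lower-precision dtype such as `torch_dtype=torch.float16` to `from_pretrained`, and possibly `device_map="auto"` (which needs the `accelerate` package) to spread the model across available devices.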

Local Apps

vLLM

How to use alpindale/landmark-33b with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (the model requires trust_remote_code):
vllm serve "alpindale/landmark-33b" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "alpindale/landmark-33b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
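The same completion request can be sent from Python with only the standard library. This is a minimal sketch. It assumes a server exposing the OpenAI-compatible `/v1/completions` route on `localhost:8000`, as in the curl example, and that the response follows the OpenAI completions shape (`choices[0].text`). `build_completion_request` and `complete` are hypothetical helper names.

```python
import json
import urllib.request

# Sketch of the curl call as Python code (standard library only).
# Assumptions: an OpenAI-compatible server listens at base_url and returns
# {"choices": [{"text": ...}, ...]} from /v1/completions.


def build_completion_request(prompt: str,
                             base_url: str = "http://localhost:8000",
                             model: str = "alpindale/landmark-33b",
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the text of the first completion choice."""
    req = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the generated continuation. For the SGLang server further down, pass `base_url="http://localhost:30000"`.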

Use Docker

```bash
docker model run hf.co/alpindale/landmark-33b
```

SGLang

How to use alpindale/landmark-33b with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (the model requires trust_remote_code):
python3 -m sglang.launch_server \
    --model-path "alpindale/landmark-33b" \
    --host 0.0.0.0 \
    --port 30000 \
    --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "alpindale/landmark-33b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "alpindale/landmark-33b" \
        --host 0.0.0.0 \
        --port 30000 \
        --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "alpindale/landmark-33b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```


Landmark Attention LLaMA 33B

This model was trained with PEFT LoRA using the Landmark Attention method for 200 steps. It will likely be trained further and updated later.
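According to the Landmark Attention paper cited on this card, the input is split into fixed-length blocks and a special landmark token is appended after each block. The attention a query gives to a block's landmark then decides whether that block's tokens are retrieved. The sketch below shows only the input layout. `BLOCK_SIZE` and `LANDMARK_ID` are hypothetical illustration values; the real values are defined by this model's remote code.

```python
from typing import List

# Sketch of the Landmark Attention input layout: after every full block of
# block_size tokens, one landmark token is appended.
# BLOCK_SIZE and LANDMARK_ID are hypothetical values for illustration only.
BLOCK_SIZE = 50
LANDMARK_ID = 32001  # hypothetical id of the added landmark token


def insert_landmarks(token_ids: List[int],
                     block_size: int = BLOCK_SIZE,
                     landmark_id: int = LANDMARK_ID) -> List[int]:
    """Return a copy of token_ids with landmark_id after every full block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    out: List[int] = []
    for i, tok in enumerate(token_ids, start=1):
        out.append(tok)
        # Close the current block with a landmark once it is full.
        if i % block_size == 0:
            out.append(landmark_id)
    return out
```

A trailing partial block gets no landmark in this sketch. That block is still being filled, so it stays in the local context instead of becoming retrievable.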

Usage

The model requires `trust_remote_code=True`. In oobabooga's text-generation-webui, start the server with the `--trust-remote-code` flag.

You will also need to disable the "Add the bos_token to the beginning of prompts" option in the settings.
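If you call the model from your own code instead of the web UI, you can get the same effect in Transformers by encoding prompts with `tokenizer(prompt, add_special_tokens=False)`. When token ids come from somewhere else, a small helper such as the hypothetical `strip_leading_bos` below can remove leading BOS tokens. It assumes a BOS id of 1, as in the original LLaMA tokenizer. Check `tokenizer.bos_token_id` for the tokenizer you actually load.

```python
from typing import List

# Assumption: BOS id 1, as used by the original LLaMA tokenizer.
LLAMA_BOS_ID = 1


def strip_leading_bos(token_ids: List[int], bos_id: int = LLAMA_BOS_ID) -> List[int]:
    """Drop any BOS tokens at the very start of token_ids; later ones are kept."""
    start = 0
    while start < len(token_ids) and token_ids[start] == bos_id:
        start += 1
    return token_ids[start:]
```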

PEFT Checkpoint

You can probably merge the LoRA adapter into any other LLaMA-based 33B model, since the adapter's layer shapes must match those of the base model. This repo contains the merged weights, but you can grab the adapter here.
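Merging a LoRA adapter adds a scaled low-rank update to each adapted base weight: W_merged = W + (alpha / r) · B · A, where W is d_out × d_in, B is d_out × r, and A is r × d_in. This is also why the base model has to be a 33B LLaMA: the adapter's matrices only fit layers of the same shape. The sketch below shows the arithmetic in pure Python on tiny matrices, and `merge_lora` is a hypothetical name. With real checkpoints, PEFT's `PeftModel.from_pretrained(base_model, adapter_path).merge_and_unload()` does roughly this for every adapted layer.

```python
from typing import List

Matrix = List[List[float]]

# Illustrative LoRA merge: W_merged = W + (alpha / r) * B @ A
# W: d_out x d_in, B: d_out x r, A: r x d_in.


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, refusing mismatched shapes."""
    d_out, d_in = len(w), len(w[0])
    r = len(a)
    shapes_ok = (
        len(b) == d_out
        and all(len(row) == r for row in b)
        and all(len(row) == d_in for row in a)
    )
    if not shapes_ok:
        # This is what goes wrong when the base model has different layer sizes.
        raise ValueError("LoRA shapes do not match the base weight")
    scale = alpha / r
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(d_in)] for i in range(d_out)]
```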

Training Code

You can find the training code here.


Paper

Landmark Attention: Random-Access Infinite Context Length for Transformers (arXiv:2305.16300, published May 25, 2023)