Benchmark Submission: HTTP 429 on all IPs (Bug?)
Hi,
I've been trying to submit predictions to the SteuerEx Benchmark
(https://steuerllm.i5.ai.fau.de/benchmark) for Claude Opus 4,
but every attempt returns:
"You have already submitted 1 time(s). Maximum 1 submission(s) per IP allowed."
I've tried 4 completely different networks/IPs (home WiFi, mobile hotspot,
VPN Zurich, VPN Tirana), and all return the same 429 error. This suggests
the server might be checking something other than the IP, or there's
a bug in the submission system.
My predictions.json is validated (115 answers, all IDs 1001-1115,
UTF-8, no empty entries).
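
For anyone who wants to run the same checks before submitting, here is a minimal sketch of that validation. The file layout is an assumption, since the thread does not show the schema: a JSON list of objects with a hypothetical integer "id" field and a hypothetical string "answer" field. Only the count (115), the ID range (1001-1115), the UTF-8 requirement, and the no-empty-entries rule come from the post above.

```python
import json

# From the post: 115 answers with IDs 1001..1115 inclusive.
EXPECTED_IDS = set(range(1001, 1116))


def validate_predictions(raw: bytes) -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    Assumed (hypothetical) layout: [{"id": 1001, "answer": "..."}, ...].
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]
    try:
        entries = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(entries, list):
        return ["top level is not a list"]

    problems = []
    seen = set()
    for position, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {position} is not an object")
            continue
        entry_id = entry.get("id")
        if not isinstance(entry_id, int):
            problems.append(f"entry {position} has no integer id")
            continue
        if entry_id in seen:
            problems.append(f"duplicate id {entry_id}")
        seen.add(entry_id)
        answer = entry.get("answer")
        if not isinstance(answer, str) or not answer.strip():
            problems.append(f"empty answer for id {entry_id}")

    missing = sorted(EXPECTED_IDS - seen)
    unexpected = sorted(seen - EXPECTED_IDS)
    if missing:
        problems.append(f"missing ids: {missing}")
    if unexpected:
        problems.append(f"unexpected ids: {unexpected}")
    return problems
```

Reading the file in binary mode, for example `validate_predictions(open("predictions.json", "rb").read())`, keeps the UTF-8 check explicit instead of relying on the platform's default encoding.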
Could you look into this? I'd love to test multiple models on the benchmark.
Also, is there any way to get access to the reference statements
for local evaluation?
Thanks for the great benchmark!
It was a configuration issue on our side. Can you try:
curl -s -X POST https://steuerllm.i5.ai.fau.de/benchmark/submit -F "model_name=TestModel" -F "key=XX" -F "file=@/home/user/test_submission.json" 2>&1
{"message":"Submission received and queued for evaluation","queue_position":1,"status_url":"/status/4c112a8e38783049","submission_id":"4c112a8e38783049","success":true}
curl -s https://steuerllm.i5.ai.fau.de/benchmark/status/ID 2>&1 | python3 -m json.tool
{
    "model_name": "TestModel",
    "progress": 42,
    "queue_position": 1,
    "queue_size": 0,
    "status": "evaluating",
    "timestamp": "2026-02-13T10:18:57.815690"
}
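
The status check above can also be polled from a script. This is a rough sketch, not an official client: the status path follows the curl command in this thread (note that the submit response reports status_url as "/status/<id>" while the curl call uses "/benchmark/status/<id>"), and the set of in-progress states is an assumption, because only "evaluating" appears here and "queued" is a guess. The function names, the polling interval, and the poll limit are likewise illustrative choices.

```python
import json
import time
import urllib.request

BASE_URL = "https://steuerllm.i5.ai.fau.de/benchmark"

# Assumption: only "evaluating" is shown in the thread; "queued" is a guess.
# Any other status value is treated as final.
IN_PROGRESS = {"queued", "evaluating"}


def fetch_status(submission_id: str) -> dict:
    """GET the status JSON for one submission (same URL as the curl call)."""
    url = f"{BASE_URL}/status/{submission_id}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_result(submission_id, fetch=fetch_status, interval=30.0,
                    max_polls=120, sleep=time.sleep):
    """Poll until the status leaves the assumed in-progress states."""
    for _ in range(max_polls):
        status = fetch(submission_id)
        if status.get("status") not in IN_PROGRESS:
            return status
        print(f"status={status.get('status')} progress={status.get('progress')}")
        sleep(interval)
    raise TimeoutError(f"submission {submission_id} still running after {max_polls} polls")
```

Calling `wait_for_result("4c112a8e38783049")` with the submission_id from the submit response would print progress every 30 seconds. The `fetch` and `sleep` parameters exist so the loop can be exercised without touching the server.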
The gold answers will be released when the service is taken down. For now, we want to prevent training pollution.
thanks