Ollama support
#1
by ayan4m1 - opened
Loading this model with Ollama currently fails with the following error. This trace is from CPU inference; GPU inference fails with a similar error:
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: ggml.c:4484: GGML_ASSERT(ggml_is_matrix(c)) failed
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: /usr/local/lib/ollama/runners/cpu_avx2/ollama_llama_server(+0x3df3c8)[0x5647481303c8]
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: /usr/local/lib/ollama/runners/cpu_avx2/ollama_llama_server(+0x3dfa45)[0x564748130a45]
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: /usr/local/lib/ollama/runners/cpu_avx2/ollama_llama_server(+0x3e8134)[0x564748139134]
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: /usr/local/lib/ollama/runners/cpu_avx2/ollama_llama_server(+0x4e7006)[0x564748238006]
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: /usr/local/lib/ollama/runners/cpu_avx2/ollama_llama_server(+0x4ed2e9)[0x56474823e2e9]
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: /usr/local/lib/ollama/runners/cpu_avx2/ollama_llama_server(+0x50a7e6)[0x56474825b7e6]
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: /usr/local/lib/ollama/runners/cpu_avx2/ollama_llama_server(+0x50f115)[0x564748260115]
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: /usr/local/lib/ollama/runners/cpu_avx2/ollama_llama_server(+0x37d1df)[0x5647480ce1df]
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: /usr/local/lib/ollama/runners/cpu_avx2/ollama_llama_server(+0x13ebe1)[0x564747e8fbe1]
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: SIGABRT: abort
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: PC=0x7f6610c8bd4c m=4 sigcode=18446744073709551610
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: signal arrived during cgo execution
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: goroutine 36 gp=0xc000106700 m=4 mp=0xc000057508 [syscall]:
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: runtime.cgocall(0x5647480ce190, 0xc000067b78)
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: runtime/cgocall.go:167 +0x4b fp=0xc000067b50 sp=0xc000067b18 pc=0x564747e8252b
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: github.com/ollama/ollama/llama._Cfunc_llama_load_model_from_file(0x7f65b4000c20, {0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x5647480cdba0, 0xc000014>
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: _cgo_gotypes.go:691 +0x50 fp=0xc000067b78 sp=0xc000067b50 pc=0x564747f2cd70
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: github.com/ollama/ollama/llama.LoadModelFromFile.func1({0x7ffc519b8d78?, 0x0?}, {0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x5647480cdba0, 0xc0000140>
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: github.com/ollama/ollama/llama/llama.go:311 +0x127 fp=0xc000067c78 sp=0xc000067b78 pc=0x564747f2f987
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: github.com/ollama/ollama/llama.LoadModelFromFile({0x7ffc519b8d78, 0x6e}, {0x0, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, 0xc000116170, ...})
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: github.com/ollama/ollama/llama/llama.go:311 +0x2d6 fp=0xc000067dc8 sp=0xc000067c78 pc=0x564747f2f676
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: github.com/ollama/ollama/llama/runner.(*Server).loadModel(0xc00013c1b0, {0x0, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, 0xc000116170, 0x0}, ...)
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: github.com/ollama/ollama/llama/runner/runner.go:850 +0xc5 fp=0xc000067f10 sp=0xc000067dc8 pc=0x5647480cb605
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: github.com/ollama/ollama/llama/runner.Execute.gowrap1()
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: github.com/ollama/ollama/llama/runner/runner.go:970 +0xda fp=0xc000067fe0 sp=0xc000067f10 pc=0x5647480ccf5a
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: runtime.goexit({})
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: runtime/asm_amd64.s:1700 +0x1 fp=0xc000067fe8 sp=0xc000067fe0 pc=0x564747e8ff61
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: created by github.com/ollama/ollama/llama/runner.Execute in goroutine 1
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: github.com/ollama/ollama/llama/runner/runner.go:970 +0xd0d
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: runtime/proc.go:424 +0xce fp=0xc0000277b0 sp=0xc000027790 pc=0x564747e8832e
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: runtime.netpollblock(0x10?, 0x47e20b86?, 0x47?)
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: runtime/netpoll.go:575 +0xf7 fp=0xc0000277e8 sp=0xc0000277b0 pc=0x564747e4d097
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: internal/poll.runtime_pollWait(0x7f65c8a39ef0, 0x72)
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: runtime/netpoll.go:351 +0x85 fp=0xc000027808 sp=0xc0000277e8 pc=0x564747e87625
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: internal/poll.(*pollDesc).wait(0xc000176100?, 0x10?, 0x0)
Mar 28 15:12:14 bulletlogic.com ollama[2117519]: internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000027830 sp=0xc000027808 pc=0x564747edd467
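For anyone triaging a similar trace, the useful part is the first line (the failed assertion) and the signal that follows. Here is a minimal Python sketch that pulls those out of journal output. The function name `summarize_crash` is hypothetical, and the prefix pattern only assumes the "date host unit[pid]:" layout seen in the log above; it is not part of Ollama or ggml.

```python
import re

# Assumed journald-style prefix, e.g.
# "Mar 28 15:12:14 bulletlogic.com ollama[2117519]: "
PREFIX = re.compile(r"^\w{3} +\d+ [\d:]{8} \S+ \S+\[\d+\]: ")
# Assertion message as it appears in the trace above
ASSERT = re.compile(r"GGML_ASSERT\((.+)\) failed")


def summarize_crash(lines):
    """Return (source_location, assertion_expr, signal) from journal lines.

    Any element is None if the corresponding line was not found.
    """
    location = expr = signal = None
    for line in lines:
        # Drop the timestamp/host/unit prefix to get the raw message
        msg = PREFIX.sub("", line.strip())
        match = ASSERT.search(msg)
        if match and expr is None:
            # Text before the first ": " is the file:line of the assertion
            location = msg.split(": ", 1)[0]
            expr = match.group(1)
        elif msg.startswith("SIG") and signal is None:
            # e.g. "SIGABRT: abort" -> "SIGABRT"
            signal = msg.split(":", 1)[0]
    return location, expr, signal
```

Feeding it the lines of a trace like the one above (for example, read from a saved log file) gives a three-field summary that is easier to paste into a bug report than the full backtrace.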
Ollama is not currently supported. Will definitely work this out in the next checkpoint.
ayan4m1 changed discussion status to closed