Guide: Using a Custom Fine-Tuned Model with bitnet.cpp
This document outlines the process of downloading a custom fine-tuned model, converting it to the GGUF format, compiling the necessary C++ code, and running inference.
Prerequisites
Before you begin, ensure you have the following prerequisites installed and configured:
- Python 3.9 or later
- CMake 3.22 or later
- A C++ compiler (e.g., clang, g++)
- The Hugging Face Hub CLI (`huggingface-cli`)
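The prerequisites above can be verified with a short script before starting. This is a minimal sketch; the tool names checked (`cmake`, `huggingface-cli`, `clang++`, `g++`) match the list above, but your distribution may package them under different names:

```python
import shutil
import sys

def missing_prerequisites(tools=("cmake", "huggingface-cli")):
    """Return the required command-line tools not found on PATH."""
    missing = [t for t in tools if shutil.which(t) is None]
    # Python 3.9 or later is required by the conversion scripts.
    if sys.version_info < (3, 9):
        missing.append("python>=3.9")
    # At least one C++ compiler must be available.
    if not any(shutil.which(c) for c in ("clang++", "g++")):
        missing.append("c++ compiler (clang++ or g++)")
    return missing

if __name__ == "__main__":
    gaps = missing_prerequisites()
    print("All prerequisites found." if not gaps else f"Missing: {gaps}")
```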
Step 1: Download the Custom Model
In this guide, we will use the tuandunghcmut/BitNET-Summarization model as an example. This model was fine-tuned by tuandunghcmut for summarization tasks. We will download it and place it in a directory that the setup_env.py script can recognize.
```bash
huggingface-cli download tuandunghcmut/BitNET-Summarization --local-dir models/BitNet-b1.58-2B-4T
```
This command downloads the model and places it in the models/BitNet-b1.58-2B-4T directory. This is a workaround to make the existing scripts recognize the custom model.
Step 2: Convert the Model to GGUF Format
The downloaded model is in the .safetensors format. We need to convert it to the GGUF format to be used with bitnet.cpp. We will use the convert-helper-bitnet.py script for this.
However, the conversion scripts need a few modifications to work with this custom model.
Modifications to the Conversion Scripts
`utils/convert-helper-bitnet.py`: add the `--skip-unknown` flag to the `cmd_convert` list so unknown tensor names are ignored:

```python
cmd_convert = [
    sys.executable,
    str(convert_script),
    str(model_dir),
    "--vocab-type", "bpe",
    "--outtype", "f32",
    "--concurrency", "1",
    "--outfile", str(gguf_f32_output),
    "--skip-unknown",
]
```

`utils/convert-hf-to-gguf-bitnet.py`:
- Add the `BitNetForCausalLM` architecture to the `@Model.register` decorator for the `BitnetModel` class.
- Change the `set_vocab` method in the `BitnetModel` class to use `_set_vocab_gpt2()`.

```python
@Model.register("BitNetForCausalLM", "BitnetForCausalLM")
class BitnetModel(Model):
    model_arch = gguf.MODEL_ARCH.BITNET

    def set_vocab(self):
        self._set_vocab_gpt2()
```
Running the Conversion
After making these changes, run the conversion script:
```bash
python utils/convert-helper-bitnet.py models/BitNet-b1.58-2B-4T
```
This will create the ggml-model-i2s-bitnet.gguf file in the model directory.
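Before moving on, the conversion output can be sanity-checked by reading the GGUF header: per the GGUF specification, every file starts with the 4-byte ASCII magic `GGUF` followed by a little-endian `uint32` version. A minimal sketch (the file path assumes the output produced above):

```python
import struct

def read_gguf_header(data: bytes) -> int:
    """Parse the leading bytes of a GGUF file: 4-byte magic + uint32 version."""
    if len(data) < 8:
        raise ValueError("file too short to be GGUF")
    magic, version = data[:4], struct.unpack("<I", data[4:8])[0]
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    return version

if __name__ == "__main__":
    with open("models/BitNet-b1.58-2B-4T/ggml-model-i2s-bitnet.gguf", "rb") as f:
        print("GGUF version:", read_gguf_header(f.read(8)))
```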
Step 3: Compile bitnet.cpp
Now, we need to compile the C++ code using the setup_env.py script, specifying the i2_s quantization type.
```bash
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
```
This command will compile the C++ code and create the necessary binaries.
Step 4: Run Inference
Finally, we can run inference with the converted model.
```bash
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "Hello"
```
This will load the model and generate a response to the prompt "Hello".
Build Environment
This project was built on a CPU-only machine with the following specifications:
- CPU: AMD EPYC 9754 128-Core Processor
- Memory: 251 GiB
Fine-Tuning
The tuandunghcmut/BitNET-Summarization model was fine-tuned using a special Quantization-Aware Training (QAT) process. This was done with the support of the BitNet layer from the Hugging Face library.
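The weight quantization that such QAT simulates can be sketched in a few lines: following the published BitNet b1.58 "absmean" recipe, weights are scaled by their mean absolute value, rounded, and clamped to {-1, 0, +1}. This is an illustration of the quantization scheme, not the actual training code used for this model:

```python
import numpy as np

def quantize_weights_b158(w: np.ndarray, eps: float = 1e-5):
    """Ternarize a weight matrix to {-1, 0, +1} with a per-tensor scale,
    as in the BitNet b1.58 absmean quantization."""
    scale = np.mean(np.abs(w)) + eps           # per-tensor absmean scale
    w_q = np.clip(np.round(w / scale), -1, 1)  # round, then clamp to ternary
    return w_q, scale                          # dequantize as w_q * scale

# Example: quantize a small weight matrix
w = np.array([[0.4, -1.2, 0.05], [2.0, -0.3, 0.0]])
w_q, s = quantize_weights_b158(w)
```

During QAT, the forward pass uses the ternarized weights while gradients flow to the full-precision copies (a straight-through estimator), so the model learns to tolerate the 1.58-bit representation.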