Guide: Using a Custom Fine-Tuned Model with bitnet.cpp
This document outlines the process of downloading a custom fine-tuned model, converting it to the GGUF format, compiling the necessary C++ code, and running inference.
Prerequisites
Before you begin, ensure you have the following prerequisites installed and configured:
- Python 3.9 or later
- CMake 3.22 or later
- A C++ compiler (e.g., clang, g++)
- The Hugging Face Hub CLI (`huggingface-cli`)
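The prerequisites above can be verified with a short script before starting. This is a minimal sketch; the tool names checked (`cmake`, `huggingface-cli`, `clang++`, `g++`) match the list above, but your distribution may package them under different names:

```python
import shutil
import sys

def missing_prerequisites(tools=("cmake", "huggingface-cli")):
    """Return the required command-line tools not found on PATH."""
    missing = [t for t in tools if shutil.which(t) is None]
    # Python 3.9 or later is required by the conversion scripts.
    if sys.version_info < (3, 9):
        missing.append("python>=3.9")
    # At least one C++ compiler must be available.
    if not any(shutil.which(c) for c in ("clang++", "g++")):
        missing.append("c++ compiler (clang++ or g++)")
    return missing

if __name__ == "__main__":
    gaps = missing_prerequisites()
    print("All prerequisites found." if not gaps else f"Missing: {gaps}")
```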
Step 1: Download the Custom Model
In this guide, we will use the tuandunghcmut/BitNET-Summarization model as an example. This model was fine-tuned by tuandunghcmut for summarization tasks. We will download it and place it in a directory that the setup_env.py script can recognize.
```bash
huggingface-cli download tuandunghcmut/BitNET-Summarization --local-dir models/BitNet-b1.58-2B-4T
```
This command downloads the model and places it in the models/BitNet-b1.58-2B-4T directory. This is a workaround to make the existing scripts recognize the custom model.
Step 2: Convert the Model to GGUF Format
The downloaded model is in the .safetensors format. We need to convert it to the GGUF format to be used with bitnet.cpp. We will use the convert-helper-bitnet.py script for this.
However, the conversion scripts need a few modifications to work with this custom model.
Modifications to the Conversion Scripts
`utils/convert-helper-bitnet.py`: add the `--skip-unknown` flag to the `cmd_convert` list so unknown tensor names are ignored:

```python
cmd_convert = [
    sys.executable,
    str(convert_script),
    str(model_dir),
    "--vocab-type", "bpe",
    "--outtype", "f32",
    "--concurrency", "1",
    "--outfile", str(gguf_f32_output),
    "--skip-unknown",
]
```

`utils/convert-hf-to-gguf-bitnet.py`:
- Add the `BitNetForCausalLM` architecture to the `@Model.register` decorator for the `BitnetModel` class.
- Change the `set_vocab` method in the `BitnetModel` class to use `_set_vocab_gpt2()`.

```python
@Model.register("BitNetForCausalLM", "BitnetForCausalLM")
class BitnetModel(Model):
    model_arch = gguf.MODEL_ARCH.BITNET

    def set_vocab(self):
        self._set_vocab_gpt2()
```
Running the Conversion
After making these changes, run the conversion script:
```bash
python utils/convert-helper-bitnet.py models/BitNet-b1.58-2B-4T
```
This will create the ggml-model-i2s-bitnet.gguf file in the model directory.
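Before moving on, the conversion output can be sanity-checked by reading the GGUF header: per the GGUF specification, every file starts with the 4-byte ASCII magic `GGUF` followed by a little-endian `uint32` version. A minimal sketch (the file path assumes the output produced above):

```python
import struct

def read_gguf_header(data: bytes) -> int:
    """Parse the leading bytes of a GGUF file: 4-byte magic + uint32 version."""
    if len(data) < 8:
        raise ValueError("file too short to be GGUF")
    magic, version = data[:4], struct.unpack("<I", data[4:8])[0]
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    return version

if __name__ == "__main__":
    with open("models/BitNet-b1.58-2B-4T/ggml-model-i2s-bitnet.gguf", "rb") as f:
        print("GGUF version:", read_gguf_header(f.read(8)))
```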
Step 3: Compile bitnet.cpp
Now, we need to compile the C++ code using the setup_env.py script, specifying the i2_s quantization type.
```bash
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
```
This command will compile the C++ code and create the necessary binaries.
Step 4: Run Inference
Finally, we can run inference with the converted model.
```bash
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "Hello"
```
This will load the model and generate a response to the prompt "Hello".
Build Environment
This project was built on a CPU-only machine with the following specifications:
- CPU: AMD EPYC 9754 128-Core Processor
- Memory: 251 GiB
Fine-Tuning
The tuandunghcmut/BitNET-Summarization model was fine-tuned using a special Quantization-Aware Training (QAT) process. This was done with the support of the BitNet layer from the Hugging Face library.
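The weight quantization that such QAT simulates can be sketched in a few lines: following the published BitNet b1.58 "absmean" recipe, weights are scaled by their mean absolute value, rounded, and clamped to {-1, 0, +1}. This is an illustration of the quantization scheme, not the actual training code used for this model:

```python
import numpy as np

def quantize_weights_b158(w: np.ndarray, eps: float = 1e-5):
    """Ternarize a weight matrix to {-1, 0, +1} with a per-tensor scale,
    as in the BitNet b1.58 absmean quantization."""
    scale = np.mean(np.abs(w)) + eps           # per-tensor absmean scale
    w_q = np.clip(np.round(w / scale), -1, 1)  # round, then clamp to ternary
    return w_q, scale                          # dequantize as w_q * scale

# Example: quantize a small weight matrix
w = np.array([[0.4, -1.2, 0.05], [2.0, -0.3, 0.0]])
w_q, s = quantize_weights_b158(w)
```

During QAT, the forward pass uses the ternarized weights while gradients flow to the full-precision copies (a straight-through estimator), so the model learns to tolerate the 1.58-bit representation.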