# Guide: Using a Custom Fine-Tuned Model with bitnet.cpp

This document outlines the process of downloading a custom fine-tuned model, converting it to the GGUF format, compiling the necessary C++ code, and running inference.

## Prerequisites

Before you begin, ensure you have the following prerequisites installed and configured:

- Python 3.9 or later
- CMake 3.22 or later
- A C++ compiler (e.g., clang, g++)
- The Hugging Face Hub CLI (`huggingface-cli`)
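
Before proceeding, you can sanity-check the toolchain. Note that `huggingface-cli` ships with the `huggingface_hub` Python package, so installing that package provides the CLI:

```bash
# Verify tool versions (adjust the compiler command to your platform).
python3 --version   # expect 3.9 or later
cmake --version     # expect 3.22 or later
clang++ --version   # or: g++ --version

# huggingface-cli is provided by the huggingface_hub package.
pip install -U "huggingface_hub[cli]"
```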

## Step 1: Download the Custom Model

In this guide, we will use the `tuandunghcmut/BitNET-Summarization` model, a fine-tune for summarization tasks, as an example. We will download it and place it in a directory that the `setup_env.py` script can recognize.

```bash
huggingface-cli download tuandunghcmut/BitNET-Summarization --local-dir models/BitNet-b1.58-2B-4T
```

This command downloads the model into the `models/BitNet-b1.58-2B-4T` directory. Placing it under the directory name used by the stock BitNet model is a workaround that lets the existing scripts recognize the custom model.
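
To confirm the download, list the directory. The exact files depend on how the model was uploaded, but you should see `.safetensors` weights alongside `config.json` and tokenizer files:

```bash
# Inspect the downloaded model files.
ls -lh models/BitNet-b1.58-2B-4T
```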

## Step 2: Convert the Model to GGUF Format

The downloaded model ships as `.safetensors` weights, which must be converted to the GGUF format before `bitnet.cpp` can use them. We will use the `utils/convert-helper-bitnet.py` script for this; however, the script needs some modifications to work with this custom model.

### Modifications to the Conversion Scripts

1. **`utils/convert-helper-bitnet.py`**: Add the `--skip-unknown` flag to the `cmd_convert` list so that the converter ignores tensor names it does not recognize.

```python
cmd_convert = [
    sys.executable,
    str(convert_script),
    str(model_dir),
    "--vocab-type", "bpe",              # use the BPE tokenizer vocabulary
    "--outtype", "f32",                 # emit a full-precision GGUF first
    "--concurrency", "1",
    "--outfile", str(gguf_f32_output),
    "--skip-unknown",                   # ignore tensor names the converter does not recognize
]
```

2. **`utils/convert-hf-to-gguf-bitnet.py`**:
   - Add the `BitNetForCausalLM` architecture name to the `@Model.register` decorator on the `BitnetModel` class.
   - Change the `set_vocab` method of the `BitnetModel` class to use `_set_vocab_gpt2()`.

```python
@Model.register("BitNetForCausalLM", "BitnetForCausalLM")
class BitnetModel(Model):
    model_arch = gguf.MODEL_ARCH.BITNET

    def set_vocab(self):
        self._set_vocab_gpt2()
```
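
For context, `@Model.register` maps the architecture names listed in a model's `config.json` to the converter class that handles them; the custom model presumably reports `BitNetForCausalLM`, which the stock script does not list. A minimal sketch of how such a decorator-based registry typically works (illustrative only; not the actual converter code):

```python
# Illustrative sketch of a decorator-based architecture registry.
# Names and structure here are assumptions, not the converter's real code.
_model_classes: dict[str, type] = {}

def register(*names: str):
    def wrap(cls: type) -> type:
        for name in names:
            _model_classes[name] = cls  # map HF architecture name -> handler class
        return cls
    return wrap
```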

### Running the Conversion

After making these changes, run the conversion script:

```bash
python utils/convert-helper-bitnet.py models/BitNet-b1.58-2B-4T
```

This creates the `ggml-model-i2s-bitnet.gguf` file in the model directory.
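
You can verify the output with a quick listing. Note that the inference command in Step 4 references the file as `ggml-model-i2_s.gguf`; if the name produced by your conversion differs, rename the file (or adjust the path in Step 4):

```bash
# List the converted GGUF files in the model directory.
ls -lh models/BitNet-b1.58-2B-4T/*.gguf

# If needed, rename to match the path used in Step 4.
mv models/BitNet-b1.58-2B-4T/ggml-model-i2s-bitnet.gguf \
   models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf
```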

## Step 3: Compile bitnet.cpp

Next, compile the C++ code with the `setup_env.py` script, using the `i2_s` quantization type:

```bash
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
```

This command compiles the C++ code and creates the necessary binaries.
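
Under the hood, `setup_env.py` drives a CMake build of the project. If you need to invoke CMake directly, a generic sketch follows; note that `setup_env.py` passes additional, platform-specific flags that this omits:

```bash
# Generic out-of-source CMake build (a sketch; setup_env.py configures
# extra kernel/codegen options that are not shown here).
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j
```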

## Step 4: Run Inference

Finally, we can run inference with the converted model:

```bash
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "Hello"
```

This loads the model and generates a response to the prompt "Hello".
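
Because this model was fine-tuned for summarization, a summarization-style prompt is a more representative test. The `-n` (tokens to generate) and `-t` (threads) flags below are assumptions about the wrapper's options; confirm them with `python run_inference.py --help` on your checkout:

```bash
# A summarization-style prompt; -n caps generated tokens, -t sets CPU threads.
python run_inference.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -n 128 -t 8 \
  -p "Summarize the following text: <paste the text to summarize here>"
```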

## Build Environment

This project was built and compiled on a CPU-only machine with the following specifications:

- **CPU:** AMD EPYC 9754 128-Core Processor
- **Memory:** 251 GiB

## Fine-Tuning

The `tuandunghcmut/BitNET-Summarization` model was fine-tuned using a Quantization-Aware Training (QAT) process, built on the BitNet layer support in the Hugging Face library.
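
To illustrate the idea behind QAT here: the latent weights stay in full precision during training, but each forward pass quantizes them to ternary values ({-1, 0, +1} times a scale), with a straight-through estimator so gradients still reach the latent weights. A minimal PyTorch sketch of such a layer, using the absmean scheme from the BitNet b1.58 paper (illustrative only; not the actual fine-tuning code or the Hugging Face layer):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearSketch(nn.Linear):
    """Illustrative 1.58-bit QAT linear layer (not the actual training code)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)           # absmean per-tensor scale
        w_q = (w / scale).round().clamp(-1, 1) * scale   # ternary {-1, 0, +1} * scale
        w_ste = w + (w_q - w).detach()                   # straight-through estimator
        return F.linear(x, w_ste, self.bias)
```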