Instructions to use xtuner/llava-phi-3-mini-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use xtuner/llava-phi-3-mini-gguf with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="xtuner/llava-phi-3-mini-gguf",
	filename="llava-phi-3-mini-f16.gguf",
)

llm.create_chat_completion(
	messages = "\"cats.jpg\""
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use xtuner/llava-phi-3-mini-gguf with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf xtuner/llava-phi-3-mini-gguf:F16
# Run inference directly in the terminal:
llama-cli -hf xtuner/llava-phi-3-mini-gguf:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf xtuner/llava-phi-3-mini-gguf:F16
# Run inference directly in the terminal:
llama-cli -hf xtuner/llava-phi-3-mini-gguf:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf xtuner/llava-phi-3-mini-gguf:F16
# Run inference directly in the terminal:
./llama-cli -hf xtuner/llava-phi-3-mini-gguf:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf xtuner/llava-phi-3-mini-gguf:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf xtuner/llava-phi-3-mini-gguf:F16

Use Docker

docker model run hf.co/xtuner/llava-phi-3-mini-gguf:F16

LM Studio
Jan
Ollama
How to use xtuner/llava-phi-3-mini-gguf with Ollama:
```
ollama run hf.co/xtuner/llava-phi-3-mini-gguf:F16
```

Unsloth Studio new

How to use xtuner/llava-phi-3-mini-gguf with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for xtuner/llava-phi-3-mini-gguf to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for xtuner/llava-phi-3-mini-gguf to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for xtuner/llava-phi-3-mini-gguf to start chatting

Docker Model Runner
How to use xtuner/llava-phi-3-mini-gguf with Docker Model Runner:
```
docker model run hf.co/xtuner/llava-phi-3-mini-gguf:F16
```

Lemonade

How to use xtuner/llava-phi-3-mini-gguf with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull xtuner/llava-phi-3-mini-gguf:F16

Run and chat with the model

lemonade run user.llava-phi-3-mini-gguf-F16

List all available models

lemonade list

Modl showing up as Llama instead of phi3 in LMstudio

by Tirin - opened Apr 25, 2024

Discussion

Tirin

Apr 25, 2024

Maybe its a user error on my part. but I wanted to just see if thats how it should be structured. Since when using the system prompt it does not like Phi3 or Llama.

LZHgrla

xtuner org Apr 26, 2024

Hi @Tirin
We manually converted the phi-3 weights to llama, for ease of model conversion.
The specific conversion process can be seen here

Additionally, I'd like to know if this conversion will affect the deployment in LM Studio. Is there a way to manually set chat template?

Joseph717171

Apr 26, 2024

•

edited Apr 26, 2024

@LZHgrla Yes there is a way to configure chat templates in LM Studio. 😁

LZHgrla

xtuner org Apr 26, 2024

@LZHgrla Yes there is configurable chat templates in LM Studio. 😁

Great!!

saishf

Apr 28, 2024

@LZHgrla Yes there is configurable chat templates in LM Studio. 😁

Great!!

Does this model use Phi-3 or Llava chat templates?

saishf

Apr 28, 2024

Additionally, I'd like to know if this conversion will affect the deployment in LM Studio. Is there a way to manually set chat template?

Also in LM Studio due to the naming of the gguf files the model shows up confusingly in the model selection dropdown

I usually name mine something like "llava-phi-3-mini-Q4_K_M.gguf"

LZHgrla

xtuner org Apr 28, 2024

Additionally, I'd like to know if this conversion will affect the deployment in LM Studio. Is there a way to manually set chat template?

Also in LM Studio due to the naming of the gguf files the model shows up confusingly in the model selection dropdown

I usually name mine something like "llava-phi-3-mini-Q4_K_M.gguf"

@saishf
Hi! Thanks for your advice.

I have modified the file names, and can you help me check if it's suitable?

https://huggingface.co/xtuner/llava-phi-3-mini-gguf/tree/main

LZHgrla

xtuner org Apr 28, 2024

@LZHgrla Yes there is configurable chat templates in LM Studio. 😁

Great!!

Does this model use Phi-3 or Llava chat templates?

Please use Phi-3 chat template

saishf

Apr 28, 2024

@saishf
Hi! Thanks for your advice.

I have modified the file names, and can you help me check if it's suitable?

https://huggingface.co/xtuner/llava-phi-3-mini-gguf/tree/main

Looks good!

LZHgrla

xtuner org Apr 28, 2024

Thanks! @saishf

saishf

Apr 28, 2024

@LZHgrla Yes there is configurable chat templates in LM Studio. 😁

Great!!

Does this model use Phi-3 or Llava chat templates?

Please use Phi-3 chat template

Phi 3 template stopped the end token popping up, Thanks 😸

Jeff32768

May 3, 2024

Came to say thanks. Had that issue, too.
I still have to load the model every time I want it to analyze an new image. If not it will talk about fantastic abstract art and pixels, and weirdly about wine bottles a lot of times (there are no bottles in my images). Works after I reload the model. Is that expected behavior? Or is it part of the config and I can correct it somehow?

cmp-nct

Jul 22, 2024

I've not verified issues about it yet but there are "quirks" inside of the llama.cpp engine which look for strings in the model name, one of those strings would be "phi3".
That changes pretokenization to be phi3 compatible.
I'd assume using llama as model name will cause tokenization errors (handling of newlines, stripping before special tokens, etc)

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment