Instructions to use dphn/dolphin-2.1-mistral-7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use dphn/dolphin-2.1-mistral-7b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="dphn/dolphin-2.1-mistral-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("dphn/dolphin-2.1-mistral-7b")
model = AutoModelForCausalLM.from_pretrained("dphn/dolphin-2.1-mistral-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use dphn/dolphin-2.1-mistral-7b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "dphn/dolphin-2.1-mistral-7b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dphn/dolphin-2.1-mistral-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/dphn/dolphin-2.1-mistral-7b

SGLang

How to use dphn/dolphin-2.1-mistral-7b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "dphn/dolphin-2.1-mistral-7b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dphn/dolphin-2.1-mistral-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "dphn/dolphin-2.1-mistral-7b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dphn/dolphin-2.1-mistral-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use dphn/dolphin-2.1-mistral-7b with Docker Model Runner:
```
docker model run hf.co/dphn/dolphin-2.1-mistral-7b
```

How do I try this out?

by henke443 - opened Oct 15, 2023

Discussion

henke443

Oct 15, 2023

•

edited Oct 15, 2023

I tried to deploy it using Gradio, but it loads indefinitely and doesn't seem to work. Neither do the other Gradio endpoints that other people have made.

Preferably, I want to host it on a (Hugging Face) Inference API endpoint. I managed to get that working for other models, but I get an error when trying to run this one.

I think this is the most relevant part of the error:

    tokenizer = LlamaTokenizerFast.from_pretrained(
  File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1854, in from_pretrained
    return cls._from_pretrained(
  File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1886, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2073, in _from_pretrained
    raise ValueError(
ValueError: Non-consecutive added token '' found. Should have index 32000 but has index 0 in saved vocabulary.
"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
2023/10/15 12:24:56 ~ Error: ShardCannotStart

henke443

Oct 15, 2023

•

edited Oct 15, 2023

The error actually said "Non-consecutive added token '<unk>' found", but it seems HTML escaping removed the token name.
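
The message suggests that the tokenizer loader expects added tokens to be numbered consecutively, starting right after the base vocabulary (index 32000 here), whereas `<unk>` is listed at index 0. Below is a minimal sketch of that kind of check. It is not the actual Transformers code: the function name `find_non_consecutive` is hypothetical, and the two ChatML entries in the sample data are assumed for illustration.

```python
def find_non_consecutive(added_tokens, base_vocab_size):
    """Return (token, expected_index, actual_index) for each out-of-place entry.

    Illustrative only: mimics the rule implied by the error message, namely
    that added tokens, sorted by index, must be numbered base_vocab_size,
    base_vocab_size + 1, ... with no gaps.
    """
    problems = []
    expected = base_vocab_size
    for token, index in sorted(added_tokens.items(), key=lambda kv: kv[1]):
        if index != expected:
            problems.append((token, expected, index))
        else:
            expected += 1
    return problems


# Assumed contents of added_tokens.json; the two ChatML entries are a guess.
added_tokens = {
    "</s>": 2,
    "<s>": 1,
    "<unk>": 0,
    "<|im_end|>": 32000,
    "<|im_start|>": 32001,
}

for token, expected, actual in find_non_consecutive(added_tokens, 32000):
    print(f"{token!r}: should have index {expected} but has index {actual}")
```

Under these assumptions, the three entries that duplicate base-vocabulary indices are flagged, while the entries at 32000 and above pass.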

ehartford

Dolphin org Oct 15, 2023

I don't know
It works in oobabooga for me
@TheBloke do you recognize that error message?

markpreemo

Oct 17, 2023

@henke443 This should give you some guidance: https://github.com/huggingface/text-generation-inference/issues/1132

samulrich1

Oct 26, 2023

Remove these lines from added_tokens.json:

  "</s>": 2,
  "<s>": 1,
  "<unk>": 0,

The link above says to delete the file, but the file is important for the ChatML format.
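
The edit described above could also be scripted. This is a rough sketch, assuming a local copy of the model repository; the helper name `strip_base_tokens` and the directory path are hypothetical, and only the three token names come from the post.

```python
import json
from pathlib import Path

# The three entries the post above says to remove.
BASE_TOKENS = ("</s>", "<s>", "<unk>")


def strip_base_tokens(path):
    """Remove the listed entries from added_tokens.json in place.

    All other entries are kept, so the file itself is preserved.
    """
    path = Path(path)
    added = json.loads(path.read_text(encoding="utf-8"))
    kept = {tok: idx for tok, idx in added.items() if tok not in BASE_TOKENS}
    path.write_text(json.dumps(kept, indent=2) + "\n", encoding="utf-8")
    return kept


if __name__ == "__main__":
    # Hypothetical location of a local copy of the model repository.
    target = Path("dolphin-2.1-mistral-7b/added_tokens.json")
    if target.exists():
        print(strip_base_tokens(target))
```

Back up the original file before running something like this, since the rewrite happens in place.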
