how to save the model as a pytorch or tensorflow model

by Rick-29 - opened Dec 27, 2023

Discussion

Rick-29

Dec 27, 2023

Is there a way to save the model as a PyTorch model instance instead of loading it every time with the transformers module? I tried wrapping the code inside a class that inherits from torch.nn.Module, but when I try to save the model (the whole model, not only the state dict) it throws an error.
Thanks

NajeebDeci

Dec 31, 2023

Hi Rick,

Please share more details about what exactly you need, along with code snippets, so that we can help.
Are you having issues with .save_pretrained?

Thanks
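
For reference, the `.save_pretrained` route mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not code from the thread: a tiny, randomly initialised GPT-2 stands in for the real model so the example runs without downloading 7B weights. For DeciLM itself, loading would go through `AutoModelForCausalLM.from_pretrained("Deci/DeciLM-7B-instruct", trust_remote_code=True)` instead.

```python
import tempfile

import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny randomly initialised stand-in model (an assumption for illustration).
config = GPT2Config(n_layer=1, n_head=2, n_embd=16, vocab_size=64, n_positions=32)
model = GPT2LMHeadModel(config).eval()

with tempfile.TemporaryDirectory() as save_dir:
    # Writes the config and the weight file(s) into save_dir.
    model.save_pretrained(save_dir)
    # Reload from the local directory; no network access is needed.
    reloaded = GPT2LMHeadModel.from_pretrained(save_dir).eval()

# Compare the outputs of the original and the reloaded model.
input_ids = torch.tensor([[1, 2, 3, 4]])
with torch.no_grad():
    same_logits = torch.allclose(model(input_ids).logits, reloaded(input_ids).logits)
print(same_logits)
```

Reloading still goes through `from_pretrained`, so this approach avoids downloading the weights again rather than avoiding the transformers library altogether.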

Rick-29

Dec 31, 2023

Hi Najeeb,

No, I meant saving the complete model architecture and weights as a .pth file. I tried wrapping the model in a torch.nn.Module, something like this:

import torch
from torch import nn

class Wrapper(nn.Module):
    def __init__(self, model, tokenizer):
        # The model and tokenizer are loaded exactly as in the `DeciLM-7B-Instruct.ipynb` Colab notebook
        super().__init__()
        self.model = model
        self.tokenizer = tokenizer
    
    def forward(self, x):
        inputs = self.tokenizer(SYSTEM_PROMPT_TEMPLATE.format(instruction=x), return_tensors="pt")
        if torch.cuda.is_available():  # Ensure input tensors are on the GPU if model is on GPU
            inputs = inputs.to('cuda')
        output = self.model.generate(**inputs,
                                max_new_tokens=3000,
                                num_beams=5,
                                no_repeat_ngram_size=4,
                                early_stopping=True
                                )
        return self.tokenizer.decode(output[0], skip_special_tokens=True)

wrapper = Wrapper(model, tokenizer)
model = torch.jit.script(wrapper)
torch.jit.save(model, "model.pth")

But it doesn't work.

What should I do?

Thanks
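
A likely reason `torch.jit.script` fails here is that the wrapper holds a Python tokenizer object and calls `generate`, neither of which TorchScript can compile. The sketch below shows two approaches that do work in plain PyTorch, using a hypothetical `TinyNet` as a stand-in for the real model: saving the `state_dict` and rebuilding the architecture in code, and tracing a tensor-in, tensor-out module so that architecture and weights end up in one file. Whether DeciLM's custom modeling code traces cleanly is not confirmed in this thread.

```python
import io

import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real model: tensors in, tensors out."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = TinyNet().eval()
example_input = torch.ones(1, 4)

# Option 1: save only the weights, then rebuild the architecture in code.
weights_file = io.BytesIO()  # in-memory file; a path such as "weights.pth" also works
torch.save(model.state_dict(), weights_file)
weights_file.seek(0)
rebuilt = TinyNet().eval()
rebuilt.load_state_dict(torch.load(weights_file, weights_only=True))

# Option 2: trace the module so the graph and the weights are stored together.
# Tokenization and the generation loop have to stay outside the traced module.
traced = torch.jit.trace(model, example_input)
script_file = io.BytesIO()
torch.jit.save(traced, script_file)
script_file.seek(0)
restored = torch.jit.load(script_file)  # does not need the TinyNet class

with torch.no_grad():
    expected = model(example_input)
    state_dict_matches = torch.equal(rebuilt(example_input), expected)
    traced_matches = torch.allclose(restored(example_input), expected)
print(state_dict_matches, traced_matches)
```

With either option the tokenizer is saved and loaded separately, since it is not an `nn.Module`.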
