How to save the model as a PyTorch or TensorFlow model
Is there a way to save the model as a PyTorch model instance instead of loading it every time with the transformers module? I tried wrapping the code inside a class that inherits from torch.nn.Module, but when I try to save the model (the whole model, not only the state dict) it throws an error.
Thanks
Hi Rick,
Please provide more details on exactly what you need, along with code snippets, so that we can help.
Are you having issues with .save_pretrained?
Thanks
Hi Najeeb,
No, I meant saving the complete model architecture and weights as a .pth file. I tried a model wrapper based on torch.nn.Module, something like this:
import torch
from torch import nn

class Wrapper(nn.Module):
    def __init__(self, model, tokenizer):
        # The model and tokenizer are loaded exactly like in the `DeciLM-7B-Instruct.ipynb` colab notebook
        super().__init__()
        self.model = model
        self.tokenizer = tokenizer

    def forward(self, x):
        # SYSTEM_PROMPT_TEMPLATE is the prompt template defined in the same notebook
        inputs = self.tokenizer(SYSTEM_PROMPT_TEMPLATE.format(instruction=x), return_tensors="pt")
        if torch.cuda.is_available():  # Ensure input tensors are on the GPU if model is on GPU
            inputs = inputs.to('cuda')
        output = self.model.generate(**inputs,
                                     max_new_tokens=3000,
                                     num_beams=5,
                                     no_repeat_ngram_size=4,
                                     early_stopping=True
                                     )
        return self.tokenizer.decode(output[0], skip_special_tokens=True)

wrapper = Wrapper(model, tokenizer)
model = torch.jit.script(wrapper)
torch.jit.save(model, "model.pth")
But it doesn't work.
What should I do?
Thanks
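For anyone landing on this thread: the `torch.jit.script` call above most likely fails because the wrapper holds a tokenizer and calls `generate`, and neither is ordinary scriptable tensor code. A minimal sketch of two commonly used alternatives follows, assuming the standard `transformers` `save_pretrained` / `from_pretrained` round trip works for this model. The helper names (`save_local_copy`, `load_local_copy`, `save_weights_only`) and the directory argument are made up for illustration, and this has not been verified against DeciLM specifically.

```python
from pathlib import Path

MODEL_ID = "Deci/DeciLM-7B-instruct"


def save_local_copy(save_dir, model_id=MODEL_ID):
    """Download the model once and write weights, config and tokenizer to save_dir."""
    # Heavy libraries are imported lazily so importing this file stays cheap.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    save_dir = Path(save_dir)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model.save_pretrained(str(save_dir))
    tokenizer.save_pretrained(str(save_dir))
    return save_dir


def load_local_copy(save_dir):
    """Reload from the local directory instead of the Hub."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the custom modeling code may still be needed at load time,
    # so trust_remote_code is kept here as well.
    model = AutoModelForCausalLM.from_pretrained(str(save_dir), trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(str(save_dir), trust_remote_code=True)
    return model, tokenizer


def save_weights_only(model, path="model.pth"):
    """Write only the state dict; the architecture code is still needed to reload it."""
    import torch

    torch.save(model.state_dict(), path)
```

Hypothetical usage: call `save_local_copy("./decilm-local")` once, then `model, tokenizer = load_local_copy("./decilm-local")` in later sessions. Note that a `.pth` file written by `save_weights_only` contains the weights but not the architecture, so the model class still has to be constructed before calling `load_state_dict`.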