Phi 1.5 Wizard Vicuna Experimental
An experimental fine-tune of Microsoft's Phi 1.5. It is highly experimental, was trained on only a subset of the 70k Wizard Vicuna dataset, and is not meant for production use.
This model also runs reasonably fast on CPU!
I will update this repository as later checkpoints become available.
Prompt Format
ShareGPT / Vicuna (without newlines):
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: This is a question? ASSISTANT: Here is my answer
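As a minimal sketch of the format above, a small helper can assemble a single-turn prompt. The names `SYSTEM_PROMPT` and `build_prompt` are illustrative and not part of the model repository.

```python
# Hypothetical helper; the names are illustrative, not from the model repo.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(user_message: str) -> str:
    """Return a single-line Vicuna-style prompt ending with the assistant tag."""
    # Turns are separated by spaces only; the format uses no newlines between them.
    return f"{SYSTEM_PROMPT} USER: {user_message.strip()} ASSISTANT:"


if __name__ == "__main__":
    print(build_prompt("What is 1+1?"))
```

The prompt ends with `ASSISTANT:` so that the model continues with its answer.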
Code Example
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jphme/phi-1_5_Wizard_Vicuna_uncensored"

# Load the model in bfloat16 and move it to the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).to("cuda")
# The tokenizer holds no weights, so it does not need a dtype
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Vicuna-style prompt on a single line (no newlines)
prompt_template = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: {prompt} ASSISTANT:"
)

inputs = tokenizer(
    prompt_template.format(prompt="What is 1+1?"),
    return_tensors="pt",
    return_attention_mask=False,
).to("cuda")

outputs = model.generate(
    **inputs,
    max_length=200,
    do_sample=True,
    temperature=0.5,
    top_p=0.95,
    use_cache=True,
    repetition_penalty=1.1,
    # Uncomment the next line if generation does not stop on its own
    # eos_token_id=tokenizer.eos_token_id,
)
text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(text)
The model does not stop on its own (possibly because the EOS and BOS tokens are identical), so you can try setting eos_token_id explicitly (see above) or use a custom stopping criterion.
You can remove both .to("cuda") calls to let the model run without a GPU!
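Because generation may run past the assistant's answer, one simple alternative to a custom stopping criterion is to trim the decoded text afterwards. The sketch below is plain post-processing, not a transformers stopping criterion. It assumes the decoded output begins with the prompt and that a runaway generation starts a new turn with "USER:"; the function name `extract_answer` is hypothetical.

```python
def extract_answer(decoded: str, prompt: str, stop_strings=("USER:",)) -> str:
    """Return only the assistant's first reply from the decoded output."""
    # Drop the echoed prompt if the decoded text starts with it.
    reply = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # Cut at the earliest stop string, if any occurs in the reply.
    cut = len(reply)
    for stop in stop_strings:
        idx = reply.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return reply[:cut].strip()


if __name__ == "__main__":
    prompt = "USER: What is 1+1? ASSISTANT:"
    decoded = prompt + " 1+1 equals 2. USER: And 2+2? ASSISTANT: 4."
    print(extract_answer(decoded, prompt))  # prints "1+1 equals 2."
```

This does not save generation time, since the extra tokens are still produced; it only cleans up the text shown to the user.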
Examples
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Who was president of the USA in 1995? ASSISTANT: The president of the United States at that time was Bill Clinton (...)
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Write a python function that returns the fibonacci sequence ASSISTANT:The Fibonacci sequence is a series of numbers where each number after the first two are obtained by adding the two preceding ones together (starting from 0). So the sequence goes 0, 1, 1, 2, 3, 5, 8, 13, 21... Here's a Python function that generates the nth term in the Fibonacci sequence:
def fibonacci(n):
if n <= 1:
return n
else:
f = [0,1]
for i in range(2, n+1):
f.append(f[-1]+f[-2])
return f[-1]
This function takes as input n, which represents the (...)
(This function works once the indentation is corrected.)
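One plausible reading of the model's output with the indentation restored is sketched below. It assumes the append belongs inside the loop and the final return follows the loop; the model's original whitespace cannot be recovered from the text above.

```python
def fibonacci(n):
    # Base cases: F(0) = 0 and F(1) = 1
    if n <= 1:
        return n
    else:
        f = [0, 1]
        # Extend the list until its last element is F(n)
        for i in range(2, n + 1):
            f.append(f[-1] + f[-2])
        return f[-1]


if __name__ == "__main__":
    print([fibonacci(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

As the model's own explanation says, the function returns the nth term rather than the whole sequence.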
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: What is (500+12)/4? ASSISTANT:The answer to this expression is 7.5. (...)
Some potential here ;-)...
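For reference, the model's answer to the arithmetic question is wrong: (500+12)/4 evaluates to 128. A quick check in Python:

```python
# Evaluate the expression from the example prompt
result = (500 + 12) / 4
print(result)  # 128.0
```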
Thanks
Thanks to Microsoft for releasing Phi 1.5, Eric Hartford for releasing the Wizard Vicuna dataset, Winglian for his great work on Axolotl, which I use for fine-tuning, and Teknium for some Phi fine-tuning discussion.
License
The original licenses of the dataset and model apply. No warranty whatsoever; this model is intended for research purposes only.