Instructions to use PygmalionAI/pygmalion-6b with libraries, inference providers, notebooks, and local apps.

Libraries

How to use PygmalionAI/pygmalion-6b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PygmalionAI/pygmalion-6b")
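# Pygmalion-6b is a plain causal LM released before Transformers chat templates, so prompt it with text
print(pipe("You: Who are you?\nBot:", max_new_tokens=64)[0]["generated_text"])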

# Load model directly (Pygmalion-6b is a GPT-J-based causal language model)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-6b")
model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-6b")
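Pygmalion-6b is a dialogue model. It tends to do better with a persona-plus-dialogue prompt than with a bare question. The helper below is a minimal sketch that assembles such a prompt as plain text. It assumes the `[CHARACTER]'s Persona:` / `<START>` / `You:` layout commonly used with Pygmalion models. `build_prompt`, its parameters, and the character "Aria" are hypothetical and are not part of Transformers.

```python
from typing import List, Tuple


def build_prompt(character: str, persona: str,
                 history: List[Tuple[str, str]], user_message: str) -> str:
    """Assemble a Pygmalion-style dialogue prompt (assumed layout).

    history holds (speaker, text) pairs from earlier turns, where speaker
    is "You" for the user or the character's name for the bot.
    """
    lines = [f"{character}'s Persona: {persona}", "<START>"]
    # Replay earlier turns so the model sees the conversation so far
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    # Add the new user turn, then leave the character's line open for the model
    lines.append(f"You: {user_message}")
    lines.append(f"{character}:")
    return "\n".join(lines)


prompt = build_prompt(
    character="Aria",  # hypothetical character
    persona="Aria is a cheerful assistant who loves astronomy.",
    history=[("You", "Hi!"), ("Aria", "Hello there!")],
    user_message="What is your favourite planet?",
)
print(prompt)
```

You can pass the returned string to `pipe(...)`, or tokenize it and hand it to `model.generate`.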


vLLM

How to use PygmalionAI/pygmalion-6b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PygmalionAI/pygmalion-6b"
# Call the server using curl (OpenAI-compatible completions API; the model has no chat template):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PygmalionAI/pygmalion-6b",
		"prompt": "You: What is the capital of France?\nBot:",
		"max_tokens": 64
	}'
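You can call the same server from Python using only the standard library. This sketch targets the OpenAI-compatible `/v1/completions` route, which takes a plain `prompt` string. The function names `build_completion_payload` and `complete` are hypothetical, and the default `max_tokens`/`temperature` values are illustrative assumptions rather than recommended settings.

```python
import json
import urllib.request


def build_completion_payload(prompt: str, model: str = "PygmalionAI/pygmalion-6b",
                             max_tokens: int = 128, temperature: float = 0.8) -> dict:
    # Request body for an OpenAI-style /v1/completions call (plain prompt, no chat messages)
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST the prompt to the server and return the generated text."""
    body = json.dumps(build_completion_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # OpenAI-compatible completion responses put the text under choices[0]["text"]
    return result["choices"][0]["text"]
```

Once `vllm serve` is running, `complete("You: Hello!\nBot:")` should return the model's reply. For the SGLang server, pass `base_url="http://localhost:30000"`.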

Use Docker

docker model run hf.co/PygmalionAI/pygmalion-6b

SGLang

How to use PygmalionAI/pygmalion-6b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "PygmalionAI/pygmalion-6b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible completions API; the model has no chat template):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PygmalionAI/pygmalion-6b",
		"prompt": "You: What is the capital of France?\nBot:",
		"max_tokens": 64
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "PygmalionAI/pygmalion-6b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible completions API; the model has no chat template):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PygmalionAI/pygmalion-6b",
		"prompt": "You: What is the capital of France?\nBot:",
		"max_tokens": 64
	}'

Docker Model Runner
How to use PygmalionAI/pygmalion-6b with Docker Model Runner:
```
docker model run hf.co/PygmalionAI/pygmalion-6b
```

Model with only API

#24

by octopusta - opened Apr 4, 2023

Discussion

octopusta

Apr 4, 2023 (edited Apr 4, 2023)

Dear all,

We intend to use this model for conversational chat with our users.

Is there a simple way to run this model behind an API only?

Thank you in advance.

octopusta

Apr 6, 2023

Any help, please?

We found that ColossalAI with EnergonAI can provide an API interface.

Is there any way to run this model with it?

jini1114

Apr 12, 2023

What about trying FastAPI?
I've set up a chatbot API server using FastAPI.

octopusta

Apr 12, 2023

Can you please help with the steps, or the main idea of how to do it?

Do you mean building the API inside the Python file, before running the text generation with the model?

I had thought of that approach, but it seemed too manual, so I asked whether there is a ready-to-use package or component.

Thank you for sharing your idea <3 I appreciate it and will give it a try.

jini1114

Apr 12, 2023

Here is a short version of my code:

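~~~
import torch
import transformers
from fastapi import FastAPI
from pydantic import BaseModel

device = 'cuda' if torch.cuda.is_available() else 'cpu'

app = FastAPI()

# Request body schema: clients send JSON such as {"prompt": "..."}
class Data(BaseModel):
    prompt: str

model_name = "PygmalionAI/pygmalion-6b"
# fp16 on GPU roughly halves the memory needed for the 6B weights
dtype = torch.float16 if device == 'cuda' else torch.float32
gpt = transformers.AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
gpt.to(device)
gpt.eval()

# A plain def (not async def) lets FastAPI run the blocking generate() in a worker thread
@app.post('/completion/')
def chat(data: Data):
    prompt = tokenizer(data.prompt, return_tensors='pt')
    prompt = {key: value.to(device) for key, value in prompt.items()}
    # max_new_tokens counts only generated tokens, so long prompts don't exhaust the budget
    out = gpt.generate(**prompt, max_new_tokens=128, do_sample=True)
    # Slice off the prompt tokens and decode only the newly generated reply
    completion = tokenizer.decode(out[0][prompt["input_ids"].shape[1]:], skip_special_tokens=True)
    return completion
~~~

Then post a request to your server like this:

~~~
import requests

data = {"prompt": "You: Hello!\nBot:"}  # illustrative prompt; keys must match the fields of Data
url = "http://your.ser.ver.ip:port/completion/"
res = requests.post(url, json=data)
print(res.json())
~~~

The data you post must match the format the API expects, i.e. the fields of `Data`.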
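With a dialogue prompt, a sampled completion often runs past the bot's reply and goes on to write the user's next turn. One simple post-processing step for the `/completion/` handler is to cut the text at the first line that opens a new speaker turn. This sketch assumes the user's turns are labelled `You:`. `trim_reply` is a hypothetical helper and is not part of Transformers or FastAPI.

```python
def trim_reply(completion: str, stop_prefixes=("You:",)) -> str:
    """Keep only the bot's reply, stopping at the first line that opens a new turn."""
    kept = []
    for line in completion.splitlines():
        # A line starting with a speaker label means the model began another turn
        if line.strip().startswith(tuple(stop_prefixes)):
            break
        kept.append(line)
    return "\n".join(kept).strip()


raw = " Saturn, because of its rings!\nYou: Nice.\nAria: Thanks!"
print(trim_reply(raw))  # prints "Saturn, because of its rings!"
```

In the handler, you would apply it as `return trim_reply(completion)`. You can also add the character's own name to `stop_prefixes` if the model tends to emit extra bot turns.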

octopusta

Apr 12, 2023

Thank you very much <3

Mariano234

Feb 1, 2024

Could you please provide the full code and explain how I would implement this, coming from a Go background? I appreciate the help.
