Instructions to use PygmalionAI/pygmalion-6b with libraries, inference providers, notebooks, and local apps.

Libraries

How to use PygmalionAI/pygmalion-6b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PygmalionAI/pygmalion-6b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-6b")
model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-6b")
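If the tokenizer ships without a chat template, recent Transformers releases refuse a list of messages like the one passed to `pipe` above. A plain-text prompt avoids that problem. The sketch below builds one, assuming the persona / `<START>` / dialogue layout described on the Pygmalion 6B model card, so check the card before relying on it. `build_pygmalion_prompt`, the character "Aria", and her persona are hypothetical and exist only for illustration.

```python
from typing import List, Tuple


def build_pygmalion_prompt(
    character: str,
    persona: str,
    history: List[Tuple[str, str]],
    user_message: str,
) -> str:
    """Assemble a plain-text Pygmalion-style prompt (hypothetical helper).

    Assumed layout: a persona line, a "<START>" marker, prior turns as
    "Speaker: text" lines, the new user turn, then an open slot for the
    character's reply.
    """
    lines = [f"{character}'s Persona: {persona}", "<START>"]
    # Replay earlier turns verbatim, one "Speaker: text" line each.
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    # Newest user turn, then the character's name so the model answers in character.
    lines.append(f"You: {user_message}")
    lines.append(f"{character}:")
    return "\n".join(lines)


prompt = build_pygmalion_prompt(
    character="Aria",  # hypothetical character
    persona="Aria is a cheerful ship's navigator.",
    history=[("You", "Hi!"), ("Aria", "Hello, captain!")],
    user_message="Who are you?",
)
print(prompt)
```

You can pass the resulting string straight to the pipeline, for example `pipe(prompt, max_new_tokens=64)`. The model continues after the final `Aria:` line, so you will probably need to cut its output at the next `You:` turn.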

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use PygmalionAI/pygmalion-6b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PygmalionAI/pygmalion-6b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PygmalionAI/pygmalion-6b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
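For scripting, you can build the same request in Python using only the standard library. This is a minimal sketch. `build_chat_request` and `extract_reply` are hypothetical helper names, and the code assumes the server replies in the OpenAI chat-completion shape (`choices[0].message.content`). It constructs the request without sending it, so it runs even when no server is up.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),  # JSON body, same as the curl --data above
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    body = json.loads(response_body)
    return body["choices"][0]["message"]["content"]


req = build_chat_request(
    "http://localhost:8000", "PygmalionAI/pygmalion-6b", "What is the capital of France?"
)
# With the vLLM server running, send it with:
# with urllib.request.urlopen(req) as resp:
#     print(extract_reply(resp.read()))
```

The server may reject chat requests if the model has no chat template. In that case, the OpenAI-style `/v1/completions` endpoint, which takes a `prompt` string instead of `messages`, may work instead.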

Use Docker

docker model run hf.co/PygmalionAI/pygmalion-6b

SGLang

How to use PygmalionAI/pygmalion-6b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "PygmalionAI/pygmalion-6b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PygmalionAI/pygmalion-6b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
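Adding `"stream": true` to the request body asks an OpenAI-compatible server such as SGLang to send the reply incrementally as server-sent events. These arrive as `data: {...}` lines carrying partial `delta` objects and end with `data: [DONE]`. The sketch below parses such lines, assuming the server follows that OpenAI streaming format. `iter_stream_text` is a hypothetical name, and the sample chunks are hand-written illustrations, not captured server output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent events ("stream": true)."""
    for raw in lines:
        line = raw.strip()
        # Payload lines start with "data:"; blank lines and ":" comment lines are skipped.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # end-of-stream sentinel
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first chunk often carries only {"role": ...}; skip deltas without text.
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Par"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"is."}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints "Paris."
```

Against a live server, iterate over the response object returned by `urllib.request.urlopen(...)` and decode each line with `.decode("utf-8")` before passing it in.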

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "PygmalionAI/pygmalion-6b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PygmalionAI/pygmalion-6b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use PygmalionAI/pygmalion-6b with Docker Model Runner:
```
docker model run hf.co/PygmalionAI/pygmalion-6b
```

The Dataset

#32

by bibiicekill - opened May 8, 2023


bibiicekill

May 8, 2023

Hello PygmalionAI Team,
Amazing work! I tried Pygmalion 6B (GPT-J). It's already pretty good, and even better when fine-tuned on a specific domain!

I have the following two questions:

- Is it possible to get the data you trained Pygmalion 6B (GPT-J) on, so I can try it on other models myself?
- Do you plan to train MPT-7B on this data?

11b

Pygmalion org May 8, 2023

For the data, sure! Shoot me a message on Discord - I'm 0x000011b#4223 there. As for MPT-7B, I don't have any plans to train it at the moment. I'm not a fan of the current 7B (XOR + license + model itself isn't as good as I'd like), so I'm keeping an eye on all the new foundational models that are coming out, but my current thoughts are:

- MPT-7B looks strong performance-wise, but its custom architecture, which is full of NotImplementedErrors when training, doesn't give me the confidence to use it just yet.
- RedPajama's 7B looks great! However, for whatever reason, LLaMA is about 40% faster than NeoX (the architecture RedPajama used), so this is also not 100% ideal.
- OpenLLaMA seems to be the most promising. It uses the standard LLaMA architecture, so it won't fall victim to the two pitfalls above. It's also being trained on the same data as RedPajama, so once training is done, they should all be competitive in model quality. However, it's still training, and the current checkpoints underperform LLaMA quite strongly on some tasks, so I'd rather not rush anything.