Instructions to use stepfun-ai/Step-3.5-Flash with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use stepfun-ai/Step-3.5-Flash with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="stepfun-ai/Step-3.5-Flash", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("stepfun-ai/Step-3.5-Flash", trust_remote_code=True, dtype="auto")

Inference
HuggingChat
Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use stepfun-ai/Step-3.5-Flash with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "stepfun-ai/Step-3.5-Flash"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "stepfun-ai/Step-3.5-Flash",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/stepfun-ai/Step-3.5-Flash

SGLang

How to use stepfun-ai/Step-3.5-Flash with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "stepfun-ai/Step-3.5-Flash" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "stepfun-ai/Step-3.5-Flash",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "stepfun-ai/Step-3.5-Flash" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "stepfun-ai/Step-3.5-Flash",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use stepfun-ai/Step-3.5-Flash with Docker Model Runner:
```
docker model run hf.co/stepfun-ai/Step-3.5-Flash
```

How to use PaCoRe?

by JJKiks - opened Feb 2

Discussion

JJKiks

Feb 2

•

edited Feb 2

Can you provide some information on how to use PaCoRe with the model? Thanks.

reign12

StepFun org Feb 2

•

edited Feb 2

Many thanks for your attention!
For more info on using Step 3.5 Flash with PaCoRe, you can refer to this repo first: https://github.com/stepfun-ai/PaCoRe
We will update a script in this PaCoRe repo on how to use the free open router Step 3.5 Flash with PaCoRe, and will cc you in this thread once we get it done:-)

reign12

StepFun org Feb 2

Please check about this doc for how to setup a PaCoRe server:-)
https://github.com/stepfun-ai/PaCoRe?tab=readme-ov-file#pacore-server-mode-recommended

JJKiks

Feb 2

Thanks.

nervous-sloth

Feb 23

That website appears to discuss how to use the 8B parameter model you trained with PaCoRe. Was Step 3.5 Flash also trained with PaCoRe? If one throws Step 3.5 Flash into a PaCoRe server it will benefit (unlike untrained models run with PaCoRe?

reign12

StepFun org Feb 25

Hi, this inference pipeline actually supports using Step 3.5 Flash OR version to run it!
https://github.com/stepfun-ai/PaCoRe?tab=readme-ov-file#-releases

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment