Instructions to use deepseek-ai/DeepSeek-V4-Flash with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use deepseek-ai/DeepSeek-V4-Flash with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-V4-Flash")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Flash")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-V4-Flash")

Inference
HuggingChat
Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use deepseek-ai/DeepSeek-V4-Flash with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "deepseek-ai/DeepSeek-V4-Flash"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepseek-ai/DeepSeek-V4-Flash",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/deepseek-ai/DeepSeek-V4-Flash

SGLang

How to use deepseek-ai/DeepSeek-V4-Flash with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "deepseek-ai/DeepSeek-V4-Flash" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepseek-ai/DeepSeek-V4-Flash",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "deepseek-ai/DeepSeek-V4-Flash" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepseek-ai/DeepSeek-V4-Flash",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use deepseek-ai/DeepSeek-V4-Flash with Docker Model Runner:
```
docker model run hf.co/deepseek-ai/DeepSeek-V4-Flash
```

Is 158B or 284b params ?

#17

by celsowm - opened 27 days ago

Discussion

celsowm

27 days ago

its confusing because on this model card is 158B size but on README is 284b

Anaya3D

27 days ago

have no idea, maybe the news fp8 tensors types cheats de hugginface count.

richardhundt

27 days ago

It's 284B. The huggingface count gets confused with compression

floratopia

26 days ago

Its an 158b-sized 284b model with fp8/fp4-fused weights.

fsaudm

26 days ago

•

edited 26 days ago

Its an 158b-sized 284b model with fp8/fp4-fused weights.

what does this even mean?? genuine question, "158b-sized 284b model"?

DUOWEN

26 days ago

•

edited 26 days ago

what does this even mean?? genuine question, "158b-sized 284b model"?

This is a natively quantized model with about 158GB weight files which is the same as a standard 158b model in fp8 precision

floratopia

26 days ago

Its an 158b-sized 284b model with fp8/fp4-fused weights.

what does this even mean?? genuine question, "158b-sized 284b model"?

Theoretically, flash is indeed a 284b parameter model.
However, due to the excellent mixing and quantization design, thanks to the extensive use of the latest fp4 floating point precision, the file size of flash is only as large as a 158b model, so the HF system mistakenly thinks that it only has 158b parameters.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment