Add SGLang serving instructions

#5
by MickJ - opened
Files changed (1)
  1. README.md +48 -45
README.md CHANGED
@@ -9,54 +9,11 @@ tags:
  - cosmos
  - cosmos3
  - vllm-omni
+ - sglang
+ - sglang-diffusion
  - diffusers
  - image-to-video
  - video-generation
- countDownloads:
- - checkpoint.json
- - config.json
- - generation_config.json
- - model.safetensors.index.json
- - model_index.json
- - tokenizer.json
- - tokenizer_config.json
- - sound_tokenizer/config.json
- - sound_tokenizer/diffusion_pytorch_model.safetensors
- - text_tokenizer/tokenizer.json
- - text_tokenizer/tokenizer_config.json
- - transformer/config.json
- - transformer/diffusion_pytorch_model-00001-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00002-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00003-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00004-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00005-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00006-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00007-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00008-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00009-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00010-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00011-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00012-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00013-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00014-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00015-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00016-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00017-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00018-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00019-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00020-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00021-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00022-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00023-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00024-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00025-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00026-of-00027.safetensors
- - transformer/diffusion_pytorch_model-00027-of-00027.safetensors
- - transformer/diffusion_pytorch_model.safetensors.index.json
- - vae/config.json
- - vae/diffusion_pytorch_model.safetensors
- - vision_encoder/config.json
- - vision_encoder/model.safetensors
  ---
 
  # **Cosmos 3: Omnimodal World Models for Physical AI**
@@ -457,6 +414,52 @@ python scripts/upsample_prompt.py \
  --output-path scripts/upsampled.json
  ```
 
+ ### SGLang
+
+ SGLang-Diffusion can serve `nvidia/Cosmos3-Super-Image2Video` through the OpenAI-compatible async video endpoint. Install SGLang from the main branch with diffusion dependencies, then start the server:
+
+ ```bash
+ git clone --branch main https://github.com/sgl-project/sglang.git
+ cd sglang
+ pip install -e "python[diffusion]"
+ pip install "cosmos-guardrail==0.3.1"
+
+ sglang serve \
+ --model-path nvidia/Cosmos3-Super-Image2Video \
+ --num-gpus 4
+ ```
+
+ Cosmos 3 support in SGLang Diffusion currently requires the SGLang main branch. Switch to a stable SGLang release once Cosmos 3 support is included there.
+
+ Example image-to-video request:
+
+ ```bash
+ job_id=$(curl -sS -X POST http://localhost:30000/v1/videos \
+ --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
+ --form-string "negative_prompt=blurry, distorted, low quality" \
+ --form-string "size=1280x720" \
+ --form-string "num_frames=81" \
+ --form-string "fps=24" \
+ --form-string "num_inference_steps=35" \
+ --form-string "guidance_scale=4.0" \
+ --form-string "flow_shift=10.0" \
+ --form-string "seed=42" \
+ --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
+ -F "input_reference=@input.png" \
+ | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')
+
+ while true; do
+ status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
+ | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
+ [ "$status" = "completed" ] && break
+ [ "$status" = "failed" ] && exit 1
+ sleep 1
+ done
+
+ curl -sS -L "http://localhost:30000/v1/videos/${job_id}/content" \
+ -o cosmos3_super_i2v_output.mp4
+ ```
+
  ### Diffusers
 
  Cosmos3 is fully supported within the popular HuggingFace Diffusers package. This integration makes it a supported inference backend, allowing developers to easily incorporate Cosmos3's capabilities - such as text-to-image generation - into their pipelines using the Cosmos3OmniPipeline class, as demonstrated by the provided code examples (see examples for other modalities on the HuggingFace Cosmos3 page).
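
A note on the polling loop in the added SGLang section: it exits only when the job status becomes `completed` or `failed`, so it keeps waiting if the status request itself keeps failing or the job stays in some other state. A minimal sketch of a bounded variant is below. `poll_until_done`, `get_status`, and the attempt limit of 600 are hypothetical names and values chosen for illustration, not part of SGLang or of this PR. The status request is the same `curl` call used in the README change, and the helper returns a code instead of calling `exit`, so it can be pasted into an interactive shell.

```bash
# Hypothetical helper: poll_until_done <max_attempts> <delay_seconds> <status_command...>
# Runs the status command repeatedly and returns:
#   0 when it prints "completed", 1 when it prints "failed",
#   2 when max_attempts is reached without either result.
poll_until_done() {
  poll_max_attempts=$1
  poll_delay=$2
  shift 2
  poll_attempt=0
  while [ "$poll_attempt" -lt "$poll_max_attempts" ]; do
    # A failing status command is treated as "not finished yet".
    poll_status=$("$@") || poll_status="error"
    case "$poll_status" in
      completed) return 0 ;;
      failed) return 1 ;;
    esac
    poll_attempt=$((poll_attempt + 1))
    sleep "$poll_delay"
  done
  return 2
}

# Status lookup taken from the README change; job_id is set by the POST step there.
get_status() {
  curl -sS "http://localhost:30000/v1/videos/${job_id}" \
    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])'
}

# Usage, left commented out so the snippet can be sourced without a running server:
# poll_until_done 600 1 get_status || echo "job did not complete" >&2
```

With a one-second delay, 600 attempts caps the wait at roughly ten minutes; the right limit depends on hardware and request settings, which this sketch does not try to guess.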