Instructions for using wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO with libraries and local apps.

Libraries

How to use wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Thinking models write a reasoning trace before the answer, so allow a generous token budget
outputs = pipe(text=messages, max_new_tokens=1024)
print(outputs)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO"
processor = AutoProcessor.from_pretrained(model_id)
# dtype="auto" keeps the checkpoint's BF16 weights; device_map="auto" (needs accelerate) places them on available GPUs
model = AutoModelForImageTextToText.from_pretrained(model_id, dtype="auto", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# A budget of 40 new tokens would usually end inside the reasoning trace; allow far more
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
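Because this is a Thinking model, the decoded text typically contains a reasoning trace followed by the final answer. The helper below is a minimal sketch that assumes the reasoning is closed by a `</think>` tag, as in other Qwen3 thinking models, and that the opening `<think>` may or may not appear in the output depending on the chat template. `split_thinking` is a hypothetical helper, not part of Transformers; it splits at the last `</think>` and strips Qwen's `<|im_end|>` end-of-turn token if decoding kept it.

```python
def split_thinking(text: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Assumes the reasoning ends with `end_tag`. If the tag is missing
    (e.g. generation hit max_new_tokens mid-reasoning), everything is
    returned as reasoning and the answer is empty.
    """
    head, sep, tail = text.rpartition(end_tag)
    if not sep:
        # No closing tag: generation probably stopped before the answer began.
        return text.replace("<think>", "").strip(), ""
    reasoning = head.replace("<think>", "").strip()
    # <|im_end|> is Qwen's end-of-turn token; it remains if special tokens were not skipped.
    answer = tail.replace("<|im_end|>", "").strip()
    return reasoning, answer


if __name__ == "__main__":
    reasoning, answer = split_thinking("Inspecting the image first.</think>\n\nFinal answer text.<|im_end|>")
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

If the answer comes back empty, the usual cause is a `max_new_tokens` value too small for the reasoning trace.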

Local Apps

vLLM

How to use wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
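The same request can be sent from Python. This sketch uses only the standard library (`urllib.request`) instead of an OpenAI SDK; the endpoint path and message layout mirror the curl call above, while the helper names `build_payload`, `extract_reply` and `chat` and the `max_tokens` default of 1024 are illustrative assumptions. Pointing `base_url` at port 30000 should work the same way for the SGLang server.

```python
import json
import urllib.request

MODEL = "wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO"


def build_payload(text: str, image_url: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        # Assumed budget: thinking models spend tokens on reasoning before answering.
        "max_tokens": max_tokens,
    }


def extract_reply(response: dict) -> str:
    """Return the assistant message text from an OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]


def chat(base_url: str, payload: dict, timeout: float = 300.0) -> str:
    """POST `payload` to {base_url}/v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the vLLM server running, `print(chat("http://localhost:8000", build_payload("Describe this image in one sentence.", image_url)))` sends the same request as the curl example, where `image_url` is any reachable image URL.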

Use Docker

```bash
docker model run hf.co/wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO
```

SGLang

How to use wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
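Both servers can take a while to download and load the weights before they accept requests. The polling helper below is a sketch that assumes the server exposes the OpenAI-compatible `GET /v1/models` route, which vLLM and SGLang both provide; `wait_for_server` and its default timeouts are illustrative, not part of either project.

```python
import http.client
import time
import urllib.request


def wait_for_server(base_url: str, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll {base_url}/v1/models until it answers HTTP 200; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timed out, or HTTP error: the server is not ready yet.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

For example, call `wait_for_server("http://localhost:30000")` before the first SGLang request, or use port 8000 for vLLM.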

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "wgyhhh/Qwen3-VL-4B-Thinking-SafeGRPO" \
        --host 0.0.0.0 \
        --port 30000
```

Once the container is running, call the server with the same curl request as in the pip installation above; it listens on port 30000 in both cases.


Qwen3-VL-4B-Thinking-SafeGRPO

This repository contains a safety-aligned multimodal reasoning model fine-tuned from Qwen/Qwen3-VL-4B-Thinking using GRPO with the verl reinforcement learning framework.

The model is intended for research on post-training safety alignment of multimodal large language models, particularly in scenarios that combine image-text understanding, reasoning, and safe response generation.

Model Details

Base model: Qwen/Qwen3-VL-4B-Thinking
Fine-tuning method: GRPO
Training framework: verl
Rollout engine: vLLM
Model type: Vision-Language Model
Training objective: Multimodal safety alignment through reinforcement learning
License: Apache-2.0

Training Setup

The model was fine-tuned with Group Relative Policy Optimization (GRPO), as implemented in the verl framework.
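In GRPO, each prompt is sampled several times (`ROLLOUT_N=8` in the configuration below), and each response's reward is normalized against the other responses to the same prompt, which replaces a learned critic. The function below is a minimal sketch of that group-relative advantage, assuming one scalar outcome reward per response; `grpo_advantages`, the population standard deviation, and `eps=1e-6` are illustrative choices, and verl's own implementation may differ in such details.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def grpo_advantages(group_ids: list[str], rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: (r - mean(group)) / (std(group) + eps).

    `group_ids[i]` names the prompt that produced response i, so all
    responses sampled from the same prompt form one group.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for gid, reward in zip(group_ids, rewards):
        groups[gid].append(reward)
    # Per-group mean and (population) standard deviation.
    stats = {gid: (fmean(rs), pstdev(rs)) for gid, rs in groups.items()}
    return [(r - stats[gid][0]) / (stats[gid][1] + eps) for gid, r in zip(group_ids, rewards)]


if __name__ == "__main__":
    ids = ["p0"] * 4 + ["p1"] * 4
    rewards = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0]
    for gid, r, a in zip(ids, rewards, grpo_advantages(ids, rewards)):
        print(gid, r, round(a, 3))
```

In the example, group `p1` receives the same reward for every response, so all its advantages are zero: a group in which every sample scores the same contributes no policy-gradient signal.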

The main training configuration is shown below:

```bash
#!/usr/bin/env bash

set -x

PROJECT_NAME=verl_grpo
EXPERIMENT_NAME=qwen3_vl_4b_thinking_safegrpo

# Rollout engine: first CLI argument, defaulting to vllm.
ENGINE=${1:-vllm}
# Consume the engine argument (if given) so only the remaining
# arguments are forwarded to verl as extra overrides via "$@".
if [ $# -gt 0 ]; then shift; fi

# Fraction of GPU memory the rollout engine may reserve.
GPU_UTILIZATION=0.6

# Base model and parquet datasets (train / validation).
MODEL_PATH=Qwen/Qwen3-VL-4B-Thinking
TRAIN_FILES=./train_data/safetygrpo_train.parquet
VAL_FILES=./train_data/safetygrpo_test.parquet

# ROLLOUT_N responses are sampled per prompt (the GRPO group size).
TRAIN_BATCH_SIZE=256
MAX_PROMPT_LENGTH=2048
MAX_RESPONSE_LENGTH=4096
ROLLOUT_N=8
PPO_MINI_BATCH_SIZE=64
PPO_MICRO_BATCH_SIZE_PER_GPU=16
LOG_PROB_MICRO_BATCH_SIZE_PER_GPU=16
TENSOR_MODEL_PARALLEL_SIZE=1

# Checkpoint every SAVE_FREQ steps and validate every TEST_FREQ steps.
SAVE_FREQ=3000
TEST_FREQ=10
TOTAL_EPOCHS=15

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$TRAIN_FILES \
    data.val_files=$VAL_FILES \
    data.train_batch_size=$TRAIN_BATCH_SIZE \
    data.max_prompt_length=$MAX_PROMPT_LENGTH \
    data.max_response_length=$MAX_RESPONSE_LENGTH \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    actor_rollout_ref.model.path=$MODEL_PATH \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.model.use_fused_kernels=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=$PPO_MINI_BATCH_SIZE \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=$PPO_MICRO_BATCH_SIZE_PER_GPU \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=$LOG_PROB_MICRO_BATCH_SIZE_PER_GPU \
    actor_rollout_ref.rollout.tensor_model_parallel_size=$TENSOR_MODEL_PARALLEL_SIZE \
    actor_rollout_ref.rollout.name=$ENGINE \
    +actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \
    actor_rollout_ref.rollout.gpu_memory_utilization=$GPU_UTILIZATION \
    actor_rollout_ref.rollout.enable_chunked_prefill=True \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=True \
    actor_rollout_ref.rollout.n=$ROLLOUT_N \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=$LOG_PROB_MICRO_BATCH_SIZE_PER_GPU \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    reward_model.reward_manager=batch \
    custom_reward_function.path=./reward/safetygrpo_qwen3.py \
    custom_reward_function.name=compute_score_batch \
    trainer.critic_warmup=0 \
    trainer.logger=wandb \
    trainer.project_name=$PROJECT_NAME \
    trainer.experiment_name=$EXPERIMENT_NAME \
    trainer.n_gpus_per_node=4 \
    trainer.nnodes=1 \
    trainer.save_freq=$SAVE_FREQ \
    trainer.test_freq=$TEST_FREQ \
    trainer.total_epochs=$TOTAL_EPOCHS \
    trainer.default_local_dir=./checkpoints/$PROJECT_NAME/$EXPERIMENT_NAME "$@"
```
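The batch-size settings interact with the rollout count and the GPU count. The arithmetic below is a back-of-the-envelope sketch, not output from the training run; it assumes verl's convention that `ppo_mini_batch_size` counts prompts, is multiplied by `rollout.n`, and is then split across GPUs, that `ppo_micro_batch_size_per_gpu` counts responses, and that the default of one PPO epoch per step applies.

```python
# Values copied from the training script above.
TRAIN_BATCH_SIZE = 256              # prompts per training step
ROLLOUT_N = 8                       # responses sampled per prompt
PPO_MINI_BATCH_SIZE = 64            # prompts per optimizer update (before scaling by ROLLOUT_N)
PPO_MICRO_BATCH_SIZE_PER_GPU = 16   # responses per forward/backward pass on one GPU
N_GPUS = 4                          # trainer.n_gpus_per_node * trainer.nnodes
MAX_PROMPT_LENGTH, MAX_RESPONSE_LENGTH = 2048, 4096

responses_per_step = TRAIN_BATCH_SIZE * ROLLOUT_N                        # 2048
responses_per_update = PPO_MINI_BATCH_SIZE * ROLLOUT_N                   # 512
updates_per_step = responses_per_step // responses_per_update            # 4 (assuming 1 PPO epoch)
mini_batch_per_gpu = responses_per_update // N_GPUS                      # 128
grad_accum_steps = mini_batch_per_gpu // PPO_MICRO_BATCH_SIZE_PER_GPU    # 8
max_tokens_per_sample = MAX_PROMPT_LENGTH + MAX_RESPONSE_LENGTH          # 6144

print(f"{responses_per_step=} {updates_per_step=} {grad_accum_steps=} {max_tokens_per_sample=}")
```

When changing the GPU count, the per-GPU mini-batch should stay divisible by `PPO_MICRO_BATCH_SIZE_PER_GPU` so that gradient accumulation comes out to a whole number of micro-batches.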

Model Files

Format: Safetensors
Model size: 5B params
Tensor type: BF16
