Instructions to use prithivMLmods/Qwen3-VL-8B-Heretic-Stable with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use prithivMLmods/Qwen3-VL-8B-Heretic-Stable with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="prithivMLmods/Qwen3-VL-8B-Heretic-Stable")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("prithivMLmods/Qwen3-VL-8B-Heretic-Stable")
model = AutoModelForImageTextToText.from_pretrained("prithivMLmods/Qwen3-VL-8B-Heretic-Stable")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use prithivMLmods/Qwen3-VL-8B-Heretic-Stable with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "prithivMLmods/Qwen3-VL-8B-Heretic-Stable"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/Qwen3-VL-8B-Heretic-Stable",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/prithivMLmods/Qwen3-VL-8B-Heretic-Stable
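The curl request above can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `http://localhost:8000`. The helper names `build_payload` and `describe_image` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Qwen3-VL-8B-Heretic-Stable"


def build_payload(image_url: str, prompt: str, model: str = MODEL_ID) -> dict:
    """Build an OpenAI-style chat completion body with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def describe_image(image_url: str, prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the assistant's reply text.

    Assumes a server is already running at base_url (hypothetical default).
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(image_url, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `describe_image(url, "Describe this image in one sentence.")` returns the reply as a string. Passing `base_url="http://localhost:30000"` should target the SGLang server in the same way, since it exposes the same OpenAI-compatible route.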
- SGLang
How to use prithivMLmods/Qwen3-VL-8B-Heretic-Stable with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "prithivMLmods/Qwen3-VL-8B-Heretic-Stable" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/Qwen3-VL-8B-Heretic-Stable",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Qwen3-VL-8B-Heretic-Stable" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/Qwen3-VL-8B-Heretic-Stable",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
- Docker Model Runner
How to use prithivMLmods/Qwen3-VL-8B-Heretic-Stable with Docker Model Runner:
docker model run hf.co/prithivMLmods/Qwen3-VL-8B-Heretic-Stable
Qwen3-VL-8B-Heretic-Stable
Qwen3-VL-8B-Heretic-Stable is a stability-focused abliterated evolution built on top of prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX, originally derived from Qwen/Qwen3-VL-8B-Instruct. This model applies advanced abliteration and refusal-suppression training strategies while emphasizing improved output consistency, multimodal reasoning stability, and reliable instruction adherence across complex visual and textual tasks.
[Base: Qwen/Qwen3-VL-8B-Instruct]
└───► [Intermediate: Qwen3-VL-8B-Instruct-Unredacted-MAX]
└───► [Current: Qwen3-VL-8B-Heretic-Stable]
This model is released for research and learning purposes only. Its internal refusal behaviors have been reduced, and any content it generates is used at the user's own risk. The authors and the hosting page disclaim any liability for that content. Users are responsible for ensuring that the model is used in a safe, ethical, and lawful manner.
Evaluation [Self Reported]
| Metric | Result |
|---|---|
| Refusal Rate (harm_bench) | 0 / 250 |
| Test Setup | 250 random harmful prompts |
| Inference Pipeline | Transformers |
| Inference Type | text-generation |
| Dataset | harm_bench |
Note: The model was evaluated on 250 prompts randomly sampled from the harm_bench dataset and refused none of them (0 of 250). For more details, refer to the harm_bench dataset page.
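A count of 0 out of 250 is an observation on a sample, so it can help to look at the uncertainty around it. The sketch below computes a Wilson score interval for a binomial proportion. It is a generic statistical calculation rather than part of the reported evaluation, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.959964) -> tuple[float, float]:
    """Return the Wilson score interval for a proportion (z = 1.959964 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Figures taken from the table above: 0 refusals in 250 prompts.
low, high = wilson_interval(0, 250)
print(f"observed refusal rate: {0 / 250:.1%}, 95% interval: [{low:.2%}, {high:.2%}]")
```

Under these assumptions the upper bound comes out at roughly 1.5%, so the sample is consistent with a small but non-zero refusal rate on the wider prompt distribution.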
Key Highlights
- Heretic Stable Training: Refined to reduce internal refusal behaviors while improving response stability and coherent long-form multimodal generation.
- 8B Multimodal Architecture: Based on Qwen3-VL-8B-Instruct, delivering strong vision-language understanding and detailed reasoning capabilities.
- Enhanced Visual Reasoning: Optimized for deep analysis of artistic, technical, forensic, abstract, and research-oriented visual content.
- High-Fidelity Captioning: Generates rich and descriptive captions suitable for metadata generation, accessibility pipelines, and dataset enrichment.
- Dynamic Resolution Handling: Maintains native Qwen3-VL support for multiple aspect ratios and high-resolution image processing.
- Stable Instruction Following: Tuned to preserve conversational coherence and reduce generation instability during extended reasoning tasks.
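To make the dynamic resolution point concrete: the idea is to keep an image's aspect ratio while snapping both sides to a patch-aligned grid and holding the total pixel count within a budget. The sketch below illustrates that idea only. It is not the Qwen3-VL image processor's implementation, and the grid factor of 32 and the pixel budgets are assumed values chosen for the example.

```python
import math


def fit_resolution(height: int, width: int, factor: int = 32,
                   min_pixels: int = 256 * 256, max_pixels: int = 1280 * 1280) -> tuple[int, int]:
    """Return (h, w) near the input aspect ratio, each a multiple of `factor`,
    with h * w pushed toward the [min_pixels, max_pixels] budget.

    All defaults are illustrative assumptions, not values read from the model.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    # First snap each side to the nearest multiple of the grid factor.
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    if h * w > max_pixels:
        # Too large: shrink both sides by the same scale, rounding down.
        scale = math.sqrt((height * width) / max_pixels)
        h = max(factor, math.floor(height / scale / factor) * factor)
        w = max(factor, math.floor(width / scale / factor) * factor)
    elif h * w < min_pixels:
        # Too small: enlarge both sides by the same scale, rounding up.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / factor) * factor
        w = math.ceil(width * scale / factor) * factor
    return h, w


print(fit_resolution(1080, 1920))  # a 16:9 frame reduced to fit the assumed budget
```

Raising or lowering the pixel budget trades visual detail against the number of image tokens, which in turn affects memory use and latency.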
Quick Start with Transformers
pip install transformers==5.9.0
# or
pip install git+https://github.com/huggingface/transformers.git
# qwen_vl_utils (imported below) is published on PyPI as qwen-vl-utils
pip install qwen-vl-utils

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the Heretic Stable model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Heretic-Stable",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Heretic-Stable"
)

# One user turn containing an image and a text instruction
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {
                "type": "text",
                "text": "Provide a detailed caption and reasoning for this image."
            },
        ],
    }
]

# Render the chat template to a prompt string, then load the visual inputs
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)

# Move inputs to the device the model was placed on by device_map="auto"
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=256
)

# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):]
    for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)
print(output_text)
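The trimming step in the snippet above is needed because `generate` returns each prompt followed by its continuation. The toy example below shows the same slicing on plain Python lists; the token ids are made up for illustration.

```python
# Hypothetical token ids: each generated row starts with a copy of its prompt.
prompt_ids = [[101, 7, 8], [101, 9, 10]]
generated_ids = [[101, 7, 8, 42, 43], [101, 9, 10, 44, 45]]

# Slice off the first len(prompt) tokens of every row, keeping only the new ones.
trimmed = [out[len(inp):] for inp, out in zip(prompt_ids, generated_ids)]
print(trimmed)  # [[42, 43], [44, 45]]
```

Because the processor is called with `padding=True`, every row of `input_ids` has the same length, so slicing by that length removes the whole padded prompt from each sequence.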
Intended Use
- Advanced Multimodal Research: Exploring reasoning behavior and multimodal robustness across diverse prompts.
- Visual Dataset Enrichment: Producing detailed captions for historical, artistic, scientific, or technical datasets.
- Behavioral Alignment Research: Studying the effects of refusal-reduction and abliteration-based fine-tuning strategies.
- Creative Vision-Language Applications: Supporting storytelling, world-building, visual narration, and scene interpretation workflows.
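For the dataset enrichment use case, a captioning loop might look like the sketch below. It reuses the message layout from the Quick Start and leaves the model call as an injected function, so the loop can be exercised without loading the model. The names `build_caption_messages`, `caption_lines`, and `caption_fn` are hypothetical.

```python
import json
from typing import Callable, Iterable, Iterator


def build_caption_messages(image_url: str, instruction: str) -> list:
    """Build a single-turn chat message list in the Quick Start layout."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": instruction},
            ],
        }
    ]


def caption_lines(image_urls: Iterable[str],
                  caption_fn: Callable[[list], str],
                  instruction: str = "Provide a detailed caption for this image.") -> Iterator[str]:
    """Yield one JSON line per image, pairing its URL with the caption from caption_fn.

    caption_fn is assumed to wrap the processor/generate/decode steps and return text.
    """
    for url in image_urls:
        messages = build_caption_messages(url, instruction)
        record = {"image": url, "caption": caption_fn(messages)}
        yield json.dumps(record, ensure_ascii=False)


# Stand-in for the real model call, used here only to show the output shape.
def fake_caption(messages: list) -> str:
    return "placeholder caption"


for line in caption_lines(["https://example.com/a.jpg"], fake_caption):
    print(line)
```

Each yielded line can be appended to a `.jsonl` file, and swapping `fake_caption` for a function that runs the Quick Start pipeline turns the sketch into an actual captioning job.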
Limitations & Risks
Important Notice: This model intentionally minimizes conventional refusal mechanisms.
- Sensitive Output Generation: The model may produce explicit, controversial, or unrestricted outputs depending on prompts.
- User Responsibility: Outputs should be used responsibly and in accordance with applicable legal and ethical standards.
- Large Hardware Requirements: High-resolution multimodal inference may require substantial GPU memory and compute resources.
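As a rough guide to the hardware point above, the memory needed just to hold the weights can be estimated from the parameter count and the numeric precision. The sketch below assumes a nominal 8 billion parameters, taken from the model name rather than measured, and it ignores activations, the KV cache, and image tokens, all of which add to the total.

```python
# Bytes used per parameter for common precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024 ** 3


NOMINAL_PARAMS = 8e9  # assumption: nominal size implied by "8B"

for dtype in ("float32", "bfloat16", "int8"):
    print(f"{dtype}: about {weight_memory_gib(NOMINAL_PARAMS, dtype):.1f} GiB for weights")
```

High-resolution images increase the number of visual tokens, so actual peak usage during inference can sit well above the weights-only figure.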
Model Lineage
- Base Model: Qwen/Qwen3-VL-8B-Instruct
- Intermediate Variant: prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX
- Current Release: prithivMLmods/Qwen3-VL-8B-Heretic-Stable
Acknowledgements
I would like to acknowledge the following works:
- Maxime Labonne — Uncensor any LLM with abliteration
- NVIDIA Transformer Engine Docs — Using FP8 and FP4 with Transformer Engine
- Remove Refusals with Transformers by Sumandora
- LLM Compressor by vLLM Project
- NVIDIA FP8 Introduction