Instructions to use prithivMLmods/Qwen3-VL-8B-Heretic-Stable with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use prithivMLmods/Qwen3-VL-8B-Heretic-Stable with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="prithivMLmods/Qwen3-VL-8B-Heretic-Stable")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("prithivMLmods/Qwen3-VL-8B-Heretic-Stable")
model = AutoModelForImageTextToText.from_pretrained("prithivMLmods/Qwen3-VL-8B-Heretic-Stable")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use prithivMLmods/Qwen3-VL-8B-Heretic-Stable with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/Qwen3-VL-8B-Heretic-Stable"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Qwen3-VL-8B-Heretic-Stable",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
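
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `chat` are illustrative and not part of vLLM, and the default base URL assumes the server was started locally on port 8000 as in the curl example.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Qwen3-VL-8B-Heretic-Stable"


def build_chat_payload(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the reply text.

    Assumes an OpenAI-compatible server is already running at base_url.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible responses carry the text in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat(build_chat_payload("Describe this image in one sentence.", image_url))` should return the model's reply as a string. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead, which is the port the SGLang commands on this page use.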

Use Docker

docker model run hf.co/prithivMLmods/Qwen3-VL-8B-Heretic-Stable

SGLang

How to use prithivMLmods/Qwen3-VL-8B-Heretic-Stable with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Qwen3-VL-8B-Heretic-Stable" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Qwen3-VL-8B-Heretic-Stable",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Qwen3-VL-8B-Heretic-Stable" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Qwen3-VL-8B-Heretic-Stable",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use prithivMLmods/Qwen3-VL-8B-Heretic-Stable with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/Qwen3-VL-8B-Heretic-Stable
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Qwen3-VL-8B-Heretic-Stable

Qwen3-VL-8B-Heretic-Stable is a stability-focused, abliterated model built on prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX, which is itself derived from Qwen/Qwen3-VL-8B-Instruct. It applies abliteration and refusal-suppression training strategies while emphasizing output consistency, stable multimodal reasoning, and reliable instruction adherence across complex visual and textual tasks.

[Base: Qwen/Qwen3-VL-8B-Instruct]
└───► [Intermediate: Qwen3-VL-8B-Instruct-Unredacted-MAX]
      └───► [Current: Qwen3-VL-8B-Heretic-Stable]

This model is provided for research and learning purposes only. Its internal refusal behaviors have been reduced, and any content it generates is used at the user's own risk. The authors and the hosting page disclaim any liability for content generated by this model. Users are responsible for ensuring that the model is used in a safe, ethical, and lawful manner.

Evaluation [Self Reported]

| Metric | Result |
| --- | --- |
| Refusal Rate (harm_bench) | 0 / 250 |
| Test Setup | 250 random harmful prompts |
| Inference Pipeline | Transformers |
| Inference Type | text-generation |
| Dataset | harm_bench |

Note: This model was tested on 250 randomly sampled harmful prompts based on the harm_bench dataset, and it refused none of them (0 of 250). For more details, refer to the harm_bench dataset page.
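
A refusal-rate measurement of this kind can be sketched as follows. This is only an illustration under stated assumptions: the card does not say how refusals were detected or how the prompts were sampled, so the keyword markers, the 80-character window, and the helper names below are all hypothetical.

```python
import random

# Hypothetical refusal markers: the card does not say how refusals were detected.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")


def is_refusal(response: str) -> bool:
    """Flag a response whose opening contains one of the assumed markers."""
    opening = response.strip().lower()[:80]
    return any(marker in opening for marker in REFUSAL_MARKERS)


def refusal_rate(responses: list[str]) -> tuple[int, int]:
    """Return (number of refusals, number of responses)."""
    return sum(is_refusal(r) for r in responses), len(responses)


def sample_prompts(prompts: list[str], k: int = 250, seed: int = 0) -> list[str]:
    """Draw k prompts without replacement, reproducibly for a given seed."""
    return random.Random(seed).sample(prompts, k)
```

Keyword matching is a crude proxy that can miss soft refusals or flag harmless apologies, so a result such as "0 / 250" is best read alongside the detection rule that produced it.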

Key Highlights

Heretic Stable Training: Refined to reduce internal refusal behaviors while improving response stability and coherent long-form multimodal generation.
8B Multimodal Architecture: Based on Qwen3-VL-8B-Instruct, delivering strong vision-language understanding and detailed reasoning capabilities.
Enhanced Visual Reasoning: Optimized for deep analysis of artistic, technical, forensic, abstract, and research-oriented visual content.
High-Fidelity Captioning: Generates rich and descriptive captions suitable for metadata generation, accessibility pipelines, and dataset enrichment.
Dynamic Resolution Handling: Maintains native Qwen3-VL support for multiple aspect ratios and high-resolution image processing.
Stable Instruction Following: Tuned to preserve conversational coherence and reduce generation instability during extended reasoning tasks.

Quick Start with Transformers

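pip install transformers==5.9.0
# or
pip install git+https://github.com/huggingface/transformers.git
# qwen_vl_utils (imported below) is distributed as the qwen-vl-utils package
pip install qwen-vl-utils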

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the Heretic Stable model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Heretic-Stable",
    dtype="auto",  # recent Transformers releases prefer "dtype" over the older "torch_dtype"
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Heretic-Stable"
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {
                "type": "text",
                "text": "Provide a detailed caption and reasoning for this image."
            },
        ],
    }
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)  # follow the device chosen by device_map="auto" instead of hard-coding "cuda"

generated_ids = model.generate(
    **inputs,
    max_new_tokens=256
)

generated_ids_trimmed = [
    out_ids[len(in_ids):]
    for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(output_text)
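
The trimming step in the snippet above exists because, for decoder-only models, `generate` returns each prompt followed by its newly generated tokens. The following toy sketch isolates that step using plain Python lists in place of tensors; the token IDs are made up and the function name `trim_prompt_tokens` is illustrative only.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Drop the echoed prompt from the front of each generated sequence."""
    return [
        out_ids[len(in_ids):]
        for in_ids, out_ids in zip(input_ids, generated_ids)
    ]


# Toy token IDs standing in for real tensors (values are made up).
prompt_batch = [[101, 7, 8, 9]]
generated_batch = [[101, 7, 8, 9, 42, 43, 2]]

# Only the three tokens after the four-token prompt remain.
print(trim_prompt_tokens(prompt_batch, generated_batch))
```

Without this slice, the decoded output would begin with the chat template and prompt text rather than the model's answer.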

Intended Use

Advanced Multimodal Research: Exploring reasoning behavior and multimodal robustness across diverse prompts.
Visual Dataset Enrichment: Producing detailed captions for historical, artistic, scientific, or technical datasets.
Behavioral Alignment Research: Studying the effects of refusal-reduction and abliteration-based fine-tuning strategies.
Creative Vision-Language Applications: Supporting storytelling, world-building, visual narration, and scene interpretation workflows.

Limitations & Risks

Important Notice: This model intentionally minimizes conventional refusal mechanisms.

Sensitive Output Generation: The model may produce explicit, controversial, or unrestricted outputs depending on prompts.
User Responsibility: Outputs should be used responsibly and in accordance with applicable legal and ethical standards.
Large Hardware Requirements: High-resolution multimodal inference may require substantial GPU memory and compute resources.
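
As a rough guide to the hardware point above, the memory needed for the weights alone can be estimated from the parameter count and the numeric precision. This is a back-of-the-envelope sketch that assumes roughly 8 billion parameters, as the "8B" in the model name suggests; the exact count is not stated in this card, and activations, the KV cache, and image inputs add to the total.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: about 8 billion parameters, inferred from the "8B" in the name.
ASSUMED_PARAMS = 8e9

for dtype_name, size_in_bytes in (("float32", 4), ("bfloat16", 2), ("int8", 1)):
    estimate = weight_memory_gib(ASSUMED_PARAMS, size_in_bytes)
    print(f"{dtype_name}: {estimate:.1f} GiB")
```

Under these assumptions, half-precision weights occupy roughly 15 GiB before any inference overhead, which is why high-resolution inputs can push a 16 GB GPU past its limit.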