
Qwen3-VL-8B-Heretic-Stable

Qwen3-VL-8B-Heretic-Stable is a stability-focused abliterated evolution built on top of prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX, originally derived from Qwen/Qwen3-VL-8B-Instruct. This model applies advanced abliteration and refusal-suppression training strategies while emphasizing improved output consistency, multimodal reasoning stability, and reliable instruction adherence across complex visual and textual tasks.

[Base: Qwen/Qwen3-VL-8B-Instruct]
└───► [Intermediate: Qwen3-VL-8B-Instruct-Unredacted-MAX]
      └───► [Current: Qwen3-VL-8B-Heretic-Stable]

This model is provided for research and learning purposes only. Its internal refusal behaviors have been reduced, and any content it generates is used at the user’s own risk. The authors and the hosting page disclaim all liability for content generated by this model. Users are responsible for ensuring that the model is used in a safe, ethical, and lawful manner.

Evaluation [Self Reported]

Metric                       Result
Refusal Rate (harm_bench)    0 / 250
Test Setup                   250 random harmful prompts
Inference Pipeline           Transformers
Inference Type               text-generation
Dataset                      harm_bench

Note: This model was tested on 250 randomly sampled harmful prompts based on the harm_bench dataset. The result shows 0 refusals out of 250. For more details, refer to the dataset page linked above.
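
A count of 0 refusals in 250 prompts does not by itself show that the true refusal rate is exactly zero. As a rough sketch (not part of the reported evaluation), the one-sided upper confidence bound for a zero-event sample can be computed as below. The function name and the 95% confidence level are illustrative assumptions.

```python
def zero_event_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on an event rate when 0 events occurred in n_trials.

    Solves (1 - p) ** n_trials = 1 - confidence for p.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


# 250 prompts with 0 refusals, matching the self-reported figures
bound = zero_event_upper_bound(250)
print(f"95% upper bound on the refusal rate: {bound:.2%}")
```

Under these assumptions the bound comes out at roughly 1.2%, so the sample is consistent with a small but non-zero refusal rate.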

Key Highlights

  • Heretic Stable Training: Refined to reduce internal refusal behaviors while improving response stability and coherent long-form multimodal generation.
  • 8B Multimodal Architecture: Based on Qwen3-VL-8B-Instruct, delivering strong vision-language understanding and detailed reasoning capabilities.
  • Enhanced Visual Reasoning: Optimized for deep analysis of artistic, technical, forensic, abstract, and research-oriented visual content.
  • High-Fidelity Captioning: Generates rich and descriptive captions suitable for metadata generation, accessibility pipelines, and dataset enrichment.
  • Dynamic Resolution Handling: Maintains native Qwen3-VL support for multiple aspect ratios and high-resolution image processing.
  • Stable Instruction Following: Tuned to preserve conversational coherence and reduce generation instability during extended reasoning tasks.
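
To illustrate what dynamic resolution handling means in practice, the sketch below resizes an image to dimensions that are multiples of a patch factor while keeping the total pixel count inside a budget. It is not the library's implementation: `fit_to_pixel_budget`, the factor of 32, and the pixel limits are all assumed values chosen for illustration.

```python
import math


def fit_to_pixel_budget(
    height: int,
    width: int,
    factor: int = 32,               # assumed patch multiple
    min_pixels: int = 256 * 256,    # assumed lower pixel budget
    max_pixels: int = 1280 * 1280,  # assumed upper pixel budget
) -> tuple[int, int]:
    """Return (h, w) that are multiples of `factor` and roughly keep the aspect ratio."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")

    # Snap each side to the nearest multiple of the patch factor.
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)

    if h * w > max_pixels:
        # Shrink both sides by the same ratio, rounding down to stay under budget.
        scale = math.sqrt(height * width / max_pixels)
        h = max(factor, math.floor(height / scale / factor) * factor)
        w = max(factor, math.floor(width / scale / factor) * factor)
    elif h * w < min_pixels:
        # Enlarge both sides by the same ratio, rounding up to reach the minimum.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / factor) * factor
        w = math.ceil(width * scale / factor) * factor
    return h, w


print(fit_to_pixel_budget(4000, 3000))  # large portrait photo, scaled down
print(fit_to_pixel_budget(100, 50))     # small thumbnail, scaled up
```

The point of the design is that aspect ratio is preserved rather than forcing every image into one fixed square, which is why tall, wide, and high-resolution inputs can all be handled by the same model.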

Quick Start with Transformers

pip install transformers==5.9.0
# or
pip install git+https://github.com/huggingface/transformers.git

# qwen_vl_utils (imported below) is a separate package
pip install qwen-vl-utils

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the Heretic Stable model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Heretic-Stable",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Heretic-Stable"
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {
                "type": "text",
                "text": "Provide a detailed caption and reasoning for this image."
            },
        ],
    }
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)  # follow the device chosen by device_map="auto" instead of hard-coding "cuda"

generated_ids = model.generate(
    **inputs,
    max_new_tokens=256
)

generated_ids_trimmed = [
    out_ids[len(in_ids):]
    for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(output_text)
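
When captioning many images, the `messages` structure used above can be produced by a small helper. This is a minimal sketch: `build_caption_messages` and its default prompt are hypothetical and not part of Transformers or qwen_vl_utils, and the image references are placeholders.

```python
from typing import Any

DEFAULT_PROMPT = "Provide a detailed caption and reasoning for this image."


def build_caption_messages(image_ref: str, prompt: str = DEFAULT_PROMPT) -> list[dict[str, Any]]:
    """Build a single-turn chat message list with one image and one text instruction."""
    if not image_ref:
        raise ValueError("image_ref must be a non-empty URL or path")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_ref},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# Placeholder references; replace with real URLs or local paths.
image_refs = ["https://example.com/a.jpg", "file:///data/b.png"]
batch = [build_caption_messages(ref) for ref in image_refs]
print(len(batch), batch[0][0]["content"][0]["image"])
```

Each element of `batch` can then be passed through `processor.apply_chat_template` and `process_vision_info` in place of `messages`.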

Intended Use

  • Advanced Multimodal Research: Exploring reasoning behavior and multimodal robustness across diverse prompts.
  • Visual Dataset Enrichment: Producing detailed captions for historical, artistic, scientific, or technical datasets.
  • Behavioral Alignment Research: Studying the effects of refusal-reduction and abliteration-based fine-tuning strategies.
  • Creative Vision-Language Applications: Supporting storytelling, world-building, visual narration, and scene interpretation workflows.

Limitations & Risks

Important Notice: This model intentionally minimizes conventional refusal mechanisms.

  • Sensitive Output Generation: The model may produce explicit, controversial, or unrestricted outputs depending on prompts.
  • User Responsibility: Outputs should be used responsibly and in accordance with applicable legal and ethical standards.
  • Large Hardware Requirements: High-resolution multimodal inference may require substantial GPU memory and compute resources.
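
As a back-of-the-envelope check on the hardware note, the memory needed for the weights alone can be estimated from the parameter count and the bytes per parameter. The sketch assumes roughly 9 billion parameters stored in BF16 (2 bytes each). It ignores activations, the KV cache, and image tokens, all of which add to the total.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Estimate memory for model weights only, in GiB (2 bytes per parameter for BF16)."""
    if n_params <= 0 or bytes_per_param <= 0:
        raise ValueError("n_params and bytes_per_param must be positive")
    return n_params * bytes_per_param / 2**30


# Assumed round figure of 9e9 parameters in BF16
print(f"Weights only: {weight_memory_gib(9e9):.1f} GiB")
```

Under these assumptions the weights take a little under 17 GiB, so high-resolution inputs on a 24 GB GPU leave limited headroom.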

Model Lineage

  • Base Model: Qwen/Qwen3-VL-8B-Instruct
  • Intermediate Variant: prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX
  • Current Release: prithivMLmods/Qwen3-VL-8B-Heretic-Stable

Acknowledgements

I would like to thank the works of the following:
