Qwen2.5 Coder 7B SecureCode

Security-specialized code model fine-tuned on the SecureCode dataset

Dataset | Paper (arXiv:2512.18542) | Model Collection | perfecXion.ai


What This Model Does

This model generates secure code when developers ask about building features. Instead of producing vulnerable implementations (as roughly 45% of AI-generated code does), it:

  • Identifies the security risks in common coding patterns
  • Provides vulnerable and secure implementations side by side
  • Explains how attackers would exploit the vulnerability
  • Includes defense-in-depth guidance: logging, monitoring, SIEM integration, infrastructure hardening
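For example, a request to "look up a user by name" would typically yield a contrast along these lines (an illustrative sketch, not verbatim model output):

import sqlite3

def get_user_vulnerable(conn: sqlite3.Connection, username: str):
    # VULNERABLE: string interpolation lets an attacker inject SQL,
    # e.g. username = "' OR '1'='1" returns every row in the table.
    return conn.execute(
        f"SELECT id, username FROM users WHERE username = '{username}'"
    ).fetchone()

def get_user_secure(conn: sqlite3.Connection, username: str):
    # SECURE: a parameterized query keeps user input out of the SQL grammar.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchone()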

The model was fine-tuned on 2,185 security training examples covering both traditional web security (OWASP Top 10 2021) and AI/ML security (OWASP LLM Top 10 2025).

Model Details

Base Model | Qwen2.5 Coder 7B Instruct
Parameters | 7B
Architecture | Qwen2
Tier | Tier 2: Mid-size Code Specialist
Method | QLoRA (4-bit NormalFloat quantization)
LoRA Rank | 16 (alpha=32)
Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (7 modules)
Training Data | scthornton/securecode (2,185 examples)
Hardware | NVIDIA A100 40GB

A purpose-built code model with strong multi-language support, offering an excellent balance of capability and efficiency.

Quick Start

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Load with 4-bit quantization (matches training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # required per the training notes below
)
tokenizer = AutoTokenizer.from_pretrained(
    "scthornton/qwen2.5-coder-7b-securecode", trust_remote_code=True
)
model = PeftModel.from_pretrained(base_model, "scthornton/qwen2.5-coder-7b-securecode")

# Ask a security-relevant coding question
messages = [
    {"role": "user", "content": "How do I implement JWT authentication with refresh tokens in Python?"}
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
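For lower-latency deployment you can optionally fold the adapter into the base weights. A minimal sketch using PEFT's merge_and_unload (note: merging requires loading the base model unquantized, so the memory footprint is larger than the 4-bit path above):

# Optional: merge the LoRA adapter into the base model for standalone deployment.
base_fp = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(base_fp, "scthornton/qwen2.5-coder-7b-securecode")
merged = merged.merge_and_unload()  # returns a plain transformers model
merged.save_pretrained("qwen2.5-coder-7b-securecode-merged")
tokenizer.save_pretrained("qwen2.5-coder-7b-securecode-merged")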

Training Details

Dataset

Trained on the full SecureCode unified dataset:

  • 2,185 total examples (1,435 web security + 750 AI/ML security)
  • 20 vulnerability categories across OWASP Top 10 2021 and OWASP LLM Top 10 2025
  • 12+ programming languages and 49+ frameworks
  • 4-turn conversational structure: feature request, vulnerable/secure implementations, advanced probing, operational guidance
  • 100% incident grounding: every example tied to real CVEs, vendor advisories, or published attack research
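To inspect the data yourself, the dataset loads through the standard datasets API (a minimal sketch; the "train" split name is an assumption about the Hub configuration):

from datasets import load_dataset

# Quick look at the SecureCode examples; assumes a "train" split exists.
ds = load_dataset("scthornton/securecode", split="train")
print(len(ds))       # expect 2,185 examples
print(ds[0].keys())  # inspect the conversational fields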

Hyperparameters

Parameter | Value
LoRA rank | 16
LoRA alpha | 32
LoRA dropout | 0.05
Target modules | 7 linear layers
Quantization | 4-bit NormalFloat (NF4)
Learning rate | 2e-4
LR scheduler | Cosine with 100-step warmup
Epochs | 3
Per-device batch size | 2
Gradient accumulation | 8x
Effective batch size | 16
Max sequence length | 4096 tokens
Optimizer | paged_adamw_8bit
Precision | bf16
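For reference, these settings map roughly onto the following peft/transformers configuration (a sketch reconstructed from the table above, not the actual training script; output_dir and omitted arguments are placeholders, and the 4096-token max sequence length is applied by the trainer/data pipeline rather than here):

from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qwen2.5-coder-7b-securecode",  # placeholder
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size 16
    bf16=True,
    optim="paged_adamw_8bit",
)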

Notes: Training used an extended 4096-token maximum sequence length to accommodate longer multi-turn security conversations. Loading the model requires trust_remote_code=True.

Security Coverage

Web Security (1,435 examples)

OWASP Top 10 2021: Broken Access Control, Cryptographic Failures, Injection, Insecure Design, Security Misconfiguration, Vulnerable Components, Authentication Failures, Software Integrity Failures, Logging/Monitoring Failures, SSRF.

Languages: Python, JavaScript, Java, Go, PHP, C#, TypeScript, Ruby, Rust, Kotlin, YAML.

AI/ML Security (750 examples)

OWASP LLM Top 10 2025: Prompt Injection, Sensitive Information Disclosure, Supply Chain Vulnerabilities, Data/Model Poisoning, Improper Output Handling, Excessive Agency, System Prompt Leakage, Vector/Embedding Weaknesses, Misinformation, Unbounded Consumption.

Frameworks: LangChain, OpenAI, Anthropic, HuggingFace, LlamaIndex, ChromaDB, Pinecone, FastAPI, Flask, vLLM, CrewAI, and 30+ more.

SecureCode Model Collection

This model is part of the SecureCode collection of 8 security-specialized models:

Model | Base | Size | Tier | HuggingFace
Llama 3.2 SecureCode | meta-llama/Llama-3.2-3B-Instruct | 3B | Accessible | llama-3.2-3b-securecode
Qwen2.5 Coder SecureCode | Qwen/Qwen2.5-Coder-7B-Instruct | 7B | Mid-size | qwen2.5-coder-7b-securecode
DeepSeek Coder SecureCode | deepseek-ai/deepseek-coder-6.7b-instruct | 6.7B | Mid-size | deepseek-coder-6.7b-securecode
CodeGemma SecureCode | google/codegemma-7b-it | 7B | Mid-size | codegemma-7b-securecode
CodeLlama SecureCode | codellama/CodeLlama-13b-Instruct-hf | 13B | Large | codellama-13b-securecode
Qwen2.5 Coder 14B SecureCode | Qwen/Qwen2.5-Coder-14B-Instruct | 14B | Large | qwen2.5-coder-14b-securecode
StarCoder2 SecureCode | bigcode/starcoder2-15b-instruct-v0.1 | 15B | Large | starcoder2-15b-securecode
Granite 20B Code SecureCode | ibm-granite/granite-20b-code-instruct-8k | 20B | XL | granite-20b-code-securecode

Choose based on your deployment constraints: 3B for edge/mobile, 7B for general use, 13B-15B for deeper reasoning, 20B for maximum capability.

SecureCode Dataset Family

Dataset | Examples | Focus | Link
SecureCode | 2,185 | Unified (web + AI/ML) | scthornton/securecode
SecureCode Web | 1,435 | Web security (OWASP Top 10 2021) | scthornton/securecode-web
SecureCode AI/ML | 750 | AI/ML security (OWASP LLM Top 10 2025) | scthornton/securecode-aiml

Intended Use

Use this model for:

  • Training AI coding assistants to write secure code
  • Security education and training
  • Vulnerability research and secure code review
  • Building security-aware development tools

Do not use this model for:

  • Offensive exploitation or automated attack generation
  • Circumventing security controls
  • Any activity that violates the base model's license

Citation

@misc{thornton2026securecode,
  title={SecureCode: A Production-Grade Multi-Turn Dataset for Training Security-Aware Code Generation Models},
  author={Thornton, Scott},
  year={2026},
  publisher={perfecXion.ai},
  url={https://huggingface.co/datasets/scthornton/securecode},
  note={arXiv:2512.18542}
}

License

This model is released under the Apache 2.0 license (inherited from the base model). The training dataset (SecureCode) is licensed under CC BY-NC-SA 4.0.
