PySecPatch-7B

PySecPatch-7B is a defensive Python security adapter for Qwen/Qwen2.5-Coder-7B-Instruct. It is designed for vulnerability triage, CWE classification, security explanation, and candidate secure-code generation. It should be used with human review and automated verification.

Author: Ahmed Bin Khalid, Independent Researcher (ORCID 0000-0002-0616-2604)

This repository contains the PEFT adapter, tokenizer files, training metadata, and chat template. The Qwen base weights are not redistributed.

Evaluation

Family-disjoint holdout

The final adapter and unmodified base were evaluated on the same 3,200 records.

Metric Base PySecPatch
Classification accuracy 4.81% 90.72%
Classification F1 9.18% 93.40%
Strict JSON 91.53% 99.44%
Clean-negative preservation 0.00% 100.00%
Security-control pass 1.63% 88.08%
Normalized exact repair 0.17% 83.33%
Parseable fixed code 6.42% 99.21%

Paired classification yielded 2,770 adapter-only correct predictions and 21 base-only correct predictions across 3,200 records (log10(p) = -787.25, exact two-sided McNemar).

External and repository evaluation

On the pinned SALLM scored subset, PySecPatch achieved 26.77% functional pass, 31.88% security-test pass, and 9.58% secure-functional pass. Four of 100 prompts lacked upstream fixtures.

Repository-format holdout performance was 38.00% patch application and 34.38% security-control pass. On a frozen 24-case repository suite, vulnerability detection and clean preservation were perfect, but none of 12 vulnerable patches passed every acceptance gate. These results do not support autonomous repair or state-of-the-art claims.

Training

The adapter was trained in two consecutive QLoRA stages. Stage A used 8,400 train records from a 12,000-record corpus. Stage B continued from that adapter using 48,000 train records from a separate 60,000-record corpus. Validation, test, and holdout splits never entered optimization.

Both corpora contain generated Python examples and are released under Apache-2.0. The combined dataset repository contains 72,000 records spanning 43 CWEs in the second stage and 35 CWEs in the first stage.

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "abkmystery/PySecPatch-7B"

tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

Use the system instruction:

You are PySecPatch, a defensive Python secure coding model. Identify vulnerabilities, explain risk, and produce minimal safe patches. Return strict JSON only.

The expected response keys are is_vulnerable, cwe, vuln_type, vulnerable_lines, explanation, fixed_code, patch_summary, and safe_test.

Intended Use

  • Defensive analysis of user-supplied Python code.
  • Candidate finding classification and CWE identification.
  • Security explanations and review assistance.
  • Candidate snippet repairs subject to tests and human review.
  • Research on security specialization and generalization.

Limitations

  • External secure-functional generation is substantially weaker than controlled holdout performance.
  • Repository-level unified diffs often fail to apply or pass verification.
  • Line localization is moderate (F1 = 0.4407).
  • The training corpora are generated rather than mined from real repositories.
  • The model can miss vulnerabilities and can produce plausible but incomplete repairs.

Do not use PySecPatch for autonomous deployment, unauthorized scanning, exploit development, or offensive automation.

Reproducibility

Adapter model SHA-256:

4c2b5c7c0d2982b99de9c319e998274fc12f3aae5bf8d2c3b5db58c5864dc65b

Full evaluation reports, raw predictions, frozen hashes, and scripts are linked from the GitHub and archival releases.

Training dataset: 10.5281/zenodo.21016753.

Citation

@software{khalid2026pysecpatch,
  author  = {Bin Khalid, Ahmed},
  title   = {PySecPatch: Defensive Python Vulnerability Triage and Repair Research Artifacts},
  year    = {2026},
  version = {0.1.1},
  url     = {https://github.com/abkmystery/PySecPatch}
}

Released under the Apache License 2.0. See LICENSE.

Current software archive: 10.5281/zenodo.21015885. All versions: 10.5281/zenodo.21015503.

Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for abkmystery/PySecPatch-7B

Base model

Qwen/Qwen2.5-7B
Adapter
(711)
this model

Dataset used to train abkmystery/PySecPatch-7B