output-fujin

This model is a fine-tuned version of Qwen/Qwen3.5-9B.

W&B run: https://wandb.ai/cooawoo-personal/huggingface/runs/sr7glk4m

Training procedure

Hyperparameters

| Parameter | Value |
| --- | --- |
| Learning rate | 0.0002 |
| LR scheduler | cosine |
| Per-device batch size | 1 |
| Gradient accumulation | 8 |
| Effective batch size | 8 |
| Epochs | 1 |
| Max sequence length | 2048 |
| Optimizer | paged_ademamix_8bit |
| Weight decay | 0.01 |
| Warmup ratio | 0.05 |
| Max gradient norm | 1.0 |
| Precision | bf16 |
| Loss type | nll |
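
The batch-size and schedule values above can be sanity-checked with a short sketch. It assumes a single device (which is what an effective batch size of 8 implies for 1 x 8), linear warmup followed by cosine decay to zero, and a warmup step count rounded up; `total_steps` is a hypothetical input, since the card does not state the number of optimizer steps.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Sequences contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


def cosine_lr(step: int, total_steps: int,
              base_lr: float = 2e-4, warmup_ratio: float = 0.05) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    Assumption: warmup steps are ceil(total_steps * warmup_ratio).
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(per_device=1, grad_accum=8))
    # total_steps=1000 is an illustrative value, not taken from the run.
    for s in (0, 25, 50, 500, 1000):
        print(s, cosine_lr(s, total_steps=1000))
```

The peak learning rate is reached at the end of warmup and the rate falls to zero at the final step.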

LoRA configuration

| Parameter | Value |
| --- | --- |
| Rank (r) | 128 |
| Alpha | 16 |
| Dropout | 0.05 |
| Target modules | attn.proj, down_proj, gate_proj, in_proj_a, in_proj_b, in_proj_qkv, in_proj_z, k_proj, linear_fc1, linear_fc2, o_proj, out_proj, q_proj, qkv, up_proj, v_proj |
| Quantization | 4-bit (nf4) |
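
To make the rank and alpha values concrete, the sketch below computes the LoRA scaling factor and the number of adapter parameters added to one linear layer. It assumes the standard LoRA scaling of alpha / r (the config does not mention rank-stabilized scaling), and the 4096 x 4096 layer shape is a hypothetical example, not a dimension taken from the model.

```python
def lora_scaling(r: int, alpha: float) -> float:
    """Standard LoRA multiplier applied to the low-rank update B @ A."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


scale = lora_scaling(r=128, alpha=16)
# Hypothetical layer shape, chosen only for illustration.
params = lora_param_count(d_in=4096, d_out=4096, r=128)

if __name__ == "__main__":
    print(scale, params)
```

Under this assumption, a rank of 128 with alpha 16 scales the adapter update well below 1, so the update is damped relative to an alpha equal to the rank.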

Dataset statistics

| Dataset | Samples | Total tokens | Trainable tokens |
| --- | --- | --- | --- |
| rpDungeon/some-revised-datasets/rosier_inf_strict_text.parquet | 36,438 | 65,084,381 | 65,084,381 |
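
A few derived figures follow from the numbers above. This is plain arithmetic on the reported counts; the only assumption is that the 2048-token maximum sequence length applies to every training sequence, which gives a lower bound on how many sequences are needed to hold all tokens.

```python
samples = 36_438
total_tokens = 65_084_381
trainable_tokens = 65_084_381
max_length = 2048

# Mean length of a sample before any truncation or splitting.
avg_tokens_per_sample = total_tokens / samples

# Share of tokens that contribute to the loss.
trainable_fraction = trainable_tokens / total_tokens

# Lower bound on sequences of at most `max_length` tokens (integer ceiling).
min_sequences = -(-total_tokens // max_length)

if __name__ == "__main__":
    print(round(avg_tokens_per_sample, 1), trainable_fraction, min_sequences)
```

Every token is trainable here, which matches a plain-text dataset trained without loss masking.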

Training config

model_name_or_path: Qwen/Qwen3.5-9B
bf16: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
use_liger: true
max_length: 2048
learning_rate: 0.0002
warmup_ratio: 0.05
weight_decay: 0.01
lr_scheduler_type: cosine
label_smoothing_factor: 0.1
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
optim: paged_ademamix_8bit
max_grad_norm: 1.0
use_peft: true
load_in_4bit: true
lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
logging_steps: 1
disable_tqdm: true
save_strategy: steps
save_steps: 500
save_total_limit: 3
report_to: wandb
output_dir: output-fujin
data_config: data.yaml
prepared_dataset: prepared
num_train_epochs: 1
saves_per_epoch: 3
run_name: qwen35-9b-qlora
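
The config sets `label_smoothing_factor: 0.1`, so the reported NLL loss is presumably smoothed. The sketch below shows one common formulation for a single token position: a weighted mix of the usual NLL and the mean negative log-probability over the vocabulary. This is an illustration under that assumption; the fused kernel enabled by `use_liger` may compute it differently, and the helper names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothed_nll(logits, target, epsilon=0.1):
    """(1 - epsilon) * NLL + epsilon * mean negative log-prob over the vocabulary."""
    log_probs = log_softmax(logits)
    nll = -log_probs[target]
    uniform = -sum(log_probs) / len(log_probs)
    return (1.0 - epsilon) * nll + epsilon * uniform


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0, 0.0]
    print(label_smoothed_nll(logits, target=0, epsilon=0.0))  # plain NLL
    print(label_smoothed_nll(logits, target=0, epsilon=0.1))  # smoothed
```

With `epsilon=0.0` the function reduces to ordinary NLL, which makes the effect of the smoothing term easy to compare.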

Data config

datasets:
- path: rpDungeon/some-revised-datasets
  data_files: rosier_inf_strict_text.parquet
  type: text
  truncation_strategy: split
shuffle_datasets: true
shuffle_combined: true
shuffle_seed: 42
eval_split: 0.0
split_seed: 42
assistant_only_loss: false
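
One plausible reading of `truncation_strategy: split` is that samples longer than `max_length` are cut into consecutive chunks instead of being truncated. The sketch below illustrates that reading only; the trainer's actual behavior is not documented in this card, and `split_into_chunks` is a hypothetical helper. With `assistant_only_loss: false`, the labels would simply be a copy of the input tokens.

```python
def split_into_chunks(token_ids, max_length=2048):
    """Cut one tokenized sample into consecutive chunks of at most `max_length`."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    return [token_ids[i:i + max_length] for i in range(0, len(token_ids), max_length)]


def build_example(chunk):
    """With no loss masking, every token is a training target."""
    return {"input_ids": list(chunk), "labels": list(chunk)}


if __name__ == "__main__":
    sample = list(range(5000))  # stand-in for a 5000-token sample
    chunks = split_into_chunks(sample)
    print([len(c) for c in chunks])
    print(build_example(chunks[-1])["labels"][:3])
```

Splitting keeps every token in the training set, which is consistent with the total and trainable token counts being equal.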

Framework versions

  • PEFT: 0.18.1
  • Loft: 0.1.0
  • Transformers: 5.2.0
  • PyTorch: 2.10.0
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Model repository: Burnt-Toast/fujin-9b (fine-tuned from Qwen/Qwen3.5-9B)