output-fujin

This model is a fine-tuned version of Qwen/Qwen3.5-9B.

W&B run: https://wandb.ai/cooawoo-personal/huggingface/runs/sr7glk4m

Training procedure

Hyperparameters

Parameter	Value
Learning rate	0.0002
LR scheduler	cosine
Per-device batch size	1
Gradient accumulation	8
Effective batch size	8
Epochs	1
Max sequence length	2048
Optimizer	paged_ademamix_8bit
Weight decay	0.01
Warmup ratio	0.05
Max gradient norm	1.0
Precision	bf16
Loss type	nll
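
The effective batch size and the shape of the learning-rate schedule follow from the values above. The sketch below is illustrative only: it assumes a single training device (the card does not state the device count) and the common linear-warmup-then-cosine-decay shape. `total_steps` is a hypothetical figure, not the run's actual step count.

```python
import math

per_device_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1  # assumption: the card does not state the device count

effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)


def lr_at(step, total_steps, peak_lr=2e-4, warmup_ratio=0.05):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical, for illustration only
print("effective batch size:", effective_batch_size)
for step in (0, 25, 50, 500, 1000):
    print(step, lr_at(step, total_steps))
```

The rate starts at zero, reaches the peak of 0.0002 at the end of warmup, and decays back to zero by the final step.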

LoRA configuration

Parameter	Value
Rank (r)	128
Alpha	16
Dropout	0.05
Target modules	attn.proj, down_proj, gate_proj, in_proj_a, in_proj_b, in_proj_qkv, in_proj_z, k_proj, linear_fc1, linear_fc2, o_proj, out_proj, q_proj, qkv, up_proj, v_proj
Quantization	4-bit (nf4)
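
With rank 128 and alpha 16, the adapter update is scaled well below 1 under standard LoRA scaling. The sketch below assumes the standard `alpha / r` rule; the card does not say whether rank-stabilized scaling (`alpha / sqrt(r)`) was enabled, so that value is shown only for comparison. The 4096 x 4096 layer shape is hypothetical and is not taken from the model.

```python
import math

lora_r = 128
lora_alpha = 16

# Standard LoRA multiplies the low-rank update B @ A by alpha / r.
standard_scale = lora_alpha / lora_r

# Rank-stabilized LoRA uses alpha / sqrt(r); shown for comparison only.
rslora_scale = lora_alpha / math.sqrt(lora_r)


def lora_param_count(d_in, d_out, r):
    """Trainable parameters for one adapted linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


print("standard scale:", standard_scale)
print("rsLoRA scale:", round(rslora_scale, 4))
# Hypothetical square projection; real layer shapes depend on the model.
print("params per 4096x4096 layer:", lora_param_count(4096, 4096, lora_r))
```

Under the standard rule the scale is 0.125, so the low-rank update is damped by a factor of eight relative to an alpha equal to the rank.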

Dataset statistics

Dataset	Samples	Total tokens	Trainable tokens
rpDungeon/some-revised-datasets/rosier_inf_strict_text.parquet	36,438	65,084,381	65,084,381
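
A few derived figures help when reading this table. The sketch below assumes each listed sample becomes exactly one training row and that training ran on a single device. If long samples were split into several rows during preparation, the real step count would be higher.

```python
import math

samples = 36_438
total_tokens = 65_084_381
trainable_tokens = 65_084_381
effective_batch_size = 8  # from the hyperparameter table

avg_tokens_per_sample = total_tokens / samples
trainable_fraction = trainable_tokens / total_tokens

# Assumption: one optimizer step consumes effective_batch_size samples.
estimated_steps = math.ceil(samples / effective_batch_size)

print(f"average tokens per sample: {avg_tokens_per_sample:.2f}")
print(f"trainable fraction: {trainable_fraction:.2f}")
print(f"estimated optimizer steps for one epoch: {estimated_steps}")
```

A trainable fraction of 1.0 matches `assistant_only_loss: false` in the data config: every token contributes to the loss.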

Training config

model_name_or_path: Qwen/Qwen3.5-9B
bf16: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
use_liger: true
max_length: 2048
learning_rate: 0.0002
warmup_ratio: 0.05
weight_decay: 0.01
lr_scheduler_type: cosine
label_smoothing_factor: 0.1
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
optim: paged_ademamix_8bit
max_grad_norm: 1.0
use_peft: true
load_in_4bit: true
lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
logging_steps: 1
disable_tqdm: true
save_strategy: steps
save_steps: 500
save_total_limit: 3
report_to: wandb
output_dir: output-fujin
data_config: data.yaml
prepared_dataset: prepared
num_train_epochs: 1
saves_per_epoch: 3
run_name: qwen35-9b-qlora
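
The config sets `label_smoothing_factor: 0.1`, while the hyperparameter table lists the loss type as `nll`. The sketch below shows one common formulation of label-smoothed NLL for a single token, mixing the usual NLL with the mean negative log-probability over the vocabulary. Whether the trainer used for this run applies exactly this form (especially with `use_liger: true`) is an assumption; the function name and the example logits are hypothetical.

```python
import math


def label_smoothed_nll(logits, target, epsilon=0.1):
    """Label-smoothed negative log-likelihood for one token position."""
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]

    nll = -log_probs[target]
    # Cross-entropy against a uniform distribution over the vocabulary.
    uniform_loss = -sum(log_probs) / len(log_probs)
    return (1.0 - epsilon) * nll + epsilon * uniform_loss


example_logits = [2.0, 0.5, -1.0, 0.0]  # hypothetical 4-token vocabulary
print(label_smoothed_nll(example_logits, target=0, epsilon=0.0))
print(label_smoothed_nll(example_logits, target=0, epsilon=0.1))
```

With `epsilon=0.0` the function reduces to plain NLL. With a positive `epsilon` the loss also penalizes very low probability on non-target tokens, which discourages overconfident predictions.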

Data config

datasets:
- path: rpDungeon/some-revised-datasets
  data_files: rosier_inf_strict_text.parquet
  type: text
  truncation_strategy: split
shuffle_datasets: true
shuffle_combined: true
shuffle_seed: 42
eval_split: 0.0
split_seed: 42
assistant_only_loss: false
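
The `truncation_strategy: split` setting suggests that documents longer than `max_length` are cut into several training rows rather than truncated. The sketch below is one plausible reading, not the trainer's confirmed behavior: the card does not document overlap, minimum chunk length, or boundary handling, and `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(token_ids, max_length=2048):
    """Cut a token sequence into consecutive chunks of at most max_length."""
    return [
        token_ids[start:start + max_length]
        for start in range(0, len(token_ids), max_length)
    ]


# Hypothetical 5,000-token document represented by integer ids.
document = list(range(5000))
chunks = split_into_chunks(document)
print([len(chunk) for chunk in chunks])
```

Under this reading no tokens are discarded, which is consistent with the total and trainable token counts being equal in the dataset statistics.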

Framework versions

PEFT: 0.18.1
Loft: 0.1.0
Transformers: 5.2.0
PyTorch: 2.10.0
Datasets: 4.5.0
Tokenizers: 0.22.2
