Universal NLA: one shared AV/AR across 18 LLM architectures
A single Activation Verbalizer + Activation Reconstructor pair that operates on hidden activations from a pool of structurally different small/medium LLMs (GPT-2, Bloom, Pythia, Qwen2/Qwen3, Gemma-4, SmolLM2/3, GPT-Neo, Nemotron, Phi, DeepSeek, LFM2, YandexGPT, rugpt3, Vikhr).
Extends Anthropic's Natural Language Autoencoders (https://transformer-circuits.pub/2026/nla/index.html) from per-model to cross-architecture: new models snap in via a small lstsq-fitted linear adapter pair (enc_M, dec_M), with no AV/AR fine-tune per new model.
                 ┌─ enc_M : d_M → d_shared (lstsq init) ─┐
    h_M (d_M) ───┤                                        ├── AV (Qwen3-1.7B+LoRA) ─▶ z (text)
                 └─ model_tag injected as plain text ────┘

    z ─▶ AR (truncated Qwen3-1.7B + LoRA) ─▶ ĥ_shared (d=2048)
                                                 │
                                                 └─ dec_M : d_shared → d_M ─▶ ĥ_M
                                                                                │
                                                                                ▼
                                                               FVE_meannorm(ĥ_M, h_M)
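As a rough illustration of the adapter fit in the diagram, the sketch below fits a linear enc_M by least squares so that pooled activations of a model M land near the anchor's activations for the same passages. The synthetic data, the helper name `fit_linear_adapter`, and the bias term are assumptions made for illustration; the repository's own fitting code may differ.

```python
import numpy as np

def fit_linear_adapter(src, dst):
    """Least-squares fit of (W, b) such that src @ W + b approximates dst."""
    ones = np.ones((src.shape[0], 1), dtype=src.dtype)
    src_aug = np.hstack([src, ones])              # append a bias column
    sol, *_ = np.linalg.lstsq(src_aug, dst, rcond=None)
    return sol[:-1], sol[-1]                      # weights, bias

# Synthetic stand-ins for pooled activations (hypothetical sizes).
rng = np.random.default_rng(0)
n_passages, d_M, d_shared = 200, 32, 48
h_M = rng.normal(size=(n_passages, d_M))
true_map = rng.normal(size=(d_M, d_shared))
h_anchor = h_M @ true_map + 0.01 * rng.normal(size=(n_passages, d_shared))

W_enc, b_enc = fit_linear_adapter(h_M, h_anchor)
rel_err = np.linalg.norm(h_M @ W_enc + b_enc - h_anchor) / np.linalg.norm(h_anchor)
print(f"relative fit error: {rel_err:.4f}")
```

The fit needs more passages than input dimensions to be well determined; with real activations the residual will be far larger than in this noise-only toy.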
Headline result (v6, production)
FVE_pipeline_meannorm: per-tag, train/eval 80/20 split, 200 passages, measured in M's native space via dec_M(AR(z)) vs h_M, both normalized to √d_M.
★ = held-out: trunks never saw this model; only enc_M + dec_M are lstsq-fit.

| Tag | FVE | Status |
|---|---|---|
| ★ rugpt3-large | 0.995 | held-out (RU) |
| gpt-neo-1p3b | 0.991 | trained |
| gpt2-medium | 0.980 | trained |
| qwen3-0p6b | 0.970 | trained |
| smollm2-360m | 0.970 | trained |
| pythia-410m | 0.966 | trained |
| gemma4-e4b | 0.933 | trained |
| bloom-560m | 0.914 | trained |
| qwen3-4b | 0.908 | trained |
| qwen2p5-7b | 0.891 | trained |
| qwen2p5-0p5b | 0.880 | trained |
| nemotron-mini-4b | 0.871 | trained |
| ★ deepseek-llm-7b | 0.804 | held-out |
| ★ vikhr-7b-01 | 0.758 | held-out (RU) |
| smollm3-3b | 0.756 | trained |
| ★ yagpt-5-8b | 0.755 | held-out (RU) |
| phi-1p5 | 0.751 | trained |
| ★ lfm-7b | 0.635 | held-out |
- Mean trained (13): 0.892
- Mean held-out (5): 0.789, only a ~10 pp gap and no architecture catastrophes
- Mean overall (18): 0.874
The Anthropic per-model paper baseline on a single Qwen3-1.7B is 0.38, so this is **2.3× higher across an 18-architecture pool with one shared AV/AR**. The held-out generalisation is the load-bearing claim: all 5 held-out architectures (LFM2, DeepSeek, YandexGPT, rugpt3, Vikhr) cross 0.63, and 4 of 5 cross 0.75, with no trunk retraining, just an lstsq enc_M (30 s) + a direct-lstsq dec_M (2 min) per new model.
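The exact formula behind FVE_meannorm is not spelled out here, so the following is one plausible reading rather than the repository's implementation: both activation sets are rescaled so their mean row norm equals √d, and FVE is one minus the residual sum of squares over the variance of the targets around their mean. The function names are hypothetical.

```python
import numpy as np

def rescale_to_sqrt_d(x):
    """Scale a batch so its mean row norm equals sqrt(d) (assumed meaning of 'meannorm')."""
    d = x.shape[1]
    return x * (np.sqrt(d) / np.linalg.norm(x, axis=1).mean())

def fve_meannorm(h_hat, h):
    """Fraction of variance explained after mean-norm rescaling of both inputs."""
    h_hat, h = rescale_to_sqrt_d(h_hat), rescale_to_sqrt_d(h)
    resid = np.sum((h_hat - h) ** 2)
    total = np.sum((h - h.mean(axis=0)) ** 2)
    return 1.0 - resid / total

rng = np.random.default_rng(0)
h = rng.normal(size=(200, 64))                      # synthetic "true" activations
h_noisy = h + 0.3 * rng.normal(size=h.shape)        # synthetic reconstruction

fve_perfect = fve_meannorm(h, h)
fve_noisy = fve_meannorm(h_noisy, h)
print(fve_perfect, fve_noisy)
```

Under this definition a perfect reconstruction scores 1, and a reconstruction that is worse than predicting the mean scores below 0, which is how negative entries such as the -0.64 and -0.75 reported in the experiments can arise.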
Experiments
All versions share the same pipeline (extract activations → init enc_M → AV SFT → AR SFT → refit_dec_direct → joint RL). What changes is the training pool, the AV/AR trunk, and how dec_M is fit.
| Ver | Trunk (d_shared) | Trained | Held-out (eval) | dec_M fit | Mean FVE_pipe_mn | Notes | HF |
|---|---|---|---|---|---|---|---|
| v1 | Qwen3-1.7B (2048) | 5 | 2 (gemma4, phi) | pinv | 0.69 / 7 | first cross-arch run; phi crashes to -0.64 | adapter_universal_rl_v1/ |
| v2 | Qwen3-4B (2560) | 5 | - | - | - | FAILED: AV mode-collapsed to canonical template | - |
| v3 | Qwen3-1.7B (2048) | 13 (50k) | 0 | pinv | 0.83 trained / -0.75 gemma4 | FAILED: mixed teacher z's (Qwen3-8B + Qwen2.5-7B) poisoned SFT | - |
| v4 | Qwen3-1.7B (2048) | 13 | 0 | pinv | 0.83 (some negative on other held-out) | refit_dec on wrong objective dec(norm(enc(h))) ≈ h | precursor to v5 |
| v5 | Qwen3-1.7B (2048) | 13 | 3 (lfm, deepseek, yagpt) | direct-lstsq | 0.73 trained / 0.84 held-out | added phi/smollm3 to training (were broken held-out); dec fix | adapter_universal_v5_direct/ |
| v6 (prod) | Qwen3-1.7B (2048) | 13 | 5 (+ rugpt3, vikhr) | direct-lstsq | 0.89 trained / 0.79 held-out, 0.874 / 18 overall | gemma4 0.09 → 0.93; broad arch coverage; held-out RU + 7-8B | adapter_universal_v6/ |
| v7 | Qwen3-4B (2560) | 12 (+ 1.7B held-out) | 6 | direct-lstsq | 0.88 trained / 0.79 held-out, 0.849 / 18 | trunk upgrade rerun (no collapse this time, same teacher); RL OOMs on 32 GB V100; no measurable gain over v6 | adapter_universal_v7_sft/ |
Trained pool (v5 / v6 / v7, identical 13): bloom-560m, gpt2-medium, pythia-410m, qwen2p5-0p5b, smollm2-360m, gpt-neo-1p3b, qwen3-0p6b, qwen3-4b, qwen2p5-7b, nemotron-mini-4b, gemma4-e4b, smollm3-3b, phi-1p5.
Held-out (v6): lfm-7b (Liquid LFM2-1.2B), deepseek-llm-7b, yagpt-5-8b (YandexGPT-5-Lite-8B), rugpt3-large (Russian, GPT-2 family), vikhr-7b-01 (Russian, Mistral family).
Failed / abandoned experiments
- v2, Qwen3-4B trunk + LoRA r=16: the bigger trunk mode-collapsed to a canonical template (all z's identical regardless of h). Same LoRA rank is the wrong scaling axis here.
- per-token HeadTransformer + frozen v1 trunk: richer attention head over per-position activations; an AV trained on the linear-adapter output distribution can't interpret the HeadTransformer distribution. Joint train heads + LoRA → collapse.
- v3, 5× data (50k passages) with mixed teacher z: regressed trained pool 0.92 → 0.83; gemma4-e4b crashed 0.86 → -0.75. Mixing teachers in the same SFT corpus is poison.
- MLP dec_M head: 4096-hidden 2-layer MLP initialised from the lstsq solution; did not beat the pure linear baseline (e.g. lfm 0.76 MLP vs 0.79 linear). The residual is already linear; non-linearity overfits.
- v7, Qwen3-4B trunk rerun (consistent teacher): SFT loss clean (~0.6, no collapse). After direct-lstsq dec_M, final pipeline FVE = 0.849 across 18 tags (vs v6 0.874), 2.5 pp worse, with trained-pool mean dropping 0.892 → 0.877 while held-out is flat (0.789 → 0.792). The RL phase OOMs on a single 32 GB V100 (AV + AV_init + AR = 3 × 4B copies don't fit), so v7 is SFT-only, but the SFT comparison alone is conclusive: the trunk upgrade gives no measurable gain on this task. Mainline stays on Qwen3-1.7B (v6).
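A back-of-envelope check of the v7 RL memory pressure, assuming fp16 weights (2 bytes per parameter), nominal parameter counts read off the model names, and three full copies (the AR is actually truncated, so this overestimates it). It counts weights only and is not a measurement.

```python
GIB = 2**30

def weights_gib(n_params, bytes_per_param=2):
    """Memory for the weights alone, in GiB (fp16 = 2 bytes per parameter)."""
    return n_params * bytes_per_param / GIB

COPIES = 3      # AV + AV_init + AR, as listed above
GPU_GIB = 32    # one V100-SXM2-32GB, treated as 32 GiB for simplicity

# Nominal parameter counts taken from the model names (assumption).
trunks = {"Qwen3-1.7B": 1.7e9, "Qwen3-4B": 4.0e9}

totals = {name: COPIES * weights_gib(n) for name, n in trunks.items()}
for name, total in totals.items():
    print(f"{name}: {total:.1f} GiB of weights, "
          f"{GPU_GIB - total:.1f} GiB left for everything else")
```

Under these assumptions the three 4B copies occupy roughly 22 GiB before any activations, gradients, optimizer state, or generation cache, which is consistent with the reported OOM on a single card.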
HuggingFace artifacts
Repo: AlexWortega/Qwen1.7bnla (https://huggingface.co/AlexWortega/Qwen1.7bnla)
    adapter_universal_v6/          production, use this
      av/                          AV LoRA on Qwen3-1.7B + enc_M
      ar/                          AR LoRA on truncated Qwen3-1.7B + value_head.pt
      adapters/                    18 (enc_M, dec_M) pairs + refit_direct_report.json
      nla_meta.yaml                d_shared, layer_index, anchor_tag, tag list
      fve_report.json              per-tag FVE table
    adapter_universal_v7_sft/      v7 Qwen3-4B trunk, SFT-only (RL OOM); 18 tags @ 0.849 mean
    adapter_universal_v5_direct/   v5 with direct-lstsq dec_M (13 tags)
    adapter_universal_rl_v1/       v1 (5 tags + 2 held-out)
    adapter_rl_mix_batched_v1/     single-model NLA (Qwen3-1.7B paper repro)
    adapter_warmstart_9k/          pre-RL SFT checkpoint
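To fetch only the production folder rather than the whole repo, one option is `snapshot_download` from `huggingface_hub` with an `allow_patterns` filter. This is a minimal sketch: the helper name `download_v6` and the default `local_dir` are hypothetical, and the file layout inside the folder is taken from the listing above.

```python
REPO_ID = "AlexWortega/Qwen1.7bnla"
V6_SUBDIR = "adapter_universal_v6"

def download_v6(local_dir="nla_v6"):
    """Download only the v6 subfolder and return the local snapshot path."""
    # Imported lazily so the module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[f"{V6_SUBDIR}/*"],  # skip v1/v5/v7 and the other checkpoints
        local_dir=local_dir,
    )

# Usage (needs network access):
#   path = download_v6()
#   print(path)
```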
Adding a new architecture (~20 minutes)
- Add the model to `configs/universal/extract_v1.yaml`; run `scripts/extract_multi.py` (skips existing shards). ~10-15 min per 7B model.
- `scripts/extend_adapters.py`: lstsq-fit `enc_M` against the anchor.
- `scripts/refit_dec_direct.py`: lstsq-fit `dec_M` against AR's actual predictions on the same passage corpus.
- `scripts/eval_fve_multi.py`: FVE typically ≥ 0.79 without touching the trunks.

If the model has tokenizer quirks (Voxtral tekken, YaGPT custom BPE), pass `use_fast=False`; `extract_multi.py` has a fallback retry.
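The difference between a pinv decoder and the direct fit described above can be sketched on synthetic data. Here the AR output is imitated by a distorted, noisy copy of the encoded activations; that distortion model, the sizes, and all variable names are assumptions for illustration, not the repository's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_M, d_shared = 400, 24, 40
h_M = rng.normal(size=(n, d_M))
W_enc = rng.normal(size=(d_M, d_shared)) / np.sqrt(d_M)

# Stand-in for AR predictions: encoded activations after an imperfect round trip.
distort = np.eye(d_shared) + 0.3 * rng.normal(size=(d_shared, d_shared)) / np.sqrt(d_shared)
h_shared_pred = (h_M @ W_enc) @ distort + 0.05 * rng.normal(size=(n, d_shared))

# 80/20 split: fit on the first part, evaluate on the rest.
n_train = int(0.8 * n)
train, test = slice(0, n_train), slice(n_train, n)

# Option A: invert the encoder, which assumes the AR returns enc_M(h) exactly.
dec_pinv = np.linalg.pinv(W_enc)
# Option B: fit the decoder directly on what the AR actually predicts.
dec_direct, *_ = np.linalg.lstsq(h_shared_pred[train], h_M[train], rcond=None)

def rel_err(dec):
    """Relative reconstruction error on the held-out passages."""
    diff = h_shared_pred[test] @ dec - h_M[test]
    return np.linalg.norm(diff) / np.linalg.norm(h_M[test])

err_pinv, err_direct = rel_err(dec_pinv), rel_err(dec_direct)
print(f"pinv: {err_pinv:.3f}  direct: {err_direct:.3f}")
```

The direct fit absorbs whatever systematic distortion the AR introduces, whereas the pinv decoder is only exact when the AR reproduces the encoder output perfectly.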
Quickstart (reproduce v6 inference)
# 1. Local .env with OpenRouter + HF tokens
cp .env.example .env # then edit
# 2. Sync repo to eva01
./infra/sync_to_eva01.sh
# 3. Build image
ssh eva01 'cd ~/vae_llm && docker compose build'
# 4. Pull v6 from HF and run universal AV on a held-out model
./infra/run_on_eva01.sh run_universal_av --tag deepseek-llm-7b --n-passages 25
Hardware
- eva01: 4× V100-SXM2-32GB, 251 GB RAM, 48 CPU. CUDA 535.230. sm_70 → no vLLM ≥ 0.8 (Qwen3 needs it), no flash-attn-2; use HF `.generate` for benchmarks via `lm-eval-harness`.
- Most stages need only 1 GPU, fp16. Joint RL on the Qwen3-1.7B trunk uses 3 GPUs (AV + AV_init + AR).
Implementation notes & developer docs
See CLAUDE.md for: pipeline-stage code map, load-bearing bug fixes (fp32 mean-pool, gelsy lstsq, identity-init value_head, direct dec_M), and day-to-day environment notes.
Citation
If you use this work, please cite the original NLA paper:
Anthropic, Natural Language Autoencoders (Transformer Circuits, 2026)
https://transformer-circuits.pub/2026/nla/index.html