From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output
I ran 6 experiments trying to use Anthropic's SAE steering for JSON generation.
- Base model: 86.8% valid JSON
- Steering only: 24.4%
- Fine-tuned: 96.6%
- FSM constrained: 100%
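A validity percentage like the ones above can be computed by trying to parse each model output. This is a minimal sketch, not the scoring code from the experiments: the function name `valid_json_rate` is hypothetical, and it assumes "valid" means "accepted by a standard JSON parser", so a bare number or array also counts.

```python
import json


def valid_json_rate(outputs):
    """Return the fraction of strings that parse as JSON (assumed metric)."""
    if not outputs:
        return 0.0
    ok = 0
    for text in outputs:
        try:
            json.loads(text)
            ok += 1
        except json.JSONDecodeError:
            # Truncated or malformed output counts as invalid.
            pass
    return ok / len(outputs)


# Hypothetical sample outputs: two parse, two do not.
samples = ['{"a": 1}', '{"a": ', "[1, 2]", "not json"]
rate = valid_json_rate(samples)
```

A stricter check, such as requiring an object at the top level or validating against a schema, would give lower numbers on the same outputs.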
Steering is for semantics, not syntax.
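To illustrate why FSM-constrained decoding reaches 100%, here is a toy sketch. It assumes a hypothetical ten-token vocabulary, a hand-written state machine, and random scores standing in for model logits; none of this comes from the experiments. At each step, only tokens the FSM allows in the current state can be picked, so invalid syntax cannot be emitted.

```python
import json
import random

# Hypothetical toy vocabulary; "hello" is never allowed by the FSM below.
VOCAB = ["{", "}", '"name"', '"age"', ":", ",", '"Ada"', "42", "hello", "<eos>"]

# Hand-written FSM for flat objects: state -> {allowed token: next state}.
FSM = {
    "start": {"{": "key"},
    "key": {'"name"': "colon", '"age"': "colon"},
    "colon": {":": "value"},
    "value": {'"Ada"': "after_value", "42": "after_value"},
    "after_value": {",": "key", "}": "end"},
    "end": {"<eos>": "done"},
}


def fake_logits(rng):
    """Stand-in for a language model: a random score for every token."""
    return {tok: rng.random() for tok in VOCAB}


def constrained_generate(rng, max_pairs=3):
    """Pick the best-scoring token among those the FSM allows at each step."""
    state, out, pairs = "start", [], 0
    while state != "done":
        allowed = dict(FSM[state])
        if state == "after_value":
            pairs += 1
            if pairs >= max_pairs:
                allowed.pop(",")  # budget reached: force the object to close
        scores = fake_logits(rng)
        token = max(allowed, key=lambda t: scores[t])  # masked argmax
        state = allowed[token]
        if token != "<eos>":
            out.append(token)
    return "".join(out)


# Every sample parses, whatever scores the "model" produces.
results = [constrained_generate(random.Random(seed)) for seed in range(100)]
parsed = [json.loads(text) for text in results]
```

The constraint guarantees syntax only. Which keys and values appear still depends on the model's scores, which fits the split between semantics and syntax.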
https://huggingface.co/blog/MaziyarPanahi/sae-steering-json