Fix: Support loading dt_bias and other trained-model parameters in modeling_nemotron_h.py

#7

dt_bias is a learned parameter, but this custom modeling code re-initializes it (along with other trained parameters) when loading an already-trained Hugging Face model, so the checkpoint values are overwritten. I adopted the fix from NanoV3's modeling code: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/blob/main/modeling_nemotron_h.py
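To illustrate the failure mode, here is a minimal sketch in plain PyTorch. It is not the code from modeling_nemotron_h.py or from the NanoV3 file, and the actual fix may use a different mechanism. `ToyMixer`, `load_trained`, the two init functions, and the `_loaded_from_checkpoint` flag are all hypothetical names made up for this example. It only shows that an unconditional re-init after loading discards the trained dt_bias, while a guarded re-init keeps it.

```python
import torch
from torch import nn


class ToyMixer(nn.Module):
    """Hypothetical stand-in for a mixer layer with a learned dt_bias."""

    def __init__(self, num_heads: int = 4):
        super().__init__()
        self.dt_bias = nn.Parameter(torch.ones(num_heads))


def load_trained(module: ToyMixer, trained: torch.Tensor) -> None:
    # Simulate loading a checkpoint, then mark the parameter as loaded.
    # The flag name is an assumption made for this sketch.
    module.load_state_dict({"dt_bias": trained})
    module.dt_bias._loaded_from_checkpoint = True


def init_weights_unguarded(module: nn.Module) -> None:
    # Problem pattern: always re-initializes dt_bias, even after loading.
    if isinstance(module, ToyMixer):
        with torch.no_grad():
            module.dt_bias.copy_(torch.full_like(module.dt_bias, 0.5))


def init_weights_guarded(module: nn.Module) -> None:
    # Guarded pattern: leave parameters that came from a checkpoint alone.
    if isinstance(module, ToyMixer):
        if getattr(module.dt_bias, "_loaded_from_checkpoint", False):
            return
        with torch.no_grad():
            module.dt_bias.copy_(torch.full_like(module.dt_bias, 0.5))


trained = torch.tensor([0.1, 0.2, 0.3, 0.4])

overwritten = ToyMixer()
load_trained(overwritten, trained)
init_weights_unguarded(overwritten)  # trained values are lost here

preserved = ToyMixer()
load_trained(preserved, trained)
init_weights_guarded(preserved)  # trained values survive
```

Comparing `overwritten.dt_bias` and `preserved.dt_bias` against `trained` after running this shows the difference between the two patterns.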

