Unable to load using vLLM
#3
by Nithishm2410wsfa - opened
I tried serving this model using this command:
vllm serve nvidia/Gemma-4-26B-A4B-NVFP4 --max-model-len 128
and got this error:
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=3012) return func(*args, **kwargs)
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(EngineCore pid=3012) self.load_weights(model, model_config)
(EngineCore pid=3012) ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=3012) return func(*args, **kwargs)
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
(EngineCore pid=3012) loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/models/gemma4_mm.py", line 1319, in load_weights
(EngineCore pid=3012) return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=3012) ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=3012) return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/models/utils.py", line 355, in load_weights
(EngineCore pid=3012) autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
(EngineCore pid=3012) yield from self._load_module(
(EngineCore pid=3012) prefix, child_modules[child_prefix], child_weights
(EngineCore pid=3012) )
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/models/utils.py", line 275, in _load_module
(EngineCore pid=3012) loaded_params = module_load_weights(weights)
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/models/gemma4.py", line 1239, in load_weights
(EngineCore pid=3012) return loader.load_weights(_weight_iterator())
(EngineCore pid=3012) ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=3012) return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/models/utils.py", line 355, in load_weights
(EngineCore pid=3012) autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
(EngineCore pid=3012) yield from self._load_module(
(EngineCore pid=3012) prefix, child_modules[child_prefix], child_weights
(EngineCore pid=3012) )
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/models/utils.py", line 275, in _load_module
(EngineCore pid=3012) loaded_params = module_load_weights(weights)
(EngineCore pid=3012) File "/workspace/tyrec/.venv/lib/python3.13/site-packages/vllm/model_executor/models/gemma4.py", line 1037, in load_weights
(EngineCore pid=3012) param = params_dict[name]
(EngineCore pid=3012) ~~~~~~~~~~~^^^^^^
(EngineCore pid=3012) KeyError: 'layers.0.experts.0.down_proj.input_scale'
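The last frames show the loader looking up a checkpoint tensor name in the model's parameter dictionary (`params_dict[name]`) and not finding it. As a rough illustration of that failure mode only, the sketch below lists checkpoint names that have no matching parameter. The function name `find_unmatched_names` and the sample names are hypothetical; this is not vLLM's loader code.

```python
def find_unmatched_names(checkpoint_names, params_dict):
    """Return checkpoint tensor names that are missing from params_dict.

    Hypothetical helper for illustration; vLLM's real loader also remaps
    and skips names, which this sketch does not model.
    """
    return sorted(name for name in checkpoint_names if name not in params_dict)


# Toy data: the model registers a weight and a weight scale, but the
# checkpoint also ships an input scale the model does not know about.
params_dict = {
    "layers.0.experts.0.down_proj.weight": object(),
    "layers.0.experts.0.down_proj.weight_scale": object(),
}
checkpoint_names = [
    "layers.0.experts.0.down_proj.weight",
    "layers.0.experts.0.down_proj.weight_scale",
    "layers.0.experts.0.down_proj.input_scale",
]

unmatched = find_unmatched_names(checkpoint_names, params_dict)
print(unmatched)
```

A direct `params_dict[name]` lookup on any name in `unmatched` would raise the same kind of `KeyError` as in the traceback.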
Transformers version: 5.5.4
vLLM version: 0.19.0
Torch version: 2.10.0
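For anyone comparing environments, here is a minimal sketch that prints the installed versions of the three packages listed above using only the standard library. The helper name `installed_versions` is made up for this example, and it assumes the packages were installed under the distribution names `vllm`, `transformers`, and `torch`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("vllm", "transformers", "torch")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it inside the same virtual environment that launches `vllm serve` shows which versions are actually being picked up.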
Try vLLM 0.20.0.
I don't think this supports the downgraded Transformers version that comes from pip install vllm.
This breaks the seamless operational migration.