add cache_position to mask_kwargs in modeling_step3p7.py

#13
by shifangxu2024 - opened

In transformers version 5.0.0, the create_causal_mask function requires the cache_position argument, so this PR adds cache_position to mask_kwargs in modeling_step3p7.py.
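
The diff itself is not shown on this page, so the following is only a rough sketch of the kind of change described, not the actual code in modeling_step3p7.py. The function create_causal_mask_stub is a hypothetical stand-in that merely mimics a required cache_position keyword (it is not the transformers implementation), and build_mask_kwargs, past_seen_tokens, and seq_len are illustrative names assumed for this example.

```python
# Hypothetical stand-in for a mask builder whose `cache_position` keyword is required.
# It is NOT the transformers implementation; it only mimics the required argument.
def create_causal_mask_stub(*, config, attention_mask, cache_position, past_key_values):
    # Return the positions it received so the call can be inspected.
    return list(cache_position)


def build_mask_kwargs(config, attention_mask, past_key_values, past_seen_tokens, seq_len):
    # Positions of the new tokens, offset by the number of tokens already cached.
    cache_position = range(past_seen_tokens, past_seen_tokens + seq_len)
    return {
        "config": config,
        "attention_mask": attention_mask,
        "cache_position": cache_position,  # the key this PR describes adding
        "past_key_values": past_key_values,
    }


mask_kwargs = build_mask_kwargs(None, None, None, past_seen_tokens=4, seq_len=3)
positions = create_causal_mask_stub(**mask_kwargs)

# Without the key, the call fails because the keyword is required.
incomplete = {k: v for k, v in mask_kwargs.items() if k != "cache_position"}
try:
    create_causal_mask_stub(**incomplete)
    missing_key_raises = False
except TypeError:
    missing_key_raises = True
```

In this sketch the stub rejects the call with a TypeError when cache_position is left out of the kwargs, which is the general failure mode that passing the key through mask_kwargs avoids.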

WinstonDeng changed pull request status to merged
