It looks like the limit on the model context length is incorrect. The fp16 model, like the original one, has a context length of 131072. Updating this value resolved errors when processing longer prompts.
#2
by dtrawins - opened
No description provided.
This is a known issue and a current limitation of the INT4 model. When optimum-intel allows preserving the original max_position_embeddings, we will re-upload the model.
amokrov changed pull request status to closed
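For anyone who wants to try the same change locally, here is a minimal sketch of the edit described in this PR. It assumes the limit is stored under the `max_position_embeddings` key of a JSON config file (the usual Hugging Face `config.json` layout, which this thread does not confirm for this repository). The helper name `set_max_position_embeddings` is hypothetical, and 131072 is the value quoted above for the fp16 model. Per the maintainer's reply, this is a workaround rather than an official fix.

```python
import json
from pathlib import Path


def set_max_position_embeddings(config_path, new_length):
    """Overwrite max_position_embeddings in a JSON config file.

    Hypothetical helper for illustration only. Returns the previous
    value, or None if the key was not present.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Remember the old limit so the change can be reverted if needed.
    old_length = config.get("max_position_embeddings")
    config["max_position_embeddings"] = new_length

    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return old_length


# Example call (assumes a local copy of the model's config.json):
# set_max_position_embeddings("config.json", 131072)
```

The helper returns the previous value so the original limit can be restored once a re-uploaded model is available.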