Instructions to use froggeric/Qwen-Fixed-Chat-Templates with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use froggeric/Qwen-Fixed-Chat-Templates with MLX:
# Download the model from the Hub pip install huggingface_hub[hf_xet] huggingface-cli download --local-dir Qwen-Fixed-Chat-Templates froggeric/Qwen-Fixed-Chat-Templates
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
final v16 does not appear to work correctly, it stops after the first prompt.
VSCode + CC, Minachist/Qwen3.6-27B-Mixed-Autoround
I enter my prompt, a mid-size request to investigate a hardware issue in a MCU, I get a response:
I'll investigate this issue by exploring the boot process, hardware initialization, and pin configuration in your project. Let me launch multiple agents to explore different aspects of this problem simultaneously.
<think> </think>
And then nothing...
I will open issue for v18, as v16 seems to handle tooling...
I think I have finally solved it in v19. So far it has been flawless in 3 long agentic tests in a row. Previously, I had it happen in around 80% of my sessions.
This has been a tough one to crack. To fix it I had to resort to better prompt engineering:
<IMPORTANT>
Reminder:
- You can use the <think></think> block to plan your next tool call OR to synthesize data and formulate your final response to the user.
- ALL explanation and reasoning MUST be placed strictly inside the <think></think> block.
- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags.
- If you choose to call a tool, you MUST output the <tool_call> block IMMEDIATELY after closing </think>. Do NOT output any conversational text before the tool call.
- The <tool_call> and <function> tags MUST be at the very beginning of a new line, with NO spaces or indentation before them.
- To call multiple functions, output a separate, completely closed <tool_call></tool_call> block for EACH function. Do NOT nest <tool_call> blocks.
- If you have gathered all necessary data and do not need to call a tool, answer the question like normal and provide your final response to the user IMMEDIATELY after closing </think>.
</IMPORTANT>
It helped a bit, but did not solve it. What I think finally did it, was a complete rewrite of the KV cache handling, by setting preserve_thinking to true as default, and abolishing the empty think injection, which was poisoning the model's in-context learning.