Final v16 does not appear to work correctly: it stops after the first prompt.

#19

by mancub - opened 15 days ago

VSCode + CC, Minachist/Qwen3.6-27B-Mixed-Autoround

I entered my prompt, a mid-size request to investigate a hardware issue in an MCU, and got this response:

I'll investigate this issue by exploring the boot process, hardware initialization, and pin configuration in your project. Let me launch multiple agents to explore different aspects of this problem simultaneously.

<think> </think>

And then nothing...

slepkaviba

13 days ago

edited 13 days ago

I will open an issue for v18, as v16 seems to handle tooling...

froggeric

Owner 13 days ago

I think I have finally solved it in v19. So far it has been flawless in 3 long agentic tests in a row. Previously, I had it happen in around 80% of my sessions.

This has been a tough one to crack. To fix it I had to resort to better prompt engineering:

<IMPORTANT>
Reminder:
- You can use the <think></think> block to plan your next tool call OR to synthesize data and formulate your final response to the user.
- ALL explanation and reasoning MUST be placed strictly inside the <think></think> block.
- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags.
- If you choose to call a tool, you MUST output the <tool_call> block IMMEDIATELY after closing </think>. Do NOT output any conversational text before the tool call.
- The <tool_call> and <function> tags MUST be at the very beginning of a new line, with NO spaces or indentation before them.
- To call multiple functions, output a separate, completely closed <tool_call></tool_call> block for EACH function. Do NOT nest <tool_call> blocks.
- If you have gathered all necessary data and do not need to call a tool, answer the question like normal and provide your final response to the user IMMEDIATELY after closing </think>.
</IMPORTANT>
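For anyone who wants to check whether a reply actually follows these rules, here is a rough Python sketch of a checker. It is not part of the template: `check_reply` and the message strings are names I made up for illustration, and it only covers the rules listed above plus the "nothing after `</think>`" stall from the original report.

```python
import re

def check_reply(reply: str) -> list[str]:
    """Return a list of rule violations found in one model reply (hypothetical helper)."""
    problems = []
    # Only the text after the closing </think> is checked.
    _, sep, tail = reply.partition("</think>")
    if not sep:
        tail = reply
    if not tail.strip():
        # The stall described in this thread: thinking closes, then no output.
        return ["nothing after </think>"]

    opens = [m.start() for m in re.finditer(r"<tool_call>", tail)]
    closes = [m.start() for m in re.finditer(r"</tool_call>", tail)]
    if not opens and not closes:
        return problems  # plain final answer, nothing else to check

    if len(opens) != len(closes):
        problems.append("unbalanced <tool_call> tags")

    # No conversational text between </think> and the first tool call.
    if opens and tail[:opens[0]].strip():
        problems.append("text before first <tool_call>")

    # Opening tags must sit at the very start of a line, with no indentation.
    for m in re.finditer(r"<tool_call>|<function=", tail):
        if m.start() > 0 and tail[m.start() - 1] != "\n":
            problems.append(f"tag not at line start: {m.group(0)}")

    # Each block must close before the next opens and must contain a function.
    for block in re.findall(r"<tool_call>(.*?)</tool_call>", tail, re.DOTALL):
        if "<tool_call>" in block:
            problems.append("nested <tool_call>")
        if not re.search(r"<function=[^>]+>.*?</function>", block, re.DOTALL):
            problems.append("missing <function=...></function>")
    return problems
```

Running it over logged replies from a session should show quickly which rule the model breaks most often, or whether it simply stops after the think block.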

It helped a bit, but did not solve it. What I think finally fixed it was a complete rewrite of the KV cache handling: setting preserve_thinking to true by default and removing the empty think injection, which was poisoning the model's in-context learning.
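To illustrate the idea only (the real logic lives in the chat template, and this is not that code), here is a minimal Python sketch of rendering a past assistant turn. `render_assistant_turn` and its parameters are hypothetical; only the `preserve_thinking` name comes from the change described above. The assumption is that the old behaviour replaced past reasoning with an empty `<think></think>` pair, so the history kept showing the model turns that ended right after an empty think block.

```python
def render_assistant_turn(content: str, thinking: str,
                          preserve_thinking: bool = True) -> str:
    """Render one past assistant turn for the prompt history (hypothetical helper)."""
    if preserve_thinking and thinking.strip():
        # Keep the reasoning the model actually produced, so the history
        # matches what was generated on that turn.
        return f"<think>\n{thinking.strip()}\n</think>\n\n{content}"
    # No reasoning to keep: emit the content alone rather than
    # injecting an empty <think></think> pair.
    return content
```

Keeping past turns identical to what was generated should also keep the cached prompt prefix reusable, though I am assuming that from how prefix caching generally works rather than from the template itself.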

froggeric changed discussion status to closed 13 days ago
