Justinaquino: "Add OpenCoder Qwen3.5 agentic setup article (2026-03-30)"

2026-03-30T15:13:55Z

"Add OpenCoder Qwen3.5 agentic setup article (2026-03-30)"

New page

= OpenCoder v3.0 — Qwen3.5 Agentic Setup =

'''Date:''' 2026-03-30<br>
'''Author:''' Justin / Comfac IT / CGG R&D & BD<br>
'''Status:''' ✅ Working — <code>qwen3.5-4b-instruct</code> executes tools in OpenCode<br>
'''Part of:''' [[Project OpenCoder: AI Independence Initiative]]

----

== 1. Problem Statement ==

The goal was to run a '''locally-hosted agentic coding assistant''' equivalent to Claude Code, using quantized open-source models on consumer hardware (8GB VRAM), served via Ollama and accessed through OpenCode or Qwen Code.

Two tools were tested in parallel:
* '''OpenCode''' (v1.3.7) — open-source Claude Code equivalent, TUI-based
* '''Qwen Code''' (v0.13.2) — Alibaba's terminal agent, optimized for Qwen models

----

== 2. What Failed and Why ==

=== 2.1 Qwen 2.5 Coder (7B) — No Tool Execution ===

Models tested:
* <code>qwen2.5-coder:7b-instruct-q4_K_M</code>
* <code>qwen-opencode:latest</code> (custom Modelfile, tool dispatch tuned)
* <code>qwen-agent-7b:latest</code> (agent-tuned variant)

'''Symptom:''' Models generated syntactically correct bash commands as plain text in the response body. OpenCode displayed them but never executed them. The "Build" status showed green (model responded) but no tool was invoked.

'''Root cause:''' Qwen 2.5 Coder at 7B does not reliably emit structured tool call JSON. It understands the task and writes the correct command, but outputs it as prose rather than invoking the framework's tool interface. The model lacks robust function-calling at this parameter count.

After extensive testing across multiple sessions and model variants, these were removed from the config entirely.

=== 2.2 Qwen 3.5 (4B/9B) Base — Thinking Mode Interference ===

Models tested:
* <code>qwen3.5:4b</code>
* <code>qwen3.5:9b</code>

'''Symptom:''' Models showed visible <code>Thinking:</code> blocks in the output and generated commands as text. Same non-execution pattern as 2.5 Coder, but for a different reason.

'''Root cause:''' Qwen 3.5 operates in '''thinking mode by default''', generating <code><think>...</think></code> content before its final response. When served via Ollama's OpenAI-compatible <code>/v1</code> endpoint, the thinking block was being passed through in the response content. OpenCode's tool call parser received a response beginning with <code><think></code> text and either misidentified it, or the tool call was embedded inside the think block rather than emitted as a clean structured call after it.

'''Key finding from Qwen3.5 model card:'''
<blockquote>"Qwen3.5 models operate in thinking mode by default... Qwen3.5 does not officially support the soft switch of Qwen3, i.e., <code>/think</code> and <code>/nothink</code>."</blockquote>

This means the thinking mode cannot be toggled with a simple prompt prefix — it must be disabled at the API parameter level or suppressed via system prompt in a custom Modelfile.

'''Performance note:''' <code>qwen3.5:9b</code> took '''4–5 minutes''' to generate a response. Investigation revealed models had been migrated to a mechanical HDD. Benchmarked read speed: approximately '''8.3–16 MB/s''' (vs. 600 MB/s on SSD). A 5–6 GB model file at 16 MB/s ≈ 6 minutes load time. This explained the extreme latency — it was a storage bottleneck, not a model or GPU issue.

{{mbox | type = notice | text = ⚠️ '''Critical finding:''' Always store active Ollama models on SSD. HDD throughput at 8–16 MB/s makes even 4B models unusable in interactive agentic workflows.}}

=== 2.3 Qwen Code — Credential and Path Confusion ===

'''Symptom:''' <code>Failed to switch model to 'openai::ollama'. Missing credentials for modelProviders model 'ollama'.</code>

'''Root cause (two separate issues):'''

'''Issue 1:''' The <code>env</code> block declaring <code>OLLAMA_API_KEY</code> was missing from <code>settings.json</code>. Qwen Code requires the key to exist even though Ollama doesn't authenticate — it just checks that the env var is set.

'''Issue 2 — Distrobox path split:''' Qwen Code running inside a distrobox container reads <code>~/.qwen/settings.json</code> which resolves to <code>/home/justin/.qwen/</code> inside the container. The file being edited was at <code>/run/host/home/justin/.qwen/settings.json</code> on the host. These are two different files. Edits to the host path had no effect on what the running container read.

'''Fix:''' Write directly to <code>~/.qwen/settings.json</code> from inside the correct container, or use a heredoc to overwrite cleanly.

{{mbox | type = notice | text = ⚠️ '''Note:''' Qwen Code cannot be run using the standard distrobox burner container workflow due to this path resolution issue. OpenCode does not have this limitation. See [[Opencode isolation and burner workflow 260216]].}}

----

== 3. What Worked — The Modelfile Solution ==

=== 3.1 The Modelfile Approach ===

Since Qwen3.5 doesn't support <code>/nothink</code> and Ollama doesn't expose <code>chat_template_kwargs</code> at the API level, the solution was to create a '''custom Ollama Modelfile''' that:

# Bases on the existing <code>qwen3.5:4b</code> (or <code>:9b</code>) pulled model
# Sets coding-optimized sampling parameters per the official model card
# Injects a SYSTEM prompt that suppresses thinking output and instructs the model to use tools directly

'''Modelfile for 4B (save as <code>Modelfile.qwen3.5-4b-instruct</code>):'''

<pre>
FROM qwen3.5:4b

PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER repeat_penalty 1.0
PARAMETER num_ctx 32768

SYSTEM """You are a precise coding and system assistant running inside an agentic terminal environment.

CRITICAL RULES:
- Do NOT output <think> or </think> tags
- Do NOT show reasoning steps or internal monologue
- Do NOT explain what you are about to do before doing it
- Respond with tool calls ONLY — no prose, no narration

When asked to perform any file, shell, or system operation:
- Invoke the appropriate tool immediately (bash, write, read, etc.)
- Use the exact tool call format the framework expects
- Execute first, confirm after only if asked

Available tools: bash, file read/write, glob, grep, list, task, webfetch.
Always use tools. Never output commands as plain text."""
</pre>

'''Build command:'''

<pre>
ollama create qwen3.5-4b-instruct -f Modelfile.qwen3.5-4b-instruct
</pre>

=== 3.2 Why the Modelfile Made It Work ===

The Modelfile changes three things simultaneously:

{| class="wikitable"
! Layer !! What changed !! Effect
|-
| '''Sampling params''' || <code>temperature=0.6</code>, <code>top_p=0.95</code>, <code>top_k=20</code> || Matches Qwen3.5 model card recommendation for precise coding tasks — reduces token variance, more deterministic tool call format
|-
| '''System prompt''' || Explicit instruction to suppress <code><think></code> tags and use tools || Overrides the default thinking behavior at the prompt level even when API-level <code>enable_thinking: false</code> isn't available
|-
| '''Model identity''' || Registered as a new Ollama model tag || Can be added to OpenCode/Qwen Code config independently, tested in isolation, and versioned separately from the base model
|}

The key insight: Qwen3.5 '''does''' support tool calling (45.6 on TIR-Bench). The problem was never capability — it was '''output format'''. The Modelfile forces the model into a mode where it emits tool calls in the structured format OpenCode expects, rather than narrating what it would do.

=== 3.3 Confirmed Working ===

* <code>qwen3.5-4b-instruct</code> in '''OpenCode''' successfully executed <code>mkdir</code>, <code>echo</code>, file creation, and multi-step <code>mkdir && echo && cat</code> chains
* Response time: '''7–10 seconds''' per tool call on 4B (vs 4–5 minutes on 9B from HDD)
* Thinking blocks: suppressed — clean tool call output observed

----

== 4. Config Management Scripts ==

Three scripts were produced to manage the model ecosystem. These live in the OpenCoder working directory.

=== 4.1 sync-opencode-models.py ===

Syncs Ollama's model list into <code>opencode.json</code> under <code>provider.ollama.models</code> (dict format). Preserves custom names, auto-generates names using family/size/capability detection, supports <code>--dry-run</code> and <code>--verbose</code>.

=== 4.2 sync-qwencode-models.py ===

Same logic but targets <code>~/.qwen/settings.json</code> under <code>modelProviders.openai[id=ollama].models</code> (array format). Uses <code>NAME_RULES</code> lookup table with 3-letter SFX codes for model role identification. Also writes correct <code>contextLength</code> per model (262144 for Qwen3.5-9B, 32768 for others).

=== 4.3 fix-opencode-model-names.py ===

One-shot rename pass over an existing <code>opencode.json</code> to apply the <code>Family (Size) - Descriptor · SFX</code> naming convention without a full sync.

'''Important:''' These scripts should be run using Claude or another capable agent to keep the <code>NAME_RULES</code> and <code>PINNED_NAMES</code> tables current as new models are pulled.

=== 4.4 Naming Convention ===

<pre>Family (Size) - Descriptor [Quant] · SFX</pre>

{| class="wikitable"
! SFX !! Meaning
|-
| <code>AGT</code> || Agentic / Tool Dispatch
|-
| <code>COD</code> || Coding focused
|-
| <code>RSN</code> || Reasoning / Thinking
|-
| <code>VIS</code> || Vision / Multimodal
|-
| <code>INS</code> || Instruct (non-thinking)
|-
| <code>GRM</code> || Grammar / Style utility
|-
| <code>GEN</code> || General purpose
|-
| <code>DST</code> || Distilled from larger model
|}

'''Examples:'''
* <code>Qwen 3.5 (4B) - Instruct No-Think · INS</code>
* <code>Qwen 3.5 (9B) - Claude Opus Distill [Q4_K_M] · RSN·DST</code>
* <code>Qwen 2.5 Coder (7B) - Tool Dispatch · AGT</code>

----

== 5. Infrastructure Notes ==

=== 5.1 Storage Impact on Model Performance ===

{| class="wikitable"
! Storage !! Read speed !! 5GB model load !! Usable?
|-
| NVMe SSD || ~3,000 MB/s || ~2 seconds || ✅ Ideal
|-
| SATA SSD || ~600 MB/s || ~8 seconds || ✅ Good
|-
| HDD (observed) || ~8–16 MB/s || '''4–6 minutes''' || ❌ Not viable for interactive use
|}

'''Action:''' Always store active Ollama model files on SSD. Use HDD only for cold storage of models not currently in rotation.

=== 5.2 OpenCode vs Qwen Code ===

{| class="wikitable"
! Feature !! OpenCode !! Qwen Code
|-
| Config format || <code>opencode.json</code> — models as dict || <code>settings.json</code> — models as array
|-
| Config path || <code>~/.config/opencode/</code> || <code>~/.qwen/</code>
|-
| Distrobox compatible || ✅ Yes || ⚠️ Path issues — run from host
|-
| Burner container workflow || ✅ Yes || ❌ Not compatible
|-
| Ollama provider || <code>provider.ollama</code> with npm package || <code>modelProviders.openai[id=ollama]</code>
|-
| Credentials || Not required for Ollama || Requires <code>OLLAMA_API_KEY</code> env var
|}

----

== 6. Replication Steps (Clean Setup) ==

<pre>
# 1. Pull base model
ollama pull qwen3.5:4b

# 2. Create Modelfile (save as Modelfile.qwen3.5-4b-instruct)
# [see Section 3.1]

# 3. Build instruct variant
ollama create qwen3.5-4b-instruct -f Modelfile.qwen3.5-4b-instruct

# 4. Verify in Ollama
ollama list | grep qwen3.5-4b-instruct

# 5. Quick sanity check — should produce NO <think> tags
ollama run qwen3.5-4b-instruct "create a file called test.txt with hello world"

# 6. Add to opencode.json
# Under provider.ollama.models:
# "qwen3.5-4b-instruct:latest": {
# "name": "Qwen 3.5 (4B) - Instruct No-Think · INS"
# }

# 7. Run sync script to keep config current
python3 sync-opencode-models.py --verbose

# 8. Launch OpenCode and test
opencode
</pre>

----

== 7. Open Questions ==

* Does Ollama strip <code><think></code> tags at the API boundary in newer versions? (Would make the Modelfile SYSTEM prompt partially redundant but harmless)
* What is the practical context limit for <code>qwen3.5-4b-instruct</code> on 8GB VRAM before OOM?
* Can <code>qwen3.5:9b</code> (on SSD) match 4B tool call reliability with the same Modelfile approach?
* Is there a way to run Qwen Code inside the distrobox burner workflow with symlinked config paths?
* At what task complexity does the 4B start failing vs needing the 9B?

----

''Document generated: 2026-03-30''<br>
''Part of: Project OpenCoder v3.0 — CGG R&D & BD Unit''

== See Also ==

* [[Project OpenCoder: AI Independence Initiative]] — Strategic overview
* [[Opencode isolation and burner workflow 260216]] — Distrobox burner container workflow
* [[OpenCode in Android Termux 260303]] — Running OpenCode on Android
* [[Proceedural Agentic Development Methodology 260304]] — PAD methodology
* [[Lora Basics 260304]] — LoRA model training fundamentals
* [[🧠 Process: Selecting and Installing the Right Ollama Model for Your Hardware]] — Model selection guide

[[Category:Research]]
[[Category:AI]]
[[Category:OpenCode]]
[[Category:Ollama]]
[[Category:Tutorials]]
[[Category:Comfac IT]]

OpenCoder Qwen35 Agentic Setup 260330 - Revision history

Justinaquino: "Add OpenCoder Qwen3.5 agentic setup article (2026-03-30)"