OpenCoder Qwen35 Agentic Setup 260330
OpenCoder v3.0 — Qwen3.5 Agentic Setup
Date: 2026-03-30
Author: Justin / Comfac IT / CGG R&D & BD
Status: ✅ Working — qwen3.5-4b-instruct executes tools in OpenCode
Part of: Project OpenCoder: AI Independence Initiative
1. Problem Statement
The goal was to run a locally-hosted agentic coding assistant equivalent to Claude Code, using quantized open-source models on consumer hardware (8GB VRAM), served via Ollama and accessed through OpenCode or Qwen Code.
Two tools were tested in parallel:
- OpenCode (v1.3.7) — open-source Claude Code equivalent, TUI-based
- Qwen Code (v0.13.2) — Alibaba's terminal agent, optimized for Qwen models
2. What Failed and Why
2.1 Qwen 2.5 Coder (7B) — No Tool Execution
Models tested:
qwen2.5-coder:7b-instruct-q4_K_Mqwen-opencode:latest(custom Modelfile, tool dispatch tuned)qwen-agent-7b:latest(agent-tuned variant)
Symptom: Models generated syntactically correct bash commands as plain text in the response body. OpenCode displayed them but never executed them. The "Build" status showed green (model responded) but no tool was invoked.
Root cause: Qwen 2.5 Coder at 7B does not reliably emit structured tool call JSON. It understands the task and writes the correct command, but outputs it as prose rather than invoking the framework's tool interface. The model lacks robust function-calling at this parameter count.
After extensive testing across multiple sessions and model variants, these were removed from the config entirely.
2.2 Qwen 3.5 (4B/9B) Base — Thinking Mode Interference
Models tested:
qwen3.5:4bqwen3.5:9b
Symptom: Models showed visible Thinking: blocks in the output and generated commands as text. Same non-execution pattern as 2.5 Coder, but for a different reason.
Root cause: Qwen 3.5 operates in thinking mode by default, generating <think>...</think> content before its final response. When served via Ollama's OpenAI-compatible /v1 endpoint, the thinking block was being passed through in the response content. OpenCode's tool call parser received a response beginning with <think> text and either misidentified it, or the tool call was embedded inside the think block rather than emitted as a clean structured call after it.
Key finding from Qwen3.5 model card:
"Qwen3.5 models operate in thinking mode by default... Qwen3.5 does not officially support the soft switch of Qwen3, i.e.,
/thinkand/nothink."
This means the thinking mode cannot be toggled with a simple prompt prefix — it must be disabled at the API parameter level or suppressed via system prompt in a custom Modelfile.
Performance note: qwen3.5:9b took 4–5 minutes to generate a response. Investigation revealed models had been migrated to a mechanical HDD. Benchmarked read speed: approximately 8.3–16 MB/s (vs. 600 MB/s on SSD). A 5–6 GB model file at 16 MB/s ≈ 6 minutes load time. This explained the extreme latency — it was a storage bottleneck, not a model or GPU issue.
2.3 Qwen Code — Credential and Path Confusion
Symptom: Failed to switch model to 'openai::ollama'. Missing credentials for modelProviders model 'ollama'.
Root cause (two separate issues):
Issue 1: The env block declaring OLLAMA_API_KEY was missing from settings.json. Qwen Code requires the key to exist even though Ollama doesn't authenticate — it just checks that the env var is set.
Issue 2 — Distrobox path split: Qwen Code running inside a distrobox container reads ~/.qwen/settings.json which resolves to /home/justin/.qwen/ inside the container. The file being edited was at /run/host/home/justin/.qwen/settings.json on the host. These are two different files. Edits to the host path had no effect on what the running container read.
Fix: Write directly to ~/.qwen/settings.json from inside the correct container, or use a heredoc to overwrite cleanly.
3. What Worked — The Modelfile Solution
3.1 The Modelfile Approach
Since Qwen3.5 doesn't support /nothink and Ollama doesn't expose chat_template_kwargs at the API level, the solution was to create a custom Ollama Modelfile that:
- Bases on the existing
qwen3.5:4b(or:9b) pulled model - Sets coding-optimized sampling parameters per the official model card
- Injects a SYSTEM prompt that suppresses thinking output and instructs the model to use tools directly
Modelfile for 4B (save as Modelfile.qwen3.5-4b-instruct):
FROM qwen3.5:4b PARAMETER temperature 0.6 PARAMETER top_p 0.95 PARAMETER top_k 20 PARAMETER repeat_penalty 1.0 PARAMETER num_ctx 32768 SYSTEM """You are a precise coding and system assistant running inside an agentic terminal environment. CRITICAL RULES: - Do NOT output <think> or </think> tags - Do NOT show reasoning steps or internal monologue - Do NOT explain what you are about to do before doing it - Respond with tool calls ONLY — no prose, no narration When asked to perform any file, shell, or system operation: - Invoke the appropriate tool immediately (bash, write, read, etc.) - Use the exact tool call format the framework expects - Execute first, confirm after only if asked Available tools: bash, file read/write, glob, grep, list, task, webfetch. Always use tools. Never output commands as plain text."""
Build command:
ollama create qwen3.5-4b-instruct -f Modelfile.qwen3.5-4b-instruct
3.2 Why the Modelfile Made It Work
The Modelfile changes three things simultaneously:
| Layer | What changed | Effect |
|---|---|---|
| Sampling params | temperature=0.6, top_p=0.95, top_k=20 |
Matches Qwen3.5 model card recommendation for precise coding tasks — reduces token variance, more deterministic tool call format |
| System prompt | Explicit instruction to suppress <think> tags and use tools |
Overrides the default thinking behavior at the prompt level even when API-level enable_thinking: false isn't available
|
| Model identity | Registered as a new Ollama model tag | Can be added to OpenCode/Qwen Code config independently, tested in isolation, and versioned separately from the base model |
The key insight: Qwen3.5 does support tool calling (45.6 on TIR-Bench). The problem was never capability — it was output format. The Modelfile forces the model into a mode where it emits tool calls in the structured format OpenCode expects, rather than narrating what it would do.
3.3 Confirmed Working
qwen3.5-4b-instructin OpenCode successfully executedmkdir,echo, file creation, and multi-stepmkdir && echo && catchains- Response time: 7–10 seconds per tool call on 4B (vs 4–5 minutes on 9B from HDD)
- Thinking blocks: suppressed — clean tool call output observed
4. Config Management Scripts
Three scripts were produced to manage the model ecosystem. These live in the OpenCoder working directory.
4.1 sync-opencode-models.py
Syncs Ollama's model list into opencode.json under provider.ollama.models (dict format). Preserves custom names, auto-generates names using family/size/capability detection, supports --dry-run and --verbose.
4.2 sync-qwencode-models.py
Same logic but targets ~/.qwen/settings.json under modelProviders.openai[id=ollama].models (array format). Uses NAME_RULES lookup table with 3-letter SFX codes for model role identification. Also writes correct contextLength per model (262144 for Qwen3.5-9B, 32768 for others).
4.3 fix-opencode-model-names.py
One-shot rename pass over an existing opencode.json to apply the Family (Size) - Descriptor · SFX naming convention without a full sync.
Important: These scripts should be run using Claude or another capable agent to keep the NAME_RULES and PINNED_NAMES tables current as new models are pulled.
4.4 Naming Convention
Family (Size) - Descriptor [Quant] · SFX
| SFX | Meaning |
|---|---|
AGT |
Agentic / Tool Dispatch |
COD |
Coding focused |
RSN |
Reasoning / Thinking |
VIS |
Vision / Multimodal |
INS |
Instruct (non-thinking) |
GRM |
Grammar / Style utility |
GEN |
General purpose |
DST |
Distilled from larger model |
Examples:
Qwen 3.5 (4B) - Instruct No-Think · INSQwen 3.5 (9B) - Claude Opus Distill [Q4_K_M] · RSN·DSTQwen 2.5 Coder (7B) - Tool Dispatch · AGT
5. Infrastructure Notes
5.1 Storage Impact on Model Performance
| Storage | Read speed | 5GB model load | Usable? |
|---|---|---|---|
| NVMe SSD | ~3,000 MB/s | ~2 seconds | ✅ Ideal |
| SATA SSD | ~600 MB/s | ~8 seconds | ✅ Good |
| HDD (observed) | ~8–16 MB/s | 4–6 minutes | ❌ Not viable for interactive use |
Action: Always store active Ollama model files on SSD. Use HDD only for cold storage of models not currently in rotation.
5.2 OpenCode vs Qwen Code
| Feature | OpenCode | Qwen Code |
|---|---|---|
| Config format | opencode.json — models as dict |
settings.json — models as array
|
| Config path | ~/.config/opencode/ |
~/.qwen/
|
| Distrobox compatible | ✅ Yes | ⚠️ Path issues — run from host |
| Burner container workflow | ✅ Yes | ❌ Not compatible |
| Ollama provider | provider.ollama with npm package |
modelProviders.openai[id=ollama]
|
| Credentials | Not required for Ollama | Requires OLLAMA_API_KEY env var
|
6. Replication Steps (Clean Setup)
# 1. Pull base model
ollama pull qwen3.5:4b
# 2. Create Modelfile (save as Modelfile.qwen3.5-4b-instruct)
# [see Section 3.1]
# 3. Build instruct variant
ollama create qwen3.5-4b-instruct -f Modelfile.qwen3.5-4b-instruct
# 4. Verify in Ollama
ollama list | grep qwen3.5-4b-instruct
# 5. Quick sanity check — should produce NO <think> tags
ollama run qwen3.5-4b-instruct "create a file called test.txt with hello world"
# 6. Add to opencode.json
# Under provider.ollama.models:
# "qwen3.5-4b-instruct:latest": {
# "name": "Qwen 3.5 (4B) - Instruct No-Think · INS"
# }
# 7. Run sync script to keep config current
python3 sync-opencode-models.py --verbose
# 8. Launch OpenCode and test
opencode
7. Open Questions
- Does Ollama strip
<think>tags at the API boundary in newer versions? (Would make the Modelfile SYSTEM prompt partially redundant but harmless) - What is the practical context limit for
qwen3.5-4b-instructon 8GB VRAM before OOM? - Can
qwen3.5:9b(on SSD) match 4B tool call reliability with the same Modelfile approach? - Is there a way to run Qwen Code inside the distrobox burner workflow with symlinked config paths?
- At what task complexity does the 4B start failing vs needing the 9B?
Document generated: 2026-03-30
Part of: Project OpenCoder v3.0 — CGG R&D & BD Unit
See Also
- Project OpenCoder: AI Independence Initiative — Strategic overview
- Opencode isolation and burner workflow 260216 — Distrobox burner container workflow
- OpenCode in Android Termux 260303 — Running OpenCode on Android
- Proceedural Agentic Development Methodology 260304 — PAD methodology
- Lora Basics 260304 — LoRA model training fundamentals
- 🧠 Process: Selecting and Installing the Right Ollama Model for Your Hardware — Model selection guide