Antonio Leiva · ai · 6 min read

How to Enable Gemma 4 Thinking Mode in LM Studio and OpenCode

These past few days I have been playing with Gemma 4 locally.

Not just for the sake of tinkering, but to test its capabilities and see whether it could fit into an OpenClaw-style chatbot: something that runs close to me, gives me more control over the environment, and does not require sending every interaction to an external provider.

And if I wanted to test it seriously, it was not enough to throw a handful of random prompts at it and call it a day. I also needed to check how its thinking mode behaved, because that is where you start seeing whether a local model can handle slightly more complex flows.

Enabling thinking mode in a local model sounds like the kind of thing that should work on the first try.

You flip an option in LM Studio, load the model, connect it to OpenCode, and done.

Well, not quite.

There are several layers involved, and if one of them is not configured correctly, you usually see one of these three things:

  • the model does not think;
  • the model thinks, but you do not see it;
  • or the model only thinks when you manually add a weird token to every prompt.

In my case, the goal was to make Gemma 4 26B work with thinking inside LM Studio, then use it from OpenCode, without typing <|think|> manually on every turn.

That is where the fun started.

The problem is not in one place

When we work with local models, we tend to talk about “enabling thinking” as if it were a single switch.

It is not.

There are several moving parts:

  1. The model needs to know how to generate that reasoning channel.
  2. LM Studio needs to know how to parse it.
  3. The prompt template needs to trigger the behavior.
  4. OpenCode needs to send messages in a format compatible with that template.

If one of those pieces fails, it is very easy to misdiagnose the problem.

For example:

  • “OpenCode is not respecting the system prompt.”
  • “No, the problem is LM Studio.”
  • “No, the model does not support reasoning.”

Most of the time, it is not that simple.

The important part: LM Studio

In my case, most of the setup lived in LM Studio.

Inside My Models, in the advanced settings for the Gemma 4 model, I had to configure two things.

1. Reasoning Parsing

In the model settings:

  • Enabled: ON
  • Start String: <|channel>thought
  • End String: <channel|>

This tells LM Studio how to separate the final answer from the reasoning block.

Without this, the model may be thinking, but LM Studio will not know how to split the answer from the reasoning content.
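Conceptually, the parser does something like the following. This is only a sketch of the idea, not LM Studio's actual implementation; splitReasoning is a hypothetical name, and the start/end strings are the ones configured above:

```javascript
// Sketch: split a raw completion into a reasoning block and a final answer,
// given the configured Start String and End String.
// (Illustrative only -- not LM Studio's real parser.)
function splitReasoning(raw, startString, endString) {
  const start = raw.indexOf(startString);
  if (start === -1) return { reasoning: "", content: raw };

  const end = raw.indexOf(endString, start + startString.length);
  if (end === -1) return { reasoning: "", content: raw };

  return {
    // Everything between the two markers becomes reasoning_content.
    reasoning: raw.slice(start + startString.length, end).trim(),
    // Everything outside the markers is the user-visible answer.
    content: (raw.slice(0, start) + raw.slice(end + endString.length)).trim(),
  };
}
```

If the markers never appear, the whole output is treated as the answer, which is exactly the "the model may be thinking, but you never see it split out" failure mode.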

2. Prompt Template

This is usually where the real trick lies.

At the very top of the Gemma template, I added:

{%- set enable_thinking = true %}

The template also injects <|think|> in the first system turn when enable_thinking is active.

The relevant part looked like this:

{%- if enable_thinking is defined and enable_thinking -%}
    {{- '<|think|>' -}}
    {%- set ns.prev_message_type = 'think' -%}
{%- endif -%}

And at the end:

{%- if not enable_thinking | default(false) -%}
    {{- '<|channel>thought\n<channel|>' -}}
{%- endif -%}

In plain English: the template is prepared to open the thinking channel, and LM Studio is configured to parse it afterwards.

The detail that wasted my time

Here is where things became misleading.

I assumed that if the template was correct and LM Studio had Reasoning Parsing configured, then OpenCode only needed to send its normal system prompt and everything would work.

But no.

The clue came from a very silly test:

  • when I typed hola, Gemma answered normally;
  • when I typed <|think|> hola, the thinking block appeared.

So there was a real difference.

That told me two things:

  1. The model was reacting to the token.
  2. The problem was not simply “LM Studio is not parsing”.

First verification: call the API directly

Before blaming OpenCode, the best move is to talk directly to LM Studio’s OpenAI-compatible API.

For example:

curl -s http://127.0.0.1:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemma-4-26b-a4b-it",
    "messages": [
      { "role": "system", "content": "You are helpful." },
      { "role": "user", "content": "Reply with exactly OK" }
    ],
    "stream": false
  }'

LM Studio returned something like this:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "OK",
        "reasoning_content": "..."
      }
    }
  ]
}

So:

  • the model was thinking;
  • LM Studio was parsing it;
  • the API was exposing reasoning_content.

That removes half the suspects from the list.
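To make that check repeatable instead of eyeballing JSON, a tiny predicate over the parsed response is enough. hasParsedReasoning is a hypothetical helper name; the field layout matches the response shown above:

```javascript
// Sketch: did LM Studio actually parse a reasoning channel out of the answer?
// The shape mirrors the OpenAI-style response above (illustrative helper).
function hasParsedReasoning(response) {
  const message = response?.choices?.[0]?.message;
  return (
    typeof message?.reasoning_content === "string" &&
    message.reasoning_content.length > 0
  );
}
```

Run it against the curl output: if it returns false, the problem is upstream of OpenCode and there is no point touching the client yet.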

So what was happening with OpenCode?

OpenCode does not send one giant raw prompt.

It sends an OpenAI-style conversation to LM Studio:

{
  "messages": [
    { "role": "system", "content": "..." },
    { "role": "user", "content": "hola" }
  ]
}

And this is where the small detail matters:

the trigger that changed the behavior was not the system prompt; it was the content of the user turn.

When the message arrived as:

hola

the behavior was one thing.

When it arrived as:

<|think|> hola

the behavior changed.

So the practical solution was not “modify OpenCode’s system prompt”. It was to inject the prefix automatically into user messages for that specific model.

The clean OpenCode solution: a plugin

OpenCode supports plugin hooks.

Instead of forcing myself to type <|think|> in every prompt, I created a global plugin at:

~/.config/opencode/plugins/gemma-think.js

With this content:

export const GemmaThinkPlugin = async () => {
  return {
    // Hook that runs before OpenCode sends the conversation to the provider.
    "experimental.chat.messages.transform": async (_input, output) => {
      const messages = output.messages;
      if (!Array.isArray(messages) || messages.length === 0) return;

      // Only the last message matters, and only if it is a user turn.
      const last = messages[messages.length - 1];
      if (!last || last.info.role !== "user") return;

      // Restrict the hook to this exact provider/model pair.
      const model = last.info.model;
      if (!model || model.providerID !== "lmstudio" || model.modelID !== "gemma-4-26b-a4b-it") {
        return;
      }

      const firstTextPart = last.parts.find((part) => part.type === "text");
      if (!firstTextPart) return;

      // Skip if the token is already there, so the prefix is never doubled.
      if (typeof firstTextPart.text !== "string") return;
      if (firstTextPart.text.startsWith("<|think|>")) return;

      firstTextPart.text = `<|think|> ${firstTextPart.text}`;
    },
  };
};

The nice thing about doing it this way:

  • it only affects that model;
  • it does not pollute other models like Qwen;
  • I do not have to remember to type the token;
  • I do not replace the base agent prompt;
  • and the whole setup remains local.

How to verify it is really working

Do not rely on vibes here.

Look at the LM Studio logs and check what OpenCode is actually sending.

After adding the plugin, the OpenCode request looked like this:

{
  "role": "user",
  "content": "<|think|> hola\n"
}

And for another model, like Qwen, it still arrived as:

{
  "role": "user",
  "content": "hola\n"
}

That confirmed three things:

  1. the hook was working;
  2. it only affected Gemma;
  3. the token was not leaking into every model by accident.
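The decision the hook makes can be boiled down to a pure function, which makes those three properties easy to check in isolation. prefixForModel is an illustrative name, not part of the plugin API; the provider and model IDs match the ones in the plugin:

```javascript
// Sketch of the plugin's decision logic as a pure function:
// prefix only for the Gemma model on LM Studio, never twice.
function prefixForModel(text, providerID, modelID) {
  const isGemma =
    providerID === "lmstudio" && modelID === "gemma-4-26b-a4b-it";
  if (!isGemma) return text;                      // other models untouched
  if (text.startsWith("<|think|>")) return text;  // avoid double-prefixing
  return `<|think|> ${text}`;
}
```

Factoring the logic out like this is also a reasonable way to unit-test the behavior without running OpenCode at all.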

The common conceptual trap

The easiest mistake in an integration like this is thinking:

“If this is about reasoning, the system prompt should be enough.”

Not necessarily.

Depending on the model and its template, behavior may change based on:

  • the first system turn;
  • the shape of the last user turn;
  • an explicit activation token;
  • or a mix of several things.

So if something does not make sense, my recommendation is:

  1. Verify LM Studio with curl first.
  2. Check whether reasoning_content appears.
  3. Inspect what OpenCode is actually sending in the LM Studio logs.
  4. Do not assume the trigger belongs in the system prompt.

In this case, it did not.

Conclusion

If you want to enable thinking mode for Gemma 4 with LM Studio and OpenCode, the solution is not just “flip the switch”.

You need to:

  • configure Reasoning Parsing in LM Studio;
  • adjust the model’s prompt template;
  • verify that the API returns reasoning_content;
  • and, in OpenCode’s case, make sure the trigger reaches the right part of the message.

For my setup, that trigger belonged in the user message, not in the system prompt.

The cleanest fix was an OpenCode plugin that automatically adds <|think|> only for gemma-4-26b-a4b-it.

Which, by the way, is exactly the kind of tiny detail that can eat an afternoon if you do not inspect the logs.

And yes, sometimes the bug is not where you think it is.
