3 Comments
Sandeep

Good read! Typo: maxInputTokens -> maxOutputTokens

Alex Ewerlöf

Thank you, Sandeep. Copy/paste error! Fixed :)

Sandeep

Also, the Gemma 12B QAT model is available in LM Studio now. I am working with an M4 Mac mini with 16 GB of RAM (shared between the GPU and CPU, given the unified memory on Apple Silicon).

When using VS Code Copilot as a harness, I am running into a problem where Copilot's layout engine chokes before it even attempts to talk to LM Studio.

The workaround I am using:

1. Click the model dropdown at the bottom of the chat panel (where it says Gemma 4 12B (chat-completions)).

2. Temporarily switch it back to a default cloud model (like GPT-4o or Gemini).

3. Type "hi" to confirm the chat panel cleans itself up and successfully renders.

4. Once it is working normally, switch the dropdown back to your custom Gemma 4 12B endpoint.

Not sure if it's just me, or if you have faced something similar.