Discussion about this post

Sandeep

Good read! Typo: maxInputTokens -> maxOutputTokens

andy

When I tried Q8_0 for the K cache and Q4_0 for the V cache, Gemma 4 26B A4B QAT failed to load. I've been running Q8_0 for both the K and V caches for a while now and it seems solid, with much lower RAM usage than the default F16.
