Good article. Thanks for sharing!
Thank you
I've never used Antigravity's "Agent Manager", but I've certainly run into its huge number of bugs: roll-back removing files, roll-back-then-resend sending nothing, and the current one, workspace memory loss. Compared to Cursor, VS Code, and even Kilo Code, it's genuinely embarrassing for Google to have released this so early. That said, nothing comes close to the Google One Ultra plan's Opus 4.6 usage limits, which is the only reason I stay.
Google Ultra comes with a massive credit and your usage may justify it, but for part of that workload one could buy a decent machine with a graphics card capable of running local models, I guess. What are your thoughts on the value of the Ultra plan?
I'm currently using the Qwen 3.5 27B dense model for local coding on sensitive stuff. But cost-wise, not even local models come close to the Google One value...
Check this: https://i.imgur.com/1loUGmX.png
Based on my current usage past 2 months.
Previously I used Cursor, but every new chat started uncached, so much so that the total cost was 10x the "Est. API Cost" column.
Note: the Local GPU figure is ONLY the energy cost (in Sweden) of that token usage; it does not include the cost of the GPU itself (a 4090).
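For anyone curious how an energy-only figure like that comes together, here is a back-of-the-envelope sketch. Every number in it is an illustrative assumption (average 4090 power draw, throughput for a ~27B dense model, a Swedish household electricity rate), not data from this thread:

```python
# Rough estimate of local-GPU energy cost per million generated tokens.
# All three constants are assumptions for illustration only.

GPU_POWER_W = 350         # assumed average board power under inference load
TOKENS_PER_SEC = 30       # assumed throughput for a ~27B dense model
PRICE_PER_KWH_EUR = 0.15  # assumed Swedish household electricity rate

seconds_per_mtok = 1_000_000 / TOKENS_PER_SEC
kwh_per_mtok = GPU_POWER_W * seconds_per_mtok / 3_600_000  # W*s -> kWh
cost_per_mtok = kwh_per_mtok * PRICE_PER_KWH_EUR

print(f"{kwh_per_mtok:.2f} kWh, ~{cost_per_mtok:.2f} EUR per 1M tokens")
```

With these assumed inputs the energy cost lands well under one euro per million tokens, which is why the hardware price, not electricity, dominates the local-GPU column.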
That was super useful, Thomas! Thanks for sharing the data. I can think of 3 reasons for this difference:
1. AI subsidies: big corps basically trying to crush the competition by throwing more money at compute
2. Google's TPUs may be more efficient than a gaming GPU for this particular type of workload
3. Consumer electricity prices are probably higher than the rates governments offer these big tech companies
What do you think? Can there be any other reason to explain the difference?
You're welcome! This is based on my own MITM-proxy and the reported details from Antigravity.
It looks like this:
"usageMetadata": {
  "cachedContentTokenCount": 152259,
  "candidatesTokenCount": 346,
  "promptTokenCount": 153855,
  "totalTokenCount": 154201
}
Compared to Cursor and VS Code, Antigravity caches the most tokens of any tool I've tried, which I assume contributes to its very generous rate limits. It also runs its own language server instead of calling the Google APIs directly, and all requests are signed (probably to prevent subscriptions being resold as API access, which was a huge issue for them not long ago).
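To make the caching claim concrete, the cache hit ratio can be read straight off the usageMetadata values quoted above (the field names are the ones reported by the Gemini API):

```python
# Cache hit ratio from the usageMetadata values quoted in this thread.
usage = {
    "cachedContentTokenCount": 152259,
    "candidatesTokenCount": 346,
    "promptTokenCount": 153855,
    "totalTokenCount": 154201,
}

# Fraction of the prompt served from cache, and the fresh remainder.
cache_ratio = usage["cachedContentTokenCount"] / usage["promptTokenCount"]
uncached = usage["promptTokenCount"] - usage["cachedContentTokenCount"]

print(f"cached: {cache_ratio:.1%} of the prompt, {uncached} fresh prompt tokens")
```

So in that one response, roughly 99% of a ~154k-token prompt was cached, only about 1.6k tokens were processed fresh, which is consistent with the generous limits.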
Sorry, I didn't see you updated the comment.
It's well known that the actual cost of inference is much lower than the charged API pricing. Especially after the latest Nvidia keynote: at least 35x lower inference cost on the Rubin machines. It will be an interesting race to the bottom once these hyperscalers stop hyper-training.
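As a toy illustration of what a >35x gap implies, take an assumed list price (the 10 USD/1M tokens here is illustrative, only the 35x factor comes from the claim above):

```python
# Toy margin arithmetic: if serving really is ~35x cheaper than the
# list price, what serving cost and gross margin does that imply?

API_PRICE_PER_MTOK = 10.0  # assumed list price, USD per 1M tokens
COST_FACTOR = 35           # the ">35x lower" figure claimed above

serving_cost = API_PRICE_PER_MTOK / COST_FACTOR
margin = 1 - serving_cost / API_PRICE_PER_MTOK

print(f"implied serving cost ${serving_cost:.2f}/1M tokens, gross margin {margin:.1%}")
```

At 35x, the implied gross margin on pure inference is over 97%, which is what leaves room for a race to the bottom.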
And yes, Google is hosting the Anthropic models on their own TPUs now. I'm quite certain this uses their experimental CUDA-translation layer; the models were very unstable and crashed a lot when they first appeared there (which was reflected in Antigravity).