4 Comments
Swarup Karavadi:

Nice article, Alex. As always 🙌🏼 A lot of these multi-agent workflows guzzle up tokens - chatty interfaces between agents mean the context builds up quickly with each turn. I observe that many companies are not yet at a point where tokenomics is a concern, but we will get there very soon. Would love to hear your thoughts on the topic.

Alex Ewerlöf:

Thank you, Swarup. The economy's rush to rub AI on every surface is a classic example of a Tech Bet (https://blog.alexewerlof.com/p/tech-bet), boosted by emotions that are manipulated by AI vendors at scale. Obviously, they intend to sell more tokens to compensate for the massive amounts of money poured into AI (e.g. see points 7 & 10 here: https://www.linkedin.com/posts/lennyrachitsky_my-biggest-takeaways-from-boris-cherny-head-activity-7430373428601208832-YESP).

Back to your query: personally, I bet on local AI (https://blog.alexewerlof.com/p/ai-topology) for cost savings but, more importantly, privacy. People don't realize how much information they're pouring into AI providers (not just the companies that offer the AI products).

I'm experimenting with multi-agent systems that run completely locally (on a second-hand Mac Mini M4 I bought off Blocket), and although the quality isn't as good as SOTA models for complex tasks, the simple tasks seem to yield good ROI. Unfortunately, I don't think anyone has cracked the code yet, but maybe Chinese AI researchers will release very capable models in order to crash the US AI market. The larger superpower conflicts may actually lead to better open-weight models. That's speculation, though.

Swarup Karavadi:

Neat. I’m thinking that on top of localisation, there might be a pendulum swing back to more specialised models for specialised tasks - instead of throwing the LLM equivalent of the kitchen sink at every problem. Also speculation.

Alex Ewerlöf:

That's good thinking. I'm not an expert in this area, but I've wondered the same thing, and the answer I got is that there's a minimum size a model needs in order to speak fluently (the more languages, the larger the model), and we can work around the deficiencies with other techniques. In fact, the article two posts after this one is specifically about 4 techniques to create specialists from SLMs using RAG, SKILL, MCP, and RLM. I'll share the draft with you as soon as it's worth your time.