smista.ai · blog

Your company's AI bill is not a model problem. It's a routing problem

Most companies don't need fewer powerful models. They need to stop using them for everything.

Christian Visintin · June 4, 2026 · 4 min read

The expensive default

As coding agents spread through the enterprise, their cost has become a real concern.

Many companies now give developers access to coding agents through Claude, Codex, or other API-based tools.

The problem is that companies have limited control over how those models are used.

Most developers are not expected to track every available model, its strengths, its price, and its trade-offs.

And the default option is often the most expensive one. Because of that, many companies now run awareness campaigns about “saving tokens”, asking employees to choose the right model for each task.

Best model is not a strategy

The issue with relying on individual judgment is that it's unreliable and hard to enforce:

A developer may forget to switch the model.
A developer has to keep checking which model they're using.
Knowledge about the different models is not always shared within the company.
A developer in a rush may pick the most capable model just to avoid thinking about the choice.

Cost control is not enough

A common response is to enforce cost-control policies on LLM platforms, typically a monthly token budget ceiling.

This strategy, though, tends to punish developers rather than improve the workflow: once they exceed the threshold, they are locked out of the model.

A developer may mistakenly use a powerful model on a simple task, burn through their tokens, and then be unable to complete another task that actually requires that model.

At the end of the day, the developer has run out of tokens, the company has lost money, and the job is unfinished.

Routing turns model choice into infrastructure

The solution is not to rely on the developer's good faith, or on cost ceilings, but to adopt deterministic routing policies for your tasks.

Imagine a developer asks an AI agent to update an authentication flow and add tests.

The planning step can go to a stronger reasoning model.
The implementation can go to a fast coding model.
The changelog can go to a cheaper local model.
The review of src/auth/** can stay local, or use only models approved for security-sensitive code.
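
The split above can be sketched as a small deterministic function. This is a minimal illustration, not smista.ai's implementation: the `Step` type, the model labels, and the choice to send non-sensitive reviews to the fast coding model are all assumptions made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical labels; a real setup would use actual model identifiers.
STRONG_REASONING = "strong-reasoning-model"
FAST_CODING = "fast-coding-model"
CHEAP_LOCAL = "cheap-local-model"
APPROVED_LOCAL = "approved-local-model"

# Paths treated as security-sensitive (taken from the example above).
SENSITIVE_GLOBS = ("src/auth/**",)


@dataclass(frozen=True)
class Step:
    intent: str                   # "plan", "edit", "changelog" or "review"
    paths: tuple[str, ...] = ()   # files the step touches, if any


def touches_sensitive(paths: tuple[str, ...]) -> bool:
    """True if any path falls under a security-sensitive glob."""
    return any(fnmatchcase(p, g) for p in paths for g in SENSITIVE_GLOBS)


def pick_model(step: Step) -> str:
    """Map a step to a model. Same input, same answer, every time."""
    if step.intent == "review":
        # Assumption: reviews outside sensitive paths use the fast model.
        return APPROVED_LOCAL if touches_sensitive(step.paths) else FAST_CODING
    if step.intent == "plan":
        return STRONG_REASONING
    if step.intent == "edit":
        return FAST_CODING
    if step.intent == "changelog":
        return CHEAP_LOCAL
    raise ValueError(f"no route defined for intent {step.intent!r}")


workflow = [
    Step("plan"),
    Step("edit", ("src/auth/login.py",)),
    Step("changelog"),
    Step("review", ("src/auth/login.py",)),
]
for step in workflow:
    print(step.intent, "->", pick_model(step))
```

The developer never sees this function. They submit one request, and the routing decision is made the same way for everyone.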

From the developer's perspective, this is still one workflow.

From the company's perspective, it is no longer uncontrolled model usage.

It is policy-driven execution.

A router that can split a task and dispatch each small job to the model best suited for it is an easy win on all fronts:

It saves money and time by preventing expensive models from overengineering simple tasks.
It allows lighter models to handle simple coding tasks.
It keeps sensitive documentation from leaking by processing it only on local models.

This is exactly what smista.ai is about: providing a solid and reliable router that runs everywhere, is easy to set up, and dispatches each of your jobs to the model best suited for it.

Different tasks have different needs

One of the biggest advantages of a routing layer is the ability to dispatch jobs across different LLM providers.

Each LLM provider has its own strengths; some teams may prefer Anthropic models for planning, OpenAI models for implementation, and local models for privacy-sensitive reviews.

So far, sharing context between providers has meant writing the context somewhere and having the other provider read it from the same location. If you ask me, this is extremely boring.

A routing layer, on the other hand, can handle this without requiring developers to manually move context.
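
To make the idea concrete, here is a minimal sketch of a routing layer that carries context from one provider to the next. It shows the general technique only and makes no claim about how smista.ai does it: `Provider`, `run_workflow`, and `echo_provider` are hypothetical names, and the providers are plain functions standing in for real API clients.

```python
from typing import Callable

# Assumption: a provider is modelled as a function (prompt, context) -> reply.
Provider = Callable[[str, list[str]], str]


def run_workflow(
    steps: list[tuple[str, str]],
    providers: dict[str, Provider],
) -> list[str]:
    """Run each (provider_name, prompt) step, passing earlier results along."""
    context: list[str] = []
    for provider_name, prompt in steps:
        # Each provider gets a copy of everything produced so far.
        reply = providers[provider_name](prompt, list(context))
        context.append(f"{provider_name}: {reply}")
    return context


def echo_provider(prompt: str, context: list[str]) -> str:
    """Stand-in for a real model call; reports how much context it received."""
    return f"{prompt} (saw {len(context)} earlier result(s))"


providers = {"planner": echo_provider, "coder": echo_provider}
history = run_workflow(
    [("planner", "plan auth change"), ("coder", "implement plan")],
    providers,
)
for line in history:
    print(line)
```

The second provider receives the first provider's output without anyone writing it to a shared file first.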

You can now start to see the full vision of what smista.ai is capable of.

Every company needs different routing policies

At smista.ai, we don't want to force anyone to follow conventions about which model or provider should handle a given task.

Everybody has their own ideas, preferences, and needs.

For that reason, smista.ai allows the user to define detailed routing policies.

A routing policy can define intents, paths, cost limits, allowed tools, and the model that handles each step of a task.

All of this is done by defining rules in a TOML file. The advantage is that the policies can be understood by anyone, shared across teams, and versioned by the administrators.

Also, workspace-level rules can never override organisation-level rules, allowing the IT department to prevent users from circumventing them.

[[routing.rules]]
name = "plan with strongest reasoning model"
priority = 10
intent = "plan"
model = "openai/gpt-5.5-thinking"
fallbacks = ["anthropic/claude-sonnet"]
 
[[routing.rules]]
name = "use local model for changelog skill"
priority = 20
skill = "changelog"
model = "ollama/qwen2.5-coder"
fallbacks = ["openai/gpt-5.5-mini"]
 
[[routing.rules]]
name = "auth code uses Claude"
priority = 30
intent = "edit"
paths = ["src/auth/**"]
model = "anthropic/claude-sonnet"
fallbacks = ["openai/gpt-5.5-thinking"]
 
[[routing.rules]]
name = "review security-sensitive code locally"
priority = 5
effort = "low"
intent = "review"
paths = ["src/crypto/**", "src/auth/**"]
local_only = true
model = "ollama/qwen2.5-coder"

Conclusion

The goal is not to make developers think about model choice all day.

The goal is to let them work normally, while the company defines clear, versioned, enforceable policies for cost, privacy, and performance.

That is what smista.ai is building: a local-first routing layer for AI developer workflows.