smista.ai · blog

I don't want to choose the Model. I want to Dispatch the Job.

Every time I switch models, I'm not just changing a dropdown. I'm changing cost, latency, reasoning behaviour, provider UX, context, skills, and sometimes the entire workflow. smista.ai is my attempt to turn AI model selection into a routing problem: local-first, policy-driven, and built for developer workflows.

Christian Visintin · May 27, 2026 · 8 min read


Choosing the right model is annoying

After using tools like Claude Code and Codex for months, one thing kept annoying me: choosing the right model is not just about quality. It affects cost, latency, reasoning behaviour, provider UX, available tools, context, and even the level of control I have over the workflow.

Choosing the right model matters for several reasons:

Not every task requires a heavy model like Claude Opus: simple tasks can be handled by Sonnet, which can actually outperform Opus on them because it spends less time reasoning.
Using a heavy model everywhere consumes more tokens than necessary.
Overthinking can produce worse results: a heavy model reasoning about a simple task often overengineers the implementation.

And it's not just the model that matters for a developer, but the provider too. For example, as a Rust developer, I've noticed that:

Claude Code is great at planning, but sometimes lazy when writing code.
Codex is very good at writing code and has an excellent view of the code across the entire workspace, but it's generally worse at planning.

And don't forget the effort level: most of us keep the same effort for every task, and that is a problem too. I remember that when Opus 4.6 was released, it automatically switched to Extra-High effort, which caused me a lot of pain, because the model became too slow even for committing code.

If none of this is a problem for you because you already switch all the time, good for you.

But for most developers, this is a hassle, for several reasons:

As humans, we are lazy.
Under pressure to finish a task, we don't plan each execution step.
Different steps of a plan call for different effort levels, and you can't change the effort within the same task.
Changing the model means losing the entire context.
Changing providers is even worse: you also have to make sure the same skills and preferences are available there.

Local models are great, but the UX is lacking

I haven't even talked about local models yet.

Recently, I looked into setting up local models with Ollama, because I believe local models can handle simpler tasks and reduce the need to rely on remote models that cost tokens.

The issue is that, while Ollama is great, the user experience for someone coming from tools like Codex or Claude Code is very poor.

Local models don't come with a polished tool that offers skills, file reading, and so on. The tooling is rudimentary: you talk directly to the model, and it cannot access your file system on its own. So if you want it to analyse a project, you have to paste the file contents in manually.
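To make that gap concrete, here is a minimal Python sketch of the glue you end up writing by hand: it reads the files yourself, pastes them into one prompt, and sends it to a local Ollama server. It assumes Ollama is listening on its default address and uses its `/api/generate` endpoint; the helper names and the model name are placeholders of mine, not part of any existing tool.

```python
import json
import urllib.request
from pathlib import Path

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(question: str, paths: list[str]) -> str:
    """Paste file contents into the prompt, since the model cannot read them itself."""
    parts = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        parts.append(f"--- {path} ---\n{text}")
    parts.append(question)
    return "\n\n".join(parts)


def ask_local_model(prompt: str, model: str = "qwen2.5-coder") -> str:
    """Send one non-streaming request to the local server and return the answer text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Every file the model should see has to be listed explicitly in `paths`; nothing here discovers the workspace, applies edits, or remembers context between calls, which is exactly what code agents do for you.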

So it's easy to see why these models are not mainstream yet, even though they may have to become so. It's easy to imagine that mainstream models will become more expensive in the future, and that relying on local models for simpler tasks will become a necessity.

Dispatching work instead of picking models

So I started wondering why we don't have a tool to handle all of these issues.

Something that can infer which model and provider should handle a given task, and that can also use local models while providing an experience similar to Codex.

That is how I came to design such a tool.

I started with a simple vision in mind: deterministic model routing for agent workflows.

A CLI that routes each phase of an AI workflow to the most suitable model using deterministic and configurable policies.

So, on one hand, I want something that provides a user experience similar to mainstream CLI code agents, and on the other hand, I want it to choose, based on my preferences, which model, provider, and effort level to use for each task.

Why local-first matters

In designing such a tool, I couldn't help but think of a local-first approach.

I wanted this tool to be usable by anyone, so I wanted it to be:

source-available
usable on your local machine
or, if preferred, shared on a local network

Open to everybody

This vision of openness to everybody runs, of course, into the issue of local models.

Performant local models require substantial RAM, and most of us don't have that kind of machine at home. So I want smista.ai to be local-first while staying open to offering a SaaS that unlocks the full potential of the tool.

Of course, this SaaS will not be there on day one, but I expect it to come in the near future.

Policies over preferences

Most AI tools treat model selection as a preference.

You pick a default model, maybe a provider, maybe an effort level, and then you manually change them whenever the task feels different enough.

But that does not scale well for agent workflows.

A developer workflow is not a single task. It is a sequence of phases: planning, editing, testing, reviewing, documenting, and committing.

Each phase may have different requirements. Some need stronger reasoning. Some need faster execution. Some should run locally. Some should never leave your machine. Some can use a cheap remote model. Some deserve the expensive one.

That is why smista.ai is built around policies instead of preferences.

A preference says:

Use this model by default.

A policy says:

For planning, use this provider with high reasoning effort. For simple edits, prefer a local model. For documentation, use a cheaper remote model. For sensitive files, stay local. For large refactors, use the strongest available model.

The difference is subtle, but important.

Preferences are manual choices. Policies are executable decisions.

With policies, the routing logic becomes explicit, configurable, reviewable, and deterministic. You are not hoping the tool picks the right model. You are describing how work should be dispatched.

The goal is not to hide model selection behind magic. The goal is to make model selection predictable and automatable.
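The contrast can be sketched in a few lines of Python. This is only an illustration of the idea, not smista.ai's implementation: the `Route` type, the phase names, and the model labels are placeholders I made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    effort: str


# A preference: one default, changed by hand whenever the task feels different.
DEFAULT = Route(model="remote-default", effort="medium")

# A policy: an explicit table from workflow phase to route (placeholder names).
POLICY = {
    "plan": Route(model="remote-strong", effort="high"),
    "edit": Route(model="local-small", effort="low"),
    "document": Route(model="remote-cheap", effort="low"),
}


def dispatch(phase: str, touches_sensitive_files: bool = False) -> Route:
    """Return the route for a phase; sensitive work always stays local."""
    if touches_sensitive_files:
        return Route(model="local-small", effort="medium")
    # Phases without an explicit entry fall back to the default route.
    return POLICY.get(phase, DEFAULT)
```

The same input always yields the same route, and the whole decision lives in data you can read, diff, and review.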

What I'm building with smista.ai

And this is where smista.ai comes in.

smista.ai is my attempt to build the tool I wanted while working with AI code agents every day.

At its core, smista.ai is a CLI and a router that sits between the developer and various AI providers, including both mainstream remote models and local models.

The experience should feel familiar if you already use tools like Claude Code or Codex: you describe the task, the agent works on your project, and you stay in control of what gets changed.

The difference is that smista.ai does not ask you to manually choose the model, provider, and effort level every time.

Instead, it lets you define deterministic routing policies that determine how each task is dispatched.

And the configuration should stay simple. In most cases, that means a few lines of TOML:

[[routing.rules]]
name = "plan with strongest reasoning model"
priority = 10
intent = "plan"
effort = "high"
model = "anthropic/claude-opus-4.7"
fallbacks = ["anthropic/claude-sonnet"]
 
[[routing.rules]]
name = "use local model for changelog skill"
priority = 20
skill = "changelog"
effort = "low"
model = "ollama/qwen2.5-coder"
fallbacks = ["openai/gpt-5.5-mini"]
 
[[routing.rules]]
name = "auth code uses Claude Sonnet"
priority = 30
intent = "edit"
effort = "medium"
paths = ["src/auth/**"]
model = "anthropic/claude-sonnet"
fallbacks = ["openai/gpt-5.5-thinking"]

After that, the workflow should feel similar to the one developers already use with modern code agents.

The difference is that routing is explicit.

You can see why a model was selected, which rule matched, which files were included, which files were excluded, and what the execution is expected to cost.

That makes the workflow:

faster
safer
traceable
and cheaper
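To make "which rule matched" concrete, here is a Python sketch of how rules like the three above could be evaluated. It is a guess at the semantics, not smista.ai's actual engine: I assume that a rule matches when every condition it declares holds, that a `paths` rule needs at least one touched file to match a glob, and that the lowest `priority` number wins. The rules are written as plain dictionaries mirroring the TOML, with `effort` and `fallbacks` left out for brevity.

```python
from fnmatch import fnmatchcase
from typing import Optional

# The three rules from the TOML above, as plain dictionaries.
RULES = [
    {"name": "plan with strongest reasoning model", "priority": 10,
     "intent": "plan", "model": "anthropic/claude-opus-4.7"},
    {"name": "use local model for changelog skill", "priority": 20,
     "skill": "changelog", "model": "ollama/qwen2.5-coder"},
    {"name": "auth code uses Claude Sonnet", "priority": 30,
     "intent": "edit", "paths": ["src/auth/**"],
     "model": "anthropic/claude-sonnet"},
]


def rule_matches(rule: dict, task: dict) -> bool:
    """A rule matches when every condition it declares holds for the task."""
    if "intent" in rule and rule["intent"] != task.get("intent"):
        return False
    if "skill" in rule and rule["skill"] != task.get("skill"):
        return False
    if "paths" in rule:
        files = task.get("files", [])
        # Assumption: at least one touched file must match one of the globs.
        if not any(fnmatchcase(f, g) for f in files for g in rule["paths"]):
            return False
    return True


def select_rule(rules: list, task: dict) -> Optional[dict]:
    """Pick the matching rule with the lowest priority number (an assumption)."""
    candidates = [rule for rule in rules if rule_matches(rule, task)]
    return min(candidates, key=lambda rule: rule["priority"], default=None)
```

For example, `select_rule(RULES, {"intent": "edit", "files": ["src/auth/middleware.rs"]})` returns the rule named "auth code uses Claude Sonnet", while an edit that only touches `src/main.rs` returns `None`, so a real router would need a default route for unmatched tasks.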

The golden path

Here is the golden path, to show how simple it should be.

A user should be able to run:

smista refactor the auth middleware

And before executing anything sensitive, smista should show something like:

Detected task: edit
Selected model: Claude Sonnet
Matched rule: auth code uses Claude Sonnet
Included context:
  - src/auth/middleware.rs
  - current git diff
  - SMISTA.md
Excluded context:
  - .env
  - secrets.toml
Estimated cost: $0.08-$0.14
Required permissions:
  - file write approval

This is the important part: the developer is not blindly trusting an agent.

They are dispatching work through a system they can inspect, configure, and control.
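As an illustration of the included and excluded context lists in that preview, here is a small Python sketch that partitions candidate files against a deny list and renders the first part of the summary. The deny patterns and function names are hypothetical; how smista.ai actually decides what to exclude is not specified here.

```python
from fnmatch import fnmatchcase

# Hypothetical deny list: file names that must never be sent to a model.
DENY_PATTERNS = [".env", "*.env", "secrets.toml", "*.pem"]


def split_context(candidates: list[str], deny: list[str]) -> tuple[list[str], list[str]]:
    """Partition candidate paths into (included, excluded) by file-name pattern."""
    included, excluded = [], []
    for path in candidates:
        name = path.rsplit("/", 1)[-1]  # match on the file name only
        if any(fnmatchcase(name, pattern) for pattern in deny):
            excluded.append(path)
        else:
            included.append(path)
    return included, excluded


def render_preview(task: str, model: str, rule: str,
                   included: list[str], excluded: list[str]) -> str:
    """Render the part of the preview that explains the routing decision."""
    lines = [
        f"Detected task: {task}",
        f"Selected model: {model}",
        f"Matched rule: {rule}",
        "Included context:",
    ]
    lines += [f"  - {path}" for path in included]
    lines.append("Excluded context:")
    lines += [f"  - {path}" for path in excluded]
    return "\n".join(lines)
```

The point of the split is that exclusion happens before anything is sent to a model, and the excluded list is shown to the developer instead of being silently dropped.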

I'm building this because I need it

I've always believed that you should build what you need, not what merely sounds cool. And smista.ai is what I need today.

I want to stop thinking about which model, provider, or effort level I should use every time I start a task.

I want to define my workflow once, make the routing explicit, and then dispatch the job.

I want local models to be part of the same developer experience as remote models, not a separate, worse workflow.

I want a system that can be fast when the task is simple, careful when the task is risky, local when the context is sensitive, and powerful when the work actually deserves it.

Most of all, I want AI coding tools to feel less like choosing from a list of models and more like running a development workflow I can trust.

That is what I'm trying to build with smista.ai.