How we built a provider-agnostic LLM layer in Rust with Rig

A practical look at using Rig to build a Rust abstraction over multiple LLM providers, tools, and memory.

Christian Visintin · June 15, 2026 · 4 min read

Every provider has a different client

One of our earliest goals for smista.ai was to support models from multiple providers.

Most projects that talk to LLMs eventually need more than one provider, such as Anthropic and OpenAI, to meet their users' needs.

Once you support more than one LLM provider, the problem stops being "how do I call an API?" and becomes "how do I avoid leaking provider-specific details everywhere?"

At smista.ai, we wanted an integration that supports at least four providers: Anthropic, OpenAI, Gemini, and Ollama. We wanted this integration to be exposed through a common trait, so the rest of the router would not have to care which provider was behind a request.
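To make the goal concrete, here is a minimal sketch of what "exposed through a common trait" can look like. The names `LlmProvider`, `FakeAnthropic`, `FakeOllama`, and `route` are hypothetical and are not taken from the smista.ai codebase. The sketch is synchronous and the providers are stubs, to keep it self-contained.

```rust
/// Hypothetical common interface the router programs against.
pub trait LlmProvider {
    /// Provider name, e.g. for logging.
    fn name(&self) -> &str;
    /// Sends a prompt and returns the model's text (stubbed in this sketch).
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in for a hosted provider; a real one would call its HTTP API.
pub struct FakeAnthropic;

/// Stand-in for a locally hosted provider.
pub struct FakeOllama;

impl LlmProvider for FakeAnthropic {
    fn name(&self) -> &str {
        "anthropic"
    }

    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("[anthropic] {prompt}"))
    }
}

impl LlmProvider for FakeOllama {
    fn name(&self) -> &str {
        "ollama"
    }

    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("[ollama] {prompt}"))
    }
}

/// Router-side code: it only sees the trait, never a concrete provider.
pub fn route(provider: &dyn LlmProvider, prompt: &str) -> Result<String, String> {
    provider.complete(prompt)
}
```

With this shape, adding a provider means adding one more `impl` block; `route` and everything built on top of it stay unchanged.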

Wrapping Rig's CompletionClient

Luckily for us, Rig already provides a simple way to do that. With it, we could implement a common Agent wrapper around Rig's provider-specific CompletionClient:

pub struct Agent<C>
where
    C: CompletionClient,
{
    agent: RigAgent<C::CompletionModel>,
}

In the actual implementation, we also keep track of internal tools, so the router can distinguish between tools executed by the agent and tool calls that must be mediated by smista.ai.

impl<C> Agent<C>
where
    C: CompletionClient,
{
    pub async fn new(
        AgentArgs {
            completion_model,
            model,
            preamble,
        }: AgentArgs<C>,
    ) -> ProviderResult<Self> {
        // Build the Rig agent for the requested model with the base preamble.
        let agent = completion_model
            .agent(model.clone())
            .preamble(&preamble)
            .build();

        Ok(Self { agent })
    }
}

From clients to abstract Agents

But how do we get from a connection to Gemini or Anthropic to our Agent struct?

Rig already provides provider-specific clients that can be configured with the required settings, such as API keys, base URLs, and other provider-specific options.

That allowed us to keep our own abstraction small.

For example, our Gemini model client works like this:

pub struct GeminiModel {
    agent: Agent<GeminiClient>,
}
 
impl GeminiModel {
    pub async fn new(
        /* args */
    ) -> Result<Self, ProviderError>
    {
        let api_key = /* access to auth arg */;
        let client = GeminiClient::new(api_key.expose_secret())?;
 
        let agent = Agent::new(AgentArgs {
            completion_model: client,
            preamble,
            model,
        })
        .await?;
 
        Ok(Self {
            agent,
        })
    }
}

The Anthropic client looks the same, except that it uses AnthropicClient. Both clients implement CompletionClient, so either one can be used to build our Agent.
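The pattern can be shown in isolation with a self-contained sketch: a wrapper that is generic over a client trait with an associated model type. `ClientLike`, `ModelLike`, `MiniAgent`, and the mock clients are hypothetical stand-ins. They only mimic the role that Rig's CompletionClient plays here and do not reproduce its real API.

```rust
/// Stand-in for a provider client (NOT Rig's trait; names are hypothetical).
pub trait ClientLike {
    type Model: ModelLike;
    /// Builds a model handle for the given model name.
    fn model(&self, name: &str) -> Self::Model;
}

/// Stand-in for a provider-specific completion model.
pub trait ModelLike {
    fn reply(&self, preamble: &str, prompt: &str) -> String;
}

/// Generic wrapper: one implementation, reused for every client type.
pub struct MiniAgent<C: ClientLike> {
    model: C::Model,
    preamble: String,
}

impl<C: ClientLike> MiniAgent<C> {
    pub fn new(client: &C, model: &str, preamble: &str) -> Self {
        Self {
            model: client.model(model),
            preamble: preamble.to_string(),
        }
    }

    pub fn ask(&self, prompt: &str) -> String {
        self.model.reply(&self.preamble, prompt)
    }
}

/// Mock model that echoes its inputs, tagged with the provider and model name.
pub struct MockModel {
    tag: String,
}

impl ModelLike for MockModel {
    fn reply(&self, preamble: &str, prompt: &str) -> String {
        format!("{} | {} | {}", self.tag, preamble, prompt)
    }
}

pub struct MockGemini;
pub struct MockAnthropic;

impl ClientLike for MockGemini {
    type Model = MockModel;
    fn model(&self, name: &str) -> MockModel {
        MockModel { tag: format!("gemini/{name}") }
    }
}

impl ClientLike for MockAnthropic {
    type Model = MockModel;
    fn model(&self, name: &str) -> MockModel {
        MockModel { tag: format!("anthropic/{name}") }
    }
}
```

`MiniAgent::new(&MockGemini, ..)` and `MiniAgent::new(&MockAnthropic, ..)` go through the same constructor and the same `ask` method; only the type parameter differs.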

Sending a request from an Agent to the LLM

At this point, we can simply implement a function to send a request to the LLM from the agent:

/// Sends a request and awaits the full completion, returning the model's text.
pub async fn complete(&self, request: CompletionRequest) -> ProviderResult<String> {
    let CompletionRequest { messages, parameters, .. } = request;
 
    let mut history: Vec<RigMessage> = messages.into_iter().map(into_rig_message).collect();
    let Some(prompt) = history.pop() else {
        return Err(self.error(
            ProviderErrorCategory::InvalidRequest,
            "completion request has no messages",
        ));
    };
 
    let response = self
        .request_builder(prompt, &history, &parameters)
        .await?
        .send()
        .await
        .map_err(|error| {
            self.error(category_from_completion(&error), error.to_string())
        })?;
 
    let mut content = String::new();
    for item in response.choice.iter() {
        if let AssistantContent::Text(text) = item {
            content.push_str(&text.text);
        }
    }
 
    Ok(content)
}
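The two data-handling steps in this function (treating the last message as the prompt, and keeping only the text parts of the response) can be sketched without any Rig types. `Message`, `Content`, `split_prompt`, and `collect_text` are hypothetical simplifications written for this illustration.

```rust
/// Simplified chat message (stand-in for the real message type).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub text: String,
}

/// Simplified response item: a model can answer with text or a tool call.
#[derive(Debug, PartialEq)]
pub enum Content {
    Text(String),
    ToolCall(String),
}

/// Splits a conversation into (prompt, history): the last message is the
/// prompt, everything before it is history. An empty request is an error.
pub fn split_prompt(
    mut messages: Vec<Message>,
) -> Result<(Message, Vec<Message>), &'static str> {
    match messages.pop() {
        Some(prompt) => Ok((prompt, messages)),
        None => Err("completion request has no messages"),
    }
}

/// Concatenates the text items of a response and skips everything else.
pub fn collect_text(items: &[Content]) -> String {
    items
        .iter()
        .filter_map(|item| match item {
            Content::Text(text) => Some(text.as_str()),
            Content::ToolCall(_) => None,
        })
        .collect()
}
```

Popping the prompt off the end keeps the history in its original order, which is why the real function can pass `&history` along unchanged.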

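Even better, we can stream the response by calling stream() instead of send(). In this variant, the request builder also receives the tools and the tool choice: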

let stream = self
    .request_builder(prompt, &history, &parameters, tools, tool_choice)
    .await?
    .stream()
    .await
    .map_err(|error| {
        self.error(
            crate::error::category_from_completion(&error),
            error.to_string(),
        )
    })?;

This is, of course, a simplification of how it actually works. See agent.rs in the smista.ai repository for the full implementation.
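What changes on the consuming side when streaming is that text arrives in pieces. The sketch below shows the usual pattern of forwarding each delta as it arrives while also accumulating the full text. It uses a plain iterator as a stand-in for an async stream, and `Chunk` and `drain_stream` are hypothetical names rather than Rig's streaming types.

```rust
/// Hypothetical stream item; real streams yield provider-specific items.
pub enum Chunk {
    /// A piece of the model's text.
    TextDelta(String),
    /// End-of-response marker.
    Done,
}

/// Consumes chunks in order, hands each text delta to `on_delta` as soon as it
/// is seen, and returns the accumulated text. Stops at the first `Done`.
pub fn drain_stream<I, F>(chunks: I, mut on_delta: F) -> String
where
    I: IntoIterator<Item = Chunk>,
    F: FnMut(&str),
{
    let mut full = String::new();
    for chunk in chunks {
        match chunk {
            Chunk::TextDelta(delta) => {
                on_delta(&delta);
                full.push_str(&delta);
            }
            Chunk::Done => break,
        }
    }
    full
}
```

In an async version, the `for` loop would become a `while let Some(chunk) = stream.next().await` loop, with the same body.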

Adding user and session memory through a Rig Tool

A core capability of LLM agents is memory, both per session and per user. The agent must be able to save meaningful information for reuse within the current session and over the long term (user preferences, for example), and to have it available again later.

For this purpose, we implemented two pieces: a MemoryStorage trait and a Rig Tool.

MemoryStorage stores session and user data. In smista.ai, it is implemented as a client for SurrealDB.

First, we defined MemoryRecord, an entity that tracks a single memory, with a handle that uniquely identifies it and an optional key for its topic:

pub struct MemoryRecord {
    /// Opaque, backend-defined handle identifying this entry. Pass it back to
    /// `forget_*` to remove exactly this entry.
    pub handle: String,
    /// Optional topic. A keyed entry upserts on its key, so a later record with
    /// the same key replaces this one; keyless entries accumulate freely.
    pub key: Option<String>,
    /// The remembered fact.
    pub content: String,
}

We then used this record as the core type of MemoryStorage:

pub trait MemoryStorage: Send + Sync {
    type Error: std::error::Error + Send + Sync + 'static;
 
    fn put_user_memory(
        &self,
        key: Option<String>,
        content: String,
    ) -> impl Future<Output = Result<MemoryRecord, Self::Error>> + Send;
 
    fn forget_user_memory(
        &self,
        handle: String,
    ) -> impl Future<Output = Result<(), Self::Error>> + Send;
 
    fn get_user_memories(
        &self,
        limit: Option<usize>,
    ) -> impl Future<Output = Result<Vec<MemoryRecord>, Self::Error>> + Send;
 
    fn get_user_memory_by_key(
        &self,
        key: String,
    ) -> impl Future<Output = Result<Option<MemoryRecord>, Self::Error>> + Send;
}

The real trait exposes the same methods for session records as well.
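To illustrate the semantics described in the doc comments (keyed entries upsert, keyless entries accumulate, entries are forgotten by handle), here is a small in-memory sketch. It is synchronous and does not implement the MemoryStorage trait itself. `Record`, `InMemoryStore`, and the `mem-N` handle format are assumptions made for this example, as is the choice to keep the handle stable when a keyed entry is replaced.

```rust
use std::sync::Mutex;

/// Simplified copy of the record shape: handle, optional key, content.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub handle: String,
    pub key: Option<String>,
    pub content: String,
}

/// In-memory store; the tuple holds (last issued handle id, records).
#[derive(Default)]
pub struct InMemoryStore {
    inner: Mutex<(u64, Vec<Record>)>,
}

impl InMemoryStore {
    /// Keyed entries replace an existing entry with the same key;
    /// keyless entries are always appended.
    pub fn put(&self, key: Option<String>, content: String) -> Record {
        let mut guard = self.inner.lock().unwrap();
        let (next_id, records) = &mut *guard;
        if let Some(k) = &key {
            if let Some(existing) = records.iter_mut().find(|r| r.key.as_ref() == Some(k)) {
                existing.content = content;
                return existing.clone();
            }
        }
        *next_id += 1;
        let record = Record { handle: format!("mem-{next_id}"), key, content };
        records.push(record.clone());
        record
    }

    /// Removes the entry with this handle; returns whether one was removed.
    pub fn forget(&self, handle: &str) -> bool {
        let mut guard = self.inner.lock().unwrap();
        let before = guard.1.len();
        guard.1.retain(|r| r.handle != handle);
        guard.1.len() != before
    }

    /// Looks up a keyed entry.
    pub fn get_by_key(&self, key: &str) -> Option<Record> {
        let guard = self.inner.lock().unwrap();
        let found = guard.1.iter().find(|r| r.key.as_deref() == Some(key)).cloned();
        found
    }

    /// Lists entries in insertion order, optionally capped at `limit`.
    pub fn list(&self, limit: Option<usize>) -> Vec<Record> {
        let guard = self.inner.lock().unwrap();
        let items: Vec<Record> =
            guard.1.iter().take(limit.unwrap_or(usize::MAX)).cloned().collect();
        items
    }
}
```

A store like this is also handy as a test double: anything written against the storage abstraction can be exercised without a database.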

To let the model perform operations on these memories, we expose them through a Tool.

A tool is a typed function exposed to the model. The model does not execute it directly; it emits a structured tool call, and the host application decides whether and how to run it.

pub struct MemoryTool<S>
where
    S: MemoryStorage,
{
    /// The scoped backend this tool records into and forgets from.
    storage: Arc<S>,
}
 
impl<S> Tool for MemoryTool<S>
where
    S: MemoryStorage + 'static,
{
    const NAME: &'static str = "memory";
 
    type Error = MemoryToolError<S::Error>;
    type Args = MemoryArgs;
    type Output = String;
 
    async fn definition(&self, _prompt: String) -> ToolDefinition {
        ToolDefinition {
            name: Self::NAME.to_string(),
            description: concat!(
                "Record or forget a memory addressed by `key`.\n\n",
                "Use scope `user` for durable facts about the user that should ",
                "persist across sessions (preferences, identity, long-lived ",
                "context). Use scope `session` for working memory tied to the ",
                "current session only.\n\n",
                "Operations:\n",
                "- `record`: store `value` under `key`, replacing any existing ",
                "fact filed under the same key.\n",
                "- `forget`: remove the fact filed under `key`.\n\n",
                "You do not need to recall memories: everything recorded is ",
                "already provided to you as context at the start of the turn."
            )
            .to_string(),
            parameters: serde_json::json!({
                "type": "object",
                "properties": {
                    "op": {
                        "type": "string",
                        "enum": ["record", "forget"],
                        "description": "The operation to perform."
                    },
                    "scope": {
                        "type": "string",
                        "enum": ["user", "session"],
                        "description": "Which store to target: durable user memory or session-only working memory."
                    },
                    "key": {
                        "type": "string",
                        "description": "Topic the fact is filed under; reuse the same key to replace or forget it."
                    },
                    "value": {
                        "type": "string",
                        "description": "The fact to record. Required for `record`, ignored for `forget`."
                    }
                },
                "required": ["op", "scope", "key"]
            }),
        }
    }
 
    async fn call(&self, args: MemoryArgs) -> Result<String, Self::Error> {
        let MemoryArgs {
            op,
            scope,
            key,
            value,
        } = args;
 
        match op {
            MemoryOp::Record => {
                let value = value.ok_or(MemoryToolError::MissingValue)?;
                match scope {
                    MemoryScope::User => {
                        self.storage
                            .put_user_memory(Some(key.clone()), value)
                            .await?;
                    }
                    MemoryScope::Session => {
                        self.storage
                            .put_session_memory(Some(key.clone()), value)
                            .await?;
                    }
                }
                Ok(format!("Recorded {} memory \"{key}\".", scope.label()))
            }
            MemoryOp::Forget => match self.handle_for(scope, key.clone()).await? {
                Some(handle) => {
                    match scope {
                        MemoryScope::User => self.storage.forget_user_memory(handle).await?,
                        MemoryScope::Session => self.storage.forget_session_memory(handle).await?,
                    }
                    Ok(format!("Forgot {} memory \"{key}\".", scope.label()))
                }
                None => Ok(format!("No {} memory found for \"{key}\".", scope.label())),
            },
        }
    }
}

In short, definition tells the LLM what the tool does, when to call it, and how to pass data to it, while call is the function executed whenever the tool is triggered.
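The listing uses MemoryArgs, MemoryOp, and MemoryScope without showing them. The sketch below is one plausible shape, inferred from how `call` uses them and from the JSON schema. It is not the actual smista.ai definition. The real types presumably derive serde's `Deserialize` so that Rig can decode the tool arguments; this sketch uses hand-written `FromStr` impls instead to stay dependency-free, and `validate` is a hypothetical helper.

```rust
use std::str::FromStr;

/// The `op` enum from the schema: "record" or "forget".
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MemoryOp {
    Record,
    Forget,
}

/// The `scope` enum from the schema: "user" or "session".
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MemoryScope {
    User,
    Session,
}

impl MemoryScope {
    /// Label used in the tool's human-readable result messages.
    pub fn label(self) -> &'static str {
        match self {
            MemoryScope::User => "user",
            MemoryScope::Session => "session",
        }
    }
}

impl FromStr for MemoryOp {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "record" => Ok(MemoryOp::Record),
            "forget" => Ok(MemoryOp::Forget),
            other => Err(format!("unknown op: {other}")),
        }
    }
}

impl FromStr for MemoryScope {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "user" => Ok(MemoryScope::User),
            "session" => Ok(MemoryScope::Session),
            other => Err(format!("unknown scope: {other}")),
        }
    }
}

/// Arguments the model passes; `value` is optional because `forget` ignores it.
#[derive(Debug, PartialEq)]
pub struct MemoryArgs {
    pub op: MemoryOp,
    pub scope: MemoryScope,
    pub key: String,
    pub value: Option<String>,
}

/// Enforces the rule the JSON schema cannot express: `record` needs a value.
pub fn validate(args: &MemoryArgs) -> Result<(), &'static str> {
    if args.op == MemoryOp::Record && args.value.is_none() {
        Err("`value` is required for `record`")
    } else {
        Ok(())
    }
}
```

Because "recall" is not a variant of `MemoryOp`, a model that tries to recall through the tool fails at argument parsing instead of reaching storage.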

Notice that the tool does not expose a recall operation. This is intentional: memory retrieval is handled by the host before the turn starts, and the resulting memories are appended to the agent preamble. The model can record or forget memories, but it does not decide how memory retrieval works.
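As an illustration of that host-side step, the sketch below renders stored records into a text block that could be appended to a preamble. The function name `render_memory_preamble`, the heading text, and the bullet format are all assumptions. The article does not show how smista.ai formats this block.

```rust
/// Same shape as the article's record: handle, optional key, content.
pub struct MemoryRecord {
    pub handle: String,
    pub key: Option<String>,
    pub content: String,
}

/// Renders memories as a bullet list for the system prompt.
/// Returns `None` when there is nothing to inject, so the caller can leave
/// the base preamble untouched.
pub fn render_memory_preamble(records: &[MemoryRecord]) -> Option<String> {
    if records.is_empty() {
        return None;
    }
    let mut out = String::from("Known facts about the user:\n");
    for record in records {
        match &record.key {
            // Keyed facts show their topic so the model can reuse the key.
            Some(key) => out.push_str(&format!("- {key}: {}\n", record.content)),
            None => out.push_str(&format!("- {}\n", record.content)),
        }
    }
    Some(out)
}
```

Showing the key next to each fact matters for the tool: the model needs it to replace or forget that fact later.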

Linking Memory into the Agent

At this point, we can pass our MemoryStorage implementation to the Agent, register the Tool, and provide the user's records to the model:

pub async fn new<S>(
    AgentArgs {
        completion_model,
        descriptor,
        preamble,
        storage,
    }: AgentArgs<C, S>,
) -> ProviderResult<Self>
where
    S: MemoryStorage + 'static,
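{
    // NOTE: `model` and `provider`, used below, are not shown being bound in
    // this excerpt; presumably they are derived from `descriptor`.
    // load preamble from memory storage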
    let memory_preamble =
        load_memories_preamble(storage.as_ref()).await?;
 
    // load memory tool
    let memory_tool = MemoryTool::new(storage.clone());
 
    // build agent
    let builder = completion_model
        .agent(model.clone())
        .preamble(&preamble)
        .tool(memory_tool);
    // `preamble` replaces the system prompt, so the memory preamble must be
    // appended rather than set or it would wipe the base preamble.
    let builder = match memory_preamble {
        Some(memories) => builder.append_preamble(&memories),
        None => builder,
    };
    let agent = builder.build();
 
    // snapshot the names of the tools the agent executes itself, so
    // completions can tell internal tool calls apart from router-mediated ones
    let internal_tools = agent
        .tool_server_handle
        .get_tool_defs(None)
        .await
        .map_err(|error| {
            crate::error::provider_error(
                ProviderErrorCategory::Unknown,
                provider.clone(),
                Some(model.clone()),
                format!("failed to enumerate agent tools: {error}"),
            )
        })?
        .into_iter()
        .map(|tool| tool.name)
        .collect();
 
    Ok(Self {
        agent,
        descriptor,
        internal_tools,
    })
}
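The snapshot of tool names is what later lets a completion tell the two kinds of tool call apart. A minimal sketch of that check follows, assuming the names are kept in a set. `ToolRoute` and `classify` are hypothetical names, and the real implementation may store or compare them differently.

```rust
use std::collections::HashSet;

/// Where a tool call emitted by the model should be handled.
#[derive(Debug, PartialEq)]
pub enum ToolRoute {
    /// The agent executes the tool itself (e.g. the memory tool).
    Internal,
    /// The call must be handed back to the router.
    RouterMediated,
}

/// Classifies a tool call by name against the agent's internal tool names.
pub fn classify(internal_tools: &HashSet<String>, tool_name: &str) -> ToolRoute {
    if internal_tools.contains(tool_name) {
        ToolRoute::Internal
    } else {
        ToolRoute::RouterMediated
    }
}
```

Taking the snapshot once, at construction time, means the check is a plain set lookup on every completion.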

Conclusion

Rig did not remove the need for our own abstraction, but it moved the provider-specific complexity to the edges.

Each provider still has its own client, authentication, model names, and configuration. But once a client implements CompletionClient, the rest of smista.ai can build the same Agent, attach the same internal tools, inject the same memory preamble, and handle completions through the same flow.

That is exactly the kind of boundary we wanted: provider-specific where necessary, provider-agnostic everywhere else.

You can find the full implementation in the smista-providers crate on the smista.ai repository.
