Case study · C# · LLM tooling

How we connected an LLM to our own database (without RAG)

The story of how Bajs — analytics for Nextbike Zagreb — understands natural-language questions and returns exact numbers from the database, without making things up.

On this page
  1. The problem
  2. What we didn't want
  3. First attempt: attributes on pages
  4. Second attempt: [LlmTool] on methods
  5. Semantic parameter types (all 45)
  6. ToolRegistry — reflection at startup
  7. Execution: ToolExecutor
  8. Which LLM we use and why
  9. What "tool calling" actually is
  10. The real system prompt (from code)
  11. Three phases: Sonnet → Haiku → Flash
  12. Multi-round loop: planner and answerer
  13. "It can't even add" — the Calculate tool
  14. Where we are now
  15. What's left
  16. Recipe for your project

The problem

Bajs (bajs.informacija.hr) has a pile of bike-sharing analytics: trip counts, hourly averages, emptiest stations, anomalies, heatmaps. All of it is available across a dozen web pages full of charts and tables.

The problem: a user who wants to know "how many rides were there last Tuesday" has to know where to click, open the right page, find the right table, read off the number. And someone visiting the site for the first time just gets lost.

We wanted them to just type a question and get an answer. No clicking.

What we didn't want

A small aside about the database: Bajs's analytics run over our own strictly binary format, designed with two goals — that any data point can be fetched in milliseconds, and that the memory footprint stays extremely small. Everything lives in RAM in structures that are already shaped for queries (no JSON, no per-query deserialization). That's a story for a separate article, but the important thing here is: the analytical functions don't run SQL, they walk memory directly — which is exactly why it makes sense for the LLM to call methods, not write queries.

What we wanted: have the LLM call our existing analytical functions. They're tested, we know they return correct numbers, and the web uses them directly. DRY.

First attempt: attributes on pages

The first attempt was the naive one. Bajs already has a dozen analytical pages that drill down into data — each specialized for one slice (activity by hour, top stations, route details, anomalies per day). We tagged them with attributes (title, description, subtitle, tags), and the LLM picked the page that best matched the question based on that metadata. Then we'd render that page and feed its text to the LLM as context.

The problems showed up fast, and the conclusion was simple: we need methods, not pages.

Second attempt: [LlmTool] attributes on methods

All of Bajs's analytical code is already in one class: StatsCalculator. It's a partial class spread across a dozen files (StatsCalculator.Reports.cs, StatsCalculator.Stations.cs, etc.) with methods like GetTopStations, GetLongestTrips, GetPeakHour.

We added a single attribute:

[AttributeUsage(AttributeTargets.Method)]
public class LlmToolAttribute : Attribute
{
    public string Description { get; }
    public LlmToolAttribute(string description) => Description = description;
}

Intentionally minimal — just a description. The method name is also the tool name.

Example:

[LlmTool("Longest rides by duration. " +
         "Use for 'longest rides', 'longest ride this week'.")]
public LongestTripsResult GetLongestTrips(string period, int limit)
{
    // existing implementation, unchanged
}

Zero new code. We just added an attribute above methods that already existed and worked for the web.

Semantic parameter types

GetLongestTrips takes a period and a limit. But the LLM doesn't know what "period" is — is it a number of days? An ISO date? An enum?

This was a key part of the design. We made an [LlmParam] attribute with a semantic type:

public enum SemanticType
{
    Period,          // "7d", "today", "this_week", "prev_week"...
    Date,            // "2026-04-08"
    DateRange,       // "2026-04-01..2026-04-08"
    StationNumber,   // int, e.g. 21331
    StationName,     // "Bundek", "Jarun"
    Count,           // how many rows to return
    Latitude,
    Longitude,
    DurationMinutes,
    BikeCount,
    TripCount,
    HourOfDay,
    DayOfWeek,
    AreaName,
    // ...45 types total
}

[AttributeUsage(AttributeTargets.Parameter)]
public class LlmParamAttribute : Attribute
{
    public SemanticType Type { get; }
    public string Description { get; }
    public bool Required { get; }

    public LlmParamAttribute(SemanticType type, string description, bool required = true)
        => (Type, Description, Required) = (type, description, required);
}

Now the method looks like this:

[LlmTool("Longest rides by duration. " +
         "Use for 'longest rides', 'longest ride this week'.")]
public LongestTripsResult GetLongestTrips(
    [LlmParam(SemanticType.Period, "Time period")]
    string period,
    [LlmParam(SemanticType.Count, "How many rides to return")]
    int limit)

Why this helps: each argument now has an explicit contract. The LLM sees that period is a SemanticType.Period (so "7d", "this_week", "yesterday"), not a bare string whose format it has to guess, and the executor can use the same semantic type as a hint when converting JSON values back into real .NET types.

There's also [LlmFixedParam] for parameters the LLM is not allowed to touch (e.g. injected services or fixed values).
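The article doesn't show [LlmFixedParam]; as a sketch (only the name comes from the text, the shape is an assumption), it could be as small as:

```csharp
using System;

// Hypothetical sketch of [LlmFixedParam]: marks a parameter the planner
// never sees. The registry omits it from the exported schema, and the
// executor always supplies this fixed value (or an injected service) itself.
[AttributeUsage(AttributeTargets.Parameter)]
public class LlmFixedParamAttribute : Attribute
{
    public object Value { get; }
    public LlmFixedParamAttribute(object value) => Value = value;
}
```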

ToolRegistry — reflection at startup

When the application starts, ToolRegistry reflects over StatsCalculator, finds every method with an [LlmTool] attribute, reads its parameters, and builds an internal registry:

public ToolRegistry(params Type[] typesToScan)
{
    foreach (Type type in typesToScan)
    {
        foreach (MethodInfo method in type.GetMethods(BindingFlags.Public | BindingFlags.Instance))
        {
            LlmToolAttribute toolAttr = method.GetCustomAttribute<LlmToolAttribute>();
            if (toolAttr == null) continue;

            // pick up method description + all [LlmParam] params with types
            // store in _tools dictionary
        }
    }
    Log.Info("ToolRegistry: registered " + _tools.Count + " tools");
}

Registered in the DI container:

builder.Services.AddSingleton<ToolRegistry>(
    new ToolRegistry(typeof(StatsCalculator)));

The registry then exposes two things: the complete tool schema as JSON, which gets pasted into the system prompt, and a by-name lookup that the executor uses to resolve a tool call back to a method.
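As a minimal sketch of those two exports (the record shapes and method names here are illustrative, not the real implementation):

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Illustrative shapes: the real registry stores MethodInfo plus the
// [LlmTool] / [LlmParam] metadata read via reflection at startup.
public record ToolParam(string Name, string Type, string Description, bool Required);
public record RegisteredTool(string Name, string Description, List<ToolParam> Parameters);

public class ToolRegistrySketch
{
    private readonly Dictionary<string, RegisteredTool> _tools = new();

    public void Add(RegisteredTool tool) => _tools[tool.Name] = tool;

    // Export 1: the full JSON schema of every tool, pasted into the
    // planning system prompt.
    public string ExportSchemaJson() => JsonSerializer.Serialize(_tools.Values);

    // Export 2: by-name lookup, used by ToolExecutor to resolve a call.
    public bool TryGetTool(string name, out RegisteredTool? tool) =>
        _tools.TryGetValue(name, out tool);
}
```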

Execution: ToolExecutor

ToolExecutor takes a JSON plan from the LLM, parses the tool name and arguments, finds the method in the registry, converts JSON arguments into real .NET types (using the semantic type as a hint), and calls the method via reflection. The result is serialized back to JSON.

// LLM returns:
{
  "tools": [
    { "name": "GetTopStations", "args": { "period": "7d", "limit": 5 } }
  ]
}

// ToolExecutor:
// 1. Find GetTopStations in the registry
// 2. Parse period="7d" as SemanticType.Period → string "7d"
// 3. Parse limit=5 as SemanticType.Count → int 5
// 4. statsCalculator.GetTopStations("7d", 5)
// 5. Serialize result → JSON
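The conversion-and-invoke core of those steps can be sketched in a few lines; this is a simplified stand-in that skips the semantic-type hints and [LlmFixedParam] handling:

```csharp
using System;
using System.Reflection;
using System.Text.Json;

// Simplified sketch of the executor's core: match JSON args to the
// method's parameters by name, convert each JsonElement to the real
// CLR parameter type, then invoke via reflection.
public static class ToolInvoker
{
    public static object? Invoke(object? target, MethodInfo method, JsonElement args)
    {
        ParameterInfo[] parameters = method.GetParameters();
        object?[] values = new object?[parameters.Length];

        for (int i = 0; i < parameters.Length; i++)
        {
            JsonElement raw = args.GetProperty(parameters[i].Name!);
            // "7d" stays a string, 5 becomes an int, etc.
            values[i] = JsonSerializer.Deserialize(raw.GetRawText(), parameters[i].ParameterType);
        }
        return method.Invoke(target, values);
    }
}
```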

Which LLM we use and why

This is the section with the most surprises. We started with "Claude, of course" and ended up somewhere completely different.

What "tool calling" actually is

For this whole pattern to work, the LLM has to do exactly one thing: read the tool descriptions, understand what each does, pick the right one for the user's question, and return valid JSON with arguments. That's it. That's "tool calling" as an LLM capability.

There are two ways to pull it off:

Native tool_use API. Anthropic, OpenAI, Google, and Mistral all expose a structured endpoint where, alongside your prompt, you send a tools array with a schema for each one. The LLM replies with special tool_use blocks that the SDK parses for you. Clean, but provider-specific — each vendor has its own format.

Plain JSON-in-text prompting. You put the tool schema in the system prompt, say "return JSON of this shape", and parse the text yourself. Works with any LLM that can follow instructions.

We went with the second. Simpler, no provider lock-in, and it turns out modern models are so good at structured output that the native API gains you next to nothing. Our call to Anthropic looks like this — notice there is no tools parameter, just system + user:

string requestBody = JsonSerializer.Serialize(new
{
    model = _model,
    max_tokens = 500,
    system = systemPrompt,       // tool schema is inside this string
    messages = new[] { new { role = "user", content = userMessage } }
});

HttpRequestMessage request = new HttpRequestMessage(
    HttpMethod.Post, "https://api.anthropic.com/v1/messages");
request.Headers.Add("x-api-key", _apiKey);
request.Headers.Add("anthropic-version", "2023-06-01");
request.Content = new StringContent(requestBody, Encoding.UTF8, "application/json");

The LLM returns plain text content starting with {"tools":[...]}. We parse it as regular JSON. No SDKs, no provider-specific gymnastics. The same wrapper works for Claude, Gemini, GPT, Mistral, Qwen — anything OpenRouter exposes.
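A sketch of that parsing step (the brace-trimming is our defensive assumption for models that wrap JSON in a markdown fence, not something the article describes):

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public record ToolCall(string Name, JsonElement Args);

// The model's reply is plain text that we treat as ordinary JSON.
// Trimming to the outermost braces discards any stray fence or preamble.
public static class PlanParser
{
    public static List<ToolCall> Parse(string llmText)
    {
        int start = llmText.IndexOf('{');
        int end = llmText.LastIndexOf('}');
        string json = llmText.Substring(start, end - start + 1);

        var calls = new List<ToolCall>();
        using JsonDocument doc = JsonDocument.Parse(json);
        foreach (JsonElement tool in doc.RootElement.GetProperty("tools").EnumerateArray())
            calls.Add(new ToolCall(
                tool.GetProperty("name").GetString()!,
                tool.GetProperty("args").Clone()));   // Clone outlives the doc
        return calls;
    }
}
```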

The real system prompt (from code)

Enough abstract talk. Here's the literal planning system prompt from ChatService.cs (trimmed for readability, original is Croatian):

"You are the planner for the BAJS Zagreb bike-sharing analytics system.
You receive a user question and a list of available tools.
Your task: return ONLY JSON with the list of tools to call to answer the question.

TODAY'S DATE: 2026-04-10 (Friday)
When the user says 'yesterday', 'last week', 'this month', etc.,
compute the exact date from today above.
Current year is 2026. NEVER use 2024 or 2025.

Response format (ONLY JSON, no text):
{"tools":[{"name":"ToolName","args":{"param1":"value1"}}]}

Rules:
- Use ONLY tools from the list below
- If the question doesn't need data (e.g. a greeting), return {"tools":[]}
- For period parameters use: 7d, 14d, 28d, 30d, 90d,
  this_week, prev_week, today, yesterday
- If the user doesn't specify a period, default to 7d
- Maximum 5 tools per plan
- ALWAYS try to use a tool, even if unsure —
  a partial answer is better than none
- For ANY summation, average, or comparison of numbers
  use the Calculate tool — NEVER compute it yourself
- Pick tools strictly from their descriptions below
- When the user asks about a station, never assume it's the exact
  name — search for it first

Available tools:
[... JSON schema of all 26 tools is pasted here ...]"

A few of those rules come directly from pain: every one was written after some model tripped on a benchmark question. The system prompt grows over time, like a "never again" list.

Three phases: Sonnet → Haiku → Flash

Because we're doing plain JSON prompting, we can swap models trivially. One config field, no rebuild. Here's what we went through:

Phase 1 — Claude Sonnet direct. Great quality, works out of the box. But expensive and slow for what we're doing: average latency of 3-4 seconds per question and a high cost per call. For an interactive UI where the user expects "instant", Sonnet isn't the right fit. For complex agents, sure; for analytical questions, no.

Phase 2 — Claude Haiku. Much faster, cheaper, and still great at tool calling. Seemed like the sweet spot. Then we multiplied the cost by the number of rounds per question: every round sends the entire schema of 25+ tools (input tokens) plus the accumulated results. An average question with 4-5 rounds was coming out around $0.50. Ten questions a day = $5, a hundred = $50 — not sustainable for a small project.

Phase 3 — Google Gemini 2.0 Flash (via OpenRouter). This is where the surprise happened. Gemini Flash turned out to be dramatically cheaper than Haiku while passing the same benchmark questions.

What it looks like in practice: since switching to Flash, cost has stopped being a factor. Haiku: $50 for 100 questions. Flash: $1 for 100 questions. The difference isn't quality — both models pass almost every question in our benchmark. The difference is that Google, which controls the entire Gemini stack in-house, can price tool-use scenarios aggressively low. Anthropic doesn't have that luxury.

We stayed on Flash as default, with Claude as a fallback if OpenRouter goes down. The pluggability we got for free by choosing JSON-in-text over native tool_use paid off the moment we wanted to switch providers.

The entire wrapper is a few-line interface:

public interface ILlmProvider
{
    LlmResult Complete(string systemPrompt, string userMessage);
}

// Registration depends on config:
builder.Services.AddSingleton<ILlmProvider>(sp => {
    string provider = config["Search:Provider"];
    return provider switch
    {
        "anthropic"  => new AnthropicLlmProvider(...),
        "openrouter" => new OpenRouterLlmProvider(...),
        _ => throw new Exception("Unknown provider")
    };
});

Each provider implements Complete(string systemPrompt, string userMessage) and that's it. ChatService neither knows nor cares which model is underneath.

Multi-round loop: planner and answerer

This part was harder. A single question is not necessarily a single tool call. For example:

"Compare how many rides Bundek had last week vs this week"

This needs:

  1. GetStationDetail("Bundek") → we get the station
  2. GetTripCountRange(stationNumber=X, "prev_week")
  3. GetTripCountRange(stationNumber=X, "this_week")

The LLM has to plan the order, look at intermediate results, and decide what next.

Solution: two different system prompts and a loop of up to 10 rounds.

Round 1 — planning

The LLM receives the user question + JSON schema of all tools + an instruction: "Return JSON listing the tools to call now. Maximum 5 per round. If nothing more is needed, return empty."

{
  "tools": [
    { "name": "GetStationDetail", "args": { "name": "Bundek" } }
  ]
}

We execute the tool, get the result, add it to allResults.

Round 2+ — continuation

Now the LLM receives: the question + accumulated results + an instruction "do we need more tools or is this enough?"

{
  "tools": [
    { "name": "GetTripCountRange", "args": { "stationNumber": 21331, "period": "prev_week" } },
    { "name": "GetTripCountRange", "args": { "stationNumber": 21331, "period": "this_week" } }
  ]
}

The loop terminates when the LLM returns an empty tools list (nothing more is needed) or when the cap of 10 rounds is reached.

Final step — the answer

A new LLM call with the answering prompt: "Based on these results (JSON), formulate a natural-language answer. Use only numbers from the data. Don't mention tools or APIs."
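Condensed, the whole loop looks like this. The delegates stand in for the planner LLM call and the tool executor, so this is a shape sketch, not the real ChatService:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the multi-round orchestration: plan, execute, accumulate,
// repeat until the planner returns nothing, then hand the results to
// the answering prompt.
public static class MultiRound
{
    public static List<string> Run(
        Func<int, List<string>, List<string>> plan,   // (round, resultsSoFar) -> tool calls
        Func<string, string> execute,                 // tool call -> JSON result
        int maxRounds = 10)                           // hard cap: 10 rounds
    {
        var allResults = new List<string>();
        for (int round = 0; round < maxRounds; round++)
        {
            List<string> calls = plan(round, allResults);
            if (calls.Count == 0) break;              // planner says "enough"

            foreach (string call in calls.Take(5))    // max 5 tools per round
                allResults.Add(execute(call));
        }
        return allResults;                            // fed to the answering prompt
    }
}
```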

"It can't even add" — the Calculate tool

One funny moment from development: the LLM could fetch numbers just fine, but when it came time to sum them or compute an average, it would often miss. It would return text like "around 1500 rides total" when the actual number was 1823.

Solution: we added a calculator as a tool.

[LlmTool("Calculator — sum, average, min, max of a number series. " +
         "ALWAYS use this tool instead of computing manually.")]
public CalculateResult Calculate(
    [LlmParam(SemanticType.None, "Operation: 'sum', 'avg', 'min', 'max', 'count'")]
    string operation,
    [LlmParam(SemanticType.None, "Array of numbers")]
    double[] values)

Now the LLM, instead of doing arithmetic itself, first calls GetDailySnapshotAggregates for a period, then Calculate(operation="sum", values=[...]). Accurate numbers, every time.
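The body of Calculate isn't shown above; a straightforward implementation (our sketch, including a hypothetical CalculateResult shape) would be:

```csharp
using System;
using System.Linq;

public record CalculateResult(string Operation, double Value);

public static class Calculator
{
    // Deterministic arithmetic the LLM is told to delegate to instead of
    // computing sums and averages itself.
    public static CalculateResult Calculate(string operation, double[] values) =>
        new CalculateResult(operation, operation switch
        {
            "sum"   => values.Sum(),
            "avg"   => values.Average(),
            "min"   => values.Min(),
            "max"   => values.Max(),
            "count" => values.Length,
            _ => throw new ArgumentException("Unknown operation: " + operation)
        });
}
```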

Moral: we gave the AI a calculator, like you'd give a kid one. And it works.

Where we are now

What's left


If you want to apply this in your project

The recipe is surprisingly simple:

  1. Find your StatsCalculator. The class (or classes) where your database-querying methods live.
  2. Write a small [LlmTool] attribute. 10 lines of code.
  3. Write [LlmParam] with an enum of semantic types you use (dates, IDs, enums, counts). You can start with 5 types and grow from there.
  4. Annotate the existing methods. You don't change the implementation, just add attributes.
  5. Build a ToolRegistry that uses reflection to pick up all annotated methods.
  6. Build an LLM wrapper (interface + one provider to start, e.g. OpenRouter).
  7. Multi-round loop — plan prompt, continuation prompt, answer prompt. Three strings.
  8. A debug UI — critical for development, without it you're iterating blindly.

No RAG. No vector databases. No LangChain. Plain C# and reflection.

The whole pattern fits in a few hundred lines of code: the attribute, the parameter attribute, the enum of semantic types, a registry that reflects over the class, an executor that calls methods, an LLM wrapper, and a multi-round orchestrator. We're not publishing the implementation as a package, but every important piece is described in this article — enough to build your own version in a few hours.