How we connected an LLM to our own database (without RAG)
The story of how Bajs — analytics for Nextbike Zagreb — understands natural-language questions and returns exact numbers from the database, without making things up.
- The problem
- What we didn't want
- First attempt: attributes on pages
- Second attempt: [LlmTool] on methods
- Semantic parameter types (all 45)
- ToolRegistry — reflection at startup
- Execution: ToolExecutor
- Which LLM we use and why
- What "tool calling" actually is
- The real system prompt (from code)
- Three phases: Sonnet → Haiku → Flash
- Multi-round loop: planner and answerer
- "It can't even add" — the Calculate tool
- Where we are now
- What's left
- Recipe for your project
The problem
Bajs (bajs.informacija.hr) has a pile of bike-sharing analytics: trip counts, hourly averages, emptiest stations, anomalies, heatmaps. All of it is available across a dozen web pages full of charts and tables.
The problem: a user who wants to know "how many rides were there last Tuesday" has to know where to click, open the right page, find the right table, read off the number. And someone visiting the site for the first time just gets lost.
We wanted them to just type a question and get an answer. No clicking.
What we didn't want
- RAG (Retrieval-Augmented Generation): scan documents, stuff them into a vector database, fill the prompt with relevant text. That works for documentation, not structured data.
- A prompt with pre-baked statistics: "here's a summary for today, answer from this". The LLM would often miss, and we'd have to regenerate the summary every minute.
- Letting the LLM write SQL: dangerous and slow, and besides, our database isn't even SQL.
What we wanted: have the LLM call our existing analytical functions. They're tested, we know they return correct numbers, and the web uses them directly. DRY.
First attempt: attributes on pages
The first attempt was naive. Bajs already has a dozen analytical pages that drill down into the data — each specialized for one slice (activity by hour, top stations, route details, anomalies per day). We tagged them with attributes (title, description, subtitle, tags), and the LLM picked which page best matched the question based on that metadata. Then we'd render that page and feed the LLM its text as context.
Problems:
- A page mixes presentation and data. The LLM gets HTML full of colors and icons, and has to pull numbers out of the noise. Too much noise to filter.
- Granularity is too coarse. One page = one answer. It can't combine two things ("compare last week and this week" = two pages, no joining).
- Parameters are awkward. If the user asks about a specific station or period, the URL has to have query params, and the LLM has no idea what's allowed.
Conclusion: we need methods, not pages.
Second attempt: [LlmTool] attributes on methods
All of Bajs's analytical code is already in one class: StatsCalculator. It's a partial class spread across a dozen files (StatsCalculator.Reports.cs, StatsCalculator.Stations.cs, etc.) with methods like GetTopStations, GetLongestTrips, GetPeakHour.
We added a single attribute:
[AttributeUsage(AttributeTargets.Method)]
public class LlmToolAttribute : Attribute
{
public string Description { get; }
public LlmToolAttribute(string description) => Description = description;
}
Intentionally minimal — just a description. The method name is also the tool name.
Example:
[LlmTool("Longest rides by duration. " +
"Use for 'longest rides', 'longest ride this week'.")]
public LongestTripsResult GetLongestTrips(string period, int limit)
{
// existing implementation, unchanged
}
Zero new code. We just added an attribute above methods that already existed and worked for the web.
Semantic parameter types
GetLongestTrips takes a period and a limit. But the LLM doesn't know what "period" is — is it a number of days? An ISO date? An enum?
This was a key part of the design. We made an [LlmParam] attribute with a semantic type:
public enum SemanticType
{
Period, // "7d", "today", "this_week", "prev_week"...
Date, // "2026-04-08"
DateRange, // "2026-04-01..2026-04-08"
StationNumber, // int, e.g. 21331
StationName, // "Bundek", "Jarun"
Count, // how many rows to return
Latitude,
Longitude,
DurationMinutes,
BikeCount,
TripCount,
HourOfDay,
DayOfWeek,
AreaName,
// ...45 types total
}
[AttributeUsage(AttributeTargets.Parameter)]
public class LlmParamAttribute : Attribute
{
    public SemanticType Type { get; }
    public string Description { get; }
    public bool Required { get; }

    public LlmParamAttribute(SemanticType type, string description, bool required = true)
        => (Type, Description, Required) = (type, description, required);
}
Now the method looks like this:
[LlmTool("Longest rides by duration. " +
"Use for 'longest rides', 'longest ride this week'.")]
public LongestTripsResult GetLongestTrips(
[LlmParam(SemanticType.Period, "Time period")]
string period,
[LlmParam(SemanticType.Count, "How many rides to return")]
int limit)
Why this helps:
- The LLM knows the format. In the system prompt we explain what each SemanticType means. Period = one of "7d / 14d / today / yesterday / this_week / prev_week / 2026-04-01". The LLM doesn't invent formats.
- Validation is centralized. When we execute a tool call, we know what each parameter must be, can parse it, and report errors before even calling the method.
- Descriptions stay in the code. No separate JSON schema file that drifts from the implementation.
There's also [LlmFixedParam] for parameters the LLM is not allowed to touch (e.g. injected services or fixed values).
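The article doesn't show [LlmFixedParam] itself; a minimal version might look like this sketch, with the pinned-value constructor being an assumption:

```csharp
using System;

// Hypothetical sketch: pins a parameter to a fixed value so it never
// appears in the schema the LLM sees. The real attribute may differ.
[AttributeUsage(AttributeTargets.Parameter)]
public class LlmFixedParamAttribute : Attribute
{
    public object Value { get; }

    public LlmFixedParamAttribute(object value) => Value = value;
}
```

At execution time the registry can then fill such parameters itself instead of ever asking the model for them.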
ToolRegistry — reflection at startup
When the application starts, ToolRegistry reflects over StatsCalculator, finds every method with an [LlmTool] attribute, reads its parameters, and builds an internal registry:
public ToolRegistry(params Type[] typesToScan)
{
foreach (Type type in typesToScan)
{
foreach (MethodInfo method in type.GetMethods(...))
{
LlmToolAttribute toolAttr = method.GetCustomAttribute<LlmToolAttribute>();
if (toolAttr == null) continue;
// pick up method description + all [LlmParam] params with types
// store in _tools dictionary
}
}
Log.Info("ToolRegistry: registered " + _tools.Count + " tools");
}
Registered in the DI container:
builder.Services.AddSingleton<ToolRegistry>(
new ToolRegistry(typeof(StatsCalculator)));
The registry then has two methods for export:
- ToTextManifest() — a short text list of tools, goes into the system prompt
- ToJsonSchema() — a JSON schema with all parameters and types, for structured tool calling
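For illustration, ToTextManifest() could be sketched like this. ToolInfo and ToolParamInfo are hypothetical names for the registry's internal entries; the article doesn't show the real types:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical registry entry shape (illustrative names, not the real code).
public record ToolParamInfo(string Name, string SemanticType, string Description);
public record ToolInfo(string Name, string Description, List<ToolParamInfo> Params);

public static class ManifestSketch
{
    // One line per tool: name, parameters with their semantic types, then
    // the description. Compact on purpose: this goes into every prompt.
    public static string ToTextManifest(IEnumerable<ToolInfo> tools)
    {
        var sb = new StringBuilder();
        foreach (ToolInfo tool in tools)
        {
            string args = string.Join(", ",
                tool.Params.Select(p => $"{p.Name}: {p.SemanticType}"));
            sb.AppendLine($"- {tool.Name}({args}): {tool.Description}");
        }
        return sb.ToString();
    }
}
```

Keeping the manifest to one line per tool matters because it is re-sent on every round, so its size is paid in input tokens each time.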
Execution: ToolExecutor
ToolExecutor takes a JSON plan from the LLM, parses the tool name and arguments, finds the method in the registry, converts JSON arguments into real .NET types (using the semantic type as a hint), and calls the method via reflection. The result is serialized back to JSON.
// LLM returns:
{
"tools": [
{ "name": "GetTopStations", "args": { "period": "7d", "limit": 5 } }
]
}
// ToolExecutor:
// 1. Find GetTopStations in the registry
// 2. Parse period="7d" as SemanticType.Period → string "7d"
// 3. Parse limit=5 as SemanticType.Count → int 5
// 4. statsCalculator.GetTopStations("7d", 5)
// 5. Serialize result → JSON
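Steps 2 and 3 above, the semantic-type-driven conversion, could look something like this sketch. ArgumentConverter and the trimmed enum are illustrative, not the real ToolExecutor code:

```csharp
using System;
using System.Text.Json;

// Trimmed copy of the article's SemanticType enum, just enough for the sketch.
public enum SemanticType { Period, Date, StationNumber, Count }

public static class ArgumentConverter
{
    // Hypothetical sketch: turn one JSON argument into the CLR value the
    // method expects, using the semantic type as a parsing/validation hint.
    public static object Convert(JsonElement value, SemanticType semantic, Type clrType)
    {
        switch (semantic)
        {
            case SemanticType.Count:
            case SemanticType.StationNumber:
                return value.GetInt32();            // must be an integer
            case SemanticType.Period:
            case SemanticType.Date:
                return value.GetString()            // must be a string like "7d" or "2026-04-08"
                    ?? throw new ArgumentException($"{semantic} must be a string");
            default:
                // Fall back to the declared CLR type for everything else.
                return JsonSerializer.Deserialize(value.GetRawText(), clrType)!;
        }
    }
}
```

The point of routing through the semantic type is that a malformed argument fails here, with a precise error, before reflection ever invokes the method.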
Which LLM we use and why
This is the section with the most surprises. We started with "Claude, of course" and ended up somewhere completely different.
What "tool calling" actually is
For this whole pattern to work, the LLM has exactly one job: read the tool descriptions, understand what each does, pick the right one for the user's question, and return valid JSON with arguments. That's it. That's "tool calling" as an LLM capability.
There are two ways to pull it off:
Native tool_use API. Anthropic, OpenAI, Google, and Mistral all expose a structured endpoint where, alongside your prompt, you send a tools array with a schema for each one. The LLM replies with special tool_use blocks that the SDK parses for you. Clean, but provider-specific — each vendor has its own format.
Plain JSON-in-text prompting. You put the tool schema in the system prompt, say "return JSON of this shape", and parse the text yourself. Works with any LLM that can follow instructions.
We went with the second. Simpler, no provider lock-in, and it turns out modern models are so good at structured output that the native API gains you next to nothing. Our call to Anthropic looks like this — notice there is no tools parameter, just system + user:
string requestBody = JsonSerializer.Serialize(new
{
model = _model,
max_tokens = 500,
system = systemPrompt, // tool schema is inside this string
messages = new[] { new { role = "user", content = userMessage } }
});
HttpRequestMessage request = new HttpRequestMessage(
HttpMethod.Post, "https://api.anthropic.com/v1/messages");
request.Headers.Add("x-api-key", _apiKey);
request.Headers.Add("anthropic-version", "2023-06-01");
The LLM returns plain text content starting with {"tools":[...]}. We parse it as regular JSON. No SDKs, no provider-specific gymnastics. The same wrapper works for Claude, Gemini, GPT, Mistral, Qwen — anything OpenRouter exposes.
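Parsing that text back into a plan is plain System.Text.Json work. A sketch, with DTO names that are our own (the article doesn't show the real parser):

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Illustrative DTOs for the {"tools":[...]} plan format.
public class ToolPlan
{
    [JsonPropertyName("tools")]
    public List<ToolCall> Tools { get; set; } = new();
}

public class ToolCall
{
    [JsonPropertyName("name")]
    public string Name { get; set; } = "";

    [JsonPropertyName("args")]
    public Dictionary<string, JsonElement> Args { get; set; } = new();
}

public static class PlanParser
{
    public static ToolPlan Parse(string llmText)
    {
        // Models occasionally wrap the JSON in markdown fences or add a
        // stray word, so cut out the outermost {...} before deserializing.
        int start = llmText.IndexOf('{');
        int end = llmText.LastIndexOf('}');
        if (start < 0 || end <= start) return new ToolPlan(); // no JSON at all
        string json = llmText.Substring(start, end - start + 1);
        return JsonSerializer.Deserialize<ToolPlan>(json) ?? new ToolPlan();
    }
}
```

Keeping Args as raw JsonElement values lets the executor convert them later using the semantic type of each parameter.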
The real system prompt (from code)
Enough abstract talk. Here's the literal planning system prompt from ChatService.cs (trimmed for readability, original is Croatian):
"You are the planner for the BAJS Zagreb bike-sharing analytics system.
You receive a user question and a list of available tools.
Your task: return ONLY JSON with the list of tools to call to answer the question.
TODAY'S DATE: 2026-04-10 (Friday)
When the user says 'yesterday', 'last week', 'this month', etc.,
compute the exact date from today above.
Current year is 2026. NEVER use 2024 or 2025.
Response format (ONLY JSON, no text):
{"tools":[{"name":"ToolName","args":{"param1":"value1"}}]}
Rules:
- Use ONLY tools from the list below
- If the question doesn't need data (e.g. a greeting), return {"tools":[]}
- For period parameters use: 7d, 14d, 28d, 30d, 90d,
this_week, prev_week, today, yesterday
- If the user doesn't specify a period, default to 7d
- Maximum 5 tools per plan
- ALWAYS try to use a tool, even if unsure —
a partial answer is better than none
- For ANY summation, average, or comparison of numbers
use the Calculate tool — NEVER compute it yourself
- Pick tools strictly from their descriptions below
- When the user asks about a station, never assume it's the exact
name — search for it first
Available tools:
[... JSON schema of all 26 tools is pasted here ...]"
A few of those rules come directly from pain:
- You inject the date on every call. The LLM doesn't know what "today" is — if you don't tell it, it'll use its training cutoff. We were on 2026 data while the model cheerfully returned "last week" as dates from 2024. The fix is literally one line: "TODAY'S DATE: X" in the prompt.
- The "NEVER compute yourself" rule. Without it, the Calculate tool was underused — the LLM would sum numbers by hand and lose accuracy. More on this later.
- "ALWAYS try to use a tool". Without that, the LLM would sometimes respond "unfortunately I can't tell you" even though there was clearly a tool for the job. Models are too defensive by default.
- The warning about station names. A user types "Bundek", which could be Bundek OŠ, Bundek jezero, or Bundek park. The LLM would guess a name and call the tool with a wrong argument. Now it searches first.
- "Pick tools strictly from their descriptions". This was the first big lesson. Our early prompts had a hard-coded mapping — things like "for questions about longest rides use GetLongestTrips, for top stations use GetTopStations". It worked, but it was a crutch: every time you add a new tool, you also have to update the prompt; every time you rename a tool, the prompt is wrong; and worst of all, when the mapping had a bug, the LLM would blindly follow the prompt instead of reading the schema. Eventually we banned the mapping outright and enforced the rule "pick strictly from descriptions". From then on, adding a new tool meant just sticking [LlmTool("description...")] on a method — the prompt stays the same, and the LLM finds the tool on its own.
Every one of those rules was written after some model tripped on a benchmark question. The system prompt grows over time, like a "never again" list.
Three phases: Sonnet → Haiku → Flash
Because we're doing plain JSON prompting, we can swap models trivially. One config field, no rebuild. Here's what we went through:
Phase 1 — Claude Sonnet direct. Quality, works out of the box. But expensive and slow for what we're doing. Average latency 3-4 seconds per question, high cost per call. For an interactive UI where the user expects "instant", Sonnet isn't the right fit. For complex agents, sure; for analytical questions, no.
Phase 2 — Claude Haiku. Much faster, cheaper, and the quality is still great for tool calling. Seemed like the sweet spot. Then we multiplied the cost by the number of rounds per question: every round sends the entire 25+ tool schema (input tokens) plus the accumulated results. An average question with 4-5 rounds came out around $0.50. Ten questions a day = $5, a hundred = $50 — not sustainable for a small project.
Phase 3 — Google Gemini 2.0 Flash (via OpenRouter). This is where the surprise happened. Gemini Flash is:
- Brutally good at tool calling. Google clearly tuned it hard for function calling scenarios. Even though we use plain JSON prompting (not the native API), that capability carries over — Flash almost never hallucinates tool names, doesn't break JSON syntax, and reliably picks the right tool for the question.
- Fast. 1-2 second responses, often quicker than Haiku.
- Comically cheap. The same question that cost ~$0.50 on Haiku runs around $0.01 on Flash. That's a 50x difference.
We stayed on Flash as default, with Claude as a fallback if OpenRouter goes down. The pluggability we got for free by choosing JSON-in-text over native tool_use paid off the moment we wanted to switch providers.
The entire wrapper is a few-line interface:
public interface ILlmProvider
{
LlmResult Complete(string systemPrompt, string userMessage);
}
// Registration depends on config:
builder.Services.AddSingleton<ILlmProvider>(sp => {
string provider = config["Search:Provider"];
return provider switch
{
"anthropic" => new AnthropicLlmProvider(...),
"openrouter" => new OpenRouterLlmProvider(...),
_ => throw new Exception("Unknown provider")
};
});
Each provider implements Complete(string systemPrompt, string userMessage) and that's it. ChatService neither knows nor cares which model is underneath.
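As an illustration, an OpenRouter provider behind this interface could be sketched like this, assuming OpenRouter's OpenAI-compatible chat completions endpoint. The article doesn't show the real provider code, and LlmResult's actual shape is an assumption (a single text field):

```csharp
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// The interface from the article; LlmResult's shape is assumed here.
public interface ILlmProvider
{
    LlmResult Complete(string systemPrompt, string userMessage);
}

public record LlmResult(string Text);

public class OpenRouterLlmProvider : ILlmProvider
{
    private readonly HttpClient _http = new();
    private readonly string _apiKey;
    private readonly string _model;

    public OpenRouterLlmProvider(string apiKey, string model)
    {
        _apiKey = apiKey;
        _model = model; // e.g. a Gemini Flash slug; exact model name is illustrative
    }

    public LlmResult Complete(string systemPrompt, string userMessage)
    {
        // OpenAI-style format: the system prompt (tool schema included)
        // goes in as a "system" message, same pattern as the Anthropic call.
        string body = JsonSerializer.Serialize(new
        {
            model = _model,
            messages = new object[]
            {
                new { role = "system", content = systemPrompt },
                new { role = "user", content = userMessage }
            }
        });

        var request = new HttpRequestMessage(HttpMethod.Post,
            "https://openrouter.ai/api/v1/chat/completions")
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", _apiKey);

        using HttpResponseMessage response = _http.Send(request);
        using Stream stream = response.Content.ReadAsStream();
        using JsonDocument doc = JsonDocument.Parse(stream);
        string text = doc.RootElement.GetProperty("choices")[0]
            .GetProperty("message").GetProperty("content").GetString() ?? "";
        return new LlmResult(text);
    }
}
```

Swapping providers is then one line in the DI registration; nothing downstream changes.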
Multi-round loop: planner and answerer
This part was harder. A single question is not necessarily a single tool call. For example:
"Compare how many rides Bundek had last week vs this week"
This needs:
1. GetStationDetail("Bundek") → we get the station
2. GetTripCountRange(stationNumber=X, "prev_week")
3. GetTripCountRange(stationNumber=X, "this_week")
The LLM has to plan the order, look at intermediate results, and decide what next.
Solution: two different system prompts and a loop of up to 10 rounds.
Round 1 — planning
The LLM receives the user question + JSON schema of all tools + an instruction: "Return JSON listing the tools to call now. Maximum 5 per round. If nothing more is needed, return empty."
{
"tools": [
{ "name": "GetStationDetail", "args": { "name": "Bundek" } }
]
}
We execute the tool, get the result, add it to allResults.
Round 2+ — continuation
Now the LLM receives: the question + accumulated results + an instruction "do we need more tools or is this enough?"
{
"tools": [
{ "name": "GetTripCountRange", "args": { "stationNumber": 21331, "period": "prev_week" } },
{ "name": "GetTripCountRange", "args": { "stationNumber": 21331, "period": "this_week" } }
]
}
The loop terminates when:
- The LLM returns
{"tools": []}(nothing more to call) - Every tool in this round is a duplicate of a previous call (stuck detection)
- We've reached 10 rounds
Final step — the answer
A new LLM call with the answering prompt: "Based on these results (JSON), formulate a natural-language answer. Use only numbers from the data. Don't mention tools or APIs."
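The whole loop can be sketched as a small orchestrator. The control flow (plan, execute, stuck detection, answer) follows the description above, but the delegate shapes and every name here are assumptions, not the real ChatService code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative orchestrator for the plan/continue/answer loop.
public class MultiRoundLoop
{
    private const int MaxRounds = 10;

    private readonly Func<int, IReadOnlyList<string>, List<string>> _plan; // (round, results) -> tool calls
    private readonly Func<string, string> _execute;                        // tool call -> JSON result
    private readonly Func<IReadOnlyList<string>, string> _answer;          // results -> final prose

    public MultiRoundLoop(
        Func<int, IReadOnlyList<string>, List<string>> plan,
        Func<string, string> execute,
        Func<IReadOnlyList<string>, string> answer)
    {
        _plan = plan;
        _execute = execute;
        _answer = answer;
    }

    public string Run()
    {
        var results = new List<string>();
        var seen = new HashSet<string>();

        for (int round = 0; round < MaxRounds; round++)
        {
            List<string> calls = _plan(round, results);
            if (calls.Count == 0) break;            // planner returned {"tools":[]}
            if (calls.All(seen.Contains)) break;    // stuck: only duplicates this round

            // seen.Add returns false for repeats, so duplicates are skipped.
            foreach (string call in calls.Where(seen.Add))
                results.Add(_execute(call));
        }

        return _answer(results);                    // final call with the answering prompt
    }
}
```

Round 1 sees an empty results list (so the plan delegate uses the planning prompt); later rounds see the accumulated results (continuation prompt).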
"It can't even add" — the Calculate tool
One funny moment from development: the LLM could fetch numbers just fine, but when it came time to sum them or compute an average, it would often miss. It would return text like "around 1500 rides total" when the actual number was 1823.
Solution: we added a calculator as a tool.
[LlmTool("Calculator — sum, average, min, max of a number series. " +
"ALWAYS use this tool instead of computing manually.")]
public CalculateResult Calculate(
[LlmParam(SemanticType.None, "Operation: 'sum', 'avg', 'min', 'max', 'count'")]
string operation,
[LlmParam(SemanticType.None, "Array of numbers")]
double[] values)
Now the LLM, instead of doing arithmetic itself, first calls GetDailySnapshotAggregates for a period, then Calculate(operation="sum", values=[...]). Accurate numbers, every time.
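The article only shows the signature; a minimal implementation might look like this, with CalculateResult's shape being an assumption:

```csharp
using System;
using System.Linq;

// Hypothetical result shape; the real CalculateResult isn't shown.
public record CalculateResult(string Operation, double Value, int Count);

public static class CalculatorSketch
{
    public static CalculateResult Calculate(string operation, double[] values)
    {
        double value = operation switch
        {
            "sum"   => values.Sum(),
            "avg"   => values.Average(),
            "min"   => values.Min(),
            "max"   => values.Max(),
            "count" => values.Length,
            _ => throw new ArgumentException($"Unknown operation: {operation}")
        };
        return new CalculateResult(operation, value, values.Length);
    }
}
```

Deterministic code does the arithmetic; the LLM only decides which operation to run on which numbers.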
Where we are now
- 26 [LlmTool] methods in StatsCalculator: stations, routes, trips, anomalies, geocoding.
- 45 semantic types for parameters.
- Pluggable LLM provider, default Gemini 2.0 Flash via OpenRouter, fallback to Claude direct.
- Multi-round loop up to 10 rounds with duplicate detection.
- Debug UI at /chatlab — shows each round, tool calls, JSON results, total token count and cost.
- Benchmark of 120 questions, used to measure model quality.
- System prompt is in Croatian, answers too. Users write naturally and get natural answers.
What's left
- The cost problem — every round sends the full tool schema. With 25+ tools, that's non-trivial. We're considering a reduced schema in continuation rounds (only tools likely to be next), or composite tools for common combinations.
- Weather correlations — the LLM still can't connect "rain" with a drop in ride count, because the weather tool isn't well integrated.
- Area / zone queries — "how many downtown" doesn't work well right now, we're missing a zone concept.
- Per-station time-of-day — questions like "when is Bundek at its emptiest" still need more tools.
If you want to apply this in your project
The recipe is surprisingly simple:
- Find your StatsCalculator. The class (or classes) where your database-querying methods live.
- Write a small [LlmTool] attribute. 10 lines of code.
- Write [LlmParam] with an enum of the semantic types you use (dates, IDs, enums, counts). You can start with 5 types and grow from there.
- Annotate the existing methods. You don't change the implementation, just add attributes.
- Build a ToolRegistry that uses reflection to pick up all annotated methods.
- Build an LLM wrapper (interface + one provider to start, e.g. OpenRouter).
- Multi-round loop — plan prompt, continuation prompt, answer prompt. Three strings.
- A debug UI — critical for development, without it you're iterating blindly.
No RAG. No vector databases. No LangChain. Plain C# and reflection.
The whole pattern fits in a few hundred lines of code: the attribute, the parameter attribute, the enum of semantic types, a registry that reflects over the class, an executor that calls methods, an LLM wrapper, and a multi-round orchestrator. We're not publishing the implementation as a package, but every important piece is described in this article — enough to build your own version in a few hours.