How we connected an LLM to our own database (without RAG)
The story of how Bajs — analytics for Nextbike Zagreb — understands natural-language questions and returns exact numbers from the database, without making things up.
- The problem
- What we didn't want
- First attempt: attributes on pages
- Second attempt: [LlmTool] on methods
- Semantic parameter types (all 45)
- ToolRegistry — reflection at startup
- Execution: ToolExecutor
- Which LLM we use and why
- What "tool calling" actually is
- The real system prompt (from code)
- Three phases: Sonnet → Haiku → Flash
- Multi-round loop: planner and answerer
- "It can't even add" — the Calculate tool
- Where we are now
- What's left
- Recipe for your project
The problem
Bajs (bajs.informacija.hr) has a pile of bike-sharing analytics: trip counts, hourly averages, emptiest stations, anomalies, heatmaps. All of it is available across a dozen web pages full of charts and tables.
The problem: a user who wants to know "how many rides were there last Tuesday" has to know where to click, open the right page, find the right table, read off the number. And someone visiting the site for the first time just gets lost.
We wanted them to just type a question and get an answer. No clicking.
What we didn't want
- RAG (Retrieval-Augmented Generation): scan documents, stuff them into a vector database, fill the prompt with relevant text. That works for documentation, not structured data.
- A prompt with pre-baked statistics: "here's a summary for today, answer from this". The LLM would often miss, and we'd have to regenerate the summary every minute.
- Letting the LLM write SQL: dangerous and slow, and besides, our database isn't even SQL.
What we wanted: have the LLM call our existing analytical functions. They're tested, we know they return correct numbers, and the web uses them directly. DRY.
First attempt: attributes on pages
The first attempt was naive. Bajs already has a dozen analytical pages that drill down into the data — each specialized for one slice (activity by hour, top stations, route details, anomalies per day). We tagged them with attributes (title, description, subtitle, tags), and the LLM picked which page best matched the question based on that metadata. Then we'd render that page and feed the LLM its text as context.
Problems:
- A page mixes presentation and data. The LLM gets HTML full of colors and icons, and has to pull numbers out of the noise. Too much noise to filter.
- Granularity is too coarse. One page = one answer. It can't combine two things ("compare last week and this week" = two pages, no joining).
- Parameters are awkward. If the user asks about a specific station or period, the URL has to have query params, and the LLM has no idea what's allowed.
Conclusion: we need methods, not pages.
Second attempt: [LlmTool] attributes on methods
All of Bajs's analytical code is already in one class: StatsCalculator. It's a partial class spread across a dozen files (StatsCalculator.Reports.cs, StatsCalculator.Stations.cs, etc.) with methods like GetTopStations, GetLongestTrips, GetPeakHour.
We added a single attribute:
[AttributeUsage(AttributeTargets.Method)]
public class LlmToolAttribute : Attribute
{
public string Description { get; }
public LlmToolAttribute(string description) => Description = description;
}
Intentionally minimal — just a description. The method name is also the tool name.
Example:
[LlmTool("Longest rides by duration. " +
"Use for 'longest rides', 'longest ride this week'.")]
public LongestTripsResult GetLongestTrips(string period, int limit)
{
// existing implementation, unchanged
}
Zero new code. We just added an attribute above methods that already existed and worked for the web.
Semantic parameter types
GetLongestTrips takes a period and a limit. But the LLM doesn't know what "period" is — is it a number of days? An ISO date? An enum?
This was a key part of the design. We made an [LlmParam] attribute with a semantic type:
public enum SemanticType
{
Period, // "7d", "today", "this_week", "prev_week"...
Date, // "2026-04-08"
DateRange, // "2026-04-01..2026-04-08"
StationNumber, // int, e.g. 21331
StationName, // "Bundek", "Jarun"
Count, // how many rows to return
Latitude,
Longitude,
DurationMinutes,
BikeCount,
TripCount,
HourOfDay,
DayOfWeek,
AreaName,
// ...45 types total
}
[AttributeUsage(AttributeTargets.Parameter)]
public class LlmParamAttribute : Attribute
{
    public SemanticType Type { get; }
    public string Description { get; }
    public bool Required { get; }

    public LlmParamAttribute(SemanticType type, string description, bool required = true)
        => (Type, Description, Required) = (type, description, required);
}
Now the method looks like this:
[LlmTool("Longest rides by duration. " +
"Use for 'longest rides', 'longest ride this week'.")]
public LongestTripsResult GetLongestTrips(
[LlmParam(SemanticType.Period, "Time period")]
string period,
[LlmParam(SemanticType.Count, "How many rides to return")]
int limit)
Why this helps:
- The LLM knows the format. In the system prompt we explain what each SemanticType means. Period = one of "7d / 14d / today / yesterday / this_week / prev_week / 2026-04-01". The LLM doesn't invent formats.
- Validation is centralized. When we execute a tool call, we know what each parameter must be, can parse it, and report errors before even calling the method.
- Descriptions stay in the code. No separate JSON schema file that drifts from the implementation.
There's also [LlmFixedParam] for parameters the LLM is not allowed to touch (e.g. injected services or fixed values).
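The article doesn't show [LlmFixedParam] itself; a minimal version might look like this sketch, with the pinned-value constructor being an assumption:

```csharp
using System;

// Hypothetical sketch: pins a parameter to a fixed value so it never
// appears in the schema the LLM sees. The real attribute may differ.
[AttributeUsage(AttributeTargets.Parameter)]
public class LlmFixedParamAttribute : Attribute
{
    public object Value { get; }

    public LlmFixedParamAttribute(object value) => Value = value;
}
```

At execution time the registry can then fill such parameters itself instead of ever asking the model for them.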
ToolRegistry — reflection at startup
When the application starts, ToolRegistry reflects over StatsCalculator, finds every method with an [LlmTool] attribute, reads its parameters, and builds an internal registry:
public ToolRegistry(params Type[] typesToScan)
{
foreach (Type type in typesToScan)
{
foreach (MethodInfo method in type.GetMethods(...))
{
LlmToolAttribute toolAttr = method.GetCustomAttribute<LlmToolAttribute>();
if (toolAttr == null) continue;
// pick up method description + all [LlmParam] params with types
// store in _tools dictionary
}
}
Log.Info("ToolRegistry: registered " + _tools.Count + " tools");
}
Registered in the DI container:
builder.Services.AddSingleton<ToolRegistry>(
new ToolRegistry(typeof(StatsCalculator)));
The registry then has two methods for export:
- ToTextManifest() — a short text list of tools, goes into the system prompt
- ToJsonSchema() — a JSON schema with all parameters and types, for structured tool calling
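For illustration, ToTextManifest() could be sketched like this. ToolInfo and ToolParamInfo are hypothetical names for the registry's internal entries; the article doesn't show the real types:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical registry entry shape (illustrative names, not the real code).
public record ToolParamInfo(string Name, string SemanticType, string Description);
public record ToolInfo(string Name, string Description, List<ToolParamInfo> Params);

public static class ManifestSketch
{
    // One line per tool: name, parameters with their semantic types, then
    // the description. Compact on purpose: this goes into every prompt.
    public static string ToTextManifest(IEnumerable<ToolInfo> tools)
    {
        var sb = new StringBuilder();
        foreach (ToolInfo tool in tools)
        {
            string args = string.Join(", ",
                tool.Params.Select(p => $"{p.Name}: {p.SemanticType}"));
            sb.AppendLine($"- {tool.Name}({args}): {tool.Description}");
        }
        return sb.ToString();
    }
}
```

Keeping the manifest to one line per tool matters because it is re-sent on every round, so its size is paid in input tokens each time.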
Execution: ToolExecutor
ToolExecutor takes a JSON plan from the LLM, parses the tool name and arguments, finds the method in the registry, converts JSON arguments into real .NET types (using the semantic type as a hint), and calls the method via reflection. The result is serialized back to JSON.
// LLM returns:
{
"tools": [
{ "name": "GetTopStations", "args": { "period": "7d", "limit": 5 } }
]
}
// ToolExecutor:
// 1. Find GetTopStations in the registry
// 2. Parse period="7d" as SemanticType.Period → string "7d"
// 3. Parse limit=5 as SemanticType.Count → int 5
// 4. statsCalculator.GetTopStations("7d", 5)
// 5. Serialize result → JSON
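Steps 2 and 3 above, the semantic-type-driven conversion, could look something like this sketch. ArgumentConverter and the trimmed enum are illustrative, not the real ToolExecutor code:

```csharp
using System;
using System.Text.Json;

// Trimmed copy of the article's SemanticType enum, just enough for the sketch.
public enum SemanticType { Period, Date, StationNumber, Count }

public static class ArgumentConverter
{
    // Hypothetical sketch: turn one JSON argument into the CLR value the
    // method expects, using the semantic type as a parsing/validation hint.
    public static object Convert(JsonElement value, SemanticType semantic, Type clrType)
    {
        switch (semantic)
        {
            case SemanticType.Count:
            case SemanticType.StationNumber:
                return value.GetInt32();            // must be an integer
            case SemanticType.Period:
            case SemanticType.Date:
                return value.GetString()            // must be a string like "7d" or "2026-04-08"
                    ?? throw new ArgumentException($"{semantic} must be a string");
            default:
                // Fall back to the declared CLR type for everything else.
                return JsonSerializer.Deserialize(value.GetRawText(), clrType)!;
        }
    }
}
```

The point of routing through the semantic type is that a malformed argument fails here, with a precise error, before reflection ever invokes the method.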
Which LLM we use and why
This is the section with the most surprises. We started with "Claude, of course" and ended up somewhere completely different.
What "tool calling" actually is
For this whole pattern to work, the LLM has exactly one job: read the tool descriptions, understand what each does, pick the right one for the user's question, and return valid JSON with arguments. That's it. That's "tool calling" as an LLM capability.
There are two ways to pull it off:
Native tool_use API. Anthropic, OpenAI, Google, and Mistral all expose a structured endpoint where, alongside your prompt, you send a tools array with a schema for each one. The LLM replies with special tool_use blocks that the SDK parses for you. Clean, but provider-specific — each vendor has its own format.
Plain JSON-in-text prompting. You put the tool schema in the system prompt, say "return JSON of this shape", and parse the text yourself. Works with any LLM that can follow instructions.
We went with the second. Simpler, no provider lock-in, and it turns out modern models are so good at structured output that the native API gains you next to nothing. Our call to Anthropic looks like this — notice there is no tools parameter, just system + user:
string requestBody = JsonSerializer.Serialize(new
{
model = _model,
max_tokens = 500,
system = systemPrompt, // tool schema is inside this string
messages = new[] { new { role = "user", content = userMessage } }
});
HttpRequestMessage request = new HttpRequestMessage(
HttpMethod.Post, "https://api.anthropic.com/v1/messages");
request.Headers.Add("x-api-key", _apiKey);
request.Headers.Add("anthropic-version", "2023-06-01");
The LLM returns plain text content starting with {"tools":[...]}. We parse it as regular JSON. No SDKs, no provider-specific gymnastics. The same wrapper works for Claude, Gemini, GPT, Mistral, Qwen — anything OpenRouter exposes.
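Parsing that text back into a plan is plain System.Text.Json work. A sketch, with DTO names that are our own (the article doesn't show the real parser):

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Illustrative DTOs for the {"tools":[...]} plan format.
public class ToolPlan
{
    [JsonPropertyName("tools")]
    public List<ToolCall> Tools { get; set; } = new();
}

public class ToolCall
{
    [JsonPropertyName("name")]
    public string Name { get; set; } = "";

    [JsonPropertyName("args")]
    public Dictionary<string, JsonElement> Args { get; set; } = new();
}

public static class PlanParser
{
    public static ToolPlan Parse(string llmText)
    {
        // Models occasionally wrap the JSON in markdown fences or add a
        // stray word, so cut out the outermost {...} before deserializing.
        int start = llmText.IndexOf('{');
        int end = llmText.LastIndexOf('}');
        if (start < 0 || end <= start) return new ToolPlan(); // no JSON at all
        string json = llmText.Substring(start, end - start + 1);
        return JsonSerializer.Deserialize<ToolPlan>(json) ?? new ToolPlan();
    }
}
```

Keeping Args as raw JsonElement values lets the executor convert them later using the semantic type of each parameter.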
The real system prompt (from code)
Enough abstract talk. Here's the literal planning system prompt from ChatService.cs (trimmed for readability, original is Croatian):
"You are the planner for the BAJS Zagreb bike-sharing analytics system.
You receive a user question and a list of available tools.
Your task: return ONLY JSON with the list of tools to call to answer the question.
TODAY'S DATE: 2026-04-10 (Friday)
When the user says 'yesterday', 'last week', 'this month', etc.,
compute the exact date from today above.
Current year is 2026. NEVER use 2024 or 2025.
Response format (ONLY JSON, no text):
{"tools":[{"name":"ToolName","args":{"param1":"value1"}}]}
Rules:
- Use ONLY tools from the list below
- If the question doesn't need data (e.g. a greeting), return {"tools":[]}
- For period parameters use: 7d, 14d, 28d, 30d, 90d,
this_week, prev_week, today, yesterday
- If the user doesn't specify a period, default to 7d
- Maximum 5 tools per plan
- ALWAYS try to use a tool, even if unsure —
a partial answer is better than none
- For ANY summation, average, or comparison of numbers
use the Calculate tool — NEVER compute it yourself
- Pick tools strictly from their descriptions below
- When the user asks about a station, never assume it's the exact
name — search for it first
Available tools:
[... JSON schema of all 26 tools is pasted here ...]"
A few of those rules come directly from pain:
- You inject the date on every call. The LLM doesn't know what "today" is — if you don't tell it, it'll use its training cutoff. We were on 2026 data while the model cheerfully returned "last week" as dates from 2024. The fix is literally one line: "TODAY'S DATE: X" in the prompt.
- The "NEVER compute yourself" rule. Without it, the Calculate tool was underused — the LLM would sum numbers by hand and lose accuracy. More on this later.
- "ALWAYS try to use a tool". Without that, the LLM would sometimes respond "unfortunately I can't tell you" even though there was clearly a tool for the job. Models are too defensive by default.
- The warning about station names. A user types "Bundek", which could be Bundek OŠ, Bundek jezero, or Bundek park. The LLM would guess a name and call the tool with a wrong argument. Now it searches first.
- "Pick tools strictly from their descriptions". This was the first big lesson. Our early prompts had a hard-coded mapping — things like "for questions about longest rides use GetLongestTrips, for top stations use GetTopStations". It worked, but it was a crutch: every time you add a new tool, you also have to update the prompt; every time you rename a tool, the prompt is wrong; and worst of all, when the mapping had a bug, the LLM would blindly follow the prompt instead of reading the schema. Eventually we banned the mapping outright and enforced the rule "pick strictly from descriptions". From then on, adding a new tool meant just sticking [LlmTool("description...")] on a method — the prompt stays the same, and the LLM finds the tool on its own.
Every one of those rules was written after some model tripped on a benchmark question. The system prompt grows over time, like a "never again" list.
Three phases: Sonnet → Haiku → Flash
Because we're doing plain JSON prompting, we can swap models trivially. One config field, no rebuild. Here's what we went through:
Phase 1 — Claude Sonnet direct. Quality, works out of the box. But expensive and slow for what we're doing. Average latency 3-4 seconds per question, high cost per call. For an interactive UI where the user expects "instant", Sonnet isn't the right fit. For complex agents, sure; for analytical questions, no.
Phase 2 — Claude Haiku. Much faster, cheaper, and the quality is still great for tool calling. Seemed like the sweet spot. Then we multiplied the cost by the number of rounds per question: every round sends the entire 25+ tool schema (input tokens) plus the accumulated results. An average question with 4-5 rounds came out around $0.50. Ten questions a day = $5, a hundred = $50 — not sustainable for a small project.
Phase 3 — Google Gemini 2.0 Flash (via OpenRouter). This is where the surprise happened. Gemini Flash is:
- Brutally good at tool calling. Google clearly tuned it hard for function calling scenarios. Even though we use plain JSON prompting (not the native API), that capability carries over — Flash almost never hallucinates tool names, doesn't break JSON syntax, and reliably picks the right tool for the question.
- Fast. 1-2 second responses, often quicker than Haiku.
- Comically cheap. The same question that cost ~$0.50 on Haiku runs around $0.01 on Flash. That's a 50x difference.
We stayed on Flash as default, with Claude as a fallback if OpenRouter goes down. The pluggability we got for free by choosing JSON-in-text over native tool_use paid off the moment we wanted to switch providers.
The entire wrapper is a few-line interface:
public interface ILlmProvider
{
LlmResult Complete(string systemPrompt, string userMessage);
}
// Registration depends on config:
builder.Services.AddSingleton<ILlmProvider>(sp => {
string provider = config["Search:Provider"];
return provider switch
{
"anthropic" => new AnthropicLlmProvider(...),
"openrouter" => new OpenRouterLlmProvider(...),
_ => throw new Exception("Unknown provider")
};
});
Each provider implements Complete(string systemPrompt, string userMessage) and that's it. ChatService neither knows nor cares which model is underneath.
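As an illustration, an OpenRouter provider behind this interface could be sketched like this, assuming OpenRouter's OpenAI-compatible chat completions endpoint. The article doesn't show the real provider code, and LlmResult's actual shape is an assumption (a single text field):

```csharp
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// The interface from the article; LlmResult's shape is assumed here.
public interface ILlmProvider
{
    LlmResult Complete(string systemPrompt, string userMessage);
}

public record LlmResult(string Text);

public class OpenRouterLlmProvider : ILlmProvider
{
    private readonly HttpClient _http = new();
    private readonly string _apiKey;
    private readonly string _model;

    public OpenRouterLlmProvider(string apiKey, string model)
    {
        _apiKey = apiKey;
        _model = model; // e.g. a Gemini Flash slug; exact model name is illustrative
    }

    public LlmResult Complete(string systemPrompt, string userMessage)
    {
        // OpenAI-style format: the system prompt (tool schema included)
        // goes in as a "system" message, same pattern as the Anthropic call.
        string body = JsonSerializer.Serialize(new
        {
            model = _model,
            messages = new object[]
            {
                new { role = "system", content = systemPrompt },
                new { role = "user", content = userMessage }
            }
        });

        var request = new HttpRequestMessage(HttpMethod.Post,
            "https://openrouter.ai/api/v1/chat/completions")
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", _apiKey);

        using HttpResponseMessage response = _http.Send(request);
        using Stream stream = response.Content.ReadAsStream();
        using JsonDocument doc = JsonDocument.Parse(stream);
        string text = doc.RootElement.GetProperty("choices")[0]
            .GetProperty("message").GetProperty("content").GetString() ?? "";
        return new LlmResult(text);
    }
}
```

Swapping providers is then one line in the DI registration; nothing downstream changes.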
Multi-round loop: planner and answerer
This part was harder. A single question is not necessarily a single tool call. For example:
"Compare how many rides Bundek had last week vs this week"
This needs:
1. GetStationDetail("Bundek") → we get the station
2. GetTripCountRange(stationNumber=X, "prev_week")
3. GetTripCountRange(stationNumber=X, "this_week")
The LLM has to plan the order, look at intermediate results, and decide what next.
Solution: two different system prompts and a loop of up to 10 rounds.
Round 1 — planning
The LLM receives the user question + JSON schema of all tools + an instruction: "Return JSON listing the tools to call now. Maximum 5 per round. If nothing more is needed, return empty."
{
"tools": [
{ "name": "GetStationDetail", "args": { "name": "Bundek" } }
]
}
We execute the tool, get the result, add it to allResults.
Round 2+ — continuation
Now the LLM receives: the question + accumulated results + an instruction "do we need more tools or is this enough?"
{
"tools": [
{ "name": "GetTripCountRange", "args": { "stationNumber": 21331, "period": "prev_week" } },
{ "name": "GetTripCountRange", "args": { "stationNumber": 21331, "period": "this_week" } }
]
}
The loop terminates when:
- The LLM returns
{"tools": []}(nothing more to call) - Every tool in this round is a duplicate of a previous call (stuck detection)
- We've reached 10 rounds
Final step — the answer
A new LLM call with the answering prompt: "Based on these results (JSON), formulate a natural-language answer. Use only numbers from the data. Don't mention tools or APIs."
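The whole loop can be sketched as a small orchestrator. The control flow (plan, execute, stuck detection, answer) follows the description above, but the delegate shapes and every name here are assumptions, not the real ChatService code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative orchestrator for the plan/continue/answer loop.
public class MultiRoundLoop
{
    private const int MaxRounds = 10;

    private readonly Func<int, IReadOnlyList<string>, List<string>> _plan; // (round, results) -> tool calls
    private readonly Func<string, string> _execute;                        // tool call -> JSON result
    private readonly Func<IReadOnlyList<string>, string> _answer;          // results -> final prose

    public MultiRoundLoop(
        Func<int, IReadOnlyList<string>, List<string>> plan,
        Func<string, string> execute,
        Func<IReadOnlyList<string>, string> answer)
    {
        _plan = plan;
        _execute = execute;
        _answer = answer;
    }

    public string Run()
    {
        var results = new List<string>();
        var seen = new HashSet<string>();

        for (int round = 0; round < MaxRounds; round++)
        {
            List<string> calls = _plan(round, results);
            if (calls.Count == 0) break;            // planner returned {"tools":[]}
            if (calls.All(seen.Contains)) break;    // stuck: only duplicates this round

            // seen.Add returns false for repeats, so duplicates are skipped.
            foreach (string call in calls.Where(seen.Add))
                results.Add(_execute(call));
        }

        return _answer(results);                    // final call with the answering prompt
    }
}
```

Round 1 sees an empty results list (so the plan delegate uses the planning prompt); later rounds see the accumulated results (continuation prompt).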
"It can't even add" — the Calculate tool
One funny moment from development: the LLM could fetch numbers just fine, but when it came time to sum them or compute an average, it would often miss. It would return text like "around 1500 rides total" when the actual number was 1823.
Solution: we added a calculator as a tool.
[LlmTool("Calculator — sum, average, min, max of a number series. " +
"ALWAYS use this tool instead of computing manually.")]
public CalculateResult Calculate(
[LlmParam(SemanticType.None, "Operation: 'sum', 'avg', 'min', 'max', 'count'")]
string operation,
[LlmParam(SemanticType.None, "Array of numbers")]
double[] values)
Now the LLM, instead of doing arithmetic itself, first calls GetDailySnapshotAggregates for a period, then Calculate(operation="sum", values=[...]). Accurate numbers, every time.
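The article only shows the signature; a minimal implementation might look like this, with CalculateResult's shape being an assumption:

```csharp
using System;
using System.Linq;

// Hypothetical result shape; the real CalculateResult isn't shown.
public record CalculateResult(string Operation, double Value, int Count);

public static class CalculatorSketch
{
    public static CalculateResult Calculate(string operation, double[] values)
    {
        double value = operation switch
        {
            "sum"   => values.Sum(),
            "avg"   => values.Average(),
            "min"   => values.Min(),
            "max"   => values.Max(),
            "count" => values.Length,
            _ => throw new ArgumentException($"Unknown operation: {operation}")
        };
        return new CalculateResult(operation, value, values.Length);
    }
}
```

Deterministic code does the arithmetic; the LLM only decides which operation to run on which numbers.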
Where we are now
- 26 [LlmTool] methods in StatsCalculator: stations, routes, trips, anomalies, geocoding.
- 45 semantic types for parameters.
- Pluggable LLM provider, default Gemini 2.0 Flash via OpenRouter, fallback to Claude direct.
- Multi-round loop up to 10 rounds with duplicate detection.
- Debug UI at /chatlab — shows each round, tool calls, JSON results, total token count and cost.
- Benchmark of 120 questions, used to measure model quality.
- System prompt is in Croatian, answers too. Users write naturally and get natural answers.
What's left
- The cost problem — every round sends the full tool schema. With 25+ tools, that's non-trivial. We're considering a reduced schema in continuation rounds (only tools likely to be next), or composite tools for common combinations.
- Weather correlations — the LLM still can't connect "rain" with a drop in ride count, because the weather tool isn't well integrated.
- Area / zone queries — "how many downtown" doesn't work well right now, we're missing a zone concept.
- Per-station time-of-day — questions like "when is Bundek at its emptiest" still need more tools.
If you want to apply this in your project
The recipe is surprisingly simple:
- Find your StatsCalculator. The class (or classes) where your database-querying methods live.
- Write a small [LlmTool] attribute. 10 lines of code.
- Write [LlmParam] with an enum of the semantic types you use (dates, IDs, enums, counts). You can start with 5 types and grow from there.
- Annotate the existing methods. You don't change the implementation, just add attributes.
- Build a ToolRegistry that uses reflection to pick up all annotated methods.
- Build an LLM wrapper (interface + one provider to start, e.g. OpenRouter).
- Multi-round loop — plan prompt, continuation prompt, answer prompt. Three strings.
- A debug UI — critical for development, without it you're iterating blindly.
No RAG. No vector databases. No LangChain. Plain C# and reflection.
The whole pattern fits in a few hundred lines of code: the attribute, the parameter attribute, the enum of semantic types, a registry that reflects over the class, an executor that calls methods, an LLM wrapper, and a multi-round orchestrator. We're not publishing the implementation as a package, but every important piece is described in this article — enough to build your own version in a few hours.