From Long Prompt to RAG: How to Build Robust AI Agents with Your Knowledge Base
Nov 14, 2025
Reading time: 5 minutes

Do we really need RAG – or is it enough to just put everything into the prompt? In this article, we look at how RAG and Contextual Retrieval actually work, why "everything" never ends up in a request anyway – and when you should switch from "just put everything into the prompt" to a structured retrieval architecture.
What exactly is RAG – and why does "everything" never end up in the prompt anyway?
Before we discuss "long prompt vs. RAG," a clarification:
Your entire knowledge base never lands in a single AI query.
Even with classic RAG, the process looks like this:
Prepare documents:
Convert PDFs, Confluence, SharePoint, code, and manuals into text, clean them up, and break them down into meaningful chunks (paragraphs, chapters, functions).
Build an index:
Vector index (embeddings) for semantic similarity.
Optionally, an additional classic full-text index (BM25) to cover keywords well, since semantic similarity can miss exact phrases (a minimal indexing sketch follows this list).
Retrieval per user query:
The user asks a question.
The system searches for the most relevant 10–50 chunks in the index.
Only these chunks land as "context" in the prompt.
Generation:
The LLM receives the system prompt, the user question, and the retrieved chunks.
An answer is generated based on this excerpt – not based on the entire dataset.
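To make the first two steps concrete, here is a minimal sketch of chunking and index building, assuming the sentence-transformers and rank-bm25 packages; the embedding model name is just a common default, and the documents are placeholders.

```python
# Minimal sketch of steps 1 and 2: chunk the cleaned text and build both a
# vector index and a BM25 full-text index over the chunks.
# Assumptions: sentence-transformers and rank-bm25 are installed; the model
# name is a common default, not a recommendation.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

def chunk(text: str, max_chars: int = 1500) -> list[str]:
    """Very naive chunking by paragraphs; in practice you would split along
    chapters, headings, or functions as described above."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

documents = ["...full text of the manual...", "...exported wiki pages..."]  # placeholder content
all_chunks = [c for doc in documents for c in chunk(doc)]

# Vector index: one embedding per chunk, for semantic similarity.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vectors = embedder.encode(all_chunks, normalize_embeddings=True)

# Full-text index: BM25 over tokenized chunks, for exact keyword matches.
bm25 = BM25Okapi([c.lower().split() for c in all_chunks])
```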
Even when you use RAG, a retrieval step always decides which small parts of your knowledge base end up in the prompt. The myth of "we just load all our knowledge into the AI" is therefore never true – with current context windows it is technically not possible.
The simplest solution: just put everything up to ~200,000 tokens in the prompt
Now to the exciting part: Do I even need to build RAG – or is a very long prompt sufficient?
If your knowledge base is manageable (e.g., a manual, an internal wiki, 100–500 pages) and doesn't change constantly, the simplest idea is often the best: take your entire, cleaned knowledge base (up to about 200k tokens), put it in the prompt – done.
Of course, not as a 500-page PDF in one go, but processed neatly. Structure documents into meaningful sections (chapters, headings) and use structured representations like JSON/YAML with title, type, content. Clear system instructions additionally help the model understand how to handle these contents ("Answer based only on the following information").
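A sketch of what this "long prompt" variant can look like, assuming the knowledge base already fits comfortably under ~200k tokens; the section structure (title, type, content) mirrors the JSON/YAML representation described above, and the example content is a placeholder.

```python
# Sketch: packing a cleaned knowledge base into one structured long prompt.
# Assumption: `sections` already contains your cleaned chapters/pages.
import json

sections = [
    {"title": "Warranty", "type": "manual_chapter", "content": "Warranty claims must ..."},
    {"title": "Onboarding FAQ", "type": "wiki_page", "content": "New employees receive ..."},
    # ... the rest of the cleaned knowledge base
]

system_prompt = (
    "You are an assistant for our internal knowledge base. "
    "Answer based only on the following information. "
    "If something is not covered, say so explicitly."
)

# The entire knowledge base travels with every request as structured JSON.
long_prompt = (
    system_prompt
    + "\n\nKnowledge base:\n"
    + json.dumps(sections, ensure_ascii=False, indent=2)
)
```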
Classic RAG: search selectively instead of sending everything
The core of classic RAG is:
The AI receives a request.
You search the vector index for similar chunks.
The best hits (e.g., the top 20) go into the prompt and serve as additional context.
This way, you send the model only a small, relevant excerpt of your knowledge base – instead of everything.
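The retrieval and generation steps can look like the following sketch. It reuses `embedder`, `all_chunks`, and `chunk_vectors` from the indexing sketch above; `llm_complete` is a hypothetical helper that calls your LLM of choice.

```python
# Sketch of retrieval and generation: embed the question, pick the top-k most
# similar chunks, and put only those into the prompt.
# Reuses `embedder`, `all_chunks`, and `chunk_vectors` from the indexing
# sketch above; `llm_complete` is a hypothetical LLM call.
import numpy as np

def retrieve(question: str, top_k: int = 20) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vectors @ q  # cosine similarity, vectors are normalized
    best = np.argsort(scores)[::-1][:top_k]
    return [all_chunks[i] for i in best]

def answer(question: str) -> str:
    """Send system prompt, retrieved chunks, and the question to the LLM."""
    context = "\n\n".join(retrieve(question))
    prompt = (
        "Answer based only on the following information.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm_complete(prompt)  # hypothetical LLM call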
Contextual Retrieval: fewer false hits, better answers
Anthropic proposes a relatively simple but very effective improvement to this retrieval step with Contextual Retrieval. The idea is to enrich each chunk in the classic RAG system with context from the original document it comes from, before indexing. According to Anthropic, this reduces the rate of failed retrievals for targeted questions about the knowledge base by up to 67%.
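A rough sketch of the idea: for each chunk, an LLM generates a short context from the full source document, which is prepended before embedding and indexing. The prompt wording below is paraphrased from Anthropic's published description, not copied verbatim, and `llm_complete` is again a hypothetical LLM helper.

```python
# Sketch of Contextual Retrieval: before indexing, each chunk is enriched with
# a short, LLM-generated description of where it sits in its source document.
# `llm_complete` is a hypothetical LLM helper; the prompt is paraphrased from
# Anthropic's approach.
CONTEXT_PROMPT = """<document>
{document}
</document>

Here is a chunk from this document:
<chunk>
{chunk}
</chunk>

Write a short, succinct context that situates this chunk within the overall
document, to improve search retrieval of the chunk. Answer with the context only."""

def contextualize(document: str, chunk: str) -> str:
    """Return 'context + chunk'; this enriched text is embedded and indexed
    instead of the raw chunk."""
    context = llm_complete(CONTEXT_PROMPT.format(document=document, chunk=chunk))
    return f"{context.strip()}\n\n{chunk}"
```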
How we approach this topic at Ahoi Kapptn!
Our approach also follows our process Understand → Develop → Optimize:
Understand
The beginning is understanding. Together with you, we clarify the contents you actually have – meaning the formats, quality, and size of your knowledge base. We look at which use cases are in the foreground, such as support, sales, internal onboarding, or sports data, and what requirements exist regarding security, governance, and on-prem or open-source models.
Often we start here with a compact AI workshop, where we prioritize use cases and decide whether a long prompt is sufficient or whether you will need RAG sooner or later.
Develop
In this phase, we implement what was jointly decided. For smaller knowledge bases, this means: clean data preparation, meaningful structure, a well-designed long prompt, and thoughtful prompt design – with this, you are often already productive. For larger setups, we design and implement a RAG architecture with index, retrieval, and suitable guardrails. Where appropriate, we supplement this with a contextual retrieval pipeline with BM25, embeddings, and reranker to further improve hit quality.
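As an illustration only, such a pipeline might take the following shape; the cross-encoder model name is a common default, and `bm25_search` / `vector_search` are placeholders for the two index lookups described earlier.

```python
# Sketch of a hybrid retrieval pipeline with reranking: take candidates from
# BM25 and the vector index, then let a cross-encoder rescore them.
# `bm25_search` and `vector_search` are placeholders for the index lookups
# described earlier; the model name is a common default.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def hybrid_retrieve(question: str, top_k: int = 20) -> list[str]:
    # 1. Collect a generous candidate set from both indexes.
    candidates = list(set(bm25_search(question, k=50)) | set(vector_search(question, k=50)))
    # 2. Score each (question, chunk) pair with the cross-encoder.
    scores = reranker.predict([(question, chunk) for chunk in candidates])
    # 3. Keep only the best-scoring chunks for the prompt.
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```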
Optimize
In the Optimize phase, we look at how the system performs in everyday life. We monitor which questions are actually asked and where the system fails. Based on measurable KPIs like hit quality, latency, usage rates, and possibly manual evaluations, we iterate step by step: We refine prompts, adjust retrieval parameters, and expand the use of contextual retrieval as needed.
Are you planning an AI project with your own knowledge base or want to take your existing system to the next level?
Let's talk and check whether a "simple prompt" is enough or if RAG/contextual retrieval makes sense – request a project now.