Tutorial: How to Agentically Code a RAG System
A step-by-step guide to building RAG systems using AI agents. Learn the prompting and oversight process for creating semantic search with embeddings, PG Vector, and LLMs.
By Tom Hundley
(Ghostwritten with Grok 4.1, curated and orchestrated by Tom Hundley)
If my story above hooked you, here's the practical follow-on: a step-by-step tutorial on "agentically coding" a RAG system. That means using AI agents (like Grok, Claude, or GPT) to build iteratively via prompts while you steer with smart questions. I'll frame it around my own experience: building Sparkles (the RAG chatbot) and fixing the job tracker. Assume a stack like Next.js/React with Supabase/Postgres for the DB. Code is kept to short illustrative sketches; the focus is the prompting and oversight process. (Ask your agent for the production code.)
Step 1: Define Your Vision and Gather Knowledge
Before prompting, educate yourself (like I did). Key concepts:
- Chunking: Split documents (e.g., a 10-page resume PDF) into small pieces (say, 500 characters with 100-character overlap) so no single chunk overwhelms the embedding model's input window.
- Embedding Model: Pick one like OpenAI's text-embedding-3-small (1536 dims, API key needed) or open-source like bge-small-en (384 dims, hostable via Hugging Face).
- Vector DB: Use PG Vector in Postgres/Supabase for storing/querying embeddings.
- Similarity Metric: Cosine for direction-based matches (compares vector direction and ignores magnitude, so a short chunk and a long chunk on the same topic score alike).
- RAG Flow: Query → Embed → Retrieve top-k matches → Augment LLM prompt → Generate response.
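To make the similarity metric concrete, here's a minimal TypeScript sketch of cosine similarity. In practice PG Vector computes this for you server-side (its `<=>` operator returns cosine distance), so this is just to show why the metric ignores vector length:

```typescript
// Cosine similarity: dot product divided by the product of the two
// magnitudes. Dividing by magnitude is what makes the metric
// direction-only -- a long vector and a short vector pointing the
// same way score 1.0.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Same direction, different lengths -> similarity 1.0.
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // 1
```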
Step 2: Prompt the Agent to Set Up the Basics
Start broad but specify RAG. My initial Sparkles prompt: "Build a RAG-based chatbot for my site thomashundley.com. Use semantic search on my resume data in PG Vector via Supabase."
Ask clarifying questions mid-build (agentic vibe):
- "What embedding model are you using to vectorize chunks and queries? (e.g., text-embedding-3-small)"
- "How are you chunking the input? (e.g., RecursiveCharacterTextSplitter with 500 chars and 100 overlap)"
- "Which similarity metric for comparisons? (e.g., cosine)"
Tools: Use LangChain or LlamaIndex for chunking/embedding (both ship Python and JS packages; have your agent install whichever fits your stack).
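As a reference point for those questions, here's a hedged sketch of the chunking step using LangChain's JS text splitter. (Package name and defaults are mine; your agent may reach for the Python equivalent instead. Verify against what it actually writes.)

```typescript
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Placeholder standing in for text extracted from your resume (Step 3).
const resumeText = "...your extracted resume text...";

// 500-character chunks with 100 characters of overlap, per Step 1.
// The overlap keeps sentences that straddle a chunk boundary
// retrievable from either side.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 100,
});

const chunks = await splitter.splitText(resumeText); // string[]
console.log(`${chunks.length} chunks`);
```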
Step 3: Handle Data Ingestion
For a PDF resume: "Extract text from my 10-page resume PDF, chunk it, embed each chunk with [model], and store in PG Vector."
Questions to ask:
- "How do you extract text? (e.g., PyPDF2)"
- "What's the chunk size/overlap? Why that?"
- "How many dims in the embeddings?"
Step 4: Implement Retrieval and Generation
Core RAG: "On query (e.g., job desc), embed it, query PG Vector for top-5 matches via cosine, stuff them into an LLM prompt (e.g., Claude Sonnet), and generate a score/response."
Questions:
- "What LLM for generation? (e.g., Anthropic's Claude, not for embeddings)"
- "How do you score? (e.g., cosine from 0-1, then LLM refines)"
- "Handle edge cases? (e.g., location restrictions in job desc)"
Step 5: Deploy and Audit
"Integrate into Next.js: API route for queries, frontend chatbot UI."
Post-deploy: "Run quality control: Score this job desc against my data. Explain why it caught/didn't catch [nuance]."
If it's keywords (plot twist!), redo: "Switch to full RAG with embeddings."
Step 6: Iterate and Scale
- Benchmark: "Compare keyword vs. RAG accuracy on 10 jobs." (A harness for this is sketched after the list.)
- Scale: "Add more data sources? Fine-tune embeddings?"
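One way that benchmark could look, as a rough TypeScript harness. The keyword baseline, the fit threshold, and the score parsing are all arbitrary choices I'm making up for illustration; the labels are judgments you make by hand.

```typescript
import { scoreJobDescription } from "./rag"; // hypothetical module holding the Step 4 helper

type LabeledJob = { description: string; isGoodFit: boolean };

// Naive keyword baseline: count shared words longer than 4 characters.
function keywordMatch(jobDesc: string, resumeText: string): boolean {
  const resumeWords = new Set(resumeText.toLowerCase().split(/\W+/));
  const hits = jobDesc.toLowerCase().split(/\W+/)
    .filter((w) => w.length > 4 && resumeWords.has(w));
  return hits.length >= 10; // arbitrary threshold -- that's the point
}

async function benchmark(jobs: LabeledJob[], resumeText: string) {
  let keywordCorrect = 0;
  let ragCorrect = 0;
  for (const job of jobs) {
    if (keywordMatch(job.description, resumeText) === job.isGoodFit) keywordCorrect++;
    const answer = await scoreJobDescription(job.description);
    // Crude parse: treat any mentioned score of 70+ as "good fit".
    const ragSaysFit = /\b(7\d|8\d|9\d|100)\b/.test(answer);
    if (ragSaysFit === job.isGoodFit) ragCorrect++;
  }
  console.log(`keyword: ${keywordCorrect}/${jobs.length}, RAG: ${ragCorrect}/${jobs.length}`);
}
```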
Ready to try? Prompt an AI with this tutorial as a base. Hit me up if you build something cool.