
I Let AI Build My Job Tracker—And It Almost Screwed Me Over

A tale of prompting pitfalls, unexpected keyword detours, and the power of course-correcting on the fly when building RAG systems with AI agents.

January 7, 2026 · 5 min read


By Tom Hundley
(Ghostwritten with Grok 4.1, curated and orchestrated by Tom Hundley)

Hey, I'm Tom Hundley, a developer who's been knee-deep in AI for a while now. I've built tools that leverage retrieval-augmented generation (RAG) to make sense of my career data—resumes, interview notes, you name it. But recently, I had a wake-up call that reminded me why humans still need to stay in the loop when working with AI agents. Let me walk you through the story of how I built two websites, one that nailed RAG from the start and another that... well, didn't. It's a tale of prompting pitfalls, unexpected keyword detours, and the power of course-correcting on the fly.

The Setup: Sparkles, My AI Recruiter Chatbot

It all started with my personal site, thomashundley.com. That's where I host my resume and a bunch of career-related content. I wanted recruiters to interact with it dynamically, so I built an AI agent I call Sparkles. Sparkles is a chatbot that lets recruiters drop job descriptions and ask about my background. It pulls from a vector database stuffed with embeddings of my resumes, interview transcripts, and discussions.

How did I build it? I sat down with an AI agent (think something like Grok or similar) and described what I wanted. I was specific: "Implement a RAG system using semantic search with embeddings in a pgvector database via Supabase." I knew enough about the concepts—vector embeddings, chunking, cosine similarity—to guide it. The agent orchestrated the whole thing: extracting text from PDFs and Markdown files, chunking them into manageable pieces (roughly 500-character chunks with overlap), embedding them with something like OpenAI's text-embedding-3-small, and storing the vectors in pgvector for fast queries.
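For the curious, the ingest side of that pipeline boils down to something like the sketch below. This isn't the actual Sparkles code, just a minimal TypeScript version under a few assumptions: a Supabase table I'm calling `career_chunks` with a `vector(1536)` embedding column, the official `openai` and `@supabase/supabase-js` clients, and text that has already been extracted from the PDFs and Markdown.

```typescript
import OpenAI from "openai";
import { createClient } from "@supabase/supabase-js";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

// Naive fixed-size chunking: ~500 characters with a 50-character overlap.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// Embed each chunk and store it in the hypothetical `career_chunks` table
// (source, content, embedding vector(1536)).
export async function ingestDocument(source: string, text: string) {
  const chunks = chunkText(text);
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: chunks,
  });
  const rows = chunks.map((content, i) => ({
    source,
    content,
    embedding: data[i].embedding,
  }));
  const { error } = await supabase.from("career_chunks").insert(rows);
  if (error) throw error;
}
```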

The result? Sparkles works like a charm. A recruiter pastes a job desc; it gets vectorized and compared semantically to my data, and the agent generates a context-aware response. No hallucinations, just relevant matches. It took the AI less time to build than a long coffee break.
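Query time is the mirror image: embed the recruiter's job description, pull the nearest chunks back out, and hand them to the model as context. The sketch below continues the one above (same `openai` and `supabase` clients); `match_career_chunks` is a hypothetical Postgres function doing a pgvector cosine-distance search, and `gpt-4o-mini` is just a stand-in for whatever chat model you prefer.

```typescript
// Continuing the ingest sketch: same `openai` and `supabase` clients as above.
export async function askSparkles(jobDescription: string): Promise<string> {
  // 1. Embed the incoming job description with the same model used at ingest.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: jobDescription,
  });
  const queryEmbedding = data[0].embedding;

  // 2. Fetch the nearest chunks via a hypothetical `match_career_chunks`
  //    Postgres function (cosine distance, top 5).
  const { data: matches, error } = await supabase.rpc("match_career_chunks", {
    query_embedding: queryEmbedding,
    match_count: 5,
  });
  if (error) throw error;

  // 3. Answer using only the retrieved context.
  const context = (matches as { content: string }[])
    .map((m) => m.content)
    .join("\n---\n");

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // placeholder model choice
    messages: [
      {
        role: "system",
        content:
          "You are Sparkles, a recruiter-facing assistant. Answer using only the provided context about Tom's background.",
      },
      {
        role: "user",
        content: `Context:\n${context}\n\nJob description:\n${jobDescription}`,
      },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```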

The Plot Twist: My Job Tracker Goes Old-School

Fast-forward to my next project: a job tracking database and website. I'm job hunting (more on that later), so I wanted something to scrape jobs from LinkedIn, score them against my profile, and flag fits. I told the AI agent to "build a system that scores jobs based on my resume data." But this time, I didn't specify RAG, embeddings, or semantic search. I assumed it'd mirror what we did for Sparkles.

It didn't. The agent built a keyword-scoring system, like those outdated applicant tracking systems (ATS) that just count word matches. It scraped jobs, parsed my resume for keywords, and scored based on overlaps—things like "React" appearing in both. I deployed it, ran some tests, and the scores looked decent. My "smoke test" passed; jobs that aligned with my skills got high marks.
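To make the difference concrete, the keyword approach the agent shipped boils down to something like this. Not the actual code it wrote, just the shape of it:

```typescript
// Crude keyword-overlap scoring, roughly what the first version did:
// tokenize both texts, count shared terms, and call that a "fit score".
function keywordScore(resume: string, jobDescription: string): number {
  const tokenize = (text: string) =>
    new Set(text.toLowerCase().match(/[a-z][a-z+.#-]*/g) ?? []);

  const resumeTerms = tokenize(resume);
  const jobTerms = tokenize(jobDescription);

  let overlap = 0;
  for (const term of jobTerms) {
    if (resumeTerms.has(term)) overlap++;
  }

  // Fraction of job-description terms that also appear in the resume.
  // A clause like "remote only in CA, NY, ..." contributes nothing
  // meaningful here, which is exactly how restrictions slip through.
  return jobTerms.size === 0 ? 0 : overlap / jobTerms.size;
}
```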

But then I spotted a glitch. A job scored perfectly, but it required "remote only in these five states"—and I'm not in any of them. How'd it miss that? I prompted a quality check: "Why didn't it catch the location restriction?" That's when the truth came out: no RAG, no embeddings, just crude keyword counting. The system couldn't grasp semantics like "remote if in CA, NY, etc." versus my actual location. It was accurate enough for broad matches but blind to nuances.

What the fuck, right? I'd been using it live, thinking it was smart, but it was basically a glorified grep command.

The Fix: Hooking It to the RAG Engine

I didn't panic—I iterated. I told the agent: "Redo this using the same RAG setup as Sparkles. Connect to the existing pgvector database, embed job descriptions, and score semantically." While I was chatting about this (ironically, with another AI), the agent rebuilt everything. Now job descs get chunked, vectorized, and compared via cosine similarity to my embedded profile. Scores factor in context: skills, experience, even subtle requirements like locations or tech stacks.
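The semantic version scores a job by comparing embeddings rather than counting shared words. Here's a rough sketch of that scoring step, reusing the `openai` client from the earlier sketches; the real system runs the similarity query inside Postgres via pgvector, but doing the cosine math in TypeScript makes the idea easier to see:

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score a scraped job against pre-embedded profile chunks: embed the job,
// compare it to every chunk, and average the strongest matches.
async function semanticScore(
  jobDescription: string,
  profileEmbeddings: number[][]
): Promise<number> {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: jobDescription,
  });
  const jobEmbedding = data[0].embedding;

  const similarities = profileEmbeddings
    .map((chunk) => cosineSimilarity(jobEmbedding, chunk))
    .sort((a, b) => b - a);

  // Average the top 5 matches so one lucky chunk can't dominate the score
  // and weak chunks can't drag it down.
  const top = similarities.slice(0, 5);
  if (top.length === 0) return 0;
  return top.reduce((sum, s) => sum + s, 0) / top.length;
}
```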

The difference? Night and day. Keyword scoring might hit 70-80% accuracy for obvious matches, but RAG nails the subtleties—understanding "full-stack dev with cloud experience" means AWS/GCP, not just the words. And it all happened faster than explaining the concepts in a conversation.

Lessons Learned: Why Human Oversight Matters

This experience hammered home a key point: AI agents are incredible at execution, but they default to the path of least resistance if your prompts aren't precise. I knew the lingo—chunking, dims, embedding models—so Sparkles got RAG right. But for the job tracker, my vague prompt led to a simpler, dumber system. A non-developer might not have caught it; they'd be scoring jobs on vibes alone.

That said, "agentic coding" (vibing with AI to build iteratively) is the future. Agents built my setups quicker than I could code them manually. But you need education on the architecture: know RAG from keywords, ask the right questions like "What embedding model are you using?" or "Which similarity metric?" Keep a human in the loop to audit and refine.

Oh, and why share this? I'm pushing RAG implementations on my consulting site, but honestly, I'm job hunting too. Sales are ramping up, but not fast enough to skip a steady gig. If you're a recruiter or hiring manager reading this, hit me up—I've got the resume site, Sparkles to chat with, and now a RAG-powered job tracker proving I don't just talk AI, I live it (and fix it when it breaks).

What do you think? Drop a comment or reach out on LinkedIn. Let's talk RAG, jobs, or whatever.