Introduction
RAG (Retrieval-Augmented Generation) is a key technique for addressing the biggest limitations of LLMs: they know nothing beyond their training cutoff, they lack deep knowledge of specific domains, and they can hallucinate. RAG-related job postings have reportedly surged by 2,047%, a sign that this is now foundational AI development knowledge.
This article covers the basics of AI development through the lens of RAG, from theory to environment setup.
1. What RAG Is
RAG is a technique where an LLM retrieves relevant information from an external knowledge base before generating a response, then adds that information to its context. Instead of relying only on memory, the model effectively "looks up references when needed."
The Basic RAG Flow
- Receive the user question
- Retrieve relevant documents or data from a vector database
- Augment the prompt with the retrieved information
- Generate an answer using the augmented context
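The four steps above can be sketched end to end. This is a toy illustration: `retrieve` uses simple word-overlap scoring in place of a real vector search, and `generate_answer` stands in for an LLM call. Both function names and the documents are made up for the example, not part of any real API.

```python
import re

# Toy end-to-end RAG flow: retrieve -> augment -> generate.
DOCS = [
    "The refund policy allows returns within 30 days of purchase.",
    "Shipping normally takes 3 to 5 business days.",
    "Support is available by email around the clock.",
]

def retrieve(question, docs, top_k=1):
    """Rank docs by naive word overlap; a real system uses vector search."""
    q_words = set(re.findall(r"\w+", question.lower()))
    scored = [(len(q_words & set(re.findall(r"\w+", d.lower()))), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]

def augment(question, context_docs):
    """Build the augmented prompt from the retrieved context."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def generate_answer(prompt):
    """Stand-in for an LLM call; a real system calls an LLM API here."""
    return f"[LLM would answer based on a prompt of {len(prompt)} chars]"

question = "What is the refund policy for a purchase?"
context = retrieve(question, DOCS)
prompt = augment(question, context)
answer = generate_answer(prompt)
```

Swapping `retrieve` for an embedding-based search and `generate_answer` for a real model call turns this skeleton into the full pipeline.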
Problems RAG Solves
- Knowledge freshness: reflects information newer than the model's training data
- Domain specialization: uses company documents, product manuals, or other domain-specific sources
- Reduced hallucination: improves accuracy by grounding answers in reference material
- Cost efficiency: injects new knowledge without full fine-tuning
2. Naive RAG vs. Advanced RAG
Naive RAG
The simplest RAG implementation works like this:
- Split documents into chunks
- Convert each chunk into an embedding vector
- Store the vectors in a vector database
- When a question arrives, retrieve relevant chunks through similarity search
- Pass the retrieved chunks to the LLM for answer generation
Advantages: simple to implement and easy to start with
Drawbacks: retrieval quality may be limited, and context can get cut awkwardly depending on chunking strategy
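The chunking step is often just fixed-size splitting, which is exactly where the "awkward cuts" come from: a boundary can land mid-sentence. A minimal sketch (helper name and sizes are illustrative; real pipelines usually split on sentences or tokens instead):

```python
def split_into_chunks(text, chunk_size=500, overlap=50):
    """Naive fixed-size chunking with character overlap between neighbors."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

text = "Retrieval-Augmented Generation grounds model answers in retrieved documents."
chunks = split_into_chunks(text, chunk_size=40, overlap=8)
```

The overlap keeps a little shared context across boundaries, so a sentence cut at the end of one chunk still appears at the start of the next.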
Advanced RAG
Advanced RAG improves on naive RAG with additional techniques:
- Hybrid search: combine vector search with keyword search
- Reranking: reorder search results to select the most relevant ones
- Query transformation: rewrite the user's question into a form better suited for retrieval
- Parent-child chunking: search over small child chunks, but pass the larger parent chunk to the LLM for answering
- Metadata filtering: pre-filter by date, category, and other metadata
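Hybrid search needs a way to merge the two ranked lists it produces. Reciprocal rank fusion (RRF) is a common score-free way to do it; the sketch below is a minimal version (the constant k=60 follows the usual convention, and the doc IDs are made up):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists into one.
    Each ranking is a list of doc IDs, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A doc gains more credit the higher it ranks in each list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_b", "doc_a", "doc_c"]   # from embedding search
keyword_hits = ["doc_a", "doc_d", "doc_b"]  # from keyword search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because RRF only uses rank positions, it avoids having to normalize incompatible vector and keyword scores against each other.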
3. Vector Databases
The core infrastructure behind RAG is the vector database, which stores text as high-dimensional vectors and supports similarity-based search.
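At its core, a vector database does what this brute-force sketch does, score stored vectors against a query by cosine similarity, but with index structures (e.g. HNSW) that keep it fast at scale. The store contents here are made-up toy embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, store):
    """Return the doc ID whose stored vector is most similar to the query."""
    return max(store, key=lambda doc_id: cosine_similarity(query, store[doc_id]))

store = {
    "pricing": [0.9, 0.1, 0.0],
    "support": [0.1, 0.8, 0.2],
    "returns": [0.0, 0.2, 0.9],
}
best = nearest([0.85, 0.15, 0.05], store)
```

Real embeddings have hundreds or thousands of dimensions, which is why the databases below exist: linear scans like this stop being viable past a few hundred thousand vectors.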
Major Vector Databases
- Milvus: open source and strong for large-scale workloads
- Faiss: similarity-search library from Meta (Facebook AI Research); a C++ core with Python bindings, strong for in-process use
- Pinecone: managed service, good for fast startup
- ChromaDB: lightweight and good for local development
- pgvector: PostgreSQL extension that integrates well with Supabase
How to Choose
- Prototype and learning: ChromaDB for easy local setup
- Production, managed: Pinecone for minimal operational overhead
- Production, self-hosted: Milvus for scale and customization
- Existing PostgreSQL stack: pgvector when you want to avoid extra infrastructure
4. Python Development Environment Setup
Here is a basic way to set up an AI development environment.
Required Tools
# Check the Python version (3.10 or newer recommended)
python --version
# Create a virtual environment
python -m venv ai-env
source ai-env/bin/activate
# Install core packages
pip install langchain openai anthropic chromadb
pip install jupyter notebook
Core Libraries
- LangChain: framework for building RAG pipelines
- OpenAI and Anthropic SDKs: for LLM API calls
- ChromaDB: vector storage and retrieval
- BeautifulSoup and Requests: web scraping for data collection
- tiktoken: token counting and cost management
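tiktoken gives exact token counts for OpenAI models; when you just need a ballpark for budgeting, a rough characters-per-token heuristic works for English text. The sketch below uses only that heuristic, and the price constant is a placeholder, not a real rate card:

```python
def estimate_tokens(text):
    """Very rough estimate: ~4 characters per token for English text.
    Use tiktoken for exact counts against a specific model."""
    return max(1, len(text) // 4)

def estimate_cost_usd(text, usd_per_million_tokens=3.0):
    """Ballpark input cost; the per-million-token price is illustrative."""
    return estimate_tokens(text) / 1_000_000 * usd_per_million_tokens

prompt = "Summarize the refund policy in two sentences."
tokens = estimate_tokens(prompt)
```

Estimates like this matter in RAG because every retrieved chunk you stuff into the prompt is billed as input tokens.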
5. Jupyter Notebook and Google Colab
Jupyter Notebook
Jupyter provides an interactive execution environment and is essential for experimentation and prototyping in AI development. Because code, text, and visual output live in one document, it is ideal for quickly validating ideas.
Google Colab
Google Colab is a cloud-based Jupyter environment with access to free GPUs. It is useful for training and experimentation when a high-performance local machine is not available, especially in the early learning and prototyping stage.
6. Designing a RAG Chatbot
Basic Architecture
User question
  -> [Query processing]: retrieve from the vector DB
  -> [Context assembly]: retrieved results + system prompt + conversation history
  -> [LLM call]: generate the answer
  -> Return the response to the user
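The context assembly step above can be made concrete. This sketch (the function name and prompt format are assumptions, not a fixed API) stitches retrieved chunks, a system prompt, and recent conversation history into the message-list shape most chat APIs accept:

```python
def assemble_context(question, retrieved_chunks, history, system_prompt):
    """Build a chat-style message list from the pieces of the pipeline."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    messages = [{"role": "system",
                 "content": f"{system_prompt}\n\nReference material:\n{context}"}]
    messages.extend(history)  # prior user/assistant turns, oldest first
    messages.append({"role": "user", "content": question})
    return messages

messages = assemble_context(
    question="Can I return an opened item?",
    retrieved_chunks=["Returns accepted within 30 days.",
                      "Opened items: store credit only."],
    history=[{"role": "user", "content": "Hi"},
             {"role": "assistant", "content": "Hello! How can I help?"}],
    system_prompt="Answer only from the reference material.",
)
```

Numbering the chunks ([1], [2], ...) lets the system prompt ask the model to cite which reference it used, which helps when checking for hallucination.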
Design Considerations
- Chunk size: too small loses context, too large adds noise; 500 to 1000 tokens is a common starting range
- Number of retrieved items: start with top-k of 3 to 5 and tune from there
- System prompt: clearly instruct the LLM on how to use retrieved results
- Fallback strategy: define what happens when no relevant document is found
- Evaluation pipeline: build automated checks for answer accuracy, relevance, and faithfulness
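The fallback strategy can be as simple as a similarity threshold: if even the best-scoring chunk is a weak match, skip generation and return an honest "not found" answer instead of letting the LLM improvise. The threshold value and message below are illustrative:

```python
NO_ANSWER = "I couldn't find anything relevant in the knowledge base."

def answer_or_fallback(scored_chunks, min_score=0.75):
    """scored_chunks: list of (similarity, chunk_text) pairs.
    Returns the chunks worth passing to the LLM, or a fallback message."""
    relevant = [chunk for score, chunk in scored_chunks if score >= min_score]
    if not relevant:
        return NO_ANSWER
    return relevant

hits = [(0.91, "Refunds take 5 business days."),
        (0.42, "Our office hours are 9-5.")]
result = answer_or_fallback(hits)
weak = answer_or_fallback([(0.30, "Unrelated text.")])
```

Tuning `min_score` is itself an evaluation task: too high and the bot refuses answerable questions, too low and it answers from irrelevant context.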
Closing
RAG is both a core AI development skill and a fundamental building block for real AI products. The practical learning path is to start with naive RAG, understand the fundamentals, and then add advanced RAG techniques as production requirements become more demanding.
Set up Python, LangChain, ChromaDB, and Jupyter, then build a RAG chatbot with real data. Theory alone is not enough. The real skill comes from hands-on intuition about chunk size, retrieval strategy, and prompt design.