Introduction
RAG (Retrieval-Augmented Generation) is a key technique for addressing the biggest limitations of LLMs: they know nothing beyond their training cutoff, they lack deep knowledge of specific domains, and they can hallucinate. RAG-related job postings have reportedly surged by 2,047%, a sign that this is now foundational AI development knowledge.
This article covers the basics of AI development through the lens of RAG, from theory to environment setup.
1. What RAG Is
RAG is a technique where an LLM retrieves relevant information from an external knowledge base before generating a response, then adds that information to its context. Instead of relying only on memory, the model effectively "looks up references when needed."
The Basic RAG Flow
- Receive the user question
- Retrieve relevant documents or data from a vector database
- Augment the prompt with the retrieved information
- Generate an answer using the augmented context
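The four steps above can be sketched end to end. This is a toy illustration: `retrieve` uses simple word-overlap scoring in place of a real vector search, and `generate_answer` stands in for an LLM call. Both function names and the documents are made up for the example, not part of any real API.

```python
import re

# Toy end-to-end RAG flow: retrieve -> augment -> generate.
DOCS = [
    "The refund policy allows returns within 30 days of purchase.",
    "Shipping normally takes 3 to 5 business days.",
    "Support is available by email around the clock.",
]

def retrieve(question, docs, top_k=1):
    """Rank docs by naive word overlap; a real system uses vector search."""
    q_words = set(re.findall(r"\w+", question.lower()))
    scored = [(len(q_words & set(re.findall(r"\w+", d.lower()))), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]

def augment(question, context_docs):
    """Build the augmented prompt from the retrieved context."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def generate_answer(prompt):
    """Stand-in for an LLM call; a real system calls an LLM API here."""
    return f"[LLM would answer based on a prompt of {len(prompt)} chars]"

question = "What is the refund policy for a purchase?"
context = retrieve(question, DOCS)
prompt = augment(question, context)
answer = generate_answer(prompt)
```

Swapping `retrieve` for an embedding-based search and `generate_answer` for a real model call turns this skeleton into the full pipeline.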
Problems RAG Solves
- Knowledge freshness: reflects information newer than the model's training data
- Domain specialization: uses company documents, product manuals, or other domain-specific sources
- Reduced hallucination: improves accuracy by grounding answers in reference material
- Cost efficiency: injects new knowledge without full fine-tuning
2. Naive RAG vs. Advanced RAG
Naive RAG
The simplest RAG implementation works like this:
- Split documents into chunks
- Convert each chunk into an embedding vector
- Store the vectors in a vector database
- When a question arrives, retrieve relevant chunks through similarity search
- Pass the retrieved chunks to the LLM for answer generation
Advantages: simple to implement and easy to start with
Drawbacks: retrieval quality may be limited, and context can get cut awkwardly depending on chunking strategy
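The chunking step is often just fixed-size splitting, which is exactly where the "awkward cuts" come from: a boundary can land mid-sentence. A minimal sketch (helper name and sizes are illustrative; real pipelines usually split on sentences or tokens instead):

```python
def split_into_chunks(text, chunk_size=500, overlap=50):
    """Naive fixed-size chunking with character overlap between neighbors."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

text = "Retrieval-Augmented Generation grounds model answers in retrieved documents."
chunks = split_into_chunks(text, chunk_size=40, overlap=8)
```

The overlap keeps a little shared context across boundaries, so a sentence cut at the end of one chunk still appears at the start of the next.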
Advanced RAG
Advanced RAG improves on naive RAG with additional techniques:
- Hybrid search: combine vector search with keyword search
- Reranking: reorder search results to select the most relevant ones
- Query transformation: rewrite the user's question into a form better suited for retrieval
- Parent-child chunking: search over small child chunks, but pass the larger parent chunk to the LLM for answering
- Metadata filtering: pre-filter by date, category, and other metadata
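Hybrid search needs a way to merge the two ranked lists it produces. Reciprocal rank fusion (RRF) is a common score-free way to do it; the sketch below is a minimal version (the constant k=60 follows the usual convention, and the doc IDs are made up):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists into one.
    Each ranking is a list of doc IDs, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A doc gains more credit the higher it ranks in each list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_b", "doc_a", "doc_c"]   # from embedding search
keyword_hits = ["doc_a", "doc_d", "doc_b"]  # from keyword search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because RRF only uses rank positions, it avoids having to normalize incompatible vector and keyword scores against each other.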
3. Vector Databases
The core infrastructure behind RAG is the vector database, which stores text as high-dimensional vectors and supports similarity-based search.
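At its core, a vector database does what this brute-force sketch does, score stored vectors against a query by cosine similarity, but with index structures (e.g. HNSW) that keep it fast at scale. The store contents here are made-up toy embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, store):
    """Return the doc ID whose stored vector is most similar to the query."""
    return max(store, key=lambda doc_id: cosine_similarity(query, store[doc_id]))

store = {
    "pricing": [0.9, 0.1, 0.0],
    "support": [0.1, 0.8, 0.2],
    "returns": [0.0, 0.2, 0.9],
}
best = nearest([0.85, 0.15, 0.05], store)
```

Real embeddings have hundreds or thousands of dimensions, which is why the databases below exist: linear scans like this stop being viable past a few hundred thousand vectors.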
Major Vector Databases
- Milvus: open source and strong for large-scale workloads
- Faiss: similarity-search library from Meta (Facebook AI Research); a C++ core with Python bindings, strong for in-process use
- Pinecone: managed service, good for fast startup
- ChromaDB: lightweight and good for local development
- pgvector: PostgreSQL extension that integrates well with Supabase
How to Choose
- Prototype and learning: ChromaDB for easy local setup
- Production, managed: Pinecone for minimal operational overhead
- Production, self-hosted: Milvus for scale and customization
- Existing PostgreSQL stack: pgvector when you want to avoid extra infrastructure
4. Python Development Environment Setup
Here is a basic way to set up an AI development environment.
Required Tools
# Check the Python version (3.10 or newer recommended)
python --version
# Create a virtual environment
python -m venv ai-env
source ai-env/bin/activate
# Install core packages
pip install langchain openai anthropic chromadb
pip install jupyter notebook
Core Libraries
- LangChain: framework for building RAG pipelines
- OpenAI and Anthropic SDKs: for LLM API calls
- ChromaDB: vector storage and retrieval
- BeautifulSoup and Requests: web scraping for data collection
- tiktoken: token counting and cost management
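tiktoken gives exact token counts for OpenAI models; when you just need a ballpark for budgeting, a rough characters-per-token heuristic works for English text. The sketch below uses only that heuristic, and the price constant is a placeholder, not a real rate card:

```python
def estimate_tokens(text):
    """Very rough estimate: ~4 characters per token for English text.
    Use tiktoken for exact counts against a specific model."""
    return max(1, len(text) // 4)

def estimate_cost_usd(text, usd_per_million_tokens=3.0):
    """Ballpark input cost; the per-million-token price is illustrative."""
    return estimate_tokens(text) / 1_000_000 * usd_per_million_tokens

prompt = "Summarize the refund policy in two sentences."
tokens = estimate_tokens(prompt)
```

Estimates like this matter in RAG because every retrieved chunk you stuff into the prompt is billed as input tokens.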
5. Jupyter Notebook and Google Colab
Jupyter Notebook
Jupyter provides an interactive execution environment and is essential for experimentation and prototyping in AI development. Because code, text, and visual output live in one document, it is ideal for quickly validating ideas.
Google Colab
Google Colab is a cloud-based Jupyter environment with access to free GPUs. It is useful for training and experimentation when a high-performance local machine is not available, especially in the early learning and prototyping stage.
6. Designing a RAG Chatbot
Basic Architecture
User question
  -> [Query processing]: retrieve from the vector DB
  -> [Context assembly]: retrieved results + system prompt + conversation history
  -> [LLM call]: generate the answer
  -> Return the response to the user
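The context assembly step above can be made concrete. This sketch (the function name and prompt format are assumptions, not a fixed API) stitches retrieved chunks, a system prompt, and recent conversation history into the message-list shape most chat APIs accept:

```python
def assemble_context(question, retrieved_chunks, history, system_prompt):
    """Build a chat-style message list from the pieces of the pipeline."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    messages = [{"role": "system",
                 "content": f"{system_prompt}\n\nReference material:\n{context}"}]
    messages.extend(history)  # prior user/assistant turns, oldest first
    messages.append({"role": "user", "content": question})
    return messages

messages = assemble_context(
    question="Can I return an opened item?",
    retrieved_chunks=["Returns accepted within 30 days.",
                      "Opened items: store credit only."],
    history=[{"role": "user", "content": "Hi"},
             {"role": "assistant", "content": "Hello! How can I help?"}],
    system_prompt="Answer only from the reference material.",
)
```

Numbering the chunks ([1], [2], ...) lets the system prompt ask the model to cite which reference it used, which helps when checking for hallucination.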
Design Considerations
- Chunk size: too small loses context, too large adds noise; 500 to 1000 tokens is a common starting range
- Number of retrieved items: start with top-k of 3 to 5 and tune from there
- System prompt: clearly instruct the LLM on how to use retrieved results
- Fallback strategy: define what happens when no relevant document is found
- Evaluation pipeline: build automated checks for answer accuracy, relevance, and faithfulness
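The fallback strategy can be as simple as a similarity threshold: if even the best-scoring chunk is a weak match, skip generation and return an honest "not found" answer instead of letting the LLM improvise. The threshold value and message below are illustrative:

```python
NO_ANSWER = "I couldn't find anything relevant in the knowledge base."

def answer_or_fallback(scored_chunks, min_score=0.75):
    """scored_chunks: list of (similarity, chunk_text) pairs.
    Returns the chunks worth passing to the LLM, or a fallback message."""
    relevant = [chunk for score, chunk in scored_chunks if score >= min_score]
    if not relevant:
        return NO_ANSWER
    return relevant

hits = [(0.91, "Refunds take 5 business days."),
        (0.42, "Our office hours are 9-5.")]
result = answer_or_fallback(hits)
weak = answer_or_fallback([(0.30, "Unrelated text.")])
```

Tuning `min_score` is itself an evaluation task: too high and the bot refuses answerable questions, too low and it answers from irrelevant context.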
Closing
RAG is both a core AI development skill and a fundamental building block for real AI products. The practical learning path is to start with naive RAG, understand the fundamentals, and then add advanced RAG techniques as production requirements become more demanding.
Set up Python, LangChain, ChromaDB, and Jupyter, then build a RAG chatbot with real data. Theory alone is not enough. The real skill comes from hands-on intuition about chunk size, retrieval strategy, and prompt design.