RAG and AI Development Basics: From Theory to Hands-On Practice

By Seokchol Hong

Introduction

RAG (Retrieval-Augmented Generation) is a key technique for addressing the biggest limitations of LLMs: they do not know information beyond their training cutoff, they lack deep knowledge of specific domains, and they can hallucinate. The fact that RAG-related job postings have surged by 2,047% makes it clear that this is now foundational AI development knowledge.

This article covers the basics of AI development through the lens of RAG, from theory to environment setup.


1. What RAG Is

RAG is a technique in which relevant information is retrieved from an external knowledge base and added to the LLM's context before it generates a response. Instead of relying only on memory, the model effectively "looks up references when needed."

The Basic RAG Flow

  1. Receive the user question
  2. Retrieve relevant documents or data from a vector database
  3. Augment the prompt with the retrieved information
  4. Generate an answer using the augmented context
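The four steps above can be sketched end to end. This is a toy illustration, not a production pipeline: the word-overlap scoring stands in for real vector similarity, and generate() is a stub standing in for an LLM API call.

```python
def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many question words they share (toy stand-in
    for vector similarity search)."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def augment(question: str, context: list[str]) -> str:
    """Build a prompt that grounds the model in the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

def generate(prompt: str) -> str:
    """Stub for an LLM call (e.g. via the OpenAI or Anthropic SDK)."""
    return f"[LLM answer based on a prompt of {len(prompt)} chars]"

docs = [
    "RAG retrieves documents before generating an answer.",
    "Vector databases store embeddings for similarity search.",
    "Jupyter notebooks are great for prototyping.",
]
question = "What does RAG retrieve?"
prompt = augment(question, retrieve(question, docs))
print(generate(prompt))
```

Swapping retrieve() for a real vector-database query and generate() for a real API call turns this skeleton into a working naive RAG pipeline.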

Problems RAG Solves

  • Knowledge freshness: reflects information newer than the model's training data
  • Domain specialization: uses company documents, product manuals, or other domain-specific sources
  • Reduced hallucination: improves accuracy by grounding answers in reference material
  • Cost efficiency: injects new knowledge without full fine-tuning

2. Naive RAG vs. Advanced RAG

Naive RAG

The simplest RAG implementation works like this:

  1. Split documents into chunks
  2. Convert each chunk into an embedding vector
  3. Store the vectors in a vector database
  4. When a question arrives, retrieve relevant chunks through similarity search
  5. Pass the retrieved chunks to the LLM for answer generation

Advantages: simple to implement and easy to start with
Drawbacks: retrieval quality may be limited, and context can get cut awkwardly depending on chunking strategy
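Step 1, chunking, is where naive RAG most often goes wrong. A minimal fixed-size chunker with overlap looks like this (sizes are in characters for simplicity; real pipelines usually count tokens):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, so a sentence cut
    at one boundary still appears whole in the neighboring chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# 500 characters with a 150-character stride -> 4 chunks
chunks = chunk_text("a" * 500, chunk_size=200, overlap=50)
print(len(chunks), [len(c) for c in chunks])
```

The overlap is exactly what mitigates the "context cut awkwardly" drawback above, at the cost of storing some text twice.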

Advanced RAG

Advanced RAG improves on naive RAG with additional techniques:

  • Hybrid search: combine vector search with keyword search
  • Reranking: reorder search results to select the most relevant ones
  • Query transformation: rewrite the user's question into a form better suited for retrieval
  • Parent-child chunking: retrieve with smaller chunks but answer with larger context
  • Metadata filtering: pre-filter by date, category, and other metadata
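Hybrid search needs a way to merge two ranked result lists; reciprocal rank fusion (RRF) is one common choice. A minimal sketch over document IDs (the constant 60 is the conventional RRF smoothing parameter, and the doc IDs are illustrative):

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc IDs with reciprocal rank fusion:
    each doc scores sum(1 / (k + rank)) over the lists that contain it."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # from embedding similarity
keyword_hits = ["doc1", "doc5", "doc3"]  # from keyword / BM25 search
print(rrf_merge([vector_hits, keyword_hits]))
```

Documents that appear in both lists accumulate score from each, so they rise above documents found by only one retriever.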

3. Vector Databases

The core infrastructure behind RAG is the vector database, which stores text as high-dimensional vectors and supports similarity-based search.
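"Similarity-based search" here usually means cosine similarity between embedding vectors. In pure Python, for two toy vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction,
    0.0 = orthogonal (unrelated), -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score 1.0 regardless of magnitude
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Vector databases compute essentially this (or dot product / Euclidean distance) over millions of stored vectors, using approximate nearest-neighbor indexes to avoid comparing against every vector.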

Major Vector Databases

  • Milvus: open source and strong for large-scale workloads
  • Faiss: similarity-search library from Meta (Facebook) AI Research with Python bindings; a library rather than a full database
  • Pinecone: managed service, good for fast startup
  • ChromaDB: lightweight and good for local development
  • pgvector: PostgreSQL extension that integrates well with Supabase

How to Choose

  • Prototype and learning: ChromaDB for easy local setup
  • Production, managed: Pinecone for minimal operational overhead
  • Production, self-hosted: Milvus for scale and customization
  • Existing PostgreSQL stack: pgvector when you want to avoid extra infrastructure

4. Python Development Environment Setup

Here is a basic way to set up an AI development environment.

Required Tools

# Check the Python version (3.10+ recommended)
python --version

# Create and activate a virtual environment
python -m venv ai-env
source ai-env/bin/activate  # Windows: ai-env\Scripts\activate

# Install core packages
pip install langchain openai anthropic chromadb
pip install jupyter notebook

Core Libraries

  • LangChain: framework for building RAG pipelines
  • OpenAI and Anthropic SDKs: for LLM API calls
  • ChromaDB: vector storage and retrieval
  • BeautifulSoup and Requests: web scraping for data collection
  • tiktoken: token counting and cost management

5. Jupyter Notebook and Google Colab

Jupyter Notebook

Jupyter provides an interactive execution environment and is essential for experimentation and prototyping in AI development. Because code, text, and visual output live in one document, it is ideal for quickly validating ideas.

Google Colab

Google Colab is a cloud-based Jupyter environment with access to free GPUs. It is useful for training and experimentation when a high-performance local machine is not available, especially in the early learning and prototyping stage.


6. Designing a RAG Chatbot

Basic Architecture

User question
    ->
[Query processing] -> vector DB retrieval
    ->
[Context assembly] -> retrieved results + system prompt + conversation history
    ->
[LLM call] -> answer generation
    ->
Return response to user
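The [Context assembly] step above can be sketched as a prompt builder; the function name and prompt template here are illustrative:

```python
def build_prompt(system_prompt: str, retrieved: list[str],
                 history: list[tuple[str, str]], question: str) -> str:
    """Assemble the final LLM prompt from system instructions,
    retrieved chunks, prior conversation turns, and the new question."""
    parts = [system_prompt, "\nContext:"]
    parts += [f"- {chunk}" for chunk in retrieved]
    for user_turn, assistant_turn in history:
        parts.append(f"\nUser: {user_turn}\nAssistant: {assistant_turn}")
    parts.append(f"\nUser: {question}\nAssistant:")
    return "\n".join(parts)

prompt = build_prompt(
    system_prompt="Answer only from the context. If it is missing, say so.",
    retrieved=["RAG adds retrieved text to the prompt."],
    history=[("Hi", "Hello! Ask me about our docs.")],
    question="What does RAG do?",
)
print(prompt)
```

The assembled string is what actually goes into the [LLM call] step; frameworks like LangChain provide templating for this, but the underlying operation is this simple concatenation.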

Design Considerations

  • Chunk size: too small loses context, too large adds noise; 500 to 1000 tokens is a common starting range
  • Number of retrieved items: start with top-k of 3 to 5 and tune from there
  • System prompt: clearly instruct the LLM on how to use retrieved results
  • Fallback strategy: define what happens when no relevant document is found
  • Evaluation pipeline: build automated checks for answer accuracy, relevance, and faithfulness
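The fallback strategy can be as simple as a similarity threshold: if the best retrieved score is too low, return an explicit "don't know" instead of letting the LLM guess. A sketch with an illustrative threshold value:

```python
def answer_or_fallback(hits: list[tuple[str, float]],
                       threshold: float = 0.75) -> str:
    """Return the top retrieved chunk when its similarity score clears
    the threshold; otherwise fall back rather than risk a hallucinated
    answer. `hits` is (chunk_text, score) sorted by score descending."""
    if not hits or hits[0][1] < threshold:
        return "FALLBACK: no sufficiently relevant document found."
    return f"CONTEXT: {hits[0][0]}"

print(answer_or_fallback([("RAG overview chunk", 0.91)]))
print(answer_or_fallback([("off-topic chunk", 0.42)]))
```

The right threshold depends on the embedding model and domain, which is exactly what the evaluation pipeline above is for: measure, then tune.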

Closing

RAG is both a core AI development skill and a fundamental building block for real AI products. The practical learning path is to start with naive RAG, understand the fundamentals, and then add advanced RAG techniques as production requirements become more demanding.

Set up Python, LangChain, ChromaDB, and Jupyter, then build a RAG chatbot with real data. Theory alone is not enough. The real skill comes from hands-on intuition about chunk size, retrieval strategy, and prompt design.
