OnlyFunc

Notes on exploring RAG

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that reduces hallucinations in language models by supplying relevant external information alongside the prompt. Traditional large language models (LLMs) are trained on data with a fixed cutoff, so they may lack detailed knowledge of specific systems or of information that has changed since training.

Why Do We Need RAG?

Training or fine-tuning an LLM is costly and time-consuming. Instead of retraining, we can retrieve relevant data at query time and pass it to the model, allowing it to generate more accurate, tailored responses.

High-Level Overview of Creating RAG

Vector Database Preparation

  1. Prepare the dataset (e.g., markdown documents, JSON, text, PDF files).
  2. Create an index in a vector database such as Pinecone.
  3. Implement a text splitter to break the dataset into smaller chunks.
  4. Generate embeddings for each chunk and insert them into the vector database.
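
The steps above can be sketched in Python. The splitter below is a simple paragraph-based one; the Pinecone index name, embedding model, and client setup shown in the comments are assumptions for illustration, not something these notes prescribe:

```python
import re

def split_text(text, max_chars=500):
    """Split text into chunks of at most max_chars, breaking on paragraph
    boundaries where possible."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if len(current) + len(para) + 2 <= max_chars:
            # Paragraph still fits: append it to the current chunk.
            current = f"{current}\n\n{para}" if current else para
        else:
            if current:
                chunks.append(current)
            current = para[:max_chars]  # truncate oversized paragraphs
    if current:
        chunks.append(current)
    return chunks

# Embedding and upserting each chunk would then look roughly like this
# (client setup, model name, and index name are placeholders):
# from openai import OpenAI
# from pinecone import Pinecone
# client = OpenAI()
# index = Pinecone(api_key="...").Index("docs-index")
# for i, chunk in enumerate(split_text(document)):
#     emb = client.embeddings.create(model="text-embedding-3-small",
#                                    input=chunk).data[0].embedding
#     index.upsert(vectors=[{"id": f"chunk-{i}", "values": emb,
#                            "metadata": {"text": chunk}}])
```

Storing the original chunk text as metadata alongside each vector makes it easy to rebuild the prompt later without a second lookup.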

Retrieval Process

Once the vector database is ready, we can start querying it.

  1. Generate embeddings for the search query (using the same embedding model as the dataset chunks).
  2. Perform the search against the index. In Pinecone’s SDK, the top_k parameter limits the number of results returned.
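
To make the retrieval step concrete, here is a minimal in-memory sketch of similarity search, standing in for what a vector database does at scale; the vectors and ids are made up:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_search(query_vec, stored, k=2):
    """Return the ids of the k stored vectors most similar to query_vec.

    `stored` maps id -> vector."""
    ranked = sorted(stored,
                    key=lambda i: cosine_similarity(query_vec, stored[i]),
                    reverse=True)
    return ranked[:k]

# With Pinecone, the equivalent is a single call; top_k caps the results:
# results = index.query(vector=query_vec, top_k=2, include_metadata=True)
```

The key point is that both the stored chunks and the query must be embedded with the same model, or the similarity scores are meaningless.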

Integrating with LLM

We can use an OpenAI chat model (the API behind ChatGPT) to generate responses based on the retrieved information.

  1. Create a prompt that includes both the retrieved data and the user query.
  2. Send the prompt to OpenAI’s API and evaluate whether the prompt or vector database needs refinement.
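
A sketch of step 1; the instruction wording is my own, and the model name in the comment is an assumption:

```python
def build_prompt(retrieved_chunks, user_query):
    """Combine retrieved context and the user's question into one prompt."""
    context = "\n\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )

# Sending it to OpenAI's chat API would look roughly like this:
# response = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": build_prompt(chunks, query)}],
# )
```

Telling the model to admit when the context is insufficient is one of the cheapest ways to catch retrieval failures during evaluation.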

Questions?

Why Do We Need to Chunk the Dataset Before Storing It in a Vector Database?

Splitting text into smaller chunks has several benefits:

  1. It improves embedding quality, since embedding models have input-length limits and represent short, focused passages more faithfully than long documents.
  2. It enhances retrieval accuracy by making relevant information more accessible.
  3. It keeps each retrieved passage small, so the prompt sent to the LLM stays compact and within its context window.
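
One common refinement is to overlap adjacent chunks, so a sentence cut at a boundary still appears whole in at least one chunk. A minimal character-based sketch (the sizes here are arbitrary):

```python
def chunk_with_overlap(text, size=100, overlap=20):
    """Yield fixed-size character windows that overlap by `overlap` chars."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

In practice, splitting on sentence or paragraph boundaries with overlap tends to work better than raw character windows, but the idea is the same.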

Glossary

Embeddings

Embeddings are numerical representations of text that help computers understand words and sentences in a way that captures meaning and context. They allow similar words or concepts to be stored close together in a vector space.

Vector Database

A vector database is a specialized database designed to store and search for embeddings efficiently. It enables quick retrieval of relevant information by comparing similarity between vectors rather than searching for exact words.

Chunking

Chunking is the process of breaking large pieces of text into smaller, more manageable parts. This improves search accuracy and makes it easier for AI models to process and retrieve relevant information.

#AI #LLM #MachineLearning