"How I Built a Local AI Agent Platform with ChromaDB, RAG, Ollama, and a Bit of Streamlit Magic"
1. Introduction: From Idea to Prototype
This article takes you behind the scenes of a small but surprisingly capable solution I built — one that lets users create their own AI agents, upload documents or URLs, and interact with their data like they’ve got a personal assistant with a photographic memory.
The goal?
“Make it so easy to create AI agents that even your non-techie friend who still uses Internet Explorer could do it.”
2. High-Level Overview
Here’s what my solution does:
You create AI agents for different topics (product Q&A, research, whatever).
Upload documents or give it a URL.
I extract, chunk, and embed the data locally.
Store everything in ChromaDB.
Ask questions and the agent answers intelligently using a local LLM via Ollama and RAG.
All of this works through a lightweight web UI built using Streamlit. No clouds were harmed in the making of this demo.
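To give a feel for how thin that UI layer is, here's a minimal sketch of the kind of Streamlit page the app uses. The backend helpers (list_agents, ingest_file, answer_question) are hypothetical stand-ins for the functions described in the next sections, not the exact names in my code:

```python
import streamlit as st

# Hypothetical backend helpers -- stand-ins for the real ingestion/RAG functions.
from backend import list_agents, ingest_file, answer_question

st.title("Local AI Agents")

# Pick an agent; each agent maps to its own ChromaDB collection.
agent = st.sidebar.selectbox("Agent", list_agents())

# Upload a document and push it through the parse -> chunk -> embed pipeline.
uploaded = st.file_uploader("Upload a PDF, CSV, or text file")
if uploaded is not None:
    ingest_file(agent, uploaded)
    st.success(f"Indexed {uploaded.name} for agent '{agent}'")

# Ask a question against the agent's collection via local RAG.
question = st.chat_input("Ask your agent something")
if question:
    st.chat_message("user").write(question)
    st.chat_message("assistant").write(answer_question(agent, question))
```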
3. Tech Stack I Used (aka What’s Under the Hood)
Python – The glue holding everything together, from the UI to embeddings to the backend logic.
Streamlit – The simple web UI framework I used to stitch the app together. Quick to build, and easy on the eyes (kind of).
ChromaDB – A local vector database that stores all the document embeddings. Each AI agent gets its own neat little collection.
LangChain – The orchestrator behind the scenes, managing the Retrieval-Augmented Generation (RAG) pipeline and handling all the document wrangling.
SentenceTransformers – Used to generate embeddings for all text chunks, specifically the all-MiniLM-L6-v2 model.
Ollama – Runs LLMs like LLaMA 2 or Mistral locally on your machine, meaning you can ask questions without ever hitting the cloud.
LlamaParse – A reliable document parser, especially for PDFs and messy text-heavy files. It breaks them down into clean, readable chunks.
BeautifulSoup / Pandas – When you're dealing with HTML pages or CSV files, these two jump in to extract and clean the content before it's embedded.
4. Document Ingestion and Embedding Flow
Whether you upload a PDF, CSV, or paste a link, the pipeline kicks in like this:
Parsing:
PDFs and rich text? LlamaParse handles them.
HTML? BeautifulSoup.
Spreadsheets? Pandas.
Chunking: Using recursive character splitting for manageable context windows.
Embedding: Each chunk is embedded with sentence-transformers.
Storage: Embeddings go into ChromaDB under the agent’s unique collection (the full flow is sketched below).
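Put together, the ingestion path looks roughly like this. It's a simplified sketch under a few assumptions: the dispatcher only covers HTML, CSV, and plain text (in the real app, PDFs go through LlamaParse first), the chunk sizes are illustrative, and agent_name is a hypothetical identifier for the agent's collection:

```python
import chromadb
import pandas as pd
from bs4 import BeautifulSoup
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
client = chromadb.PersistentClient(path="./chroma_db")


def extract_text(path: str) -> str:
    """Tiny dispatcher: HTML -> BeautifulSoup, CSV -> Pandas, else raw text."""
    if path.endswith(".html"):
        with open(path, encoding="utf-8") as f:
            return BeautifulSoup(f.read(), "html.parser").get_text(separator="\n")
    if path.endswith(".csv"):
        return pd.read_csv(path).to_string(index=False)
    with open(path, encoding="utf-8") as f:
        return f.read()


def ingest(agent_name: str, path: str) -> None:
    # 1. Parse the raw file into plain text.
    text = extract_text(path)
    # 2. Chunk with recursive character splitting so each piece fits the context window.
    chunks = splitter.split_text(text)
    # 3. Embed every chunk locally with all-MiniLM-L6-v2.
    vectors = embedder.encode(chunks).tolist()
    # 4. Store chunks and embeddings in the agent's own ChromaDB collection.
    collection = client.get_or_create_collection(agent_name)
    collection.add(
        ids=[f"{path}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=vectors,
    )
```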
5. Chat Time: Local RAG with Ollama
When you type a question, here’s what’s happening under the hood:
Your query is embedded.
ChromaDB retrieves the most relevant document chunks.
The chunks are stitched into a prompt.
Ollama (running something like mistral or llama2) processes it locally and answers.
This means no internet, no OpenAI keys, and complete control. Feels like running ChatGPT on your own terms.
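Here's a minimal sketch of that loop, reusing the client and embedder from the ingestion example and assuming Ollama is listening on its default local port. The real app wires this through LangChain, but the steps are the same:

```python
import requests


def ask(agent_name: str, question: str, n_results: int = 4) -> str:
    # 1. Embed the user's query with the same model used at ingestion time.
    query_vec = embedder.encode([question]).tolist()

    # 2. Retrieve the most relevant chunks from the agent's ChromaDB collection.
    collection = client.get_or_create_collection(agent_name)
    results = collection.query(query_embeddings=query_vec, n_results=n_results)
    context = "\n\n".join(results["documents"][0])

    # 3. Stitch the chunks into a prompt.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

    # 4. Send it to the local Ollama server (default: http://localhost:11434).
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "mistral", "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```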
6. Multi-Agent Architecture
Each agent is isolated and has:
Its own document collection in Chroma
Its own UI view for uploads and queries
Context-aware responses, since retrieval only ever searches that agent’s own collection
You can create:
A product documentation agent
A research assistant for a specific topic
An internal legal doc reader
Each one stays in its lane — like a well-trained golden retriever.
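The isolation comes almost for free from ChromaDB: each agent simply owns a differently named collection. A sketch of agent creation can be as small as this (the collection naming scheme here is my illustration, not the exact one in the app):

```python
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")


def create_agent(agent_name: str):
    # One collection per agent keeps documents and retrieval fully separated;
    # a query to the "legal" agent can never surface the "research" agent's chunks.
    return client.get_or_create_collection(f"agent_{agent_name}")


product_docs = create_agent("product-docs")
research = create_agent("research")
legal = create_agent("legal")
```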
7. Limitations & What’s Next
Yes, this is still a prototype, so here’s what’s not perfect:
LLM quality depends on what you run locally (Ollama has come a long way though!)
Large files can take a while to process, and very large ones may not work at all
UI is functional, not flashy (Streamlit charm)
No multi-user support yet (but it's modular for future upgrades)
Planned upgrades:
Auth + role-based agent sharing
Async doc processing + status view
More UI magic for power users
8. Try It Yourself!
This isn’t a polished product — it’s a working example of what’s possible when local AI gets smarter, lighter, and cheaper. You can try it yourself, play with it, and break it lovingly.