"How I Built a Local AI Agent Platform with ChromaDB, RAG, Ollama, and a Bit of Streamlit Magic"

1. Introduction: From Idea to Prototype

This article takes you behind the scenes of a small but surprisingly capable solution I built — one that lets users create their own AI agents, upload documents or URLs, and interact with their data like they’ve got a personal assistant with a photographic memory.

The goal?

“Make it so easy to create AI agents that even your non-techie friend who still uses Internet Explorer could do it.”

2. High-Level Overview

Here’s what my solution does:

  • You create AI agents for different topics (product Q&A, research, whatever).

  • You upload documents or point it at a URL.

  • The app extracts, chunks, and embeds the data locally.

  • Everything is stored in ChromaDB.

  • You ask questions, and the agent answers intelligently using a local LLM via Ollama and RAG.

All of this works through a lightweight web UI built using Streamlit. No clouds were harmed in the making of this demo.
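
To make that concrete, here's a bare-bones sketch of what a Streamlit front end like this could look like. The widget labels and the rag_answer helper are illustrative placeholders, not the actual app code:

```python
# Minimal Streamlit skeleton for the flow above (names are illustrative).
import streamlit as st

st.title("Local AI Agents")
agent = st.sidebar.selectbox("Pick an agent", ["product_qa", "research"])
uploaded_file = st.file_uploader("Upload a document")
url = st.text_input("...or paste a URL")

question = st.chat_input("Ask your agent something")
if question:
    st.chat_message("user").write(question)
    # rag_answer is a hypothetical helper wrapping retrieval + Ollama
    # answer = rag_answer(agent, question)
    st.chat_message("assistant").write("(answer from the local RAG pipeline)")
```

Run it with streamlit run app.py and the whole UI shows up in your browser.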

3. Tech Stack I Used (aka What’s Under the Hood)

Python – The glue holding everything together, from the UI to embeddings to the backend logic.

  1. Streamlit – The simple web UI framework I used to stitch the app together. Quick to build, and easy on the eyes (kind of).

  2. ChromaDB – A local vector database that stores all the document embeddings. Each AI agent gets its own neat little collection.

  3. LangChain – The orchestrator behind the scenes, managing the Retrieval-Augmented Generation (RAG) pipeline and handling all the document wrangling.

  4. SentenceTransformers – Used to generate embeddings for all text chunks, specifically using the all-MiniLM-L6-v2 model.

  5. Ollama – Runs LLMs like LLaMA 2 or Mistral locally on your machine, meaning you can ask questions without ever hitting the cloud.

  6. LlamaParse – A reliable document parser, especially for PDFs and messy text-heavy files. It breaks them down into clean, readable chunks.

  7. BeautifulSoup / Pandas – When you're dealing with HTML pages or CSV files, these two jump in to extract and clean the content before it's embedded.
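
As a quick taste of how little code the embedding piece needs, here's a minimal sketch with SentenceTransformers and the all-MiniLM-L6-v2 model mentioned above (the two strings are stand-in chunks):

```python
# Embed a couple of stand-in chunks with all-MiniLM-L6-v2 (384-dim vectors).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["ChromaDB stores vectors locally.", "Ollama runs LLMs on your machine."]
embeddings = model.encode(chunks)
print(embeddings.shape)  # (2, 384)
```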

4. Document Ingestion and Embedding Flow

Whether you upload a PDF or CSV, or paste a link, the pipeline kicks in like this (a code sketch follows the steps):

  1. Parsing:

    • PDFs and rich text? LlamaParse handles them.

    • HTML? BeautifulSoup.

    • Spreadsheets? Pandas.

  2. Chunking: Using recursive character splitting for manageable context windows.

  3. Embedding: Each chunk is embedded with sentence-transformers.

  4. Storage: Embeddings go into ChromaDB under the agent’s unique collection.
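
Here's a rough sketch of steps 2 through 4, assuming the parsed text from step 1 is already in hand. The chunk sizes, collection name, and storage path are placeholder choices, not the exact values from my app:

```python
# Chunk, embed, and store parsed text in the agent's ChromaDB collection.
import chromadb
from langchain.text_splitter import RecursiveCharacterTextSplitter
# (in newer LangChain this import lives in langchain_text_splitters)
from sentence_transformers import SentenceTransformer

raw_text = "...parsed output from LlamaParse / BeautifulSoup / Pandas..."

# Step 2: recursive character splitting into context-window-friendly chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(raw_text)

# Step 3: embed every chunk locally
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(chunks).tolist()

# Step 4: store under the agent's own collection
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("product_docs_agent")
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embeddings,
)
```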

5. Chat Time: Local RAG with Ollama

When you type a question, here's what's happening under the hood:

  • Your query is embedded.

  • ChromaDB retrieves the most relevant document chunks.

  • The chunks are stitched into a prompt.

  • Ollama (running something like mistral or llama2) processes it locally and answers.

This means no internet, no OpenAI keys, and complete control. Feels like running ChatGPT on your own terms.
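
In code, a single chat turn could look roughly like this. It reuses the model and collection from the ingestion sketch, and assumes the official ollama Python client talking to a locally running Ollama server:

```python
# One chat turn: embed the question, retrieve chunks, stitch a prompt, ask Ollama.
import ollama

question = "What does the refund policy say?"  # example query

# Embed the query and pull the closest chunks from the agent's collection
query_embedding = model.encode([question]).tolist()
results = collection.query(query_embeddings=query_embedding, n_results=3)
context = "\n\n".join(results["documents"][0])

# Stitch the chunks into a grounded prompt and answer entirely locally
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
reply = ollama.chat(model="mistral", messages=[{"role": "user", "content": prompt}])
print(reply["message"]["content"])
```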

6. Multi-Agent Architecture

Each agent is isolated and has:

  • Its own document collection in Chroma

  • Its own UI view for uploads and queries

  • Context-aware responses, thanks to independent embedding spaces

You can create:

  • A product documentation agent

  • A research assistant for a specific topic

  • An internal legal doc reader

Each one stays in its lane — like a well-trained golden retriever.
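
Under the hood, that isolation is just one Chroma collection per agent. A tiny sketch (the helper and agent names are made up for illustration):

```python
# Each agent maps to its own Chroma collection, so embedding spaces never mix.
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")

def collection_for(agent_name: str):
    """Hypothetical helper: namespace every agent's documents separately."""
    return client.get_or_create_collection(f"agent_{agent_name}")

product_docs = collection_for("product_docs")
legal_reader = collection_for("legal_reader")
# Queries against product_docs never see legal_reader's chunks, and vice versa.
```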

7. Limitations & What's Next

Yes, this is still a prototype, so here's what's not perfect:

  • LLM quality depends on what you run locally (Ollama has come a long way though!)

  • Large files can take a while to process, and very large ones may not work at all

  • UI is functional, not flashy (Streamlit charm)

  • No multi-user support yet (but it's modular for future upgrades)

Planned upgrades:

  • Auth + role-based agent sharing

  • Async doc processing + status view

  • More UI magic for power users

8. Try It Yourself!

This isn't a polished product — it's a working example of what's possible when local AI gets smarter, lighter, and cheaper. You can try it yourself, play with it, and break it lovingly.

📎 Try the app here
