SETUP GUIDE • v1.0.0

Getting Dhi Running in < 10 Minutes

The full stack runs on your machine in Docker, so your notes and search index stay under your control. You need Docker Desktop, Git, and a Telegram bot token. That's it.

Docker Desktop • Git • 8 GB RAM minimum • ~6 GB disk • Telegram account (for the bot)

01 / CLONE

Get the project onto your machine.

Terminal

git clone https://github.com/Arya-shivam/SecondBrainRAG.git
cd SecondBrainRAG

02 / CONFIGURE

Three values are required: your Obsidian vault path (where Dhi writes notes), your Telegram bot token, and your Telegram user ID. Create the bot with @BotFather on Telegram first to get the token.

Terminal

cp .env.example .env

.env — Required Variables

# Path to your local Obsidian vault
OBSIDIAN_VAULT_PATH=/absolute/path/to/your/vault

# Telegram bot token from @BotFather
TELEGRAM_BOT_TOKEN=123456:ABC-your-token-here

# Your Telegram user ID (so only you can use it)
TELEGRAM_ALLOWED_USER_ID=123456789

TIP — Find your Telegram user ID by messaging @userinfobot.
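
A quick pre-flight check can catch a bad .env before the containers start. The sketch below is not part of the project: the function names parse_env and check_env are hypothetical, and the token-shape check (a colon between an ID and a secret) is only inferred from the example value above.

```python
from pathlib import Path

# The three variables listed in the .env example above.
REQUIRED = ("OBSIDIAN_VAULT_PATH", "TELEGRAM_BOT_TOKEN", "TELEGRAM_ALLOWED_USER_ID")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems = [f"{name} is missing or empty" for name in REQUIRED if not values.get(name)]

    vault = values.get("OBSIDIAN_VAULT_PATH")
    if vault:
        path = Path(vault)
        if not path.is_absolute():
            problems.append("OBSIDIAN_VAULT_PATH must be an absolute path")
        elif not path.is_dir():
            problems.append("OBSIDIAN_VAULT_PATH does not exist or is not a directory")

    user_id = values.get("TELEGRAM_ALLOWED_USER_ID")
    if user_id and not user_id.isdigit():
        problems.append("TELEGRAM_ALLOWED_USER_ID must be numeric")

    # Assumption: tokens look like <bot id>:<secret>, as in the example value.
    token = values.get("TELEGRAM_BOT_TOKEN")
    if token and ":" not in token:
        problems.append("TELEGRAM_BOT_TOKEN should contain a ':' separator")

    return problems
```

To use it, run check_env(parse_env(Path(".env").read_text())) from the project root and fix anything it reports before step 03.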

03 / LAUNCH

One command starts the full stack.

Service              Port
FastAPI (backend)    :8000
OpenSearch           :9200
PostgreSQL           :5432
Airflow              :8080
Langfuse             :3000
Telegram Bot         long-poll

Terminal

docker compose up -d

Verify everything is healthy

$ docker compose ps
NAME                STATUS
dhi-api             healthy
dhi-opensearch      healthy
dhi-postgres        healthy
dhi-airflow         healthy
dhi-langfuse        healthy
dhi-telegram-bot    running

API health check

curl http://localhost:8000/health
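
Containers can take a while to become healthy after docker compose up -d, so a single curl may fail on the first try. The sketch below polls the same /health URL until it answers with HTTP 200. It assumes only that a healthy API returns status 200; the shape of the response body is not documented here, so it is not inspected. The names http_ok and wait_until_healthy are hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or non-2xx status: not healthy yet.
        return False


def wait_until_healthy(
    url: str = "http://localhost:8000/health",
    attempts: int = 30,
    delay_s: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until probe succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return True
        if attempt < attempts:
            sleep(delay_s)
    return False
```

Calling wait_until_healthy() with the defaults waits up to about a minute. The probe and sleep parameters exist so the loop can be exercised without a running stack.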

04 / INGEST

There are three ways to ingest content. Each one indexes the content in OpenSearch and writes a linked Markdown note to your Obsidian vault.

Option A — Telegram (recommended, from your phone)

Send any of these to @YourDhiBot:

https://youtube.com/watch?v=...    # YouTube transcript
https://example.com/article        # Web article
Any raw text or note               # Plain thought
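
The three message kinds above imply a routing step on the bot side. The sketch below shows one way such routing could work; it is not taken from telegram_bot.py. Only the "youtube" content type appears in this guide (in the Option C request); the labels "article" and "text", the host list, and the function name classify_message are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Assumption: these hosts are treated as YouTube links.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_message(text: str) -> tuple[str, str]:
    """Return (content_type, payload) for an incoming message."""
    stripped = text.strip()

    # Treat the message as a URL only if it is a single http(s) token.
    if len(stripped.split()) != 1:
        return "text", stripped
    try:
        parsed = urlparse(stripped)
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. unbalanced brackets.
        return "text", stripped
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return "text", stripped

    host = (parsed.hostname or "").lower()
    if host in YOUTUBE_HOSTS:
        return "youtube", stripped
    return "article", stripped
```

A message that mixes prose with a link is deliberately kept as plain text here, so a note that merely mentions a URL is not replaced by the page it points to.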

Option B — Chrome Extension (one-click from browser)

1. Load the chrome-extension/ folder in Chrome (Developer Mode)
2. Click the Dhi icon on any page or YouTube video
3. A desktop notification confirms when the item is saved

Option C — Direct API

curl -X POST http://localhost:8000/api/ingest \
  -H "Content-Type: application/json" \
  -d '{"source_url": "https://youtu.be/...", "content_type": "youtube"}'
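
The same request can be sent from Python using only the standard library. This sketch mirrors the curl call above: the endpoint path and the two JSON fields come from that call. The response format is not documented in this guide, so the code assumes (without confirmation) that the API replies with JSON. The names build_ingest_request and ingest are hypothetical.

```python
import json
import urllib.request


def build_ingest_request(
    source_url: str,
    content_type: str,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the POST /api/ingest request without sending it."""
    body = json.dumps({"source_url": source_url, "content_type": content_type})
    return urllib.request.Request(
        f"{base_url}/api/ingest",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ingest(source_url: str, content_type: str, timeout: float = 60.0):
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_ingest_request(source_url, content_type)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, so a failed ingest
    # surfaces as an exception instead of passing silently.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect when an ingest fails.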

05 / QUERY

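The /ask endpoint retrieves candidates with hybrid BM25 + vector search, merges the two ranked lists with Reciprocal Rank Fusion (RRF), then generates a grounded answer using Llama 3.1 via OpenRouter. OpenRouter is a hosted API, so this step sends the question and the retrieved context off your machine.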

POST /ask

curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What did the video say about attention mechanisms?",
    "top_k": 5
  }'

Response

{
  "answer": "The video explains that attention is all you need... [src:yt_dQw4w]",
  "sources": [
    { "title": "Transformers Explained", "score": 0.94 }
  ],
  "latency_ms": 1620
}
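
The RRF fusion step named at the top of this section can be sketched in a few lines. Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every ranked list it appears in, so a document ranked well by both BM25 and vector search rises to the top. This is a generic illustration, not Dhi's implementation: the function name rrf_fuse and the document IDs are hypothetical, and k = 60 is the commonly used constant, not a value confirmed for this project.

```python
from collections import defaultdict


def rrf_fuse(
    rankings: list[list[str]],
    k: int = 60,
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best hit in each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]


# Hypothetical hits from the two retrievers, best first.
bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = rrf_fuse([bm25_hits, vector_hits])
```

RRF uses only rank positions, never the raw scores, which is why it can combine BM25 scores and vector similarities even though they live on different scales.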

STACK REFERENCE

INGESTION

yt_fetcher.py          youtube-transcript-api
universal_ingest.py    pymupdf, trafilatura, python-docx

STORAGE

PostgreSQL 16     documents, chunks
OpenSearch        BM25 + 768d kNN
Obsidian Vault    Markdown + Wikilinks

RETRIEVAL

LlamaIndex    Hybrid BM25 + vector, RRF fusion
asyncpg       async DB ops

INTERFACES

FastAPI             /api/ingest, /ask
Telegram Bot        telegram_bot.py
Chrome Extension    one-click capture
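
The STORAGE entry "BM25 + 768d kNN" implies an OpenSearch index that holds both a text field (scored with BM25 by default) and a 768-dimensional vector field. The sketch below builds such an index body as a plain dictionary. The field names text, embedding, and document_id and the function name build_index_body are hypothetical; the project's actual mapping may differ. The knn_vector type and the index.knn setting are the standard OpenSearch k-NN plugin options.

```python
import json


def build_index_body(dimension: int = 768) -> dict:
    """Return a create-index body for hybrid lexical + vector search."""
    return {
        # Enables k-NN search structures for this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Full-text field; OpenSearch scores it with BM25 by default.
                "text": {"type": "text"},
                # Dense embedding; dimension must match the embedding model.
                "embedding": {"type": "knn_vector", "dimension": dimension},
                # Hypothetical link back to the PostgreSQL documents table.
                "document_id": {"type": "keyword"},
            }
        },
    }


# Serialised form, as it would appear in a create-index request body.
index_body_json = json.dumps(build_index_body(), indent=2)
```

Nothing here talks to a server; the dictionary only illustrates how one index can serve both halves of the hybrid search.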

TROUBLESHOOTING

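OpenSearch fails to start
OpenSearch needs a higher memory-map limit than most Linux hosts set by default. Raise it on the host (requires root; the setting resets on reboot):

sudo sysctl -w vm.max_map_count=262144

Telegram bot not responding
Check the token and your allowed user ID in .env, then restart:

docker compose restart dhi-telegram-bot

If you edited .env, recreate the container instead, because docker compose restart does not reload environment variables:

docker compose up -d --force-recreate dhi-telegram-bot

Ingest silently fails
Check the API logs for the real error:

docker compose logs dhi-api --tail 50

Notes not appearing in Obsidian
Verify that OBSIDIAN_VAULT_PATH in .env is an absolute path. Docker mounts that directory into the container as a volume, so it must exist before you run docker compose up.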