Why I Built a RAG Pipeline Without LangChain (And What I Learned)
When I started building AskMyCourse, the natural first instinct was to reach for LangChain. It is, after all, the default framework most tutorials recommend for anything involving retrieval-augmented generation. But after a week of fighting with opaque abstractions, callback chains that swallowed errors silently, and configuration objects three layers deep, I made a decision that turned out to be one of the best architectural choices of the project: I threw it all out and built the retrieval pipeline from scratch.
The core of the system uses a hybrid retrieval strategy that combines dense vector search (OpenAI embeddings stored in a PostgreSQL pgvector index) with sparse BM25 keyword matching. The two result sets are merged using Reciprocal Rank Fusion (RRF), which scores each document by summing the reciprocal of its rank position in each result list rather than using the raw scores. The elegance of RRF is that it needs no score normalization between fundamentally different ranking systems -- because it only looks at ranks, cosine similarities and BM25 scores never have to be put on a common scale. In practice, this gave us noticeably better recall on edge-case queries where pure semantic search would miss keyword-heavy technical content and pure keyword search would miss paraphrased questions.
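To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion. It is an illustration, not the AskMyCourse code: the function name, the chunk ids, and the constant k=60 (a commonly used default for RRF) are all assumptions on my part, and the inputs are assumed to be lists of chunk ids already sorted best-first by each retriever.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first rankings of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the rankings that
    contain it, with ranks starting at 1. k=60 is an assumed default,
    not a value taken from the post.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
dense_hits = ["c2", "c1", "c3"]   # e.g. from the vector index
sparse_hits = ["c1", "c4", "c2"]  # e.g. from BM25

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['c1', 'c2', 'c4', 'c3']
```

Note that the function never sees a similarity or BM25 score, only positions in each list, which is why the two retrievers need no shared scale. A chunk that appears in both lists (c1, c2) outranks one that appears in only one (c4, c3).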
The citation algorithm was another piece I built from the ground up. Most RAG implementations punt on citation quality -- they either dump all retrieved chunks as "sources" regardless of whether the model actually used them, or they rely on the LLM to self-cite (which is unreliable). Instead, I implemented a post-generation alignment step that maps spans of the generated answer back to specific source chunks using a combination of token overlap scoring and semantic similarity. The result is that every claim in the response carries a verifiable citation, and users can click through to the exact passage in their course material. This was the feature that drove the highest user satisfaction scores in early testing.
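The alignment idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the production algorithm: it treats each sentence as a span, blends the two signals with a hypothetical weight `alpha`, drops citations below a hypothetical `threshold`, and uses character-trigram cosine as a stand-in for real embedding similarity so the example runs without any external service. All function names and the chunk ids are made up for the example.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_overlap(span, chunk):
    """Fraction of the span's distinct tokens that also occur in the chunk."""
    span_tokens = set(tokens(span))
    if not span_tokens:
        return 0.0
    return len(span_tokens & set(tokens(chunk))) / len(span_tokens)


def trigram_cosine(a, b):
    """Stand-in for embedding similarity: cosine over character trigram counts."""
    def grams(text):
        s = " ".join(tokens(text))
        return Counter(s[i:i + 3] for i in range(len(s) - 2))

    ga, gb = grams(a), grams(b)
    dot = sum(count * gb[g] for g, count in ga.items())
    norm = math.sqrt(sum(v * v for v in ga.values())) * math.sqrt(
        sum(v * v for v in gb.values())
    )
    return dot / norm if norm else 0.0


def align_citations(answer, chunks, similarity=trigram_cosine,
                    alpha=0.5, threshold=0.4):
    """Map each sentence of the answer to its best-supporting chunk id.

    chunks is a dict of chunk id -> chunk text. A sentence whose best
    blended score falls below the threshold is mapped to None.
    """
    spans = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    citations = []
    for span in spans:
        best_id, best_score = None, 0.0
        for chunk_id, chunk_text in chunks.items():
            score = (alpha * token_overlap(span, chunk_text)
                     + (1 - alpha) * similarity(span, chunk_text))
            if score > best_score:
                best_id, best_score = chunk_id, score
        citations.append((span, best_id if best_score >= threshold else None))
    return citations


# Hypothetical course chunks and a generated answer.
chunks = {
    "lec3": "Gradient descent updates the weights in the direction of the negative gradient.",
    "lec7": "A binary search tree keeps keys in sorted order for fast lookup.",
}
answer = ("Gradient descent updates weights along the negative gradient. "
          "Binary search trees keep keys in sorted order.")
for span, chunk_id in align_citations(answer, chunks):
    print(chunk_id, "<-", span)
```

In a real pipeline, the `similarity` argument would be swapped for cosine similarity over the same embeddings used for retrieval, and the weight and threshold would be tuned against labeled examples. Returning None for weakly supported sentences is one way to avoid attaching a citation the source does not actually back.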
The lesson here is not that LangChain is bad -- it is a useful tool for prototyping and for teams that need to move fast on standard patterns. The lesson is that when your retrieval quality is the product, you need to own every layer of the pipeline. Abstractions are a tradeoff: they speed up the common case but make the uncommon case (which is where your differentiation lives) much harder to debug and optimize. For AskMyCourse, going framework-free meant I could iterate on retrieval quality in hours instead of days, and the final system outperformed every LangChain-based prototype I had built earlier.