March 2026 — AI researcher Andrej Karpathy has developed a personal knowledge management system that processes approximately 400,000 words of research notes using large language model embeddings rather than traditional retrieval-augmented generation (RAG) architecture. The system, described in a recent social media thread, enables semantic search, summarization, and content generation directly from his Obsidian vault without relying on external RAG pipelines.
What Karpathy Built
Karpathy's system represents an alternative approach to personal knowledge management that bypasses conventional RAG implementations. Instead of building a separate retrieval system that fetches documents to augment LLM prompts, his setup uses LLM embeddings to create a searchable knowledge base that can be queried directly.
The workflow follows several distinct stages:
- Source Dumping: Raw research materials are collected into a designated directory
- Markdown Conversion: An LLM processes these sources into linked Markdown documents
- Metadata Enhancement: The system adds summaries, extracts key concepts, and creates backlinks between related documents
- Obsidian Integration: The processed content is viewable in Obsidian, a popular knowledge management application
- Query Interface: Users can ask questions about the knowledge base using an LLM
- Output Generation: The system can produce notes, slides, or charts based on queries
- Feedback Loop: Generated outputs are fed back into the knowledge base
- Quality Assurance: Automated checks identify gaps and errors in the knowledge base
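Since the actual system is not public, the staged workflow above can only be sketched with stand-ins. In this hypothetical skeleton every function body is a trivial placeholder (plain string handling substitutes for the LLM conversion and metadata stages), and all names are invented for illustration:

```python
# Placeholder sketch of the workflow stages listed above. Every function body
# is a trivial stand-in: string handling simulates the LLM stages.

def convert_to_markdown(raw_sources):
    """Markdown-conversion stage stand-in: wrap each raw source as a note."""
    return [{"title": f"note-{i}", "body": src} for i, src in enumerate(raw_sources)]

def add_metadata(docs):
    """Metadata stage stand-in: attach a naive summary and empty backlinks."""
    for doc in docs:
        doc["summary"] = doc["body"][:60]
        doc["backlinks"] = []
    return docs

def run_pipeline(raw_sources):
    """Source dumping through metadata enhancement. The later stages
    (querying, generation, feedback) would consume the resulting vault."""
    return add_metadata(convert_to_markdown(raw_sources))

vault = run_pipeline(["First raw research dump.", "Second raw research dump."])
```

In a real implementation, `convert_to_markdown` and `add_metadata` would each be one or more LLM calls rather than string operations.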
Technical Approach: Embeddings Over RAG
The key technical distinction from typical knowledge management systems is the avoidance of RAG architecture. A conventional RAG pipeline involves:
- Chunking documents into smaller segments
- Creating vector embeddings for each chunk
- Storing embeddings in a vector database
- Retrieving relevant chunks at query time
- Injecting retrieved context into LLM prompts
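The conventional pipeline above can be compressed into a few lines. In this toy sketch, a bag-of-characters vector stands in for a real embedding model, and a plain list stands in for the vector database; only the chunk/embed/store/retrieve/inject structure is meant to be faithful:

```python
# Toy illustration of the conventional RAG pipeline described above.
# The "embedding" is a bag-of-characters vector so the example runs without
# a model; a real system would call an embedding model and a vector DB.
import numpy as np

def embed(text: str) -> np.ndarray:
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def chunk(doc: str, size: int = 40) -> list[str]:
    return [doc[i:i + size] for i in range(0, len(doc), size)]

# "Vector database": a list of (chunk, embedding) pairs.
store = [(c, embed(c)) for c in
         chunk("Transformers use attention. RNNs process tokens sequentially.")]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(store, key=lambda pair: -float(q @ pair[1]))
    return [c for c, _ in ranked[:k]]

# Inject the retrieved chunk into the prompt sent to the LLM.
context = retrieve("how does attention work?")
prompt = f"Context: {context[0]}\n\nQuestion: how does attention work?"
```

Every one of these components (chunker, embedder, store, retriever, prompt assembly) is a moving part that a non-RAG design gets to drop.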
Karpathy's approach appears to use embeddings differently: likely creating a comprehensive embedding of the knowledge base that enables direct querying without a separate retrieval step. This suggests one of several possibilities:
- A single embedding representing the entire knowledge base structure
- Hierarchical embeddings that capture document relationships
- A hybrid approach where embeddings facilitate navigation rather than retrieval
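One hedged reading of the third possibility, embeddings for navigation rather than retrieval, is to embed whole notes instead of chunks and use similarity only to build links between them, so the LLM is handed complete linked documents rather than retrieved fragments. Everything in this sketch is speculative: the random vectors stand in for real note embeddings.

```python
# Speculative sketch: embeddings build a navigation graph between whole
# notes, rather than powering per-chunk similarity retrieval.
import numpy as np

rng = np.random.default_rng(0)
note_names = ["attention", "optimizers", "tokenization"]
# Stand-in embeddings; a real system would embed the note text itself.
vectors = {name: rng.normal(size=8) for name in note_names}

def nearest(name: str) -> str:
    """Link a note to its most similar sibling by cosine similarity."""
    a = vectors[name]
    best, best_sim = None, -2.0
    for other, b in vectors.items():
        if other == name:
            continue
        sim = float(a @ b) / float(np.linalg.norm(a) * np.linalg.norm(b))
        if sim > best_sim:
            best, best_sim = other, sim
    return best

# The resulting link graph could back Obsidian-style backlinks; answering
# a query then means walking links from the closest note, not retrieving chunks.
links = {name: nearest(name) for name in note_names}
```

Under this design, similarity is computed once at indexing time to structure the vault, not at every query.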
System Capabilities
According to the description, the system enables several specific functions:
- Semantic Search: Finding relevant information across 400,000 words of research notes
- Question Answering: Getting answers to specific questions about the research content
- Content Generation: Creating notes, slides, or charts based on the knowledge base
- Gap Analysis: Identifying missing information or inconsistencies in the research collection
- Concept Mapping: Visualizing relationships between different research topics
Implementation Details
While specific implementation details weren't provided, the system likely involves:
- Local Processing: All processing happens locally, maintaining privacy and control
- Obsidian Integration: Leverages Obsidian's graph view and linking capabilities
- LLM Orchestration: Coordinates multiple LLM calls for different processing stages
- Embedding Models: Uses embedding models to create semantic representations
- Feedback Mechanisms: Incorporates generated content back into the knowledge base
Why This Approach Matters
Karpathy's system represents a departure from the current RAG-dominated landscape for knowledge management with LLMs. By avoiding RAG architecture, the system potentially offers:
- Simpler Architecture: Fewer moving parts than typical RAG pipelines
- Direct Querying: More natural interaction with the knowledge base
- Integrated Workflow: Seamless movement between research collection and querying
- Local Control: Complete privacy and ownership of both data and processing
This approach aligns with Karpathy's historical preference for simple, elegant solutions over complex systems. His previous work on minGPT, nanoGPT, and educational materials has consistently emphasized clarity and accessibility over architectural complexity.
gentic.news Analysis
Karpathy's knowledge management system represents a notable departure from the industry's current RAG obsession. While virtually every enterprise AI implementation now includes some form of RAG for knowledge retrieval, Karpathy's approach suggests there may be simpler alternatives for personal knowledge management use cases.
This development follows Karpathy's pattern of building practical tools for his own workflow needs. His previous projects, from the original char-rnn in 2015 to nanoGPT in 2022, have often started as personal utilities before influencing broader industry practices. The timing is particularly interesting given the current market saturation of RAG-focused startups and tools. Even as companies like Pinecone, Weaviate, and Qdrant have built substantial businesses around vector databases for RAG implementations, Karpathy's approach questions whether all knowledge retrieval needs require such infrastructure.
From a technical perspective, the most intriguing aspect is how the system achieves semantic search without traditional retrieval. One possibility is that it uses embeddings to create a structured representation of the knowledge base that can be navigated directly, rather than using embeddings for similarity search. This could involve techniques like knowledge graph embeddings or hierarchical representations that capture both content and relationships.
The system also reflects the growing trend toward local, privacy-preserving AI tools. As LLMs become more capable of running on consumer hardware, we're seeing increased interest in systems that don't send sensitive data to external APIs. Karpathy's local-first approach aligns with developments like Ollama, LM Studio, and the increasing viability of quantized models on consumer hardware.
For practitioners, the key takeaway isn't necessarily to abandon RAG, but to consider the full spectrum of knowledge management approaches. RAG excels at certain tasks—particularly when dealing with large, frequently updated document collections—but simpler embedding-based approaches may suffice for personal research collections or smaller knowledge bases.
Frequently Asked Questions
How does Karpathy's system differ from traditional RAG?
Traditional RAG systems work by breaking documents into chunks, creating vector embeddings for each chunk, storing them in a vector database, and retrieving relevant chunks at query time to augment LLM prompts. Karpathy's system appears to use embeddings differently—likely creating a comprehensive representation of the entire knowledge base that enables direct querying without the separate retrieval step. This results in a simpler architecture with fewer components.
What are the advantages of avoiding RAG architecture?
Avoiding RAG can lead to several advantages: simpler system architecture with fewer moving parts, potentially faster query times by eliminating the retrieval step, reduced computational overhead from not maintaining a separate vector database, and more direct interaction with the knowledge base. For personal use cases with moderate-sized knowledge bases (like 400,000 words), a non-RAG approach may provide sufficient functionality with less complexity.
Can this approach scale to enterprise knowledge bases?
The scalability of Karpathy's approach compared to traditional RAG is unclear from the available information. RAG systems are specifically designed to handle massive document collections by distributing the search across many vector embeddings. Karpathy's approach might face challenges with very large knowledge bases, but could be optimal for personal or team-sized collections. The boundary where RAG becomes necessary likely depends on specific use cases and performance requirements.
What tools are needed to implement a similar system?
Based on the description, implementing a similar system would require: a local LLM capable of processing documents and answering queries, embedding models to create semantic representations of content, Obsidian or similar knowledge management software, and custom orchestration code to connect these components. The system appears to be custom-built rather than using off-the-shelf tools, suggesting implementation requires significant technical expertise.
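Of the components listed above, the Obsidian-integration piece is the most concrete to illustrate: Obsidian vaults are just directories of Markdown files, so the orchestration code only needs to emit files with YAML frontmatter and `[[wikilinks]]`. This sketch shows that glue layer alone; the LLM calls that would produce the note bodies and link targets are elided, and the contents here are placeholders.

```python
# Minimal sketch of the orchestration glue: write processed notes into an
# Obsidian-compatible vault as Markdown files with YAML frontmatter and
# [[wikilinks]]. LLM calls are elided; contents are placeholders.
from pathlib import Path
import tempfile

def write_note(vault: Path, title: str, body: str, links: list[str]) -> Path:
    frontmatter = f"---\ntitle: {title}\n---\n"
    backlinks = "".join(f"\n[[{l}]]" for l in links)  # Obsidian wikilink syntax
    path = vault / f"{title}.md"
    path.write_text(frontmatter + body + backlinks, encoding="utf-8")
    return path

vault = Path(tempfile.mkdtemp())
p = write_note(vault, "attention", "Notes on attention.", ["transformers"])
```

Because the vault is plain files, Obsidian's graph view and backlink pane work on the output with no plugin or database required.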
How does this relate to other personal knowledge management systems?
Karpathy's system sits at the intersection of several trends: the personal knowledge management movement (tools like Obsidian, Roam Research, Logseq), local AI processing (Ollama, LM Studio), and semantic search. What distinguishes it is the specific avoidance of RAG architecture and the tight integration between document processing, querying, and content generation. Unlike many PKM tools that focus primarily on note-taking, this system emphasizes the entire lifecycle from research collection to output generation.