Karpathy Self-Improving AI Knowledge Base

Last updated: 24 May 2026

Build a compounding personal knowledge base that uses an AI librarian to organise, link, and surface your information — growing smarter every time you use it.

// TL;DR

The Karpathy Self-Improving AI Knowledge Base is a framework for building a personal knowledge system where an AI librarian (like Claude) automatically organizes, links, summarizes, and indexes all your saved articles, notes, and insights. You dump raw material into a folder, the AI compiles a structured Wiki, and every query you make feeds back into the system — making it smarter over time. Use it when you're tired of losing saved content across apps and want a compounding second brain that grows more valuable with each interaction, without you ever having to manually organize anything.


// When should you use the Karpathy Self-Improving AI Knowledge Base?

Use this skill when you want to stop losing saved articles, notes, and insights and instead build a queryable second brain that improves itself over time without requiring you to be the librarian.

// What do you need to get started with the Karpathy AI Knowledge Base?

Subject or domain focus (required)
The specific topic or professional domain this knowledge base will cover, e.g. 'productivity', 'marketing strategy', 'investing'.
Existing raw knowledge material (required)
Articles, notes, meeting transcripts, book quotes, screenshots, PDFs, or any content the user already has on this topic.
AI environment (required)
The AI tool being used as the librarian (e.g. Claude with file-system access, Claude Cowork, or equivalent). Must be able to read and write local files or a connected folder.
Anti-AI writing style guide
A set of writing rules instructing the AI to avoid generic AI prose. One way to generate it: paste Wikipedia's AI writing style article into the AI and ask it to write rules that forbid everything the article describes.
Themed focus areas
Three to five specific sub-themes within the domain that the knowledge base should deepen, used to tune the Claude MD.

// What are the core principles behind the Karpathy AI Knowledge Base?

AI as Librarian

You are never the librarian. Your only job is to dump information into Raw. The AI organises, links, summarises, and indexes everything. The moment you start manually organising, you've broken the system.

Raw as Junk Drawer

Raw is a folder for capture, not organisation. Articles, notes, transcripts, screenshots, PDFs — everything goes in unedited and unsorted. Do not make it pretty. Tidiness is the AI's job.

Compounding Loop

Every answer the AI generates is saved to Outputs and later ingested back into the system, so each question makes the next answer better. The value compounds with use: day one is basic, day 100 is a company asset nobody else has.

Claude MD as Schema

The Claude MD file at the root of each knowledge base is the instruction layer. It tells the AI how to read, organise, ingest, run health checks, and behave as a librarian. The system only works correctly if this file is precise and up to date.
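
The build steps later in this guide list seven things the schema must specify. As a rough sketch, a starter skeleton covering those seven areas could be generated like this. The heading wording, the `render_schema` name, and the TODO placeholders are illustrative assumptions, not a required format; the real rules should still be worked out with the AI.

```python
from __future__ import annotations

# Section titles mirror the seven items the build steps say the schema must cover.
# The exact headings are an illustrative assumption, not a required format.
SECTIONS = [
    "Subject and focus areas",
    "Folder roles",
    "Ingestion rules",
    "Wiki rules",
    "Output rules",
    "Health check",
    "Memory file rules",
]


def render_schema(subject: str, themes: list[str]) -> str:
    """Return a skeleton schema file as markdown text for the AI to refine."""
    if not 3 <= len(themes) <= 5:
        raise ValueError("expected three to five themed focus areas")
    lines = [f"# Knowledge base: {subject}", ""]
    for title in SECTIONS:
        lines.append(f"## {title}")
        if title == "Subject and focus areas":
            lines.extend(f"- {theme}" for theme in themes)
        elif title == "Folder roles":
            lines.extend([
                "- Raw: intake, never sorted by hand",
                "- Wiki: AI-written, never edited by hand",
                "- Outputs: generated reports",
            ])
        else:
            # Placeholder only; these rules are meant to be refined iteratively.
            lines.append("TODO: agree these rules with the AI before ingesting.")
        lines.append("")
    return "\n".join(lines)
```

Calling `render_schema("negotiation", [...])` with three to five themes returns text you could save as the schema file and then improve with the AI before any ingestion.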

Wiki is AI-Only Territory

The Wiki folder is never edited by hand. All content there is written and maintained exclusively by the AI. Human edits will introduce drift and break the integrity of the system.

Monthly Health Check

Once a month the AI audits the entire Wiki for contradictions, unsourced claims, orphaned references, stale articles, and coverage gaps — then proposes and drafts new articles to fill them. This is what prevents the system from slowly degrading.
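
In this framework the AI performs the audit itself, but part of it is mechanical enough to sketch. The example below checks for links that point at pages that do not exist and for pages nothing links to. It assumes pages link to each other with `[[Page Name]]` wikilinks and treats an "orphan" as a page with no inbound links; both are assumptions, since the source does not specify a link syntax. The `audit_links` name is hypothetical.

```python
from __future__ import annotations

import re

# Assumption: pages cross-link with [[Page Name]] wikilinks, optionally written
# as [[Page Name|label]] or [[Page Name#Section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def audit_links(pages: dict[str, str], index: str = "index") -> dict[str, list]:
    """Find links that point nowhere and pages that nothing links to.

    `pages` maps a page name (file name without .md) to its markdown text.
    """
    broken = []          # (source page, missing target) pairs
    linked_to = set()    # pages that receive at least one link from another page
    for name, text in pages.items():
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            if target not in pages:
                broken.append((name, target))
            elif target != name:  # a self-link does not count as inbound
                linked_to.add(target)
    # The index is the entry point, so it is never reported as an orphan.
    orphans = sorted(p for p in pages if p not in linked_to and p != index)
    return {"broken": sorted(broken), "orphans": orphans}
```

A script like this can only cover the structural checks. Contradictions, provenance, and coverage gaps still need the AI to read the articles.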

Multiple Independent Knowledge Bases

A top-level second brain folder acts as a container. Inside it, each subject gets its own self-contained knowledge base folder with its own Claude MD. They remain independent but can be queried together.
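
A minimal sketch of that container layout, using the Raw, Wiki, and Outputs subfolders described in the build steps. The file names `CLAUDE.md` and `CHANGELOG.md` are assumptions (the source only calls them "Claude MD" and "Change Log MD"), as is placing a change log inside each domain folder.

```python
from __future__ import annotations

from pathlib import Path

# Assumed file names; adjust to whatever your AI environment expects.
SCHEMA_FILE = "CLAUDE.md"
CHANGELOG_FILE = "CHANGELOG.md"
SUBFOLDERS = ("Raw", "Wiki", "Outputs")


def scaffold(container: Path, domains: list[str]) -> list[Path]:
    """Create one self-contained knowledge base per domain inside `container`.

    Existing file contents are preserved, so the function is safe to re-run.
    Returns the knowledge base folders.
    """
    container.mkdir(parents=True, exist_ok=True)
    (container / SCHEMA_FILE).touch(exist_ok=True)  # optional top-level schema
    created = []
    for domain in domains:
        kb = container / domain
        for sub in SUBFOLDERS:
            (kb / sub).mkdir(parents=True, exist_ok=True)
        (kb / SCHEMA_FILE).touch(exist_ok=True)      # per-domain schema
        (kb / CHANGELOG_FILE).touch(exist_ok=True)   # per-domain memory file
        created.append(kb)
    return created
```

For example, `scaffold(Path.home() / "second-brain", ["negotiation-kb", "ux-research-kb"])` would create two independent knowledge bases side by side. The files start empty; writing the schema itself is still a job to do with the AI.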

// How do you build a Karpathy AI Knowledge Base step by step?

1. Build the folder architecture
Create a top-level second brain folder. Inside it, create one folder per knowledge base domain. Each domain folder must contain three subfolders (Raw, Wiki, Outputs) plus a Claude MD file at its root. Optionally add a top-level Claude MD that describes the container structure for all knowledge bases. Also create a Change Log MD file, which doubles as the system's memory by recording when ingestion and health checks last ran. Do not skip the Claude MD; without it the AI has no schema to operate from.
2. Write the Claude MD schema file
The Claude MD must specify: (1) the subject and themed focus areas, (2) the folder roles — Raw as intake, Wiki as AI-written output, Outputs as generated reports, (3) ingestion rules — how the AI should process new Raw files and what constitutes 'processed', (4) wiki rules — one MD file per major topic, an index MD first, cross-links between related topics, anti-AI writing style applied, (5) output rules — every question-and-answer generates a report saved to Outputs, every report is presented as a clickable page in chat, (6) health check schedule and seven-stage audit process, (7) memory file rules — the system logs the last action date so it knows what is new. Work with the AI iteratively to improve this file before ingesting content.
3. Dump all existing material into Raw
Copy articles, notes, meeting transcripts, and book quotes into the Raw folder as markdown files, and drop in screenshots and PDFs as they are. Do not organise or rename them. You can paste content directly into the AI chat and instruct it to save each item as an MD file in Raw. On Mac, Xcode (free) lets you create markdown files quickly by selecting File > New from Template > Markdown File. The Obsidian web clipper browser extension converts any web page to a clean markdown file in one click. PDFs are harder for the AI to parse; use them but expect lower fidelity. This step should take no more than 10–15 minutes for most people.
4. Build the Wiki
Point the AI at the knowledge base folder and give it this single prompt: 'Read everything in Raw and compile a Wiki in the Wiki folder following the rules in your Claude MD. Create the index MD first, then one MD file per major topic and link related topics.' Walk away and let it run — this takes around 30 minutes. Ensure the AI applies the anti-AI writing style guide during this step. What you get back is: topic pages with summaries, discovered connections between ideas, and a searchable index. If sessions are long, split across multiple sittings. You do not need RAG, vector databases, or embeddings — the LLM maintains an index and reads what it needs.
5. Query the knowledge base and save outputs back
Start a new session, point the AI at the knowledge base folder, and ask a question relevant to your domain. The AI reads the index, pulls the most relevant Wiki entries, and generates an answer citing sources. The answer must be saved as a report in Outputs — update the Claude MD if needed to enforce this rule. After reviewing the output, identify gaps: ask 'Based on everything in the Wiki, what are the three biggest gaps in my understanding of this topic?' Save that gap report to Outputs too. These outputs feed the next health check and improve future answers. This compounding loop is what makes the system grow.
6. Run a monthly health check
Once a month, run a seven-stage audit across the entire Wiki: (1) contradictions and inconsistent data between articles, (2) broken backlinks and orphaned references, (3) source provenance, meaning claims not backed by a source in Raw, (4) coverage gaps relative to what Raw contains, (5) stale articles older than 90 days that are no longer relevant, (6) suggested new article candidates based on gaps, (7) suggested connections between articles not yet drawn. In phase one, the AI files a health check report in Outputs and updates the Change Log. In phase two (interactive mode), the AI presents an action menu: you choose which findings to action, and the AI drafts, edits, and ingests accordingly. Schedule this as an automated task if your AI environment supports it, set to a different day per knowledge base so credits are spread across the month.
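
One simple way to pick "a different day per knowledge base" is to spread the checks evenly across the month. The sketch below is an illustration only: the `stagger_health_checks` name and the 28-day window are assumptions, and the actual scheduling depends on what your AI environment supports.

```python
from __future__ import annotations


def stagger_health_checks(kb_names: list[str], window: int = 28) -> dict[str, int]:
    """Assign each knowledge base a day of the month for its health check.

    Days are spread evenly across days 1..window. A window of 28 is used so
    every assigned day exists in every month, including February.
    """
    names = sorted(set(kb_names))  # sorted so the assignment is stable between runs
    if not names:
        return {}
    step = max(1, window // len(names))
    # With more knowledge bases than days, the modulo wraps and days are shared.
    return {name: 1 + (i * step) % window for i, name in enumerate(names)}
```

With three knowledge bases the checks land on days 1, 10, and 19, so no two audits compete for credits in the same week.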

// What are real-world examples of the Karpathy AI Knowledge Base in action?

A solo consultant wants a knowledge base on client communication and negotiation strategy. They have saved blog posts, book highlights, and notes from past client debrief meetings scattered across Notion and their downloads folder.

Create a folder called 'negotiation-kb' inside a top-level second brain folder. Write a Claude MD specifying three focus themes: persuasion principles, difficult conversation frameworks, and commercial negotiation tactics. Dump all the Notion exports, blog posts, and meeting notes into Raw as markdown files. Run the Wiki build prompt. After the Wiki is compiled, query: 'What does my knowledge base say about handling a client who disputes scope mid-project?' Save the report to Outputs. Run a health check at month end — the AI will likely flag that the Wiki lacks content on written vs verbal negotiation, and propose new article candidates. Draft those articles and ingest them.

A product manager wants to build a team knowledge base on UX research methods, drawing on their team's past research reports, saved articles, and conference talk notes.

Create a second brain folder with a domain folder called 'ux-research-kb'. Write a Claude MD with focus themes: qualitative research methods, synthesis techniques, and communicating findings to stakeholders. Note that the default Karpathy architecture assumes solo use — the Claude MD should be updated to acknowledge collaborative inputs and attribute sources to team members. Dump all team research reports as markdown. After building the Wiki, query: 'What methods does our knowledge base currently recommend for rapid generative research?' The gap report will likely surface that the base contains no content on remote or async research — use this to commission and ingest new content deliberately.

// What mistakes should you avoid when building an AI Knowledge Base?

Editing the Wiki folder by hand — this breaks AI-maintained integrity and introduces drift that compounds over time.
Over-organising the Raw folder before ingestion — Raw is a junk drawer by design; tidying it defeats the purpose and wastes time.
Skipping or under-specifying the Claude MD — without a precise schema the AI has no consistent instruction layer and outputs will be inconsistent across sessions.
Forgetting to save outputs back into the system — the compounding loop only works if answers and gap reports are saved to Outputs and eventually ingested back.
Running all health checks on the same day across multiple knowledge bases — this exhausts session credits unnecessarily; stagger them across the month.
Building the entire knowledge base in one session — the Wiki build and health checks are credit-intensive; plan for multiple sessions or a paid/max plan.
Treating day-one output as the final product — the system is deliberately weak at the start and only becomes a genuine asset around day 100 with consistent use and re-ingestion.
Ignoring the anti-AI writing style guide — without it, Wiki articles will accumulate generic AI prose that degrades readability and trustworthiness over time.
Not adding a memory file or Change Log — without it, the AI cannot tell what is new in Raw versus already processed, leading to duplicate or missed ingestion.
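
To make the last point concrete, here is a rough sketch of how a Change Log can separate new Raw files from already-processed ones. The log line format (`YYYY-MM-DD <action> <note>`), the `ingest` action name, and both function names are assumptions made for illustration; the source does not define a log format.

```python
from __future__ import annotations

from datetime import date


def last_run(changelog: str, action: str) -> date | None:
    """Return the most recent date logged for `action`, or None if never run."""
    dates = []
    for line in changelog.splitlines():
        parts = line.split()
        # Assumed line format: "YYYY-MM-DD <action> [free-text note]"
        if len(parts) >= 2 and parts[1] == action:
            try:
                dates.append(date.fromisoformat(parts[0]))
            except ValueError:
                continue  # skip lines whose first field is not a date
    return max(dates, default=None)


def pending_files(raw_files: dict[str, date], changelog: str) -> list[str]:
    """List Raw files added on or after the last logged ingestion.

    `raw_files` maps a file name to the date it was added. Files added on the
    same day as the last ingestion are included, which risks re-processing a
    file but avoids silently missing one.
    """
    cutoff = last_run(changelog, "ingest")
    if cutoff is None:
        return sorted(raw_files)  # nothing has been ingested yet
    return sorted(name for name, added in raw_files.items() if added >= cutoff)
```

Without the log, `pending_files` has no cutoff and returns every file, which is the duplicate-ingestion failure described above.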

// What do the key terms in the Karpathy AI Knowledge Base mean?

Second Brain: A personal knowledge base where you hold all your information, make connections between ideas, and use it to inform decisions and output — external to your own memory.
Raw: The junk drawer folder. All incoming material — articles, notes, screenshots, transcripts, PDFs — is dropped here unorganised. The AI processes it; the human never sorts it.
Wiki: The AI-written, AI-maintained organised version of Raw. One markdown file per major topic, cross-linked, indexed, and summarised. Never edited by hand.
Outputs: The folder where every AI-generated answer, briefing, report, or health check result is saved. Outputs feed back into the system to improve future answers.
Claude MD: The schema file at the root of each knowledge base. It instructs the AI on how to read, organise, ingest, generate outputs, and run health checks. The operating instructions for the AI librarian.
AI as Librarian: The core principle: the AI — not the human — is responsible for organising, linking, summarising, and indexing all content. The human's only job is to dump information and ask questions.
Compounding Loop: The feedback cycle in which outputs (answers, reports) are saved back into the system, making each subsequent answer better than the last.
Health Check: A monthly seven-stage audit run by the AI across the entire Wiki to find contradictions, broken backlinks and orphaned references, unsourced claims, coverage gaps, stale articles, new article candidates, and missing connections between articles.
Ingestion: The process by which the AI reads new files in Raw, processes them, and either updates the Wiki or flags them — logging the action in the Change Log and memory file.
Change Log: A markdown file that doubles as the system's memory, recording when ingestion runs, health checks, and edits last occurred so the AI knows what is new on each pass.
Anti-AI Writing Style Guide: A set of writing rules — derived from Wikipedia's AI writing style guidelines — that instructs the AI to avoid generic AI prose when writing Wiki articles.
Guided Ingestion Mode: An optional interactive process where the AI walks the user through ingesting new material step by step, rather than the user dumping files silently.

// FREQUENTLY ASKED QUESTIONS

What is the Karpathy Self-Improving AI Knowledge Base?

It's a framework where you dump raw notes, articles, and transcripts into a folder and an AI (like Claude) automatically organizes them into a structured, cross-linked Wiki. A schema file called Claude MD instructs the AI on how to ingest, organize, and audit content. Every question you ask and every answer generated gets saved back, creating a compounding loop that makes the system smarter with every use.

What is the Claude MD file and why is it important?

The Claude MD is a markdown schema file placed at the root of each knowledge base folder. It tells the AI how to read Raw files, build the Wiki, generate outputs, apply writing style rules, and run monthly health checks. Without a precise Claude MD, the AI has no consistent instruction layer and outputs will vary unpredictably across sessions. It's the operating system for your AI librarian.

How do I set up a Karpathy-style AI knowledge base from scratch?

Create a top-level second brain folder, then a domain subfolder with three subfolders inside: Raw, Wiki, and Outputs. Add a Claude MD schema file at the domain root specifying subject focus, folder roles, ingestion rules, wiki rules, and health check procedures. Dump all your existing notes and articles into Raw as markdown files. Then prompt your AI to read everything in Raw and compile a Wiki following the Claude MD instructions.

How do I add new content to my AI knowledge base?

Drop new articles, notes, transcripts, screenshots, or PDFs into the Raw folder as markdown files — don't organize or rename them. You can paste content directly into the AI chat and instruct it to save each item to Raw. Tools like Obsidian's web clipper can convert web pages to markdown in one click. Then run an ingestion pass where the AI processes new Raw files and updates the Wiki accordingly.

How does the Karpathy AI knowledge base compare to using Notion or Obsidian alone?

Traditional tools like Notion and Obsidian require you to manually organize, tag, and link content — making you the librarian. The Karpathy method offloads all organization to the AI. You never sort, tag, or structure anything yourself. The AI builds the index, writes summaries, discovers connections, and runs monthly audits. Notion and Obsidian can still serve as capture tools, but the AI handles everything after the initial dump.

When should I use the Karpathy Self-Improving AI Knowledge Base?

Use it when you have valuable information scattered across apps, downloads, and note-taking tools and you keep losing or forgetting what you saved. It's ideal when you want a queryable second brain that organizes itself, especially if you're a consultant, researcher, product manager, or knowledge worker who needs to synthesize large volumes of material without spending hours filing and tagging.

What results can I expect from using this AI knowledge base system?

Day one output is basic — the AI can only work with what you've fed it. By day 30 with regular use, you'll have a structured Wiki with cross-linked topics and discoverable connections you didn't see yourself. By day 100 with consistent dumping and querying, you'll have a compounding asset that gives increasingly specific, well-sourced answers. Monthly health checks ensure the system doesn't degrade over time.

What is the compounding loop in the Karpathy knowledge base?

The compounding loop is the feedback cycle where every AI-generated answer, report, or gap analysis gets saved back into the Outputs folder and eventually re-ingested into the system. Each question you ask improves the next answer because the AI has more context to draw from. This is what makes the knowledge base grow smarter over time rather than remaining static like a traditional note-taking system.

Do I need RAG or vector databases for this knowledge base?

No. The Karpathy method does not require RAG, vector databases, or embeddings. The AI maintains a markdown index file in the Wiki and reads relevant entries as needed. The structure relies on the AI's ability to parse markdown files within a connected folder system. This makes it accessible to non-technical users — you only need an AI tool like Claude with file-system access.

What is a health check in the Karpathy AI knowledge base?

A health check is a monthly seven-stage audit the AI runs across the entire Wiki. It checks for contradictions between articles, broken backlinks, unsourced claims, coverage gaps, stale content older than 90 days, suggested new articles, and missing connections. The AI files a report in Outputs and presents an action menu so you can choose which findings to address. This prevents the system from slowly degrading.
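
The 90-day staleness rule is the easiest stage to express precisely. The sketch below flags candidates by age only; deciding whether an old article is still relevant remains a judgment for the AI. The `stale_pages` name and the idea of tracking a last-updated date per page are assumptions for illustration.

```python
from __future__ import annotations

from datetime import date, timedelta


def stale_pages(last_updated: dict[str, date], today: date,
                max_age_days: int = 90) -> list[str]:
    """Return pages whose last update is more than `max_age_days` days ago."""
    cutoff = today - timedelta(days=max_age_days)
    # A page updated exactly on the cutoff is 90 days old, not older, so it passes.
    return sorted(name for name, updated in last_updated.items() if updated < cutoff)
```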
