Karpathy Self-Improving AI Knowledge Base

Build a compounding personal knowledge base that uses an AI librarian to organise, link, and surface your information — growing smarter every time you use it.

// TL;DR

The Karpathy Self-Improving AI Knowledge Base is a system for building a personal second brain where an AI acts as your librarian. You dump articles, notes, transcripts, and PDFs into a Raw folder, and the AI organises everything into a cross-linked Wiki, generates answers on demand, and runs monthly health checks to fix gaps and contradictions. Use it when you're tired of losing saved content and want a queryable knowledge base that compounds in value over time — getting smarter with every question you ask and every piece of content you add.

// When should I use the Karpathy Self-Improving AI Knowledge Base?

Use this skill when you want to stop losing saved articles, notes, and insights and instead build a queryable second brain that improves itself over time without requiring you to be the librarian.

// What do I need before building a Karpathy AI knowledge base?

  • Subject or domain focus (required)
    The specific topic or professional domain this knowledge base will cover, e.g. 'productivity', 'marketing strategy', 'investing'.
  • Existing raw knowledge material (required)
    Articles, notes, meeting transcripts, book quotes, screenshots, PDFs, or any content the user already has on this topic.
  • AI environment (required)
    The AI tool being used as the librarian (e.g. Claude with file-system access, Claude Cowork, or equivalent). Must be able to read and write local files or a connected folder.
  • Anti-AI writing style guide
    A set of writing rules instructing the AI to avoid generic AI prose — can be generated by pasting Wikipedia's AI writing style article into the AI and asking it to write rules to never do any of that.
  • Themed focus areas
    Three to five specific sub-themes within the domain that the knowledge base should deepen, used to tune the Claude MD.

// What are the core principles behind the self-improving AI knowledge base?

AI as Librarian

You are never the librarian. Your only job is to dump information into Raw. The AI organises, links, summarises, and indexes everything. The moment you start manually organising, you've broken the system.

Raw as Junk Drawer

Raw is a folder for capture, not organisation. Articles, notes, transcripts, screenshots, PDFs — everything goes in unedited and unsorted. Do not make it pretty. Tidiness is the AI's job.

Compounding Loop

Every answer the AI generates gets saved to Outputs and is eventually ingested back into the system. Each question makes the next answer better. The system compounds in value the more you use it: day one is basic, day 100 is a company asset nobody else has.

Claude MD as Schema

The Claude MD file at the root of each knowledge base is the instruction layer. It tells the AI how to read, organise, ingest, run health checks, and behave as a librarian. The system only works correctly if this file is precise and up to date.

Wiki is AI-Only Territory

The Wiki folder is never edited by hand. All content there is written and maintained exclusively by the AI. Human edits will introduce drift and break the integrity of the system.

Monthly Health Check

Once a month the AI audits the entire Wiki for contradictions, unsourced claims, orphaned references, stale articles, and coverage gaps — then proposes and drafts new articles to fill them. This is what prevents the system from slowly degrading.

Multiple Independent Knowledge Bases

A top-level second brain folder acts as a container. Inside it, each subject gets its own self-contained knowledge base folder with its own Claude MD. They remain independent but can be queried together.

// How do you build and maintain a Karpathy AI knowledge base step by step?

  1. Build the folder architecture

    Create a top-level second brain folder. Inside it, create one folder per knowledge base domain. Each domain folder must contain three subfolders (Raw, Wiki, Outputs) plus a Claude MD file at its root. Optionally add a top-level Claude MD that describes the container structure for all knowledge bases. Also create a Change Log MD file, which doubles as the system's memory by recording when ingestion and health checks last ran. Do not skip the Claude MD; without it the AI has no schema to operate from.
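
The folder layout described in this step can be sketched in a few lines of Python. This is a minimal sketch, not part of the original method: the file names `CLAUDE.md` and `CHANGELOG.md` and the function name `scaffold_knowledge_base` are assumptions, since the source only names "a Claude MD file" and "a Change Log MD file".

```python
from pathlib import Path

# Assumed names: the source only says "Raw, Wiki, Outputs",
# "a Claude MD file", and "a Change Log MD file".
SUBFOLDERS = ("Raw", "Wiki", "Outputs")
SCHEMA_FILE = "CLAUDE.md"
CHANGE_LOG_FILE = "CHANGELOG.md"


def scaffold_knowledge_base(second_brain: Path, domain: str) -> Path:
    """Create one domain knowledge base inside the top-level container."""
    kb = second_brain / domain
    for name in SUBFOLDERS:
        (kb / name).mkdir(parents=True, exist_ok=True)
    # Create placeholder files only if missing, so reruns never overwrite them.
    for filename in (SCHEMA_FILE, CHANGE_LOG_FILE):
        path = kb / filename
        if not path.exists():
            path.write_text(f"# {domain}: {filename}\n", encoding="utf-8")
    return kb
```

Calling `scaffold_knowledge_base(Path("second-brain"), "negotiation-kb")` would create the three subfolders and the two placeholder files; the schema itself still has to be written with the AI.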

  2. Write the Claude MD schema file

    The Claude MD must specify: (1) the subject and themed focus areas, (2) the folder roles — Raw as intake, Wiki as AI-written output, Outputs as generated reports, (3) ingestion rules — how the AI should process new Raw files and what constitutes 'processed', (4) wiki rules — one MD file per major topic, an index MD first, cross-links between related topics, anti-AI writing style applied, (5) output rules — every question-and-answer generates a report saved to Outputs, every report is presented as a clickable page in chat, (6) health check schedule and seven-stage audit process, (7) memory file rules — the system logs the last action date so it knows what is new. Work with the AI iteratively to improve this file before ingesting content.
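
One way to start the schema file is to generate a skeleton with one heading per required part and then refine it with the AI. The sketch below is illustrative only: the function name `render_claude_md`, the heading wording, and the TODO placeholders are assumptions, and the real rules must come from your own iteration.

```python
def render_claude_md(subject: str, themes: list[str]) -> str:
    """Return a starting skeleton for the schema file, one heading per part."""
    if not 3 <= len(themes) <= 5:
        raise ValueError("expected three to five themed focus areas")
    theme_lines = "\n".join(f"- {t}" for t in themes)
    sections = [
        ("Subject and focus areas", f"Subject: {subject}\n{theme_lines}"),
        ("Folder roles",
         "- Raw: intake\n- Wiki: AI-written output\n- Outputs: generated reports"),
        ("Ingestion rules",
         "TODO: how to process new Raw files and what counts as 'processed'."),
        ("Wiki rules",
         "- Index MD first\n- One MD file per major topic\n"
         "- Cross-link related topics\n- Apply the anti-AI writing style guide"),
        ("Output rules",
         "- Save every question-and-answer as a report in Outputs"),
        ("Health check", "TODO: schedule and the seven audit stages."),
        ("Memory file rules",
         "- Log the last action date so new material can be identified"),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections) + "\n"
```

The skeleton only guarantees that none of the seven parts is forgotten; the TODO lines mark the places where the source leaves the detail to you and the AI.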

  3. Dump all existing material into Raw

    Copy and paste articles, notes, screenshots, meeting transcripts, book quotes, PDFs into the Raw folder as markdown files. Do not organise or rename them. You can paste content directly into the AI chat and instruct it to save each item as an MD file in Raw. On Mac, Xcode (free) lets you create markdown files quickly by selecting File > New from Template > Markdown File. The Obsidian web clipper browser extension converts any web page to a clean markdown file in one click. PDFs are harder for the AI to parse — use them but expect lower fidelity. This step should take no more than 10–15 minutes for most people.
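
If you prefer a script over pasting into the chat, capturing text into Raw can be as simple as the sketch below. The timestamp-based file naming and the function name `save_to_raw` are assumptions chosen so that nothing needs to be named or sorted by hand; the source does not prescribe a naming scheme.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_to_raw(raw_dir: Path, text: str,
                captured_at: Optional[datetime] = None) -> Path:
    """Write one pasted item to Raw unedited, under a timestamp-based name."""
    captured_at = captured_at or datetime.now()
    raw_dir.mkdir(parents=True, exist_ok=True)
    stem = captured_at.strftime("%Y%m%d-%H%M%S")
    path = raw_dir / f"{stem}.md"
    counter = 1
    while path.exists():  # avoid overwriting items captured in the same second
        path = raw_dir / f"{stem}-{counter}.md"
        counter += 1
    path.write_text(text, encoding="utf-8")
    return path
```

The content is written exactly as received, which matches the junk-drawer principle: no cleanup, no titles, no folders.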

  4. Build the Wiki

    Point the AI at the knowledge base folder and give it this single prompt: 'Read everything in Raw and compile a Wiki in the Wiki folder following the rules in your Claude MD. Create the index MD first, then one MD file per major topic and link related topics.' Walk away and let it run — this takes around 30 minutes. Ensure the AI applies the anti-AI writing style guide during this step. What you get back is: topic pages with summaries, discovered connections between ideas, and a searchable index. If sessions are long, split across multiple sittings. You do not need RAG, vector databases, or embeddings — the LLM maintains an index and reads what it needs.

  5. Query the knowledge base and save outputs back

    Start a new session, point the AI at the knowledge base folder, and ask a question relevant to your domain. The AI reads the index, pulls the most relevant Wiki entries, and generates an answer citing sources. The answer must be saved as a report in Outputs — update the Claude MD if needed to enforce this rule. After reviewing the output, identify gaps: ask 'Based on everything in the Wiki, what are the three biggest gaps in my understanding of this topic?' Save that gap report to Outputs too. These outputs feed the next health check and improve future answers. This compounding loop is what makes the system grow.
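
The "save every answer as a report" rule is normally carried out by the AI itself, but the shape of such a report can be sketched as follows. The file naming (date plus a slug of the question), the report layout, and the names `slugify` and `save_report` are all assumptions for illustration.

```python
import re
from datetime import date
from pathlib import Path


def slugify(text: str, max_len: int = 60) -> str:
    """Lowercase, keep letters and digits, join words with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "report"


def save_report(outputs_dir: Path, question: str, answer: str,
                sources: list[str], on: date) -> Path:
    """Save one question-and-answer as a markdown report in Outputs."""
    outputs_dir.mkdir(parents=True, exist_ok=True)
    path = outputs_dir / f"{on.isoformat()}-{slugify(question)}.md"
    source_lines = "\n".join(f"- {s}" for s in sources)
    path.write_text(
        f"# {question}\n\n{answer}\n\n## Sources\n{source_lines}\n",
        encoding="utf-8",
    )
    return path
```

Keeping the question as the heading and the cited Wiki entries in a Sources section makes each report usable as input to a later health check.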

  6. Run a monthly health check

    Once a month, run a seven-stage audit across the entire Wiki: (1) contradictions and inconsistent data between articles, (2) broken backlinks and orphaned references, (3) source provenance, meaning claims not backed by a source in Raw, (4) coverage gaps relative to what Raw contains, (5) stale articles older than 90 days that are no longer relevant, (6) suggested new article candidates based on gaps, (7) suggested connections between articles not yet drawn. This audit is phase one: the AI files a health check report in Outputs and updates the Change Log. In phase two (interactive mode), the AI presents an action menu; you choose which findings to action and the AI drafts, edits, and ingests accordingly. Schedule this as an automated task if your AI environment supports it, set to a different day per knowledge base so credits are spread across the month.
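
Most audit stages need the AI's judgment, but the mechanical parts (broken links, orphaned pages, and age) can be checked with a short script. The sketch below rests on several assumptions not stated in the source: links use `[[Page]]` wikilink syntax, the index file is `index.md`, all pages sit directly in the Wiki folder, and file modification time stands in for article age. It only flags candidates; deciding whether a stale article is still relevant remains the AI's job.

```python
import re
import time
from pathlib import Path
from typing import Optional

# Captures the page name in [[Page]], [[Page|alias]], and [[Page#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
STALE_AFTER_DAYS = 90


def audit_wiki(wiki_dir: Path, now: Optional[float] = None) -> dict[str, list[str]]:
    """Flag broken links, orphaned pages, and pages untouched for 90+ days."""
    now = time.time() if now is None else now
    pages = {p.stem: p for p in wiki_dir.glob("*.md")}
    linked: set[str] = set()
    broken: list[str] = []
    for name, path in sorted(pages.items()):
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            target = target.strip()
            if target not in pages:
                broken.append(f"{name} -> {target}")
            elif target != name:  # a self-link does not count as a backlink
                linked.add(target)
    orphans = sorted(n for n in pages if n not in linked and n != "index")
    stale = sorted(
        n for n, p in pages.items()
        if now - p.stat().st_mtime > STALE_AFTER_DAYS * 86400
    )
    return {"broken_links": broken, "orphans": orphans, "stale": stale}
```

The returned lists could be pasted into the health check prompt so the AI spends its effort on contradictions, provenance, and gaps rather than on link bookkeeping.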

// What does a Karpathy AI knowledge base look like in practice?

A solo consultant wants a knowledge base on client communication and negotiation strategy. They have saved blog posts, book highlights, and notes from past client debrief meetings scattered across Notion and their downloads folder.

Create a folder called 'negotiation-kb' inside a top-level second brain folder. Write a Claude MD specifying three focus themes: persuasion principles, difficult conversation frameworks, and commercial negotiation tactics. Dump all the Notion exports, blog posts, and meeting notes into Raw as markdown files. Run the Wiki build prompt. After the Wiki is compiled, query: 'What does my knowledge base say about handling a client who disputes scope mid-project?' Save the report to Outputs. Run a health check at month end — the AI will likely flag that the Wiki lacks content on written vs verbal negotiation, and propose new article candidates. Draft those articles and ingest them.

A product manager wants to build a team knowledge base on UX research methods, drawing on their team's past research reports, saved articles, and conference talk notes.

Create a second brain folder with a domain folder called 'ux-research-kb'. Write a Claude MD with focus themes: qualitative research methods, synthesis techniques, and communicating findings to stakeholders. Note that the default Karpathy architecture assumes solo use — the Claude MD should be updated to acknowledge collaborative inputs and attribute sources to team members. Dump all team research reports as markdown. After building the Wiki, query: 'What methods does our knowledge base currently recommend for rapid generative research?' The gap report will likely surface that the base contains no content on remote or async research — use this to commission and ingest new content deliberately.

// What mistakes should I avoid when building an AI knowledge base?

  • Editing the Wiki folder by hand — this breaks AI-maintained integrity and introduces drift that compounds over time.
  • Over-organising the Raw folder before ingestion — Raw is a junk drawer by design; tidying it defeats the purpose and wastes time.
  • Skipping or under-specifying the Claude MD — without a precise schema the AI has no consistent instruction layer and outputs will be inconsistent across sessions.
  • Forgetting to save outputs back into the system — the compounding loop only works if answers and gap reports are saved to Outputs and eventually ingested back.
  • Running all health checks on the same day across multiple knowledge bases — this exhausts session credits unnecessarily; stagger them across the month.
  • Building the entire knowledge base in one session — the Wiki build and health checks are credit-intensive; plan for multiple sessions or a paid/max plan.
  • Treating day-one output as the final product — the system is deliberately weak at the start and only becomes a genuine asset around day 100 with consistent use and re-ingestion.
  • Ignoring the anti-AI writing style guide — without it, Wiki articles will accumulate generic AI prose that degrades readability and trustworthiness over time.
  • Not adding a memory file or Change Log — without it, the AI cannot tell what is new in Raw versus already processed, leading to duplicate or missed ingestion.
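
The last point, telling new Raw files from processed ones, can be sketched as a comparison between file modification times and the most recent ingestion entry in the Change Log. The log format used here (one line per action, an ISO timestamp followed by the action name, such as `2025-02-01T09:00:00 ingestion`) is an assumption; the source does not define one, and the function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def last_ingestion(change_log: Path) -> Optional[datetime]:
    """Return the timestamp of the most recent 'ingestion' entry, if any."""
    latest = None
    for line in change_log.read_text(encoding="utf-8").splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2 or parts[1].strip() != "ingestion":
            continue
        try:
            stamp = datetime.fromisoformat(parts[0])
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if latest is None or stamp > latest:
            latest = stamp
    return latest


def new_raw_files(raw_dir: Path, since: Optional[datetime]) -> list[str]:
    """List Raw files modified after the last ingestion (all files if none)."""
    names = []
    for path in raw_dir.iterdir():
        if not path.is_file():
            continue
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        if since is None or modified > since:
            names.append(path.name)
    return sorted(names)
```

With no ingestion entry in the log, every file in Raw counts as new, which is the right behaviour for a first Wiki build.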

// What do the key terms in the Karpathy knowledge base system mean?

Second Brain
A personal knowledge base where you hold all your information, make connections between ideas, and use it to inform decisions and output — external to your own memory.
Raw
The junk drawer folder. All incoming material — articles, notes, screenshots, transcripts, PDFs — is dropped here unorganised. The AI processes it; the human never sorts it.
Wiki
The AI-written, AI-maintained organised version of Raw. One markdown file per major topic, cross-linked, indexed, and summarised. Never edited by hand.
Outputs
The folder where every AI-generated answer, briefing, report, or health check result is saved. Outputs feed back into the system to improve future answers.
Claude MD
The schema file at the root of each knowledge base. It instructs the AI on how to read, organise, ingest, generate outputs, and run health checks. The operating instructions for the AI librarian.
AI as Librarian
The core principle: the AI — not the human — is responsible for organising, linking, summarising, and indexing all content. The human's only job is to dump information and ask questions.
Compounding Loop
The feedback cycle in which outputs (answers, reports) are saved back into the system, making each subsequent answer better than the last.
Health Check
A monthly seven-stage audit run by the AI across the entire Wiki to find contradictions, broken or orphaned references, unsourced claims, coverage gaps, stale articles, new article candidates, and connections not yet drawn.
Ingestion
The process by which the AI reads new files in Raw, processes them, and either updates the Wiki or flags them — logging the action in the Change Log and memory file.
Change Log
A markdown file that doubles as the system's memory, recording when ingestion runs, health checks, and edits last occurred so the AI knows what is new on each pass.
Anti-AI Writing Style Guide
A set of writing rules — derived from Wikipedia's AI writing style guidelines — that instructs the AI to avoid generic AI prose when writing Wiki articles.
Guided Ingestion Mode
An optional interactive process where the AI walks the user through ingesting new material step by step, rather than the user dumping files silently.

// FREQUENTLY ASKED QUESTIONS

What is the Karpathy Self-Improving AI Knowledge Base?

It's a personal knowledge management system where an AI (like Claude with file access) acts as your librarian. You dump raw materials — articles, notes, transcripts, PDFs — into a folder, and the AI organises them into a cross-linked Wiki, answers your questions from that Wiki, and runs monthly health checks to fix contradictions and fill gaps. The system compounds in value because every output feeds back in, making future answers richer.

What is the Claude MD file in a self-improving knowledge base?

The Claude MD is the schema file at the root of each knowledge base that instructs the AI on how to behave. It specifies the subject focus, folder roles (Raw, Wiki, Outputs), ingestion rules, wiki formatting rules, output saving rules, health check procedures, and memory logging. Without a precise Claude MD, the AI has no consistent instructions and your outputs will vary wildly between sessions. Think of it as the operating manual for your AI librarian.

How do I build a Karpathy-style AI knowledge base in Claude?

Create a top-level second brain folder, then a domain subfolder containing Raw, Wiki, and Outputs folders plus a Claude MD schema file and a Change Log. Write the Claude MD specifying your subject, focus themes, and rules. Dump all your existing notes and articles into Raw as markdown. Then prompt Claude to read Raw and compile the Wiki following your schema. Start querying, save outputs back, and run monthly health checks to keep the system improving.

How do I run a health check on my AI knowledge base?

Prompt the AI to run a seven-stage audit across your Wiki: check for contradictions between articles, broken backlinks, unsourced claims, coverage gaps relative to Raw, stale articles older than 90 days, new article candidates, and undiscovered connections. The AI files a health check report in Outputs and updates the Change Log. Then enter interactive mode where you choose which findings to action — the AI drafts fixes and new articles accordingly. Run this monthly.

How does the Karpathy AI knowledge base compare to using Notion or Obsidian alone?

Notion and Obsidian require you to be the librarian — you manually organise, tag, link, and maintain everything. The Karpathy system offloads all organisation to the AI. You only dump content and ask questions. Obsidian is actually useful as a companion tool (its web clipper converts pages to markdown), but the key difference is the compounding loop: every query and health check automatically improves the knowledge base, which traditional tools cannot do without manual effort.

When should I use a self-improving AI knowledge base instead of just searching my notes?

Use it when you have more than a handful of saved articles, notes, or documents on a subject and you keep losing or forgetting what you've saved. It's especially valuable when you need to synthesise across sources — connecting ideas from a book highlight with a meeting transcript and an article — rather than just retrieving a single file. If your knowledge is scattered across tools and folders, this system consolidates and compounds it.

What results can I expect from using the Karpathy AI knowledge base after a few months?

Day one output is basic — the AI has limited material to work with. By day 30, after regular content dumps and a health check, answers become notably richer with cross-referenced sources. By day 100, with consistent use and re-ingestion, you have a genuine strategic asset: a queryable knowledge base that surfaces non-obvious connections, identifies gaps in your understanding, and generates briefings that would take hours to produce manually. The compounding loop is the entire point.

Do I need RAG or vector databases to build this knowledge base?

No. The Karpathy system deliberately avoids RAG, vector databases, and embeddings. Instead, the AI maintains a markdown index file in the Wiki folder and reads relevant entries when answering questions. This keeps the system simple, portable, and entirely within a tool like Claude with file-system access. The markdown-based architecture means you can move it between tools, back it up, and read it yourself without any infrastructure.
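
To make the index-first lookup concrete, here is a rough sketch of the idea. In the real system the LLM reads the index and decides which entries to open; the crude keyword overlap below is only a stand-in for that judgment. The index line format (`- [[page]]: summary`) and the function names are assumptions.

```python
import re

# Assumed index line format: "- [[page-name]]: one-line summary"
INDEX_ENTRY = re.compile(r"^- \[\[(?P<page>[^\]]+)\]\]:\s*(?P<summary>.+)$")


def parse_index(index_text: str) -> dict[str, str]:
    """Map each page listed in the index to its one-line summary."""
    entries = {}
    for line in index_text.splitlines():
        match = INDEX_ENTRY.match(line.strip())
        if match:
            entries[match["page"]] = match["summary"]
    return entries


def pick_pages(index_text: str, question: str, limit: int = 3) -> list[str]:
    """Rank pages by how many words they share with the question."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    scored = []
    for page, summary in parse_index(index_text).items():
        entry_words = set(re.findall(r"[a-z0-9]+", f"{page} {summary}".lower()))
        overlap = len(words & entry_words)
        if overlap:
            scored.append((-overlap, page))  # negative so sorting puts best first
    return [page for _, page in sorted(scored)[:limit]]
```

The point of the sketch is the data flow: one small index file is read first, and only the handful of pages it points to are opened, which is why no embeddings or vector store are involved.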

Can I build multiple knowledge bases on different topics?

Yes. The architecture supports multiple independent knowledge bases inside one top-level second brain folder. Each domain gets its own folder with its own Raw, Wiki, Outputs subfolders and its own Claude MD schema. They remain self-contained but can be queried together if needed. Stagger health checks across the month — one per week rather than all at once — to avoid exhausting AI session credits.
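
Staggering can be worked out by hand, but a small sketch shows the arithmetic. The 28-day cycle and the function name `stagger_days` are assumptions; 28 is chosen only because every month has at least that many days.

```python
def stagger_days(knowledge_bases: list[str], days_in_cycle: int = 28) -> dict[str, int]:
    """Spread health checks evenly over days 1..days_in_cycle, by sorted name."""
    names = sorted(knowledge_bases)
    if not names:
        return {}
    step = days_in_cycle / len(names)
    return {name: int(i * step) + 1 for i, name in enumerate(names)}
```

With four knowledge bases this lands on days 1, 8, 15, and 22, which is the one-per-week rhythm suggested above.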

What is the compounding loop in the Karpathy knowledge base?

The compounding loop is the feedback cycle where every answer, report, and health check result the AI generates gets saved back into the system. When you ask a question, the answer goes to Outputs. Gap reports go to Outputs. Health check findings update the Wiki and Change Log. Each of these additions makes the next query richer and more accurate. This is what transforms a static collection of notes into a living, self-improving asset.

What is the anti-AI writing style guide and why does it matter?

The anti-AI writing style guide is a set of rules that instruct the AI to avoid generic AI prose — hedging phrases, filler sentences, overly formal tone, and repetitive structures. You create it by pasting Wikipedia's article on AI-generated writing into Claude and asking it to write rules to never do any of those things. Without it, your Wiki articles accumulate bland, untrustworthy language that degrades readability and makes the knowledge base feel like a content farm.
