Frequently Asked Questions About the Karpathy Self-Improving AI Knowledge Base

21 answers covering everything from basics to advanced usage.

// Basics

What is the difference between the Raw folder and the Wiki folder?

Raw is your unorganised junk drawer — every article, note, transcript, and PDF goes in without sorting or renaming. The Wiki is the AI-organised version: one markdown file per major topic, cross-linked, indexed, and summarised. The critical rule is that you never edit the Wiki by hand. You only touch Raw (by dumping content in). The AI reads Raw, processes it, and writes the Wiki. This separation is what makes the system self-maintaining.

Can I use this system with ChatGPT instead of Claude?

The system was designed for Claude with file-system access (Claude Cowork or equivalent), but the architecture is tool-agnostic in principle. You need an AI that can read and write local files or a connected folder. ChatGPT with Code Interpreter or a plugin that accesses files could work, though the Claude MD file would need renaming and adaptation. The key requirement is persistent file read/write access — if your AI tool has that, the system can be adapted.

How many articles or notes do I need before this system is worth setting up?

The system becomes worthwhile once you have roughly 15 to 20 pieces of content on a single domain. Below that threshold, you can hold the connections in your head. Above it, you start losing track of what you've saved and the AI's ability to surface cross-references and gaps becomes genuinely valuable. That said, even starting with five items works — the system is designed to compound, so starting small and adding consistently matters more than a large initial dump.

Should I use themed focus areas or leave them open?

Use themed focus areas. Specifying three to five sub-themes in your Claude MD gives the AI a lens for prioritising and organising content. Without them, the Wiki becomes a flat collection of summaries with no depth hierarchy. Focus areas tell the AI which topics deserve dedicated, detailed articles versus brief mentions. You can update focus areas over time as your interests shift — just edit the Claude MD and run a health check to realign the Wiki accordingly.

// How To

How long does it take to set up the Karpathy knowledge base from scratch?

The initial setup — creating the folder structure, writing the Claude MD, and dumping existing material into Raw — takes about 30 to 45 minutes. The AI then needs roughly 30 minutes to build the Wiki from your Raw files. So plan for about 60 to 90 minutes total for day one. The ongoing time investment is minimal: dump new content into Raw as you find it (seconds per item) and run a monthly health check (15 to 30 minutes of interactive review).
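If you would rather script the folder setup than create it by hand, the Python sketch below builds the layout described above. The folder names Raw, Wiki, and Outputs come from this FAQ. The placeholder file names CLAUDE.md and CHANGELOG.md, and the function name scaffold_knowledge_base, are assumptions for illustration only.

```python
from pathlib import Path

# Folder roles from the FAQ: Raw for intake, Wiki for AI-written output,
# Outputs for generated reports.
FOLDERS = ("Raw", "Wiki", "Outputs")

# Hypothetical file names for the schema and the Change Log.
PLACEHOLDER_FILES = ("CLAUDE.md", "CHANGELOG.md")


def scaffold_knowledge_base(root: str) -> list[Path]:
    """Create the folder layout for one knowledge base (hypothetical helper).

    Safe to re-run: existing folders and files are left untouched.
    Returns the folder paths in the order of FOLDERS.
    """
    base = Path(root)
    created = []
    for name in FOLDERS:
        folder = base / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    # touch() creates an empty file if missing and never truncates an existing one.
    for filename in PLACEHOLDER_FILES:
        (base / filename).touch(exist_ok=True)
    return created
```

Because every step is idempotent, running it again on an existing knowledge base only recreates whatever is missing.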

How do I write a good Claude MD schema file?

Start by specifying your subject and three to five themed focus areas. Then define folder roles explicitly: Raw for intake, Wiki for AI-written output, Outputs for generated reports. Include ingestion rules (how the AI processes new Raw files), wiki rules (one MD per topic, index first, cross-links, anti-AI style), output rules (every Q&A saved as a report), health check procedures (seven-stage audit), and memory file rules. Iterate with the AI — paste a draft Claude MD and ask it to identify ambiguities or missing instructions.
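A skeleton can speed up the first draft you hand to the AI. The sketch below, which uses a hypothetical helper named claude_md_skeleton, emits a heading for each section listed above. The one-line section bodies are placeholders to replace with your own rules, not a complete schema.

```python
def claude_md_skeleton(subject: str, focus_areas: list[str]) -> str:
    """Return a draft Claude MD outline (hypothetical helper).

    Section bodies are placeholders; refine them with the AI.
    """
    # The FAQ recommends three to five themed focus areas.
    if not 3 <= len(focus_areas) <= 5:
        raise ValueError("Specify three to five focus areas")
    lines = [f"# Knowledge base: {subject}", "", "## Focus areas"]
    lines += [f"- {area}" for area in focus_areas]
    sections = {
        "Folder roles": "Raw = intake. Wiki = AI-written output only. Outputs = generated reports.",
        "Ingestion rules": "How to process new files in Raw.",
        "Wiki rules": "One markdown file per topic, index first, cross-links, anti-AI style.",
        "Output rules": "Save every Q&A as a report in Outputs.",
        "Health check": "Seven-stage audit procedure.",
        "Memory file rules": "Log every ingestion run in the Change Log.",
    }
    for heading, body in sections.items():
        lines += ["", f"## {heading}", body]
    return "\n".join(lines) + "\n"
```

Paste the result into a session and ask the AI where the placeholders are too vague to follow.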

What types of content work best in Raw?

Markdown files and plain text work best because the AI parses them cleanly. The Obsidian web clipper converts web pages to markdown in one click, making it ideal for articles. Book highlights, meeting transcripts, and notes can be pasted in directly as markdown. PDFs work, but expect lower fidelity: the AI struggles with complex formatting, tables, and images in PDFs. Screenshots with text are usable if your AI environment supports image reading. Prioritise text-based formats for the highest accuracy.

What's the best way to capture content quickly into Raw on a phone?

Use a share-to-folder workflow. On iPhone, the Shortcuts app can create a shortcut that saves shared text or URLs as markdown files to an iCloud or Dropbox folder synced with your Raw directory. On Android, apps like Tasker or simple note-to-file apps achieve the same. The key is reducing friction to near zero — if dumping content into Raw takes more than ten seconds, you'll stop doing it. The content doesn't need to be clean; messy is by design.
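On a desktop, the same share-to-folder idea can be a small script. This sketch assumes a hypothetical save_to_raw helper and a capture-<timestamp>.md naming scheme. It writes the text exactly as given, without cleanup, and only prepends a source line when a URL is supplied.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_to_raw(raw_dir: str, text: str, source_url: Optional[str] = None,
                now: Optional[datetime] = None) -> Path:
    """Save captured text as a new markdown file in Raw (hypothetical helper)."""
    now = now or datetime.now()
    folder = Path(raw_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Timestamped names mean no manual renaming; a numeric suffix
    # avoids overwriting when two captures land in the same second.
    stem = f"capture-{now:%Y%m%d-%H%M%S}"
    path = folder / f"{stem}.md"
    counter = 1
    while path.exists():
        path = folder / f"{stem}-{counter}.md"
        counter += 1
    body = text if source_url is None else f"Source: {source_url}\n\n{text}"
    path.write_text(body + "\n", encoding="utf-8")
    return path
```

Messy input is fine here; sorting and renaming are left to the AI during ingestion.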

// Troubleshooting

What if my Claude session runs out of credits mid-Wiki-build?

This is expected for large knowledge bases. Split the work across multiple sessions. Prompt the AI to build the index first, then process Raw files in batches — for example, ten files per session. The Change Log and memory file track what has been processed, so the AI knows where to resume. Stagger health checks and ingestion runs across days. A paid or max-tier plan helps, but even free-tier users can build the system incrementally.
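Batching can be made mechanical. Assuming the set of already-processed file names is available (for example, read from the Change Log), this hypothetical next_batch function picks the next ten unprocessed Raw files in a stable alphabetical order, so each session resumes where the previous one stopped.

```python
from pathlib import Path


def next_batch(raw_dir: str, processed: set[str], batch_size: int = 10) -> list[str]:
    """Return up to batch_size Raw file names not yet processed (hypothetical helper)."""
    # Only regular files count; subfolders in Raw are ignored in this sketch.
    pending = sorted(
        p.name for p in Path(raw_dir).iterdir()
        if p.is_file() and p.name not in processed
    )
    return pending[:batch_size]
```

Sorting is a design choice: it makes the batch order predictable across sessions, which makes it easier to check the AI's progress against the Change Log.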

Why is my Wiki full of generic AI-sounding language?

The most likely cause is that your Claude MD has no anti-AI writing style guide, either because you never created one or because you created it but did not add it to the file. Generate one by pasting Wikipedia's article on AI-generated writing patterns into Claude and asking it to write rules that avoid every listed pattern. Add those rules to your Claude MD under the wiki formatting rules, then re-run the Wiki build. Without this guide, the AI defaults to hedging phrases, filler sentences, and bland prose, which make the knowledge base feel untrustworthy and hard to read.

The AI keeps re-processing files it already ingested. How do I fix this?

You're missing a Change Log or memory file. Create a Change Log markdown file at the root of the knowledge base and add rules to your Claude MD requiring the AI to log every ingestion run with timestamps and file names. On each new session, the AI reads the Change Log to determine what in Raw is new versus already processed. Without this file, the AI has no memory between sessions and will re-ingest everything, wasting credits and creating duplicates.
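The exact Change Log format is whatever your Claude MD specifies. As an illustration only, this sketch assumes a hypothetical one-line-per-file format such as "- 2024-05-01T09:30:00 | ingested | note.md" and shows both halves of the rule: appending entries after a run, and reading them back to decide what in Raw is new. The function names log_ingestion and processed_files are also hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional


def log_ingestion(changelog: str, files: Iterable[str],
                  now: Optional[datetime] = None) -> None:
    """Append one timestamped line per ingested file (hypothetical format)."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    with open(changelog, "a", encoding="utf-8") as fh:
        for name in files:
            fh.write(f"- {stamp} | ingested | {name}\n")


def processed_files(changelog: str) -> set[str]:
    """Return every file name the Change Log records as ingested."""
    path = Path(changelog)
    if not path.exists():
        return set()  # no log yet, so nothing has been processed
    done = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        parts = [part.strip() for part in line.lstrip("- ").split("|")]
        if len(parts) == 3 and parts[1] == "ingested":
            done.add(parts[2])
    return done
```

Any file in Raw whose name is missing from processed_files() is new; everything else should be skipped.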

What happens if I accidentally edit a Wiki file by hand?

You introduce drift that compounds over time. The AI assumes it wrote and controls all Wiki content. When you manually edit, the AI doesn't know about the change, so future health checks and ingestion runs may overwrite your edits, create contradictions, or produce duplicate content. If you've already edited Wiki files, the cleanest fix is to add the corrected information as a new file in Raw and let the AI re-process it during the next ingestion or health check cycle.

Is there a risk of the AI hallucinating in Wiki articles?

Yes, and the health check is designed to catch this. Stage three of the seven-stage audit checks source provenance, flagging claims in the Wiki that aren't backed by a source in Raw. This surfaces hallucinated content so you can remove or correct it. The anti-AI writing style guide also helps by curbing the AI's tendency toward confident-sounding but unsupported generalisations. Run health checks consistently and treat every unsourced claim as an action item.
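Part of stage three can be pre-screened mechanically if your wiki rules require inline citations. The sketch below assumes a hypothetical convention in which each Wiki paragraph cites its Raw source as [source: file-name.md], and a hypothetical helper named unsourced_paragraphs. It flags paragraphs with no citation, or with a citation to a file that is not in Raw. It cannot judge whether a cited source actually supports the claim, so the interactive review still matters.

```python
import re

# Hypothetical citation convention: "[source: some-file.md]" naming a file in Raw.
CITATION = re.compile(r"\[source:\s*([^\]]+?)\s*\]")


def unsourced_paragraphs(wiki_text: str, raw_files: set[str]) -> list[str]:
    """Return paragraphs lacking a citation or citing a file missing from Raw."""
    flagged = []
    # Paragraphs are separated by one or more blank lines.
    for para in re.split(r"\n\s*\n", wiki_text):
        para = para.strip()
        if not para or para.startswith("#"):
            continue  # skip blank chunks and markdown headings
        cited = CITATION.findall(para)
        if not cited or any(name not in raw_files for name in cited):
            flagged.append(para)
    return flagged
```

For example, a page containing "Unbacked claim." with no citation would be returned as an action item for the review.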

// Comparisons

How does this compare to using a traditional Zettelkasten system?

A Zettelkasten requires you to write atomic notes, manually link them, and maintain the system yourself — you are the librarian. The Karpathy system offloads linking, summarising, indexing, and maintenance to the AI. The trade-off: Zettelkasten forces deep processing of each idea during note-writing, which aids retention. The AI knowledge base optimises for retrieval and synthesis over personal encoding. They can complement each other — use Zettelkasten for deep thinking, the AI knowledge base for comprehensive coverage and querying.

How is this different from just uploading files to Claude and asking questions?

Uploading files to Claude ad hoc gives you one-shot answers with no memory between sessions. The Karpathy system creates persistent, structured knowledge that compounds. The Wiki provides pre-organised context so the AI doesn't re-read everything each time. The Outputs folder builds institutional memory. The health check catches drift and gaps. Without the system architecture, you're starting from zero every session. With it, session 50 is dramatically more powerful than session one.

What's the difference between this and Mem, Reflect, or other AI note-taking apps?

AI note-taking apps like Mem and Reflect handle capture and basic linking but keep you as the organiser and don't run systematic health checks. The Karpathy system is more opinionated: strict folder separation (Raw, Wiki, Outputs), the AI as sole Wiki editor, a formal schema (Claude MD), and a monthly seven-stage audit. It's also tool-agnostic and file-based — you own the markdown files, can back them up anywhere, and aren't locked into a proprietary platform. The trade-off is more initial setup effort.

// Advanced

Can I use this for a team, not just personal use?

Yes, with modifications. Update the Claude MD to acknowledge collaborative inputs and attribute sources to team members. Establish a convention for who dumps content into Raw (everyone) and who triggers health checks (one person, monthly). The Wiki remains AI-only territory regardless of team size. The main challenge is concurrent access: if multiple people add to Raw at the same time, make sure the AI processes the changes sequentially rather than in overlapping sessions. A shared cloud-storage folder (Dropbox, Google Drive) works well as the common Raw location.

Can I query across multiple knowledge bases at the same time?

Yes. If your knowledge bases sit inside one top-level second brain folder, you can point the AI at the parent folder and ask cross-domain questions. An optional top-level Claude MD can describe the container structure so the AI knows how to navigate between domains. However, cross-domain queries consume more credits because the AI reads multiple indexes. For routine questions, stay within one domain. Reserve cross-domain queries for strategic synthesis — like connecting marketing strategy insights with negotiation tactics.

How do I know if my knowledge base is actually improving over time?

Three signals: First, compare the quality and depth of answers to the same question at day 1 versus day 60 — you should see more sources cited and more nuanced responses. Second, health check reports should show fewer coverage gaps over time. Third, the AI should start surfacing non-obvious connections between ideas that you didn't explicitly add. If none of these are happening, check that you're consistently saving outputs back and running monthly health checks — the compounding loop only works if both halves are active.

What does guided ingestion mode mean and when should I use it?

Guided ingestion mode is an optional interactive process where the AI walks you through ingesting new material step by step instead of silently processing everything in Raw. Use it when you've added content that needs context the AI can't infer — for example, meeting notes where the AI doesn't know who the participants were or why the meeting matters. During guided ingestion, the AI asks clarifying questions before filing content into the Wiki, producing more accurate and useful entries.

How do I prevent my knowledge base from becoming bloated with outdated content?

The monthly health check handles this through stage five: identifying stale articles older than 90 days that are no longer relevant. The AI flags them and proposes actions — archive, update, or delete. You make the call during the interactive phase. Additionally, running the coverage gap analysis (stage four) ensures new content displaces outdated material rather than just piling on top. Consistent monthly health checks are the single most important habit for preventing bloat.
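Stage five needs a working definition of "older than 90 days". This sketch assumes file modification time as that definition, which is only a proxy, and uses a hypothetical helper named stale_wiki_articles. It only scans the top level of the Wiki folder; deciding whether a flagged article is still relevant remains your call during the interactive phase.

```python
import time
from pathlib import Path
from typing import Optional

STALE_AFTER_DAYS = 90  # threshold taken from the FAQ's stage five
SECONDS_PER_DAY = 86400


def stale_wiki_articles(wiki_dir: str, now: Optional[float] = None,
                        days: int = STALE_AFTER_DAYS) -> list[str]:
    """List Wiki markdown files last modified more than `days` ago (hypothetical helper)."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    # Top-level *.md files only; use rglob("*.md") if the Wiki has subfolders.
    return sorted(
        p.name for p in Path(wiki_dir).glob("*.md")
        if p.stat().st_mtime < cutoff
    )
```

The resulting list is a starting point for the archive, update, or delete proposals, not a verdict.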