Frequently Asked Questions About Matt Giaro's AI Second Brain Build
21 answers covering everything from basics to advanced usage.
// Basics
What is the dumping ground problem in personal knowledge management?
The dumping ground problem is Matt Giaro's term for the failure mode where information is saved into a system but never resurfaces when needed. Most second brain systems — Notion databases, Evernote notebooks, bookmark folders — suffer from this. The AI Second Brain solves it by actively processing saved content into cross-linked wiki pages and grounding all AI responses in that content, so information is retrieved automatically rather than manually searched for.
What is Karpathy's LLM Wiki and how does it relate to this system?
Karpathy's LLM Wiki is Andrej Karpathy's architectural concept for using an LLM to maintain a self-building, cross-linked markdown wiki from raw source material. Matt Giaro uses this as the foundational wiki layer of the AI Second Brain and extends it with Journal and CRM pillars. The wiki is not static — every query, journal entry, and new ingestion updates it by adding pages, cross-linking entities, and logging changes, making the system grow smarter with every interaction.
What is the RAW to Processed pipeline in the AI second brain?
The RAW to Processed pipeline is the ingestion mechanism. All source material — web clips, YouTube transcripts, articles — enters the RAW folder untouched and immutable. Once the AI processes a source into wiki pages, extracts entities, updates the index and log, and creates cross-links, the source file moves to RAW/Processed. This provides a clear audit trail of what has and has not been ingested, preventing duplicate processing and ensuring nothing falls through the cracks.
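In the actual system, the AI performs this move as part of its agents.md instructions. The following minimal Python sketch illustrates only the bookkeeping. It assumes a vault folder containing `RAW/`, `RAW/Processed/` and a root `log.md`. The helper name `mark_processed` and the log line format are hypothetical.

```python
from datetime import date
from pathlib import Path
import shutil


def mark_processed(vault: Path, source_name: str, wiki_pages: list[str]) -> Path:
    """Move one ingested source from RAW/ to RAW/Processed/ and log it.

    Hypothetical helper: the real system does this through agents.md rules,
    and the log line format below is an assumption.
    """
    raw_file = vault / "RAW" / source_name
    if not raw_file.is_file():
        raise FileNotFoundError(raw_file)
    processed_dir = vault / "RAW" / "Processed"
    processed_dir.mkdir(parents=True, exist_ok=True)
    target = processed_dir / source_name
    if target.exists():
        # Already ingested: refuse to process the same source twice.
        raise FileExistsError(target)
    shutil.move(str(raw_file), str(target))
    # Append an audit-trail entry to log.md at the vault root.
    links = ", ".join(f"[[{page}]]" for page in wiki_pages)
    with open(vault / "log.md", "a", encoding="utf-8") as log:
        log.write(f"- {date.today().isoformat()} ingested {source_name} -> {links}\n")
    return target
```

Because the helper refuses to overwrite a file that is already in RAW/Processed, that folder doubles as the duplicate-ingestion guard described above.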
How many sources should I clip before the system becomes useful?
Seed with at least 5-10 sources to give the wiki meaningful cross-linking from the start. The system becomes noticeably more valuable at around 20-30 processed sources, as entity overlap creates dense interconnections visible in the graph view. However, even a single well-processed source provides grounded responses. The key is consistent clipping over time: the hourly automation handles processing, so your only job is to clip content whenever you encounter something worth saving.
// How To
How do I seed my AI second brain with initial content?
Navigate to relevant YouTube videos, articles, or web pages and use the Obsidian Web Clipper to save each one directly to the RAW folder. Seed with at least 5-10 sources to give the wiki meaningful cross-linking material from the start. A useful first seed is ingesting the Karpathy LLM Wiki GitHub page itself. Then open Codex, start a new chat in your project, and prompt 'Process the files inside the RAW folder' to build out the initial wiki.
How do I add a person to the CRM in the AI second brain?
Open a new Codex chat in your second brain project and say 'Add to CRM: [Name] — met at [event], discussed [topic], their role is [X], follow up about [Y].' The system creates a contact file in the /CRM folder, captures all details, cross-links to any wiki pages on discussed topics, updates /CRM/index.md alphabetically with a short bio, and logs the update in log.md. Later, ask 'What did I discuss with [Name]?' to retrieve the full context.
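You can think of the alphabetical index update as a small text transformation. This sketch assumes, hypothetically, that /CRM/index.md holds one bullet per contact in the form `- [[Name]]: short bio`. The function name and line format are illustrative and do not come from the original setup.

```python
def upsert_crm_index(index_text: str, name: str, bio: str) -> str:
    """Insert or replace a contact bullet in /CRM/index.md, keeping bullets alphabetical.

    Assumed (hypothetical) line format: "- [[Name]]: short bio".
    """
    header: list[str] = []
    entries: dict[str, str] = {}
    for line in index_text.splitlines():
        if line.startswith("- [[") and "]]" in line:
            contact = line[len("- [["):line.index("]]")]
            entries[contact] = line
        else:
            header.append(line)  # title, blank lines and notes stay on top
    # A repeated name overwrites the old bullet instead of duplicating it.
    entries[name] = f"- [[{name}]]: {bio}"
    ordered = sorted(entries, key=str.casefold)  # case-insensitive A-Z
    return "\n".join(header + [entries[c] for c in ordered]) + "\n"
```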
How do I write a journal entry in the AI second brain?
Start a new Codex chat and begin with the word 'journal' on the first line, then write your entry below it. The 'journal' prefix triggers the journal-handling logic in agents.md. The AI will respond with advice grounded in your wiki content and past journal entries — not generic LLM output. The full conversation is saved as a markdown file in /Journal named [date]-[short-title].md, and the journal index and log are updated automatically.
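The sketch below shows one way to derive the [date]-[short-title].md naming convention. It assumes ISO dates and a lowercase, hyphenated slug capped at six words. Both choices are assumptions, because in practice the AI picks the title itself.

```python
import re
from datetime import date


def journal_filename(entry_date: date, title: str, max_words: int = 6) -> str:
    """Build a /Journal file name following the [date]-[short-title].md pattern.

    The slug rules (lowercase ASCII words joined by hyphens, at most
    max_words words) are illustrative assumptions.
    """
    words = re.findall(r"[a-z0-9]+", title.lower())[:max_words]
    slug = "-".join(words) or "untitled"
    return f"{entry_date.isoformat()}-{slug}.md"
```

For example, `journal_filename(date(2025, 3, 4), "Feeling stuck on the launch!")` gives `2025-03-04-feeling-stuck-on-the-launch.md`.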
How do I set up the hourly automation for processing new content?
In Codex, go to Automations → New Automation. Title it 'Process Second Brain RAW Files'. Set Work tree to Local, Project to your Second Brain, and Schedule to Hourly. Use the prompt: 'If there are any unprocessed files inside the RAW directory, please process them.' Select the strongest available model with reasoning set to high. Optionally append: 'Once everything is processed, commit and push the current version to the main branch on GitHub.' From then on, you only need to clip; processing happens automatically.
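The prompt's "if there are any unprocessed files" condition amounts to checking whether anything is still sitting at the top level of RAW. The minimal sketch below performs that check. It assumes that sources are .md clippings and that RAW/Processed and RAW/Assets are the only subfolders. The function name is hypothetical.

```python
from pathlib import Path


def unprocessed_files(vault: Path) -> list[Path]:
    """Return source files still waiting in RAW/ (not yet moved to RAW/Processed/).

    Only top-level files count, so subfolders such as Processed/ and Assets/
    are skipped. Restricting to .md clippings is an assumption.
    """
    raw = vault / "RAW"
    if not raw.is_dir():
        return []
    return sorted(p for p in raw.iterdir() if p.is_file() and p.suffix == ".md")
```

An empty list corresponds to the case where the hourly run should do nothing.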
How do I check if my AI second brain is actually working?
Use three tests: (1) Ask a question in Codex and verify the response cites specific saved sources rather than generic knowledge; (2) Check the Obsidian graph view — a densely interconnected graph indicates healthy cross-linking; (3) Review log.md for a running history of all ingestions, wiki updates, and journal entries. If the graph is flat, responses are generic, or log.md is sparse, your agents.md processing rules need debugging. The graph view is the single best health metric.
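You can approximate the graph-view test numerically. This sketch assumes pages are passed as a mapping of page name to markdown text. It counts the distinct outgoing Obsidian `[[wikilinks]]` on each page, stripping aliases after `|` and heading anchors after `#`, and averages that count across pages. The function name is hypothetical, and the original text implies no particular threshold.

```python
import re

# Captures the link target of [[Target]], [[Target|alias]] or [[Target#Heading]].
WIKILINK_TARGET = re.compile(r"\[\[([^\]|#]+)")


def link_density(pages: dict[str, str]) -> float:
    """Average number of distinct outgoing wikilinks per page (0.0 if no pages)."""
    if not pages:
        return 0.0
    total = sum(len(set(WIKILINK_TARGET.findall(text))) for text in pages.values())
    return total / len(pages)
```

Tracking this number after each batch of ingestions gives a rough trend line. A value near zero corresponds to a flat, disconnected graph.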
// Troubleshooting
Why is my Obsidian graph view flat and disconnected after weeks of use?
A flat, disconnected graph view after weeks indicates the cross-linking rules in agents.md are not functioning correctly. Check that agents.md includes explicit instructions to cross-link wiki pages back to the original source page and to link related concepts, entities, and people across pages. Also verify that the AI is extracting entities (people, companies, tools, ideas) during processing. Re-process a few sources manually and check whether the resulting wiki pages contain internal links.
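To find which pages cause a flat graph, scan for wiki pages that contain no internal links at all. This narrows down which sources to re-process. The minimal sketch below assumes generated pages live under a `Wiki/` folder as plain .md files. The function name is hypothetical.

```python
import re
from pathlib import Path

ANY_WIKILINK = re.compile(r"\[\[[^\]]+\]\]")


def pages_without_links(wiki_dir: Path) -> list[str]:
    """List wiki page names (file stems) that contain no [[internal links]]."""
    return sorted(
        page.stem
        for page in wiki_dir.rglob("*.md")
        if not ANY_WIKILINK.search(page.read_text(encoding="utf-8"))
    )
```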
Why is the AI giving me generic responses instead of citing my saved content?
Generic responses mean the AI is not grounding its answers in your wiki. Check three things: (1) agents.md must explicitly instruct the AI to search wiki content before responding; (2) your RAW files must have been processed into wiki pages, so check RAW/Processed to confirm; (3) index.md must be up to date so the AI can scan your full knowledge base efficiently. If the wiki is empty or the index is stale, the AI falls back to its generic training data.
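You can detect a stale index by comparing the pages on disk with the links in index.md. This sketch assumes, hypothetically, that index.md links each page as `[[Page Name]]`, optionally with a folder prefix or alias, and that pages live under `Wiki/`.

```python
import re
from pathlib import Path


def missing_from_index(vault: Path) -> list[str]:
    """Return wiki page names that index.md never links to."""
    index_text = (vault / "index.md").read_text(encoding="utf-8")
    # Keep only the file-name part of targets like [[Wiki/Page|alias]].
    indexed = {
        target.strip().split("/")[-1]
        for target in re.findall(r"\[\[([^\]|#]+)", index_text)
    }
    pages = {p.stem for p in (vault / "Wiki").rglob("*.md")}
    return sorted(pages - indexed)
```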
My journal entries aren't being saved as files — what went wrong?
Journal entries only save when the chat begins with the 'journal' trigger word. Without it, the system treats the entry as a regular wiki query and does not create a journal file. Ensure your message starts with 'journal' on the first line. Also verify that agents.md contains the journal-handling rules: save conversation to /Journal as [date]-[short-title].md, update /Journal/index.md, and log the entry in log.md.
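The trigger rule is simple enough to express as a predicate. This hypothetical sketch shows one reading of it. It treats the first word of the first line case-insensitively and tolerates a trailing colon. In the real setup, the LLM interprets the agents.md rule, so it may handle edge cases differently.

```python
def is_journal_entry(message: str) -> bool:
    """True if the message's first line starts with the word 'journal'."""
    lines = message.splitlines()
    words = lines[0].split() if lines else []
    # "journal" and "Journal:" both count; "journaling" does not.
    return bool(words) and words[0].rstrip(":").casefold() == "journal"
```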
Codex created too many files during the initial build — how do I fix it?
Codex often over-builds the architecture if not constrained. Explicitly prompt it to prune back to the minimal Karpathy structure: /RAW (source material), /RAW/Processed (ingested sources), /RAW/Assets (optional), /Wiki (generated pages), agents.md, index.md, and log.md. Say something like 'Remove all files and folders except [list]. The architecture should match the minimal Karpathy LLM Wiki structure.' Then verify the file tree before proceeding.
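After pruning, you can check the file tree against an allow-list. This minimal sketch assumes that the vault root is the project folder and that hidden folders such as .obsidian and .git should be ignored. `MINIMAL_LAYOUT` and `unexpected_entries` are hypothetical names.

```python
from pathlib import Path

# Hypothetical allow-list mirroring the minimal structure listed above.
MINIMAL_LAYOUT = frozenset(
    {"RAW", "RAW/Processed", "RAW/Assets", "Wiki", "agents.md", "index.md", "log.md"}
)


def unexpected_entries(vault: Path, allowed: frozenset[str] = MINIMAL_LAYOUT) -> list[str]:
    """Return vault paths that fall outside the allowed layout.

    Only the vault root and folders directly under RAW/ are checked; files
    inside RAW/ and Wiki/ are content, not structure. Hidden entries are skipped.
    """
    candidates = list(vault.iterdir())
    raw = vault / "RAW"
    if raw.is_dir():
        candidates += [p for p in raw.iterdir() if p.is_dir()]
    return sorted(
        p.relative_to(vault).as_posix()
        for p in candidates
        if not p.name.startswith(".") and p.relative_to(vault).as_posix() not in allowed
    )
```

Once you add the Journal and CRM pillars, pass an extended set, for example `MINIMAL_LAYOUT | {"Journal", "CRM"}`.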
Where should I add the YouTube channel name — the wiki page or the source page?
Add the YouTube channel name to the front matter of the original source page in the RAW folder, not to the generated wiki page. This is a common pitfall. The source page is the record of where the content came from; the wiki page is the processed knowledge. Keeping provenance metadata on the source page ensures clean wiki pages focused on concepts and entities while maintaining full attribution in the source archive.
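Obsidian stores page properties as YAML front matter, so the fix amounts to adding one key to the source file. This sketch assumes a hypothetical `channel` property and handles only simple `key: value` front matter; a real YAML parser would be more robust. It uses `json.dumps` for quoting because a JSON string is also a valid YAML double-quoted scalar.

```python
import json


def set_front_matter_field(markdown: str, key: str, value: str) -> str:
    """Add or overwrite a simple `key: value` line in a page's YAML front matter."""
    lines = markdown.split("\n")
    if lines and lines[0] == "---" and "---" in lines[1:]:
        end = lines.index("---", 1)  # closing delimiter of the front matter
        body = [line for line in lines[1:end] if not line.startswith(f"{key}:")]
        body.append(f"{key}: {json.dumps(value)}")
        return "\n".join(["---", *body, "---", *lines[end + 1:]])
    # No front matter yet: create a block containing just this field.
    return "\n".join(["---", f"{key}: {json.dumps(value)}", "---", markdown])
```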
What happens if I edit agents.md incorrectly and break the system?
If you have GitHub backup connected, simply revert to the previous commit. If not, the system is resilient because agents.md is just a prompt file — the worst case is the AI ignores or misinterprets a rule. You can always re-prompt Codex to regenerate agents.md based on the original framework rules. Keep a copy of your working agents.md before making changes. Since all source files remain in RAW/Processed, no content is lost even if processing rules break temporarily.
// Comparisons
How does the AI second brain compare to using ChatGPT with custom instructions?
ChatGPT with custom instructions gives you persona-level customisation but no persistent, growing knowledge base. Every conversation starts from scratch — it cannot cite a video you saved last week or detect patterns across your journal entries. The AI Second Brain maintains a persistent, cross-linked wiki that the AI queries before responding. Your knowledge compounds over time. ChatGPT's memory feature is shallow compared to a full wiki with entity extraction, cross-linking, and pattern detection.
How does this compare to Mem, Reflect, or other AI-native note-taking apps?
AI-native note apps like Mem or Reflect offer built-in AI search over your notes, but you are locked into their platform, their AI model, and their processing logic. The Matt Giaro AI Second Brain uses Obsidian (local, free, open markdown files) with a fully customisable agents.md that you control. You choose the AI model, define the processing rules, own all data locally, and can switch AI providers. The trade-off is more setup effort for far greater control and extensibility.
// Advanced
Can I replace the CRM pillar with something else like workout logs or recipes?
Yes, the CRM pillar is fully customisable to your domain. Replace it with workout logs, client records, classroom notes, recipes, research papers, or any relational layer that benefits from being cross-linked to your wiki. Update agents.md with the new pillar's rules — folder location, file naming convention, index structure, and what triggers creation of new entries. The Wiki/Knowledge Base is the only non-negotiable pillar; the other two adapt to your needs.
How does pattern detection work across journal entries?
The journal layer scans past journal entries for recurring themes, struggles, and topics. When you write a new journal entry, the AI checks previous entries for patterns — repeated frustrations, ongoing goals, recurring questions — and surfaces those explicitly in its response. Over time, the AI's advice becomes progressively more personalised because it factors in your history. This is enforced through instructions in agents.md that tell the AI to review /Journal/index.md and relevant past entries before responding.
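In the described system, the LLM performs this review by reading /Journal/index.md and past entries; no code is involved. To illustrate what "recurring across entries" means, here is a deliberately crude stand-in, not the system's method: a keyword document-frequency count in which a term counts as recurring if it appears in at least two separate entries. The stopword list, the four-letter minimum and the function name are all assumptions.

```python
import re
from collections import Counter

# Illustrative stopword list; a real one would be much longer.
STOPWORDS = {"about", "again", "been", "from", "have", "that", "this", "were", "what", "when", "with"}


def recurring_terms(entries: list[str], min_entries: int = 2) -> list[tuple[str, int]]:
    """Terms that appear in at least `min_entries` separate journal entries.

    Returns (term, number_of_entries) pairs, most widespread first.
    """
    doc_freq: Counter[str] = Counter()
    for text in entries:
        words = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        doc_freq.update(w for w in words if len(w) > 3)  # count once per entry
    return sorted(
        ((w, n) for w, n in doc_freq.items() if n >= min_entries),
        key=lambda item: (-item[1], item[0]),
    )
```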
What AI model should I use for the hourly automation?
Use the strongest model available to you on the highest reasoning setting. Using a lightweight model for the automation produces shallow wiki pages with poor entity extraction and weak cross-linking. Since the automation runs hourly and processes whatever is in the RAW folder, the quality of every wiki page depends on this model choice. The processing cost is minimal compared to the value lost from poorly extracted and linked content.
Can I use this system without GitHub for backup?
Yes, GitHub backup is optional. The system works fully without it — all files live locally in your Obsidian vault folder. However, without GitHub backup, you risk data loss. An alternative is to place your Obsidian vault in a cloud-synced folder (Dropbox, iCloud, Google Drive), though this does not provide version history the way Git commits do. For the most robust setup, connect a private GitHub repository and append the commit instruction to your hourly automation prompt.