Question 1

Can I use the foundation lab method if I'm not building a foundation model from scratch?

Accepted Answer

Yes, but with adaptation. Even if you're fine-tuning or building on top of an existing foundation model, the core principles apply: keep your product stack as thin as possible, capture process data from customer usage, and ensure product decisions feed back into your model improvement cycle (even if that cycle is fine-tuning rather than pre-training). The key is maintaining the unified loop between deployment and model improvement regardless of your model-building approach.

Question 2

What's the difference between artifact data and process data in AI training?

Accepted Answer

Artifact data is the finished output — a completed image, a published film, a final codebase. Process data is the full path of how that artifact was created — every decision, revision, iteration, and intermediate step. The internet is rich in artifact data but almost entirely lacking in process data. Training end-to-end agents that can actually do complete workflows (not just produce isolated outputs) requires process data that can only be collected by deploying products to real professionals and logging their creative process.

Question 3

What does 'thin stack' mean in the context of AI product development?

Accepted Answer

A thin stack means building the minimal possible product layer on top of base model capability. Every piece of product engineering that compensates for something the model can't do natively represents 'fatness' — technical debt that the next model iteration should eliminate. Complex orchestration systems, multi-step prompt chains, and engineering harnesses built around model gaps are fat. They represent six-month dead ends that become irrelevant when the model improves. The next model's job is always to reduce stack fatness.

Question 4

Why does the foundation lab method say to think in professions instead of verticals?

Accepted Answer

Verticals (healthcare, entertainment, finance) are abstractions that obscure actual human workflows. Professions (surgeons, filmmakers, financial analysts) are specific humans with specific end-to-end problems. A filmmaker's complete workflow — concept to shoot to edit to final output — is concrete and designable. 'The entertainment vertical' is vague and leads to spot-work tools. Thinking in professions forces you to design for the full end-to-end workflow that a real person needs solved, which is where durable AI value lives.

Question 5

How do I identify the scarce data problem for my AI modality?

Accepted Answer

Ask two questions: Is there a YouTube-scale open dataset for this modality? Is there a Wikipedia-scale structured knowledge base for it? If either answer is no, you have a scarce data problem. Your first product must solve this by being something people love to use for free that generates training data at scale. For example, if you're building AI for architecture, there's no YouTube of the architectural design process — only finished buildings. Your product must capture the design process itself to generate the missing data.

Question 6

How do I map model capability gaps against my product promise?

Accepted Answer

List every feature your product promises to deliver. For each feature, ask: can the base model do this natively, or does it require an engineering workaround? Every gap requiring a workaround is not an engineering project — it is a data collection and training job. Categorize each gap: does it need fine-tuning (days), a new training run (weeks), or a pre-training investment (months)? This map becomes your unified product-research roadmap. Gaps should shrink with each model iteration, not accumulate engineering complexity.
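One way to keep this map concrete is a small data structure. The sketch below is illustrative only: the names `Feature`, `Remedy`, and `build_roadmap` are hypothetical, and the three remedy tiers simply mirror the days / weeks / months categories from the answer above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Remedy(Enum):
    """Hypothetical tiers, ordered by rough time horizon."""
    FINE_TUNING = 1    # days
    TRAINING_RUN = 2   # weeks
    PRE_TRAINING = 3   # months


@dataclass
class Feature:
    name: str
    native: bool                      # can the base model do this natively?
    remedy: Optional[Remedy] = None   # required when native is False


def build_roadmap(features: List[Feature]) -> List[Feature]:
    """Return only the gaps, shortest training horizon first."""
    gaps = [f for f in features if not f.native]
    for gap in gaps:
        if gap.remedy is None:
            raise ValueError(f"gap '{gap.name}' has no remedy category")
    # sorted() is stable, so gaps in the same tier keep their input order.
    return sorted(gaps, key=lambda f: f.remedy.value)


if __name__ == "__main__":
    promised = [
        Feature("single-image generation", native=True),
        Feature("consistent character across shots", native=False,
                remedy=Remedy.TRAINING_RUN),
        Feature("brand style adherence", native=False,
                remedy=Remedy.FINE_TUNING),
    ]
    for gap in build_roadmap(promised):
        print(f"{gap.remedy.name:<13} {gap.name}")
```

Re-running the map after each model iteration and comparing the length of the returned list is one simple way to check that gaps are shrinking rather than accumulating.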

Question 7

How do I set up a process data capture pipeline in my AI product?

Accepted Answer

Design your product so that every user interaction logs the full creation path, not just the final output. This means capturing prompts, iterations, edits, undo actions, branch points where the user chose one direction over another, and the sequence of decisions leading to the final artifact. Build this telemetry into the core product architecture from day one. Structure the data so it can be directly consumed by your training pipeline. Every professional user session becomes a training example for end-to-end agent behavior.
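As a rough illustration of what "logging the full creation path" could look like, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: the event kinds are taken from the list in the answer above, while `ProcessEvent`, `SessionLog`, and the JSON Lines export are hypothetical names and choices, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

# Event kinds drawn from the answer above; extend as the product requires.
EVENT_KINDS = {"prompt", "iteration", "edit", "undo", "branch", "final"}


@dataclass
class ProcessEvent:
    session_id: str
    seq: int                  # position in the session, preserves ordering
    kind: str
    payload: Dict[str, Any]


@dataclass
class SessionLog:
    session_id: str
    events: List[ProcessEvent] = field(default_factory=list)

    def record(self, kind: str, **payload: Any) -> ProcessEvent:
        """Append one step of the creation path to the session."""
        if kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        event = ProcessEvent(self.session_id, len(self.events), kind, payload)
        self.events.append(event)
        return event

    def to_jsonl(self) -> str:
        """Serialize the session as JSON Lines, one event per line."""
        return "\n".join(
            json.dumps(asdict(e), sort_keys=True) for e in self.events
        )


if __name__ == "__main__":
    log = SessionLog("session-001")
    log.record("prompt", text="storyboard for a 30s product spot")
    log.record("branch", chosen="option_b", rejected=["option_a"])
    log.record("undo", target_seq=1)
    log.record("final", artifact_id="storyboard-v3")
    print(log.to_jsonl())
```

The point of the sequence number and the explicit `branch` and `undo` events is that the rejected paths stay in the record. A production pipeline would also need consent handling, durable storage, and a schema agreed with the training team, none of which is shown here.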

Question 8

How do I deploy Forward Deployed Creatives effectively?

Accepted Answer

FDCs must serve two masters simultaneously: the customer and the research team. Hire people who are genuinely skilled in the customer's creative domain (filmmaking, design, marketing) and also technically literate enough to translate customer workflows into research signals. Embed them deeply in enterprise accounts. Their deliverables are dual: (1) successful customer deployments and (2) structured intelligence reports that feed directly into model training priorities. Treat every enterprise engagement as an optimization experiment, not a support engagement.

Question 9

How do I apply the 10x logarithmic scaling test before a major training run?

Accepted Answer

Before committing to a major compute investment, ask explicitly: if this model were 10x larger in compute and parameters, would the result be categorically different — enabling things that were impossible before — or just incrementally better at what it already does? If incrementally better, the bottleneck is not scale. Diagnose whether the real constraint is missing modality coverage, insufficient or low-quality process data, or an architectural limitation. Fix the real constraint first. Scale is only the answer when the architecture and data are ready to exploit it.

Question 10

My AI product has good day-one engagement but poor retention — what's going wrong?

Accepted Answer

You're likely failing the intelligence threshold test. Generative products that rely on novelty ('look, AI made this!') produce strong day-one spikes followed by rapid retention collapse. Users generate content for a few days, then ask 'now what?' Content is not interesting merely because it was generated; it is interesting because of what happens in it. The fix is either to raise model intelligence until it understands context and user state, or to pivot toward professional and enterprise use cases where end-to-end workflow value sustains engagement independent of novelty.

Question 11

Our product team and research team keep building conflicting roadmaps — how do we fix this?

Accepted Answer

This is the foundational anti-pattern the Foundation Lab Method eliminates. You don't have a coordination problem; you have a structural problem. Product and research should not be separate teams with separate roadmaps. Merge them into one function with one roadmap. Every product feature request should be reframed as a model capability question. Every research milestone should be defined by the product capability it unlocks. If a product decision doesn't feed back into the model, or a model improvement doesn't make the product better, that decision is misaligned.

Question 12

We built a complex engineering harness to work around model limitations — should we keep investing in it?

Accepted Answer

Almost certainly not. Complex orchestration systems, multi-step prompt chains, and spaghetti harnesses built around model capability gaps are six-to-eight-month dead ends. The next model iteration will likely make them irrelevant. Treat the model gap as a data collection job for the next training run (a two-to-three-week effort) rather than an engineering project. Maintain the harness only as a stopgap while you collect the data needed to close the gap natively in the model. Plan for the harness to be deleted.

Question 13

How does the foundation lab method compare to the typical AI startup approach of fine-tuning open-source models?

Accepted Answer

Fine-tuning open-source models creates a product layer on top of someone else's research. The foundation lab method argues this is structurally fragile because you cannot jointly optimize the full stack — you are dependent on another organization's research roadmap. However, the principles still partially apply: keep your product stack thin, capture process data, and use customer deployments as data flywheels. The difference is that a true foundation lab controls the full loop from pre-training through deployment, enabling tighter optimization cycles.

Question 14

How is the foundation lab approach different from Palantir's forward deployed engineer model?

Accepted Answer

Palantir's forward deployed engineers primarily serve a customer success function — embedding in enterprises to deploy Palantir's software. The Foundation Lab Method's Forward Deployed Creatives (FDCs) explicitly serve a dual function: customer deployment and research intelligence. FDCs are not support; they are optimization loops. Every enterprise engagement generates structured signals about what the model needs to learn next. The creative emphasis also distinguishes FDCs — they must be domain experts in the customer's creative field, not just technically proficient implementers.

Question 15

How does building a unified multimodal model compare to building separate specialized models?

Accepted Answer

Separate specialized models (a language model, an image model, a video model) create isolated towers that cannot jointly optimize. A unified single-tower model processing language, audio, video, and images as one signal stream enables things categorically impossible with separate models — like understanding a character's identity across a long film production or reasoning about visual states in code. The unified approach is significantly harder to train but is the only architectural path to genuine world understanding per the Foundation Lab Method.

Question 16

What is the intelligence threshold test for consumer AI products?

Accepted Answer

The intelligence threshold test determines whether a consumer generative product is viable by asking: does the model understand context, humor, and the local state of this specific user well enough that the output would be genuinely interesting to that person? Below this threshold, consumer products produce novelty spikes followed by retention collapse. Content is not interesting because it was AI-generated; it is interesting because of what it contains. Until models pass this threshold, enterprise and professional deployments are the correct focus.

Question 17

How do I evaluate whether my AI company should focus on consumers or enterprises?

Accepted Answer

Apply the intelligence threshold test. Consumer success requires the model to understand context, humor, and individual user state — a very high bar. Enterprise success requires the model to solve clear end-to-end workflow problems for specific professions — a lower but still demanding bar. Businesses produce 99% of pixels on screens every day and have well-defined workflow problems. Unless your model genuinely passes the consumer intelligence threshold, enterprise is where you'll find sustainable product-market fit and the process data to improve the model.

Question 18

How do I know when to scale my model vs. fix architecture or data issues?

Accepted Answer

Apply the 10x logarithmic test. If a 10x increase in compute and parameters would produce a categorically different model — enabling entirely new capabilities — then scale is the right investment. If it would only produce incremental improvement, the real constraint is elsewhere. Diagnose specifically: is it missing modality coverage (e.g., no audio tower)? Poor data quality or missing process data? An architectural limitation preventing proper fusion? Fix the actual bottleneck before spending on scale. Scale amplifies what works; it doesn't fix what's broken.
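The decision logic above can be written down as a short checklist function. This is only a sketch of the reasoning, not a measurement tool: the function name `next_investment` and its boolean inputs are hypothetical, and the answers still have to come from your own judgment and experiments.

```python
from typing import List


def next_investment(
    categorically_different_at_10x: bool,
    missing_modality: bool = False,
    weak_process_data: bool = False,
    fusion_limited_by_architecture: bool = False,
) -> List[str]:
    """Return where to invest next, following the 10x test."""
    if categorically_different_at_10x:
        # Scale would unlock new capabilities, so scale is the investment.
        return ["scale compute and parameters"]

    # Otherwise the constraint is elsewhere: list every diagnosed bottleneck.
    fixes = []
    if missing_modality:
        fixes.append("add the missing modality coverage")
    if weak_process_data:
        fixes.append("improve data quality or collect process data")
    if fusion_limited_by_architecture:
        fixes.append("fix the architectural limitation")
    return fixes or ["diagnose further before spending on scale"]


if __name__ == "__main__":
    print(next_investment(False, weak_process_data=True))
```

Returning a list rather than a single answer reflects that more than one bottleneck can be present at once.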

Question 19

What does 'the promise of AI is not spot work' mean practically for product design?

Accepted Answer

It means designing your product to solve complete professional workflows, not isolated tasks. A product that generates a quick image or writes a paragraph of copy is doing spot work — a fragment of what a professional actually needs. The end-to-end solution for a filmmaker is concept through final output. For a marketer, it's understanding the environment through localized assets at scale. Spot-work tools will be commoditized as base models improve. End-to-end workflow products built on process data create durable competitive advantage.

Question 20

Can the foundation lab method work for AI companies outside of visual AI?

Accepted Answer

Yes. The principles are modality-agnostic. The scarce data identification step, the thin stack principle, the process data flywheel, the profession-not-vertical framing, and the end-to-end optimization directive apply to any AI modality — language, audio, robotics, biology, code. The specific examples in the original formulation center on visual AI because Luma operates in that domain, but the structural insight that product and research must be one unified system applies to any AI company where model capability is the core product value.

Question 21

What is a world model in the foundation lab framework?

Accepted Answer

A world model is a model that understands the physical world and can simulate it. It is not defined by real-time speed or a specific architecture like autoregressive generation. It is defined by understanding laws of physics, causality, time, and human language — all as one unified signal stream. The architectural shape of a world model is a single tower jointly modeling language, audio, video, images, and physical context. Language plus video plus audio covers approximately 90% of the path to a world model.

Question 22

How should I prioritize which modalities to fuse in my unified model?

Accepted Answer

Start with the highest-leverage fusion based on your target professions. Language plus image or language plus video covers the most ground for most creative and business professions. Language plus video plus audio covers approximately 90% of the path to a world model. At each fusion step, measure whether the combination enables things that were categorically impossible before — not just incrementally better. If a fusion doesn't produce categorical new capabilities, the constraint may be data quality or architecture rather than modality coverage.

Frequently Asked Questions About Amit Jain's Luma Foundation Lab Method
