How Should AI Founders Structure Product and Research?

For AI startup founders and CTOs · Based on Emit Jane Luma Foundation Lab Method

// TL;DR

AI startup founders should structure their companies as foundation labs where product and research are one unified system, not separate departments. This means every product deployment generates training data for the next model, and every model improvement directly improves the product. Build the thinnest possible product on top of current model capability, target specific professions rather than verticals, capture process data from real customer usage, and evaluate scaling decisions using the 10x logarithmic test. The compound loop this creates is the fundamental competitive advantage of AI-native companies.

Why should AI founders unify product and research into one team?

The Foundation Lab Method argues that treating product and research as separate functions is the single most common structural mistake AI startups make. In a foundation lab, research produces the product, and the product in turn drives research. They are one thing.

Why does this matter for founders? Because separate teams create separate roadmaps with competing priorities. Research optimizes for benchmarks; product builds engineering harnesses to patch model gaps. The result is linear progress at best and, at worst, six to eight months of engineering work that the next model iteration makes irrelevant.

The alternative is a compound loop: your product serves real customers, generating process data (how artifacts are made, not just the finished outputs). That process data feeds directly into the next training run, producing a better model, which makes the product thinner and more capable, which attracts more users and data. This loop is the economic engine of foundation labs.
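To make "process data" more concrete, here is a minimal Python sketch of what capturing it might look like. The schema (`ArtifactSession`, `ProcessStep`, and their fields) is a hypothetical illustration, not something the method prescribes. What it shows is the distinction drawn above: every intermediate step is logged, not just the finished output, and the steps where a user corrected the model become candidate training examples.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessStep:
    """One intermediate action taken while making an artifact (hypothetical schema)."""
    action: str           # e.g. "generate_shot", "edit_shot"
    model_output_id: str  # which model output the action applied to
    user_edit: str        # what the user changed; empty if they accepted it as-is


@dataclass
class ArtifactSession:
    """Process data for one artifact: the steps, not just the final output."""
    user_id: str
    profession: str
    steps: List[ProcessStep] = field(default_factory=list)
    final_output_id: str = ""

    def log(self, action: str, model_output_id: str, user_edit: str = "") -> None:
        self.steps.append(ProcessStep(action, model_output_id, user_edit))

    def to_training_examples(self) -> List[dict]:
        """Keep only the steps where the user corrected the model."""
        return [
            {"profession": self.profession,
             "model_output": s.model_output_id,
             "correction": s.user_edit}
            for s in self.steps
            if s.user_edit
        ]


# Usage with made-up IDs: one accepted output, one corrected output.
session = ArtifactSession(user_id="u1", profession="filmmaker")
session.log("generate_shot", "out-1")
session.log("edit_shot", "out-1", user_edit="tighter framing")
session.final_output_id = "out-2"
examples = session.to_training_examples()
print(len(examples))  # 1
```

In a real system these records would flow into whatever data store feeds the next training run; the sketch only shows the shape of the data.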

Practically, this means your research lead and product lead should be the same person—or at minimum, share one roadmap, one set of metrics, and one weekly planning cycle.

How do I decide what to build first as an AI startup?

Before building anything, ask two questions about your target modality: Is there a YouTube of it? Is there a Wikipedia of it?

If the answer is no, your first product isn't really a product; it's a data generation engine. Build something people love to use for free that produces training data as a byproduct. Don't wait until you know the exact scale you need; scaling laws for new modalities are unknown early on.
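The two questions can be written down as a small decision function. This is a sketch under one stated assumption: a web-scale corpus of either kind (raw examples, like YouTube for video, or structured knowledge, like Wikipedia for text) counts as "the data exists". The names `Modality` and `first_product` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Modality:
    name: str
    has_youtube_equivalent: bool    # a large public corpus of raw examples?
    has_wikipedia_equivalent: bool  # a large public corpus of structured knowledge?


def first_product(m: Modality) -> str:
    # Assumption: either corpus is enough to treat the data as existing.
    if not (m.has_youtube_equivalent or m.has_wikipedia_equivalent):
        # Scarce data: ship a free product people love; training data is the byproduct.
        return "data_generation_engine"
    # Data exists: map current model capability against the end-to-end promise.
    return "capability_gap_map"


# Hypothetical inputs for illustration.
print(first_product(Modality("new_modality", False, False)))  # data_generation_engine
print(first_product(Modality("video", True, False)))          # capability_gap_map
```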

If the data exists, map your current model's capabilities honestly against the full end-to-end promise you want to deliver. Every gap between what the model can do and what the product needs is not an engineering project—it's a data collection job for the next training run. Categorize each gap: does it need fine-tuning, a new training run, or a full pre-training investment?
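One way to keep this audit honest is to record each gap with the training investment it needs and the data that would close it, then read the list back as a data collection plan. The sketch below is a hypothetical structure: the method names the three categories but gives no formula for assigning them, so `fix` is a judgment the team enters by hand, and ordering by ascending cost (fine-tuning, then a new training run, then pre-training) is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Fix(Enum):
    FINE_TUNE = 1         # adapt the current model
    NEW_TRAINING_RUN = 2  # a fresh training run
    PRE_TRAINING = 3      # a full pre-training investment


@dataclass
class CapabilityGap:
    description: str  # what the product needs that the model can't yet do
    fix: Fix          # assigned by the team, not computed
    data_needed: str  # the process data that would close the gap


def data_collection_plan(gaps: List[CapabilityGap]) -> Dict[Fix, List[str]]:
    """Group the data each gap needs by investment level, cheapest level first."""
    plan: Dict[Fix, List[str]] = defaultdict(list)
    for gap in sorted(gaps, key=lambda g: g.fix.value):
        plan[gap.fix].append(gap.data_needed)
    return dict(plan)


# Hypothetical gaps for a video product.
gaps = [
    CapabilityGap("keep characters consistent across shots",
                  Fix.NEW_TRAINING_RUN, "multi-shot edit sessions"),
    CapabilityGap("match a brand's color palette",
                  Fix.FINE_TUNE, "brand asset corrections"),
]
plan = data_collection_plan(gaps)
print([f.name for f in plan])  # ['FINE_TUNE', 'NEW_TRAINING_RUN']
```

Note that the output is a list of data to collect, not a list of features to build, which is the reframing the paragraph above asks for.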

Then build the thinnest possible product on top of current capability. Resist the urge to build complex orchestration layers. Fatness in the product stack is technical debt that compounds against you.

How should AI founders think about their target market?

Drop the word 'vertical' from your vocabulary. Think in professions instead.

Verticals like 'entertainment' or 'healthcare' are abstractions that hide the actual end-to-end workflows real humans need solved. Professions are concrete: a filmmaker's end-to-end workflow is concept → shoot → edit → set changes → final output. A marketer's end-to-end workflow is understanding the environment → resonant message → localized assets at scale.
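Because a profession's workflow is an ordered list of stages, it can be written down as data and checked against what the model can do today. This minimal sketch reuses the two workflows above; `PROFESSIONS`, `uncovered_stages`, and the set of stages the model handles are hypothetical illustrations.

```python
from typing import Dict, List, Set

# Stages taken from the filmmaker and marketer examples above.
PROFESSIONS: Dict[str, List[str]] = {
    "filmmaker": ["concept", "shoot", "edit", "set changes", "final output"],
    "marketer": ["understanding the environment", "resonant message",
                 "localized assets at scale"],
}


def uncovered_stages(profession: str, model_can_do: Set[str]) -> List[str]:
    """Return, in workflow order, the stages the model cannot yet handle."""
    return [stage for stage in PROFESSIONS[profession] if stage not in model_can_do]


# Hypothetical capability snapshot.
print(uncovered_stages("filmmaker", {"concept", "edit"}))
# ['shoot', 'set changes', 'final output']
```

Each uncovered stage points at a specific failure mode, and at the specific process data that would need to be captured to close it.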

When you target professions, you discover specific failure modes, specific magic moments, and specific process data you can capture. This precision is what separates AI companies that compound from those that stall.

When should an AI startup pursue enterprise vs. consumer?

Apply the intelligence threshold test before chasing consumer markets. Ask: does the model understand context, humor, and the local state of the specific user well enough that generated content would be genuinely interesting to that person?

If not, consumer deployment will produce a strong day-one spike, driven by the novelty of generation, followed by rapid retention collapse. Users scroll for a few days and then churn, because content is not interesting merely because it was generated; it is interesting because of what happens in it.

Enterprise deployment is the correct focus until models pass this threshold. Businesses are responsible for 99% of pixels on screens every day, and they have clear end-to-end workflow problems your model can solve right now. Deploy Forward Deployed Creatives to enterprise customers: people who help those customers succeed while piping what they learn back into your training pipeline.
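The intelligence threshold test can be expressed as a gate on go-to-market choice. This is a sketch under loud assumptions: the method does not say how to score context, humor, or user-state understanding, so the 0-to-1 scores are hypothetical outputs of an evaluation with real target users, and the `bar` of 0.8 is an arbitrary placeholder. The design choice is to require every dimension to pass, on the reasoning that one weak dimension is enough to make generated content uninteresting to that user.

```python
from dataclasses import dataclass


@dataclass
class ThresholdEval:
    """Hypothetical scores in [0, 1] from testing with real target users."""
    understands_context: float
    understands_humor: float
    understands_user_state: float  # the local state of the specific user


def go_to_market(ev: ThresholdEval, bar: float = 0.8) -> str:
    # Assumption: the weakest dimension decides; all three must clear the bar.
    weakest = min(ev.understands_context,
                  ev.understands_humor,
                  ev.understands_user_state)
    return "consumer" if weakest >= bar else "enterprise"


print(go_to_market(ThresholdEval(0.9, 0.6, 0.85)))  # enterprise
print(go_to_market(ThresholdEval(0.9, 0.9, 0.9)))   # consumer
```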

Start building your foundation lab architecture today. Audit whether your product and research are truly unified, identify your scarce data problem, and map every engineering harness that should be replaced by training data.

// FREQUENTLY ASKED QUESTIONS

How do I structure my AI startup as a foundation lab?

Eliminate the separation between product and research teams. Create one unified roadmap where every product deployment generates training data and every model improvement directly improves the product. Your product lead and research lead should share metrics. Build thin product layers, not engineering harnesses. Treat every model capability gap as a data collection job for the next training run, not an engineering workaround project.

Should my AI startup build on top of GPT or train our own model?

If you want the full compound loop of the Foundation Lab Method, you need control over your training pipeline. Building on someone else's model means you can't feed product data back into training—the most powerful element of the framework. You can still apply thin-stack and profession-targeting principles, but the data flywheel requires training your own models or deep partnerships where your process data enters the provider's training runs.

How do I know if I'm building too thick a product stack?

Count the number of engineering workarounds, orchestration layers, and multi-step pipelines that exist because the model can't do something natively. Each one represents fatness. Ask: if the next model were 2x better, which of these would disappear? If the answer is most of them, your stack is too thick. Redirect that engineering effort toward collecting training data that would make those harnesses unnecessary in the next model iteration.
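The audit above can be kept as a simple inventory. In this minimal sketch, `Harness`, `stack_is_too_thick`, and `redirect_to_data` are hypothetical names, each `obsolete_if_model_2x` flag is the team's own answer to the 2x question, and "most of them" is read as strictly more than half.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Harness:
    name: str
    kind: str                   # "workaround", "orchestration", or "pipeline"
    obsolete_if_model_2x: bool  # would a 2x better model make this unnecessary?


def stack_is_too_thick(harnesses: List[Harness]) -> bool:
    """True if strictly more than half the harnesses would disappear."""
    if not harnesses:
        return False
    disappearing = sum(h.obsolete_if_model_2x for h in harnesses)
    return disappearing * 2 > len(harnesses)


def redirect_to_data(harnesses: List[Harness]) -> List[str]:
    """Harnesses whose effort should move into training-data collection."""
    return [h.name for h in harnesses if h.obsolete_if_model_2x]


# Hypothetical inventory.
stack = [
    Harness("retry-until-valid loop", "workaround", True),
    Harness("shot-by-shot prompt chain", "orchestration", True),
    Harness("brand-compliance checker", "pipeline", False),
]
print(stack_is_too_thick(stack))  # True
print(redirect_to_data(stack))    # ['retry-until-valid loop', 'shot-by-shot prompt chain']
```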