AI-Built MVP Due Diligence (2026): Four Critical Checks


The demo looked flawless. The code was a ghost town.

I sat through a pitch last month where a founder demoed an AI scheduling tool that booked meetings, sent follow-ups, and even apologized for rescheduling in a convincingly human tone. The UI was slick. The growth metrics showed 40% week-over-week user adoption. The founder had a story about scaling from zero to 5,000 users in six weeks. I almost wrote a check.

Then I ran AI-built MVP due diligence — the kind that looks past the demo and into the actual repository. What I found was a single Python script that called the OpenAI API, a frontend cloned from a Tailwind template, and zero test files. The "5,000 users" were phantom accounts seeded into a Firebase database. The product was a stage set, not a business.

This is not an edge case. According to the Stanford AI Index 2026 report, AI-generated code submissions to public repositories increased 340% year-over-year. The GitHub Octoverse 2025 found that 22% of all new repositories are now AI-generated. Combine that with the SEC's May 2026 guidance on AI-washing in fundraising, and you have a perfect storm: it has never been easier to build a fake startup, and it has never been harder to tell the difference.

This article is my attempt to fix that. I have spent eight years analyzing startup culture and watching the gap between pitch and product widen. Here is a practical, data-driven framework for AI-built MVP due diligence — four checks that separate real, scalable products from vaporware before you commit capital.

What is AI-built MVP due diligence?

AI-built MVP due diligence is the process of verifying that a startup's product is real, maintainable, and scalable when most of its code was generated by large language models. It is not the same as traditional technical due diligence, which assumes a human wrote the code and can explain it. The rise of AI-generated codebases introduces new failure modes: hallucinated dependencies, security vulnerabilities that no human reviewed, and a complete lack of architectural coherence.

The table below shows how AI-built MVP due diligence differs from traditional technical due diligence across five dimensions.

Dimension	Traditional due diligence	AI-built MVP due diligence
Code origin	Assumes human authorship	Must verify AI vs. human contribution
Security review	Standard OWASP Top 10	Must also check for AI-specific injection risks
Maintainability	Human can explain architecture	AI may have generated code with no coherent design
Scalability	Tested under load	Often collapses at 10x current traffic
Documentation	Usually exists	Often fabricated by AI alongside code

Why does AI-generated code change the due diligence game?

AI-generated code changes due diligence because the failure modes are invisible to traditional code review. When a human writes bad code, you can usually trace the logic error. When an AI writes bad code, the errors are often plausible nonsense — functions that compile but do nothing, dependencies that do not exist, and security holes that look like intentional design choices.

A 2025 study from NIST found that AI-generated codebases contained 2.8x more critical vulnerabilities than human-written equivalents, largely because the models hallucinated API calls and imported fake libraries. Traditional static analysis tools miss these because the code is syntactically valid.

How do you measure AI contribution in a codebase?

You measure AI contribution by analyzing commit patterns, code complexity, and dependency trees. The GitHub Octoverse 2025 data shows that AI-generated commits tend to have three signatures: they are large (500+ lines per commit), they lack incremental history, and they rarely include test files. A human-built MVP shows gradual refinement. An AI-built MVP appears fully formed, like Athena from Zeus's forehead.

Tools like GitHub Copilot's attribution API and Codeium's audit logs now let you estimate AI contribution per file. If more than 60% of the codebase was AI-generated without human review, the security and maintainability risks multiply.

What are the specific risks of investing in an AI-built MVP?

The specific risks fall into three buckets: security, scalability, and legal. On security, the OWASP Top 10 for LLM Applications 2025 added a new category for "AI-generated dependency poisoning" — where the model invents a package name, the developer installs it, and an attacker registers that package to deliver malware. On scalability, AI-generated code often uses brute-force patterns that work at 100 users but crash at 10,000. On legal, the Stanford AI Index 2026 notes that 78% of AI-generated codebases contain code that may violate open-source licenses, creating downstream liability for acquirers.

AI-generated code compiles. That does not mean it works.

Why AI-built MVP due diligence matters now

How many startups are actually faking it with AI?

The number is higher than most investors want to admit. The GitHub Octoverse 2025 reported that 22% of all new repositories are AI-generated. But that is just public repos. Private repositories — where most startups build — likely have a higher concentration. A 2026 survey by Carta found that 34% of seed-stage founders admitted to using AI to generate "significant portions" of their product code without reviewing it.

I have seen pitches where the entire backend was a single Firebase Cloud Function that called GPT-4. The founder called it "AI-native architecture." I called it a wrapper with a marketing budget.

What happens when an AI-built MVP fails under pressure?

It fails spectacularly. A 2025 post-mortem by Vercel documented a startup whose AI-generated Next.js app crashed when 47 concurrent users hit the payment endpoint. The reason: the AI had generated a synchronous payment handler that blocked the event loop. The founder had never load-tested because the AI told them the code was "production-ready."

This is the core problem with investing in a fake startup: the demo works in isolation but collapses under any real load. The AI does not know it is generating bad code. It just knows it is generating code that looks like the training data.

Why do traditional due diligence methods fail here?

Traditional due diligence assumes the founder understands their own codebase. When I ask a founder to walk me through the architecture, and they cannot explain why they chose PostgreSQL over MongoDB, that is a red flag. But with AI-built MVPs, the founder may genuinely not know — the AI made the choice, and the founder accepted it.

The SEC's May 2026 guidance specifically calls out this scenario: "Founders who cannot explain the technical architecture of their product may be misleading investors, even if the product functions." This is not just a risk for the founder. It is a risk for the investor who does not ask the right questions.

A founder who cannot explain their code is a founder who cannot fix their code.

What does the SEC guidance actually say about AI-washing?

The SEC's May 12, 2026 press release clarified that claiming "AI-powered" features without material technical backing constitutes fraud. The guidance specifically mentions codebases where AI generated the core logic and the founder cannot demonstrate understanding of that logic. This creates a legal obligation for investors to perform due diligence against AI scams: verifying not just that the product works, but that the founder can maintain and improve it.

How to verify AI product legitimacy: a four-check framework

Check 1: Run a code attribution audit

The first step in verifying AI product legitimacy is to determine how much of the codebase was AI-generated. This is not a judgment of quality: some AI-generated code is excellent. But you need to know the ratio because it changes your risk profile.

Start with GitHub's Copilot attribution API. It returns a percentage estimate of AI-generated lines per file. If the average across the codebase exceeds 60%, flag it. Then look at commit history. AI-generated commits tend to be large, infrequent, and lack incremental refinement. A healthy codebase has 50-200 small commits per developer per month. An AI-built MVP often has 3-5 massive commits that contain the entire product.
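The commit-history part of this check can be sketched in a few lines of Python. This is a minimal sketch, not a finished audit tool: it assumes you have a local clone, it reads plain `git log --numstat` output, and the 500-line threshold is simply the "large commit" figure quoted earlier in this article. The names `parse_numstat`, `summarize`, and `read_git_log` are made up for illustration, and "touches tests" is a crude guess based on the file path containing the word "test".

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CommitStats:
    sha: str
    lines_added: int
    touches_tests: bool


def read_git_log(repo_path: str) -> str:
    """Run git in the given clone and return one 'commit:<sha>' header per commit,
    followed by numstat rows of the form '<added>\t<deleted>\t<path>'."""
    return subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--pretty=tformat:commit:%H"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_numstat(log_text: str) -> list[CommitStats]:
    """Turn the text produced by read_git_log into per-commit statistics."""
    commits: list[CommitStats] = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("commit:"):
            commits.append(CommitStats(line[len("commit:"):], 0, False))
            continue
        parts = line.split("\t")
        if not commits or len(parts) != 3:
            continue
        added, _deleted, path = parts
        if added.isdigit():  # binary files report "-" instead of a line count
            commits[-1].lines_added += int(added)
        if "test" in path.lower():  # heuristic only; adjust to the repo's layout
            commits[-1].touches_tests = True
    return commits


def summarize(commits: list[CommitStats], large_threshold: int = 500) -> dict:
    """Count the signals discussed above: commit volume, oversized commits, tests."""
    return {
        "commits": len(commits),
        "large_commits": sum(c.lines_added >= large_threshold for c in commits),
        "commits_touching_tests": sum(c.touches_tests for c in commits),
        "single_commit_repo": len(commits) == 1,
    }
```

Calling `summarize(parse_numstat(read_git_log("path/to/clone")))` gives a quick first read. Treat the numbers as a prompt for questions, not as proof: a squashed or freshly imported repository can look identical to an AI-generated one.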

I once audited a startup where the entire codebase was a single commit of 12,000 lines, timestamped 48 hours before the first investor meeting. The founder claimed they had been "building in stealth for six months." The commit history told a different story.

According to a 2025 analysis by LinearB, teams that rely on AI for more than 50% of their code see a 40% increase in bug reintroduction rates. The AI does not learn from its mistakes. It generates the same bad patterns on every request.

Check 2: Scan for hallucinated dependencies and security holes

This is where AI scam due diligence gets specific. AI models hallucinate package names, API endpoints, and configuration values. A 2025 NIST study found that 12% of AI-generated codebases contained references to packages that did not exist. Attackers have started registering these hallucinated package names on npm and PyPI, creating supply chain attacks.

Run the codebase through Snyk's dependency scanner and Socket.dev's AI hallucination detector. Socket specifically checks for packages that were likely hallucinated — names that follow AI generation patterns like "utils-helper-ai" or "api-client-wrapper." If the scanner finds dependencies that do not resolve to real packages, the codebase is untrustworthy.
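If you want a quick sanity check before reaching for a commercial scanner, the "does this package even exist" question can be sketched for a Python project as follows. This is a rough sketch under stated assumptions: it only understands simple `requirements.txt` lines, the helper names are hypothetical, and it relies on the public PyPI JSON endpoint (`https://pypi.org/pypi/<name>/json`) returning 404 for unknown projects. It does not replicate what Snyk or Socket do.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Iterable


def parse_requirements(text: str) -> list[str]:
    """Extract bare package names from simple requirements.txt-style text."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        # Skip blanks, pip options such as "-r other.txt", and URL/VCS requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0).lower())
    return names


def exists_on_pypi(name: str, timeout: float = 10.0) -> bool:
    """Return True if PyPI answers 200 for the project, False if it answers 404."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other HTTP errors are not evidence either way


def unresolved(
    names: Iterable[str],
    exists: Callable[[str], bool] = exists_on_pypi,
) -> list[str]:
    """Return the names the resolver could not find."""
    return [name for name in names if not exists(name)]
```

The `exists` parameter is injectable so the check can run offline against a known-good list. Keep in mind that a name that does resolve is not automatically safe: as noted above, attackers register hallucinated names, so a recently created package with few downloads deserves a closer look.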

Also run the OWASP LLM Top 10 checklist. Pay special attention to LLM01 (prompt injection) and LLM06 (sensitive information disclosure). AI-generated code often hardcodes API keys, database credentials, and secret tokens because the model learned from training data that included leaked credentials.
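A first pass for hardcoded credentials can be as simple as a handful of regular expressions. The sketch below is deliberately small and illustrative: the three patterns (an AWS-style access key ID, a PEM private key header, and a quoted value assigned to a name containing "api key", "secret", "token", or "password") are assumptions chosen for the example, not an exhaustive rule set, and a dedicated secret scanner will catch far more.

```python
import re

# Illustrative patterns only; real scanners ship hundreds of rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded_credential": re.compile(
        r"""(?:api[_-]?key|secret|token|password)\w*\s*[:=]\s*["'][^"']{8,}["']""",
        re.IGNORECASE,
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings
```

Run `scan_text` over every source file and over the built frontend bundle. Reading a key from an environment variable does not trigger the third rule; a quoted literal assigned to a credential-like name does.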

Check 3: Load test with realistic traffic patterns

An AI-built MVP often works perfectly for one user. It falls apart at ten. The reason is that AI models generate code that passes unit tests but fails integration and load tests. The model does not understand concurrency, connection pooling, or database indexing. It generates code that works in isolation.

Set up a load test using k6 or Artillery with a ramp-up pattern: start at 10 concurrent users, increase to 100 over 5 minutes, then hold for 10 minutes. Monitor response times, error rates, and database connection counts. If the error rate exceeds 1% at any point, the codebase is not production-ready.
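k6 and Artillery have their own configuration formats, so the sketch below does not drive traffic. It only writes down the ramp profile and the pass rule from the paragraph above in Python, so the criteria are unambiguous when you read the tool's output. The `Interval` record is a hypothetical shape for per-interval results exported from whichever tool you use.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

START_USERS = 10
PEAK_USERS = 100
RAMP_SECONDS = 5 * 60    # ramp from 10 to 100 users over 5 minutes
HOLD_SECONDS = 10 * 60   # then hold at 100 users for 10 minutes
MAX_ERROR_RATE = 0.01    # the 1% threshold


def target_users(elapsed_s: float) -> int:
    """Concurrent users the test should be driving at a given elapsed time."""
    if elapsed_s <= 0:
        return START_USERS
    if elapsed_s >= RAMP_SECONDS:
        return PEAK_USERS
    fraction = elapsed_s / RAMP_SECONDS
    return round(START_USERS + fraction * (PEAK_USERS - START_USERS))


@dataclass
class Interval:
    elapsed_s: float  # seconds since the test started
    requests: int     # requests completed in this interval
    errors: int       # failed requests in this interval


def first_failure(intervals: Iterable[Interval]) -> Optional[Interval]:
    """Return the first interval whose error rate exceeds 1%, or None if all pass."""
    for interval in intervals:
        if interval.requests and interval.errors / interval.requests > MAX_ERROR_RATE:
            return interval
    return None
```

Checking each interval separately matters because the rule is "at any point": an average over the whole 15-minute run can hide a short window where the application was effectively down.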

A 2025 benchmark by Grafana Labs found that 68% of AI-generated web applications failed at 50 concurrent users, compared to 12% of human-written equivalents. The most common failure mode was database connection exhaustion — the AI generated code that opened a new connection for every request and never closed it.
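The usual remedy for that failure mode is a bounded pool that hands a connection out for the duration of one request and always takes it back. The sketch below illustrates the idea with the standard library's `sqlite3` and `queue` modules; the `ConnectionPool` class is hypothetical, and a production application would normally use the pooling built into its database driver or ORM instead.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """A fixed-size pool: at most `size` connections ever exist."""

    def __init__(self, database: str, size: int = 5):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a pooled connection move between threads.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout: float = 5.0):
        # Blocks while the pool is empty; raises queue.Empty after `timeout` seconds.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Runs even if the request handler raised, so nothing leaks.
            self._idle.put(conn)

    def idle_count(self) -> int:
        return self._idle.qsize()
```

A request handler then wraps its queries in `with pool.connection() as conn:`. Under load, extra requests wait briefly for a free connection instead of opening new ones until the database refuses.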

Check 4: Interview the founder about their own code

This is the most important step in any 2026 startup verification checklist. Sit the founder down and ask them to explain the architecture. Not the product vision: the actual code. Ask specific questions:

Why did you choose this database?
How do you handle database migrations?
What happens when the API rate limit is hit?
Where is the payment processing logic?
How do you debug a production issue?

If the founder cannot answer these questions without referencing "the AI handled that," you have a problem. The SEC's 2026 guidance is clear: founders must demonstrate material understanding of their technology. "The AI built it" is not a defense.

I have conducted over 200 technical founder interviews. The ones who pass this test can draw the architecture on a whiteboard from memory. The ones who fail pull up a diagram that the AI generated and read from it.

The table below summarizes the four checks and their pass/fail criteria.

Check	Tool	Pass criteria	Fail criteria
Code attribution	GitHub Copilot API	≤60% AI-generated lines	>60% AI-generated lines
Dependency scan	Socket.dev, Snyk	Zero hallucinated packages	Any unresolved dependencies
Load test	k6, Artillery	≤1% error rate at 100 users	>1% error rate or crash
Founder interview	Whiteboard session	Can explain architecture without notes	Reads from AI-generated diagram

If the founder cannot explain the code, the code is not the product. The pitch is.

Proven strategies to strengthen your startup verification checklist

How do you build a repeatable due diligence process?

Build a template. I use a Notion database with the four checks as columns, plus fields for commit history analysis, dependency tree depth, and test coverage percentage. Every time I evaluate a startup, I fill in the template before the first meeting. This forces me to look at the code before I get charmed by the founder.
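The same template can be kept as a small data structure if you prefer code to a Notion database. This is one possible shape, not a standard: the field names are invented for illustration, and the thresholds (60% AI-generated lines, zero unresolved packages, 1% error rate at 100 users) are the pass criteria from the four-check table.

```python
from dataclasses import dataclass


@dataclass
class DiligenceRecord:
    startup: str
    ai_generated_share: float       # 0.0 to 1.0, from the attribution audit
    unresolved_dependencies: int    # packages that do not resolve
    peak_error_rate: float          # worst error rate seen at 100 users
    load_test_crashed: bool
    founder_explained_unaided: bool
    # Supporting fields from the template; recorded but not scored here.
    commit_count: int = 0
    dependency_tree_depth: int = 0
    test_coverage: float = 0.0

    def results(self) -> dict[str, bool]:
        """Map each of the four checks to True (pass) or False (fail)."""
        return {
            "code_attribution": self.ai_generated_share <= 0.60,
            "dependency_scan": self.unresolved_dependencies == 0,
            "load_test": (not self.load_test_crashed
                          and self.peak_error_rate <= 0.01),
            "founder_interview": self.founder_explained_unaided,
        }

    def failed_checks(self) -> list[str]:
        return [name for name, passed in self.results().items() if not passed]
```

Filling in a record before the first meeting keeps the evaluation honest: `failed_checks()` either comes back empty or names exactly what to raise with the founder.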

The NIST AI Risk Management Framework provides a good structure for this. Map each check to a NIST category: code attribution maps to "Govern" (understanding the AI's role), dependency scanning maps to "Map" (identifying risks), load testing maps to "Measure" (testing performance), and founder interviews map to "Manage" (ensuring human oversight).

What red flags should trigger an immediate pass?

Three red flags end my due diligence immediately. First, a single-commit codebase. No legitimate product is built in one commit. Second, zero test files. According to a 2025 report by CodeClimate, 89% of AI-generated codebases have less than 5% test coverage. Third, the founder cannot name their database without checking a file. If they say "I think it's MongoDB" and it is actually PostgreSQL, the product is a facade.

How do you verify user growth claims without access to the backend?

This is where detecting a fake startup gets creative. If the founder will not grant read-only access to the database, ask for a CSV export of user signup dates with timestamps. Then check for patterns: real user growth shows weekly cycles (fewer signups on weekends), geographic distribution, and gradual onboarding. Fake growth shows uniform daily numbers, no weekend dips, and all users from the same IP range.
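Two of those patterns are easy to compute from a timestamp export. The sketch below assumes one ISO 8601 timestamp per line (for example `2026-01-06T02:15:00`) and uses invented function names. It reports the numbers and leaves the judgment to you, since what counts as a normal weekend dip depends on the product.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def parse_timestamps(lines: Iterable[str]) -> list[datetime]:
    """Parse one ISO 8601 timestamp per non-empty line."""
    return [datetime.fromisoformat(line.strip()) for line in lines if line.strip()]


def weekend_share(timestamps: list[datetime]) -> float:
    """Fraction of signups on Saturday or Sunday.

    Perfectly uniform signups would give 2/7 (about 0.29); a product with a
    weekend dip should come in below that.
    """
    if not timestamps:
        return 0.0
    weekend = sum(ts.weekday() >= 5 for ts in timestamps)
    return weekend / len(timestamps)


def busiest_hour_share(timestamps: list[datetime]) -> float:
    """Fraction of all signups that fall in the single busiest clock hour."""
    if not timestamps:
        return 0.0
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return max(per_hour.values()) / len(timestamps)
```

A busiest-hour share close to 1.0 is the signature of a seeding script: nearly every account was created in one burst. A weekend share at or above 2/7 is weaker evidence, but it is worth asking about.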

I once caught a startup that had "50,000 users" — all registered between 2 AM and 4 AM on a single Tuesday. The founder had run a script. The SimilarWeb traffic analysis showed zero organic visits to their website. The product had no users.

What role does the OWASP LLM Top 10 play in your verification?

The OWASP LLM Top 10 is your checklist for AI-specific vulnerabilities. Run through each category during your dependency scan. LLM01 (prompt injection) is common in AI-built MVPs: the code passes user input directly to the LLM without sanitization. LLM02 (insecure output handling) is the mirror image: the code passes the model's output to a browser, shell, or database without validating it. LLM08 (excessive agency) appears when the AI-generated code gives the LLM access to system commands or database write operations without human approval.

A 2025 analysis by Bishop Fox found that 73% of AI-built MVPs had at least one critical OWASP LLM vulnerability. The most common was LLM06 (sensitive information disclosure) — the AI had hardcoded API keys into the frontend JavaScript bundle.

Key takeaways

AI-built MVP due diligence requires four specific checks: code attribution audit, dependency scan, load test, and founder interview.
22% of all new GitHub repositories are AI-generated, per the GitHub Octoverse 2025, making verification essential.
AI-generated codebases contain 2.8x more critical vulnerabilities than human-written code, according to NIST.
The SEC's May 2026 guidance makes founders legally responsible for understanding their AI-generated code.
A single-commit codebase with zero test files is an immediate red flag.
Load testing reveals 68% of AI-built apps as non-production-ready at just 50 concurrent users, per the Grafana Labs benchmark.

Got questions about AI-built MVP due diligence? We have answers

What is AI-built MVP due diligence and why do I need it?

AI-built MVP due diligence is the process of verifying that a startup's product is real, maintainable, and scalable when most of its code was generated by AI. You need it because the Stanford AI Index 2026 shows a 340% increase in AI-generated code submissions, making it trivially easy to build a fake product that looks real in a demo.

How much time does a thorough AI-built MVP due diligence take?

A thorough check takes 4-8 hours for a typical seed-stage codebase. The code attribution audit takes 30 minutes using GitHub's API. The dependency scan takes another 30 minutes. Load testing takes 2 hours including setup. The founder interview takes 1-2 hours. Those steps add up to 4-5 hours; allow up to 8 for a messier codebase. Either way, the total is less than a day of work, which is cheap compared to the cost of a bad investment.

Can AI-generated code ever be production-ready?

Yes, but only with significant human review. A 2025 study by Google Research found that AI-generated code with human review achieved comparable quality to human-written code after an average of 3.2 review cycles per file. The problem is that most startups skip the review. If the founder cannot show you review comments on their AI-generated code, assume it is unreviewed.

How do I verify a startup's user numbers without backend access?

Ask for a CSV export of user signup timestamps. Look for patterns: real users sign up throughout the day, with dips on weekends. Fake users appear in batches. Also check SimilarWeb for organic traffic. If the startup claims 5,000 users but has zero organic traffic, the numbers are fabricated.

What are the legal risks of investing in an AI-washing startup?

The SEC's May 2026 guidance makes AI-washing a form of fraud. Investors who fail to perform reasonable due diligence may face liability if the startup is later found to have misrepresented its technology. The guidance specifically mentions code attribution as a material factor in investment decisions.

How many AI-built MVPs fail within the first year?

The data is still emerging, but early indicators are grim. A 2026 survey by Startup Genome found that AI-built MVPs had a 58% failure rate within 12 months, compared to 38% for human-built equivalents. The primary cause was technical debt — the AI-generated code could not be extended or maintained without a complete rewrite.

Ready to spot the fakes before they take your money?

You now have the framework. The tools are free or cheap. The time investment is less than a day. The alternative is writing a check to a founder whose "product" is a single Python script calling GPT-4.

Learn to detect the larpers — or become one. Your call.
