I Built a Full SaaS App in One Session with GitHub Copilot: Here's What Happened

How I transformed a Next.js landing page into a full serverless SaaS with Document Intelligence and Chat Your Data — in a single Copilot session.

I wanted to test a thesis: can a governed AI agent build a production-quality SaaS application in a single extended session? Not a to-do app. Not a CRUD demo. A real product with authentication, document intelligence, conversational AI, payments, and a polished frontend.

The answer is yes — with caveats worth sharing.

The Premise

I had a Next.js 14 landing page for “Lawyer Companion,” an AI-powered legal SaaS concept. The page looked professional but did nothing. My goal: transform it into a fully functional serverless application with Document Intelligence (upload legal documents, extract structured data) and Chat Your Data (conversational interface over your document corpus).

The stack I chose: AWS Lambda with SAM (Serverless Application Model), Cognito for authentication, DynamoDB for data, Textract for document processing, Bedrock for conversational AI, Stripe for payments, and Next.js 14 for the frontend. Ambitious for one session. That was the point.

Six Phases, One Session

The session unfolded in six distinct phases, each building on the last:

Phase 1: SAM Scaffold

The agent created the entire SAM infrastructure — template.yaml with Lambda functions, API Gateway, DynamoDB tables, S3 buckets, Cognito user pool, and IAM roles. It didn’t just generate boilerplate; it made architectural decisions about table design (single-table pattern with GSIs), API structure (RESTful with proper CORS), and security boundaries (least-privilege IAM policies).

Phase 2: Authentication

Cognito integration with the Next.js frontend. Sign-up, sign-in, token refresh, and protected routes. The agent used the Amplify client libraries and handled the OAuth flow, including the callback redirect logic that usually takes me an hour to get right manually.

Phase 3: Document Intelligence

This was the most complex phase. Upload a document (PDF or image) to S3, trigger a Lambda function that invokes Textract for OCR and structured extraction, store the results in DynamoDB, and surface them in the frontend with a clean document viewer. The agent built the entire pipeline, including error handling for Textract’s async analysis mode.
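The asynchronous Textract step can be sketched roughly as follows. This is a minimal illustration, not the code the agent produced: the function name `analyze_document`, the polling interval, and the chosen feature types are assumptions, and the `client` argument is expected to behave like `boto3.client("textract")`. A production pipeline would more likely rely on Textract's SNS completion notification than on polling inside a Lambda function.

```python
import time


def analyze_document(client, bucket, key, poll_seconds=2.0, max_polls=60):
    """Run an async Textract analysis and return all result blocks.

    `client` is assumed to behave like boto3.client("textract").
    """
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS", "TABLES"],  # assumed; pick what the product needs
    )
    job_id = job["JobId"]

    # Poll until the job leaves IN_PROGRESS or we run out of attempts.
    for _ in range(max_polls):
        page = client.get_document_analysis(JobId=job_id)
        status = page["JobStatus"]
        if status == "SUCCEEDED":
            break
        if status != "IN_PROGRESS":
            raise RuntimeError(f"Textract job {job_id} ended with status {status}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    # Results are paginated; follow NextToken until it is absent.
    blocks = list(page.get("Blocks", []))
    while page.get("NextToken"):
        page = client.get_document_analysis(
            JobId=job_id, NextToken=page["NextToken"]
        )
        blocks.extend(page.get("Blocks", []))
    return blocks
```

Passing the client in as an argument keeps the function testable with a stand-in object, so no AWS access is needed to exercise the success, failure, and timeout paths.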

Phase 4: Chat Your Data

RAG pipeline using Bedrock. The agent designed a flow where uploaded documents get chunked, embedded, and stored for retrieval. User questions go through a retrieval step that finds relevant document chunks, then a generation step that produces grounded answers with citations. The conversational UI streamed responses in real time.
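The chunking and retrieval steps can be illustrated with a small, dependency-free sketch. It is not the agent's implementation: the function names, the word-window chunking, and the chunk sizes are assumptions, and the embedding vectors are taken as given (in the real flow they would come from an embedding model, presumably via Bedrock).

```python
import math


def chunk_text(text, size=200, overlap=40):
    """Split text into overlapping windows of `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    chunks = []
    step = size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, indexed_chunks, k=3):
    """Return the k chunk texts most similar to the query.

    `indexed_chunks` is a list of (chunk_text, embedding_vector) pairs.
    """
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

The retrieved chunks would then be placed in the generation prompt along with their source references, which is what allows the answer to cite the documents it draws on.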

Phase 5: Rebrand and Polish

Full visual rebrand of the frontend to match a professional legal SaaS aesthetic. The agent updated the theme, typography, component library, and layout — all while keeping the existing functionality intact. It even generated placeholder content that felt domain-appropriate.

Phase 6: Production Readiness

Environment configuration, error boundaries, loading states, rate limiting, input validation, and security headers. The agent also created ADR (Architecture Decision Record) documents for every significant design choice, which was genuinely useful for future reference.

What Went Well

Rapid scaffolding was the most impressive part. The agent generated hundreds of lines of well-structured infrastructure code in minutes. SAM templates, Lambda handlers, API routes, DynamoDB operations — all consistent, all following AWS best practices.

Architecture decisions documented as ADRs. Without being asked, the agent created decision records for choices like single-table DynamoDB design, Textract vs. custom OCR, and streaming vs. polling for Bedrock responses. These documents are more valuable than the code itself for long-term maintenance.

14 tests passing throughout. Because the Copilot Agents Dojo, the behavioural framework governing the agent, enforces TDD discipline, the agent wrote tests before implementation at every phase. Every change was verified before moving forward. By the end, 14 tests covered the critical paths, and they all passed.

Consistent code quality. The Dojo’s “pursue elegant form” discipline prevented the agent from taking shortcuts. When it generated a Lambda handler, it included proper error handling, input validation, logging, and response formatting. Every time.
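As a rough picture of what that handler shape looks like, here is a minimal sketch for an API Gateway proxy event. The `answer_question` function, the `question` field, and the wildcard CORS origin are hypothetical placeholders, not details taken from the project.

```python
import json
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def _response(status, body):
    """Format a consistent API Gateway proxy response."""
    return {
        "statusCode": status,
        "headers": {
            "Content-Type": "application/json",
            # Placeholder; a real deployment would restrict the origin.
            "Access-Control-Allow-Origin": "*",
        },
        "body": json.dumps(body),
    }


def answer_question(question):
    """Hypothetical stand-in for the real business logic."""
    return f"(stub) You asked: {question}"


def handler(event, context):
    # Input validation: the body must be a JSON object with a usable question.
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "Request body must be valid JSON"})

    question = payload.get("question") if isinstance(payload, dict) else None
    if not isinstance(question, str) or not question.strip():
        return _response(400, {"error": "'question' must be a non-empty string"})

    # Error handling: log the details, return a generic message to the caller.
    try:
        answer = answer_question(question.strip())
    except Exception:
        logger.exception("Unhandled error while answering question")
        return _response(500, {"error": "Internal server error"})

    logger.info("Answered question of length %d", len(question))
    return _response(200, {"answer": answer})
```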

The Gotchas

Amplify adapter-nextjs API changes. The agent initially used an older Amplify API that had been deprecated. It took two iterations to get the correct import paths and configuration for the current version. This is a classic AI agent failure mode — training data lags behind library updates.

Textract limitations. The agent initially planned DOCX support, but Textract accepts only PDF and image files (JPEG, PNG, TIFF), not Word documents. It pivoted to a PDF-only flow after discovering this during testing. Good recovery, but it would have been better to check the documentation first.
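One cheap way to surface this kind of limitation early is to reject unsupported uploads before they reach Textract. The helper below is an illustrative sketch only: the function name is hypothetical, and checking the file extension is a first filter rather than a guarantee about the file's actual content.

```python
from pathlib import PurePosixPath

# Input types Textract accepts; DOCX is not among them.
TEXTRACT_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def is_textract_supported(object_key):
    """Return True if the S3 object key has an extension Textract can process."""
    return PurePosixPath(object_key).suffix.lower() in TEXTRACT_EXTENSIONS
```

A PDF-only flow like the one the agent settled on would simply narrow the set to `{".pdf"}`.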

Lazy boto3 initialization for testability. The agent initially imported and initialized boto3 clients at module level, which made unit testing impossible without AWS credentials. After a failing test exposed this, it refactored to lazy initialization — creating clients inside functions rather than at import time. A small pattern, but essential for testable Lambda code.
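The pattern looks roughly like this. It is a sketch under assumptions rather than the project's code: the `TABLE_NAME` environment variable, the key layout, and the function names are all hypothetical.

```python
import os
from functools import lru_cache

# Anti-pattern (for contrast): creating the resource at import time, e.g.
#   table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
# forces every test that imports this module to have AWS configuration.


@lru_cache(maxsize=1)
def get_table():
    """Create the DynamoDB table resource on first use, then reuse it."""
    import boto3  # deferred so importing this module never touches AWS

    return boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])


def save_extraction(document_id, fields, table=None):
    """Store extracted fields; tests can pass a fake `table` instead."""
    if table is None:
        table = get_table()
    item = {"pk": f"DOC#{document_id}", "sk": "EXTRACTION", "fields": fields}
    table.put_item(Item=item)
    return item
```

In a unit test, a stand-in object with a `put_item` method is enough:

```python
class FakeTable:
    def __init__(self):
        self.items = []

    def put_item(self, Item):
        self.items.append(Item)
```

Caching the resource with `lru_cache` keeps the benefit of reusing connections across warm Lambda invocations while still deferring creation until the first real call.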

Context window pressure. By Phase 5, the session’s context window was getting dense. The agent occasionally repeated itself or lost track of earlier architectural decisions. The memory vault (lessons.md) helped, but there’s a practical limit to single-session complexity.

The Role of the Dojo

This session would not have worked without the Copilot Agents Dojo disciplines. Specifically:

Plan Before Striking ensured each phase started with a clear plan, not just “start coding the auth flow.”
Delegate with Sub-Agents let the agent research AWS service capabilities while simultaneously scaffolding infrastructure.
Prove the Technique (TDD) caught the boto3 initialization issue, the Amplify API change, and three other bugs that would have been painful to debug later.
Learn from Every Fall meant that when the agent hit the Textract limitation, it logged the lesson and adjusted its approach for subsequent document-processing decisions.

Without governance, I’ve seen similar sessions produce an impressive demo that falls apart under any real usage. With the Dojo, the output was genuinely production-quality.

The Key Takeaway

AI agents can build production-quality applications when governed with discipline. The technology is ready. The models are capable. What’s missing in most organisations isn’t better AI — it’s better governance of AI.

A single session produced a full-stack serverless SaaS with authentication, document intelligence, conversational AI, payments integration, a polished frontend, 14 passing tests, and ADR documentation. Not because I wrote a brilliant prompt. Because the agent operated under a behavioural framework that demanded planning, testing, verification, and quality.

Stop prompting with hope. Start governing with discipline. That’s the difference between a demo and a product.