Pablo's Architecture Proposal

A faithful write-up of what Pablo wants to build and change in the Call Analyst architecture, framed so Mark can decide. Pablo's case is presented in full; an honest assessment section follows so the tradeoffs are visible.

Subject: Call Analyst (callanalyst.app) · Proposed by: Pablo (senior-engineer collaborator) · Companions: the build docs and the refactor plan.

Executive summary

Pablo is proposing to take Call Analyst from a solo proof-of-concept (a single-file PWA on managed serverless) to a production-grade, AWS-native, compliance-ready product. The proposal has two halves that reinforce each other:

The engineering half

Split the monolith into proper HTML/JS/CSS modules, host the frontend on a CDN (CloudFront + S3), move the backend to VPC-isolated Lambdas, containerize where needed, and express the whole thing as AWS CDK / CloudFormation. He has spot-optimized, ready-to-go CDK templates for both the CDN frontend and the isolated backends, and offers to own the infra.

The legal half

Treat this as a meeting recorder that handles sensitive personal data: add GDPR/CCPA compliance, all-party recording consent, data residency, geofencing, retention and deletion, and subprocessor agreements. This is the part a POC skips and a real release cannot.

His stated goals: reliability, scalability, predictable cost, security, and the legal standing to actually operate. His core argument: use the right tools and "do it right" now, with infra he already has battle-tested, rather than ship a centralized POC and retrofit later.

The core thesis

"The decisions taking this into a localized monolith are not what we need. We need a grown-up build. From a legal perspective as well as an engineering one, are you sure this is a full release? Is a 20k single HTML file OK to ship just because it's centralized?"

Pablo's central point is that "it works and it's centralized" is not the same as "it's a product." Shipping a POC as a full-fledged product means crossing a set of engineering and legal thresholds that the current build has not crossed. He is arguing the gap is real, the tooling to close it already exists (his CDK templates), and the closing is "not a large lift" when the right tools are used and he carries the infra.

He also reframed an earlier reliability debate: the headline isn't a specific number of nines; it's whether the product is built to be operated, audited, and trusted at production standard. On that framing, the productionization gap is the real subject.

Where it stands today (what Pablo wants to move off of)

Layer	Today
Frontend	One ~20,000-line single-file PWA (`index.html`), HTML+CSS+JS inline, no build step
Hosting	Netlify (managed); the app auto-publishes from `main`
Backend	Netlify Node functions + Deno edge functions (managed serverless)
Data / auth	Supabase (managed Postgres + auth + storage), plus Cloudflare R2
Build segregation	Consumer vs internal via a runtime flag and a drifting git branch
Secrets	Scattered as per-site env vars; some keys in plaintext config; no rotation
Compliance	Not yet addressed — no consent framework, GDPR, residency, retention, or DPAs
Network posture	Public managed endpoints; no VPC / private subnets (there is no network to isolate in a managed-SaaS model)

1 · Decompose the monolith

Separate the HTML, JS, and CSS into distinct files and modules. End the single 20k-line file. Pablo frames this as table stakes for a "grown-up" build, and as a prerequisite for any security or privacy review (an opaque single file is effectively unauditable).

Point of agreement. Both Pablo and the prior refactor plan call for this. The open sub-question is monorepo modules vs multiple repos (see §6).

2 · Frontend on a CDN (CloudFront + S3)

Serve the static frontend from Amazon S3 behind CloudFront, defined by a spot-optimized, ready-to-go CDK template Pablo already has. This replaces Netlify hosting for the consumer app with AWS-native, globally cached delivery under unified AWS control.

3 · VPC-isolated Lambda backends

Move the backend from Netlify/Deno functions to AWS Lambda, deployed inside a VPC with private subnets (network isolation), via Pablo's second CDK template. The goal is a controlled network boundary, fine-grained IAM, and a single, auditable security perimeter for everything that touches user data. Pablo considers the Lambda lift small given the templates already exist.

4 · Containers & images

Pablo wants container images as part of the build ("we need our images"). In a Lambda-centric design this most directly means container-image-packaged Lambdas and/or a path to ECS/Fargate for any workload that needs it. It standardizes how code is packaged and deployed, and gives a clean home for any long-running worker (for example agentic background jobs or media/transcription processing that exceeds serverless time limits).
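
The "where needed" test can be made concrete. The sketch below is one possible rule, not something Pablo specified: keep a job on Lambda only if its worst-case runtime fits comfortably inside Lambda's hard limit of 900 seconds per invocation, and send everything else to a container. The `Workload` type, the 0.8 headroom factor, and the job names and runtimes are all illustrative assumptions.

```python
from dataclasses import dataclass

# AWS Lambda caps a single invocation at 900 seconds (15 minutes).
LAMBDA_MAX_TIMEOUT_S = 900


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of one backend job."""
    name: str
    expected_max_runtime_s: int


def placement(workload: Workload, headroom: float = 0.8) -> str:
    """Return "lambda" or "container" for a workload.

    headroom is an assumed safety margin: a job stays on Lambda only if
    its worst case fits within that fraction of the hard limit.
    """
    budget_s = LAMBDA_MAX_TIMEOUT_S * headroom
    if workload.expected_max_runtime_s <= budget_s:
        return "lambda"
    return "container"


# Illustrative jobs and runtimes only; real numbers need measuring.
jobs = [
    Workload("webhook-handler", 5),
    Workload("transcript-summary", 120),
    Workload("long-meeting-transcription", 2400),
]
for job in jobs:
    print(f"{job.name}: {placement(job)}")
```

Under these assumed numbers only the long transcription job lands in a container, which matches the idea of containerizing the long-running workers and leaving short handlers alone.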

5 · Everything as CDK / CloudFormation

Express the entire stack as AWS CDK / CloudFormation — infrastructure as code — so the frontend, backend, network, and security are unified, versioned, reproducible, and owned in one place. This is the backbone of the proposal: one selected infrastructure that unifies security and operations, rather than a spread of managed SaaS each with its own controls.
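
To show what "infrastructure as code" means in practice, here is a minimal sketch that builds a CloudFormation template as plain data and prints it as JSON. It is not Pablo's CDK template, which was not shared: it covers only a private, encrypted S3 bucket for frontend assets, leaves out the CloudFront distribution entirely, and uses a plain dictionary instead of CDK so it runs with the standard library alone. The function name and the logical ID `FrontendBucket` are assumptions.

```python
import json


def frontend_bucket_template(logical_id: str = "FrontendBucket") -> dict:
    """Build a minimal CloudFormation template for a private S3 bucket.

    Illustrative only: a real frontend stack would also define the
    CloudFront distribution and the policy that lets it read the bucket.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: private S3 bucket for static frontend assets",
        "Resources": {
            logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Block every form of public access at the bucket level.
                    "PublicAccessBlockConfiguration": {
                        "BlockPublicAcls": True,
                        "BlockPublicPolicy": True,
                        "IgnorePublicAcls": True,
                        "RestrictPublicBuckets": True,
                    },
                    # Encrypt objects at rest with S3-managed keys.
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                        ]
                    },
                    # Keep prior object versions so a bad deploy can be rolled back.
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            }
        },
    }


template = frontend_bucket_template()
print(json.dumps(template, indent=2))
```

The point is that the security settings live in a file that can be reviewed, versioned, and diffed, rather than in console clicks.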

6 · Repo & ownership structure

Pablo proposes feature-grouped separate repos rather than one general, multi-faceted repo, to force focus and a more mature build. Critically, he offers to own the AWS infrastructure himself. That ownership is a central variable: it changes who carries the operational burden of the migration (see the honest assessment below).

7 · The legal / compliance layer: the part a POC skips

Pablo's strongest point: Call Analyst records people's meetings, which makes it a processor of voice and transcript data full of personal information, often including people who never opted in. That front-loads a legal layer from the first real user, independent of how clean the code is:

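Recording consent: all-party consent. US all-party-consent states (commonly called two-party-consent states; California, Florida, others) require it, and GDPR requires a lawful basis for the recording, for which consent from everyone recorded is one available route. This is the defining legal risk for any notetaker.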
GDPR / CCPA — lawful basis, privacy policy, ToS, right to access, right to erasure (the ability to actually delete a user's data and their meeting data on request).
Data residency & geofencing — keep EU user data in-region, or geofence the launch (e.g. US-first) to defer GDPR by intent rather than by accident.
Retention & deletion policy, records of processing.
Subprocessor DPAs — data-processing agreements with everyone the audio/transcript touches: Recall, Anthropic, OpenAI, Supabase, Stripe.
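
As a rough illustration of the all-party rule in the first item above, the sketch below refuses to record unless every participant has a consent timestamp on file. The `Participant` type and its field names are hypothetical, and how consent is actually captured (a join prompt, a bot announcement, a calendar notice) is a product and legal decision this code does not settle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Participant:
    """Hypothetical participant record; field names are illustrative."""
    display_name: str
    consented_at: Optional[str] = None  # ISO-8601 timestamp, None if no consent on file


def missing_consent(participants: List[Participant]) -> List[str]:
    """Names of participants with no recorded consent."""
    return [p.display_name for p in participants if p.consented_at is None]


def may_record(participants: List[Participant]) -> bool:
    """All-party rule: at least one participant, and none missing consent."""
    return bool(participants) and not missing_consent(participants)


meeting = [
    Participant("Host", "2026-06-17T15:00:00Z"),
    Participant("Guest"),  # joined without consenting
]
print(may_record(meeting))       # False
print(missing_consent(meeting))  # ['Guest']
```

An empty participant list is treated as "do not record", so the check fails closed when the attendee list is unknown.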

Why the two halves connect. Controlled AWS infra gives genuinely better primitives for the legal layer: region-pinned storage for residency, KMS for key management and rotation, fine-grained IAM for least privilege, VPC + WAF for network controls, and real geofencing. The infra argument and the compliance argument reinforce each other.
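
At the application level, a geofenced launch reduces to an allow-list check like the hedged sketch below. The `LAUNCH_COUNTRIES` set and the `signup_allowed` name are assumptions for illustration. How the country code is obtained (billing address, IP lookup, self-declaration) is out of scope here, and each source has its own accuracy limits.

```python
from typing import Optional

# Assumption: a US-first launch, expressed as ISO 3166-1 alpha-2 codes.
LAUNCH_COUNTRIES = frozenset({"US"})


def signup_allowed(country_code: Optional[str]) -> bool:
    """Allow signup only for countries on the launch list.

    Fails closed: a missing or empty country code is treated as out of scope.
    """
    if not country_code:
        return False
    return country_code.strip().upper() in LAUNCH_COUNTRIES


print(signup_allowed("us"))   # True
print(signup_allowed("DE"))   # False
print(signup_allowed(None))   # False
```

Keeping the list in one constant makes the launch scope an explicit, reviewable decision, which is the "by intent rather than by accident" part.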

Before → after, at a glance

Layer	Today (POC)	Pablo's target
Frontend code	20k-line single file	→ split HTML/JS/CSS, modular
Frontend hosting	Netlify	→ S3 + CloudFront (CDK)
Backend	Netlify + Deno functions	→ VPC-isolated Lambdas (CDK)
Packaging	raw files	→ container images
Infra definition	console + scattered env	→ CDK / CloudFormation, unified
Network	public managed endpoints	→ VPC, private subnets, IAM
Repos	one repo	→ feature-grouped repos
Secrets	plaintext / per-site env	→ KMS + rotation + least privilege
Compliance	none	→ consent, GDPR, residency, retention, DPAs
Ownership	Mark + AI	→ Pablo owns the infra

What Pablo is optimizing for

Reliability

A stack built to production standard: isolated, monitored, reproducible.

Scalability

Infra that grows with the user base under one control plane.

Predictable cost

Spot-optimized, IaC-defined spend rather than ad-hoc managed-SaaS bills.

Security

Network isolation + unified IAM + proper secret management.

Legal standing

The compliance layer that lets the product legally operate, especially in the EU.

"Do it right"

Use the right tools now, on infra he has already proven, rather than retrofit.

Honest assessment (so the tradeoffs are visible)

Where Pablo is clearly right

The compliance gap is real and front-loaded. For a meeting recorder, consent + GDPR + deletion are not "after product-market fit." They apply from the first real user. This was the strongest point in the discussion, and it was missing from the earlier engineering-only plan.
"Centralized" is not "shippable." A 20k-line file serves traffic fine, but it is close to unauditable, which is a liability the moment a privacy or security review arrives. Splitting it is more urgent under a compliance lens, not less.
Controlled infra is the right home for the compliance primitives (residency, KMS, IAM, geofencing).
Ready templates + Pablo owning the infra changes the cost calculus. The original "premature, months of work, spends Mark's runway" objection weakens substantially when the senior engineer carries it on infra he has already built.

The distinctions Mark should hold onto

AWS is not compliance. The migration buys better primitives; it does not make the product GDPR-compliant. The consent flow, deletion pipeline, DPAs, privacy policy, and retention policy are application + legal + process work that must happen regardless of host. Treat them as two tracks that meet, not one.
The template is the easy 20%; the migration is the 80%. Standing up the CDN/VPC skeleton is fast. The real labor is porting ~40 functions to Lambda and migrating auth + Postgres + storage off Supabase, plus the cutover. Worth estimating with eyes open.
Reliability is earned, not bought. AWS provides the floor (the current stack already rides on AWS via Supabase/Netlify); production reliability comes from architecture and operations, and someone must own the pager.
Monorepo vs polyrepo is an open question. The frontend is one deployable app, not independent services, so feature-grouped repos add coordination cost without mapping to separate deployments. A monorepo with clean module boundaries delivers the same "grown-up" structure with less friction. Worth deciding deliberately.
Containerize where a workload needs it (long-running workers), not everywhere, to avoid orchestration overhead on short-lived handlers.

Proposed sequencing (do it right without blowing the launch)

Scope the launch by geography. US-first, recording consent handled to the all-party standard that two-party-consent states require, GDPR deferred by deliberate geofence. Makes a near-term launch legal without boiling the ocean.
Stand up Pablo's CDK foundation — S3 + CloudFront frontend, VPC-isolated Lambda backend, KMS for secrets, region-pinned storage. The skeleton he already has.
Run the legal / data track in parallel — consent capture, a real deletion pipeline, subprocessor DPAs, privacy policy + ToS. The part no infra does for you.
Split the frontend into modules as it lands on the CDN, so it is auditable for the review that compliance will trigger.
Port functions to Lambda and migrate auth/data off Supabase — the genuine labor, sequenced after the skeleton, with a tested cutover and rollback.
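
The "real deletion pipeline" in the legal track can be pictured as a fan-out over every store that holds a user's data, with a per-store report so a partial failure is visible instead of silent. The sketch below is one possible shape under stated assumptions: the `erase_user` and `erasure_complete` names are hypothetical, and the three in-memory deleters stand in for real calls to the database, object storage, and a vendor API.

```python
from typing import Callable, Dict

# Each deleter removes one user's data from one store and returns True on success.
Deleter = Callable[[str], bool]


def erase_user(user_id: str, deleters: Dict[str, Deleter]) -> Dict[str, str]:
    """Run every deleter and report a status per store.

    A failure in one store does not stop the others from running.
    """
    report: Dict[str, str] = {}
    for store, delete in deleters.items():
        try:
            report[store] = "deleted" if delete(user_id) else "failed"
        except Exception as exc:  # record the failure and keep going
            report[store] = f"error: {exc}"
    return report


def erasure_complete(report: Dict[str, str]) -> bool:
    """The request is done only when every store reports "deleted"."""
    return bool(report) and all(s == "deleted" for s in report.values())


# In-memory stand-ins for real stores (illustrative only).
fake_db = {"u1": {"email": "a@example.com"}}
fake_audio = {"u1": ["meeting-1.wav"]}


def delete_db(user_id: str) -> bool:
    fake_db.pop(user_id, None)
    return user_id not in fake_db


def delete_audio(user_id: str) -> bool:
    fake_audio.pop(user_id, None)
    return user_id not in fake_audio


def delete_vendor(user_id: str) -> bool:
    raise RuntimeError("vendor API unavailable")


report = erase_user(
    "u1", {"database": delete_db, "audio": delete_audio, "vendor": delete_vendor}
)
print(report)
print(erasure_complete(report))  # False: the vendor copy still needs a retry
```

The deleters are written to be idempotent, so a failed request can be retried until every store reports "deleted".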

Decisions for Mark

Decision	The choice
Migrate to AWS now, or after PMF?	Pablo owning the infra + ready templates makes "now" reasonable. The gate is sequencing it so it doesn't consume the launch window.
Launch scope	US-first geofence (defer GDPR) vs full EU readiness on day one.
Repos	Feature-grouped repos (Pablo) vs one monorepo with module boundaries.
Who owns the infra long-term	Pablo carries AWS — confirm this is durable, since reliability follows ownership.
Compliance track owner + budget	The legal work (consent, deletion, DPAs, policy) needs an owner and likely outside counsel. This is the non-optional half.

Report compiled 2026-06-17 from the Mark–Pablo architecture consultation. Pablo's proposal presented in full; assessment added for decision support. Companions: build docs · refactor plan. Behind the Daxos password gate.