Serverless AI on a Shoestring — Santiago Díaz de Valdés Williamson

There is a persistent myth in the AI SaaS world that you need significant infrastructure spend to run production AI workloads. GPU instances, always-on inference servers, vector database clusters -- the conventional wisdom pushes you toward a $500+/month baseline before you have served your first paying customer. When I was building AskMyCourse and CandidatePilot, I decided to challenge that assumption directly. The result: both products run on less than $1/month in AWS infrastructure costs, with 90%+ gross margins even at early-stage revenue levels.

The key insight was recognizing that most AI SaaS products are not compute-bound in the way people assume. The expensive part, the LLM inference, happens on the provider's infrastructure (OpenAI, Anthropic, etc.) and is billed per token. Your infrastructure only needs to orchestrate: receive the request, retrieve relevant context, call the API, format the response, and return it. That orchestration workload is well suited to AWS Lambda. A typical RAG request takes 200-400ms of compute time on my side, well within Lambda's sweet spot. Keep in mind that Lambda bills wall-clock duration, so time spent waiting on the provider's response counts toward your usage as well. Even so, at early-stage traffic volumes you are comfortably inside Lambda's monthly free tier (1 million requests and 400,000 GB-seconds of compute) and will stay there for a surprisingly long time.
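To make the orchestration steps concrete, here is a minimal sketch of what such a Lambda handler might look like behind an API Gateway proxy integration. It is an illustration under stated assumptions, not the code running in either product: `make_handler`, `build_prompt`, the `question` request field, and the two injected callables (`retrieve` for context lookup, `complete` for the LLM call) are all hypothetical names chosen for this example.

```python
import json
from typing import Callable, List


def _response(status: int, payload: dict) -> dict:
    # Shape expected by an API Gateway Lambda proxy integration.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def build_prompt(question: str, passages: List[str]) -> str:
    # Hypothetical prompt template: retrieved context first, question last.
    context = "\n\n".join(passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def make_handler(
    retrieve: Callable[[str], List[str]],
    complete: Callable[[str], str],
):
    """Build a handler from two injected callables.

    retrieve: question -> relevant passages (e.g. a database lookup).
    complete: prompt -> answer text (e.g. a call to an LLM provider).
    Both are assumptions standing in for real integrations.
    """

    def handler(event, context):
        # 1. Receive and validate the request.
        try:
            body = json.loads(event.get("body") or "{}")
        except json.JSONDecodeError:
            return _response(400, {"error": "invalid JSON"})
        if not isinstance(body, dict):
            return _response(400, {"error": "expected a JSON object"})
        question = str(body.get("question") or "").strip()
        if not question:
            return _response(400, {"error": "missing 'question'"})

        # 2. Retrieve context, 3. call the model, 4. format the response.
        passages = retrieve(question)
        answer = complete(build_prompt(question, passages))
        return _response(200, {"answer": answer, "sources": len(passages)})

    return handler


if __name__ == "__main__":
    # Local smoke test with stubs instead of a database and an LLM API.
    demo = make_handler(
        retrieve=lambda q: ["Week 3 covers recursion."],
        complete=lambda prompt: "stub answer",
    )
    print(demo({"body": json.dumps({"question": "What is in week 3?"})}, None))
```

Injecting `retrieve` and `complete` keeps the handler testable locally without network access, and it makes the point of the paragraph visible: the function itself does almost no computation.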

The architecture choices that make this work are deliberate. I use PostgreSQL on RDS (the smallest instance class, or Aurora Serverless v2 when I need auto-scaling to zero) instead of a dedicated vector database. The pgvector extension adds vector columns and similarity search to Postgres, which gives you 90% of the capability at a fraction of the cost. Static assets live on S3 behind CloudFront. Authentication runs through Cognito. The entire backend is a handful of Lambda functions behind API Gateway. There is no EC2 instance, no ECS cluster, no Kubernetes. The deployment is a single SAM template. When I compare this to the EC2-based architectures I see teams default to, the cost difference is not 2x or 5x; it is often 50-100x at equivalent traffic levels.
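As an illustration of the pgvector approach, the sketch below builds a parameterized nearest-neighbor query using pgvector's cosine distance operator (`<=>`). The table name `chunks` and its columns `id`, `content`, and `embedding` are hypothetical, as are the helper names. The `%s` placeholders assume a psycopg-style driver, and the database call itself is left out so the example runs without a connection.

```python
from typing import Sequence, Tuple


def to_vector_literal(embedding: Sequence[float]) -> str:
    # pgvector accepts a text literal of the form [x1,x2,...].
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_similarity_query(
    embedding: Sequence[float], top_k: int = 5
) -> Tuple[str, tuple]:
    """Return (sql, params) for a cosine-distance search.

    Assumes a hypothetical table: chunks(id, content, embedding vector).
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    sql = (
        "SELECT id, content, embedding <=> %s::vector AS distance "
        "FROM chunks "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    literal = to_vector_literal(embedding)
    # The literal is passed twice: once for the SELECT, once for ORDER BY.
    return sql, (literal, literal, top_k)


if __name__ == "__main__":
    sql, params = build_similarity_query([0.12, -0.5, 0.33], top_k=3)
    print(sql)
    print(params)
```

With a driver such as psycopg, the pair would typically be passed as `cursor.execute(sql, params)`. Keeping embeddings in the same Postgres instance as the rest of the application data means retrieval can be combined with ordinary `WHERE` filters and joins, which is a large part of why a separate vector database is often unnecessary at this scale.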

The tradeoff is real but manageable: cold starts add latency to the first request after idle periods, and you lose the ability to run long-lived background processes natively. I mitigate cold starts with provisioned concurrency on the critical path (one instance costs about $3/month) and handle background work with SQS queues feeding back into Lambda. The bigger lesson is about mindset. When you are a solo founder or a small team, infrastructure cost is not just an engineering concern -- it is a survival concern. Every dollar you do not spend on servers is a dollar you can spend on acquiring customers or extending your runway. Designing for cost efficiency from day one is not premature optimization; it is the difference between a project that can sustain itself and one that bleeds money waiting for scale that may never come.
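For the background-work pattern mentioned above (SQS queues feeding back into Lambda), a minimal consumer might look like the following sketch. `make_sqs_handler` and the injected `process` callable are hypothetical names. The return value uses Lambda's partial batch response shape, which only takes effect when `ReportBatchItemFailures` is enabled on the SQS event source mapping; that configuration is assumed here and not shown.

```python
import json
from typing import Callable


def make_sqs_handler(process: Callable[[dict], None]):
    """Build a Lambda handler for an SQS-triggered function.

    process: hypothetical job function; it should raise on failure.
    """

    def handler(event, context):
        failures = []
        for record in event.get("Records", []):
            try:
                # Each SQS record carries the message payload as a string.
                process(json.loads(record["body"]))
            except Exception:
                # Report only this message as failed so SQS retries it
                # without redelivering the rest of the batch.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler


if __name__ == "__main__":
    def process(job: dict) -> None:
        if job.get("fail"):
            raise RuntimeError("simulated failure")

    handler = make_sqs_handler(process)
    event = {
        "Records": [
            {"messageId": "m1", "body": json.dumps({"id": 1})},
            {"messageId": "m2", "body": json.dumps({"id": 2, "fail": True})},
        ]
    }
    print(handler(event, None))
```

Because each message is handled in its own short invocation, a long-lived worker process is replaced by many small ones, and a queue that sits empty costs nothing in compute.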