Deploy Verify
Post-deployment verification tool that uses AI-driven browser automation to answer the question CI pipelines don’t: “Does the site actually work for a user right now?”
- Role: Solo developer — architecture, frontend, backend, security, deployment
- Timeline: 4–5 weeks to production-ready, 2026
- Stack: Laravel, Amazon Nova Act, Amazon Bedrock (Nova 2 Lite), SQL, self-hosted
- Links: Live
The Problem
Modern CI pipelines are good at answering “Did the build succeed?” They are much worse at answering “Does the site actually work for a user right now?”
Unit tests and integration tests verify implementation details — selectors, DOM structure, tightly coupled assumptions. They break as UIs evolve, and they miss the failures that matter most: a user can’t log in, a form doesn’t submit, a dashboard won’t load.
For most of my career, the final check was manual. I’d open the production site after a deploy and click through critical flows myself. Over time, I moved most of my major deployments to Fridays — not because I followed the “never ship on Friday” rule, but because I worked for myself and wanted weekends as a buffer. If something broke in production, I’d rather catch it during a slow period and fix it before Monday than have a client find it first. That approach worked, but it didn’t scale, and it meant I was always on call after a deploy.
I wanted a tool that runs after deployment, from the outside, and validates behavior the way a real user would.
Constraints
- Natural language had to be precise enough to execute. Tests are written in plain English, but a browser agent needs deterministic instructions. Too rigid and it feels like writing code again. Too loose and tests fail unpredictably.
- Browser isolation. Running a real browser on behalf of users introduces risk. Without safeguards, test instructions could reach internal services, access the AWS metadata endpoint, or probe the local network.
- AI integration had to be structural, not decorative. I didn’t want AI bolted onto a conventional test runner. The tool needed AI at its core — for writing tests, executing them, and diagnosing failures.
- Solo timeline. The entire application — architecture, security model, frontend, backend, AI integration, deployment pipeline — had to ship production-ready in under five weeks.
Approach
Architecture
Deploy Verify is a Laravel application with three AI-powered services running through Amazon Bedrock’s Converse API, all using Nova 2 Lite:
- An instruction helper that converts loose, conversational test descriptions into structured Nova Act instructions and assertions
- A failure diagnosis service that analyzes failed runs using browser artifacts, console logs, HTML source, and recent Git commit history to identify the likely cause and suggest a fix
- A run summary generator that produces concise overviews of verification results
Each site in Deploy Verify receives a unique webhook URL. When a deployment pipeline calls that webhook, Nova Act launches a real browser session and executes the site's tests against the live deployment.
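As a sketch of how one of these services might call the Bedrock Converse API, here is a minimal Python version of the instruction helper. The model ID, prompts, and function names are illustrative assumptions, not the tool's actual code:

```python
# Sketch of the instruction-helper call path via the Bedrock Converse API.
# MODEL_ID is an assumption: the write-up names "Nova 2 Lite" but not the
# exact Bedrock model identifier.
MODEL_ID = "amazon.nova-lite-v1:0"

SYSTEM_PROMPT = (
    "Convert the user's conversational test description into structured "
    "browser-agent instructions and assertions."
)

def build_converse_request(description: str) -> dict:
    """Build the request body for a single Converse API turn."""
    return {
        "modelId": MODEL_ID,
        "system": [{"text": SYSTEM_PROMPT}],
        "messages": [
            {"role": "user", "content": [{"text": description}]},
        ],
        "inferenceConfig": {"maxTokens": 1024, "temperature": 0.2},
    }

def refine_instructions(description: str) -> str:
    """Send a loose test description to the model and return the
    structured instructions it produces."""
    import boto3  # imported here so the sketch loads without the SDK

    client = boto3.client("bedrock-runtime")
    response = client.converse(**build_converse_request(description))
    return response["output"]["message"]["content"][0]["text"]
```

Keeping request construction separate from the API call makes the prompt shape easy to test without touching AWS, which matters when the same Converse plumbing backs three different services.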
Why AI-first, not AI-assisted
I could have built verification tests the traditional way, with regular expressions and hard-coded selectors. But brittle, selector-dependent testing is exactly the problem I was trying to solve. Nova Act let me describe what a user does in language close to plain English, which means tests stay meaningful as UIs change. As AI-generated code becomes more common and developers ship changes they didn't write line by line, verifying behavior rather than implementation details becomes essential.
Making failure the most valuable feature
An early failed test gave me the insight that shaped the product. If I provided enough context, the AI could help diagnose why a deployment broke — not just report that it did.
Deploy Verify collects screenshots, HTML snapshots, and JavaScript console logs during every test run. When a test fails, it also pulls in every Git commit message since the last successful deployment. All of this context feeds into Nova 2 Lite, which generates an analysis of the failure and suggests possible fixes. The result turns a failed deploy from a generic error state into actionable debugging information.
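As a sketch, the diagnosis context described above might be assembled like this. The field and function names are hypothetical; the real artifact format isn't shown in this write-up:

```python
from dataclasses import dataclass, field

@dataclass
class RunArtifacts:
    """Artifacts collected during a test run (illustrative names)."""
    url: str
    failed_step: str
    console_logs: list[str]
    html_excerpt: str
    # Pulled in only on failure: every commit since the last
    # successful deployment.
    commits_since_last_success: list[str] = field(default_factory=list)

def build_diagnosis_prompt(run: RunArtifacts) -> str:
    """Flatten a failed run's artifacts into one prompt for the model,
    so it can connect the breakage to what changed."""
    sections = [
        f"A post-deployment browser test failed on {run.url}.",
        f"Failed step: {run.failed_step}",
        "JavaScript console output:",
        *(f"  {line}" for line in run.console_logs),
        "HTML near the failure:",
        run.html_excerpt,
        "Commits since the last successful deploy:",
        *(f"  - {msg}" for msg in run.commits_since_last_success),
        "Identify the likely cause and suggest a concrete fix.",
    ]
    return "\n".join(sections)
```

The key design point is in the source: commit history is added only for failed runs, which is what lets the model move from "the deploy broke" to "this change likely broke it."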
Security: defense in depth
Running a real browser on behalf of users is inherently dangerous. I addressed this with multiple overlapping layers:
- URL validation blocks internal IP ranges and dangerous schemes before a test starts
- iptables rules restrict the browser’s network access at the kernel level
- A dedicated low-privilege OS user (dv-browser) runs the browser process with no access to application code, secrets, or SSH keys
- JavaScript hardening is injected into every page to stub window.open() and restrict navigation to safe schemes
Each layer catches what the others might miss. No single safeguard is trusted alone.
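The first layer can be illustrated in Python with the standard library's ipaddress module. This is a minimal sketch under my own assumptions, not the production rule set:

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def is_safe_target(url: str) -> bool:
    """First defense layer: reject URLs whose scheme or resolved IP
    could reach internal services or the cloud metadata endpoint
    (169.254.169.254 falls in the link-local range)."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.hostname:
        return False
    try:
        # Resolve the hostname so both IP literals and DNS names
        # pointing at internal ranges are caught.
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast):
            return False
    return True
```

Validating at test-creation time still leaves a window between the check and the browser's own DNS lookup, which is one reason a layered design with kernel-level iptables rules matters: the network layer catches what pre-flight validation cannot.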
Outcome
Deploy Verify is live and in use on my own production sites. I've built an invite system for onboarding early users and plan to open access more broadly after the current hackathon judging period ends.
The app was built for the Amazon Nova AI Hackathon. As of this writing, results haven’t been announced, but I’ve been invited to the awards ceremony as one of roughly 45 recognized projects.
The most meaningful outcome so far is personal: I no longer schedule deployments around my weekends. Deploy Verify runs after every deploy and tells me whether the site works — the way a user would find out, but before they do.
Reflection
The hardest design problem wasn't technical; it was finding the right level of structure for natural-language test instructions. Too rigid and you're just writing code with extra steps. Too loose and the AI can't execute reliably. The instruction helper emerged directly from this tension, and I think it's the feature that will determine whether other developers actually adopt the tool.
If I could restart the project, I’d invest more in the test instruction format earlier. I went through several iterations before landing on a structure that felt approachable but ran deterministically. Starting there would have saved time downstream.
The broader lesson is about timing. I’ve wanted a tool like this for years, but the technology to build it didn’t exist until recently. Nova Act made it possible to describe verification in human language rather than brittle selectors. Twenty years of knowing what breaks in production, combined with AI tooling that finally matched the idea — that’s what produced Deploy Verify.