Deploy Verify
Post-deployment verification tool that uses AI-driven browser automation to answer the question CI pipelines don’t: “Does the site actually work for a user right now?”
- Role: Solo developer — architecture, frontend, backend, security, deployment
- Timeline: 4–5 weeks to production-ready, 2026
- Stack: Laravel, Amazon Nova Act, Amazon Bedrock (Nova 2 Lite), SQL, self-hosted
- Links: Live
The Problem
Modern CI pipelines are good at answering “Did the build succeed?” They are much worse at answering “Does the site actually work for a user right now?”
Unit tests and integration tests verify implementation details — selectors, DOM structure, tightly coupled assumptions. They break as UIs evolve, and they miss the failures that matter most: a user can’t log in, a form doesn’t submit, a dashboard won’t load.
For most of my career, the final check was manual. I’d open the production site after a deploy and click through critical flows myself. Over time, I moved most of my major deployments to Fridays — not because I followed the “never ship on Friday” rule, but because I worked for myself and wanted weekends as a buffer. If something broke in production, I’d rather catch it during a slow period and fix it before Monday than have a client find it first. That approach worked, but it didn’t scale, and it meant I was always on call after a deploy.
I wanted a tool that runs after deployment, from the outside, and validates behavior the way a real user would.
Constraints
- Natural language had to be precise enough to execute. Tests are written in plain English, but a browser agent needs deterministic instructions. Too rigid and it feels like writing code again. Too loose and tests fail unpredictably.
- Browser isolation. Running a real browser on behalf of users introduces risk. Without safeguards, test instructions could reach internal services, access the AWS metadata endpoint, or probe the local network.
- AI integration had to be structural, not decorative. I didn’t want AI bolted onto a conventional test runner. The tool needed AI at its core — for writing tests, executing them, and diagnosing failures.
- Solo timeline. The entire application — architecture, security model, frontend, backend, AI integration, deployment pipeline — had to ship production-ready in under five weeks.
Approach
Architecture
Deploy Verify is a Laravel application with three AI-powered services running through Amazon Bedrock’s Converse API, all using Nova 2 Lite:
- An instruction helper that converts loose, conversational test descriptions into structured Nova Act instructions and assertions
- A failure diagnosis service that analyzes failed runs using browser artifacts, console logs, HTML source, and recent Git commit history to identify the likely cause and suggest a fix
- A run summary generator that produces concise overviews of verification results
Each site in Deploy Verify receives a unique webhook URL. When a deployment pipeline calls that webhook, Nova Act launches a real browser session and executes the site's tests against the live deployment.
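As a sketch of how one of these services might call the Bedrock Converse API, here is a minimal Python version of the instruction helper. The model ID, prompts, and function names are illustrative assumptions, not the tool's actual code:

```python
# Sketch of the instruction-helper call path via the Bedrock Converse API.
# MODEL_ID is an assumption: the write-up names "Nova 2 Lite" but not the
# exact Bedrock model identifier.
MODEL_ID = "amazon.nova-lite-v1:0"

SYSTEM_PROMPT = (
    "Convert the user's conversational test description into structured "
    "browser-agent instructions and assertions."
)

def build_converse_request(description: str) -> dict:
    """Build the request body for a single Converse API turn."""
    return {
        "modelId": MODEL_ID,
        "system": [{"text": SYSTEM_PROMPT}],
        "messages": [
            {"role": "user", "content": [{"text": description}]},
        ],
        "inferenceConfig": {"maxTokens": 1024, "temperature": 0.2},
    }

def refine_instructions(description: str) -> str:
    """Send a loose test description to the model and return the
    structured instructions it produces."""
    import boto3  # imported here so the sketch loads without the SDK

    client = boto3.client("bedrock-runtime")
    response = client.converse(**build_converse_request(description))
    return response["output"]["message"]["content"][0]["text"]
```

Keeping request construction separate from the API call makes the prompt shape easy to test without touching AWS, which matters when the same Converse plumbing backs three different services.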
Why AI-first, not AI-assisted
I could have built verification tests the traditional way, with regular expressions and hard-coded selectors. But brittle, selector-dependent testing is exactly the problem I was trying to solve. Nova Act let me describe what a user does in language close to plain English, which means tests stay meaningful as UIs change. As AI-generated code becomes more common and developers ship changes they didn't write line by line, verifying behavior rather than implementation details becomes essential.
Making failure the most valuable feature
An early failed test gave me the insight that shaped the product. If I provided enough context, the AI could help diagnose why a deployment broke — not just report that it did.
Deploy Verify collects screenshots, HTML snapshots, and JavaScript console logs during every test run. When a test fails, it also pulls in every Git commit message since the last successful deployment. All of this context feeds into Nova 2 Lite, which generates an analysis of the failure and suggests possible fixes. The result turns a failed deploy from a generic error state into actionable debugging information.
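As a sketch, the diagnosis context described above might be assembled like this. The field and function names are hypothetical; the real artifact format isn't shown in this write-up:

```python
from dataclasses import dataclass, field

@dataclass
class RunArtifacts:
    """Artifacts collected during a test run (illustrative names)."""
    url: str
    failed_step: str
    console_logs: list[str]
    html_excerpt: str
    # Pulled in only on failure: every commit since the last
    # successful deployment.
    commits_since_last_success: list[str] = field(default_factory=list)

def build_diagnosis_prompt(run: RunArtifacts) -> str:
    """Flatten a failed run's artifacts into one prompt for the model,
    so it can connect the breakage to what changed."""
    sections = [
        f"A post-deployment browser test failed on {run.url}.",
        f"Failed step: {run.failed_step}",
        "JavaScript console output:",
        *(f"  {line}" for line in run.console_logs),
        "HTML near the failure:",
        run.html_excerpt,
        "Commits since the last successful deploy:",
        *(f"  - {msg}" for msg in run.commits_since_last_success),
        "Identify the likely cause and suggest a concrete fix.",
    ]
    return "\n".join(sections)
```

The key design point is in the source: commit history is added only for failed runs, which is what lets the model move from "the deploy broke" to "this change likely broke it."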
Security: defense in depth
Running a real browser on behalf of users is inherently dangerous. I addressed this with multiple overlapping layers:
- URL validation blocks internal IP ranges and dangerous schemes before a test starts
- iptables rules restrict the browser’s network access at the kernel level
- A dedicated low-privilege OS user (dv-browser) runs the browser process with no access to application code, secrets, or SSH keys
- JavaScript hardening is injected into every page to stub window.open() and restrict navigation to safe schemes
Each layer catches what the others might miss. No single safeguard is trusted alone.
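The first layer can be illustrated in Python with the standard library's ipaddress module. This is a minimal sketch under my own assumptions, not the production rule set:

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def is_safe_target(url: str) -> bool:
    """First defense layer: reject URLs whose scheme or resolved IP
    could reach internal services or the cloud metadata endpoint
    (169.254.169.254 falls in the link-local range)."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.hostname:
        return False
    try:
        # Resolve the hostname so both IP literals and DNS names
        # pointing at internal ranges are caught.
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast):
            return False
    return True
```

Validating at test-creation time still leaves a window between the check and the browser's own DNS lookup, which is one reason a layered design with kernel-level iptables rules matters: the network layer catches what pre-flight validation cannot.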
Outcome
Deploy Verify is live and in use on my own production sites. I've built an invite system for onboarding early users and plan to open access more broadly after the current hackathon judging period ends.
The app was built for the Amazon Nova AI Hackathon. As of this writing, results haven’t been announced, but I’ve been invited to the awards ceremony as one of roughly 45 recognized projects.
The most meaningful outcome so far is personal: I no longer schedule deployments around my weekends. Deploy Verify runs after every deploy and tells me whether the site works — the way a user would find out, but before they do.
Reflection
The hardest design problem wasn't technical; it was finding the right level of structure for natural-language test instructions. Too rigid and you're just writing code with extra steps. Too loose and the AI can't execute reliably. The instruction helper emerged directly from this tension, and I think it's the feature that will determine whether other developers actually adopt the tool.
If I could restart the project, I’d invest more in the test instruction format earlier. I went through several iterations before landing on a structure that felt approachable but ran deterministically. Starting there would have saved time downstream.
The broader lesson is about timing. I’ve wanted a tool like this for years, but the technology to build it didn’t exist until recently. Nova Act made it possible to describe verification in human language rather than brittle selectors. Twenty years of knowing what breaks in production, combined with AI tooling that finally matched the idea — that’s what produced Deploy Verify.