TL;DR
AI can make implementation faster, but production software still depends on engineering disciplines: security, monitoring, maintenance, and recovery planning. Coding faster is not enough if the system itself is not built to withstand real-world risk.
~~~
I’ve been thinking a lot lately about how to use AI tools for coding. One of my main conclusions has been that implementation is no longer the bottleneck. A recent security incident involving one of our customers illustrated that idea from a completely different angle.
This customer came to us for some help modernizing their CMS, but before we even got started, their website simply stopped working. Their site had been compromised. The response wasn’t really about writing code. It was about investigation, containment, careful recovery, and understanding downstream risks because we were dealing with an adversary rather than a bug.
The experience ended up being a useful reminder about what engineering effort is valuable and highlighted some lessons worth sharing.
The Incident
The site failed without warning. No monitoring alert, no anomaly flag. The first signal was an HTTP 500, and the investigation that followed revealed something much worse than a bad deploy. Random-named directories had appeared under the webroot. Large portions of the deployed application appeared to have been overwritten. Tens of thousands of media files had their timestamps updated, all by the web server user. Whether those files were actually modified or just touched, we couldn’t say for certain. What we knew was that the attacker had broad write access across the whole media library, which made every file suspect when it came time to recover. Scans found nothing obvious, but a clean scan is not the same as a clean server.
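A first-pass triage of that kind of damage can be scripted. The sketch below is illustrative rather than the tooling we used: it assumes Python 3.9+ and a forensic copy of the webroot, and the function name and cutoff value are hypothetical. It lists every file whose modification time falls at or after a cutoff. A changed timestamp alone still cannot distinguish a modified file from one that was merely touched, so the output is a list of suspects, not a list of confirmed changes.

```python
import os


def files_modified_since(root: str, cutoff_epoch: float) -> list[str]:
    """Return sorted paths under root whose mtime is at or after cutoff_epoch."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so a symlink is reported itself, not its target
                st = os.lstat(path)
            except OSError:
                # File vanished or is unreadable; skip it rather than abort
                continue
            if st.st_mtime >= cutoff_epoch:
                hits.append(path)
    return sorted(hits)
```

Run against a mounted copy of the compromised volume rather than the live host, since reading files on a live system can disturb access times and other evidence.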
We never confirmed the exact initial vector. Our strongest working theory was remote exploitation through the web application stack (Laravel/Filament/Livewire), with outdated dependencies and unsanitized HTML output in the codebase as contributing factors rather than proven causes. Files written as the web server user pointed to HTTP/app-layer compromise, not SSH, though we could not rule out other paths, such as a malicious admin upload. We found out the site was compromised only because it broke. That detection gap is the real story.
The Response
Initially, we tried to confirm how they got in, but it became clear we weren’t going to perfectly solve the puzzle. So we made the call to stop chasing the initial vector and focus on containment. We stopped nginx on the compromised host to cut off further exposure, then shifted to working out what we were actually dealing with. Database passwords got rotated. We confirmed that backups existed, both database snapshots and EC2 images. We audited dependencies, found multiple known advisories against Livewire and Filament, and opened a PR to clear them. We also identified an unsafe rendering pattern in which raw database content was being output without sanitization, and planned a fix using a proper HTML purifier.
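To show why the rendering pattern matters, here is a minimal sketch. It is written in Python for illustration only; the real fix belongs in the Laravel codebase, and both function names are hypothetical. It contrasts raw output with escaped output using the standard library's `html.escape`. Escaping is stricter than purifying: it neutralizes all markup, while a purifier keeps an allowlist of safe tags, which is what a CMS that stores rich text actually needs.

```python
from html import escape


def render_unsafe(body: str) -> str:
    """Interpolates stored content directly, so any stored markup executes."""
    return f'<div class="content">{body}</div>'


def render_escaped(body: str) -> str:
    """Escapes &, <, >, and quotes so stored markup is displayed as text."""
    return f'<div class="content">{escape(body)}</div>'


payload = "<script>alert(1)</script>"
unsafe_html = render_unsafe(payload)    # contains a live <script> tag
escaped_html = render_escaped(payload)  # contains &lt;script&gt; instead
```

The design point is the same in any stack: content from the database is untrusted input at render time, no matter who originally wrote it.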
Malware scans came back clean. That sounds reassuring, but a clean scan only means nothing matched a known threat signature. It told us nothing about when access started or whether data was taken. We couldn’t answer either question. There were no centralized logs, no query auditing, and the web logs were high-noise with limited actionable signal. We knew what the attacker did to the filesystem. We didn’t know when they first got in.
Recovery meant spinning up a replacement instance from a pre-incident EC2 image rather than trying to clean the compromised host. We kept nginx off on the new instance, updated credentials, merged the dependency and sanitization fixes, and validated that everything worked correctly in the isolated environment before reassigning the Elastic IP to put it in front of real traffic. The compromised server was preserved and set aside. Media files stored on the pre-incident image came with it; nothing was copied over from the live compromised host.
Then we rotated the Stripe and Mailgun keys. The risk with Stripe was not the customer’s login password, which the attacker could not have accessed, but the API secrets that had been stored in the application’s environment config. We reviewed Stripe activity directly (no anomalous charges) and revoked the old keys. We also checked whether the domain had been flagged by spam or malware databases, since a compromised server sitting on a trusted domain can be quietly used to send phishing emails or abuse the domain’s search and email reputation before the owner even knows. Everything came back clean.
A significant part of the work was customer communication, which ran in parallel with the technical response. We had to explain what happened, what risks it created, and what precautions made sense, to a customer who was simultaneously fielding questions from his own users. The honest answer on several fronts was “we don’t know yet.” We couldn’t confirm the initial vector, couldn’t prove whether data had been accessed, and were still in the middle of the investigation. That uncertainty had to be communicated carefully and clearly, without creating panic or overstating what the evidence actually supported. That is judgment work, not implementation work, and it was happening all day.
Takeaways
What slowed us down had nothing to do with writing code. We had no alerting for unauthorized file writes, no centralized log retention, only about eight days of database snapshots, and no documented process for vendor access during an incident. Local media storage meant even a clean rebuild required handling potentially tainted artifacts. These are design and operations decisions that never show up in sprint velocity, but they dominated the incident timeline. Coding faster would not have helped with any of it.
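The first of those gaps, detecting unauthorized file writes, can be narrowed with a simple integrity baseline. The following is a sketch under stated assumptions, not a description of what we deployed: it assumes Python 3.9+, the function names are hypothetical, and the baseline manifest would need to be stored off the host so an attacker with write access cannot alter it. It hashes every file under a directory and reports what was added, removed, or changed since the baseline.

```python
import hashlib
import os


def build_manifest(root: str) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large media files do not fill memory
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def diff_manifests(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests and list added, removed, and changed paths."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(p for p in shared if baseline[p] != current[p]),
    }
```

Because the comparison is on content hashes rather than timestamps, a file that was only touched does not show up as changed, which is exactly the question we could not answer during the incident. A scheduled job that alerts on any non-empty diff outside expected upload paths would be one way to turn this into monitoring.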
Operational Reality
The customer had come to us to modernize their CMS. Instead, we spent two days containing a compromised production site and getting it back online with stronger controls. That work didn’t ship a feature or close a ticket on the product roadmap. It was still the most important engineering we did that week. It also made the modernization scope clearer: dependency currency, observability, backup strategy, and safer rendering patterns had all been accumulating as technical debt for years. Addressing them was always going to be part of the job.
This incident was an interesting reminder that supporting a production application is much more than a CI/CD pipeline and a New Relic account. AI is rapidly making implementation cheaper, but production systems still need to be monitored, maintained, secured, and occasionally recovered. Teams are already capable of writing software faster than they can decide what to build. Spending all of your energy trying to code even faster while neglecting operational excellence is optimizing the wrong thing.
~~~
To learn more about how Flower Press Interactive helps secure systems for our customers, check out our Ongoing Website & App Support Services.