A developer using the Claude Code AI agent accidentally deleted his production website infrastructure, including a database with 2.5 years of records and its backup snapshots. The error occurred because a critical Terraform state file was initially missing, leading the AI to create duplicate resources and then, after receiving the file, execute a "destroy" command that wiped both the old and new setups.
The data was restored within a day with help from Amazon Business support. The developer's post-mortem highlighted key lessons, including over-reliance on the AI agent, the need for manual review of destructive commands, and implementing better safeguards like delete protections and remote state file storage.
The main topics covered are an AI coding assistant error, infrastructure deletion, data recovery, and lessons learned about AI oversight and DevOps safety practices.
Claude Code deletes developer's production setup, including its database and snapshots: 2.5 years of records were nuked in an instant
Story has a happy ending of sorts, but should serve as a cautionary tale.
Everyone loves a good story about AI agents gone wrong, and those often come with a bit of schadenfreude towards our virtual companions. Sometimes, though, the errors can be attributed to improper supervision, as was the case with Alexey Grigorev, who was brave enough to detail how he got Claude Code to wipe years' worth of records on a website, including the recovery snapshots.
The story begins when Grigorev wanted to move his website, AI Shipping Labs, to AWS and have it share the same infrastructure as DataTalks.Club. Claude itself advised against that option, but Grigorev decided that keeping two separate setups wasn't worth the hassle or cost.
Grigorev uses Terraform, an infrastructure management utility that can create (or destroy) entire setups, including networks, load balancing, databases, and, naturally, the servers themselves. He had Claude run a Terraform plan to set up the new website, but forgot to upload a vital state file that contains a full description of the setup as it exists at any moment in time.
Claude did what Grigorev wanted and started creating a setup for the AI Shipping Labs site, but the operator stopped it halfway. Because the state file was missing, Claude had created duplicate resources. Grigorev had Claude identify the duplicates to correct the situation, then uploaded the state file, believing he had the situation sorted out.
Unfortunately, Grigorev assumed at this point that the bot would continue cleaning up duplicate resources and only then look into the state file to see how things were meant to be set up in the first place. Terraform and similar tools can be very unforgiving, particularly when coupled with blind obedience. As Claude now had the state file, it logically followed it, issuing a Terraform "destroy" operation in preparation to set up things correctly this time.
Given that the infrastructure description included the DataTalks.Club website, this resulted in a full wipe of the setup for both sites, including a database with 2.5 years of records, and database snapshots that Grigorev had counted on as backups. The operator had to contact Amazon Business support, which helped restore the data within about a day.
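To make the mechanics concrete, here is a deliberately simplified toy model in Python. It is not how Terraform is implemented, and the resource names are made up for illustration. It only captures the two behaviors described above: without a state file the tool believes nothing exists and plans to create everything again, and with the full state file a destroy targets every resource the state tracks.

```python
from __future__ import annotations

# Toy model only: real Terraform state tracks far more than resource names.


def plan_apply(config: set[str], state: set[str]) -> set[str]:
    """Resources an apply would create: declared in config but absent from state."""
    return config - state


def plan_destroy(state: set[str]) -> set[str]:
    """Resources a destroy would delete: everything tracked in state."""
    return set(state)


# Hypothetical resource names, chosen for illustration.
existing = {"vpc", "database", "db_snapshots", "datatalks_site"}
config = existing | {"shipping_labs_site"}

# No state file: every existing resource is planned for creation a second time.
duplicates = plan_apply(config, state=set()) & existing

# Correct state file: only the genuinely new resource would be created.
new_only = plan_apply(config, state=existing)

# Destroy with the full state file: everything tracked is deleted.
wiped = plan_destroy(state=existing)
```

In this sketch, `duplicates` and `wiped` both cover every pre-existing resource, which mirrors why the shared setup for both sites went down together.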
In the post-mortem, Grigorev describes a few measures he's taking to avoid similar incidents in the future, including setting up a periodic test of database restores, applying delete protections in Terraform and AWS permissions, and moving the Terraform state file from his local machine to S3 storage. He also admitted he "over-relied on the AI agent to run Terraform commands", and is now stopping the agent from doing so. He will manually review every plan Claude presents and run any destructive actions himself.
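The manual review step can be partly backed by a script. The following is a minimal sketch, not Grigorev's actual tooling: it assumes a plan has been saved and exported with `terraform show -json`, whose output lists each planned change under `resource_changes` with an `actions` array. The function name and the sample resource addresses are hypothetical.

```python
from __future__ import annotations

import json


def destructive_changes(plan: dict) -> list[str]:
    """Return addresses of resources the plan would delete or replace."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement shows up as a delete paired with a create.
        if "delete" in actions:
            flagged.append(rc["address"])
    return flagged


# Hand-written sample in the shape of the JSON plan output (addresses are made up).
sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_db_instance.main", "change": {"actions": ["delete"]}},
    {"address": "aws_instance.web", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_s3_bucket.assets", "change": {"actions": ["no-op"]}},
    {"address": "aws_instance.new_site", "change": {"actions": ["create"]}}
  ]
}
""")

flagged = destructive_changes(sample_plan)
if flagged:
    print("Needs human review before apply:", ", ".join(flagged))
```

A wrapper like this could refuse to hand a plan to an agent whenever the flagged list is non-empty, leaving the destructive step to a human.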
It's tempting to mark this story as another one of "dumb bot gone wrong," but it's a fair guess that most sysadmins will spot the baseline issues with Grigorev's approach, including granting wide-ranging permissions to what's effectively a subordinate of his, as well as not scoping permissions in a production environment to begin with.
Perhaps the biggest lesson is not to assume that Claude would even have the context (pun unintended) to understand what the existence of the second website meant; a junior sysadmin wouldn't have had it either.
Bruno Ferreira is a contributing writer for Tom's Hardware. He has decades of experience with PC hardware and assorted sundries, alongside a career as a developer. He's obsessed with detail and has a tendency to ramble on the topics he loves. When not doing that, he's usually playing games, or at live music shows and festivals.