
Amazon instituted a 90-day safety reset after outages cost it millions of orders. One incident was caused in part by its AI tool Q, and a separate outage wiped out 6.3 million orders. Engineers must now obtain two approvals before any code change is deployed.

Key Highlights

  • After a series of outages struck its shopping platform, Amazon instituted a 90-day safety reset.
  • Amazon’s AI tool Q helped cause an outage on March 2 that lost nearly 120,000 orders.
  • On March 5, a different outage caused a 99% drop in orders and the loss of 6.3 million sales.
  • Engineers now need two people to approve any code change before it goes live.
  • Some 335 of Amazon’s most valuable internal systems are now subject to the new rules.

It All Went Wrong In Late 2025

Amazon has been grappling with a growing number of internal outages since the third quarter of 2025. Dave Treadwell, Amazon’s senior vice president of e-commerce services, wrote in a message to staff on Tuesday that the company had noticed a clear trend of problems and that several significant ones had occurred in recent weeks alone. The message makes clear that things had gotten bad enough to require a proper fix.

Some were caused by software updates that went too far, too fast without the right safety precautions in place. In other instances, data was corrupted and the problem took hours to resolve. Some of the failures boiled down to something as fundamental as failing to have two people sign off on a code change before it went live. Those are simple rules to follow, but they were absent or overlooked.

One Cause: An AI Tool Built by Amazon

Customers across Amazon’s platforms began seeing incorrect delivery times when they tried to add items to their carts on March 2. This might seem trivial, yet it was not. The issue resulted in nearly 120,000 lost orders and about 1.6 million errors on the website. It was driven in part by Amazon’s own AI coding tool Q, an internal review found.

This is where the plot thickens. Amazon developed Q to enable its engineers to code more quickly. And it’s working: engineers using tools like Q can produce vastly more code than they used to. But more code moving faster also creates more potential for something to go wrong if the review process fails to keep pace. Amazon’s own internal documents bluntly stated what should have been obvious from the start: the use of AI in core operations will continue to reveal weak points where appropriate guardrails don’t exist yet.

Then Things Got Even Worse

March 5 presented a far greater problem. Another outage struck Amazon’s North American shopping platform on Wednesday, and this time orders didn’t just slow down. They nearly stopped. Orders fell by 99% on the platform and a total of 6.3 million sales were lost. The cause was a modification to a live production system that skipped the proper documentation and approval process required by Amazon.

Automated checks were not run before the change went live. As a result, a single person was able to push a large-scale configuration change with no second reviewer and no safety guards. Amazon’s internal documents referred to it as a high blast radius change, meaning one faulty update can rapidly spread harmful effects throughout the entire system. It was just the sort of thing that safety rules are supposed to prevent, and this time those rules were simply not observed.

Amazon Is Now Tightening the Rules

To address the issues, Amazon is instituting a 90-day safety reset. The updated regulations apply to about 335 of its most critical internal systems, the ones that directly impact what customers can see and do on the platform. All of those systems are now swept up in more rigorous controls for the duration of the reset.

New rules require engineers to have two people review their work before any code change goes live. They must also use an internal tool that monitors and approves changes, and run their code through an automated system that tests it against Amazon’s own reliability standards. Every senior leader responsible for one of those 335 systems is being told to go back and audit everything their teams have done recently. Amazon is also exploring longer-term solutions that bring newer toolsets and technologies company-wide, blending rule-based verification with AI tools to build something more robust than what it has today.
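
For readers who want to see what such a gate could look like in practice, here is a minimal sketch in Python. It is only an illustration of the two rules described above (two reviewers plus passing automated checks) and not Amazon’s actual tooling. The names `ChangeRequest`, `deployment_blockers` and `can_deploy` are hypothetical, and the rule that an author cannot approve their own change is an assumption made for the example.

```python
from dataclasses import dataclass, field

# From the article: two people must review a change before it goes live.
REQUIRED_APPROVALS = 2


@dataclass
class ChangeRequest:
    """Hypothetical record of a proposed code change (not a real Amazon type)."""
    author: str
    approvers: set[str] = field(default_factory=set)
    automated_checks_passed: bool = False


def deployment_blockers(change: ChangeRequest) -> list[str]:
    """Return the reasons a change may not go live; an empty list means it may."""
    blockers = []
    # Assumption: the author's own sign-off does not count as a review.
    independent = change.approvers - {change.author}
    if len(independent) < REQUIRED_APPROVALS:
        blockers.append(
            f"needs {REQUIRED_APPROVALS} independent approvals, has {len(independent)}"
        )
    if not change.automated_checks_passed:
        blockers.append("automated reliability checks have not passed")
    return blockers


def can_deploy(change: ChangeRequest) -> bool:
    """A change can be deployed only when nothing blocks it."""
    return not deployment_blockers(change)
```

In this sketch, a change approved only by its author and one colleague is blocked, and so is a fully reviewed change whose automated checks have not run. Returning the list of reasons, instead of a bare yes or no, lets an engineer see exactly which rule still has to be satisfied.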

FAQs

1. What caused Amazon’s outages?

A combination of factors led to the outages: Amazon’s AI coding tool Q, missing safety checks, and unapproved code changes.

2. How many orders did Amazon lose? 

About 120,000 orders were lost on March 2, and a separate outage on March 5 cost another 6.3 million orders across North America.

3. What is the 90-day reset? 

It is a temporary series of stricter rules that Amazon implemented to prevent the same issues from recurring as it develops longer-term solutions.

4. What systems are affected by the new rules? 

Approximately 335 of Amazon’s most critical internal systems, the kind that directly affect the customer shopping experience.

5. Is AI getting the blame for all of it? 

Not all of it. Amazon said only one of the reviewed incidents was directly related to AI, and that none involved code that had been wholly written by it.


Follow Inspirepreneur Magazine for more business news.
