On Tuesday, a bevy of popular sites including Amazon, Spotify, eBay, Reddit, Twitch, and Pinterest were temporarily unavailable in a widespread outage that hit Fastly, a leading web services provider. Major news outlets such as the New York Times, the BBC, and CNN were also down, as well as some government websites in the UK. Users who attempted to visit the sites ran into 503 server errors, and some speculated that a hacker was responsible.
However, the culprit was far less nefarious: an unsuspecting Fastly customer who accidentally triggered a software bug. How did it happen? CNET relates:
In mid-May, Fastly issued a software deployment that contained a bug, which if triggered in specific circumstances could take down vast swaths of its network. The bug lay dormant until June 8, when one Fastly customer inadvertently triggered it during a “valid configuration change,” which caused 85% of the company’s network to return errors.
How was the problem fixed?
Luckily, the team at Fastly was on it relatively quickly. Nick Rockwell, Senior Vice President of Engineering and Infrastructure at Fastly, wrote on the company blog:
We detected the disruption within 1 minute, then identified and isolated the cause, and disabled the configuration. Within 49 minutes, 95% of our network was operating as normal. This outage was broad and severe, and we’re truly sorry for the impact to our customers and everyone who relies on them.
What is Fastly and why is it so critical to so many websites?
Fastly is an edge cloud provider, meaning that it brings web content closer to the users trying to access it. CNET explains:
[I]f you’re accessing a website hosted in another country, it will store some of that website closer to you so that there’s no need to waste bandwidth by going to fetch all of that website’s content from far away every time you need it. This makes for faster website load times, and optimizes images, videos and other high-payload content to show up quickly and smoothly when you land on a web page.
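To make the caching idea concrete, here is a minimal sketch in Python of how an edge node might keep a nearby copy of content and only go back to the distant origin server when that copy is missing or stale. This is an illustration only, not Fastly's actual software: the `EdgeCache` class, the `fetch_from_origin` callback, and the 60-second lifetime are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of an edge node that keeps origin responses for a fixed lifetime.

    Hypothetical illustration: real edge networks honor HTTP caching headers,
    purge requests, and much more.
    """

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin   # slow trip to the faraway origin server
        self._ttl = ttl_seconds           # how long a stored copy counts as fresh
        self._clock = clock               # injectable so the cache can be tested
        self._store: Dict[str, Tuple[float, bytes]] = {}  # path -> (expiry, body)
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            # A fresh copy is stored nearby: serve it without contacting the origin.
            self.hits += 1
            return entry[1]
        # No copy, or the copy has expired: fetch from the origin and store it.
        self.misses += 1
        body = self._fetch(path)
        self._store[path] = (now + self._ttl, body)
        return body
```

In this sketch, the first visitor to request a page pays for the long trip to the origin, and everyone who asks for the same page during the next 60 seconds is served from the nearby copy. It also hints at why an outage at the edge is so visible: if the edge layer itself fails, requests never reach this logic at all, and visitors see errors even though the origin site may be perfectly healthy.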
The Fastly outage is just one example of how devastating a disruption at a single company can be when only a handful of companies provide essential services for countless websites. Corinne Cath-Speth, a Ph.D. candidate at the Oxford Internet Institute and the Alan Turing Institute, observes:
[A] technical hiccup in a single company can have huge ramifications… This in turn — raises major questions about the dangers of (power) consolidation in the cloud market and the unquestioned influence these often invisible actors have over access to information.