Last night’s widespread outage of iinet, Australia’s second-largest Internet Service Provider, was bad enough. For hours, many of iinet’s customers had little or no access to the assorted services that they were paying for due to a cooling failure at an iinet data-centre, during record-breaking heat.
Bad enough, but iinet’s communications people actually managed to make things even worse than that, failing to communicate clearly, just when the company and its customers needed it most.
The data-centre issues took down a lot of things: everything from iinet’s service status pages and many of its Web-sites to hosted Web-sites, email accounts, and (rather critically) the authentication framework allowing broadband customers to connect to the Internet. This didn’t affect people who were already connected, but if you weren’t already connected or your connection had dropped out for any reason, you couldn’t get it back during the outage.
In that sense, a reported 98% of customers were largely unaffected – which seems great, unless you were one of the two percent. While my figures on iinet’s base of broadband customers are incomplete, 2% is somewhere well north of 2,000 families, based on the most conservative of estimates.
So, a minimum of 2,000-odd families across Australia were directly affected by the outage, based on iinet’s estimate. That’s not counting those who were indirectly affected by such things as some of iinet’s domain name servers having intermittent cows.
Now that we’ve got an idea of the scope, let me say that (in my opinion) iinet actually performed really well during this incident, in pretty much every area except communications. The problem was identified, addressed, and solved in far less time (admittedly still hours) than you’d expect if you were aware of the scale and nature of the problem.
Of course, most people aren’t aware of how much went into getting this sorted. That’s where this gets messed up.
Things apparently went into serious failure around 17:00 AEDT, and the first real communication (via Facebook) about the nature of the problem didn’t come for another hour and a half.
Some customers outside of WA may be offline due to ongoing issues with heat in our Perth data centre.
Additional cooling is being brought online, and once confirmed effective we’ll update with an estimated time for services to come back online.
The first sentence is absolutely true, but hopelessly incomplete in describing both the scale and nature of the problems that customers would experience.
As for the second sentence? No. No ETA was provided. Eventually services started working, and approximately another hour and a half after the restoration of services, there was this:
We’ve resolved the heat impact to our Perth data centre, and we’ve brought all impacted services back online.
If you are still offline, we’d recommend turning your modem off, waiting for a minute, then turning it back on again.
For every one of the hundreds of users who actually did see the Facebook notice, many more did not, and tried to call iinet support instead.
Within a quarter of an hour, call-queues had already exceeded an hour. Shortly thereafter, the phone system basically overloaded and stopped being able to accept calls, as queues overflowed.
Customers thought that support was actually ignoring them – and iinet did not provide any information to lead them to believe otherwise.
Customers thought their problem was in addition to the ongoing problem – iinet did not provide any information that would be useful in determining whether or not they were among the affected or in what way they might be affected if they were.
Customers were waiting for an ETA, or more information, or even just a “we’re working as hard as we can” – but they didn’t get that either. Not until it was all over.
Some customers had assignments. Others had work. Others just wanted to use the Internet for entertainment or socialising.
What iinet did not provide was information that would allow customers to make an informed choice about the use of their time or information that might have helped them choose alternative activities, venues or methods.
And there’s the rub, really. iinet not only failed to provide the service (an understandable and forgivable matter), but they wasted customers’ time and attention and left them in the dark about what to expect, when what people really wanted was for iinet to communicate with them.
And that’s what the communications staff are actually for, right? They weren’t, after all, the ones actually wrestling with the problem. They’re there to use whatever communications channels are available to soothe and inform angry or disappointed customers, and keep people from feeling ignored.
In that, the communications staff failed spectacularly at their primary mission.