
Aug 2, 2024

CrowdStrike incident – Our response to the crisis

In this blog, we delve into the recent CrowdStrike incident—a pivotal moment for the tech industry. Will the industry as a whole take the necessary steps to improve its resilience?

 

 

 


I am going to write about CrowdStrike, as this is a watershed moment for the Tech industry, but first I want to be transparent:

At Shipnet, around 15% of our customers' services were impacted by this outage. Of that 15%, around three quarters were up and running within 24 hours, and the remainder were returned to service within 48 hours. Not a single customer missed a single minute of service, whenever their Monday started.

 

We think we did OK…

Of course, we will find out more as we go through our own investigation and root cause analysis this week, and I am sure we will find things we would do better if it happens again, but I am proud of the work my colleagues put in to restore services, and grateful for the support and kind words from the customers who were affected.

We will learn from this, and we will get better.
But will the industry?

On CrowdStrike specifically, I am struck by the way that so many businesses have architected an enormous single point of failure into their systems. Lots of people use CrowdStrike as it makes compliance and audit easier.

I saw a CIO on Twitter at the weekend saying that when he says to an auditor:

“‘We use CrowdStrike’, a box is ticked, and we move on. When I say we use something else, a whole new (and very expensive) book is opened up and the questions start coming…”

 

This seems to be unique. We are incentivising people, in a wholly novel way, to architect in a single point of failure.

Few CIOs or CTOs are going to argue in favour of making audit more onerous, and CFOs are often more than prepared to pay more (CrowdStrike is not cheap) to reduce audit-related risks. As a result, we have a component embedded in individual computers that can cause a complete failure, and those individual computers collectively drive global industry: I saw posts talking about the impact on financial institutions, logistics, broadcast media, health, government and so on.

In fact, barely any industry was unscathed by Friday afternoon.

So, perhaps the change that ends up happening as a result of all the fuss of the last few days is a bit more trust in our internal knowledge, a preparedness to go to bat against audit, and greater diversity in our digital supply chains.

For now, I just want to say a massive thank you to all those, not just within Shipnet, who worked all the hours necessary to keep the impact on customers as small as possible. Not an easy task. In fact, it was an unprecedented task and one which, considering what could have been, was undertaken with the tenacity and seriousness required.

 


John Wills

Director, Customer Experience