Why IT professionals must learn to love chaos

Take any crisis right now, from climate change to supply chain shortages to global pandemics, and much of the narrative is around resilience and robustness. In other words, preparing for the worst-case scenario, and being able to limit the damage if it happens. But in a business context, this well-intentioned view of risk management can be short-sighted. A strategy that prioritises future-proofing above everything else is futile because it presumes the future can be predicted.

Attempting to eliminate risks is also naive because it assumes that all volatility is bad. But systems and organisations can ultimately benefit from shocks. They can learn and develop when exposed to randomness, disorder, and uncertainty. Traditional approaches to risk pit preservation against transformation in a zero-sum game (resilience resists, robustness absorbs, the ‘problem’ is defeated and the status quo maintained), whereas an ‘antifragile’ mindset leverages crises to generate improvements. Chaos isn’t to be feared, but embraced… even sought after.

This is a lesson that IT professionals should learn. Many of us spend too much of our focus (and budgets) trying to avert potential crises. We develop and invest in sophisticated redundancy strategies, and demand that SLAs be peppered with business continuity clauses. In doing so, we not only direct resources away from new ideas and innovations, but also deny ourselves the learning opportunities that failures bring.

It does not require too much of a change of mindset. If you work in any DevOps function, you’ll be comfortable with the ‘fail fast, early and often’ mantra. Failure shows you what’s wrong, so you can work out what’s right. And it stops those failings from cascading into production. But rarely do we adopt this on a larger scale and with a longer-term view. When something goes wrong in a small and controlled test environment, it can be scratched off and we move on. But at scale, in the open, mistakes are less readily forgiven; so we learn not to make them.

Emily Brand, Chief Architect of Transformation and Adoption, Red Hat

This dual approach to risk and failure is flawed. Culture needs to be consistent throughout an organisation for it to be effective. You can’t have one part of the business embracing the benefits of failure, and another determined to avoid failure at all costs. It will leave everybody befuddled about which hat to wear with which projects.

Of course, there are gradients. You can’t have the same gung-ho attitude to risk with every project. Some matter more than others. But by learning from failures through the full development lifecycle, you can identify trends affecting your IT infrastructure and make improvements quickly. The alternative is to limit change, or change too slowly, believing that strategy provides stability. In fact, it is only by continually changing that you are able to roll with the punches when they come.

It’s about finding the balance; understanding the worst-case scenarios, but not letting risk and caution suffocate innovation and progress. Too many organisations set out to stack the see-saw on the side of mitigation. They double down on redundancy and cybersecurity strategies that may never be required, and which depress creativity in the meantime. It’s time that the cautious tail stopped wagging the productive dog.

IT professionals must drive this change in their organisations. C-Suites and Boards are by nature conservative and have too much riding on short-term goals—sales targets, brand reputation, the share price—to entertain notions of chaos. And we must be ambitious too. Let’s not settle for being adaptable – that is limiting, because it presumes you protect and merely tweak what you have. And we must think bigger than ‘agility’. Instead of being beholden to any one system or software vendor, we should force ourselves to continually work with new partners and concepts, regardless of whether we have an immediate need for them.

In the same way, we must recruit the widest range of talent for our teams, for the challenges we don’t yet know about. Diversity of backgrounds overcomes the ills of groupthink, and so also allows us to see the likelihood of the implausible event.

And we must break what we have so we know how to rebuild one day. The modern IT team should include a division of internal saboteurs—one team to make, and another to break.

We must bring these principles and ideas (and ourselves) in from the fringes of our organisations, and hard-code chaos into our cultures.

Embracing chaos is not about losing control; it’s quite the opposite. It’s about amassing the different capabilities and experiences you may need, instead of resourcing for a future that has almost certainly been predicted incorrectly. The latter wastes time, money and opportunities. Instead, we need to rethink ‘shocks’ in a positive light: not as something to be reacted to and nullified, but as part of the organic evolution of your organisation and its competitive advantage.

In summary, as an IT leader you should create a chaos team of diverse roles and backgrounds and encourage them to go break stuff. But then ensure product groups are not blamed for what breaks, and support them in fixing the failures and in treating the exercise as a valiant one. You should also view your IT landscape as a whole, both conceptually and practically, with observability tools. Make this a key tenet of your overall IT strategy, instead of a last-minute add-on. Finally, understand where (and why) you are operating at the edge of efficiency. Accept that you will never have all the answers for growth, so be ready to benefit from unforeseen circumstances.
