Anti-fragility and the Chaos Monkey: this is what IaaS is all about!


All this money, and brains too

One of my favorite non-fiction authors at the moment is Nassim Nicholas Taleb. His claim to fame: he foresaw the meltdown of the financial system before 2008. In his books ‘Fooled by Randomness’ and ‘The Black Swan’ he makes a very compelling case that our current financial system is as safe as a modern-day Titanic on its maiden trip in Arctic waters. Nassim is not a Monday-morning quarterback. As a securities dealer he put his money where his mouth is: he saw the trainwreck coming and profited handsomely. Or, in his own colorful prose: he made enough ‘F*** You’ money from the financial meltdown to pursue a career outside the financial services sector. Nassim uses his new-found time to write about the consequences of our ridiculous belief that we are able to predict or model the future.

How antifragile we are

Nassim’s latest book is ‘Antifragile’. The concept of ‘Antifragile’ is exactly what the word suggests: the polar opposite of fragile. Something is ‘Antifragile’ if it benefits from stressors, just as ‘fragile’ means that something deteriorates under the influence of a stressor. An example: when I drop my iPhone on the floor, the glass display will crack. It is hurt. If my iPhone were antifragile (this is an imaginary example), it would improve when dropped from a 50-story building. Nassim goes to great lengths to explain that Antifragile is not the same as ‘robust’, ‘resilient’ or ‘strong’. Back to the iPhone example: a protective rubber ball around my iPhone makes it robust, but not Antifragile. If you’re now puzzled and intrigued by this idea, I encourage you to read the book.

Ancient history

Nassim calls the ‘fragile – robust – antifragile’ spectrum the Central Triad of his latest book. To illustrate the Central Triad at the beginning of his book, Nassim uses three old Mediterranean myths and legends:

  • An illustration of a fragile system is the sword of Damocles. According to the Roman legend, Damocles sits right under a sword that hangs from the ceiling by a single hair. If the hair breaks, the sword will chop up Damocles.
  • An example of a robust system is the Phoenix. The most remarkable feature of this ancient bird is its ability to rise from its ashes.
  • Finally, the Antifragile example is the (Lernaean) Hydra. When you chop off one of the many heads of this snake-like creature, two new ones replace it. The Hydra grows stronger from an attempt to do damage.

A nuclear sword of Damocles

This is all interesting, but what does this have to do with Infrastructure-as-a-Service? Well, it struck me that Nassim’s Central Triad explains the value of Infrastructure-as-a-Service. With IaaS, it is possible to build an IT infrastructure that is Antifragile, as I will illustrate with the example of Netflix’s Chaos Monkey.

To understand this, first consider the class of legacy applications that are fragile: they cannot survive a physical outage of the machine they run on. They are, in mythological terms, ‘swords of Damocles’. Here’s an extreme, and real, example: the control software of an (old) nuclear plant that runs on a single physical computer. If this computer fails, the control software goes down. I’m not suggesting that this will lead to a nuclear meltdown immediately, but I hope that none of my loved ones is within a 100-mile radius when it happens; the comparison with Damocles is too close for comfort. Another example of the fragility of older code: the Y2K problem, a.k.a. the Millennium Bug. A single bug was enough to cause unforeseen damage, which created a whole industry by itself.

It’s SPOF, not Spock

Thanks to 21st-century IT technologies that have become mainstream over the last decade or so, such as virtualization*, distributed computing and redundancy concepts, we have come a long way in Information Technology. There’s plenty of experience in finding and circumventing so-called SPOFs, Single Points Of Failure. In fact, IT has become quite good at building robust systems: the world has seen its fair share of natural and man-made disasters, from massive power outages and blackouts to nuclear disaster, yet the internet as a whole has not seen a single millisecond of downtime. IT can be called robust.
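For the number-minded, here is a back-of-the-envelope sketch of why removing a SPOF pays off. It assumes that replicas fail independently of each other, which is optimistic in real life, and the 99% figure is made up for illustration. The function name is my own invention.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Chance that at least one of `replicas` independent copies is up.

    `single` is the availability of one copy, as a fraction between 0 and 1.
    Assumes failures are independent, which real outages often are not.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The system is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** replicas


# One machine is a SPOF; every extra copy shrinks the chance of a total outage.
for n in (1, 2, 3):
    print(n, "replica(s):", f"{combined_availability(0.99, n):.6f}")
```

With a single 99% machine you are down roughly 1% of the time; under the independence assumption, a second copy brings that to about 0.01%.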

There’s a monkey on my app

Robust is not antifragile. Antifragile systems, it turns out, are ‘organic’ systems: for long-term survival and growth, they rely on constant (random) mutation and adaptation. This is how the cells in our body behave. Compare this with Chaos Monkey: a piece of software with a particularly well-chosen name, developed by Netflix (for the European crowd: a hugely popular US-based provider of Video-on-Demand). Chaos Monkey throws an endless supply of monkey wrenches at applications that run on Amazon’s popular Infrastructure-as-a-Service cloud (AWS) by randomly turning off their instances. These random mutations expose the applications with weaknesses, so that they can be improved. This is anti-fragility.
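To make the idea a bit more tangible, here is a toy simulation of the principle. This is emphatically not Netflix’s code and it makes no calls to AWS: the names `Fleet`, `chaos_monkey` and `run_experiment` are made up for this sketch, and the "instances" are just strings in a set. It only shows how random terminations flush out an application that depends on a single instance.

```python
import random
from typing import Optional, Set


class Fleet:
    """Toy stand-in for the cloud instances behind one application."""

    def __init__(self, desired: int) -> None:
        self.desired = desired          # how many instances we want running
        self.next_id = 0
        self.running: Set[str] = set()
        self.heal()

    def heal(self) -> None:
        """Start replacement instances until the desired count is restored."""
        while len(self.running) < self.desired:
            self.running.add(f"instance-{self.next_id}")
            self.next_id += 1

    def is_serving(self) -> bool:
        """The application is up as long as at least one instance runs."""
        return len(self.running) > 0


def chaos_monkey(fleet: Fleet, rng: random.Random) -> Optional[str]:
    """Turn off one randomly chosen running instance and return its name."""
    if not fleet.running:
        return None
    victim = rng.choice(sorted(fleet.running))
    fleet.running.remove(victim)
    return victim


def run_experiment(desired: int, rounds: int, seed: int = 0) -> int:
    """Count the rounds in which the application went dark."""
    rng = random.Random(seed)
    fleet = Fleet(desired)
    outages = 0
    for _ in range(rounds):
        chaos_monkey(fleet, rng)        # the monkey strikes
        if not fleet.is_serving():
            outages += 1                # weakness exposed
        fleet.heal()                    # replacements are started afterwards
    return outages


print("single instance:", run_experiment(desired=1, rounds=100), "outages")
print("three instances:", run_experiment(desired=3, rounds=100), "outages")
```

In this sketch the single-instance application goes dark in every one of the 100 rounds, while the three-instance one never does. The monkey does not fix anything itself; it makes the weak spot impossible to ignore, and the improvement comes from the people who respond to it.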


* Yes, I know: timesharing was invented in the 1960s. And why did you geezers not use it for that power plant? Please shut up!

This article was written by rondekko
