Turbonomic Brings Autonomic Computing Back to IBM — But This Time It Might Just Work
IBM, which pioneered autonomic computing, the concept of creating self-aware systems that automatically optimize their own cost, availability, and user experience (Robert Morris, IBM Almaden Research Center, 2001), has announced its intent to acquire Turbonomic, one of today’s most exciting deep-learning and streaming-analytics driven application resource management plays.
“The growing complexity of the I.T. infrastructure threatens to undermine the very benefits information technology aims to provide.” (IBM manifesto on autonomic computing, 2001)
What Turbonomic Brings to the Table
Turbonomic uses today’s extreme complexity of application stacks, interdependencies, and public cloud service offerings to its advantage. This complexity may have prematurely ended the first era of autonomic computing in the early 2000s, but today we have the storage, memory, and compute power available to continuously feed deep learning algorithms with a massive stream of operations data (independent variables) and application performance, cost, and reliability data (dependent variables).
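To make the independent/dependent distinction concrete, here is a minimal, purely illustrative sketch (not Turbonomic’s actual pipeline) of how streamed operations metrics could serve as inputs to a model that predicts an application-level outcome such as response time; the metric names, data source, and model choice are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Hypothetical streaming setup: each batch pairs operations metrics
# (independent variables) with an observed application outcome
# (dependent variable), e.g. p95 response time in milliseconds.
model = SGDRegressor()

def training_batches():
    """Stand-in for a real metrics stream; yields (features, outcome) batches."""
    rng = np.random.default_rng(0)
    for _ in range(100):
        # columns: cpu_utilization, memory_pressure, storage_latency_ms (scaled)
        X = rng.random((32, 3))
        y = 50 + 200 * X[:, 2] + 30 * X[:, 0] + rng.normal(0, 5, 32)
        yield X, y

# Continuously refine the model as new operations data streams in.
for X, y in training_batches():
    model.partial_fit(X, y)

# The fitted coefficients hint at which resource signal drives performance.
print(dict(zip(["cpu", "memory", "storage_latency"], model.coef_.round(1))))
```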
Turbonomic has created inference models that continuously watch and evaluate resource allocations and configurations within their individual business context. For example, the platform may see 100 Java apps running on top of a VMware hypervisor stack, all of them doing just fine. However, information coming from the vCenter API shows Turbonomic that 7 of these 100 apps have been experiencing steadily increasing storage latency. API data from Datadog APM shows no sign of user experience degradation, and the ServiceNow API shows no tickets associated with any of these 7 apps. Since all of this happened over the weekend and is clearly not an emergency, Turbonomic might wait until Monday morning to make the VMware admin team an offer they will most likely not refuse:
“The following 7 apps show increasing storage latency. This is due to increased use and the fact that all 7 of them share the same storage volume. Would you like me to tell vCenter to move these apps’ storage so that they share volumes with apps that are much less busy? Press OK to proceed.”
In reality, hundreds or thousands of additional variables would play into this decision. Some are based on hard and fast rules (regulatory compliance might not allow moving some or all of the data), while others are lessons learned from having approached similar situations in the same manner in the past. Maybe one of the 7 apps relies on other storage volumes that constantly exchange data with our culprit volume, so moving this volume might also require moving other volumes, which in turn could negatively affect different apps. While this type of inferencing might not be a challenge for deep learning models, human operators would most likely be unable to identify and cross-correlate the data necessary to arrive at the same conclusion.
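A heavily simplified sketch of that correlation logic might look like the following. The data class and decision rule are purely hypothetical, and the fields merely stand in for signals that would in practice be pulled from the vCenter, Datadog, and ServiceNow APIs.

```python
from dataclasses import dataclass

@dataclass
class AppSignals:
    name: str
    storage_latency_trend_ms: float  # hypothetical week-over-week latency increase (vCenter-derived)
    user_experience_degraded: bool   # hypothetical flag derived from APM data
    open_incidents: int              # hypothetical ticket count from the ITSM system

def recommend_storage_rebalance(apps, latency_threshold_ms=5.0):
    """Flag apps whose storage latency keeps climbing even though users and
    support queues show no pain yet, i.e. candidates for a proactive move."""
    candidates = [
        app for app in apps
        if app.storage_latency_trend_ms > latency_threshold_ms
        and not app.user_experience_degraded
        and app.open_incidents == 0
    ]
    if not candidates:
        return None
    names = ", ".join(app.name for app in candidates)
    return (f"The following {len(candidates)} apps show increasing storage latency: {names}. "
            "Would you like me to tell vCenter to move their storage onto less busy volumes?")

# Example: two of the monitored apps would trigger the recommendation.
apps = [
    AppSignals("billing-api", 8.2, False, 0),
    AppSignals("catalog-svc", 0.4, False, 0),
    AppSignals("report-gen", 6.7, False, 0),
]
print(recommend_storage_rebalance(apps))
```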
Bringing Down Public Cloud Cost
Over-provisioning public cloud resources is typically treated as an insurance policy against performance problems and outages; roughly half of public cloud resources are such dedicated “insurance policies.” In many cases, however, over-provisioning does not bring the desired protection and simply drains the organization of funding. AWS, Azure, and Google Cloud have little incentive to curb this waste, as invoices paid for unused resources are a big part of what fuels the public cloud vendors’ ability to compete by investing heavily in new services, data centers, and clever marketing campaigns. Based on observing thousands of similar application deployments across customers, the Turbonomic platform can make recommendations about where resources can be reallocated for increased efficiency, protection, or application performance. Customers also receive advice on where over-allocation shows no positive effect at all, so that the respective services can be cancelled. For example, my cloud engineers may have created a number of failover resource targets that the application itself would never be able to use, perhaps because a certain service cannot dynamically adjust its network configuration for the desired type of failover procedure. Not only does this waste funds, it also adds unknown operational risk to the organization.
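As a rough illustration of the kind of over-provisioning check described above (again, not Turbonomic’s actual algorithm), the sketch below flags instances whose observed peak utilization stays far below what was allocated; the instance names, thresholds, and utilization figures are made up.

```python
from dataclasses import dataclass

@dataclass
class CloudInstance:
    name: str
    allocated_vcpus: int
    peak_cpu_utilization: float  # observed peak over the last 30 days, 0.0-1.0
    monthly_cost_usd: float

def flag_overprovisioned(instances, utilization_ceiling=0.25):
    """Return instances whose peak utilization never came close to the
    allocated capacity, together with the money tied up in that headroom."""
    findings = []
    for inst in instances:
        if inst.peak_cpu_utilization < utilization_ceiling:
            wasted = inst.monthly_cost_usd * (1 - inst.peak_cpu_utilization)
            findings.append((inst.name, round(wasted, 2)))
    return findings

fleet = [
    CloudInstance("checkout-prod-1", 16, 0.62, 480.0),
    CloudInstance("failover-standby-2", 16, 0.03, 480.0),  # idle "insurance policy"
    CloudInstance("batch-reporting", 8, 0.18, 240.0),
]
for name, wasted in flag_overprovisioned(fleet):
    print(f"{name}: roughly ${wasted}/month spent on unused headroom")
```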
Turbonomic + Instana + Watson = Autonomic Computing 2.0
The combination of Turbonomic (resource level), Instana (application code level), and Watson (AIOps) gives IBM a very strong basis for finally getting autonomic computing right. The vision is certainly exciting, and the fact that Big Blue has placed serious funds behind this crucial initiative is a clear signal for the competition to step up their game.