KubeCon 2024: The Cloud Native Universe Is Evolving Around AI

Torsten Volk


KubeCon 2024 in Salt Lake City is all about demonstrating how the cloud native landscape, with Kubernetes at the center, is the ideal future home for AI applications. “AI at Scale” is the #1 topic we find when analyzing the 419 KubeCon sessions with the help of, drumroll, OpenAI’s GPT-4o. The remaining topics (see chart) also show a significant focus on successfully deploying, running, scaling, and managing AI models and applications in cloud native environments (see the interesting keynote on this topic by the SVP and Head of AI at Capital One, as well as a 5-minute lightning talk).

Chart is based on the KubeCon 2024 Salt Lake City Session Guide. Topic clustering was done through the GPT-4o API. (source: CNCF)
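For readers who want to reproduce this kind of analysis, here is a minimal sketch of session classification through the GPT-4o API using the openai Python package. The topic labels and the classify_session helper are invented for illustration; they are not the script behind the chart above.

```python
# Minimal sketch of GPT-4o-based topic clustering for conference sessions.
# Requires `pip install openai` and an OPENAI_API_KEY in the environment.
# Topic labels and the helper function are illustrative only.
from openai import OpenAI

client = OpenAI()

def classify_session(title: str, abstract: str) -> str:
    """Ask GPT-4o to assign a session to exactly one topic cluster."""
    prompt = (
        "Assign this KubeCon session to exactly one topic, for example "
        "'AI at Scale', 'Platform Engineering', 'Observability', or "
        "'Security'. Reply with the topic name only.\n\n"
        f"Title: {title}\nAbstract: {abstract}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

print(classify_session(
    "Sharing GPUs on Kubernetes",
    "Time-slicing and partitioning GPUs across AI workloads.",
))
```

Counting the labels returned for all 419 sessions then yields the topic distribution shown in the chart.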

Onboarding AI Workloads to Kubernetes Clusters Is Highest Priority

Numerous sessions at KubeCon 2024 in Salt Lake City focus on illustrating that running AI models on Kubernetes-based platforms is the new normal. Many speakers are from organizations that have successfully created scalable and secure AI applications in open source environments centered on Kubernetes. Here are a few examples of sessions whose speakers show real-life success with AI projects running mostly on Kubernetes:

Operational Side of AI/ML
Training AI Models on Kubernetes
Designing Production Infrastructure for LLM Acceleration
Sharing GPUs on Kubernetes

The largest group of AI-related KubeCon sessions focuses on serving and optimizing AI models in a scalable and repeatable manner (see chart).

The top 5 AI-focused topics show a clear emphasis on conveying the advantages and broad feasibility of running AI models and applications on Kubernetes infrastructure. Based on the KubeCon 2024 Salt Lake City Session Guide; topic clustering was done through the GPT-4o API. (source: CNCF)

Ray (see the interesting KubeCon session by Google and Anyscale on the Ray project), for example, is an open source framework that helps DevOps teams overcome the traditional scalability limitations of the Python programming language.

Ray provides easy access to parallel processing capabilities for Python apps on Kubernetes. It comes with Python libraries for training, tuning, and serving AI models. (source: Ray.io)

Ray enables distributed computing and parallel processing across Kubernetes clusters, simplifying the otherwise tricky deployment from the developer laptop into large-scale production environments. The idea behind Ray is to transform existing Python code into distributed apps with only minimal modifications.
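To make the “minimal modifications” claim concrete, here is a sketch using Ray’s core task API (the score_chunk function is a made-up stand-in for real work): a one-line decorator turns an ordinary Python function into a task Ray can schedule across a cluster.

```python
# Minimal sketch of Ray's core task API. Requires `pip install ray`.
import ray

ray.init()  # connects to an existing cluster, or starts a local one

@ray.remote
def score_chunk(chunk: list[int]) -> int:
    # Stand-in for real work, e.g. scoring a batch of model inputs.
    return sum(x * x for x in chunk)

# Each .remote() call returns immediately with a future (an ObjectRef);
# Ray schedules the tasks in parallel across the available workers.
chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
futures = [score_chunk.remote(chunk) for chunk in chunks]

# ray.get() blocks until all results are available.
print(ray.get(futures))  # [14, 77, 194]
```

The same script runs unchanged on a laptop and on a multi-node Kubernetes cluster; only the environment behind ray.init() changes.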

Top 5 priorities based on an analysis of the 3,621 GitHub issues open as of Nov 7, 2024. (source: github.com)

Needless to say, Ray is targeted primarily at AI workloads. The platform provides observability into the current state of production AI models and comes with a set of libraries to simplify distributed training, model tuning, and serving, all without the traditional limitations of the Python runtime. As a result, the Ray open source project has shown (almost) hockey stick growth on GitHub (metric: GitHub stars).

Ray shows hockey stick growth, while Kubeflow’s growth curve has flattened (based on GitHub stars). (source: github.com)
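To give a flavor of those higher-level libraries, here is a minimal sketch following Ray Tune’s documented quickstart pattern; the toy objective function and search space are invented for illustration.

```python
# Minimal Ray Tune sketch: run a small hyperparameter sweep in parallel.
# Requires `pip install "ray[tune]"`. The objective is a toy stand-in
# for a real model training/evaluation step.
from ray import tune

def objective(config):
    # Pretend "loss" depends on the learning rate; return the final result.
    loss = (config["lr"] - 0.01) ** 2
    return {"loss": loss}

search_space = {"lr": tune.grid_search([0.001, 0.01, 0.1])}

tuner = tune.Tuner(objective, param_space=search_space)
results = tuner.fit()  # Ray runs the trials as parallel tasks

best = results.get_best_result(metric="loss", mode="min")
print(best.config)  # {'lr': 0.01}
```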

Platform Engineering Is What Makes the AI Magic Happen

Breaking down the platform engineering sessions from the KubeCon 2024 (Salt Lake) agenda into core themes.

Deploying and running (AI) applications on multiple Kubernetes clusters and in multiple clouds takes the top spot in the list of platform engineering-centric KubeCon topics (see chart). This is all about making it as easy as possible for developers to take advantage of today’s wide range of AI capabilities for their current and future applications. The developer platform provides the orchestration, automation, policy, and security guardrails to make this happen. Of course, the Ray project discussed above can be a significant part of a developer platform aimed at accelerating the adoption of AI in production. Platform engineers are thus becoming the crucial enterprise personas for accelerating AI adoption. This offers corporate IT the opportunity to move on from being seen as a cost center and into the spotlight as a strategic function with direct impact on the bottom line.

WebAssembly (WASM) Was “Made for” AI

WASMCON is a two-day event held right before KubeCon. The topics at WASMCON 2024 in Salt Lake City show how WASM can become crucial for running AI, LLM, and machine learning workloads across various environments, from cloud to edge devices. The chart shows the growth of the currently most-starred GitHub repositories related to WASM.

WASM’s core characteristic of consistently executing workloads across multiple platforms driven by different types of hardware (ARM, x86, etc.), combined with its resource efficiency, makes it an excellent container technology for moving AI compute directly next to the data. WASM’s almost complete isolation from the underlying operating system, together with granular access control, is critical for reducing security and data compliance risks.

WASM allows AI applications to run consistently across different platforms, including browsers, servers, and IoT devices. There are 11 sessions at WASMCON talking about exactly this ‘serverless’ aspect of WASM (see chart).
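As a concrete taste of this portability, here is a minimal sketch using the wasmtime Python bindings (one of several WASM runtimes); the tiny add function compiled from the WebAssembly text format is made up for illustration.

```python
# Minimal sketch: execute a tiny WASM module from Python via wasmtime
# (`pip install wasmtime`). The same module bytes would run unchanged
# on any other host with a WASM runtime (server, edge device, browser).
from wasmtime import Store, Module, Instance

store = Store()
module = Module(
    store.engine,
    """
    (module
      (func (export "add") (param i32 i32) (result i32)
        local.get 0
        local.get 1
        i32.add))
    """,
)
instance = Instance(store, module, [])
add = instance.exports(store)["add"]
print(add(store, 2, 40))  # 42
```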

Nine sessions at WASMCON aim to educate developers and DevOps pros on how exactly WASM works and how best to develop applications for it. WASM requires a significant shift in thinking for developers and operators alike, as WASM containers are significantly more isolated from their underlying operating systems than application containers. Developers therefore need to understand how to manage dependencies, memory, and system interactions, as they no longer have the level of access to the underlying Linux kernel they are used to. This increased isolation demands a deeper understanding of WebAssembly’s system interface (WASI) for interacting with file systems, networking, and other system resources, as well as a shift toward more modular, composable application designs that align with WebAssembly’s strengths in security, portability, and lightweight execution.
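To illustrate this capability-based model, here is a sketch using wasmtime’s Python bindings; app.wasm is a hypothetical WASI module. The host must explicitly grant every resource, such as a single preopened directory, before the guest can touch it.

```python
# Sketch of WASI's explicit, capability-based access control via wasmtime.
# `app.wasm` is a hypothetical WASI module; everything it may touch on the
# host has to be granted up front.
from wasmtime import Engine, Linker, Module, Store, WasiConfig

engine = Engine()
linker = Linker(engine)
linker.define_wasi()  # expose the WASI imports the module expects

wasi = WasiConfig()
wasi.inherit_stdout()                # guest may write to our stdout...
wasi.preopen_dir("./data", "/data")  # ...and read/write only this one dir

store = Store(engine)
store.set_wasi(wasi)

module = Module.from_file(engine, "app.wasm")  # hypothetical module
instance = linker.instantiate(store, module)
instance.exports(store)["_start"](store)       # run the WASI entry point
```

Anything not preopened or inherited simply does not exist from the guest’s point of view, which is exactly the isolation property the paragraph above describes.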

However, the payoff for embracing this paradigm change in developing and operating applications could be significant, as WASM allows developers to truly ‘code once’ and then deploy their code module ‘anywhere.’ This modularized approach toward software development and operations could relieve developers of the burden of worrying about how to make their apps work on different clouds, while allowing operators to take charge of the entire application lifecycle without depending on developers. Applied to our AI use case, organizations can create granular data and inference pipelines that ensure a high degree of consistency, compliance, and efficiency.

Summing It Up

Kubernetes is now over 10 years old and ready for prime time. Prime time, in this case, means that organizations expect the Kubernetes ecosystem to handle AI workloads and applications in a simple, secure, compliant, and scalable manner. KubeCon 2024 in Salt Lake City is laser-focused on proving that the CNCF portfolio of cloud native projects, with Kubernetes at the center, is ready to take on this challenge.

Time Travel to…

… KubeCon 2024 in Paris

… KubeCon 2023 in Chicago

… KubeCon 2023 in Amsterdam

… KubeCon 2022 in Detroit

… KubeCon 2021 in Los Angeles

… KubeCon 2020 Virtual

Torsten Volk

Artificial Intelligence, Cognitive Computing, and Automated Machine Learning in DevOps, IT, and Business are at the center of my industry analyst practice at EMA.