eBPF and OpenTelemetry Rule At KubeCon 2023 in Chicago: Observability Is King

Torsten Volk
FAUN — Developer Community 🐾
8 min read · Nov 13, 2023


Observability was the undisputed king of KubeCon 2023 in Chicago.

Mosaic of Booth Photos from KubeCon 2023 in Chicago

Observability was the most frequently mentioned topic in the official schedule of 378 conference sessions, and it was also the most prominent theme among the vendor booths on the show floor.

Top 10 Topics From KubeCon 2023 Based On Official Session Catalog

The three topics most closely related to observability were Kubernetes, the ever-present container orchestration and management platform; OpenTelemetry, the platform for collecting, processing, aggregating, and exporting telemetry data; and the extended Berkeley Packet Filter (eBPF), which enables auto-instrumentation, event-driven automation, and high-performance code execution.

Top 10 Topics From KubeCon 2023 Related to Observability Based On Official Session Catalog

While the Kubernetes platform constituted the overall ‘backdrop’ for KubeCon 2023, OpenTelemetry and eBPF (extended Berkeley Packet Filter) were the most discussed topics of the show. Both technologies aim to standardize, simplify, and automate observability, visibility, and monitoring of cloud-native applications across data centers, private clouds, public clouds, and edge locations.

Eleven speakers from Intel, Red Hat, Sysdig, Isovalent, Microsoft, and Datadog talked about eBPF in 14 sessions at KubeCon 2023, while Apple, Red Hat, ObservIQ, Dynatrace, Google, Honeycomb, Grafana, Lumigo, AWS, Splunk, Coralogix, and Chainguard sent 24 speakers to hold a total of 22 sessions related to OpenTelemetry. These are big names and impressive numbers. Now let us take a closer look at both topics.

eBPF Gets The Inside Scoop Straight From The Horse’s Mouth (also known as the Linux kernel)

eBee Standing in front of the Isovalent Booth at KubeCon 2023.

eBPF allows small, sandboxed programs to run as bytecode inside the Linux kernel. The kernel verifies this bytecode before loading it and typically JIT-compiles it to native instructions, so eBPF programs run with very little overhead and can observe system resources directly instead of relying on instrumentation in user space. These programs can directly listen to system-level events such as…

…network traffic: knowing the type, number, size, and origin of incoming and outgoing packets is key for optimizing network configurations, detecting security threats, and planning future upgrades.

…filesystem I/O: file operations such as open, close, read, and write can be correlated with other events, such as network events, to gain insights into app usage patterns and how they impact the rest of the application.

…system calls: watching system calls made by applications to learn how they interact with different parts of the application stack, such as the Kubernetes scheduler, Docker containers, or VMs.

…process lifecycle: following the creation, execution, and termination of processes shows how applications actually behave. This becomes especially important when operating distributed apps that depend on numerous microservices working together.

…resource utilization: Memory allocation, CPU scheduling events, and disk I/O are all observable with eBPF. This real-time data is crucial for performance tuning and capacity planning.

…security monitoring: eBPF is capable of detecting changes to critical files, monitoring for common exploit signatures, and tracking user-level login events, making it an ideal tool for intrusion detection systems.

…kernel tracing: eBPF can listen to kernel tracepoints, which are static hooks in the kernel, to gather metrics or logs for debugging and performance analysis without impacting system performance significantly.

…hardware events: Through Performance Monitoring Counters (PMCs), eBPF can collect data on hardware-level events like CPU cache misses or memory paging, aiding in low-level performance analysis.

…custom metrics: Beyond standard system events, eBPF allows for creating custom metrics based on specific needs. Users can write eBPF programs that define exactly what data to collect, how to aggregate it, and when to report it.

…scheduling and threading: Observations on how the kernel scheduler operates, how threads are created and destroyed, and how context switches occur are possible with eBPF. This is crucial for understanding the performance of concurrent applications.

The ability to directly listen to the Linux kernel’s event stream significantly reduces the need for instrumentation and therefore minimizes the risk originating from unmonitored systems. Correlating all of these event streams is the foundation for the automatic creation and continuous updating of a comprehensive dependency map for the entire organization.
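To make the idea of a dependency map more concrete, here is a minimal Python sketch of the correlation step. It assumes that an eBPF-based agent has already resolved kernel-level connection events to service names; the event fields (`src`, `dst`, `bytes`) and the service names are hypothetical and only serve as an illustration, not as the format of any specific tool.

```python
from collections import defaultdict

# Hypothetical connection events, shaped like what an eBPF-based agent
# might emit after resolving process IDs and IPs to service names.
EVENTS = [
    {"src": "frontend", "dst": "checkout", "bytes": 1200},
    {"src": "checkout", "dst": "payments", "bytes": 800},
    {"src": "frontend", "dst": "checkout", "bytes": 300},
    {"src": "checkout", "dst": "inventory-db", "bytes": 4500},
]


def build_dependency_map(events):
    """Aggregate per-connection events into a service-to-service graph."""
    graph = defaultdict(lambda: defaultdict(lambda: {"calls": 0, "bytes": 0}))
    for event in events:
        edge = graph[event["src"]][event["dst"]]
        edge["calls"] += 1
        edge["bytes"] += event["bytes"]
    # Convert the nested defaultdicts to plain dicts for printing or serializing.
    return {src: dict(dsts) for src, dsts in graph.items()}


dependency_map = build_dependency_map(EVENTS)
for src, dsts in dependency_map.items():
    for dst, stats in dsts.items():
        print(f"{src} -> {dst}: {stats['calls']} calls, {stats['bytes']} bytes")
```

Because the map is rebuilt from the event stream itself, it can stay current as services appear and disappear, which is the continuous updating described above.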

eBPF Highlights From KubeCon 2023

Cilium Mesh Connects to Non-Kubernetes Infrastructure

At KubeCon 2023 in Chicago, Cilium announced the ability of Cilium Mesh (Cilium’s service-mesh platform) to connect to resources outside of Kubernetes. Based on eBPF, Cilium Mesh can now discover, observe, and control any Linux-based application node, regardless of whether that node is part of a Kubernetes cluster. This allows Cilium users to consistently apply network policies and use the Hubble networking and security observability platform across cloud-native and traditional application environments.

Isovalent Tetragon Collects Data With Minimal Performance Overhead

Isovalent, the company behind Cilium and a major contributor to eBPF, announced the general availability of the Isovalent Enterprise for Cilium 1.14 release and major new Cilium Tetragon runtime security capabilities. Isovalent also published a performance benchmark showing that adding observability via the Tetragon kernel runtime causes next to zero performance overhead.

source: Isovalent.com

Solo.io Gloo Fabric Core Simplifies and Optimizes Istio

Solo.io launched Gloo Fabric Core to simplify the installation, security, monitoring, and lifecycle management of Istio or Cilium service mesh technologies. Gloo Fabric Core provides enterprises with a unified dashboard for their existing service mesh clusters, which can reside in different networks and on multiple clouds. eBPF is the core technology that enables Gloo to minimize network latency by bypassing the traditional TCP/IP networking stack and exchanging data directly at the socket level.

Source: Solo.io

Kubescape 3.0 Adds Kubernetes-native Security

ARMO introduced Kubescape 3.0 focused on protecting Kubernetes application stacks by continuously scanning runtime libraries, container images, CI/CD automations, and other additions and changes to a Kubernetes environment. Kubescape uses eBPF for scanning and now provides security data straight through the Kubernetes API, providing universal access for any Kubernetes tools, platforms, and workflows. For a detailed and critical discussion of Kubescape 3.0, see Bruce Gain’s excellent article.

IPv6 Support for Calico’s eBPF Dataplane

Calico now enables containers and virtual machines across Kubernetes clusters to communicate directly via IPv6. Calico uses eBPF to maximize network throughput, use fewer CPU resources, and natively support Kubernetes services without the performance overhead of kube-proxy handling connection processing. With IPv6 support on the eBPF dataplane, Calico can provide scalable, high-performance networking and security to address the demands of modern applications across diverse cloud and distributed environments. This is particularly beneficial for latency-sensitive applications, as IPv6 and eBPF can enhance performance and alleviate IP address shortages.

Touch Points Between OpenTelemetry And eBPF

On GitHub, both eBPF- and OpenTelemetry-related repos showed rapid, sustained growth.

source: GitHub API
source: GitHub API

OpenTelemetry and eBPF are two powerful technologies that, when combined, provide deep visibility into applications and the underlying system with minimal performance and operations overhead. Here are the key connection points between OpenTelemetry and eBPF:

Telemetry Collection: Both OpenTelemetry and eBPF are capable of collecting telemetry data, albeit in different ways. OpenTelemetry, using its SDK, collects telemetry data such as traces, metrics, and logs. On the other hand, eBPF, due to its proximity to the kernel, excels at collecting operating system metrics, generating deep profiling, or any purpose that requires deep packet visibility.

Observability: eBPF provides a powerful mechanism for dynamic tracing and analysis within the Linux kernel. OpenTelemetry, a set of open standards and tools, collects, processes, and exports this telemetry data. Combining eBPF and OpenTelemetry provides deep visibility into an application’s internals, as well as the underlying system, with minimal overhead.

Performance Enhancement: eBPF can reduce the amount of overhead involved with the processing of packets at the application level since it can offload some activities to the kernel. This leads to enhanced overall performance and faster response times, which is especially beneficial when OpenTelemetry is used for monitoring and troubleshooting complex systems.

Automatic Instrumentation: eBPF’s ability to access user code and variables by analyzing the stack and CPU registers enables the development of powerful and flexible instrumentation. This feature is particularly useful in the context of OpenTelemetry, which aims to provide automatic instrumentation for services.

Integration with Other Tools: OpenTelemetry is vendor- and tool-agnostic, meaning that it can be used with a broad variety of observability backends, including open-source tools like Jaeger and Prometheus, as well as commercial offerings. eBPF-generated telemetry data can be collected using tools like Pixie and then streamed to these backends using OpenTelemetry.

In summary, OpenTelemetry and eBPF are highly complementary, with eBPF extracting the observability data that is then collected, transformed, and exported via OpenTelemetry.
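As a rough illustration of this hand-off, the following Python sketch wraps a single kernel-level counter reading in a payload shaped like the OTLP/JSON metrics encoding. The field layout follows the OTLP JSON mapping as commonly documented, but it should be checked against the specification before use. The event fields (`pid`, `count`, `timestamp_ns`) as well as the metric name `example.tcp.retransmits` and the scope name `example.ebpf.bridge` are assumptions made up for this example.

```python
import json
import time


def to_otlp_metric(event, service_name):
    """Wrap one kernel-level counter reading in an OTLP/JSON-style payload."""
    return {
        "resourceMetrics": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}},
                ]
            },
            "scopeMetrics": [{
                "scope": {"name": "example.ebpf.bridge"},  # hypothetical scope name
                "metrics": [{
                    "name": "example.tcp.retransmits",  # hypothetical metric name
                    "unit": "1",
                    "sum": {
                        "aggregationTemporality": 2,  # 2 = cumulative
                        "isMonotonic": True,
                        "dataPoints": [{
                            # 64-bit integers are encoded as strings in OTLP/JSON.
                            "asInt": str(event["count"]),
                            "timeUnixNano": str(event["timestamp_ns"]),
                            "attributes": [
                                {"key": "process.pid",
                                 "value": {"intValue": str(event["pid"])}},
                            ],
                        }],
                    },
                }],
            }],
        }]
    }


# Hypothetical reading, as an eBPF program might report it via a map.
event = {"pid": 4242, "count": 7, "timestamp_ns": time.time_ns()}
payload = to_otlp_metric(event, service_name="checkout")
print(json.dumps(payload, indent=2))
```

In practice, a payload like this would be sent to an OpenTelemetry Collector, which then forwards it to whichever backend is configured; the sketch only covers the translation step.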

OpenTelemetry: One Telemetry Standard To Rule Them All

OpenTelemetry Logging is now available in several languages, fulfilling the commitment to providing a unified framework for cloud-native telemetry, including traces, metrics, and logging. Next on the roadmap are continuous profiling, real user monitoring (RUM), establishing a strongly typed logging format, and extending support to front-end/client instrumentation.

The Jaeger and Prometheus integrations with OTLP are significant milestones on the way to standardizing telemetry data collection.

A collaborative effort with Elastic aims to align the Elastic Common Schema with OpenTelemetry’s Semantic Conventions in order to deliver a comprehensive metadata specification for structured telemetry data in cloud-native systems.

Final Thoughts

eBPF and OpenTelemetry both aim to make simple, comprehensive observability accessible to anyone using Kubernetes. Auto-instrumentation is a key topic for both platforms: eBPF instruments by listening directly to events inside the Linux kernel, while OpenTelemetry offers a large number of auto-instrumentation libraries that inject bytecode into the application at runtime. Making instrumentation as simple as possible is critical for successfully implementing observability, visibility, and monitoring, and this is exactly what both platforms aim to do.

Of course, there is much more going on ‘behind the scenes’ of eBPF and OpenTelemetry. For example, the OpenTelemetry Transformation Language (OTTL) enables the transformation of telemetry data within the OpenTelemetry Collector runtime. This capability allows data filtering, processing, tagging, and routing before ingestion into an observability platform. Ingesting less means a smaller observability bill.
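The effect of filtering and tagging before ingestion can be sketched in a few lines of plain Python. Note that this is not OTTL syntax and does not run inside the Collector; it is only an analogue of the filter-and-tag idea. The log records, the two drop rules, and the `team` tag are hypothetical.

```python
import json

# Hypothetical log records as they might arrive at a collector.
RECORDS = [
    {"severity": "DEBUG", "body": "cache hit",
     "attributes": {"k8s.namespace.name": "shop"}},
    {"severity": "ERROR", "body": "payment failed",
     "attributes": {"k8s.namespace.name": "shop"}},
    {"severity": "INFO", "body": "health check ok",
     "attributes": {"k8s.namespace.name": "kube-system"}},
    {"severity": "WARN", "body": "slow query",
     "attributes": {"k8s.namespace.name": "shop"}},
]


def transform(records):
    """Drop noisy records and tag the rest, without mutating the input."""
    kept = []
    for record in records:
        if record["severity"] == "DEBUG":
            continue  # filter: debug logs are not worth ingesting
        if record["attributes"].get("k8s.namespace.name") == "kube-system":
            continue  # filter: skip platform-internal noise
        # tag: add an ownership attribute to help with routing downstream
        tagged = {**record, "attributes": {**record["attributes"], "team": "payments"}}
        kept.append(tagged)
    return kept


def size_in_bytes(records):
    """Approximate the ingest volume as the size of the JSON encoding."""
    return len(json.dumps(records).encode("utf-8"))


kept = transform(RECORDS)
before, after = size_in_bytes(RECORDS), size_in_bytes(kept)
print(f"kept {len(kept)} of {len(RECORDS)} records, {after} of {before} bytes")
```

Even though tagging adds a few bytes to each remaining record, dropping half of the records reduces the overall volume, which is where the savings on the observability bill come from.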

eBPF aims to support additional operating systems, mainly Windows, to further speed up packet processing, and to aggregate metrics directly within the kernel.


Artificial Intelligence, Cognitive Computing, Automatic Machine Learning in DevOps, IT, and Business are at the center of my industry analyst practice at EMA.