OpenTelemetry Auto-Instrumentation With Jaeger

In earlier days, it was easy to deduct and debug a problem in monolithic applications because there was only one service running in the back end and front end. Now, we are moving toward microservices architecture, where applications are divided into multiple independently deployable services. These services have their own goal and logic to serve. In this kind of application architecture, it becomes difficult to observe how one service depends on or affects other services.

To make the system observable, some logs, metrics, or traces must be emitted from the code, and this data must be sent to an observability back end. This is where OpenTelemetry and Jaeger come into the picture.

In this article, we will see how to monitor application trace data (Traces and Spans) with the help of OpenTelemetry and Jaeger. A trace is used to observe the requests as they propagate through the services in a distributed system. Spans are a basic unit of the trace; they represent a single event within the trace, and a trace can have one or multiple spans. A span consists of log messages, time-related data, and other attributes to provide information about the operation it tracks.

We will use the distributed tracing method to observe requests moving across microservices, generating data about the request and making it available for analysis. The produced data will have a record of the flow of requests in our microservices, and it will help us understand our application's performance.

OpenTelemetry

Telemetry is the collection and transmission of data using agents and protocols from the source in observability. The telemetry data includes logs, metrics, and traces, which help us understand what is happening in our application.

OpenTelemetry (also known as OTel) is an open source framework comprising a collection of tools, APIs, and SDKs. OpenTelemetry makes generating, instrumenting, collecting, and exporting telemetry data easy. The data collected from OpenTelemetry is vendor-agnostic and can be exported in many formats. OpenTelemetry is formed after merging two projects OpenCensus and OpenTracing.

Instrumenting

The process of adding observability code to your application is known as instrumentation. Instrumentation helps make our application observable, meaning the code must produce some metrics, traces, and logs. OpenTelemetry provides two ways to instrument our code:

  1. Manual instrumentation
  2. Auto instrumentation

1. Manual Instrumentation

The user needs to add an OpenTelemetry code to the application. The manual instrumentation provides more options for customization in spans and traces. Languages supported for manual instrumentations are C++, .NET, Go, Java, Python, and so on.

2. Automatic Instrumentation

This is the easiest way of instrumentation as it requires no code changes and no need to recompile the application. It uses an intelligent agent that gets attached to an application, reads its activity, and extracts the traces. Automatic instrumentation supports Java, NodeJS, Python, and so on.

Difference Between Manual and Automatic Instrumentation

Both manual and automatic instrumentation have advantages and disadvantages that you might consider while writing your code. A few of them are listed below:

Manual Instrumentation
Automatic Instrumentation
Code changes are required. Code changes are not required.
It supports maximum programming languages. Currently, .Net, Java, NodeJS, and Python are supported.
It consumes a lot of time as code changes are required. Easy to implement as we do not need to touch the code.
Provide more options for the customization of spans and traces. As you have more control over the telemetry data generated by your application. Fewer options for customization.
Possibilities of error are high as manual changes are required. No error possibilities. As we don't have to touch our application code.

To make the instrumentation process hassle-free, use automatic instrumentation, as it does not require any modification in the code and reduces the possibility of errors. Automatic instrumentation is done by an agent which reads your application's telemetry data, so no manual changes are required.

For the scope of this post, we will see how you can use automatic instrumentation in a Kubernetes-based microservices environment.

Jaeger

Jaeger is a distributed tracing tool initially built by Uber and released as open-source in 2015. Jaeger is also a Cloud Native Computing Foundation graduate project and was influenced by Dapper and OpenZipkin. It is used for monitoring and troubleshooting microservices-based distributed systems.

The Jaeger components which we have used for this blog are:

  1. Jaeger Collector
  2. Jaeger Query
  3. Jaeger UI / Console
  4. Storage Backend

Jaeger Collector: The Jaeger distributed tracing system includes the Jaeger collector. It is in charge of gathering and keeping the information. After receiving spans, the collector adds them to a processing queue. Collectors need a persistent storage backend, hence Jaeger also provides a pluggable span storage mechanism.

Jaeger Query: This is a service used to get traces out of storage. The web-based user interface for the Jaeger distributed tracing system is called Jaeger Query. It provides various features and tools to help you understand the performance and behavior of your distributed application and enables you to search, filter, and visualise the data gathered by Jaeger.

Jaeger UI/Console: Jaeger UI lets you view and analyze traces generated by your application.

Storage Back End: This is used to store the traces generated by an application for the long term. In this article, we are going to use Elasticsearch to store the traces.

What Is the Need for Integrating OpenTelemetry With Jaeger?

OpenTelemetry and Jaeger are the tools that help us in setting the observability in microservices-based distributed systems, but they are intended to address different issues.

OpenTelemetry provides an instrumentation layer for the application, which helps us generate, collect and export the telemetry data for analysis. In contrast, Jaeger is used to store and visualize telemetry data.

OpenTelemetry can only generate and collect the data. It does not have a UI for the visualization. So we need to integrate Jaeger with OpenTelemetry as it has a storage backend and a web UI for the visualization of the telemetry data. With the help of Jaeger UI, we can quickly troubleshoot microservices-based distributed systems.

Note: OpenTelemetry can generate logs, metrics, and traces. Jaeger does not support logs and metrics.

Now you have an idea about OpenTelemetry and Jaeger. Let's see how we can integrate them with each other to visualize the traces and spans generated by our application.

Implementing OpenTelemetry Auto-Instrumentation

We will integrate OpenTelemetry with Jaeger, where OpenTelemetry will act as an instrumentation layer for our application, and Jaeger will act as the back-end analysis tool to visualize the trace data.

Jaeger will get the telemetry data from the OpenTelemetry agent. It will store the data in the storage backend, from where we will query the stored data and visualize it in the Jaeger UI.

Prerequisites for this article are:

  1. The target Kubernetes cluster is up and running.
  2. You have access to run the kubectl command against the Kubernetes cluster to deploy resources.
  3. Cert manager is installed and running. You can install it from the website cert-manager.io if it is not installed.

We assume that you have all the prerequisites and now you are ready for the installation. The files we have used for this post are available in this GitHub repo.

Installation

The installation part contains three steps:

  1. Elasticsearch installation
  2. Jaeger installation
  3. OpenTelemetry installation

Elasticsearch

By default, Jaeger uses in-memory storage to store spans, which is not a recommended approach for the production environment. There are various tools available to use as a storage back end in Jaeger; you can read about them in the official documentation of Jaeger span storage back end.

In this article, we will use Elasticsearch as a storage back end. You can deploy Elasticsearch in your Kubernetes cluster using the Elasticsearch Helm chart. While deploying Elasticsearch, ensure you have enabled the password-based authentication and deploy that Elasticsearch in observability namespaces.

Elasticsearch is deployed in our Kubernetes cluster, and you can see the output by running the following command.

Shell
 
$ kubectl get all -n observability

NAME                            READY   STATUS    RESTARTS   AGE
pod/elasticsearch-0             1/1     Running   0          17m

NAME                            TYPE           CLUSTER-IP       EXTERNAL-IP          PORT(S)           AGE
service/elasticsearch         ClusterIP          None            <none>         9200/TCP,9300/TCP      17m

NAME                             READY   AGE
statefulset.apps/elasticsearch   1/1     17m

Jaeger Installation

We are going to use Jaeger to visualize the trace data. Let's deploy the Jaeger Operator on our cluster.

Before proceeding with the installation, we will deploy a ConfigMap in the observability namespace. In this ConfigMap, we will pass the username and password of the Elasticsearch which we have deployed in the previous step. Replace the credentials based on your setup.

YAML
 
kubectl -n observability apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: jaeger-configuration
  labels:
    app: jaeger
    app.kubernetes.io/name: jaeger
data:
  span-storage-type: elasticsearch
  collector: |
    es:
      server-urls: http://elasticsearch:9200
      username: elastic
      password: changeme
    collector:
      zipkin:
        http-port: 9411
  query: |
    es:
      server-urls: http://elasticsearch:9200
      username: elastic
      password: changeme
  agent: |
    collector:
      host-port: "jaeger-collector:14267"
EOF

If you are going to deploy Jaeger in another namespace and you have changed the Jaeger collector service name, then you need to change the values of the host-port value under the agent collector.

Jaeger Operator

The Jaeger Operator is a Kubernetes operator for deploying and managing Jaeger, an open source, distributed tracing system. It works by automating the deployment, scaling, and management of Jaeger components on a Kubernetes cluster. The Jaeger Operator uses custom resources and custom controllers to extend the Kubernetes API with Jaeger-specific functionality. It manages the creation, update, and deletion of Jaeger components, such as the Jaeger collector, query, and agent components. When a Jaeger instance is created, the Jaeger Operator deploys the necessary components and sets up the required services and configurations.

We are going to deploy the Jaeger Operator in the observability namespace. Use the below-mentioned command to deploy the operator.

Shell
 
$ kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.38.0/jaeger-operator.yaml -n observability

We are using the latest version of Jaeger, which is 1.38.0 at the time of writing this article.

By default, the Jaeger script is provided for cluster-wide mode. Suppose you want to watch only a particular namespace. In that case, you need to change the ClusterRole to Role and ClusterBindingRole to RoleBinding in the operator manifest and set the WATCH_NAMESPACE env variable on the Jaeger Operator deployment.

To verify whether Jaeger is deployed successfully or not, run the following command:

Shell
 
$ kubectl get all -n observability

NAME                                    READY   STATUS    RESTARTS   AGE
pod/elasticsearch-0                     1/1     Running   0          17m
pod/jaeger-operator-5597f99c79-hd9pw    2/2     Running   0          11m

NAME                                      TYPE           CLUSTER-IP       EXTERNAL-IP             PORT(S)                  AGE
service/elasticsearch                     ClusterIP      None             <none>              9200/TCP,9300/TCP            17m
service/jaeger-operator-metrics           ClusterIP      172.20.220.212   <none>                 8443/TCP                  11m
service/jaeger-operator-webhook-service   ClusterIP      172.20.224.23    <none>                 443/TCP                   11m


NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/jaeger-operator     1/1        1            1       11m

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/jaeger-operator-5597f99c79       1         1        1     11m

NAME                             READY   AGE
statefulset.apps/elasticsearch   1/1     17m

As we can see in the above output, our Jaeger Operator is deployed successfully, and all of its pods are up and running; this means Jaeger Operator is ready to install the Jaeger instances (CRs). The Jaeger instance will contain Jaeger components (Query, Collector, Agent); later, we will use these components to query OpenTelemetry metrics.

Jaeger Instance

A Jaeger Instance is a deployment of the Jaeger distributed tracing system. It is used to collect and store trace data from microservices or distributed applications, and provide a UI to visualize and analyze the trace data. To deploy the Jaeger instance, use the following command.

Shell
 
$ kubectl apply -f https://raw.githubusercontent.com/infracloudio/Opentelemertrywithjaeger/master/jaeger-production-template.yaml

To verify the status of the Jaeger instance, run the following command:

Shell
 
$ kubectl get all -n observability

NAME                                    READY   STATUS    RESTARTS   AGE
pod/elasticsearch-0                     1/1     Running   0          17m
pod/jaeger-agent-27fcp                  1/1     Running   0          14s
pod/jaeger-agent-6lvp2                  1/1     Running   0          15s
pod/jaeger-collector-69d7cd5df9-t6nz9   1/1     Running   0          19s
pod/jaeger-operator-5597f99c79-hd9pw    2/2     Running   0          11m
pod/jaeger-query-6c975459b6-8xlwc       1/1     Running   0          16s

NAME                                      TYPE           CLUSTER-IP       EXTERNAL-IP             PORT(S)                                AGE
service/elasticsearch                     ClusterIP      None             <none>              9200/TCP,9300/TCP                          17m
service/jaeger-collector                  ClusterIP      172.20.24.132    <none>             14267/TCP,14268/TCP,9411/TCP,14250/TCP      19s
service/jaeger-operator-metrics           ClusterIP      172.20.220.212   <none>                    8443/TCP                             11m
service/jaeger-operator-webhook-service   ClusterIP      172.20.224.23    <none>                    443/TCP                              11m
service/jaeger-query                      LoadBalancer   172.20.74.114    a567a8de8fd5149409c7edeb54bd39ef-365075103.us-west-2.elb.amazonaws.com   80:32406/TCP                   16s
service/zipkin                            ClusterIP      172.20.61.72     <none>                 9411/TCP                                 18s

NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/jaeger-agent      2         2         2       2            2           <none>       16s

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/jaeger-collector    1/1     1            1          21s
deployment.apps/jaeger-operator     1/1     1            1          11m
deployment.apps/jaeger-query        1/1     1            1          18s

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/jaeger-collector-69d7cd5df9     1         1         1     21s
replicaset.apps/jaeger-operator-5597f99c79      1         1         1     11m
replicaset.apps/jaeger-query-6c975459b6         1         1         1     18s

NAME                             READY   AGE
statefulset.apps/elasticsearch    1/1    17m

As we can see in the above screenshot, our Jaeger instance is up and running.

OpenTelemetry

To install the OpenTelemetry, we need to install the OpenTelemetry Operator. The OpenTelemetry Operator uses custom resources and custom controllers to extend the Kubernetes API with OpenTelemetry-specific functionality, making it easier to deploy and manage the OpenTelemetry observability stack in a Kubernetes environment.

The operator manages two things:

OpenTelemetry Operator

To implement the auto-instrumentation, we need to deploy the OpenTelemetry operator on our Kubernetes cluster. To deploy the k8s operator for OpenTelemetry, follow the K8s operator documentation.

You can verify the deployment of the OpenTelemetry operator by running the below-mentioned command:

Shell
 
$ kubectl get all -n  opentelemetry-operator-system

NAME                                                             READY   STATUS    RESTARTS    AGE
pod/opentelemetry-operator-controller-manager-7f479c786d-zzfd8    2/2    Running      0        30s

NAME                                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/opentelemetry-operator-controller-manager-metrics-service   ClusterIP   172.20.70.244    <none>        8443/TCP   32s
service/opentelemetry-operator-webhook-service                      ClusterIP   172.20.150.120   <none>        443/TCP    31s

NAME                                                        READY   UP-TO-DATE   AVAILABLE      AGE
deployment.apps/opentelemetry-operator-controller-manager    1/1        1            1          31s

NAME                                                                   DESIRED   CURRENT   READY    AGE
replicaset.apps/opentelemetry-operator-controller-manager-7f479c786d      1         1        1      31s

As we can see in the above output, the opentelemetry-operator-controller-manager deployment is running in the opentelemetry-operator-system namespace.

OpenTelemetry Collector

The OpenTelemetry facilitates the collection of telemetry data via the OpenTelemetry Collector. Collector offers a vendor-agnostic implementation on how to receive, process, and export the telemetry data.

The collector is made up of the following components:

We also need to export the telemetry data to the Jaeger instance. Use the following manifest to deploy the collector.

YAML
 
kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:

    processors:

    exporters:
      logging:
      jaeger:
        endpoint: "jaeger-collector.observability.svc.cluster.local:14250"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [logging, jaeger]
EOF

In the above code, the Jaeger endpoint is the address of the Jaeger service which is running inside the observability namespace.

We need to deploy this manifest in the same namespace where our application is deployed, so that it can fetch the traces from the application and export them to Jaeger.

To verify the deployment of the collector, run the following command.

Shell
 
$ kubectl get deploy otel-collector

NAME              READY    UP-TO-DATE   AVAILABLE      AGE
otel-collector     1/1         1            1          41s

OpenTelemetry Auto-Instrumentation Injection

The above-deployed operator can inject and configure the auto-instrumentation libraries of OpenTelemetry into an application's codebase as it runs. To enable the auto-instrumentation on our cluster, we need to configure an instrumentation resource with the configuration for the SDK and instrumentation.

Use the below-given manifest to create the auto-instrumentation.

YAML
 
kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4317
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
EOF

In the above manifest, we have used three things: exporter, propagator, and sampler.

  1. Exporter: Used to send data to OpenTelemetry collector at the specified endpoint. In our scenario, it is "http://otel-collector:4317".

  2. Propagators: Carry traces, context, and baggage data between distributed tracing systems. They have three propagation mechanisms:

    • tracecontext: This refers to the W3C Trace Context specification, which defines a standard way to propagate trace context information between services.

    • baggage: This refers to the OpenTelemetry baggage mechanism, which allows for the propagation of arbitrary key-value pairs along with the trace context information.

    • b3: This refers to the B3 header format, which is a popular trace context propagation format used by the Zipkin tracing system.

  3. Sampler: Uses a "parent-based trace ID ratio" strategy with a sample rate of 0.25 (25%). This means that when tracing a request, if any of its parent requests has already been sampled (with a probability of 0.25), then this request will also be sampled, otherwise it will not be traced.

To verify that our custom resource is created or not, we can use the below-mentioned command.

Shell
 
$ kubectl get otelinst

NAME                   AGE          ENDPOINT                          SAMPLER                 SAMPLER ARG
my-instrumentation      6s     http://otel-collector:4317       parentbased_traceidratio          0.25

This means our custom resource is created successfully.

We are using the OpenTelemetry auto-instrumented method, so we don’t need to write instrumentation code in our application. All we need to do is, add an annotation in the pod of our application for auto-instrumentation.

As we are going to demo a Java application, the annotation that we will have to use here is:

Shell
 
instrumentation.opentelemetry.io/inject-java: "true"

Note: The annotation can be added to a namespace as well so that all pods within that namespace will get instrumentation, or by adding the annotation to individual PodSpec objects, available as part of Deployment, Statefulset, and other resources.

Below is an example of how your manifest will look after adding the annotations. In the below example, we are using annotation for a Java application.

YAML
 
apiVersion: apps/v1
kind: Deployment
metadata:
 name: demo-sagar
spec:
 replicas: 1
 selector:
   matchLabels:
     app: demo-sagar
 template:
   metadata:
     labels:
       app: demo-sagar
     annotations:
       instrumentation.opentelemetry.io/inject-java: "true"
       instrumentation.opentelemetry.io/container-names: "spring"
   spec:
     containers:
     - name: spring
       image: sagar27/petclinic-demo
       ports:
       - containerPort: 8080

We have added instrumentation “inject-java” and “container-name” under annotations. If you have multiple container pods, you can add them in the same “container-names” annotation, separated by a comma. For example, “container-name1,container-name-2,container-name-3” etc.

After adding the annotations, deploy your application and access it on the browser. Here in our scenario, we are using port-forward to access the application.

Shell
 
$ kubectl port-forward service/demo-sagar  8080:8080

To generate traces, either you can navigate through all the pages of this website or you can use the following Bash script:

Shell
 
while true;
do
  curl http://localhost:8080/
  curl http://localhost:8080/owners/find
  curl http://localhost:8080/owners?lastName=
  curl http://localhost:8080/vets.html
  curl http://localhost:8080/oups
  curl http://localhost:8080/oups
  sleep 0.01
done

The above-given script will make a curl request to all the pages of the website, and we will see the traces of the request on the Jaeger UI. We are making curl requests to https://localhost:8080 because we use the port-forwarding technique to access the application. You can make changes in the Bash script according to your scenario.

Now let’s access the Jaeger UI, as our service jaeger-query uses service type LoadBalancer, we can access the Jaeger UI on the browser by using the load balancer domain/IP.

Paste the load balancer domain/IP on the browser and you will see the Jaeger UI there. We have to select our app from the service list and it will show us the traces it generates.

In the above screenshot, we have selected our app name “demo-sagar” under the services option and its traces are visible on Jaeger UI. We can further click on the traces to get more details about it.

Summary

In this article, we have gone through how you can easily instrument your application using the OpenTelemetry auto-instrumentation method. We also learned how this telemetric data could be exported to the Elasticsearch backend and visualized it using Jaeger.

Integrating OpenTelemetry with Jaeger will help you in monitoring and troubleshooting. It also helps perform root cause analysis of any bug/issues in your microservice-based distributed systems, performance/latency optimization, service dependency analysis, and so on.

We hope you found this post informative and engaging. 

References

 

 

 

 

Top