High Availability Kubernetes Monitoring Using Prometheus and Thanos

Introduction

The Need for Prometheus High Availability

Kubernetes adoption has grown manyfold over the past few months, and it is now clear that Kubernetes is the de facto standard for container orchestration. Prometheus, in turn, is considered an excellent choice for monitoring both containerized and non-containerized workloads. Monitoring is an essential aspect of any infrastructure, so we should make sure that our monitoring set-up is highly available and highly scalable to match the needs of an ever-growing infrastructure, especially in the case of Kubernetes.

Therefore, today we will deploy a clustered Prometheus set-up that is not only resilient to node failures but also archives data appropriately for future reference. The set-up is also scalable enough to span multiple Kubernetes clusters under the same monitoring umbrella.

Present Scenario

The majority of Prometheus deployments use persistent volumes for pods and scale Prometheus through a federated set-up. However, not all data can be aggregated through federation, and you often need a mechanism to manage Prometheus configuration as you add servers.

The Solution

Thanos aims to solve the above problems. With the help of Thanos, we can not only run multiple instances of Prometheus and de-duplicate data across them, but also archive data in long-term storage such as GCS or S3.

Implementation

Thanos Architecture

[Figure: Thanos architecture diagram]

Image Source: https://thanos.io/quick-tutorial.md/

Thanos consists of the following components:

Run-time Deduplication of HA Groups

Prometheus is stateful and does not allow its database to be replicated. This means that increasing availability by running multiple Prometheus replicas is not straightforward. Simple load balancing will not work: after a crash, for example, a replica might be up again, but querying it will show a small gap for the period it was down. A second replica may have been up during that time, but it could be down at another moment (e.g., during a rolling restart), so load balancing across the replicas will not work well. The Thanos Querier instead pulls data from all replicas and de-duplicates those signals, filling any gaps transparently for the consumer.
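To make the gap-filling idea concrete, here is a minimal sketch in shell. It is not how Thanos implements de-duplication; it only illustrates the principle of merging replica streams and ignoring the replica label. The sample values are made up, and the timestamps are assumed to line up exactly, which real scrapes do not.

```shell
# Each line is "timestamp replica value"; every replica has a gap while it was down.
replica_0='10 prometheus-0 0.50
20 prometheus-0 0.52
40 prometheus-0 0.57'
replica_1='10 prometheus-1 0.50
30 prometheus-1 0.55
40 prometheus-1 0.57'

# Merge both streams, order by timestamp, drop the replica column,
# and keep the first sample seen for each timestamp.
deduplicated=$(printf '%s\n%s\n' "$replica_0" "$replica_1" |
  sort -n -k1,1 |
  awk '!seen[$1]++ { print $1, $3 }')

echo "$deduplicated"
```

Neither replica has all four timestamps on its own, but the merged series does, which is why querying through a de-duplicating layer works where plain load balancing does not.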


Configuration

Prerequisite

To follow this tutorial completely, you will need the following:

  1. Working knowledge of Kubernetes and kubectl
  2. A running Kubernetes cluster with at least 3 nodes (this demo uses a GKE cluster)
  3. An Ingress Controller and ingress objects (this demo uses the Nginx Ingress Controller). This is not mandatory, but it is highly recommended in order to reduce the number of external endpoints created.
  4. Credentials for the Thanos components to access the object store (in this case, GCS buckets)
  5. Create 2 GCS buckets and name them prometheus-long-term and thanos-ruler
  6. Create a service account with the Storage Object Admin role
  7. Download the key file as JSON credentials and name it thanos-gcs-credentials.json
  8. Create a Kubernetes secret from the credentials
    kubectl create secret generic thanos-gcs-credentials --from-file=thanos-gcs-credentials.json -n monitoring
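Steps 5 to 8 can be scripted roughly as follows. This is a sketch under assumptions: the project ID and service account name are hypothetical placeholders, GCS bucket names are globally unique so you may need different ones, and the namespace is created here because the secret cannot be created before it exists. By default the script only prints the commands so you can review them; set DRY_RUN=0 to execute them.

```shell
# Hypothetical values: replace with your own project and account names.
PROJECT_ID="my-gcp-project"
SA_NAME="thanos-gcs"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_object_store() {
  # Step 5: the two buckets used by the sidecar/store and by the ruler.
  run gsutil mb "gs://prometheus-long-term"
  run gsutil mb "gs://thanos-ruler"
  # Step 6: service account with the Storage Object Admin role.
  run gcloud iam service-accounts create "$SA_NAME" --project "$PROJECT_ID"
  run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member "serviceAccount:${SA_EMAIL}" --role "roles/storage.objectAdmin"
  # Step 7: download the JSON key.
  run gcloud iam service-accounts keys create thanos-gcs-credentials.json \
    --iam-account "$SA_EMAIL"
  # Step 8: the namespace must exist before the secret can be created in it.
  run kubectl create namespace monitoring
  run kubectl create secret generic thanos-gcs-credentials \
    --from-file=thanos-gcs-credentials.json -n monitoring
}

setup_object_store
```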

Deploying Various Components

Deploying Prometheus Service Account, ClusterRole and ClusterRoleBinding

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: monitoring
  namespace: monitoring
---
# rbac.authorization.k8s.io/v1 replaces the v1beta1 API, which newer clusters no longer serve.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: monitoring
subjects:
  - kind: ServiceAccount
    name: monitoring
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: monitoring
  apiGroup: rbac.authorization.k8s.io
---



The above manifest creates the monitoring namespace, along with the service account, ClusterRole and ClusterRoleBinding needed by Prometheus.
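One way to confirm that the binding took effect is to impersonate the service account with kubectl auth can-i. The sketch below only generates the check commands for the core resources granted above (the nodes/proxy subresource is left out for brevity); piping the output to sh runs them against your current kubectl context, where each should answer yes.

```shell
# Emit one "kubectl auth can-i" check per verb/resource pair granted to the
# monitoring service account in the monitoring namespace.
rbac_checks() {
  for resource in nodes services endpoints pods; do
    for verb in get list watch; do
      echo "kubectl auth can-i $verb $resource --as=system:serviceaccount:monitoring:monitoring"
    done
  done
}

rbac_checks            # review the generated commands
# rbac_checks | sh     # uncomment to run them against the cluster
```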

Deploying Prometheus Configuration configmap

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: monitoring
data:
  prometheus.yaml.tmpl: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
      external_labels:
        cluster: prometheus-ha
        # Each Prometheus has to have unique labels.
        replica: $(POD_NAME)

    rule_files:
      - /etc/prometheus/rules/*rules.yaml

    alerting:

      # We want our alerts to be deduplicated
      # from different replicas.
      alert_relabel_configs:
      - regex: replica
        action: labeldrop

      alertmanagers:
        - scheme: http
          path_prefix: /
          static_configs:
            - targets: ['alertmanager:9093']

    scrape_configs:
    - job_name: kubernetes-nodes-cadvisor
      scrape_interval: 10s
      scrape_timeout: 10s
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        # Only for Kubernetes ^1.7.3.
        # See: https://github.com/prometheus/prometheus/issues/2916
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      metric_relabel_configs:
        - action: replace
          source_labels: [id]
          regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
          target_label: rkt_container_name
          replacement: '${2}-${1}'
        - action: replace
          source_labels: [id]
          regex: '^/system\.slice/(.+)\.service$'
          target_label: systemd_service_name
          replacement: '${1}'

    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        # Pod annotations are exposed as __meta_kubernetes_pod_annotation_<name>.
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2

    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
        - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https

    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2



The above ConfigMap creates a Prometheus configuration file template. The Thanos sidecar component reads this template and generates the actual configuration file, which is in turn consumed by the Prometheus container running in the same pod. It is extremely important to add the external_labels section to the config file so that the Querier can de-duplicate data based on it.
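The __address__ rewrite in the kubernetes-pods and kubernetes-service-endpoints jobs is the least obvious part of this configuration. Prometheus joins the source label values with ";" and matches the fully anchored regex against the result. The sketch below imitates that step with sed; since sed uses POSIX extended regular expressions rather than RE2, the pattern is translated (no non-capturing groups or \d), so the port ends up in group 3 instead of group 2. The function name and sample addresses are illustrative only.

```shell
# Mimic a Prometheus "replace" relabel step: join the source label values with
# ";", then apply an anchored regex and build the new __address__ from it.
rewrite_address() {
  address="$1"   # value of __address__ (with or without a port)
  port="$2"      # value of the prometheus.io/port annotation
  printf '%s;%s\n' "$address" "$port" |
    sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
}

rewrite_address "10.60.25.2:8080" "9102"
rewrite_address "10.60.25.2" "9102"
```

In both cases the scrape address becomes the pod IP combined with the annotated port, regardless of which port service discovery reported.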

Deploying Prometheus Rules configmap
This creates our alert rules, which will be relayed to Alertmanager for delivery.

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  labels:
    name: prometheus-rules
  namespace: monitoring
data:
  alert-rules.yaml: |-
    groups:
      - name: Deployment
        rules:
        - alert: Deployment at 0 Replicas
          annotations:
            summary: Deployment {{$labels.deployment}} in {{$labels.namespace}} is currently having no pods running
          expr: |
            sum(kube_deployment_status_replicas{pod_template_hash=""}) by (deployment,namespace)  < 1
          for: 1m
          labels:
            team: devops

        - alert: HPA Scaling Limited
          annotations:
            summary: HPA named {{$labels.hpa}} in {{$labels.namespace}} namespace has reached scaling limited state
          expr: |
            (sum(kube_hpa_status_condition{condition="ScalingLimited",status="true"}) by (hpa,namespace)) == 1
          for: 1m
          labels:
            team: devops

        - alert: HPA at MaxCapacity
          annotations:
            summary: HPA named {{$labels.hpa}} in {{$labels.namespace}} namespace is running at Max Capacity
          expr: |
            ((sum(kube_hpa_spec_max_replicas) by (hpa,namespace)) - (sum(kube_hpa_status_current_replicas) by (hpa,namespace))) == 0
          for: 1m
          labels:
            team: devops

      - name: Pods
        rules:
        - alert: Container restarted
          annotations:
            summary: Container named {{$labels.container}} in {{$labels.pod}} in {{$labels.namespace}} was restarted
          expr: |
            sum(increase(kube_pod_container_status_restarts_total{namespace!="kube-system",pod_template_hash=""}[1m])) by (pod,namespace,container) > 0
          for: 0m
          labels:
            team: dev

        - alert: High Memory Usage of Container
          annotations:
            summary: Container named {{$labels.container}} in {{$labels.pod}} in {{$labels.namespace}} is using more than 75% of Memory Limit
          expr: |
            ((( sum(container_memory_usage_bytes{image!="",container_name!="POD", namespace!="kube-system"}) by (namespace,container_name,pod_name)  / sum(container_spec_memory_limit_bytes{image!="",container_name!="POD",namespace!="kube-system"}) by (namespace,container_name,pod_name) ) * 100 ) < +Inf ) > 75
          for: 5m
          labels:
            team: dev

        - alert: High CPU Usage of Container
          annotations:
            summary: Container named {{$labels.container}} in {{$labels.pod}} in {{$labels.namespace}} is using more than 75% of CPU Limit
          expr: |
            ((sum(irate(container_cpu_usage_seconds_total{image!="",container_name!="POD", namespace!="kube-system"}[30s])) by (namespace,container_name,pod_name) / sum(container_spec_cpu_quota{image!="",container_name!="POD", namespace!="kube-system"} / container_spec_cpu_period{image!="",container_name!="POD", namespace!="kube-system"}) by (namespace,container_name,pod_name) ) * 100)  > 75
          for: 5m
          labels:
            team: dev

      - name: Nodes
        rules:
        - alert: High Node Memory Usage
          annotations:
            summary: Node {{$labels.kubernetes_io_hostname}} has more than 80% memory used. Plan Capacity.
          expr: |
            (sum (container_memory_working_set_bytes{id="/",container_name!="POD"}) by (kubernetes_io_hostname) / sum (machine_memory_bytes{}) by (kubernetes_io_hostname) * 100) > 80
          for: 5m
          labels:
            team: devops

        - alert: High Node CPU Usage
          annotations:
            summary: Node {{$labels.kubernetes_io_hostname}} has more than 80% allocatable cpu used. Plan Capacity.
          expr: |
            (sum(rate(container_cpu_usage_seconds_total{id="/", container_name!="POD"}[1m])) by (kubernetes_io_hostname) / sum(machine_cpu_cores) by (kubernetes_io_hostname)  * 100) > 80
          for: 5m
          labels:
            team: devops

        - alert: High Node Disk Usage
          annotations:
            summary: Node {{$labels.kubernetes_io_hostname}} has more than 85% disk used. Plan Capacity.
          expr: |
            (sum(container_fs_usage_bytes{device=~"^/dev/[sv]d[a-z][1-9]$",id="/",container_name!="POD"}) by (kubernetes_io_hostname) / sum(container_fs_limit_bytes{container_name!="POD",device=~"^/dev/[sv]d[a-z][1-9]$",id="/"}) by (kubernetes_io_hostname)) * 100 > 85
          for: 5m
          labels:
            team: devops



Deploying Prometheus Stateful Set

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
allowVolumeExpansion: true
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 3
  serviceName: prometheus-service
  # apps/v1 requires an explicit selector that matches the pod template labels.
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
        thanos-store-api: "true"
    spec:
      serviceAccountName: monitoring
      containers:
        - name: prometheus
          image: prom/prometheus:v2.4.3
          args:
            - "--config.file=/etc/prometheus-shared/prometheus.yaml"
            - "--storage.tsdb.path=/prometheus/"
            - "--web.enable-lifecycle"
            - "--storage.tsdb.no-lockfile"
            - "--storage.tsdb.min-block-duration=2h"
            - "--storage.tsdb.max-block-duration=2h"
          ports:
            - name: prometheus
              containerPort: 9090
          volumeMounts:
            - name: prometheus-storage
              mountPath: /prometheus/
            - name: prometheus-config-shared
              mountPath: /etc/prometheus-shared/
            - name: prometheus-rules
              mountPath: /etc/prometheus/rules
        - name: thanos
          image: quay.io/thanos/thanos:v0.8.0
          args:
            - "sidecar"
            - "--log.level=debug"
            - "--tsdb.path=/prometheus"
            - "--prometheus.url=http://127.0.0.1:9090"
            - "--objstore.config={type: GCS, config: {bucket: prometheus-long-term}}"
            - "--reloader.config-file=/etc/prometheus/prometheus.yaml.tmpl"
            - "--reloader.config-envsubst-file=/etc/prometheus-shared/prometheus.yaml"
            - "--reloader.rule-dir=/etc/prometheus/rules/"
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secret/thanos-gcs-credentials.json
          ports:
            - name: http-sidecar
              containerPort: 10902
            - name: grpc
              containerPort: 10901
          livenessProbe:
            httpGet:
              port: 10902
              path: /-/healthy
          readinessProbe:
            httpGet:
              port: 10902
              path: /-/ready
          volumeMounts:
            - name: prometheus-storage
              mountPath: /prometheus
            - name: prometheus-config-shared
              mountPath: /etc/prometheus-shared/
            - name: prometheus-config
              mountPath: /etc/prometheus
            - name: prometheus-rules
              mountPath: /etc/prometheus/rules
            - name: thanos-gcs-credentials
              mountPath: /etc/secret
              readOnly: false
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      volumes:
        - name: prometheus-config
          configMap:
            defaultMode: 420
            name: prometheus-server-conf
        - name: prometheus-config-shared
          emptyDir: {}
        - name: prometheus-rules
          configMap:
            name: prometheus-rules
        - name: thanos-gcs-credentials
          secret:
            secretName: thanos-gcs-credentials
  volumeClaimTemplates:
  - metadata:
      name: prometheus-storage
      namespace: monitoring
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast
      resources:
        requests:
          storage: 20Gi



It is important to understand the following about the manifest above:

  1. Prometheus is deployed as a stateful set with 3 replicas, and each replica dynamically provisions its own persistent volume.
  2. The Prometheus configuration is generated by the Thanos sidecar container from the template file we created above.
  3. Thanos handles data compaction, so we set --storage.tsdb.min-block-duration=2h and --storage.tsdb.max-block-duration=2h. Setting both to the same value disables Prometheus's own local compaction.
  4. The Prometheus pods are labelled thanos-store-api: "true" so that each pod is discovered by the headless service, which we will create next. It is this headless service that the Thanos Querier uses to query data across all Prometheus instances. We also apply the same label to the Thanos Store and Thanos Ruler components so that they, too, are discovered by the Querier and can be used for querying metrics.
  5. The path to the GCS credentials is provided through the GOOGLE_APPLICATION_CREDENTIALS environment variable, and the credentials file is mounted at that path from the secret we created as part of the prerequisites.
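Point 2 is what gives every replica a unique external label. The sidecar performs the substitution itself; the sketch below merely imitates it with sed to show what each pod ends up with, assuming POD_NAME carries the pod name as in the manifest. The render_config helper is a hypothetical name used only for this illustration.

```shell
# Replace every literal $(POD_NAME) placeholder on stdin with the given pod name.
render_config() {
  sed 's/[$](POD_NAME)/'"$1"'/g'
}

template='external_labels:
  cluster: prometheus-ha
  replica: $(POD_NAME)'

# Show the replica label each pod of the stateful set would receive.
for pod in prometheus-0 prometheus-1 prometheus-2; do
  printf '%s\n' "$template" | render_config "$pod" | grep 'replica:'
done
```

Because the three rendered files differ only in the replica label, the Querier can treat the pods as one HA group and de-duplicate on that label.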

Deploying Prometheus Services

apiVersion: v1
kind: Service
metadata:
  name: prometheus-0-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
  namespace: monitoring
  labels:
    name: prometheus
spec:
  selector:
    statefulset.kubernetes.io/pod-name: prometheus-0
  ports:
    - name: prometheus
      port: 8080
      targetPort: prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-1-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
  namespace: monitoring
  labels:
    name: prometheus
spec:
  selector:
    statefulset.kubernetes.io/pod-name: prometheus-1
  ports:
    - name: prometheus
      port: 8080
      targetPort: prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-2-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
  namespace: monitoring
  labels:
    name: prometheus
spec:
  selector:
    statefulset.kubernetes.io/pod-name: prometheus-2
  ports:
    - name: prometheus
      port: 8080
      targetPort: prometheus
---
# This service creates an SRV record that the querier uses to find the store APIs
apiVersion: v1
kind: Service
metadata:
  name: thanos-store-gateway
  namespace: monitoring
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: grpc
      port: 10901
      targetPort: grpc
  selector:
    thanos-store-api: "true"



We create a separate service for each Prometheus pod in the stateful set, although this is not strictly needed; these services exist only for debugging purposes. The purpose of the thanos-store-gateway headless service has been explained above. We will later expose the Prometheus services using an ingress object.
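Since the per-pod services differ only in the pod index, they can also be generated instead of written out by hand. The following is a sketch, not part of the original set-up: render_pod_services is a hypothetical helper that prints one Service per replica, selecting each pod through the statefulset.kubernetes.io/pod-name label that Kubernetes adds to stateful set pods.

```shell
# Print one debugging Service manifest per Prometheus replica.
render_pod_services() {
  replicas="$1"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    cat <<EOF
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-${i}-service
  namespace: monitoring
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
  labels:
    name: prometheus
spec:
  selector:
    statefulset.kubernetes.io/pod-name: prometheus-${i}
  ports:
    - name: prometheus
      port: 8080
      targetPort: prometheus
EOF
    i=$((i + 1))
  done
}

render_pod_services 3
# render_pod_services 3 | kubectl apply -f -   # uncomment to create them
```

If you later change the replica count of the stateful set, rerunning the helper with the new number keeps the debugging services in step.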

Deploying Thanos Querier

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-querier
  namespace: monitoring
  labels:
    app: thanos-querier
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-querier
  template:
    metadata:
      labels:
        app: thanos-querier
    spec:
      containers:
      - name: thanos
        image: quay.io/thanos/thanos:v0.8.0
        args:
        - query
        - --log.level=debug
        - --query.replica-label=replica
        - --store=dnssrv+thanos-store-gateway:10901
        ports:
        - name: http
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        livenessProbe:
          httpGet:
            port: http
            path: /-/healthy
        readinessProbe:
          httpGet:
            port: http
            path: /-/ready
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-querier
  name: thanos-querier
  namespace: monitoring
spec:
  ports:
  - port: 9090
    protocol: TCP
    targetPort: http
    name: http
  selector:
    app: thanos-querier



This is one of the main components of a Thanos deployment. Note the following:

  1. The container argument --store=dnssrv+thanos-store-gateway:10901 discovers all the components from which metric data should be queried.
  2. The thanos-querier service provides a web interface for running PromQL queries. It also has the option to de-duplicate data across the various Prometheus instances.
  3. This is the endpoint we give Grafana as the data source for all dashboards.
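Besides the web interface, the Querier serves a Prometheus-compatible HTTP API whose query endpoints accept a dedup parameter. The sketch below builds such request URLs; the in-cluster host name is an assumption derived from the service name and namespace above, and the query_url helper is a hypothetical name. Comparing the two responses for a metric such as up is a quick way to see de-duplication at work.

```shell
# Assumed in-cluster address of the thanos-querier service; override as needed.
QUERIER_URL="${QUERIER_URL:-http://thanos-querier.monitoring.svc:9090}"

# $1: PromQL expression (must already be URL-safe), $2: true or false
query_url() {
  printf '%s/api/v1/query?query=%s&dedup=%s\n' "$QUERIER_URL" "$1" "$2"
}

query_url up true    # one series per target, replica label removed
query_url up false   # one series per target and replica
# curl -s "$(query_url up true)"   # run from a pod inside the cluster
```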

Deploying Thanos Store Gateway

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store-gateway
  namespace: monitoring
  labels:
    app: thanos-store-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-store-gateway
  serviceName: thanos-store-gateway
  template:
    metadata:
      labels:
        app: thanos-store-gateway
        thanos-store-api: "true"
    spec:
      containers:
        - name: thanos
          image: quay.io/thanos/thanos:v0.8.0
          args:
          - "store"
          - "--log.level=debug"
          - "--data-dir=/data"
          - "--objstore.config={type: GCS, config: {bucket: prometheus-long-term}}"
          - "--index-cache-size=500MB"
          - "--chunk-pool-size=500MB"
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secret/thanos-gcs-credentials.json
          ports:
          - name: http
            containerPort: 10902
          - name: grpc
            containerPort: 10901
          livenessProbe:
            httpGet:
              port: 10902
              path: /-/healthy
          readinessProbe:
            httpGet:
              port: 10902
              path: /-/ready
          volumeMounts:
            - name: thanos-gcs-credentials
              mountPath: /etc/secret
              readOnly: false
      volumes:
        - name: thanos-gcs-credentials
          secret:
            secretName: thanos-gcs-credentials
---



This will create the store component which serves metrics from object storage to the Querier.

Deploying Thanos Ruler

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-ruler-rules
  namespace: monitoring
data:
  alert_down_services.rules.yaml: |
    groups:
    - name: metamonitoring
      rules:
      - alert: PrometheusReplicaDown
        annotations:
          message: Prometheus replica in cluster {{$labels.cluster}} has disappeared from Prometheus target discovery.
        expr: |
          sum(up{cluster="prometheus-ha", instance=~".*:9090", job="kubernetes-service-endpoints"}) by (job,cluster) < 3
        for: 15s
        labels:
          severity: critical
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: thanos-ruler
  name: thanos-ruler
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-ruler
  serviceName: thanos-ruler
  template:
    metadata:
      labels:
        app: thanos-ruler
        thanos-store-api: "true"
    spec:
      containers:
        - name: thanos
          image: quay.io/thanos/thanos:v0.8.0
          args:
            - rule
            - --log.level=debug
            - --data-dir=/data
            - --eval-interval=15s
            - --rule-file=/etc/thanos-ruler/*.rules.yaml
            - --alertmanagers.url=http://alertmanager:9093
            - --query=thanos-querier:9090
            - "--objstore.config={type: GCS, config: {bucket: thanos-ruler}}"
            - --label=ruler_cluster="prometheus-ha"
            - --label=replica="$(POD_NAME)"
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secret/thanos-gcs-credentials.json
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          ports:
            - name: http
              containerPort: 10902
            - name: grpc
              containerPort: 10901
          livenessProbe:
            httpGet:
              port: http
              path: /-/healthy
          readinessProbe:
            httpGet:
              port: http
              path: /-/ready
          volumeMounts:
            - mountPath: /etc/thanos-ruler
              name: config
            - name: thanos-gcs-credentials
              mountPath: /etc/secret
              readOnly: false
      volumes:
        - configMap:
            name: thanos-ruler-rules
          name: config
        - name: thanos-gcs-credentials
          secret:
            secretName: thanos-gcs-credentials
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-ruler
  name: thanos-ruler
  namespace: monitoring
spec:
  ports:
    - port: 9090
      protocol: TCP
      targetPort: http
      name: http
  selector:
    app: thanos-ruler



Now, if you start an interactive shell in the same namespace as our workloads and check which pods the thanos-store-gateway service resolves to, you will see something like this:

root@my-shell-95cb5df57-4q6w8:/# nslookup thanos-store-gateway
Server:  10.63.240.10
Address: 10.63.240.10#53

Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.25.2
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.25.4
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.30.2
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.30.8
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.31.2

root@my-shell-95cb5df57-4q6w8:/# exit



The IPs returned above correspond to our Prometheus pods, thanos-store and thanos-ruler. This can be verified as follows:

$ kubectl get pods -o wide -l thanos-store-api="true"
NAME                     READY   STATUS    RESTARTS   AGE    IP           NODE                              NOMINATED NODE   READINESS GATES
prometheus-0             2/2     Running   0          100m   10.60.31.2   gke-demo-1-pool-1-649cbe02-jdnv   <none>           <none>
prometheus-1             2/2     Running   0          14h    10.60.30.2   gke-demo-1-pool-1-7533d618-kxkd   <none>           <none>
prometheus-2             2/2     Running   0          31h    10.60.25.2   gke-demo-1-pool-1-4e9889dd-27gc   <none>           <none>
thanos-ruler-0           1/1     Running   0          100m   10.60.30.8   gke-demo-1-pool-1-7533d618-kxkd   <none>           <none>
thanos-store-gateway-0   1/1     Running   0          14h    10.60.25.4   gke-demo-1-pool-1-4e9889dd-27gc   <none>           <none>
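Comparing the two listings by eye gets tedious as the number of pods grows. As a rough sketch, the check can be automated by extracting the addresses from both outputs and comparing the sorted lists. The data below is copied from the outputs above; on a live cluster you would capture the output of the two commands instead.

```shell
# DNS answer for the headless service (copied from the nslookup output above).
dns_output='Server:  10.63.240.10
Address: 10.63.240.10#53

Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.25.2
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.25.4
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.30.2
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.30.8
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.31.2'

# Pod name and IP columns (copied from the kubectl output above).
pod_list='prometheus-0 10.60.31.2
prometheus-1 10.60.30.2
prometheus-2 10.60.25.2
thanos-ruler-0 10.60.30.8
thanos-store-gateway-0 10.60.25.4'

# Keep only answer addresses (the DNS server line carries a "#53" suffix).
dns_ips=$(printf '%s\n' "$dns_output" | awk '$1 == "Address:" && $2 !~ /#/ { print $2 }' | sort)
pod_ips=$(printf '%s\n' "$pod_list" | awk '{ print $2 }' | sort)

if [ "$dns_ips" = "$pod_ips" ]; then
  echo "headless service answers match the labelled pod IPs"
else
  echo "mismatch between DNS answers and pod IPs" >&2
fi
```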



Deploying Alertmanager

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: monitoring
data:
  config.yml: |-
    global:
      resolve_timeout: 5m
      slack_api_url: "<your_slack_hook>"
      victorops_api_url: "<your_victorops_hook>"
    templates:
    - '/etc/alertmanager-templates/*.tmpl'
    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 10s
      group_interval: 1m
      repeat_interval: 5m
      receiver: default
      routes:
      - match:
          team: devops
        receiver: devops
        continue: true
      - match:
          team: dev
        receiver: dev
        continue: true
    receivers:
    - name: 'default'
    - name: 'devops'
      victorops_configs:
      - api_key: '<YOUR_API_KEY>'
        routing_key: 'devops'
        message_type: 'CRITICAL'
        entity_display_name: '{{ .CommonLabels.alertname }}'
        state_message: 'Alert: {{ .CommonLabels.alertname }}. Summary:{{ .CommonAnnotations.summary }}. RawData: {{ .CommonLabels }}'
      slack_configs:
      - channel: '#k8-alerts'
        send_resolved: true
    - name: 'dev'
      victorops_configs:
      - api_key: '<YOUR_API_KEY>'
        routing_key: 'dev'
        message_type: 'CRITICAL'
        entity_display_name: '{{ .CommonLabels.alertname }}'
        state_message: 'Alert: {{ .CommonLabels.alertname }}. Summary:{{ .CommonAnnotations.summary }}. RawData: {{ .CommonLabels }}'
      slack_configs:
      - channel: '#k8-alerts'
        send_resolved: true
---
# Kubernetes 1.16 and later no longer serve extensions/v1beta1 Deployments; use apps/v1 there
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      name: alertmanager
      labels:
        app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: prom/alertmanager:v0.15.3
        args:
          - '--config.file=/etc/alertmanager/config.yml'
          - '--storage.path=/alertmanager'
        ports:
        - name: alertmanager
          containerPort: 9093
        volumeMounts:
        - name: config-volume
          mountPath: /etc/alertmanager
        - name: alertmanager
          mountPath: /alertmanager
      volumes:
      - name: config-volume
        configMap:
          name: alertmanager
      - name: alertmanager
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/path: '/metrics'
  labels:
    name: alertmanager
  name: alertmanager
  namespace: monitoring
spec:
  selector:
    app: alertmanager
  ports:
  - name: alertmanager
    protocol: TCP
    port: 9093
    targetPort: 9093



This creates our Alertmanager deployment, which delivers all alerts generated by the Prometheus rules.
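
To make the routing tree above concrete, here is a minimal sketch of an alerting rule that would reach the devops receiver. The group name, alert name, and expression are illustrative assumptions and are not part of the original manifests; only the team label and the summary annotation are tied to the Alertmanager configuration shown above.

```yaml
groups:
- name: example-routing        # hypothetical group name
  rules:
  - alert: TargetDown          # hypothetical alert name
    expr: up == 0              # fires for any scrape target that is down
    for: 5m
    labels:
      team: devops             # matched by the "team: devops" route
    annotations:
      # Rendered into state_message via .CommonAnnotations.summary
      summary: "Target {{ $labels.instance }} has been down for 5 minutes"
```

Because each route sets continue: true, Alertmanager keeps evaluating the sibling routes after a match, so an alert that satisfied both matchers would be delivered to both receivers. An alert without a team label falls through to the default receiver, which has no notification settings in the configuration above, so no notification is sent for it.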

Deploying Kubestate Metrics

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  namespace: monitoring
  name: kube-state-metrics-resizer
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["extensions"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: monitoring
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/mxinden/kube-state-metrics:v1.4.0-gzip.3
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: k8s.gcr.io/addon-resizer:1.8.3
        resources:
          limits:
            cpu: 150m
            memory: 50Mi
          requests:
            cpu: 150m
            memory: 50Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - /pod_nanny
          - --container=kube-state-metrics
          - --cpu=100m
          - --extra-cpu=1m
          - --memory=100Mi
          - --extra-memory=2Mi
          - --threshold=5
          - --deployment=kube-state-metrics
---
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: monitoring
  labels:
    k8s-app: kube-state-metrics
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics



The kube-state-metrics deployment is needed to relay important metrics about the state of Kubernetes objects (deployments, pods, nodes, and so on). These metrics are not natively exposed by the kubelet and hence are not otherwise available to Prometheus.
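
As a rough illustration of what these metrics make possible, the sketch below defines two recording rules built on series that kube-state-metrics exposes. The group and rule names are hypothetical, and the resulting label names can differ depending on how your scrape configuration relabels targets.

```yaml
groups:
- name: kube-state-metrics-examples   # hypothetical group name
  rules:
  # Container restarts per namespace over the last hour
  - record: namespace:kube_pod_container_status_restarts:increase1h   # hypothetical rule name
    expr: sum by (namespace) (increase(kube_pod_container_status_restarts_total[1h]))
  # Unavailable deployment replicas per namespace
  - record: namespace:kube_deployment_status_replicas_unavailable:sum  # hypothetical rule name
    expr: sum by (namespace) (kube_deployment_status_replicas_unavailable)
```

Neither series could be derived from kubelet metrics alone, since both describe the state of objects held by the Kubernetes API server.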

Deploying Node-Exporter Daemonset

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
# Kubernetes 1.16 and later no longer serve extensions/v1beta1 DaemonSets; use apps/v1 (which requires spec.selector) there
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    name: node-exporter
spec:
  template:
    metadata:
      labels:
        name: node-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
        - name: node-exporter
          image: prom/node-exporter:v0.16.0
          securityContext:
            privileged: true
          args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
          ports:
            - containerPort: 9100
              protocol: TCP
          resources:
            limits:
              cpu: 100m
              memory: 100Mi
            requests:
              cpu: 10m
              memory: 100Mi
          volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: rootfs
              mountPath: /rootfs
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /



The Node-Exporter DaemonSet runs a node-exporter pod on each node and exposes important node-level metrics, which the Prometheus instances can then scrape.
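
The prometheus.io/scrape and prometheus.io/port annotations on the DaemonSet pods only take effect if the Prometheus scrape configuration honors them. The fragment below is a sketch of the commonly used pod discovery job that does so; the job name is an assumption, and your Prometheus configuration may already contain an equivalent job.

```yaml
scrape_configs:
- job_name: kubernetes-pods          # assumed job name
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only pods annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Rewrite the scrape address to use the port from prometheus.io/port
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
```

Since the node-exporter pods run with hostNetwork: true, the discovered pod address is the node's own IP, so each node is scraped on port 9100.
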

Deploying Grafana

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
# Kubernetes 1.22 and later no longer serve storage.k8s.io/v1beta1 StorageClasses; use storage.k8s.io/v1 there
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: fast
  namespace: monitoring
provisioner: kubernetes.io/gce-pd
allowVolumeExpansion: true
---
# Kubernetes 1.16 and later no longer serve apps/v1beta1 StatefulSets; use apps/v1 (which requires spec.selector) there
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  serviceName: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: k8s.gcr.io/heapster-grafana-amd64:v5.0.4
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ca-certificates
          readOnly: true
        - mountPath: /var
          name: grafana-storage
        env:
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
          # The following env variables are required to make Grafana accessible via
          # the kubernetes api-server proxy. On production clusters, we recommend
          # removing these env variables, setup auth for grafana, and expose the grafana
          # service using a LoadBalancer or a public IP.
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          # If you're only using the API Server proxy, set this value instead:
          # value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
          value: /
      volumes:
      - name: ca-certificates
        hostPath:
          path: /etc/ssl/certs
  volumeClaimTemplates:
  - metadata:
      name: grafana-storage
      namespace: monitoring
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast
      resources:
        requests:
          storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: grafana
  name: grafana
  namespace: monitoring
spec:
  ports:
  - port: 3000
    targetPort: 3000
  selector:
    k8s-app: grafana



This creates our Grafana StatefulSet and Service, which will be exposed using our Ingress object. We should add Thanos Querier as the datasource for our Grafana deployment. To do so:

  1. Click on Add DataSource
  2. Set Name: DS_PROMETHEUS
  3. Set Type: Prometheus
  4. Set URL: http://thanos-querier:9090
  5. Save and Test.

You can now build your own dashboards or simply import dashboards from grafana.net. Dashboards #315 and #1471 are good starting points.
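
The same data source settings can also be kept in a provisioning file instead of being entered by hand. This is only a sketch: it assumes a Grafana version with provisioning support (5.0 or later) and an image that reads its default provisioning directory, and the file path in the comment is hypothetical.

```yaml
# Hypothetical location: /etc/grafana/provisioning/datasources/thanos.yaml
apiVersion: 1
datasources:
- name: DS_PROMETHEUS
  type: prometheus
  access: proxy                      # Grafana's backend proxies the queries
  url: http://thanos-querier:9090    # same URL as in the manual steps
  isDefault: true
```

Mounting such a file from a ConfigMap keeps the data source definition in version control alongside the other manifests.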

Deploying the Ingress Object

# Kubernetes 1.22 and later no longer serve extensions/v1beta1 Ingresses; use networking.k8s.io/v1 there
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: grafana.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000
  - host: prometheus-0.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-0-service
          servicePort: 8080
  - host: prometheus-1.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-1-service
          servicePort: 8080
  - host: prometheus-2.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-2-service
          servicePort: 8080
  - host: alertmanager.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: alertmanager
          servicePort: 9093
  - host: thanos-querier.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: thanos-querier
          servicePort: 9090
  - host: thanos-ruler.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: thanos-ruler
          servicePort: 9090



This is the final piece of the puzzle. It exposes all our services outside the Kubernetes cluster so that we can access them. Make sure you replace <yourdomain> with a domain name that you control and can point at the Ingress controller’s service.
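
On newer clusters the Ingress API has a different shape. As a sketch, here is how the Grafana rule from the manifest above might look under networking.k8s.io/v1; it assumes an IngressClass named nginx exists in your cluster, and the remaining hosts would follow the same pattern.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: monitoring
spec:
  ingressClassName: nginx      # assumed IngressClass; takes the place of the ingress.class annotation
  rules:
  - host: grafana.<yourdomain>.com
    http:
      paths:
      - path: /
        pathType: Prefix       # required in networking.k8s.io/v1
        backend:
          service:
            name: grafana
            port:
              number: 3000
```

The main differences are the mandatory pathType field and the nested service block, which replaces serviceName and servicePort.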

You should now be able to access Thanos Querier at http://thanos-querier.<yourdomain>.com. It will look something like this:

[Image: Thanos Querier UI]

Make sure deduplication is selected.

If you click on Stores, you can see all the active endpoints discovered through the thanos-store-gateway service.

[Image: Stores page in Thanos Querier]


Now you can add Thanos Querier as the datasource in Grafana and start creating dashboards.

[Image: Grafana with Thanos Querier as the datasource]

Kubernetes Cluster Monitoring Dashboard

[Image: Kubernetes Cluster Monitoring Dashboard]

Kubernetes Node Monitoring Dashboard

[Image: Kubernetes Node Monitoring Dashboard]

Conclusion

Integrating Thanos with Prometheus lets us scale Prometheus horizontally. And because Thanos Querier can pull metrics from other querier instances, you can pull metrics across clusters and visualize them in a single dashboard.

We are also able to archive metric data in an object store, which gives our monitoring system virtually unlimited storage and lets us serve metrics from the object storage itself. A major part of the cost of this set-up can be attributed to the object storage (S3 or GCS). This cost can be reduced further by applying appropriate retention policies to the stored data.
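
One way to apply such retention policies is through the Thanos Compactor's retention flags. The fragment below is a sketch of the relevant container arguments only; the durations are illustrative assumptions, and the flags your compactor already uses for its data directory and object store configuration are omitted.

```yaml
# Fragment of a Thanos Compactor container spec (illustrative values)
args:
- compact
- --retention.resolution-raw=30d    # keep raw samples for 30 days
- --retention.resolution-5m=90d     # keep 5-minute downsampled data for 90 days
- --retention.resolution-1h=365d    # keep 1-hour downsampled data for a year
```

Leaving a retention flag at its default of 0d keeps data at that resolution indefinitely.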

However, achieving all this requires quite a bit of configuration on your part. The manifests provided above have been tested in a production environment. Feel free to reach out should you have any questions around them.

 

 

 

 
