Monitoring DC/OS Nodes and Containers With cAdvisor, InfluxDB, and Grafana

This blog post illustrates a DC/OS usage example. There are usage examples for most of the packages available in the DC/OS Universe. For a full collection of examples, please check here.

Monitoring servers and applications in a DC/OS cluster becomes increasingly important as clusters grow in size, number of users, and number of running applications. There are many options for monitoring a DC/OS cluster, including open-source and commercial solutions from different vendors. Many of them can be installed directly from the DC/OS Universe, which makes it straightforward to try them for your particular use case and run them in your infrastructure.

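In this post, we will describe how to install and configure a DC/OS monitoring stack composed of three open-source components that work together to gather, export, store, and display detailed metrics about the hosts, applications, and containers in your DC/OS cluster. These components are:

cAdvisor, which gathers and exports resource usage metrics from each node and its containers.

InfluxDB, a time-series database that stores the metrics.

Grafana, which displays the stored metrics in dashboards.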

Estimated time for completion: Up to 30 minutes.

Target audience: Operators and application admins.

Scope: You’ll learn how to install and configure cAdvisor, InfluxDB, and Grafana in order to monitor your cluster’s health and performance.

Prerequisites

Install

Let's see how to install cAdvisor, InfluxDB, and Grafana.

cAdvisor Installation

Log into DC/OS, go to Universe, and select the cAdvisor package. Click Install or, optionally, click Advanced Installation and modify the following parameters according to your needs:

Once these parameters are set, you can simply click Install. A copy of cAdvisor will be started on each node of your cluster and will automatically begin streaming metrics.

InfluxDB Installation

Log into DC/OS, go to Universe, and select the influxdb package. Click Install or, optionally, click Advanced Installation and modify the following parameters according to your needs:

Once these parameters are set, you can simply click Install.

Grafana Installation

Log into DC/OS, go to Universe, and select the grafana package. Click Install or, optionally, click Advanced Installation and modify the following parameters according to your needs:
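
The three Universe installations can also be scripted with the DC/OS CLI. The following is a minimal sketch rather than part of the original walkthrough. It assumes that the dcos CLI is already attached to your cluster and that the Universe package names are cadvisor, influxdb, and grafana. The options file argument is hypothetical: it stands for a JSON file holding the Advanced Installation parameters. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: install the monitoring stack from the DC/OS CLI instead of the UI.
# Assumption: the Universe package names are "cadvisor", "influxdb", "grafana".
PACKAGES="cadvisor influxdb grafana"

# Print the install command for one package. An optional second argument is
# a JSON options file (the CLI counterpart of Advanced Installation).
install_cmd() {
  pkg="$1"
  options_file="${2:-}"
  if [ -n "$options_file" ]; then
    echo "dcos package install $pkg --options=$options_file --yes"
  else
    echo "dcos package install $pkg --yes"
  fi
}

# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
for pkg in $PACKAGES; do
  cmd="$(install_cmd "$pkg")"
  echo "$cmd"
  if [ "$DRY_RUN" = "0" ]; then
    $cmd || exit 1
  fi
done
```

Running the script with DRY_RUN=0 executes the three installs in order and stops at the first failure.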

Use

Once the three packages are up and running, the instances of cAdvisor running on each node of your cluster should be streaming metrics towards InfluxDB.

DC/OS Running State
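
Before moving on to Grafana, it can be useful to confirm from a shell that metrics are arriving in InfluxDB. The sketch below queries the InfluxDB HTTP /query endpoint with curl. It assumes that it runs from a host inside the cluster (the l4lb address is not reachable from outside), and that the database name and credentials are the ones used in the data source settings later in this post (cadvisor, root/root). The variable names and the RUN_CHECK switch are illustrative.

```shell
#!/bin/sh
# Sketch: ask InfluxDB which measurements cAdvisor has written so far.
# Assumptions: cluster-internal host, database "cadvisor", user root/root.
INFLUX_URL="${INFLUX_URL:-http://influxdb.marathon.l4lb.thisdcos.directory:8086}"
INFLUX_DB="${INFLUX_DB:-cadvisor}"
INFLUX_USER="${INFLUX_USER:-root}"
INFLUX_PASS="${INFLUX_PASS:-root}"

# Run one InfluxQL statement through the /query endpoint.
influx_query() {
  curl -sS -G "$INFLUX_URL/query" \
    -u "$INFLUX_USER:$INFLUX_PASS" \
    --data-urlencode "db=$INFLUX_DB" \
    --data-urlencode "q=$1"
}

# Set RUN_CHECK=1 to actually contact InfluxDB; otherwise nothing is sent.
if [ "${RUN_CHECK:-0}" = "1" ]; then
  influx_query "SHOW MEASUREMENTS"
fi
```

If the JSON response lists measurement names, cAdvisor is writing to the database. An empty result suggests that nothing has been stored yet.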

These metrics can be displayed and graphed with Grafana’s dashboards. To display the metrics of a cluster, open the Grafana interface in a browser using your public node’s IP address and the port chosen for the interface (by default, port 13000): http://[your_public_node_ip_address]:13000.
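
If the page does not load, a quick reachability check from a shell can help tell a network problem from a Grafana problem. This is only a sketch: PUBLIC_NODE_IP is a placeholder address, the port is the default mentioned above, and the /login path is assumed to be where Grafana serves its login page.

```shell
#!/bin/sh
# Sketch: build the Grafana URL and check that the login page answers.
# PUBLIC_NODE_IP is a placeholder; replace it with your public node's address.
PUBLIC_NODE_IP="${PUBLIC_NODE_IP:-203.0.113.10}"
GRAFANA_PORT="${GRAFANA_PORT:-13000}"

# Compose the base URL from an IP address and a port.
grafana_url() {
  echo "http://$1:$2"
}

# Print the HTTP status code returned by the Grafana login page.
grafana_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    "$(grafana_url "$1" "$2")/login"
}

echo "Grafana URL: $(grafana_url "$PUBLIC_NODE_IP" "$GRAFANA_PORT")"

# Set RUN_CHECK=1 to actually contact Grafana.
if [ "${RUN_CHECK:-0}" = "1" ]; then
  grafana_status "$PUBLIC_NODE_IP" "$GRAFANA_PORT"
fi
```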

You will be presented with Grafana’s login screen. Use the default admin account with the default admin password (or the password value you chose during installation).

Grafana login screen

After logging in, you are presented with Grafana’s “Home Dashboard” screen. Grafana boots with no data sources or dashboards pre-loaded, so we’ll need to connect it to InfluxDB, and then load a dashboard that reads and graphs the information stored in it.

Add InfluxDB Data Source to Grafana

Click on the top-left corner button of the Grafana interface and select Data Sources.

Grafana data sources

Click on the Add data source button and you’ll be presented with the Add data source screen.

Grafana add data source

Fill in the following parameters.

Name: influxdb (Check the Default box to make this the default data source for the system.)

Type: select InfluxDB

HTTP Settings

URL: http://influxdb.marathon.l4lb.thisdcos.directory:8086

Access: proxy (Under “Http Auth,” check only “Basic Auth.”)

User: admin

Password: admin

InfluxDB Details

Database: cadvisor

User: root

Password: root

Finally, click Save and Test. If things are working correctly, you should see a message with “Success. Data Source is working.”

After that, the influxdb data source is configured and appears as available when clicking on Data Sources.

Grafana influx db data source working
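
The same data source can also be created without the UI, through Grafana's HTTP API. The sketch below mirrors the form values listed above in a JSON body and posts it to /api/datasources. Several things are assumptions: GRAFANA_URL is a placeholder for your public node's address and port, Grafana is assumed to still use the admin:admin login, and passwords are sent as top-level fields, which older Grafana releases accept while newer ones expect them under secureJsonData. By default the script only prints the body.

```shell
#!/bin/sh
# Sketch: create the InfluxDB data source through Grafana's HTTP API.
# Assumptions: placeholder GRAFANA_URL, default admin:admin Grafana login,
# and a Grafana version that accepts top-level password fields.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:13000}"
GRAFANA_AUTH="${GRAFANA_AUTH:-admin:admin}"

# Emit the JSON body that mirrors the form values from this section.
datasource_json() {
  cat <<'EOF'
{
  "name": "influxdb",
  "type": "influxdb",
  "access": "proxy",
  "url": "http://influxdb.marathon.l4lb.thisdcos.directory:8086",
  "isDefault": true,
  "basicAuth": true,
  "basicAuthUser": "admin",
  "basicAuthPassword": "admin",
  "database": "cadvisor",
  "user": "root",
  "password": "root"
}
EOF
}

# POST the body to /api/datasources using basic auth.
create_datasource() {
  datasource_json | curl -sS -u "$GRAFANA_AUTH" \
    -H "Content-Type: application/json" \
    -X POST --data-binary @- \
    "$GRAFANA_URL/api/datasources"
}

# Set RUN_CREATE=1 to send the request; otherwise only show the body.
if [ "${RUN_CREATE:-0}" = "1" ]; then
  create_datasource
else
  datasource_json
fi
```

Keeping the body in its own function makes it easy to review or adjust the values before anything is sent to Grafana.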

Add Dashboard to Grafana

The last step is to add a dashboard that displays the desired metrics. This post provides a base dashboard that shows filesystem, network, CPU, and memory usage across the cluster, broken down by service and aggregated across the instances of each service.

Please download the example dashboard to your computer.

Go to the Dashboards home screen by clicking on the Grafana button on the top left of the screen, and clicking on Dashboards, then on Home.

Click on the Home button at the top of the screen. This opens a drop-down menu listing all the dashboards in the system. Find the Import button at the bottom of the screen and click on it. You will be presented with an Import Dashboard screen. Find the Upload .json File button and click on it:

Grafana import dashboard

Select the JSON file of the example dashboard that you downloaded. Then select the influxdb data source that was previously configured.

Grafana import dashboard 2

Click Save and Open. The Dashboard should now open up showing the different metrics.

Grafana dashboard

Uninstall

You can uninstall any of these components from the User Interface in the Universe's Installed menu.

To uninstall a component using the DC/OS CLI, run the dcos package uninstall command. For example, to uninstall Grafana use the following command:

$ dcos package uninstall grafana
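
To remove the whole stack, the same command can be repeated for each package. The loop below is a sketch that assumes the package names cadvisor, influxdb, and grafana. By default it only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch: remove all three components with the uninstall command shown above.
# Assumption: the package names match the ones used at install time.
PACKAGES="grafana influxdb cadvisor"

# Print the uninstall command for one package.
uninstall_cmd() {
  echo "dcos package uninstall $1"
}

# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
for pkg in $PACKAGES; do
  cmd="$(uninstall_cmd "$pkg")"
  echo "$cmd"
  if [ "$DRY_RUN" = "0" ]; then
    # Keep going if one uninstall fails, but report it on stderr.
    $cmd || echo "uninstall of $pkg failed" >&2
  fi
done
```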
