How to Automate Kubernetes Monitoring Like a Pro

Our guest blogger, Farhan Munir, explains why monitoring is important and presents a number of tools for monitoring Kubernetes like a pro.

Written by Farhan Munir • Cloud

Kubernetes is a leading container orchestration platform that lets you manage microservices dynamically, but it is also difficult to operate. The more containers you deploy, the harder it becomes to keep track of what is running and how well. This is why monitoring is such a crucial part of Kubernetes management.

To ensure that you are not wasting development time, you can set up automated monitoring processes. There are some native options built in, and there are also third-party monitoring tools you can use. In this article, you will learn how monitoring works in Kubernetes, what third-party monitoring automation tools are available, and pro tips for successful monitoring automation.

What to Monitor in Kubernetes

Effective management of Kubernetes (k8s) deployments relies on consistent monitoring. Kubernetes deployments are complex, with many moving parts and integrated components. If something goes wrong, you want to know right away, and the best way to do this is to monitor key assets.

Clusters and Infrastructure

As with any other system, monitoring the infrastructure of your deployment can help you identify and prevent issues in performance and functioning. In particular, you should pay attention to the following aspects:

  • Resource utilization—informs you of the level of system and user demand, including bandwidth use, CPU, and memory use. These metrics are useful for identifying and preventing possible bottlenecks. 
  • Disk pressure—informs you of the amount of disk space available. This is key information when you are running write-intensive services such as datastores or etcd. Failure to notice insufficient disk space can lead to data corruption and prevent continued operations.
  • Node/pod resources—informs you of how many nodes or pods are running and whether the system can accommodate more. This indicates the scalability of your system. Sudden or consistent drops in pod resources can also indicate a possible service interruption. 
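As a rough sketch of how these signals can be checked, the snippet below scans node objects shaped like the ones the Kubernetes API returns (`status.conditions`, `status.allocatable`) and reports which nodes are under disk pressure and how many more pods each node can accept. The function names and the sample data are assumptions made for illustration; in a real setup the node list would come from the API server or a client library.

```python
from typing import Dict, List


def nodes_under_disk_pressure(nodes: List[dict]) -> List[str]:
    """Return the names of nodes whose DiskPressure condition is "True"."""
    flagged = []
    for node in nodes:
        for cond in node["status"].get("conditions", []):
            if cond["type"] == "DiskPressure" and cond["status"] == "True":
                flagged.append(node["metadata"]["name"])
    return flagged


def pod_headroom(nodes: List[dict], running_pods: Dict[str, int]) -> Dict[str, int]:
    """Return how many more pods each node can accept."""
    headroom = {}
    for node in nodes:
        name = node["metadata"]["name"]
        # The API reports allocatable pod counts as strings, e.g. "110".
        allocatable = int(node["status"]["allocatable"]["pods"])
        headroom[name] = allocatable - running_pods.get(name, 0)
    return headroom


# Hypothetical sample data, trimmed to the fields used above.
SAMPLE_NODES = [
    {
        "metadata": {"name": "node-a"},
        "status": {
            "allocatable": {"pods": "110"},
            "conditions": [{"type": "DiskPressure", "status": "False"}],
        },
    },
    {
        "metadata": {"name": "node-b"},
        "status": {
            "allocatable": {"pods": "110"},
            "conditions": [{"type": "DiskPressure", "status": "True"}],
        },
    },
]
```

Calling `nodes_under_disk_pressure(SAMPLE_NODES)` flags only `node-b`, and `pod_headroom` shows whether the cluster still has room to schedule more pods.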

Services

Within your deployment, there are two types of services you need to monitor: Kubernetes services and internal services. Technically, a Kubernetes Service is an abstraction that exposes a logical set of pods in your deployment. However, when referring to k8s services, you should also keep in mind the various components that interface with your Services to provide functionality, for example etcd, kube-scheduler, kube-apiserver, and kube-controller-manager.

For internal services, you need to monitor the application components you have running on your nodes. This means monitoring application-specific metrics related to the application’s business rules. These metrics are often exposed directly by the application, but you still need to ingest them with your monitoring tools.
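To make "exposed directly by the application" concrete, here is a minimal sketch of an application serving one business metric in the Prometheus text exposition format, using only the Python standard library. The metric name `orders_processed_total`, the `outcome` label, the port, and the `serve` helper are all hypothetical; a production service would more likely use an official client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical business counters that the application updates as it works.
ORDERS_PROCESSED = {"paid": 0, "refunded": 0}


def render_metrics(counts: dict) -> str:
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP orders_processed_total Orders handled, by outcome.",
        "# TYPE orders_processed_total counter",
    ]
    for outcome, value in sorted(counts.items()):
        lines.append(f'orders_processed_total{{outcome="{outcome}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current counters on /metrics and nothing else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(ORDERS_PROCESSED).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving metrics for a scraper to collect."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Running `serve()` inside the application container gives your monitoring tool a `/metrics` endpoint to scrape; the ingestion side still has to be configured separately.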

Kubernetes Monitoring Automation Tools

Once you understand what you need to monitor in Kubernetes, it helps to know what tools are available. Below are a few open-source options you can use. 

Prometheus

Prometheus is an open-source metrics and alerting tool that is well suited to containerized environments. Kubernetes components expose their metrics in the Prometheus format, and Prometheus is one of the most commonly used monitoring tools for Kubernetes. It is highly customizable and integrates well with other tools for more advanced Kubernetes monitoring solutions.

Key features of Prometheus include:

  • Multidimensional data model based on key-value pairs
  • Accessible format and protocols that are human-readable and exposed via HTTP 
  • Service discovery that enables metrics to be scraped from ephemeral workloads
  • Modular and highly available components
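The key-value data model and HTTP access can be illustrated with a short sketch that runs an instant query against the Prometheus HTTP API (`/api/v1/query`) and returns each series as a pair of labels and value. The base URL passed in is an assumption about where your Prometheus server is reachable, and the helper names are made up for this example.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_instant_vector(payload: dict) -> List[Tuple[dict, float]]:
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    rows = []
    for sample in payload["data"]["result"]:
        _timestamp, raw_value = sample["value"]  # values arrive as strings
        rows.append((sample["metric"], float(raw_value)))
    return rows


def query(base_url: str, promql: str, timeout: float = 5.0) -> List[Tuple[dict, float]]:
    """Run an instant query and return the parsed samples."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

For example, `query("http://localhost:9090", "up")` would return one entry per scraped target, each identified by its label set (such as `job` and `instance`), assuming a server is listening at that address.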

Grafana

Grafana is a platform you can use to monitor, analyze, and visualize metrics data. It is designed primarily for time-series analytics, but it also supports a range of other data types.

Key features of Grafana include:

  • Custom alerting and notifications through a variety of channels
  • Dashboard templating for easy reproducibility
  • Automated provisioning of data sources and dashboards
  • Ability to add data annotations to visualizations for clarity and correlation

cAdvisor

cAdvisor is a daemon you can use to collect, process, and export container performance and resource data. It can run standalone or as a DaemonSet, and it natively supports Docker containers. cAdvisor is also integrated into the kubelet, so its container metrics are available without a separate deployment.

Key features of cAdvisor include:

  • Automatic discovery of containers by node
  • Export of data to a variety of storage services, including Elasticsearch and InfluxDB
  • Ability to analyze root containers to determine overall machine usage
  • Web-based UI for easy distributed use
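Container CPU usage is typically reported as a cumulative counter, so turning it into a usage figure means comparing two samples. The sketch below shows that calculation. The sample shape (`timestamp_s`, `cpu_usage_total_ns`) is a simplified assumption for illustration and not cAdvisor's exact output format.

```python
def cpu_cores_used(prev: dict, curr: dict) -> float:
    """Average number of CPU cores used between two cumulative samples."""
    elapsed_s = curr["timestamp_s"] - prev["timestamp_s"]
    if elapsed_s <= 0:
        raise ValueError("samples must be in increasing time order")
    delta_ns = curr["cpu_usage_total_ns"] - prev["cpu_usage_total_ns"]
    # Nanoseconds of CPU time per second of wall-clock time equals cores used.
    return delta_ns / 1e9 / elapsed_s


# Hypothetical samples for the root container, taken 10 seconds apart.
earlier = {"timestamp_s": 0, "cpu_usage_total_ns": 0}
later = {"timestamp_s": 10, "cpu_usage_total_ns": 5_000_000_000}
print(cpu_cores_used(earlier, later))
```

Applied to the root container, the same calculation gives an estimate of overall machine usage rather than the usage of a single workload.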

Jaeger

Jaeger is a monitoring and troubleshooting tool that you can use for distributed, end-to-end tracing. It enables you to accomplish distributed context propagation and transaction monitoring, root cause and service dependency analyses, and performance optimizations.

Key features of Jaeger include:

  • Can connect to multiple storage backends, including Elasticsearch, Cassandra, Kafka, and in-memory storage
  • Built on an OpenTracing-compatible data model with library support for Go, Python, Java, Node.js, and C++
  • Easy deployment to k8s via a Helm chart, templates, and a k8s operator
  • Default exposure of Prometheus metrics
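To show the idea behind service dependency analysis, the following sketch derives caller-to-callee edges from the spans of a trace. The span dictionaries (`span_id`, `parent_id`, `service`) are a simplified, hypothetical shape chosen for illustration and do not mirror Jaeger's actual data model or API.

```python
from collections import Counter
from typing import List


def service_dependencies(spans: List[dict]) -> Counter:
    """Count calls between services, keyed by (caller, callee)."""
    service_of = {span["span_id"]: span["service"] for span in spans}
    edges = Counter()
    for span in spans:
        caller = service_of.get(span.get("parent_id"))
        # Skip root spans and calls that stay inside one service.
        if caller is not None and caller != span["service"]:
            edges[(caller, span["service"])] += 1
    return edges


# Hypothetical trace: the frontend calls cart, and cart calls the database twice.
TRACE = [
    {"span_id": "a", "parent_id": None, "service": "frontend"},
    {"span_id": "b", "parent_id": "a", "service": "cart"},
    {"span_id": "c", "parent_id": "b", "service": "db"},
    {"span_id": "d", "parent_id": "b", "service": "db"},
]
```

Aggregating these edges over many traces yields the kind of dependency graph that helps narrow a failure down to a specific service.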

Pro Tips for Kubernetes Monitoring Success

When setting up monitoring for Kubernetes, there are a few tips you can apply to help ensure success. Below are some of the most important ones to start with.

Watch your API gateways

Tracking API metrics can help you detect microservice issues faster than relying only on resource metrics. In particular, make sure to watch your request rates, error rates, and latency. Unexpected fluctuations in these metrics can alert you directly to component issues.

To set this monitoring up, it’s easiest to focus your efforts on the API requests that pass through each service’s load balancer. This ensures that all of your services are monitored in a standardized way and enables you to set alerts at any level of the API.
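As a small illustration of the API metrics worth watching, the sketch below summarizes a window of load balancer request records into a request rate, an error rate, and a 95th-percentile latency. The record fields (`status`, `latency_ms`) and the choice to count only 5xx responses as errors are assumptions made for this example.

```python
from typing import List


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile, with pct given as an integer from 1 to 100."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


def summarize(requests: List[dict], window_seconds: float) -> dict:
    """Summarize request records collected over one time window."""
    total = len(requests)
    errors = sum(1 for r in requests if r["status"] >= 500)
    return {
        "request_rate": total / window_seconds,
        "error_rate": errors / total if total else 0.0,
        "p95_latency_ms": percentile([r["latency_ms"] for r in requests], 95),
    }


# Hypothetical window: 20 requests in 10 seconds, two of them server errors.
SAMPLE = [
    {"status": 500 if i < 2 else 200, "latency_ms": float(i + 1)}
    for i in range(20)
]
```

Comparing each window's summary with the previous ones is one simple way to spot the unexpected fluctuations mentioned above.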

Track your disk usage

High disk usage (HDU) is one of the most common issues in Kubernetes deployments. Unfortunately, it also typically needs manual attention, since it indicates that your storage either cannot scale or is not scaling properly.

When monitoring your disk usage, make sure to watch all attached volumes, including those for your root file system. You should also set alerts conservatively, at around 75% use. This should notify you well before data corruption or loss becomes an issue and give you enough time to adjust your system and avoid service loss.
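A conservative threshold check can be sketched in a few lines with the Python standard library. The 75% threshold comes from the advice above; the function names and the list of mount points you pass in are assumptions for illustration.

```python
import shutil
from typing import Dict, Iterable

ALERT_THRESHOLD_PERCENT = 75.0


def percent_used(total_bytes: int, used_bytes: int) -> float:
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def volumes_over_threshold(
    mount_points: Iterable[str],
    threshold: float = ALERT_THRESHOLD_PERCENT,
) -> Dict[str, float]:
    """Return the mount points whose usage is at or above the threshold."""
    alerts = {}
    for path in mount_points:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        pct = percent_used(usage.total, usage.used)
        if pct >= threshold:
            alerts[path] = pct
    return alerts
```

For example, `volumes_over_threshold(["/", "/var/lib/etcd"])` would return only the volumes that need attention, assuming both paths exist on the node being checked.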

Monitor every layer of your deployment

Make sure that you are monitoring and logging events at all layers, from applications to controller managers. If you aren’t covering all layers, it is difficult or impossible to trace the full effect of issues, and you are likely to miss early signs of failure.

When collecting this information, make sure that your monitoring data is consistent throughout. This includes the accuracy of timestamps, which metrics are collected, and units of measurement. You want all logs and metrics to be as directly comparable as possible to provide a clear context for issues. 
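One way to keep records comparable is to normalize them at collection time. The sketch below converts timestamps to UTC ISO 8601 strings and memory values to bytes. The record fields (`timestamp`, `timestamp_unit`, `value`, `unit`) are hypothetical and stand in for whatever your own sources emit.

```python
from datetime import datetime, timezone

UNIT_TO_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}
TIMESTAMP_DIVISOR = {"s": 1, "ms": 1000}


def normalize(record: dict) -> dict:
    """Return a copy of the record with a UTC timestamp and a value in bytes."""
    seconds = record["timestamp"] / TIMESTAMP_DIVISOR[record["timestamp_unit"]]
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return {
        "metric": record["metric"],
        "timestamp": moment.isoformat(),
        "value_bytes": int(record["value"] * UNIT_TO_BYTES[record["unit"]]),
    }


# Two hypothetical sources reporting the same reading in different conventions.
from_agent = {"metric": "memory_used", "timestamp": 1700000000,
              "timestamp_unit": "s", "value": 512, "unit": "MiB"}
from_app = {"metric": "memory_used", "timestamp": 1700000000000,
            "timestamp_unit": "ms", "value": 0.5, "unit": "GiB"}
```

After normalization, both records carry the same timestamp and the same value, so they can be compared or joined directly.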

Conclusion

When setting up your Kubernetes monitoring processes, it is important to create an organizational system that makes sense for all parties involved. As a standard, you should separate the monitoring of services from the monitoring of clusters and infrastructure. You should also monitor API gateways, track your disk usage, and monitor every layer of your deployment.

To ensure that you’re not wasting time on repetitive tasks, you should automate monitoring processes. You can do that with Prometheus, which Kubernetes supports natively, or with integrated third-party services. Before choosing a tool, make sure its integration fits your environment and its interface fits your skill set. If you do choose third-party or managed services, be sure to create standards that ensure every role knows its responsibilities. This can save you a lot of time and possible headaches.

About Farhan Munir

With over 12 years of experience in the technical domain, I have witnessed the evolution of many web technologies, as well as the rise of the digital economy. I consider myself a life-long learner, and I love experimenting with new technologies. I embrace challenges with enthusiasm and outside-of-the-box mindset. I feel it is important to share your experiences with the rest of the world - in order to pass on the knowledge or let other folks learn from your mistakes or successes. In my spare time, I like to travel and photograph the world.