What is Prometheus?

This is the first post in my Prometheus series. If you already know the basics of Prometheus, please read my Prometheus Hands-On post, Go API monitoring using Prometheus and Grafana integration using Docker Compose.

In this post, we will look at:

What is Prometheus?

Prometheus Architecture

Prometheus Metrics

Where does it fit best?

Where does it NOT fit?

Key Highlights

What is Prometheus?

Prometheus is an open-source monitoring and alerting system.

Prometheus can monitor highly dynamic container environments such as Kubernetes and Docker Swarm, and it works equally well for traditional non-containerized applications.

Targets can be discovered dynamically through service discovery or listed in static configuration.

But what is a target here? A target is our application, which exposes a /metrics endpoint using one of the Prometheus client libraries. Prometheus pulls (scrapes) data from this endpoint over HTTP, so our application is a target for Prometheus.
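To make the idea of a target concrete, here is a minimal sketch in Go that uses only the standard library. In a real service you would normally use the official Prometheus Go client library rather than formatting the text by hand; the metric name myapp_requests_total and the helper names are assumptions made up for this illustration. The sketch starts a throwaway local server, sends it one ordinary request, and then fetches /metrics the way a scrape would.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// requestsServed is a hypothetical counter kept by the application.
var requestsServed atomic.Uint64

// renderMetrics returns one counter in the Prometheus text exposition format.
func renderMetrics(served uint64) string {
	return "# HELP myapp_requests_total Total requests served.\n" +
		"# TYPE myapp_requests_total counter\n" +
		fmt.Sprintf("myapp_requests_total %d\n", served)
}

// metricsHandler is the endpoint Prometheus would pull from.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	fmt.Fprint(w, renderMetrics(requestsServed.Load()))
}

// helloHandler stands in for the application's real work.
func helloHandler(w http.ResponseWriter, r *http.Request) {
	requestsServed.Add(1)
	fmt.Fprintln(w, "hello")
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", helloHandler)
	mux.HandleFunc("/metrics", metricsHandler)

	// httptest starts a temporary local server so this sketch terminates;
	// a real service would call http.ListenAndServe instead.
	srv := httptest.NewServer(mux)
	defer srv.Close()

	// One ordinary application request, so the counter has something to report.
	if resp, err := http.Get(srv.URL + "/"); err == nil {
		resp.Body.Close()
	}

	// This GET is essentially what a Prometheus scrape does.
	resp, err := http.Get(srv.URL + "/metrics")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(body))
}
```

The application never contacts Prometheus itself. It only answers when asked, which is the essence of the pull model.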

A push mechanism is also supported for special cases such as short-lived scheduled jobs; this is where the Pushgateway component of Prometheus comes into play.

This pull mechanism is one of the key differentiators between Prometheus and other monitoring systems such as AWS CloudWatch, Nagios, and New Relic, which typically install a daemon (an additional step) alongside each service and push metrics from the service to a centralized server.

Prometheus stores metrics as time series, each identified by a metric name and key/value pairs.

PromQL, the Prometheus query language, is used to fetch metrics and aggregated data from the time series database through the HTTP server.

Prometheus Architecture

Prometheus consists of various components, but at its core there are 3 main components; the others are optional.

Each Prometheus server has these 3 components.

Data Retrieval Worker

This is responsible for scraping metrics from external sources/targets and storing them in the time series database.

Time series database

Time series means that changes are recorded over time. For an API, it could be the number of requests per minute; for a database, it may be the number of active connections at a given time.
Prometheus stores all metrics data in a local on-disk time series database. It can also optionally integrate with remote storage systems.

It uses a multi-dimensional data model: each time series is defined by a metric name and a set of key/value dimensions (labels).
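The data model can be sketched in a few lines of Go. This is only an illustration of how a metric name plus a label set identifies a series, not how Prometheus stores data internally; the Series type and the label values are assumptions invented for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Series identifies one time series: a metric name plus a set of labels.
type Series struct {
	Name   string
	Labels map[string]string
}

// String renders the series in the name{key="value",...} notation,
// with label keys sorted so that equal label sets give equal strings.
func (s Series) String() string {
	if len(s.Labels) == 0 {
		return s.Name
	}
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, s.Labels[k]))
	}
	return s.Name + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	// Same metric name, different label values: two distinct time series.
	get := Series{Name: "http_requests_total", Labels: map[string]string{"method": "GET", "status": "200"}}
	post := Series{Name: "http_requests_total", Labels: map[string]string{"method": "POST", "status": "200"}}
	fmt.Println(get)
	fmt.Println(post)
}
```

Each distinct combination of label values is its own series, which is what lets a query slice one metric by method, status, or any other dimension.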

HttpServer

This component accepts queries from clients (data visualization tools such as Grafana, or the Prometheus UI) and fetches the requested metrics from the Prometheus database.

Additional components

Pushgateway

This comes into play when we want to monitor short-lived jobs such as batch jobs. These jobs push their metrics to the Pushgateway, which holds them and in turn exposes them for the Prometheus data retrieval worker to scrape.
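As a rough sketch of what a push looks like, the Go snippet below builds the HTTP request a batch job could send to a Pushgateway. It uses only the standard library; the official Go client library ships a push package that is the usual choice in practice. The gateway address, the job name nightly_report, and the metric batch_duration_seconds are assumptions for the example, and the request is only built and printed, not sent.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"
	"strings"
)

// newPushRequest builds a request that pushes one gauge sample, in the
// text exposition format, to the /metrics/job/<job> path of a Pushgateway.
func newPushRequest(gatewayURL, job string, durationSeconds float64) (*http.Request, error) {
	body := fmt.Sprintf(
		"# TYPE batch_duration_seconds gauge\nbatch_duration_seconds %g\n",
		durationSeconds,
	)
	target := strings.TrimRight(gatewayURL, "/") + "/metrics/job/" + url.PathEscape(job)
	return http.NewRequest(http.MethodPost, target, strings.NewReader(body))
}

func main() {
	// Hypothetical gateway address; 9091 is the Pushgateway's default port.
	req, err := newPushRequest("http://localhost:9091", "nightly_report", 12.5)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(req.Method, req.URL)
	// With a Pushgateway actually running, the job would send the request
	// with http.DefaultClient.Do(req) just before exiting.
}
```

Prometheus then scrapes the Pushgateway like any other target, so the job's last pushed values stay available after the job itself has exited.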

Alert Manager

This component receives alerts from the Prometheus server and sends notifications to its clients, which could be PagerDuty, Slack, email, or SMS tools.

Service Discovery

Prometheus is designed to be up and running quickly with a very basic setup. It also supports dynamic configuration for looking up targets in container-based environments like Kubernetes through service discovery.

Prometheus Metrics

Prometheus exposes metrics in a simple, easy-to-understand text format.

There are currently 4 types of metrics supported:

Counter

A counter represents a cumulative value. It can only increase or be reset to zero on restart. Use it when you want to know how many times something has happened, for example the number of requests served by an API, the number of errors, or the number of tasks completed.

By convention, most counters are named with the _total suffix, e.g. http_requests_total.

Important: do not use a counter for values that can decrease, for example the number of currently running processes. Use a gauge in that case.

Gauge

A gauge is a value that may go up or down, for example current memory usage. It can also be used for counts that can go down, such as the number of concurrent requests.
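The difference between a counter and a gauge can be sketched with two tiny Go types. These are simplified stand-ins written for this post, not the types from the Prometheus client library, and they are not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
)

// Counter only ever goes up; it starts again from zero when the process restarts.
type Counter struct{ value float64 }

func (c *Counter) Inc() { c.value++ }

// Add rejects negative deltas, because a counter must never decrease.
func (c *Counter) Add(delta float64) error {
	if delta < 0 {
		return errors.New("counter cannot decrease")
	}
	c.value += delta
	return nil
}

func (c *Counter) Value() float64 { return c.value }

// Gauge may go up or down freely.
type Gauge struct{ value float64 }

func (g *Gauge) Inc()           { g.value++ }
func (g *Gauge) Dec()           { g.value-- }
func (g *Gauge) Set(v float64)  { g.value = v }
func (g *Gauge) Value() float64 { return g.value }

func main() {
	var requestsTotal Counter // how many requests have ever been served
	var inFlight Gauge        // how many requests are being served right now

	// Simulate handling a single request from start to finish.
	inFlight.Inc()
	requestsTotal.Inc()
	inFlight.Dec()

	fmt.Println("requests total:", requestsTotal.Value())
	fmt.Println("in flight:", inFlight.Value())
	fmt.Println("decreasing a counter:", requestsTotal.Add(-1))
}
```

After the request completes, the counter keeps its history while the gauge is back at zero, which is why "currently running" style values belong in a gauge.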

Histogram

A histogram samples observations and counts them in distinct buckets. If you want to track “how long something took” or “how big something was”, use a histogram.
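Here is a small Go sketch of the bucket idea. It is a simplified illustration rather than the client library's implementation: the Histogram type, the bucket bounds, and the metric name request_duration_seconds are assumptions, and the bounds are expected to be passed in ascending order. Buckets are cumulative, so each one counts every observation less than or equal to its upper bound (the le label).

```go
package main

import "fmt"

// Histogram counts observations in cumulative buckets and also tracks
// the total number of observations and their sum.
type Histogram struct {
	bounds []float64 // bucket upper bounds, ascending
	counts []uint64  // counts[i] = observations <= bounds[i]
	count  uint64    // all observations (the +Inf bucket)
	sum    float64   // sum of all observed values
}

// NewHistogram creates a histogram with the given upper bounds.
func NewHistogram(bounds ...float64) *Histogram {
	return &Histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// Observe records one value in every bucket whose bound it does not exceed.
func (h *Histogram) Observe(v float64) {
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
		}
	}
	h.count++
	h.sum += v
}

func main() {
	h := NewHistogram(0.1, 0.5, 1)
	for _, seconds := range []float64{0.05, 0.3, 0.3, 2} {
		h.Observe(seconds)
	}

	// Print the series a histogram exposes: buckets, sum, and count.
	for i, b := range h.bounds {
		fmt.Printf("request_duration_seconds_bucket{le=\"%g\"} %d\n", b, h.counts[i])
	}
	fmt.Printf("request_duration_seconds_bucket{le=\"+Inf\"} %d\n", h.count)
	fmt.Printf("request_duration_seconds_sum %g\n", h.sum)
	fmt.Printf("request_duration_seconds_count %d\n", h.count)
}
```

With these four observations, one falls at or under 0.1 seconds, three at or under 0.5 seconds, and the 2 second request only shows up in the +Inf bucket and the total count.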

Summary

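Similar to a histogram, a summary samples observations and provides a total count and sum of observations. It also calculates configurable quantiles over a sliding time window.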

Where does it fit best?

It is ideal in highly dynamic systems such as microservices running in a cloud environment.

Where does it NOT fit?

It may not be the best fit where we need 100% accuracy, as in a real-time billing system. In such a case the specific billing function should be addressed with an alternative, but Prometheus may still be the right tool for monitoring the other application and infrastructure functions.

Key Highlights

🔆 Prometheus is a graduated Cloud Native Computing Foundation (CNCF) project.

🔆 Most of the Prometheus components are written in Go.

🔆 Suited to recording any purely numeric time series.

🔆 Fits both machine-centric monitoring and monitoring of highly dynamic service-oriented architectures.

🔆 Support for multi-dimensional data collection in a world of microservices.

🔆 Strong querying mechanism.

🔆 Each Prometheus server is standalone, not depending on network storage or other remote services.
