Configure Prometheus (metrics and monitoring)

You will learn basic Prometheus configuration and how to query the collected metrics.

Configuration

Classic

A minimal Prometheus config file could look like this:

global:
  scrape_interval: 2s

# top-level key: the list of targets Prometheus scrapes
scrape_configs:
  - job_name: deribit
    metrics_path: '/metrics'
    static_configs:
      - targets: ['localhost:1234']

Prometheus will pull metrics from the list of services under scrape_configs.

In this case we have launched the gateway with the --service_listen_address=tcp://localhost:1234 flag.
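To check what Prometheus will see, you can fetch the endpoint yourself. The following is a minimal Python sketch, not part of the gateway. It assumes the gateway serves the standard Prometheus text exposition format at http://localhost:1234/metrics, and the helper names (scrape_url, parse_samples, fetch_samples) are hypothetical. The parser only handles plain sample lines and skips comments.

```python
import re
from urllib.request import urlopen

# Matches sample lines such as: name{label="value",other="x"} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def scrape_url(target, metrics_path="/metrics"):
    """Build the URL Prometheus would scrape for a static target."""
    return f"http://{target}{metrics_path}"


def parse_samples(text):
    """Return (name, labels, value) tuples from exposition-format text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_samples(target, metrics_path="/metrics"):
    """Fetch and parse the metrics page of one target (needs a running service)."""
    with urlopen(scrape_url(target, metrics_path), timeout=5) as response:
        return parse_samples(response.read().decode("utf-8"))
```

With the gateway running, fetch_samples("localhost:1234") should return one tuple per exported sample.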

Pushgateway

Alternatively, services can push their metrics to Prometheus’ pushgateway, which Prometheus then scrapes.

There’s no special configuration needed for the pushgateway.

However, Prometheus should be configured to scrape the pushgateway instead of the individual services:

global:
  scrape_interval: 2s

# top-level key: the pushgateway is now the only scrape target
scrape_configs:
  - job_name: pushgateway
    metrics_path: '/metrics'
    static_configs:
      - targets: ['pushgateway.service:9091']

In this case we launch the gateway with the --metrics_push_uri=http://localhost:9091/metrics flag.
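For reference, this is roughly what a push looks like from a generic client. It is a Python sketch of the pushgateway's HTTP API, not a description of how the gateway itself pushes. The helper names, the job name "roq" and the instance "deribit" in the usage note are assumptions chosen for illustration, and label values containing a slash need the pushgateway's base64 form, which this sketch does not handle.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def push_url(base_uri, job, grouping=None):
    """Build <base_uri>/job/<job>[/<label>/<value>...] for a push request."""
    parts = [base_uri.rstrip("/"), "job", quote(job, safe="")]
    for name, value in (grouping or {}).items():
        parts += [quote(name, safe=""), quote(value, safe="")]
    return "/".join(parts)


def push(base_uri, job, body, grouping=None):
    """PUT exposition-format text to the pushgateway (needs it to be running).

    PUT replaces every metric in the group; POST would replace only
    metrics with the same names.
    """
    request = Request(
        push_url(base_uri, job, grouping),
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urlopen(request, timeout=5) as response:
        return response.status
```

For example, push("http://localhost:9091/metrics", "roq", "example_heartbeat 1\n", {"instance": "deribit"}) would send one hypothetical metric to http://localhost:9091/metrics/job/roq/instance/deribit. The body must end with a newline.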

You should be able to see your metrics using the pushgateway’s UI:

../../../_images/pushgateway.png

Prometheus

Prometheus’ UI lets you easily search for metrics:

../../../_images/prometheus_search.png
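The same search can be scripted against Prometheus' HTTP API, which lists all metric names under /api/v1/label/__name__/values. A minimal Python sketch follows. The helper names are hypothetical, and the base URL in the usage note assumes Prometheus listens on its default port 9090 on localhost.

```python
import json
from urllib.request import urlopen


def metric_names_url(base_url):
    """URL of the API endpoint that returns every known metric name."""
    return base_url.rstrip("/") + "/api/v1/label/__name__/values"


def filter_names(payload, prefix):
    """Pick the metric names starting with prefix out of an API response."""
    response = json.loads(payload)
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    return sorted(name for name in response["data"] if name.startswith(prefix))


def find_metrics(base_url, prefix):
    """Fetch the metric names and filter them (needs a running Prometheus)."""
    with urlopen(metric_names_url(base_url), timeout=5) as response:
        return filter_names(response.read().decode("utf-8"), prefix)
```

With Prometheus running, find_metrics("http://localhost:9090", "roq_latency") should list the latency series used in the PromQL section.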

We will find the same metrics discussed in the previous blog post.

First, the connection counters

../../../_images/counter.png

Second, the request latency

../../../_images/request_latency_search.png

../../../_images/request_latency_sum.png

../../../_images/request_latency_count.png

../../../_images/request_latency_bucket.png

PromQL

Prometheus has a powerful query language.

Here we will demonstrate how to compute a time-series of exchange ping latencies (the roq_latency_* keys).

The most recent observation

../../../_images/latency.png

Here we repeat the last record

roq_latency_sum {
  connection="5:md",
  exported_instance="deribit",
  exported_job="roq",
  function="ping",
  instance="pushgateway.service:9091",
  job="pushgateway",
  source="deribit"
}

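Prometheus attaches job and instance when it scrapes, and the exported_ prefix marks pushed labels that would otherwise collide with them. The labels generated by the gateway for this metric are: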

  • connection

  • function

  • source

This is our query and the resulting time-series:

../../../_images/promql.png

Here is the query as text, for copy-pasting:

irate(roq_latency_sum{source="deribit"}[1m]) / on(connection, function)
irate(roq_latency_count{source="deribit"}[1m]) / 1000000
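The query divides the rate of the latency sum by the rate of the observation count, which yields the mean latency per observation. Because irate uses the last two samples of each series, and assuming both series are sampled at the same timestamps, the elapsed time cancels and the ratio reduces to the change in sum divided by the change in count. Below is a small Python sketch of that arithmetic with a hypothetical helper name. The unit of the raw values is not stated here, so the final division by 1000000 is simply mirrored from the query.

```python
def mean_latency(prev_sum, prev_count, last_sum, last_count, scale=1_000_000):
    """Mean latency of the observations recorded between two scrapes."""
    delta_sum = last_sum - prev_sum
    delta_count = last_count - prev_count
    if delta_count <= 0:
        # No new observations (or a counter reset): no meaningful mean.
        return None
    return delta_sum / delta_count / scale
```

With hypothetical values, two new observations that add 600000000 to the sum give mean_latency(1_000_000_000, 10, 1_600_000_000, 12) == 300.0. When there are no new observations the sketch returns None, whereas PromQL would produce NaN for 0/0.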