Latency#

Everything has been designed to observe those pesky latencies!

Note

Caveat… we could have missed something that you want to observe 😄 But that would just be another reason for you to get in touch with us! We aim to respond with a reasonably short latency.

Clocks#

Depending on the situation, Roq uses either of these clocks (or both):

  • System clock

  • Realtime clock

The reasons for considering two clocks:

  • The system clock is always monotonically increasing and therefore safe for measuring elapsed time.

  • The realtime clock is comparable between different hosts.

The reasons for not just choosing one of the clocks:

  • The system clock is not comparable between different hosts.

  • The realtime clock is susceptible to adjustments (e.g. NTP) potentially creating a bias towards zero latency, or maybe even allowing for negative latency.

We use the clock_gettime function call with either CLOCK_MONOTONIC or CLOCK_REALTIME.

Jitter#

You should at a minimum consider the following measures if you really care about latency:

  • Physical instead of virtualized hardware.

  • Disable hyper-threading.

  • Use the isolcpus kernel boot parameter.

  • Understand your hardware’s NUMA design.

  • Use the component’s command-line flags to tune for low latency (thread pinning, busy poll, no sleep).

This will take care of most of the jitter.

There are other measures one could take (like disabling interrupts), but they require much deeper knowledge about the system design.

Overview#

This chart serves as a reference for the components for which we can measure latencies.

Note

This discussion is only concerned with a single host.

MessageInfo#

The roq::MessageInfo struct contains various timestamps.

The convention is that fields with the _utc postfix are based on the realtime clock; all other timestamp fields use the system clock.

Note

The _utc postfix will not be shown in the following. This is done to better explain the concepts. Depending on your situation, you can substitute as necessary.

We have the following measurements:

  • origin_create_time: Time when a message was received from an external source. This field is copied forward through all processing steps, so total internal processing time can be measured with this field as a reference.

  • source_receive_time: Time when a trigger event was received by the component which forwarded this event.

  • source_send_time: Time when this event was sent by the component which forwarded it.

  • receive_time: Time when this event was received by this component.

Any processing step can then perform the following calculations:

  • now() - origin_create_time: Total processing time since a message was first received. This is useful to keep a handle on how late the message is.

  • source_send_time - source_receive_time: Internal queue and processing time of the previous component. This is useful to monitor potential queueing by the previous component, e.g. a gateway.

  • receive_time - source_send_time: Shared memory queue time between the previous component and this component. This measures both hardware cache-coherence latency and this component's own processing time (of previous events).

  • now() - receive_time: Internal queue and processing time of this component. Useful when profiling your own component.

Important

The roq::MessageInfo struct is automatically populated and made available to API event handlers. To ensure correct use, end-users are not allowed to specify any roq::MessageInfo values through the API dispatch interfaces.

Round-Trip Latency#

  • Market data triggering order action (no bridge!)

Step   Time   origin_create_time   source_receive_time   source_send_time   receive_time
A      1      1                    -                     -                  1
B      2      1                    1                     2                  -
C      3      1                    1                     2                  3
D      4      1                    3                     4                  3
E      5      1                    3                     4                  5
F      6      1                    5                     6                  5

External Event#

  • Timer based event

  • External order action (e.g. from bridge)

Step   Time   origin_create_time   source_receive_time   source_send_time   receive_time
A      1      1                    -                     -                  1
B      2      1                    1                     2                  -
C      3      1                    1                     2                  3
D      4      1                    3                     4                  3

Market Data Latency#

Note

It is only possible to measure market data latency when the exchange provides a timestamp. There are alternative measures one could use to try to estimate the latency (such as ping time), but this is often made difficult by a CDN sitting in the middle.

There’s not much to say about this. The relevant messages have the exchange_time_utc field, and you can measure latency as now() - exchange_time_utc.

The accuracy is obviously a function of such topics as relative NTP synchronization, truncation (by exchange) and maybe throttling or conflation (by exchange) in some scenarios.

Request Round-Trip Latency#

This is the time between an order action being sent by the gateway and the time it receives a response (which can be matched to the request).

This latency is communicated with the roq::OrderAck struct as the round_trip_latency field.

The calculation is origin_create_time - request.receive_time.

Note

This is an asynchronous calculation. The request caches the receive_time from the order action event. The exchange response will be associated with an origin_create_time.

Warning

This is technically wrong since we include the internal gateway processing time for the request. The reason is that request creation is part of the processing which happens before the final request message (for the exchange) is created.

Profiling#

Key functions are automatically profiled to help with the following:

  • Easily monitor the “cost” of the different functions

  • Interrupts and thread scheduling can cause significant spikes in processing time

  • Understand where to optimise code

There are also drawbacks that make this latency measure less useful:

  • Some profiled functions are top-level and naturally branch into other function calls depending, e.g., on the type of required processing. For example, a generic top-level function could be parse, which can branch into parse_top_of_book and parse_market_by_price. The complexity of those two sub-functions differs enough to make the top-level profiling seem noisy.

  • The cost of processing a single update may vary considerably. For example, parsing an initial snapshot of a full order book is orders of magnitude more complex than parsing an incremental update.

Metrics#

Different metrics are captured and made available to, e.g., Prometheus.

request_latency

Described above.

heartbeat

Gateways will automatically ping connected clients. Connected clients will automatically echo to the sending gateway. This metric records half the time between gateway sending ping and receiving a response. In other words: this is the 1-way latency between a gateway and client.

This latency is a function of the NUMA design and measures how busy the client is. In particular, this latency will increase if a strategy blocks for a considerable time.

inter_process

Note

This metric should be correlated with heartbeat.

Gateways automatically attach a timestamp when a message is enqueued to the shared memory buffer. Clients can directly measure the inter-process buffer time from this metric as the difference between the current time and the sending time.

This is a measure of how late a message is. Latency will increase if the client cannot keep up with the message rate of the broadcast queue.

round_trip

Order actions are triggered by internal or external events. The origin could be a message, a timer or a bridged event.

This is a measure of the time between the timestamp at origin and the time when the order action has been finally processed by the gateway.

Note

“Processed” could mean having sent a message to the exchange or some error condition was detected (like network issue or failed validation).

end_to_end_latency

Note

This metric is currently only computed by the FIX bridge.

This is a measure of the time between the timestamp at origin and the time when a message is being forwarded to a bridged client.