Latency¶
Everything has been designed to observe those pesky latencies!
Note
Caveat… we could have missed something that you want to observe 😄 But that would just be another reason for you to get in touch with us! We aim to respond with a reasonably short latency.
Clocks¶
Depending on the situation, Roq uses either of these clocks (or both):
System clock
Realtime clock
The reasons for considering two clocks:
The system clock is always monotonically increasing and therefore useful for measuring durations.
The realtime clock is comparable between different hosts.
The reasons for not just choosing one of the clocks:
The system clock is not comparable between different hosts.
The realtime clock is susceptible to adjustments (e.g. NTP) potentially creating a bias towards zero latency, or maybe even allowing for negative latency.
We use the clock_gettime function call with either CLOCK_MONOTONIC or CLOCK_REALTIME.
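For illustration, a minimal sketch (not taken from the Roq codebase) of how the two clocks can be queried on Linux:

```cpp
#include <cstdio>
#include <ctime>

int main() {
  timespec monotonic{}, realtime{};
  // system clock: always monotonically increasing, unaffected by NTP adjustments
  clock_gettime(CLOCK_MONOTONIC, &monotonic);
  // realtime clock: wall-clock time, comparable between hosts
  clock_gettime(CLOCK_REALTIME, &realtime);
  std::printf("monotonic: %ld.%09ld\n", monotonic.tv_sec, monotonic.tv_nsec);
  std::printf("realtime:  %ld.%09ld\n", realtime.tv_sec, realtime.tv_nsec);
}
```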
Jitter¶
You should at a minimum consider the following measures if you really care about latency:
Physical instead of virtualized hardware.
Disable hyper-threading.
Use the isolcpus kernel boot parameter.
Understand your hardware's NUMA design.
Use the component's command-line flags to tune for low latency (thread pinning, busy poll, no sleep); see the sketch further below.
This will take care of most of the jitter.
There are other measures one could take (like disabling interrupts), but they require much deeper knowledge about the system design.
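The command-line flags mentioned above handle thread pinning for you. Purely as an illustration of what pinning involves on Linux (a sketch, not Roq's actual implementation):

```cpp
#include <pthread.h>
#include <sched.h>

// Sketch: pin the calling thread to a single (ideally isolated) core.
// Combine with the isolcpus kernel boot parameter so the scheduler
// keeps other tasks off that core.
bool pin_current_thread(int core) {
  cpu_set_t cpu_set;
  CPU_ZERO(&cpu_set);
  CPU_SET(core, &cpu_set);
  return pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set) == 0;
}
```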
Overview¶
This chart serves as a reference for the components for which we can measure latencies.
Note
This discussion is only concerned with a single host.
MessageInfo¶
The roq::MessageInfo struct contains various timestamps. The convention is that fields with the _utc postfix are based on the realtime clock; all other fields use the system clock.
Note
The _utc postfix will not be shown in the following. This is done to better explain the concepts. Depending on your situation, you can substitute as necessary.
We have the following measurements:

Field | Purpose
---|---
origin_create_time | Time when a message was received from an external source. This field is copied forward through all processing steps, making it possible to measure total internal processing time with this field as a reference.
source_receive_time | Time when a trigger event was received by the component which forwarded this event.
source_send_time | Time when this event was sent by the component which forwarded it.
receive_time | Time when this event was received by this component.
Any processing step can then perform the following calculations:

Calculation | Interpretation
---|---
receive_time - origin_create_time | Total processing time since a message was first received. This is useful to keep a handle on how late the message is.
source_send_time - source_receive_time | Internal queue and processing time of the previous component. This is useful to monitor potential queueing by the previous component, e.g. a gateway.
receive_time - source_send_time | Shared memory queue time between the previous component and this component. This is both a measure of hardware cache coherence latency and of this component's own processing time (of previous events).
now() - receive_time | Internal queue and processing time of this component. Useful when profiling your own component.
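As a sketch of how these calculations could look in code, assuming the field names above are std::chrono durations (the helper name is hypothetical):

```cpp
#include <chrono>

// Sketch, for illustration only: compute the measures above from any
// message info carrying the four timestamp fields.
template <typename MessageInfo>
void log_latencies(MessageInfo const &message_info) {
  auto now = std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now().time_since_epoch());
  // total processing time since the message was first received
  auto total = message_info.receive_time - message_info.origin_create_time;
  // internal queue and processing time of the previous component
  auto previous = message_info.source_send_time - message_info.source_receive_time;
  // shared memory queue time between the previous component and this one
  auto queue = message_info.receive_time - message_info.source_send_time;
  // internal queue and processing time of this component (so far)
  auto internal = now - message_info.receive_time;
  (void)total; (void)previous; (void)queue; (void)internal;  // log or export here
}
```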
Important
The roq::MessageInfo struct is automatically populated and made available to API event handlers. To ensure correct use, end-users are not allowed to specify any of the roq::MessageInfo values through the API dispatch interfaces.
Round-Trip Latency¶
Market data triggering order action (no bridge!)
Step | Time | origin_create_time | source_receive_time | source_send_time | receive_time
---|---|---|---|---|---
A | 1 | 1 | 1 | |
B | 2 | 1 | 1 | 2 |
C | 3 | 1 | 1 | 2 | 3
D | 4 | 1 | 3 | 4 | 3
E | 5 | 1 | 3 | 4 | 5
F | 6 | 1 | 5 | 6 | 5
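Reading the table at step F: the total processing time since the market data update first arrived is receive_time - origin_create_time = 5 - 1 = 4 time units, while the full tick-to-trade path, from the update arriving at step A to the order action finally being sent at step F, spans source_send_time - origin_create_time = 6 - 1 = 5 time units.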
External Event¶
Timer-based event
External order action (e.g. from bridge)

Step | Time | origin_create_time | source_receive_time | source_send_time | receive_time
---|---|---|---|---|---
A | 1 | 1 | 1 | |
B | 2 | 1 | 1 | 2 |
C | 3 | 1 | 1 | 2 | 3
D | 4 | 1 | 3 | 4 | 3
Market Data Latency¶
Note
It is only possible to measure market data latency when the exchange provides a timestamp. There are alternative measures one could use to try and estimate the latency (such as ping time), but this is often made difficult by a CDN being in the middle.
There's not much to say about this. The relevant messages have the exchange_time_utc field and you can measure latency as now() - exchange_time_utc.
The accuracy is obviously a function of factors such as relative NTP synchronization, truncation (by the exchange) and perhaps throttling or conflation (by the exchange) in some scenarios.
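As a sketch, assuming exchange_time_utc is a std::chrono duration since the UNIX epoch:

```cpp
#include <chrono>

// Sketch: market data latency relative to the exchange timestamp.
// Accuracy depends on NTP synchronization between host and exchange.
template <typename MarketData>
auto market_data_latency(MarketData const &update) {
  auto now = std::chrono::system_clock::now().time_since_epoch();
  return now - update.exchange_time_utc;
}
```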
Request Round-Trip Latency¶
This is the time between the gateway sending an order action and it receiving a response (which can be matched to the request).
This latency is communicated as the round_trip_latency field of the roq::OrderAck struct. The calculation is origin_create_time - request.receive_time.
Note
This is an asynchronous calculation. The request caches the receive_time from the order action event. The exchange response will be associated with an origin_create_time.
Warning
This is technically wrong since we include the internal gateway processing time for the request. The reason is that request creation is part of the processing which happens before the final request message (for the exchange) is created.
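For illustration, the asynchronous calculation amounts to something like this sketch (the struct and function names are hypothetical):

```cpp
#include <chrono>

// Sketch of the asynchronous round-trip calculation.
struct PendingRequest {
  // cached from the order action event when the request arrives
  std::chrono::nanoseconds receive_time;
};

// the exchange response is associated with its own origin_create_time
std::chrono::nanoseconds round_trip_latency(
    PendingRequest const &request,
    std::chrono::nanoseconds origin_create_time) {
  // note: includes the gateway's internal processing time for the request
  return origin_create_time - request.receive_time;
}
```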
Profiling¶
Key functions are automatically profiled to help with the following:

Easily monitor the “cost” of the different functions
Interrupts and thread scheduling can cause significant spikes in processing time
Understand where to optimise code

There are also drawbacks making this latency measure less useful:

Some profiled functions are top-level and naturally branch into other function calls depending e.g. on the type of required processing. For example, a generic top-level function could be parse, which can branch into parse_top_of_book and parse_market_by_price. The complexity of those two sub-functions differs enough to make the top-level profiling seem noisy.
The cost of processing a single update may vary considerably. For example, parsing an initial snapshot of a full order book is orders of magnitude more complex than parsing an incremental update.
Metrics¶
Various metrics are captured and made available to e.g. Prometheus.
request_latency
Described above.
heartbeat
Gateways will automatically ping connected clients. Connected clients will automatically echo to the sending gateway. This metric records half the time between gateway sending ping and receiving a response. In other words: this is the 1-way latency between a gateway and client.
This latency is a function of NUMA and measures how busy the client is. In particular, this latency will increase if a strategy blocks for a considerable time.
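For example, if a gateway sends a ping at t=100µs and receives the echoed response at t=140µs, the recorded heartbeat latency is (140 - 100) / 2 = 20µs.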
inter_process
Note
This metric should be correlated with heartbeat.

Gateways automatically attach a timestamp when a message is enqueued to the shared memory buffer. Clients can directly measure the inter-process buffer time as the difference between the current time and the sending time.
This is a measure of how late a message is. Latency will increase if the client cannot keep up with the message rate of the broadcast queue.
round_trip
Order actions are triggered by internal or external events. The origin could be a message, a timer or a bridged event.
This is a measure of the time between the timestamp at origin and the time when the order action has been finally processed by the gateway.
Note
“Processed” could mean having sent a message to the exchange, or that some error condition was detected (like a network issue or failed validation).
end_to_end_latency
Note
This metric is currently only computed by the FIX bridge.
This is a measure of the time between the timestamp at origin and the time when a message is forwarded to a bridged client.