The design does not dictate how you should distribute your strategies.
You have these choices:
- Allocate a single strategy to a single process
- Allocate many strategies to a single process
There are different concerns as outlined in the following.
Since the design is single-threaded, make sure each strategy can update fast enough that the process doesn’t fall behind the streams of data arriving from the gateways. There are metrics you can (and should) use to monitor any queuing behaviour.
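One way to detect that a process is falling behind is to compare the time an event was received with the time the strategy actually handles it. The following is a minimal sketch only: `LagMonitor`, its threshold and the nanosecond timestamps are hypothetical names chosen for illustration, and it does not represent the metrics exposed by the gateways.

```python
from collections import deque


class LagMonitor:
    """Tracks handling lag over a sliding window (hypothetical helper)."""

    def __init__(self, threshold_ns, window=1000):
        self.threshold_ns = threshold_ns
        self.samples = deque(maxlen=window)  # most recent lag samples

    def observe(self, receive_time_ns, now_ns):
        """Record how long an event waited before it was handled."""
        lag = now_ns - receive_time_ns
        self.samples.append(lag)
        return lag

    def falling_behind(self):
        """True when the median lag in the window exceeds the threshold."""
        if not self.samples:
            return False
        ordered = sorted(self.samples)
        return ordered[len(ordered) // 2] > self.threshold_ns
```

A sustained rise in the median suggests queuing rather than a one-off spike, which is the situation where splitting strategies across processes is worth considering.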
You should choose a single strategy per process if you only have a small number of strategies and you want to achieve the absolute lowest latency. Each process should then be configured for busy polling and pinned to a distinct CPU core.
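As an illustration of giving each process its own core, the sketch below plans a one-to-one mapping and pins the calling process. The function names are hypothetical; `os.sched_setaffinity` is a standard-library call that is only available on Linux, so the pinning step is guarded. How busy polling is enabled is specific to the client configuration and is not shown.

```python
import os


def plan_core_assignment(strategies, cores):
    """Map each strategy name to its own core; fail if there are too few cores."""
    cores = sorted(cores)
    if len(strategies) > len(cores):
        raise ValueError("not enough cores for one strategy per core")
    return dict(zip(strategies, cores))


def pin_current_process(core):
    """Pin the calling process to a single core (Linux only).

    Returns False on platforms without sched_setaffinity.
    """
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {core})  # pid 0 means the calling process
        return True
    return False
```

Each strategy process would call `pin_current_process` once at start-up with the core taken from the plan.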
Having a single process for many strategies may be perfectly fine if you don’t need very low latency. The easiest design is when the strategies are completely orthogonal on symbols.
CPU cores achieve cache coherence using something like the MOSI protocol. The shared memory queue between each gateway and its connected clients uses this to great effect: distribution happens automatically at the hardware level. However, the hardware relies on a bus to communicate the state of each cache line. This bus has finite capacity, and we expect performance to degrade as the number of connected clients grows.
We only recommend a single strategy per process when the number of strategies is fairly small. There’s no easy answer to “what is fairly small?”. It depends on the choice of CPU and possibly other factors.
Another concern is around cache thrashing. You are less likely to keep data in the L1/L2 caches when you run many strategies in a single process.
The gateways offer the following services
All of these are managed per connected “user” (= process for the purposes of this discussion). In other words, the gateways do not offer more granular protection if you decide to bundle many strategies in a single process.
The main concern is around routing order acks/updates and fills to the relevant strategy.
This is not a challenge if you have many strategies with no overlap on the traded symbols (completely orthogonal).
However, you must route the events to the right strategy if there is overlap.
You can sometimes use the routing_id field to help you.
This does however rely on exchange support: the field is appended to a unique
client order ID generated by the gateway and must adhere to the rules of the
exchange (max length of client order ID and very often using a specific
Some exchanges, such as Coinbase PRO, only allow UUIDs for client order IDs, in which case routing_id is not supported.)
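The routing concern for a bundled process can be sketched as a small dispatcher. This is an assumption-laden illustration, not the library's API: events are plain dictionaries with hypothetical "routing_id" and "symbol" keys, and `Strategy` and `Router` are made-up names. The routing_id is used when present; otherwise the symbol is used, which only works when exactly one strategy trades it.

```python
from dataclasses import dataclass, field


@dataclass
class Strategy:
    name: str
    fills: list = field(default_factory=list)

    def on_fill(self, event):
        self.fills.append(event)


class Router:
    """Dispatches fills to strategies sharing one process (hypothetical)."""

    def __init__(self):
        self._by_routing_id = {}
        self._by_symbol = {}

    def register(self, strategy, routing_id, symbols):
        if routing_id in self._by_routing_id:
            raise ValueError(f"duplicate routing_id: {routing_id!r}")
        self._by_routing_id[routing_id] = strategy
        for symbol in symbols:
            self._by_symbol.setdefault(symbol, []).append(strategy)

    def dispatch(self, event):
        routing_id = event.get("routing_id")
        if routing_id:
            strategy = self._by_routing_id.get(routing_id)
            if strategy is None:
                raise LookupError(f"unknown routing_id: {routing_id!r}")
        else:
            # Without a routing_id the symbol must identify one strategy.
            candidates = self._by_symbol.get(event["symbol"], [])
            if len(candidates) != 1:
                raise LookupError(f"cannot route by symbol: {event['symbol']!r}")
            strategy = candidates[0]
        strategy.on_fill(event)
        return strategy
```

When the exchange does not support routing_id and symbols overlap, the fallback raises, which is the case where the process would need its own order bookkeeping to resolve ownership.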
You have the option to auto-cancel working orders per account. This is a policy you can set when configuring the client (process).
Again, this is something that may or may not affect individual strategies when you bundle many strategies together.
In fact, it may even affect other clients (processes). This could manifest itself if you choose to auto-cancel by user instead of by account – sometimes this generates many requests to the exchange which will in turn affect the rate limiters for all connected users.
You probably want to split strategies across different processes if there is some probability that you may have to restart a single strategy during a trading session.
You would have to restart all strategies if you bundled many strategies in a single process and one strategy turned out to be misconfigured.
Sometimes you need processing which isn’t fast enough to keep up with the stream
of data arriving from the gateways.
This will often be some heavier model calibration.
We suggest that you move this calculation to a separate process and make use
of the timer to initiate the computation step.
Between timer updates you will receive the events and you simply use those to
update some internal storage for current state or some history of what happened.
You can use CustomMetrics to push the calibrated model to other
consumers once the heavy computation completes.
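The pattern described above (cheap state updates on every event, heavy work only on the timer) might look roughly like this. It is a sketch under stated assumptions: `CalibrationWorker`, `on_event` and `on_timer` are hypothetical names, and the `publish` callback merely stands in for whatever mechanism pushes the result to consumers.

```python
from collections import deque


class CalibrationWorker:
    """Accumulates observations between timers and calibrates on the timer."""

    def __init__(self, calibrate, publish, history=10_000):
        self._calibrate = calibrate  # heavy function: list of observations -> model
        self._publish = publish      # stand-in for pushing the model to consumers
        self._observations = deque(maxlen=history)

    def on_event(self, observation):
        # Keep this cheap: only store what the calibration will need later.
        self._observations.append(observation)

    def on_timer(self):
        # The heavy computation runs here, not in the event handler.
        if not self._observations:
            return None
        model = self._calibrate(list(self._observations))
        self._publish(model)
        return model
```

Because this runs in its own process, a slow `on_timer` call delays only the calibration and not the latency-sensitive strategies.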
An alternative design is to use a secondary thread to compute anything heavy.
This design relies on CustomMessage to share information with the
primary thread (the poll loop).
The threaded design is more complex because it relies on you deciding on some
codec (JSON, for example) to share data between the threads.
You must also decide on some method for copying data from the primary thread
(could be a queue, a global object, etc.).
You might then be tempted to start using a mutex and then you do not have a
low latency design anymore.
We strongly recommend using a separate process and
then you automatically get the benefits of lock-free queues and a very fast
This is more of a warning to not enable high log verbosity without thinking about the physical constraints.
You may find that having many processes isn’t great when you also need higher log verbosity. This is because the processes often write to the same service (filesystem, journald, …), which then becomes a contended resource.
Having fewer processes reduces contention on the logging resource, and you leverage buffering to a higher degree.
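To illustrate what buffering buys you, the sketch below uses Python's standard `logging.handlers.MemoryHandler` to batch records before handing them to a target handler. It is an analogy only: `make_buffered_logger` is a hypothetical helper, and the logging facilities used by the gateways and clients are configured differently.

```python
import logging
from logging.handlers import MemoryHandler


def make_buffered_logger(name, target, capacity=256):
    """Return a logger that forwards records to `target` in batches.

    Records are flushed when `capacity` records are buffered or when a
    record at ERROR level or above arrives.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output through the root logger
    logger.addHandler(
        MemoryHandler(capacity, flushLevel=logging.ERROR, target=target)
    )
    return logger
```

With fewer processes, more records share the same buffer, so the underlying service sees fewer and larger writes.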