5. Localize State
Take ownership of data by co-locating state and processing
In data-intensive applications and use-cases, it is often beneficial to co-locate state and processing, maintaining strong locality of reference while providing a single source of truth. Co-location enables low-latency, high-throughput data processing and more evenly distributed workloads.
One way of achieving co-location is to move the processing to the state. This can be done effectively by using cluster sharding (e.g. sharding on entity key) of in-memory data, where the business logic is executed in-process on each shard, avoiding read and write contention. Ideally, the in-memory data should represent the single source of truth by mapping to the underlying storage in a strongly consistent fashion (e.g. using patterns like Event Sourcing and Memory Image).
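The combination described above can be sketched in a few lines. This is a minimal, framework-free illustration (all names, the shard count, and the counter domain are hypothetical, not taken from any particular library): commands are routed to a shard by hashing the entity key, each shard owns its entities exclusively, and the in-memory state is a Memory Image rebuilt by replaying an append-only event log.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(entity_key: str) -> int:
    """Route every command for the same entity key to the same shard."""
    digest = hashlib.sha256(entity_key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


class ShardedCounter:
    """Each shard exclusively owns its entities, so state and processing are
    co-located and there is no cross-shard read/write contention. The
    in-memory state is a Memory Image derived from the event log, which
    remains the durable single source of truth (Event Sourcing)."""

    def __init__(self):
        # shard id -> entity key -> current value (the memory image)
        self.shards = [defaultdict(int) for _ in range(NUM_SHARDS)]
        self.event_log = []  # append-only event log

    def increment(self, key: str, amount: int) -> int:
        shard = self.shards[shard_for(key)]
        self.event_log.append(("incremented", key, amount))  # persist the event first
        shard[key] += amount                                 # then update the image
        return shard[key]

    @classmethod
    def replay(cls, events):
        """Rebuild the memory image from the event log, e.g. after a restart."""
        fresh = cls()
        fresh.event_log = list(events)
        for _, key, amount in events:
            fresh.shards[shard_for(key)][key] += amount
        return fresh
```

Because the log is written before the image is updated, the in-memory state can always be reconstructed, which is what lets it stand in for the single source of truth.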
Co-location is different from, and complementary to, caching, where you maintain read-only copies of the most frequently used data close to its processing context. Caching is extremely useful in some situations, in particular when the use-case is read-heavy. But it adds complexity around keeping the cache in sync with its master data, which makes it hard to maintain the desired level of consistency; cached data therefore cannot be used as the single source of truth.
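The staleness problem can be seen in a toy read-through cache (the names and the dict-backed "master" store here are illustrative assumptions, not a real caching API): once the master changes, the cached copy silently diverges unless extra invalidation machinery is added.

```python
class ReadThroughCache:
    """Minimal read-through cache over a master store. Reads are fast,
    but a cached value can go stale after the master changes, which is
    why the cache cannot serve as the single source of truth."""

    def __init__(self, master: dict):
        self.master = master
        self.cache = {}

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.master[key]  # populate on miss
        return self.cache[key]


master = {"price": 100}
cache = ReadThroughCache(master)
assert cache.get("price") == 100
master["price"] = 120              # the master data moves on...
assert cache.get("price") == 100   # ...but the cached copy is now stale
```

Real caches mitigate this with TTLs or invalidation protocols, but each mechanism trades consistency for complexity, which is the point the paragraph above makes.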
Another way to achieve co-location is to move the state to the processing. This can be done by replicating the data to all nodes where the business logic might run, while leveraging techniques that ensure eventually consistent convergence of the data (e.g. using CRDTs with gossip protocols). These techniques have the additional advantage of ensuring a high degree of availability without the need for additional storage infrastructure, and can be used to maintain data consistency across all levels of the stack: across components, nodes, data-centers, and clients, wherever strong consistency is not required.
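A grow-only counter (G-Counter) is one of the simplest CRDTs and shows why such replicated state converges: each replica increments only its own slot, and merging takes the element-wise maximum, so the result is the same no matter in which order gossip delivers the updates. The sketch below is a from-scratch illustration with hypothetical node names, not the API of any particular CRDT library.

```python
class GCounter:
    """Grow-only counter CRDT. Merge is commutative, associative, and
    idempotent (element-wise max), so replicas converge to the same
    value regardless of gossip delivery order or duplication."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}  # node id -> that node's local increment total

    def increment(self, amount: int = 1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter"):
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


# Two replicas accept writes independently, with no coordination...
a, b = GCounter("node-a"), GCounter("node-b")
a.increment(3)
b.increment(2)
# ...and converge once they gossip their state to each other.
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

Because merge is idempotent, re-delivering the same gossip message is harmless, which is exactly the property that lets this style of replication stay available without coordinating writes through shared storage.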