Saturday, 31 July 2021

A view into state [backdated draft]

Note: This post is backdated to the date of the last draft (31 Jul 2021) as I changed my job and role and didn't want to bias / inform myself by that. It's an unfinished fragment of my thinking at that point in time that I just cleaned up a little and added references where necessary, but it's still rough and incomplete.

In principle, any stream processor could be used for materialized views
Kleppmann, 2017, p. 467

Martin Kleppmann recently gave a great keynote in which he summarized his ideas under a new framework of unordered vs. ordered events and mutable vs immutable state. In the accompanying paper he also gave an outlook on a new wave of tools that "abstract away" these consistency decisions in event-driven systems, in particular "Materialized Views" e.g. Materialize - a great overview on the different patterns is Jamie's article.

I've been a fan of this idea not only since the famous "database turned inside out" (1) quote but actually when I first worked with Spanner and saw firsthand how it turned into an SQL system which required a surprising amount of abstraction of internal (consistency) concepts. Most surprising was that fundamentally Spanner is a messaging system (in other words it is truly distributed and not built on replication paradigms), and could even be used as a queue by observing changes "inside out". In most databases those changes are row-based and often used (with CDC) as basis for event sourcing architectures, sadly mixing physical considerations with domain events. But I am interested in one level lower, of the changes within a domain event entity, and lineage, the reason why. This interest comes from Dataflow / Beam and its consistent view of temporal locality, something I've been keen on since agent-based simulation using Erlang's actor inbox (2). Beam doesn't use "wall clock" time but time in the sense of event-timeflow or an ordered sequence of events or the spacetime / promise-theory goal "to explain a process ... we need to be able to ... tell a story about its behaviours". I'm interested in exposing this state and making it legible and explainable.


Wouldn't it be great to have a database supporting per row state (consistency) metadata instead of state hidden in a stream processing system?