DDD Persistence: Recorded Event-Driven Persistence
When you decide to implement your business logic by applying DDD, one of the things you’ll run into is ‘how do I save my changes?’ The internet is full of blogs and articles about the mythical DDD repository, but all they offer is an interface. How do you actually implement it?
The interface
First, let’s take a look at what the interface for the repository should look like. Before we can do that, though: what should a repository actually do? Per Martin Fowler’s Patterns of Enterprise Application Architecture Catalog:
[The repository] mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.
Alright, so at the very least, we should have a method for accessing the entire collection. And it probably makes sense to be able to single out one item if you already know its unique identifier. Let’s not get into super-generic repositories1 and say that our domain models have a unique identifier that is truly unique: a GUID. Our domain is going to be a pretty simple one: placing and modifying orders.
public interface IOrderRepository
{
IEnumerable<Order> GetAll();
Order GetById(Guid orderId);
}
Right, that takes care of reading. We should of course also be able to create, update and delete orders. Wait, no, that sounds too much like CRUD, which we don’t like because we’re currently in our Ivory DDD Tower. We should be able to modify the collection of orders. Much better.
public interface IOrderRepository
{
IEnumerable<Order> GetAll();
Order GetById(Guid orderId);
void Add(Order order);
void Update(Order order);
void Remove(Order order);
}
We’ve nicely side-stepped the CRUD anti-pattern by using different verbs (mostly) for the repository methods. Great!
YesSQL
So, let’s go about implementing this interface. Fine, you’ll say. Just spin up a MongoDB instance, and then implement GetAll by retrieving the entire collection and GetById by doing .FindAsync(Builders<Order>.Filter.Eq(o => o.OrderId, orderId)). Add, Update and Remove basically map to InsertOneAsync, UpdateOneAsync, and DeleteOneAsync, respectively.
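In sketch form, these are almost one-liners with the MongoDB .NET driver. The _orders field (an IMongoCollection<Order>) and its wiring are assumptions, but GetById would be little more than:

public Order GetById(Guid orderId)
{
    // _orders is an IMongoCollection<Order>; assumes OrderId is mapped as the document key.
    return _orders
        .Find(Builders<Order>.Filter.Eq(o => o.OrderId, orderId))
        .FirstOrDefault();
}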
Whoa there, not so fast. We’re already paying through the nose for this big-ass Oracle cluster that we have just sitting there, waiting to be used. I’d like you to store data in Oracle, if you please. And if you’ll turn to page 257 of the Bible (sorry, the Coding Guidelines as set forth by our Enterprise Architect), you’ll notice that it is not allowed to store documents in a table. We can’t run our daily reports and business intelligence cubes on JSON! So, a table schema in proper 3NF, please.
Alright. Oracle it is.
The Order aggregate
At this point, we should probably take a look at our domain model. What is an order, really? Let’s ask our business analyst.
Well, this is just an MVP, so an order is really simple. It has some customer and shipping information (primarily the address) and some price stuff. Oh, and order lines. And you should be able to modify an order before you finally send it off to be fulfilled.
For the sake of the example, let’s not get into a discussion about what an order line is. An order line is simply a product number (a SKU), a quantity and a per-item price. So, we have something like this:
public class Order
{
public Guid OrderId { get; }
public Address ShippingAddress { get; }
public Customer Customer { get; }
public decimal TotalPriceIncludingVat { get; }
public decimal TotalPriceExcludingVat { get; }
public IReadOnlyCollection<OrderLine> OrderLines { get; }
public bool IsReadyForFulfillment { get; private set; }
}
public class OrderLine
{
public string Sku { get; }
public int Quantity { get; private set; }
public decimal PerItemPriceIncludingVat { get; }
}
Looks easy enough. What about the operations?
You should be able to create a new order, obviously, add and remove order lines, update an order line’s quantity, and finally send the order off for fulfillment.
So, something like this:
public class Order
{
// rest of class omitted for brevity
public static Order CreateNew(Address shippingAddress, Customer customer) { /* ... */ }
public void AddOrderLine(string sku, int quantity, decimal perItemPriceIncludingVat) { /* ... */ }
public void RemoveOrderLine(string sku) { /* ... */ }
public void ChangeOrderLineQuantity(string sku, int quantity) { /* ... */ }
public void Fulfill() { /* ... */ }
}
Now, the implementations of these operations are omitted for brevity, but you can imagine what they would be like. ChangeOrderLineQuantity, for example, would look up the order line with the provided SKU, and modify its Quantity property.
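As a sketch (and anticipating a bit: this assumes the order lines live in a private _orderLines list and that OrderLine exposes an internal ChangeQuantity method, which is exactly the shape we end up with later in this post):

public void ChangeOrderLineQuantity(string sku, int quantity)
{
    // Look up the line by SKU and change its quantity.
    // Real code would also handle the case where no such line exists.
    var orderLine = _orderLines.First(l => l.Sku == sku);
    orderLine.ChangeQuantity(quantity);
}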
Persist all the things!
Now, how do we persist this to an Oracle database? Let’s first take a look at what the orchestrating code (the ‘use case’) should look like. I specifically want to look at the use case of adding an order line, because that’s where it starts to get interesting.
public void AddOrderLine(Guid orderId, string sku, int quantity)
{
var order = _orderRepository.GetById(orderId);
var perItemPrice = _productRepository.GetBySku(sku).PriceIncludingVat;
order.AddOrderLine(sku, quantity, perItemPrice);
_orderRepository.Update(order);
}
What about the database schema? Well, it’s pretty straight-forward; there’s an ORDER table that contains the orders themselves, and an ORDERLINE table that contains the order lines. ORDERLINE has a foreign key that references back to the ORDER table, to keep things nice and consistent.
create table "ORDER"
(
ORDERID raw(16) primary key not null,
SHIPPINGADDRESS varchar(1000) not null,
CUSTOMEREMAIL varchar(250) not null,
READY char not null,
TOTALPRICEINCVAT number not null
);
create table "ORDERLINE"
(
ORDERID raw(16) not null,
SKU varchar(50) not null,
QUANTITY int not null,
PRICEINCVAT number not null,
constraint ORDERLINE_PK primary key (ORDERID, SKU),
constraint ORDERLINE_ORDER_FK foreign key (ORDERID) references "ORDER" (ORDERID)
);
Implementing creating and deleting an order is pretty much self-explanatory, so let’s try and implement IOrderRepository.Update(), which should be more of a challenge. Updating the order itself is fairly easy. We just do something like this:
public void Update(Order order)
{
_connection.Execute(
"update ORDER set " +
"SHIPPINGADDRESS = :SHIPPINGADDRESS, " +
"CUSTOMEREMAIL = :CUSTOMEREMAIL, " +
"READY = :READY, " +
"TOTALPRICEINCVAT = :TOTALPRICEINCVAT "
"where ORDERID = :ORDERID",
new
{
ShippingAddress = order.ShippingAddress.ToString(),
CustomerEmail = order.Customer.EmailAddress,
Ready = order.IsReadyForFulfillment ? 'Y' : 'N',
TotalPriceIncVat = order.TotalPriceIncludingVat,
OrderId = order.OrderId.ToByteArray(),
}
);
}
Done. On to the order lines. So... foreach (var line in... in what? How do we know what the state of an order line is? We don’t know which order lines are new or existing, or which of the existing ones are modified or unmodified (or ‘clean’ or ‘dirty’, if you will).

We could just iterate over all of them, see if they already exist in the database, and either insert or update all of them. Yuck. That’s not very elegant at all, and it doesn’t scale well when we have an order with hundreds or thousands of lines, or when there is heavy load on the system.
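To make the objection concrete, the naive version would be an upsert loop along these lines (a sketch only, reusing the Dapper-style _connection from above; note that it also quietly ignores lines that were removed from the order):

foreach (var line in order.OrderLines)
{
    // One round-trip per line just to find out whether it already exists.
    var exists = _connection.ExecuteScalar<int>(
        "select count(*) from ORDERLINE where ORDERID = :ORDERID and SKU = :SKU",
        new { OrderId = order.OrderId.ToByteArray(), line.Sku }) > 0;

    if (exists)
        _connection.Execute(
            "update ORDERLINE set QUANTITY = :QUANTITY where ORDERID = :ORDERID and SKU = :SKU",
            new { OrderId = order.OrderId.ToByteArray(), line.Sku, line.Quantity });
    else
        _connection.Execute(
            "insert into ORDERLINE (ORDERID, SKU, QUANTITY, PRICEINCVAT) " +
            "values (:ORDERID, :SKU, :QUANTITY, :PRICEINCVAT)",
            new { OrderId = order.OrderId.ToByteArray(), line.Sku, line.Quantity, PriceIncVat = line.PerItemPriceIncludingVat });
}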
Another solution would be to sneak some extra properties into the domain model, so that it keeps track of those things itself. So we add IsNew and IsModified to the order and order line model. How about deletion? We could introduce an IsDeleted property on the order, but for order lines this presents a problem: if we use a property to indicate a line was deleted, we’d have to be careful not to include those deleted lines in any business logic that involves the current order lines of an order, such as checking whether there’s already an order line for a particular SKU. We could introduce a separate collection of order lines that have been deleted; then we can delete an order line from OrderLines and add it to DeletedOrderLines for the repository. You can see how this quickly gets out of hand, and in addition to that, DeletedOrderLines and IsModified are not things that the business cares about. Let’s try and keep the domain model clean.
How about we have the repository keep track of all the entities and associated objects it returns? That’s what an ORM framework does. It usually works by capturing the state of an object before it’s returned to the caller, and comparing that snapshot with the current state when saving; it’s really kind of a brute-force approach, and it means you’ll have to deal with the increased memory usage and processing time it takes to keep and process this additional state. Also, you’ll have to pick, configure, and wire up an ORM framework, which can be difficult to get right, even with something as ubiquitous as Entity Framework. Or you can write one yourself, which is incredibly difficult to get right.
Let’s try a different approach.
Persisting value types
One of the solutions we have is to modify the domain model a bit so that, instead of returning void, operations return a value object describing the change. We then modify the repository interface to have a single method for each kind of change. Yes, this could work. It also has the benefit of making it very clear when a new operation is not yet implemented, because the repository interface won’t have a method for it yet. The operations on the domain model would look something like this:
public class Order
{
// rest of class omitted for brevity
public static Order CreateNew(Address shippingAddress, Customer customer) { /* ... */ }
public OrderLineAdded AddOrderLine(string sku, int quantity, decimal perItemPriceIncludingVat) { /* ... */ }
public OrderLineRemoved RemoveOrderLine(string sku) { /* ... */ }
public OrderLineQuantityChanged ChangeOrderLineQuantity(string sku, int quantity) { /* ... */ }
public OrderFulfillmentRequested Fulfill() { /* ... */ }
}
Note that CreateNew still returns an Order, because when creating a new order there is no ambiguity; everything is new. Also note that the type names of the value objects returned are formulated like events: OrderLineAdded, OrderLineRemoved, et cetera. They all describe things that have already happened. You could also think of them as very explicit deltas: differences between the state of an order at two points in time.
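These event types are plain, immutable value objects. As a minimal sketch of one of them (the constructor shape is an assumption, but the properties match how the event is used later in this post):

public class OrderLineQuantityChanged
{
    public OrderLineQuantityChanged(Guid orderId, string sku, int quantity)
    {
        OrderId = orderId;
        Sku = sku;
        Quantity = quantity;
    }

    public Guid OrderId { get; }
    public string Sku { get; }
    public int Quantity { get; }
}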
Next, let’s modify the repository interface to make this work.
public interface IOrderRepository
{
IEnumerable<Order> GetAll();
Order GetById(Guid orderId);
void Add(Order order);
void Store(OrderLineAdded @event);
void Store(OrderLineRemoved @event);
void Store(OrderLineQuantityChanged @event);
void Store(OrderFulfillmentRequested @event);
void Remove(Order order);
}
The implementation almost writes itself. For brevity, I won’t spell it out here, but you should be able to imagine what it would look like.
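Still, to make one of them concrete, here is a sketch of the Store overload for OrderLineAdded, assuming Dapper-style access to the ORDERLINE table defined earlier and assuming the event carries the order id, SKU, quantity and per-item price:

public void Store(OrderLineAdded @event)
{
    // Adding an order line simply means inserting one ORDERLINE row.
    _connection.Execute(
        "insert into ORDERLINE (ORDERID, SKU, QUANTITY, PRICEINCVAT) " +
        "values (:ORDERID, :SKU, :QUANTITY, :PRICEINCVAT)",
        new
        {
            OrderId = @event.OrderId.ToByteArray(),
            @event.Sku,
            @event.Quantity,
            PriceIncVat = @event.PerItemPriceIncludingVat,
        }
    );
}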
There are some problems, however. The initial version of the interface is ideal for when you want to persist data in a NoSQL database, because you can simply serialize your object graph, write it to the database, and you’re done. The second version of the interface, with the value types, is ideal for when you want to persist data in an RDBMS, because for each event you run some SQL queries that get your data in the right state. This is pretty cumbersome to do when your storage mechanism is a NoSQL database. Choosing one interface pattern over the other pretty much locks your application into the associated storage mechanism.
Having to define a new method on the repository interface for each operation that your application supports also exposes a flaw that is the flip side of one of its benefits. The interface is a contract, and every time you extend that contract by adding a new operation, all of its implementations have to be modified as well, which violates the Open/Closed Principle.
Persisting events
So, how do we solve this conundrum? Driving persistence based on events seems like a good idea, but how do we avoid the Open/Closed Principle violation? Let’s use that angle as the basis for our solution. Let’s go back to a single ‘save’ method.
public interface IOrderRepository
{
IEnumerable<Order> GetAll();
Order GetById(Guid orderId);
void Store(IEnumerable<IEvent> events);
}
At least the interface is now much cleaner. Note that, as well as condensing all of the Store methods into one method, the Add and Remove methods have also been removed. So how do we add a new order? That’s an event! How do we remove an order? Also an event! The Store method accepts a collection of IEvent, which can just be a marker interface to make it explicit what kind of objects we accept.
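The marker interface itself is trivial, and creation and removal become just two more event types. The names and properties below are illustrative (the post doesn’t define them):

public interface IEvent { }

public class OrderCreated : IEvent
{
    public OrderCreated(Guid orderId, string shippingAddress, string customerEmail)
    {
        OrderId = orderId;
        ShippingAddress = shippingAddress;
        CustomerEmail = customerEmail;
    }

    public Guid OrderId { get; }
    public string ShippingAddress { get; }
    public string CustomerEmail { get; }
}

public class OrderRemoved : IEvent
{
    public OrderRemoved(Guid orderId) => OrderId = orderId;

    public Guid OrderId { get; }
}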
This design still has some problems, though. Because the interface accepts a collection of events, the use case (or service, or whatever you want to call it) has to collect a list of events as it’s performing operations on the order. That shouldn’t be the use case’s job. Also, what if we decide to use a NoSQL database and just want to store a serialized version of the order?
Let’s refine it once more.
public interface IOrderRepository
{
IEnumerable<Order> GetAll();
Order GetById(Guid orderId);
void Store(Order order);
}
That’s better. Now it truly does not matter what kind of persistence mechanism you want to use, and the use case can stay truly ignorant about this fact. But who’s collecting the events? It’s the aggregate root itself. Each operation returns void again and, in addition to modifying the aggregate root’s state, it’s adding an event to a list of events that have not yet been persisted.

What do the internals look like? I’m going to focus on a single use case: changing an order line’s quantity. First, let’s look at the Order and OrderLine classes.
public class Order
{
public void ChangeOrderLineQuantity(string sku, int quantity)
{
var orderLine = _orderLines.First(l => l.Sku == sku);
orderLine.ChangeQuantity(quantity);
_eventsPendingPersistence.Add(
new OrderLineQuantityChanged(
orderId: OrderId,
sku: sku,
quantity: quantity
)
);
}
public IReadOnlyCollection<IEvent> DequeueAllEvents()
{
var events = _eventsPendingPersistence.ToList();
_eventsPendingPersistence.Clear();
return events;
}
public IReadOnlyCollection<OrderLine> OrderLines => _orderLines;
private readonly List<OrderLine> _orderLines = new List<OrderLine>();
private readonly List<IEvent> _eventsPendingPersistence = new List<IEvent>();
}
public class OrderLine
{
// Sku is assigned in the constructor, which is omitted here for brevity.
public string Sku { get; }

internal void ChangeQuantity(int quantity)
{
Quantity = quantity;
}

public int Quantity { get; private set; }
}
The DequeueAllEvents method provides the repository access to the events that have not yet been persisted. It’s called ‘dequeue’ because it returns all the events in the ‘queue’, and then empties the queue. It’s implemented using a List<T> rather than a Queue<T>, because we never dequeue events one at a time; we always drain the whole list in one go.
The use case code is trivially simple. Get the order, operate on it, store it.
public void ChangeOrderLineQuantity(Guid orderId, string sku, int quantity)
{
var order = _orderRepository.GetById(orderId);
order.ChangeOrderLineQuantity(sku, quantity);
_orderRepository.Store(order);
}
What about the repository code? It gets an order, uses DequeueAllEvents to get all its events and then dynamically dispatches them.
class OrderRepository: IOrderRepository
{
public IEnumerable<Order> GetAll() { /* not implemented for brevity */ }
public Order GetById(Guid orderId) { /* not implemented for brevity */ }
public void Store(Order order)
{
var events = order.DequeueAllEvents();
DispatchAllEvents(events);
}
private void DispatchAllEvents(IReadOnlyCollection<IEvent> events)
{
foreach (var @event in events)
DispatchEvent(@event);
}
private void DispatchEvent(IEvent @event)
{
Handle((dynamic) @event);
}
private void Handle(OrderLineQuantityChanged @event)
{
_connection.Execute(
"update ORDERLINE set " +
"QUANTITY = :QUANTITY " +
"where ORDERID = :ORDERID and SKU = :SKU",
new
{
OrderId = @event.OrderId.ToByteArray(),
@event.Sku,
@event.Quantity,
}
);
}
}
Pretty straight-forward. The DispatchEvent method does something peculiar: it casts the event to dynamic. This is the most straight-forward approach to dispatching events, and it also happens to be one of the fastest.
Sidebar: dynamic performance
Let me briefly dig into that statement. I ran a benchmark using BenchmarkDotNet, comparing various methods of dispatching events: using dynamic, using reflection, using a lambda, and using pattern matching. The results: of course pattern matching using the switch statement is the fastest option at around 6 ns per invocation on .NET Core, followed closely by... dynamic at around 12 ns per invocation. Reflection, with a pre-warmed MethodInfo cache, is a lot slower at around 197 ns per invocation2. I expected reflection to be ‘slow’, but I was pleasantly surprised at how lightning-quick dynamic is, especially on .NET Core. Mind you, we’re still talking about nanoseconds; 200 nanoseconds is one five-millionth of a second.

For reference, my benchmark code can be found in this Gist. You can easily see that the DynamicDispatch class is by far the smallest and easiest to read, and there is no performance-related reason not to do this.
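For comparison, a pattern-matching dispatcher (a sketch of the alternative, not the code from the benchmark Gist) would replace DispatchEvent with a switch:

private void DispatchEvent(IEvent @event)
{
    switch (@event)
    {
        case OrderLineQuantityChanged e:
            Handle(e);
            break;
        // ...one case per event type. The compiler won't warn you about a missing
        // case here, which is why you still want a test per event type either way.
        default:
            throw new InvalidOperationException($"Unsupported event type {@event.GetType()}.");
    }
}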
Advantages & Disadvantages
So now that we’ve established the ‘event-driven persistence’ pattern, let’s look at some of its advantages and disadvantages.
Advantages
- It allows for easy implementation of your repository for different persistence mechanisms; document-based, relational database, or Event Sourcing are all very easy to implement.
- Even if you’re not going for Event Sourcing, but you still want to store, broadcast, or publish events in your application, you basically get this for free.
- Implementing persistence for a relational database is exceedingly easy, without having to rely on an ORM framework.
- Just one method for persistence. Even creation and deletion can be represented as events, after all.
- Compared to the ‘persist value objects from the domain model’ pattern, transaction management can remain inside the persistence layer, rather than having to be managed by the orchestrating code (see the sketch after this list).
- Your domain model will remain clean of persistence-related artifacts.
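To illustrate that transaction point: the sketch below is my own addition, not code from earlier in the post. It assumes an ADO.NET connection and passes the transaction to Dapper explicitly (so the Handle methods grow an extra parameter compared to the version shown earlier), but it shows how Store can wrap all of an aggregate’s events in a single transaction without the use case ever knowing:

public void Store(Order order)
{
    var events = order.DequeueAllEvents();

    // One local transaction around all the events for this aggregate.
    using (var transaction = _connection.BeginTransaction())
    {
        foreach (var @event in events)
            Handle((dynamic) @event, transaction);

        transaction.Commit();
    }
}

private void Handle(OrderLineQuantityChanged @event, IDbTransaction transaction)
{
    _connection.Execute(
        "update ORDERLINE set QUANTITY = :QUANTITY where ORDERID = :ORDERID and SKU = :SKU",
        new { OrderId = @event.OrderId.ToByteArray(), @event.Sku, @event.Quantity },
        transaction
    );
}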
Disadvantages
- Depending on the complexity of your domain model, you’re going to have to model a lot of events.
- With dynamic dispatch you give up static type safety; only pattern matching gives you that. On the other hand, even with pattern matching you’ll want to verify that every event type is actually handled, so you’ll have to write tests for all of the event types anyway.
Sidebar: On modeling events
One important characteristic to keep in mind when modeling the events your domain model will raise is this: events are things that happened in the past, and since you cannot change the past, they should be immutable. I’m saying should, but when you implement persistence using Event Sourcing, or when you expose these events to the outside world, that should becomes a must.
What this means is that you must not design an event like this:
class OrderLineAddedEvent
{
public OrderLine NewOrderLine { get; }
}
But this looks immutable, you might say. Yes, but OrderLine, being part of the domain model, is almost certainly mutable, which makes the entire event mutable. It means that by the time you handle or serialize this event, the state of the instance you put in there might have changed from when you created the event, and you’ve lost information. That might not be all that bad when all you’re doing is persisting the current state, but when you’re doing Event Sourcing or exposing the event to the outside world, it makes your events lose value.
Instead, design your events like this:
class OrderLineAddedEvent
{
public string Sku { get; }
public int Quantity { get; }
public decimal PerItemPriceIncludingVat { get; }
}
Also, think about how granular you want your events to be. Too coarse, like an OrderChangedEvent that you raise for every change to an order, and it will be difficult to extract what has changed from the event, which is the whole point of this exercise. Too fine-grained, like OrderShippingAddressHouseNumberChanged, and you’ll get an explosion of events and corresponding Handle methods, and a lot of traffic to the database. You want to capture intent. When somebody changes the house number on the shipping address of an order, what they’re doing is either correcting the shipping address or choosing a different shipping address. You might not be able to capture the distinction from your UI and use cases, but at the very least it becomes apparent that you’ll only ever change the address as a whole.
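In code, an event at that level of granularity might look something like this (the name and properties are illustrative, not taken from the domain model above):

public class OrderShippingAddressChanged : IEvent
{
    public OrderShippingAddressChanged(Guid orderId, string street, string houseNumber, string postalCode, string city)
    {
        OrderId = orderId;
        Street = street;
        HouseNumber = houseNumber;
        PostalCode = postalCode;
        City = city;
    }

    public Guid OrderId { get; }

    // The whole address travels with the event, not just the field that happened to change.
    public string Street { get; }
    public string HouseNumber { get; }
    public string PostalCode { get; }
    public string City { get; }
}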
Conclusion
Sorry for the very long post, but the subject matter is unfortunately not very simple. This pattern, let’s name it ‘recorded event-driven persistence’, is simple and elegant, and it allows you to persist complex domain models without your persistence code becoming total spaghetti.
If you think it can be improved, let me know. Also, if this is already a named pattern, or if you can think of a better name than ‘recorded event-driven persistence’, let me know!