Distributed systems, incident management and reliability culture, OpenTelemetry, databases and more. A good mix this week of deeply technical posts and more people- and process-focused content.

How do you build a culture of reliability? This post looks at the introduction of a single metric to drive conversations, called a Service Delivery Index.

How best to organise teams for incident management? This post looks at centralised vs distributed structures and the pros and cons of each.

A nice walkthrough of instrumenting an application with OpenTelemetry. The example is in Go, but the introduction does a good job of explaining the fundamentals.

PostgreSQL can be used for a wide range of sometimes surprising database needs. This post explains how to store graph data in Postgres.

How do you make the business case for investing in a new tool? Given the current financial situation in many companies, this is likely a more rigorous process today than before.

As your application grows, invariably you run into data management issues. This long post is a good primer on why, starting with ACID and describing the challenges with distributed transactions and multiple datastores.


Travelgrunt is an interesting tool for navigating a Terraform/Terragrunt monorepo. It’s a replacement for cd, providing shell aliases to make moving around much quicker and more intuitive.

Protobom is a new project providing a protobuf description of an SBOM, aiming to facilitate easier conversion between existing SBOM formats.