
Three posts this week, on building reliable systems (both people and technical) at scale, incident response processes and platform engineering, that all feel like modern takes on classic systems operation.

StackHawk sponsors Devops Weekly

StackHawk and Snyk have partnered up to provide a complete set of application security testing tools for engineering and DevOps teams. Learn more:


A good post on reliability patterns, including assuming failure, measuring from the customer perspective, reducing blast radius, self-healing infrastructure and more.

A post on the emergence of developer platform engineering as a way of scaling development team productivity.

Lots of advice on building an incident management process, including running drills, defining severity levels, service ownership, communication planning and more.

A deep post on global distributed systems, looking at debugging Consul and also discussing which problems to solve and which to avoid with architectural decisions.

As you can probably guess from this newsletter, I like learning new things. This next post looks at things communities can do to make learning by lurking easier. A big plus-one on public discourse.

With Kubernetes clusters now ubiquitous, lots of compute jobs have moved to running there, including the venerable cron job. This post explains how to configure CronJob resources and looks at monitoring and logging too.
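For a rough idea of what a CronJob resource looks like, here is a minimal sketch in Python that builds a `batch/v1` CronJob manifest as a plain dict and prints it as JSON. It is not taken from the linked post: the job name, image, command, schedule and the specific limits are all made-up placeholders, and the helper `build_cronjob` is hypothetical.

```python
import json


def build_cronjob(name, schedule, image, command):
    """Return a batch/v1 CronJob manifest as a plain dict (illustrative values)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            # Standard five-field cron expression.
            "schedule": schedule,
            # Skip a run if the previous one is still in progress.
            "concurrencyPolicy": "Forbid",
            # Give up on a run that could not start within 5 minutes.
            "startingDeadlineSeconds": 300,
            # Keep a few finished Jobs around so their logs can be inspected.
            "successfulJobsHistoryLimit": 3,
            "failedJobsHistoryLimit": 5,
            "jobTemplate": {
                "spec": {
                    # Retry a failing Job at most twice.
                    "backoffLimit": 2,
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [
                                {"name": name, "image": image, "command": command}
                            ],
                        }
                    },
                }
            },
        },
    }


# Placeholder values, assumed purely for illustration.
manifest = build_cronjob(
    name="nightly-report",
    schedule="0 2 * * *",
    image="registry.example.com/report:1.0",
    command=["python", "report.py"],
)

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the output could be saved to a file and applied with `kubectl apply -f`. The history limits matter for the monitoring and logging side, because they decide how many finished Jobs stay around to be inspected.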

An up-to-date guide on running Java applications in Docker containers.


Kaar, like tar but for Kubernetes. It packages up all the manifests and container images into a single OCI archive.

Bubblewrap provides a container runtime tool aimed at unprivileged sandboxes. Unlike most existing approaches, it is intended for running untrusted code.