A busy week at Configuration Management Camp (still one of my favourite events) and a busy week for interesting posts as well, covering incident management and on-call, as well as the pros of fast release cycles, coverage of Devopsdays NYC and more.


Asking five (or more) whys is outdated. So is hunting for a single root cause with Root Cause Analysis. Take a look at the case against RCA.


An excellent post on the state of on-call, digging into why it’s important for developers running services, and how to make it manageable.

A great set of notes from the recent Devopsdays New York, covering talks on failure, debugging systems, the history of devops, building teams, Kubernetes and more.

One of the talks from Configuration Management Camp last week, looking at how to effectively test Ansible modules and playbooks using InSpec and Test Kitchen.

A series of posts on incident management in one team, looking at definitions and the integration of chatops into the process.

A detailed look at the performance impact of the Linux kernel page table isolation (KPTI) patches that work around the Meltdown bug.

An interesting observation on software developers and ivory tower architects. I think this applies to infrastructure and operations design too.

A post on the Kubernetes release cycle, which also covers some of the details of how the special interest group model works and the reason why the fast pace of change is a good thing.

Another series of posts, this one on getting started with Puppet on Windows. Covers the basics as well as Facter, Hiera and managing modules with r10k, all from the perspective of a Windows administrator.

A useful slide deck on best practices for securing Kubernetes, with a look at basic threat vectors and how to mitigate the risk.

CNCF - Cloud Native Computing Foundation

Free Webinar - Deployment Strategies on Kubernetes February 13, Online

Take a practical look at the different strategies to deploy an application to Kubernetes. We list the pros and cons of each strategy and define which one to adopt depending on real world examples and use cases.


Lots of problems occur between development and production due to differences in data. Dotmesh aims to solve those problems by allowing you to capture, manage and share the state of your whole application using a git-like CLI tool and a central hub.

The YAML language is actually huge, with features most people don’t use or know about, some of which result in surprising security problems. SafeYaml is an interesting idea: a small subset of YAML enforced by a simple linting tool.
