3 minute read

Shameless plug. I’m giving a talk at the upcoming PuppetConf event in San Diego at the end of the month, all about Puppet testing practices. If you’re a Puppet user then I’d love for you to take 5 minutes to complete short survey. https://goo.gl/forms/wvKFi62SHDvcvO2o2. And please share with your Puppet using friends and colleagues.


Downtime costs a lot more than you think. Learn why - and how to make the case for Real-time Incident Management.


Gene Kim shares top lessons learned from The DevOps Handbook

Join The DevOps Handbook co-author Gene Kim as he shares first-hand insights and lessons learned during the five years required to create this essential resource, including DevOps transformation case studies around continuous integration and delivery. Register today!


An excellent talk, along with a full transcript, about incident response. Talk of internal communication, tools, building trust with customers and lots more.

Databases are terrible things and managing data incorrectly can lead to the worst types of outages. With a push for more generalist software folks I liked this post about being an accidental database admin.

Lots of teams adopt code reviews as a powerful way of improving software quality. This post makes a good argument for introducing code reviews, with a good discussion of various anti-patterns.

A series of blog posts with a comprehensive look at Terraform, acting as a solid introduction, tips for management, reusability and more posts incoming.

Lots of Kubernetes users automatically install a container networking solution as well, likely an overlay or other SDN system. But you can run Kubernetes with more traditional networking setups too. This post explains how and why.

A dive into some of the design choices for the Docker SwarmKit orchestration library, covering replication, presence, push vs pull and more.

A continuation of the series from last issue on postmortems, this post looking at identifying the root cause of an incident.

In a system composed of many services working out what went wrong with a single request can be a challenge. This is where request tracing comes in and the following slide deck introduces the concept and demonstrates the use of Zipkin.

A good explanation of some of the operational challenges that microservices introduce, as well as a look at linkerd and namerd for solving some of these problems.

A good presentation on what it means to be cloud native. Focuses on the journey to embracing automation, and on the concepts in the 12 factor app model.

A look at setting up Prometheus to monitor a Docker cluster. Includes lots of code samples for alerts and pictures of the resulting dashboards so you can get a feel for the full monitoring syste,


KubeCon is coming up in Seattle on the 8th and 9th of November. The schedule is packed with low level technical talks as well as lots of production case studies. Something for Kubernetes experts and those just getting started.

Devopsdays Kansas City is happening on October 20th and 21st. The programme covers real world case studies as well as talk of containers and a super interesting sounding talk on human factors and devops. Tickets are available now.

Devopsdays India is coming up on the 4th and 5th of November in Bangalore. Devopsdays really is global at this point so if you’re in the region then it should be a great event.


For lots of workflows setting the GitHub master branch to protected is a sensible choice, but it’s relatively recent addition and you may have lots of repos. Enter Petter, a simple tool to change the setting for all repos in a given organisation.

Zentral is a nice looking tool for host inventory and acting on inventory data, built atop osquery, Santa, Sal, ElasticSearch and with integration with various other tools.

Cillium is another container networking approach, but one which uses experimental kernel features like BFP and XDP. It aims to provide fast in-kernel networking along with string security policy enforcement.

Downtime costs a lot more than you think. Learn why - and how to make the case for Real-time Incident Management.