Lots of great operations content this week, with an in-depth look at a public incident report, discussion of game days, operational excellence, shadow organisation charts and more.

From our sponsor, VictorOps

Learn about some more subtle, unknown use cases for using Splunk + VictorOps to drive a more analytical, proactive approach to incident response:
https://go.victorops.com/devopsweekly-splunk-for-analytical-incident-response

News

An in-depth look at the recent report into the IT migration failure at UK bank TSB. Lots of details and some great anecdotes for any enterprise IT or project management folks to learn from.
https://medium.com/@JonHall_/lessons-from-the-tsb-failure-a-perfect-storm-of-waterfall-failures-4f4d2e789b35

This post describes the role game days, and practice in general, play in improving incident management processes.
https://uptime.com/blog/got-game-secrets-of-great-incident-management

Devops conversations often turn to how organisational structure impacts the work we do. This post cleverly looks at organisational structure not through the org chart, but through how people actually work and influence others. When we say we ship the org chart, we need to ask which one.
https://carta.com/blog/the-shadow-organizational-chart/

A nice long post on building a culture of operational excellence. It covers the importance of measurement, training and education, and how tools and culture support each other.
https://medium.com/@adhorn/towards-operational-excellence-c9fe298e27e7

With the ever-present need to manage lots of YAML files, various tools have been emerging to help. This post looks at some of the problems with text-based templating, and explores yq, kustomize and using native JavaScript bindings for Kubernetes.
https://learnk8s.io/templating-yaml-with-code

Lots of details on how logging in Kubernetes works, from the cluster components to the applications you’re running on top.
https://sematext.com/guides/kubernetes-logging/

An example of using Lambda to bridge two other AWS services, in this case AWS Kinesis Firehose and AWS Elasticsearch.
https://saurabh-hirani.github.io/writing/2020/02/09/aws-firehose-throttling-with-lambda

Another look at alternatives to authoring Kubernetes configuration in YAML. This presentation covers using Kotlin and the Kotlin Kubernetes DSL for authoring configuration.
https://codetalks.tv/talk/kotlinconf-2019-unlock-power-of-kotlin-dsl-for-kubernetes-by-fedor-korotkov-yaw0m9kpa8q

Tools

Gops is a handy tool for listing and diagnosing Go processes running on a machine. List the processes, which version of Go was used to compile each binary, network connections and more.
https://github.com/google/gops
