DEVOPS WEEKLY ISSUE #518 - 29nd November 2020

2 minute read

Incident reports, books covering devops topics, KubeCon reports and several interesting new tools this week.

Env0 sponsors Devops Weekly

env0 lets your entire team manage their own cloud environments, governed by your policies and templates. Now manage your users even easier with Teams and RBAC.
http://env0.com/l/devops-teams-launch

News

A look at a number of different security incidents involving AWS customers, inferring patterns and making some recommendations.
https://speakerdeck.com/ramimac/learning-from-aws-customer-security-incidents

An up-to-date list of books to read on the various facets of devops. Most interesting is the choose-your-own-adventure tool to help with deciding what you should read next.
https://octopus.com/blog/devops-reading-list

Two good, detailed, writeups from the recent KubeCon virtual event. Key takeaways and reviews of a selection of the sessions on Kubernetes and related technologies.
https://blog.getambassador.io/kubecon-na-2020-key-takeaways-platforms-safety-and-end-users-cb6df12082e6
https://firehydrant.io/blog/kubecon-north-america-2020-wrapup/

A writeup of the recent AWS US-East-1 outage. Reading incident reports is always interesting, but few are at quite this scale.
https://aws.amazon.com/message/11201/

How do we know that databases really behave as advertised at scale? The Jepsen testing tool is helping here, and the new checker described in a recent paper helps identify even more issues.
https://blog.acolyer.org/2020/11/23/elle/

A KubeCon session on disaster recovery. Discussion of outbreaks, being on-call, emergency response and eventual prevention, with the interesting twist being a comparison of site reliability engineering and the ongoing pandemic healthcare response.
https://thenewstack.io/kubecon-lessons-in-disaster-recovery-from-covid-19-and-site-reliability-engineering

Sponsored

Fire Drills: A Guide to Preparing for Your Next Incident shows you how to design and run a fire drill for your SRE team. In this Whitepaper from Elieser Pereira - Customer Reliability Engineer at Container Solutions, you will see the roles of the game master, the on-call engineer, a sample scenario and what questions to expect when all seems bleak during an outage. Download the free PDF.
https://bit.ly/3mgyY6k

Jobs

I’m on the lookout for an experienced product leader for a Director of Product Management role at Snyk. The job would involve owning the Snyk Container product, so a strong focus on Kubernetes and container security and on helping developers build secure applications. I’m mainly looking for candidates in the UK or Israel in order to be close to the engineering teams. More details on the Snyk jobs site.
https://boards.greenhouse.io/snyk/jobs/4948658002

Events

The ThoughtWorks Technology Radar has been around for 10 years and covers emerging techniques, tools, platforms, languages and frameworks. Next week, on the 7th of December, join 2 of the people involved in putting together this year’s edition to hear about what’s new.
https://www.thoughtworks.com/radar/technology-radar-webinar-vol-23-uk

Tools

Lots of interest in user interfaces for Kubernetes at the moment, the latest being Headlamp. Interestingly you can run the headlamp dashboard locally or in-cluster. It also features a plugin system for extending its core functionality.
https://kinvolk.io/blog/2020/11/shining-a-light-on-the-kubernetes-user-experience-with-headlamp/
https://github.com/kinvolk/headlamp

cloudquery is a tool that queries the state of a public cloud infrastructure and saves a snapshot of that information in a SQL database for offline querying.
https://github.com/cloudquery/cloudquery
https://docs.cloudquery.io/

Setting up dashboards for every service can end up being a manual and repetitive process. Legend builds and publishes Grafana dashboards for your services with prefilled metrics and alerts, integrating with cloudwatch, prometheus, influxdb and more.


https://github.com/grofers/legend

K6 is an interesting new load testing tool, written in Go but configured using a powerful Javascript DSL.
https://github.com/loadimpact/k6

Updated: