2 minute read

Interesting posts this week on what causes service outages, visualising cloud infrastructure, operational concerns with service mesh, how to think about serverless and new tools for testing configuration.

From our sponsor, VictorOps

Being on-call sucks. But, DevOps teams are finding ways to collaborate more and improve workflow visibility – leading to more efficient incident detection, response and remediation. Learn exactly how you can start making on-call suck less:
http://try.victorops.com/devopsweekly/making-on-call-suck-less-checklist

News

A reminder that monoliths vs microservices is mainly a discussion of team organisational dynamics. This post looks specifically at cognitive load as a way to right-size service.
https://techbeacon.com/app-dev-testing/forget-monoliths-vs-microservices-cognitive-load-what-matters

Understanding a large cloud infrastructure is an interesting problem. The availability of APIs doesn’t immediately make things accessible in a useful way. This post shows what’s possible if you import that data into a graph database.
https://labs.spotify.com/2019/06/04/painting-a-picture-of-your-infrastructure-in-minutes/

An interesting analysis of a recent paper exploring what bugs actually cause production incidents in a large service. Error handling issues, timing bugs, data formats. A good list to consider for your services.
https://blog.acolyer.org/2019/06/21/what-bugs-cause-cloud-production-incidents/

Service Mesh technologies are picking up lots of interest at the moment, for good reasons. But there are operational complexities and downsides to consider and this post provides a useful summary of some of them.
https://glasnostic.com/blog/service-mesh-istio-limits-and-benefits-part-2

AWS provides a number of building-block services for CI/CD. This repository contains a fully working example of using CodeDeploy and CodeBuild and deploys an application from GitHub to EC2.
https://github.com/dvassallo/github-to-ec2-pipeline

An attempt to describe Serverless in terms of doctrine, with a range of suggestions for high-level principles to implement to gain the purported benefits.
https://medium.com/@PaulDJohnston/serverless-is-a-doctrine-not-a-technology-4193ccb66cfc

A detailed post looking at various sandboxing approaches for containers, from gvisor and IBM Nabla to Firecracker, Kata and unikernels.
https://unit42.paloaltonetworks.com/making-containers-more-isolated-an-overview-of-sandboxed-container-technologies/

Serverless is inherently interesting from a cost perspective. This post features some good gotchas and tips for avoiding being surprised by your bill as your usage grows.
https://epsagon.com/blog/dont-be-surprised-by-your-serverless-bill/

If you’re operating a Kubernetes cluster then knowing how to troubleshoot common issues is useful, and this post summarises hunting down issues with slow scheduling, high CPU usage, failed deployments and more.
https://medium.com/avitotech/kubernetes-issues-and-solutions-2baffe25f40b

Becoming a power user of the command line often means becoming familiar with various tools for parsing output and extracting some specific bit of data. This post is a nice introduction to tools like cut, head, tr, sed and awk.
https://mh9.codes/posts/slice-n-dice/

Tools

Conftest is a new tool I’ve been hacking on for writing tests against configuration files, using the Rego language from Open Policy Agent. It’s useful when authoring config or in a CI pipeline, and the repository has examples with Kubernetes, Terraform and Serverless configs.
https://github.com/instrumenta/conftest
https://garethr.dev/2019/06/introducing-conftest/

Yggdrasil is an Envoy control plane that configures listeners and clusters based off Kubernetes ingresses from multiple clusters. This allows you to have an envoy cluster acting as a mutli-cluster loadbalancer for Kubernetes.
https://github.com/uswitch/yggdrasil

Being on-call sucks. But, DevOps teams are finding ways to collaborate more and improve workflow visibility – leading to more efficient incident detection, response and remediation. Learn exactly how you can start making on-call suck less:
http://try.victorops.com/devopsweekly/making-on-call-suck-less-checklist

Updated: