3 minute read

Some good posts on operations fundamentals this week, including security policy and the relationship between delivery speed and stability, as well as a case study of a migration. Also some new tools in the monitoring, deployment and service mesh spaces.

From our sponsor, Splunk

Do Google Docs and Jupyter Notebooks sound like a strange duo? Learn how this combination helps SREs and developers with incident management and root-cause analysis:


The CNAB specification, which is a standard packaging format for multi-component distributed applications, just hit 1.0. These two posts explain some of the background and why it’s important.

An interesting article exploring the idea of policy debt. In particular looking at security policies and a generational divide between architectures that favour encryption and those that obsess over firewalls. The post also observes that policy encodes culture which is an interesting lens through which to view the problem.

The relationship between delivery speed and stability is much more interesting that simply being in opposition. High performing operations teams achieve both. This post explores how, and why that’s important.

An interesting story about migrating from one provider to another, based on available features. Data wasn’t really an issue here, but some good points about the potential in open APIs for portability.

The Cluster API project aims to bring declarative, Kubernetes-style APIs to cluster creation, configuration and management. It’s making good progress it seems and the documentation is a useful read.

A call for better build tools focused on the problems we see with microservice environments vs the larger monolithic binaries many build tools are best suited for. Lots of interesting things happening in this space.

Keiko, introduced in this presentation, is an interesting set of Kubernetes tools covering forensics, reliability, management at scale and more.

A good writeup of using Open Policy Agent to enforce policy in a CI environment, in this case policy around dependencies in a Node.js project.


KubeCon and CloudNativeCon North America are coming up in San Diego from the 18th until the 21st of November. The schedule has just been announced and it’s packed with talks on the CNCF projects like Kubernetes, Envoy, Helm as well as case studies, community meetings and more. The schedule is extended with separate focused events on the first day, each one on a specific topic including security, CI/CD and Observability. Hope to see a few readers in San Diego.

The O’Reilly Velocity Conference premiers in Berlin, 4–7 November. Velocity is the best place on the planet for web ops and systems engineering professionals to get expert insight on building and maintaining cloud native systems. With 4 days of practical content on DevOps, systems engineering, performance, and more, there’s something for everyone. Passes start at €596 when you use the code DEVW20 (applies to Gold, Silver, and Bronze passes). Limited offer expires 20 September. Register today!


trivago is hiring Site Reliability Engineers (Cloud Engineers)! Help us to build the best Hotel Search of the world, powered by multiple Kubernetes Clusters, operating in several regions on Google Cloud Platform by using open source technology like Kafka, Istio (incl. Envoy), Redis, gRPC, Go, Java, Kotlin and more on a worldwide scale with a talented and senior team!


Vector is a high-performance observability data router. It aims to make collecting, transforming, and sending logs, metrics, and events easy.

Maesh is a new entrant in the Service Mesh space. It aims for ease of installation and is built on top of the Traefik proxy. It provides the typical load balancing, canary deployments, retries, failover, circuit breakers, rate limiting and more.

Do Google Docs and Jupyter Notebooks sound like a strange duo? Learn how this combination helps SREs and developers with incident management and root-cause analysis: