2 minute read

A bunch of experience reports this week, including around Fargate, Kubernetes, embracing production failure and envoy. Also a great series of posts on information dashboard design for anyone building monitoring infrastructure.

From our sponsor, VictorOps

[Last Chance] Death to Downtime: How to Quantify and Mitigate the True Costs of Downtime. VictorOps and Catchpoint are teaming up for a live webinar on 5 monitoring and incident response best practices for preventing outages.
http://try.victorops.com/devopsweekly/catchpoint-victorops-webinar

News

A series of four posts looking at information dashboard design, in particular focusing on structure and layout, presentation and accessibility, what charts to use and capturing context.
http://onemogin.com/observability/dashboards/practitioners-guide-to-system-dashboard-design.html
http://onemogin.com/observability/dashboards/practitioners-guide-to-system-dashboard-design-p2.html
http://onemogin.com/observability/dashboards/practitioners-guide-to-system-dashboard-design-p3.html
http://onemogin.com/observability/dashboards/practitioners-guide-to-system-dashboard-design-p4.html

A detailed blog post on what it really takes to configure ECS and Fargate for running a container. I’ve spoken before about severless not being infrastructureless, and this post hits similar points.
http://leebriggs.co.uk/blog/2019/04/13/the-fargate-illusion.html

A post on the importance of being able to fail when pushing for continuous deployment, especially for new engineers.
https://shubheksha.com/posts/2019/04/re-framing-how-we-think-about-production-incidents/

The book Accelerate identifies a number of activities and metrics that are correlated with improved software delivery. This post looks at measuring those across an organisation and then using them to drive improvements.
https://medium.com/onzo-tech/four-magic-numbers-for-measuring-software-delivery-55b1dd01eca

This post looks at the distinction between the software we use to drive our testing and deployment, and the pipelines we build on top.
https://www.networkcomputing.com/data-centers/diy-devops-build-buy-or-yes

A look at swarming as part of the solution to incident management, with some good points about avoiding looking for a single solution to different types of problems.
https://medium.com/@kaimarkaru/incident-management-triage-and-disorder-99b695a29e5b

A look at rolling out the envoy proxy as part of moving towards a microservices architecture communicating using gRPC.
https://medium.com/geckoboard-under-the-hood/we-rolled-out-envoy-at-geckoboard-13c45b4eaddd

A case study of migrating a large existing application stack to Kubernetes, including some good numbers.
https://medium.com/@tinder.engineering/tinders-move-to-kubernetes-cda2a6372f44

Events

The O’Reilly Velocity Conference returns to San Jose, June 10–13. Velocity is the best place on the planet for web ops and systems engineering professionals to get expert insight on building and maintaining cloud native systems. With 4 days of practical content on DevOps, systems engineering, performance, and more, there’s something for everyone. Passes start at $796 when you use the code DEVW20 (applies to Gold, Silver, and Bronze passes). Limited offer expires May 4. Register today!
https://oreil.ly/2Uf3u63

Your path to production ready Kubernetes This workshop is for DevOps engineers, Kubernetes cluster operators, and admins in the UK! Join our hands-on, in-person workshop at CodeNode on May 8. We’ll walk you through developing, building, and operating Kubernetes cluster the GitOps way.
https://www.eventbrite.com/e/workshop-your-path-to-production-ready-kubernetes-tickets-59200887448

Tools

Chaosblade is a powerful toolkit for implementing chaos engineering, with support for operating system errors, Java, Docker and Kubernetes. It can for example simulate filling disks, killing processes, network delay, etc.
https://github.com/chaosblade-io/chaosblade

[Last Chance] Death to Downtime: How to Quantify and Mitigate the True Costs of Downtime. VictorOps and Catchpoint are teaming up for a live webinar on 5 monitoring and incident response best practices for preventing outages.
http://try.victorops.com/devopsweekly/catchpoint-victorops-webinar

Updated: