DEVOPS WEEKLY ISSUE #587 - 27th March 2022
Several in-depth posts this week on crunchy operations topics, including metrics for incident response, SLOs and SLIs, and the idea that you build it, you run it.
StackHawk sponsors Devops Weekly
Leading teams are automating application security testing in their CI/CD pipelines. See how you can add automated security testing to your pipeline quickly and easily to ship better quality code.
https://sthwk.com/appsec-to-your-pipeline
News
A post on using "mean time to" metrics for incident response.
https://www.infoq.com/articles/mtt-metrics-incident-response/
A comprehensive post on setting SLOs and SLIs for complex distributed systems.
https://newrelic.com/blog/best-practices/best-practices-for-setting-slos-and-slis-for-modern-complex-systems
You build it, you run it. A familiar phrase, but the following playbook explores what that means and how to implement it to improve service operations.
https://www.equalexperts.com/wp-content/uploads/2022/03/YBIYRI_Playbook-4.pdf
There is a lot of interest in software bills of materials (SBOMs) at the moment, but much of that has been focused on creating SBOMs. This post looks at another aspect: storage and distribution.
https://www.rkvst.com/sbom-distribution-manifesto/
Down the rabbit hole of one team debugging an AWS EC2 networking issue related to sending large packets.
https://medium.com/in-the-hudl/operation-jumbo-drop-how-sending-large-packets-broke-our-aws-network-ff5041fc7a09
Databases and applications are often still considered quite separately, with separate specialists and sometimes separate teams. This post speculates about what vertical integration in that space might mean.
https://redmonk.com/sogrady/2022/03/21/vertical-integration/
A detailed look at detecting silent errors in large scale systems, combining opportunistic and ripple testing to detect hard-to-find issues.
https://engineering.fb.com/2022/03/17/production-engineering/silent-errors/
Tools
kube-opex-analytics is a cost optimization tool for Kubernetes. It collects usage data and surfaces it to Prometheus/Grafana, with a focus on capacity planning and cost.
https://github.com/rchakode/kube-opex-analytics