
Several incident management posts and tools this week, along with discussion of platform engineering teams, a good sociotechnical systems reading list, and technical posts on Kubernetes and Elasticsearch operations.

StackHawk sponsors Devops Weekly

Teams that utilize AWS Cloud can now purchase StackHawk directly through their AWS account! Leverage your pre-existing AWS Cloud budget to cover application security costs, consolidate billing, and speed up the path to procurement. Learn more here:
https://sthwk.com/3JGSIgs

News

A nice high-level introduction to incident management, centred around some real-world improvements within one organisation.
https://medium.com/govtech-edu/how-we-keep-our-government-apps-running-with-high-reliability-a-peek-at-our-incident-management-fe1386d0fa43

Another good incident management post. This one is a video and transcript of a talk on the cost of coordination during incidents.
https://www.infoq.com/presentations/incident-command-system/

Audio and transcript of an interesting discussion about the responsibilities of a platform team, and emerging patterns and anti-patterns.
https://blog.container-solutions.com/paula-kennedy-on-platform-team-responsibilities-patterns-and-anti-patterns

In discussions about software supply chain security you’ll hear folks talking about attestations. This post explains what these are, why they are important and demonstrates some tools for working with them.
https://docs.chainloop.dev/blog/software-supply-chain-attestation-easy-way

A look at rightsizing Kubernetes workloads, including details of how pod limits work, the Kubernetes scheduler and the vertical pod autoscaler.
https://www.datadoghq.com/blog/rightsize-kubernetes-workloads/

A collected set of reading material on Open Systems Theory and Sociotechnical Systems design. A good set of papers and books if you want to go deep on the subject.
https://www.linkedin.com/pulse/core-ost-sts-sources-trond-hjorteland/

Database tools often store data on disk, and when things go wrong, how that storage works can matter. This post looks at how to debug disk-related Elasticsearch issues.
https://sematext.com/blog/elastic-dev-command-to-know-about-disk/

Events

Monitorama is coming up on June 26th to 28th, in Portland, Oregon. The schedule is available now, with talks on everything from the maths underpinning monitoring to scaling observability while being cognizant of cost and managing stress in operations roles.
https://monitorama.com/2023/pdx.html

Tools

Autometrics is a set of open source metrics libraries that make OpenTelemetry and Prometheus easier to use. It currently supports Rust, TypeScript, Go and Python, with more languages coming.
https://github.com/autometrics-dev

Untitled Goose Tool is a tool designed for threat assessment and incident response for Azure Active Directory (AzureAD), Azure, and Microsoft 365 environments.
https://github.com/cisagov/untitledgoosetool
