
The rise of very public, in-depth, high-profile incident reports in the last few years is definitely of benefit to the art of systems administration. Atlassian’s post is a great example, covering the recent multi-week outage. Plus posts on organisation design, least privilege and some interesting tools for testing and Kubernetes management.

StackHawk sponsors Devops Weekly

ICYMI: The StackHawk & Snyk in Action webinar is up on YouTube. Follow along to see how your team can automate security testing in CI/CD using these integrated tools. Watch now:


Atlassian had a large global outage last month. This in-depth incident report goes into lots of interesting operational detail about the timeline, what happened and lessons learned.

A great post on organisational design, and in particular dependencies amongst teams adopting more product-centric funding models.

An interesting post on monitoring a local environment and using what it observes to implement least-privilege AWS access control.
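As a rough illustration of the general idea (not the linked post's method), here is a minimal Python sketch that turns a list of observed API calls into an IAM policy allowing only those calls. The `build_policy` helper, the bucket and queue ARNs, and the shape of the observed data are all hypothetical; a real setup would take its input from whatever monitoring tool is in use.

```python
import json
from collections import defaultdict


def build_policy(observed_calls):
    """Group observed (action, resource) pairs into IAM policy statements.

    `observed_calls` is an assumed input format: an iterable of
    (action, resource_arn) tuples gathered by some monitoring step.
    """
    actions_by_resource = defaultdict(set)
    for action, resource in observed_calls:
        # A set removes duplicate calls to the same action.
        actions_by_resource[resource].add(action)

    # One Allow statement per resource, listing only the actions seen.
    statements = [
        {"Effect": "Allow", "Action": sorted(actions), "Resource": resource}
        for resource, actions in sorted(actions_by_resource.items())
    ]
    return {"Version": "2012-10-17", "Statement": statements}


# Hypothetical sample data, standing in for real monitoring output.
observed = [
    ("s3:GetObject", "arn:aws:s3:::example-bucket/*"),
    ("s3:PutObject", "arn:aws:s3:::example-bucket/*"),
    ("s3:GetObject", "arn:aws:s3:::example-bucket/*"),
    ("sqs:SendMessage", "arn:aws:sqs:eu-west-1:123456789012:example-queue"),
]

policy = build_policy(observed)
print(json.dumps(policy, indent=2))
```

A policy generated this way only covers the code paths that were exercised while monitoring, so it is best treated as a starting point to review rather than a finished policy.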

Open source software is a large part of most systems administration efforts today. This site recounts 30 years' experience of maintaining a critical open source tool, Curl.

Some thoughts on software architecture, advocating for more local-first experiences, using the cloud mainly for storage, synchronising and burst compute.


SLOConf kicks off tomorrow, running from the 9th of May to the 12th. A free, online event with a wide range of talks, from those focused on getting started to more advanced topics.


Korb is a handy tool for working with Kubernetes storage, specifically moving data from PVCs between StorageClasses or renaming them.

Tracetest is a tool for writing end-to-end tests for microservice-based applications, using OpenTelemetry traces to speed up test authoring.

Otomi is a platform as a service layer built atop Kubernetes. The focus looks to be on providing a visual management experience and it comes integrated out-of-the-box with Argo, Vault, Prometheus and lots more.