The rise of very public, in-depth, high-profile incident reports in the last few years is definitely of benefit to the art of systems administration. Atlassian’s post is a great example, covering the recent multi-week outage. Plus posts on organisation design, least privilege and some interesting tools for testing and Kubernetes management.
StackHawk sponsors Devops Weekly
ICYMI: The StackHawk & Snyk in Action webinar is up on YouTube. Follow along to see how your team can automate security testing in CI/CD using these integrated tools. Watch now:
Atlassian had a large global outage last month. This in-depth incident report goes into lots of interesting operational detail about the timeline, what happened and lessons learned.
A great post on organisational design, and in particular dependencies amongst teams adopting more product-centric funding models.
An interesting post on using monitoring of a local environment to inform implementing least privilege AWS access control.
Open source software is a large part of most systems administration efforts today. This site recounts 30 years' experience of maintaining a critical open source tool, curl.
Some thoughts on software architecture, advocating for more local-first experiences, using the cloud mainly for storage, synchronising and burst compute.
SLOConf kicks off tomorrow, running from the 9th of May to the 12th. A free, online event with a wide range of talks, from those focused on getting started to more advanced topics.
Korb is a handy tool for working with Kubernetes storage, specifically moving data from PVCs between StorageClasses or renaming them.
Tracetest is a tool for writing end-to-end tests for microservice-based applications, using OpenTelemetry traces to speed up test authoring.
Otomi is a platform as a service layer built atop Kubernetes. The focus looks to be on providing a visual management experience and it comes integrated out-of-the-box with Argo, Vault, Prometheus and lots more.