DEVOPS WEEKLY ISSUE #592 - 1st May 2022
A few posts this week on aspects of building modern operations teams, one focused on observability teams, and the other on building an SRE team in a hypergrowth organisation. Plus posts on scaling dashboards, feature flag standards, development environments and more.
StackHawk sponsors Devops Weekly
StackHawk’s new Snyk Code integration is now live! Sign up for the webinar to see how you can use these tools to find, correlate, and fix application and API security issues quickly. Save your spot here:
https://sthwk.com/Stackhawk-snyk-in-action
News
Observability is critical to scaling the larger and larger systems we’re building, and dedicated teams are increasingly common in larger companies. This post covers why and the roles within those specialised teams.
https://ericmustin.substack.com/p/notes-on-an-observability-team
OpenFeature is a new effort to build an open standard for feature flag management.
https://openfeature.dev/
A good post on the growth of an SRE team, from 5 people to a much larger organisation many years later, through some hard won lessons that will be familiar to anyone who’s been through similar growth.
https://lethain.com/founding-uber-sre/
A nice detailed post on evolving development environments, from local setups with Vagrant, then Docker, then Kubernetes to building a remote development environment on top of Kubernetes.
https://medium.com/pipedrive-engineering/say-hello-to-devboxes-fab125cd793a
Self-service dashboards are great, but you can end up with a long tail of unused dashboards over time. A discussion of managing dashboard sprawl using GitOps.
https://medium.com/riskified-technology/consistent-monitoring-the-gitops-way-1d481e9965c9
Described as “nmap but for pids”, xpid is a handy systems administration tool that can be useful for finding hidden pids and observing eBPF operations.
https://dev.to/kcdchennai/hunting-hidden-pids-ebpf-and-much-more-using-xpid-c0o
A look at Kafka monitoring, focused on the importance of keeping an eye on consumer lag.
https://sematext.com/blog/kafka-consumer-lag-offsets-monitoring/
Tools
ssh no ports provides a way to ssh to a remote linux host/device without that device having any open ports (not even 22) on external interfaces.
https://github.com/atsign-foundation/sshnoports
Fleet is a new Rust build tool. Nicely integrating several ecosystem tools to provide up to a 5x performance boost.
https://github.com/dimensionhq/fleet