For the last day of 2023 I’ve pulled together a list of the best posts from the last 3 months or so. An interesting set, covering incident management, LLMs, observability, build engineering and more. Here’s to a fruitful 2024.

StackHawk sponsors Devops Weekly

Discover how StackHawk and GitHub are reshaping the way we secure web applications and APIs with developer-first functionality. Learn more:
https://sthwk.com/stackhawk-github-blog

News

A post on using open source LLMs, focused on engineering for performance, including details of benchmarks and optimisation techniques.
https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices

A detailed, technical post on embracing eBPF for monitoring at the network layer, and for providing better control of a large microservice and infrastructure platform.
https://doordash.engineering/2023/08/15/bpfagent-ebpf-for-monitoring-at-doordash/

A look at applying some of the DORA findings to improving incident management practices.
https://firehydrant.com/ebook/dora-2023-incident-management/

Having a formal lead role for incident management is a common pattern. This post explains what that role should do, and why it’s important.
https://argoday.medium.com/incident-command-guide-9872b51d7c94

A great post with tips for being on-call, covering why on-call is hard and what you and your team can do to make it suck less.
https://hart-michael.medium.com/how-to-be-on-call-034e3a202729

A set of collected views on how generative AI can aid incident response. Lots of great observations and ideas, some of which will undoubtedly become features and products in the future.
https://www.heavybit.com/library/article/generative-ai-incident-response-devops

There is quite a bit of cross-over between how a central security team needs to interact with a larger development team, and what’s needed for cost-control in self-service platform teams. A good post on this topic.
https://stateofsecurity.com/how-information-security-and-risk-management-teams-can-support-finops/

How do you switch the build system for more than 200 developers, without slowing developer velocity? A great post on moving to Bazel, including local and remote instrumentation, parallel running and staggered rollout.
https://engineering.atspotify.com/2023/10/switching-build-systems-seamlessly/

An argument for treating observability instrumentation as a first-class part of software development, in the same way that unit tests have become one.
https://www.honeycomb.io/blog/observability-is-about-confidence

A passionate call for a second wave of devops tools to fulfil the promise of the movement. Some good observations here about what’s changed over the last 10+ years, and what hasn’t.
https://www.systeminit.com/blog-second-wave-devops/

Tools

Radius is a new application configuration tool. It uses Bicep to define the components of an application, and reusable recipes to run that application on various infrastructure platforms.
https://radapp.io/
https://github.com/radius-project
