1 minute read

A mix of content this week, from some good posts on managing monitoring infrastructure to tools for maintaining health test suites, provisioning Kubernetes clusters for development and lots more.

StackHawk sponsors Devops Weekly

gRPC security scanning is in open beta. Scan each microservice and improve engineering best practices with StackHawk’s support for scanning gRPC services. Sign up to join the first gRPC security testing beta available in the market!
https://sthwk.com/gRPC

News

If you run your own monitoring infrastructure at scale, you’ll invariably be surprised by the cost of storage. This post looks at one team’s work to reduce the storage cost by a third for their Prometheus infrastructure.
https://medium.com/criteo-engineering/how-we-reduced-our-prometheus-infrastructure-footprint-by-a-third-8bf8171e46b1

Another monitoring post. This one on the design of a Prometheus, Grafana, Alert Manager setup and considering the tradeoffs around various topologies.
https://medium.com/riskified-technology/how-we-improved-our-monitoring-stack-with-only-a-few-small-changes-66c5a489fd67

A look at TypeAPI, a new alternative to OpenAPI for describing an API that is more optimised for automatically generating client tools.
https://chriskapp.medium.com/typeapi-an-alternative-to-openapi-e666eefe8a76
https://typeapi.org/

A nice introduction to AWS Vault, which aims to solve the problem of AWS credentials ending up in plain text all over the place.
https://blog.symops.com/2023/04/20/aws-vault/

A nice overview of Kubernetes evolution to a system that’s well suited to large batch processing workloads as well as lots of microservices.
https://thenewstack.io/kubernetes-evolution-from-microservices-to-batch-processing-powerhouse/

If you’re managing an ElasticSearch cluster then keeping a track on unassigned shards is important for monitoring search stability. This post explains why as well as looking at what to measure.
https://sematext.com/blog/elasticsearch-unassigned-shards/

Tools

Captain is a CLI tool that can detect and quarantine flaky tests, automatically retry failed tests, partition files for parallel execution and generate comprehensive test failure summaries. It works with your existing test runners.
https://www.rwx.com/captain
https://github.com/rwx-research/captain/

An interesting use of GPT, in this case it consumes event information from Falco and then generates recommendations for actions that are shared in Slack.
https://github.com/Dentrax/falco-gpt/

Provisioning a Kubernetes cluster for development isn’t always simple. Kubelift aims to provide a great experience for anyone wanting a single node Kubernetes setup on Azure, with an UI inspired by Vagrant.
https://kubelift.com/
https://github.com/polverio/kubelift-cli

A handy tool for visualising a toolchain, document which tools you’re using and where they are in their adoption.
https://github.com/Wivik/devops-solutions-map

Updated: