2 minute read

For some reason this issue is nearly all about monitoring. Lots of slides, videos and notes from the recent Monitorama event but also several blog posts looking in depth at logs, traces, metrics and alerting.

From our sponsor, VictorOps

[Free Webinar] VictorOps partnered with Catchpoint to share actionable ways to transform your monitoring and incident response practices. See how DevOps teams are being more proactive toward service reliability:
http://try.victorops.com/devopsweekly/death-to-downtime”

News

The Monitorama conference took place in Portland, Oregon a few weeks back, and folks have been writing up posts summarising some of the talks and conversations. All of the slides and videos are available too.
https://scoutapm.com/blog/monitorama-2019-portland-oregon
https://chaossearch.io/blog/monitorama-2019-recap/
https://blog.sensu.io/recapping-monitorama-2019
http://monitorama.com/2019/pdx.html#speakers

Who hasn’t had a drawn out conversation about how long to retain logs? This post explores the basics, from regulation to usefulness of keeping logs around.
https://medium.com/anton-on-security/retaining-logs-for-a-year-boring-or-useful-70ea21fa3dda

Distributed tracing has been around a while, but this post explores why it’s not as useful or widely used as you might assume. This post explores one area for improvement, the primary user interface for trace data.
https://medium.com/@copyconstruct/distributed-tracing-weve-been-doing-it-wrong-39fc92a857df

A nice writeup of evolving a monitoring stack. The post doesn’t just look into the technical components, but into the business reasons for the change, and how the tools where used to avoid needless alerts out of hours.
https://www.farfetchtechblog.com/en/blog/post/nobody-wants-to-be-woken-up-at-4-am/

Another monitoring post, this one exploring logs vs metrics, or more accurately pointing out that it’s a false dichotomy and you’ll want both anyway.
https://whiteink.com/2019/logs-vs-metrics-a-false-dichotomy/

Some hard won operational insights into running Kafka clusters at scale. Avoiding data loss, dealing with low-throughput topics and more interesting topics.
https://www.datadoghq.com/blog/kafka-at-datadog/

A good post on evolving an unfamiliar architecture, in this case using event sourcing, and fixing early design mistakes.
http://natpryce.com/articles/000819.html

Sometimes you end up writing bash scripts. This post contains useful tips and tricks for making those scripts idempotent.
https://arslan.io/2019/07/03/how-to-write-idempotent-bash-scripts/

Jobs

The state51 Music Group is hiring an engineer to help us push our infrastructure forward to support the rapid growth of the business. You’ll join a team of engineers and developers with a diverse skill set, working on a similarly diverse set of problems. More info and details on how to apply:
https://state51-music-group.workable.com/jobs/1063112

Tools

Kyma is a packaging of Istio, NATs, Prometheus and Kubeless to provide an enterprise integration platform. WIre up your existing applications and expose them to your Kubernetes based services.
https://github.com/kyma-project/kyma
https://kyma-project.io/

[Free Webinar] VictorOps partnered with Catchpoint to share actionable ways to transform your monitoring and incident response practices. See how DevOps teams are being more proactive toward service reliability:
http://try.victorops.com/devopsweekly/death-to-downtime”

Updated: