Imagine that you're driving on a racetrack at night. There are no streetlights, and it's pitch black. There's a light at the start of the course and a light at the finish; everything in between is dark. When you complain that you can't see where the road is, you're handed a map of the course. But don't worry: when you run off the road and into the bushes, you can put up a light at that spot, so that the next time you try the course, you can see where the road is.
This would be a crazy state of affairs, but it's how developers are expected to write and debug code today. We write code with no debugging statements. When we run into a problem, we add debug statements (or breakpoints, if we're lucky enough to have a local dev environment) wherever we think something went wrong, and keep doing that until we have a rough idea of what's happening. This is entirely backwards. Code maintenance involves reading and understanding code that you didn't write, and seeing how that code behaves in different environments. In production, metrics and error reporting are critically important; in testing and development, clarity of control flow and data flow is essential. Every part of the code should be observable by default.
Operations teams have already recognized the need for observability to answer questions about errors in production. Structured operational logging, distributed tracing, and metrics are all in common use, but there's a catch: these tools are typically used only to identify and resolve errors and latency issues. An operational log entry that captures an error doesn't tell you what led up to it. OpenTracing instrumentation is usually coarse-grained. Metrics are aggregated and low-cardinality. As developer tools, they don't capture the flow of data: they cannot catch logic bugs, they won't reveal database corruption or memory leaks, and they usually don't capture changes over time. Worse, production telemetry is often sampled and aggregated, so rare bugs are harder to see.
There is a solution to this: context-aware diagnostic logging. Given a sufficiently rich context, we can decide when to log information at debug or trace level. With Scala's implicits and other language features, such as macros for line numbers and type classes, we can provide lightweight logging and instrumentation that aligns with your code, so that observability can be turned on and off for a particular flow or a particular user.
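As a rough illustration of the idea, here is a minimal sketch that uses only the Scala standard library. `Ctx`, `ToLog`, `DiagnosticLog`, and `Checkout` are hypothetical names made up for this example; they are not the API of any particular logging library or of the approach presented in this talk. An implicit `Ctx` carries a per-flow, per-user switch, and a `ToLog` type class controls how values are rendered. Line numbers are left out to keep the sketch dependency-free; a macro-based library such as `sourcecode` could supply them through an additional implicit parameter (for example `sourcecode.Line`).

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical request context: who is calling, which flow, and whether
// diagnostic logging is switched on for this particular flow/user.
final case class Ctx(userId: String, flow: String, debugEnabled: Boolean)

// Hypothetical type class describing how a value is rendered in a log line.
trait ToLog[A] {
  def render(a: A): String
}

object ToLog {
  implicit val intToLog: ToLog[Int] = new ToLog[Int] {
    def render(a: Int): String = a.toString
  }

  // Derives an instance for List[A] from an instance for A.
  implicit def listToLog[A](implicit ev: ToLog[A]): ToLog[List[A]] =
    new ToLog[List[A]] {
      def render(xs: List[A]): String = xs.map(ev.render).mkString("[", ", ", "]")
    }
}

object DiagnosticLog {
  // Stand-in for a real logging backend: lines are collected in memory.
  val lines: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // `value` is by-name, so it is only evaluated when debugging is enabled.
  def debug[A](name: String, value: => A)(implicit ctx: Ctx, ev: ToLog[A]): Unit =
    if (ctx.debugEnabled)
      lines += s"[${ctx.flow} user=${ctx.userId}] ${name}=${ev.render(value)}"
}

object Checkout {
  // Business code only declares the implicit context; it is threaded
  // through to every debug call without being passed by hand.
  def total(prices: List[Int])(implicit ctx: Ctx): Int = {
    DiagnosticLog.debug("prices", prices)
    val sum = prices.sum
    DiagnosticLog.debug("sum", sum)
    sum
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Diagnostics are on for alice's checkout flow and off for bob's.
    Checkout.total(List(3, 4))(Ctx("alice", "checkout", debugEnabled = true))
    Checkout.total(List(5))(Ctx("bob", "checkout", debugEnabled = false))
    DiagnosticLog.lines.foreach(println)
  }
}
```

Running `Demo.main` prints `[checkout user=alice] prices=[3, 4]` and `[checkout user=alice] sum=7`, while bob's call produces no diagnostic output at all. Taking the logged value by name keeps the disabled path cheap, which is what makes it reasonable to leave such statements in the code permanently rather than adding them after something breaks.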
Will Sargent is an ex-Lightbend developer with a long list of blog posts about logging and a deep interest in minimizing the time he spends debugging code and resolving production issues.