Thoughts on creating a good logging strategy (iOS example)

I ran across a blog post this morning on Fighting Log Entropy that got me reflecting on my own experience with logging over the previous 15 years.

[Image: Loggly interface]

Logging is an interesting topic because it is rarely talked about in academia and rarely shows up in sample code, yet it is a foundational practice when you're trying to manage a production system. A good logging strategy gives you visibility into the health of the system as a whole.

Here are some good logging practices that I’ve discovered over the last decade or so. I’ll use Thread, my current company where I’m the CTO, as a way to explain the foundational topics using a concrete example. These principles are adaptable to any tech stack.

1. Use a logging framework

In every technology stack there are logging frameworks that encapsulate best practices for logging. Unfortunately, I typically see devs rolling their own logging library or just writing log statements directly to disk, which is a big mistake.

At Thread, I’m currently using the CocoaLumberjack framework. It’s not that different from any other good logging framework, and it provides an extensible foundation for your logging strategy. It will grow with you over time.

It’s an order of magnitude more performant than the built-in NSLog. NSLog is fine for playing around if all you need are logs in the Xcode console, but it won’t cut it for anything more than that.
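
To make the "extensible foundation" point concrete, here is a minimal sketch of what a logging framework buys you over hand-rolled file writes: the call site logs once, and destinations are plugged in separately. It uses Python's standard `logging` module as a stack-neutral stand-in, not LumberJack's API. The logger name `thread.demo` and the two in-memory destinations are assumptions made up for the example.

```python
import io
import logging

# Two in-memory destinations standing in for "console" and "log file".
console = io.StringIO()
logfile = io.StringIO()

logger = logging.getLogger("thread.demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep demo output away from the root logger

# Each destination is a handler; adding one never touches the call sites.
for destination in (console, logfile):
    handler = logging.StreamHandler(destination)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)

# The call site does not know or care where the record ends up.
logger.info("user signed in")
```

Adding a third destination later, such as a log shipping service, would be one more handler and no changes to the code that logs.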

2. Use multiple logging levels

LumberJack supports the standard logging levels that are commonly used in production-quality apps. It’s up to you to decide what gets logged at each level, but here are my *general* guidelines:

DDLogError(@"Log true errors here. Especially an unexpected return from a third party service. e.g. Parse.CloudCodeService() returned error code 500 - something went wrong.");

DDLogWarning(@"An unexpected code path was hit, e.g. input params you don't expect.");

DDLogInfo(@"Normal code paths wrapped around third party calls. e.g. Called Parse.CloudCodeService() w/ parameter Age=21.");

DDLogDebug(@"Normal code paths wrapped around internal method calls including input parameters. e.g. Called CPUser.Save() w/ parameter Age=21.");

DDLogVerbose(@"Control flow. Could be inside an if/else statement letting you know which way the code went.");

In a team environment I would document this and include it in code reviews so that logging is consistently implemented.
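
The payoff of consistent levels is that one threshold decides how much you see. The sketch below shows that with Python's standard `logging` module, again as a stand-in rather than LumberJack itself. Python has no built-in Verbose level, so only four of the five levels appear, and the helper `emit_all` and its messages are hypothetical.

```python
import io
import logging

def emit_all(threshold):
    """Log one message per level and return the level names that got through."""
    stream = io.StringIO()
    logger = logging.getLogger(f"thread.levels.{threshold}")
    logger.setLevel(threshold)
    logger.propagate = False
    logger.handlers.clear()  # safe to call the helper more than once
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))
    logger.addHandler(handler)

    logger.error("third-party service returned an error")
    logger.warning("unexpected input parameter")
    logger.info("called third-party service")
    logger.debug("called internal method")
    return stream.getvalue().split()

production = emit_all(logging.ERROR)   # only true errors survive
development = emit_all(logging.DEBUG)  # everything is recorded
```

The log statements are identical in both runs; only the threshold changes, which is why agreeing on what belongs at each level matters.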

3. Give devops visibility in the field

Logging is more about production support than anything. Devs typically think about logging as a way to help debug the app during development unless they've had to maintain a production app before. During development I prefer to use breakpoints and the debugger instead of logging.

In a consumer-facing iOS app you typically don't have a direct line to the end user. The best way to monitor the overall health of the system is to ship logs to a central place where they can be read.

I have a zero-infrastructure policy for Thread because I'm the only engineer on the team and I need to scale myself. I'm using Loggly, which is log management as a service. There is a custom LumberJack logger that automatically ships logs out to Loggly.

I send logs from both the client and the server code. Each log statement contains a user ID so I can trace what's happening across the whole stack for a single user.
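
A minimal sketch of the user ID idea, assuming Python's standard `logging` module rather than the LumberJack and Loggly setup described here: a `LoggerAdapter` stamps every record with the user ID so call sites cannot forget it. The field name `user_id`, the IDs, and the messages are hypothetical.

```python
import io
import logging

stream = io.StringIO()  # stand-in for the central log store
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("user=%(user_id)s %(levelname)s %(message)s")
)

base = logging.getLogger("thread.trace")
base.setLevel(logging.INFO)
base.propagate = False
base.addHandler(handler)

def logger_for(user_id):
    """Return a logger that attaches user_id to every record it emits."""
    return logging.LoggerAdapter(base, {"user_id": user_id})

log_a = logger_for("u-123")
log_b = logger_for("u-456")
log_a.info("client: tapped Save")
log_b.info("client: opened feed")
log_a.error("server: save failed")

# Tracing one user across client and server becomes a simple filter.
trace = [
    line for line in stream.getvalue().splitlines()
    if line.startswith("user=u-123 ")
]
```

With the ID in a fixed position on every line, the same filter works as a saved search in whatever log service you use.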

In production I only record Errors from our iOS app by default. I'm especially interested in calls to our backend systems that error out, so I have an automated alert routed to my phone when that happens. The server code has more detailed logging. This alone has saved me more than once in the last year, as I was able to revert a change within minutes of introducing a critical bug to our server code.
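
The "errors trigger an alert" rule can be sketched as one more log destination with its own threshold. This is an illustration using Python's standard `logging` module, not how Loggly's alerting works. The `AlertHandler` class and the `alerts` list, which stands in for a paging or push service, are hypothetical.

```python
import logging

class AlertHandler(logging.Handler):
    """Forward records at ERROR or above to a notification callback."""

    def __init__(self, notify):
        super().__init__(level=logging.ERROR)
        self.notify = notify

    def emit(self, record):
        # format() falls back to the bare message when no formatter is set.
        self.notify(self.format(record))

alerts = []  # stand-in for a pager or push-notification service
logger = logging.getLogger("thread.alerts")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(AlertHandler(alerts.append))

logger.info("backend call succeeded")               # below the alert threshold
logger.error("backend call failed with status 500")  # triggers an alert
```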

I haven't needed this yet, but if I had the time I could send all device logs to Loggly based on a configuration flag. That flag could be sent to the client via silent push notification, or enabled globally via a settings package that clients download on each app start.
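
One way such a flag might work, sketched with Python's standard `logging` module: the client reads a level name from the downloaded settings and adjusts its logger, falling back to errors only when the flag is missing or malformed. The settings key `log_level`, the `LEVELS` mapping, and `apply_remote_settings` are all assumptions for illustration, not part of any real settings format.

```python
import logging

# Hypothetical mapping from setting values to logging levels.
LEVELS = {
    "error": logging.ERROR,
    "warning": logging.WARNING,
    "info": logging.INFO,
    "debug": logging.DEBUG,
}

def apply_remote_settings(logger, settings, default=logging.ERROR):
    """Set the logger's level from a settings payload and return the level used."""
    name = str(settings.get("log_level", "")).lower()
    logger.setLevel(LEVELS.get(name, default))
    return logger.level

logger = logging.getLogger("thread.remote")

apply_remote_settings(logger, {})  # no flag: errors only
debug_before = logger.isEnabledFor(logging.DEBUG)

apply_remote_settings(logger, {"log_level": "debug"})  # flag flipped remotely
debug_after = logger.isEnabledFor(logging.DEBUG)
```

Falling back to the quiet default on bad input means a broken settings package cannot flood the log service by accident.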

4. Evolve your logging over time

The last point I want to make is that your logging should evolve over time as you dial in what works for you. At first you might find that your logs are too verbose and contain a lot of noise. Or, as with my first production system, you might forget about logging and find that you have no idea what's happening once the code is no longer running in your dev environment.

Take each defect that you troubleshoot and, as you read the logs, think about each statement in that log file. Are there lines that your eyes always skip over? Those are good candidates to push out to a higher verbosity level or remove completely.

If you weren't able to isolate a problem with logs and had to resort to guessing what happened on your own device, then you have a blind spot. I try to identify blind spots and systematically eliminate them as I go.

When you find a blind spot, the right thing to do is to file an issue in the defect tracker so it gets addressed in the next release cycle. Also try to think more generally when you add new logging. For example, if you are adding logging around a call to an external resource, you should wrap all calls to external resources at the same time so that your logs are consistently formatted and easy to read.
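
Wrapping every external call the same way is easiest when the wrapper lives in one place. Here is a minimal sketch using a Python decorator and the standard `logging` module. The decorator `logged_external_call` and the function `fetch_profile`, a stand-in for a third-party call, are hypothetical.

```python
import functools
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("thread.external")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

def logged_external_call(func):
    """Log every call to an external resource in one consistent format."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("calling %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            logger.error("%s failed: %s", func.__name__, exc)
            raise  # logging must not swallow the failure
    return wrapper

@logged_external_call
def fetch_profile(user_id):  # hypothetical stand-in for a third-party call
    if user_id < 0:
        raise ValueError("unknown user")
    return {"id": user_id}

fetch_profile(21)
try:
    fetch_profile(-1)
except ValueError:
    pass

lines = stream.getvalue().splitlines()
```

This follows the level guidelines from earlier: Info around the third-party call, Error when it comes back with something unexpected. Changing the format later means editing one function instead of every call site.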

That's a pretty good foundation that I've picked up over the years. It's saved my bacon on more than one occasion for sure.

I'm interested in hearing about your logging best practices too.