Some Logging Rules

I seem to be building logging infrastructure today.  I keep recalling one or another of the rules for playing this game.  Might as well try to put them down.

  • Who? – The speaker’s unique ID and type should be in each log line.
  • Transcript – The speaker’s utterances should have a serial number, so you can notice gaps.
  • Checksum – A running checksum is a big help in proving things.
  • When – The utterances should have a time stamp (daemontools multilog t is good).
  • Synchronize our watches – NTP is a must everywhere.
  • Breadcrumbs – Jobs/tasks/work-items/requests should have a unique ID that is threaded through the logs and across processes/modules/machines.
  • Health – All processes (machines, threads …) should emit a heartbeat; heartbeats should include some health indicators so other parties can notice when they expire or get sick.
  • Replay – Logs that enable a rebuild from the last snapshot will save your butt.  Often you’re close, and only some minor optimization (truncating output, discarding binary info, say) is preventing it.  I once rebuilt an entire source repository from years of mail to prove an intrusion had not touched the sources.
  • Syntax – It’s good if the logs are well tokenized, i.e. embedded strings are escaped; and character encodings are worked out.
  • Standardized – It’s good, but it’s hopeless. This is the worst case of the 2nd part of “Be strict in what you send, but generous in what you receive.”
  • Innumerable – You can’t lay an ontology over the space of exceptions.  Accept that, and then proceed as usual.
  • Now – The sooner the log analysis takes place the better.  Don’t wait until your patient is in intensive care.  Analogy: test driven development.
  • Email – The accumulated headers in modern email are full of lessons learned.
  • FSEvents – The asynchronous file system journaling/notifications (BeOS et al.) are worth looking at closely.
  • Fast – I tend to embrace that writing the log is not transactional or even particularly reliable so I can have volume instead.
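
To make a few of these concrete, here is a minimal sketch in Python of the Who, Transcript, Checksum, When, Breadcrumbs and Syntax rules working together. Everything in it is an assumption for illustration: the Speaker class, the check function, the field names, JSON as the line format, and SHA-256 as the running checksum. The time stamp is a plain time.time() float standing in for whatever your log collector would add.

```python
import hashlib
import json
import time
import uuid


class Speaker:
    """One log-emitting process. All names and fields are illustrative."""

    def __init__(self, speaker_type, speaker_id=None):
        self.speaker_type = speaker_type                # Who: type
        self.speaker_id = speaker_id or uuid.uuid4().hex  # Who: unique ID
        self.seq = 0     # Transcript: serial number of the last utterance
        self.chain = ""  # Checksum: running hash over everything said so far

    def line(self, msg, job_id=None, now=None):
        """Return one log line as a single-line JSON string."""
        self.seq += 1
        record = {
            "ts": time.time() if now is None else now,  # When
            "type": self.speaker_type,
            "id": self.speaker_id,
            "seq": self.seq,
            "job": job_id,                              # Breadcrumbs
            "msg": msg,
        }
        # Syntax: json.dumps escapes quotes, newlines and non-ASCII text.
        body = json.dumps(record, sort_keys=True)
        # Each checksum covers the previous checksum plus this record.
        self.chain = hashlib.sha256((self.chain + body).encode("utf-8")).hexdigest()
        record["sum"] = self.chain
        return json.dumps(record, sort_keys=True)


def check(lines):
    """Return (missing serial numbers, serial numbers whose checksum fails).

    After a gap the chain is resynchronized from the line's own checksum,
    so the line right after a gap is not verified.
    """
    missing, bad_sums = [], []
    expected_seq, chain = 1, ""
    for text in lines:
        record = json.loads(text)
        claimed = record.pop("sum")
        seq = record["seq"]
        if seq != expected_seq:
            missing.extend(range(expected_seq, seq))
        else:
            body = json.dumps(record, sort_keys=True)
            digest = hashlib.sha256((chain + body).encode("utf-8")).hexdigest()
            if digest != claimed:
                bad_sums.append(seq)
        expected_seq, chain = seq + 1, claimed
    return missing, bad_sums


if __name__ == "__main__":
    speaker = Speaker("worker")
    log = [speaker.line("start", job_id="j1"),
           speaker.line("done", job_id="j1")]
    print("\n".join(log))
    print(check(log))
```

Dropping a line from the list shows up in the first result of check, and editing a line in place shows up in the second. In the spirit of the Fast rule, nothing here is transactional; the reader notices the damage after the fact.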

2 thoughts on “Some Logging Rules”

  1. bhyde Post author

    Julius (and Anton) – Thanks.

    I’m somewhat ambivalent about priority (warning, error, emergency) since I’ve too often been caught in situations where emergency wasn’t, and warning was.
