Fixing some typos

edited Feb 15, 2019 at 14:02


Making logging more efficient and useful by categorizing data (and virtually not writing messages anymore)

I'd like to make my logs more useful. Currently they mostly contain such common columns as: timestamp, logger-name, log-level, message, exception.

They are hard to search and parse, and writing messages with some data appended to them does not really help. Consequently, when something goes wrong, I cannot easily find the answer just by looking at the logs and have to debug my application.

The questions that need to be answered are not only about why the application crashed or didn't do something correctly but also about business rules like: why did I get a bonus when ordering ABC?

So usually I need to open the IDE, load the order, and debug it to find which condition prevented the bonus from being added and whether that was correct. Had I logged the data or other criteria needed to make this decision, I would have been able to find it in the logs and answer the question within maybe 5 minutes.

But this is not possible with the default schema, where the message is the main part.

I concluded that I need to completely reorganize my logs in order to log more data. But I cannot just put everything into the message or into ad hoc additional columns, because there are too many possibilities and I'd like a general solution that works for any application.

This means that I need more specific fields than just the message, where I can put the additional information.

In order to find those fields I categorized every piece of data I could think of. This is my list:

Environment - this is the largest scope. I use this to log machine names or dev/prod environments.
Product(name) - runs within the environment.
Layer(name) - this helps me to categorize the logs by software layer. Each layer has its own log-level:
    Application - general technical data about the application itself; max log-level: Debug
    Business - logs about business logic; log-level: Information
    Presentation - logs about the UI; log-level: Trace
    IO - logs about disk operations; log-level: Trace
    Database - logs about the database; log-level: Trace
    Network - logs about the network; log-level: Trace
    External - logs about external devices; log-level: Trace
Transaction(name) - every log must belong to some transaction so that I can group them together and see the entire process.
State or Event - each log is either a State log, which records data that I usually use to make decisions, or an Event log.
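
For illustration only (the idea is language-agnostic), the layers and their log-levels from the list above could be captured in a small lookup type. This is a minimal Python sketch; the `Layer` enum, its attribute names, and the numeric `TRACE` value of 5 are assumptions, since Python's standard `logging` module defines no Trace level.

```python
import logging
from enum import Enum

# Assumption: Python's logging has no TRACE level, so a custom one
# is registered below DEBUG (which is 10).
TRACE = 5
logging.addLevelName(TRACE, "TRACE")


class Layer(Enum):
    """Software layers from the list, each paired with its log-level."""

    APPLICATION = ("Application", logging.DEBUG)
    BUSINESS = ("Business", logging.INFO)
    PRESENTATION = ("Presentation", TRACE)
    IO = ("IO", TRACE)
    DATABASE = ("Database", TRACE)
    NETWORK = ("Network", TRACE)
    EXTERNAL = ("External", TRACE)

    def __init__(self, label, level):
        self.label = label  # name written to the Layer column
        self.level = level  # numeric log-level used for this layer
```

A logger could then look up `Layer.BUSINESS.level` instead of having the level chosen at every call site.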

As a state I can log three types of information:

State(Name) - this is the name of the state that I can search for (could be a CustomerInfo, an array of active downloaders, etc.)
Actual(State) - this is the current state; log-level: Trace
Expected(State) - this is what I expected; log-level: Trace

Actual and Expected usually contain small object dumps in JSON format.
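
To make the State fields concrete, here is a small Python sketch of how one State entry might be assembled. The helper `state_record` and the sample customer values are hypothetical; only the field names State, Actual, and Expected come from the list above.

```python
import json


def state_record(name, actual, expected=None):
    """Build the State-related fields of one log row.

    `actual` and `expected` are small objects that are dumped to JSON;
    `expected` is optional because not every state has an expectation.
    """
    return {
        "State": name,
        "Actual": json.dumps(actual, sort_keys=True),
        "Expected": None if expected is None else json.dumps(expected, sort_keys=True),
    }


# Hypothetical values, for illustration only.
row = state_record(
    "CustomerInfo",
    actual={"orders": 2, "is_premium": False},
    expected={"orders": 3, "is_premium": True},
)
```

Because both dumps are valid JSON, they stay machine-readable and can be compared field by field later.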

Events can be logged together with the Elapsed field. They also end with a result. I defined four of them:

Undefined - the event did not run, e.g. because of invalid parameters; log-level: Warning
Success - everything went well; log-level: Information
Completed - conditions not met, no errors; log-level: Information
Failure - an error occurred; log-level: Error

Message - finally there is the good old message, which I usually use to give some hints on how to fix what might have gone wrong, but I now write it very rarely.
Exception - here I put the stack-trace of exceptions.
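
As a sketch of how an Event with its Elapsed time, Result, and Exception could be produced, consider the following Python snippet. The function `run_event` and the convention that the wrapped action returns a `Result` are assumptions made for this example, not a prescribed design.

```python
import logging
import time
import traceback
from enum import Enum


class Result(Enum):
    UNDEFINED = "Undefined"
    SUCCESS = "Success"
    COMPLETED = "Completed"
    FAILURE = "Failure"


# Log-level per result, as listed above.
RESULT_LEVEL = {
    Result.UNDEFINED: logging.WARNING,
    Result.SUCCESS: logging.INFO,
    Result.COMPLETED: logging.INFO,
    Result.FAILURE: logging.ERROR,
}


def run_event(name, action):
    """Run `action` (assumed to return a Result) and build the Event fields."""
    started = time.perf_counter()
    exception = None
    try:
        result = action()
    except Exception:
        # Any exception turns the event into a Failure with a stack-trace.
        result = Result.FAILURE
        exception = traceback.format_exc()
    elapsed_ms = (time.perf_counter() - started) * 1000
    return {
        "Event": name,
        "Elapsed": elapsed_ms,
        "Result": result.value,
        "Level": logging.getLevelName(RESULT_LEVEL[result]),
        "Exception": exception,
    }
```

The caller never writes a message here; the result and the level follow from what the action returned or raised.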

As a table in a database it could look like this:

Log-Table
--------------
Id
Timestamp
---
Environment   | development, production
Product       | Product-v0
Logger        | RepositoryXLogger
TransactionId | 123
Layer         | Application, Business
Level         | Debug, Information
State         | like a variable name for the state
Expected      | small object dumps (json)
Actual        | small object dumps (json)
Event         | LoadConfiguration, GetDataX
Elapsed       | milliseconds
Result        | Undefined, Success, Completed, Failure
Message
Exception
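
The table above maps fairly directly onto a relational schema. The following is a rough Python sketch using the built-in `sqlite3` module; the column types, the `BonusEligibility` state, the `AddBonus` event, and all row values are made up for illustration. It shows the kind of query that could answer a business question for one transaction without a debugger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Log (
        Id            INTEGER PRIMARY KEY,
        Timestamp     TEXT NOT NULL,
        Environment   TEXT,
        Product       TEXT,
        Logger        TEXT,
        TransactionId INTEGER,
        Layer         TEXT,
        Level         TEXT,
        State         TEXT,
        Expected      TEXT,  -- small JSON dump
        Actual        TEXT,  -- small JSON dump
        Event         TEXT,
        Elapsed       REAL,  -- milliseconds
        Result        TEXT,
        Message       TEXT,
        Exception     TEXT
    )
""")

# Hypothetical rows for a single transaction.
rows = [
    ("2019-01-01T10:00:00", "production", "Product-v0", "RepositoryXLogger", 123,
     "Application", "Debug", None, None, None, "LoadConfiguration", 12.0, "Success"),
    ("2019-01-01T10:00:01", "production", "Product-v0", "RepositoryXLogger", 123,
     "Business", "Trace", "BonusEligibility", '{"orders": 3}', '{"orders": 2}',
     None, None, None),
    ("2019-01-01T10:00:02", "production", "Product-v0", "RepositoryXLogger", 123,
     "Business", "Information", None, None, None, "AddBonus", 0.4, "Completed"),
]
conn.executemany(
    "INSERT INTO Log (Timestamp, Environment, Product, Logger, TransactionId, "
    "Layer, Level, State, Expected, Actual, Event, Elapsed, Result) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# Everything the business layer recorded for transaction 123, in order.
business = conn.execute(
    "SELECT State, Expected, Actual, Event, Result FROM Log "
    "WHERE TransactionId = ? AND Layer = 'Business' ORDER BY Id",
    (123,),
).fetchall()
```

In this made-up data, the State row shows that the expected and actual order counts differ, and the following Event ended as Completed rather than Success, i.e. conditions not met.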

I don't present my implementation because the question is not about a specific programming language; how I log this information is an implementation detail that I'd rather ask about on a different site.

With these new categories it should be much easier to tell what happened and to distinguish the application logs from the business-logic logs. It should also be much easier to log, because I no longer have to put all this very specific, enum-like information into messages.

If anyone now asks me what went wrong, I should be able to answer much faster, because I wouldn't even have to open the IDE and debug the application.

My questions are:

Are these categories enough to easily find the information you need about your application?
What other useful categories could there be?
Can you think of any case or question about your application that you would not be able to answer from such a log, or that you would not be able to log efficiently?

The goal is to be able to answer questions about application failures or strange behavior more quickly and without looking at the code. This applies especially to questions about business rules: you may know the rules, but you cannot be sure they were applied correctly unless you see the data used to make the decisions.

Disclaimer: I'd like to kindly ask you not to comment on how to write a better message. There are already more than enough such questions on SE; I've tried them all and they do not work. My question is about how not to write a message at all. I find a message unnecessary if the data we log is properly segmented/categorized. This question is about breaking the so-called universally valid rules about logging. If you are not open to new ideas, then please just ignore this question.

design logging

asked Nov 19, 2017 at 21:34

t3chb0t
