Introduction
This post describes how I set up central logging for the distributed cloud application that I am developing, and why. It details the software that I used, how I configured it, and the reasoning behind those choices.
I wanted to add central logging to my application and associated systems because:
- Accessing individual logs on distinct systems and in distinct files on those systems had too much overhead
- I wanted my logs, as far as possible, to survive a catastrophic failure of the application and systems (of course there is always a possibility that the logging system itself fails)
- I wanted to avoid touching my production systems too often
- As I add more staff for development and operations, I want to avoid complicating my firewall rules by having to open up access to the individual systems from each admin’s system
- I wanted to log “everything” to see if I could debug problems by simply reviewing the logs rather than having to repeat problematic transactions and trace through my code. Repeating problematic transactions and tracing through code to debug problems is close to impossible in my production systems anyway, as this is the back-end for a consumer mobile app with a potentially huge user base. Further, I cannot rely on my users to report problems, or, when they do, to report the correct set of steps to reproduce the problem.
- So far the log-everything strategy has worked very well. I've solved numerous problems by simply reviewing the logs and have not had to trace through my code even once.
- With the availability of cheap high performance systems today and with the distributed nature of the workloads, I have not encountered significant slowdowns in end user response times so far.
- I wanted my central logging system to automatically send me an email when an error occurred, rather than discovering the problem some other way and then going hunting for the logs
- Down the road I want to process the logs more intelligently, and codify my common triage workflows, so that the system can automatically determine root causes and then alert me to the issue rather than its symptoms
- E.g., if there is a problem with my database server, then my MySql logs will have errors in them and my Java application may encounter many SQL Exceptions that are logged as well. In this case, instead of being inundated by alerts for every Java SQL exception, I'd rather just get an alert with the error log entry from MySql, detailing the Java SQL exceptions as likely symptoms.
- Logs are also a cheap way for me to track other metrics such as which features are used the most. I can get this information by processing the logs instead of writing more code to specifically track feature usage and increment counters and metrics etc. Processing the logs also gives me more freedom to retroactively track metrics that I may need later, instead of having to update my application and getting those metrics only from that point on. Data like which features are used the most is great for focusing Marketing, Product Management, and Engineering resources.
- A central logging system also allows me to correlate logs from multiple systems by, say, timestamp and username (assuming I include usernames in the log entries written by complex transactions that span multiple systems). This way I can see a full end-to-end trace of what happened on behalf of a user, tell where in the chain of function calls the problem occurred, and see how transaction arguments morphed as data moved across systems (see the query sketch after this list).
- I suspect that comparing the logs output by my test systems and production systems can also quickly tell me whether certain code paths are being tested enough. Of course there is COTS software to report code coverage, but I could potentially avoid running yet another application if I can just use my logs.
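To make the correlation idea above concrete, here is the kind of query I have in mind. It is only a sketch: it assumes the logs table and columns that syslog-ng creates later in this post, and that the username is embedded in the message text (username=jdoe and the time window are made up):
SELECT log_date_time, sender, severity, class_name, message
FROM logs
WHERE message LIKE '%username=jdoe%'
  AND log_date_time BETWEEN '2015-03-27 14:00:00' AND '2015-03-27 15:00:00'
ORDER BY log_date_time, sender;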
Synchronizing time for Log Analysis
- In order to debug problems in multi-access / multi-threaded systems like web and mobile applications, it's very important that your systems have a consistent view of time. When you are debugging problems across multiple systems, e.g., even simply between a web server and a database server, you need to be able to string log entries together to identify the code execution path from transaction start to failure.
- Synchronizing time is very easy with NTP, so there is no reason to skip this step
Setup Steps
Machines and OSes used
My systems are running in AWS EC2 and are currently t1 or t2 micro instances running Ubuntu 14.04 LTS
Synchronizing time across my systems using NTP
- I'm currently of the opinion that, for my purposes, I can just set the timezone of all my instances to US Pacific Time irrespective of where the instances really are. This way I don't have to do any timezone processing of my log entry timestamps when I am investigating an issue, and all externally reported times just have to be converted to a single timezone for debugging purposes.
- Setting the common timezone of all my instances
$ sudo dpkg-reconfigure tzdata
- Setting up NTP
- Installation
$ sudo apt-get install ntp
- Configuration (optional)
- The default NTP configuration file has NTP pool servers already configured and it's not necessary to change them
- Just in case, you can edit the file /etc/ntp.conf and add the lines
server 0.us.pool.ntp.org
server 1.us.pool.ntp.org
server 2.us.pool.ntp.org
server 3.us.pool.ntp.org
- Reference: NTP configuration and the optimal pool servers to use are documented here
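- Verification (optional)
  - Once NTP is running, you can confirm that the instance is syncing with the pool servers using the standard NTP query tool that ships with the ntp package:
$ ntpq -p
  - Each pool server is listed with its offset and jitter; an asterisk marks the peer currently selected for synchronization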
Creating a common log format across my application and systems
I wanted to have a common log format across all my application code and systems to make debugging easier. This way,
- I can count on all log entries having all the data fields / information that I deem necessary for debugging and tracking
- The code that I write to process my logs is simpler
- Understanding / consuming the log entries is easier as they all follow a standard format
Using Log4J2 for application logging
- Early on I standardized on using Java for all my application code because I'm good at Java and it has such widespread support that I can code every piece of my application back-end in it, be it web services and REST APIs, or components that are straight up Java server applications. There is also this amazing application logging package named Log4J2! I've been using Log4J2 for many years and absolutely love it.
- Adding Log4J2 to your Java application or web project and configuring it is very easy and described in detail here
- I'll just say that I used a RollingFileAppender which writes logs to a well named file in a standard sub-directory under /var/log/. Having just one type of OS across my systems helps standardization, of course. (A configuration sketch follows this list.)
- Also, I don't need to keep old log files around on the systems since all log entries are being sent to my central logging system. This way I can just delete the files once they get too big and start new ones.
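For reference, here is a minimal sketch of what such a Log4J2 configuration can look like. The file paths, appender name, rollover policy, and the exact pattern are assumptions for illustration; the pattern simply mirrors the space-delimited, quoted-message style that the syslog-ng CSV parser later in this post expects:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical log4j2.xml; adjust names, paths, and the pattern to your own format -->
<Configuration status="warn">
  <Appenders>
    <!-- RollingFileAppender writing to a well named file under /var/log/ -->
    <RollingFile name="AppFile"
                 fileName="/var/log/myapp/myapp.log"
                 filePattern="/var/log/myapp/myapp-%i.log">
      <!-- Space-delimited fields; the free-form message is wrapped in double quotes -->
      <PatternLayout pattern="%d{dd MMM yyyy HH:mm:ss,SSS} %p %c %l [%t] &quot;%m&quot;%n"/>
      <Policies>
        <SizeBasedTriggeringPolicy size="50 MB"/>
      </Policies>
      <DefaultRolloverStrategy max="2"/>
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="debug">
      <AppenderRef ref="AppFile"/>
    </Root>
  </Loggers>
</Configuration>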
Updating the default java.util.logging format used by Tomcat
- I primarily have five kinds of components – Servlets that run in Tomcat instances, MySql, Redis, syslog-ng, and straight up Java server applications that are run periodically by cron jobs.
- I'm trying to minimize the number of diverse types of components so that standardization and maintenance is easier. The net effort required to develop, deploy, and maintain a component has to be minimal. So something that is harder to develop in-house but easier to maintain and standardize can win out over COTS software.
- So far I've standardized the logging format of my Java Servlets, Java server applications, and Tomcat itself.
  - My Java Servlets and Java server applications use Log4J2 and that's covered above
  - Tomcat itself uses the java.util.logging package. Updating the format of the Tomcat logs was easy.
  - Tomcat has three types of log files by default: catalina.*, localhost.*, and localhost_access_log.*
  - The logs that I am interested in are in the localhost.* log files
    - These capture uncaught exceptions generated within my deployed applications
  - I had to do two things
    - Update the format
      - The format is set in Tomcat's logging.properties file, which on my systems is in /etc/tomcat7/
      - I added the format line:
java.util.logging.SimpleFormatter.format = %1$td %1$tb %1$tY %1$tH:%1$tM:%1$tS,%1$tL \"%1$tY-%1$tm-%1$td %1$tH:%1$tM:%1$tS,%1$tL\" %4$s %2$s [] \"%5$s%6$s\"%n
      - This matches the format that I use in my Log4J2 configuration
      - I then saved my changes and restarted Tomcat
      - More information on this is available here
    - Create and update a well named hard link to the localhost.* log file
      - The format of the localhost.* log filenames is localhost.<date>.log and Tomcat creates a new file daily
      - I didn't want to mess with this, but syslog-ng OSE requires a constant filename to track and gather logs from
      - Hence, I created a well named hard link to the latest version of this file and update it every hour via a cron job (a crontab sketch follows this list)
        - I had to create a hard link because the modified date on a soft link would not change when the underlying file is changed, hence syslog-ng would not know that a change has occurred / new logs have been written to the log file
      - The cron job that runs every hour executes the command sudo ln -f $(ls -t localhost.* | head -1) tomcat_localhost.log
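A sketch of what that crontab entry can look like (the directory /var/log/tomcat7 matches the path used in the client configuration later in this post; the schedule and link name are illustrative assumptions):
# hypothetical entry in root's crontab: refresh the hard link at the top of every hour
0 * * * * cd /var/log/tomcat7 && ln -f $(ls -t localhost.* | head -1) tomcat_localhost.log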
Deciding the log entries of interest that I want to be proactively notified about
- Early on in my application design and development, I decided that I would try to log everything, as explained in the introduction to this post. Hence, I decided that I would have only three severities of logs – DEBUG, WARN, and ERROR. Actually I started out with two – DEBUG and ERROR, with ERRORs being the ones that I want to be notified about, obviously. However, I soon realized that there are certain business related events that I also need notifications about, e.g., a user creating a new account or canceling their account. These are not errors, but my marketing and sales departments would be interested in this information, hence I added the WARN severity to cover such things.
- My log entries are very descriptive in that they include every dynamic part of a particular action. Typically these are the arguments for the action, like a username, datetime, entity ID, etc. Of course you have to be careful not to accidentally log things like passwords and authentication tokens in plain text, or at all actually.
- Since this is my standard format for a log entry, the only thing I need to change is the severity in order to decide whether it's something to be notified about or not. For notifications I have – DEBUG (don't notify), WARN (notify the business contact), ERROR (notify the engineering contact). (A purely hypothetical sample entry follows this list.)
- These rules have worked for me so far, but maybe in the future they will need more complexity. The main thing is to be rigorous so that I don't compromise consistency and ease of maintenance.
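For illustration only, here is a made-up entry in roughly this shape (every name and value below is invented; the exact field order in my real format is governed by the Log4J2 pattern and the syslog-ng parser configuration shown later in this post):
27 Mar 2015 22:15:03,117 WARN com.example.api.AccountService createAccount(AccountService.java:88) [http-bio-8080-exec-5] "New account created for username=jdoe plan=free"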
Setting up a storage system for my logs
- Given that I want to log everything, I was obviously going to have a tremendous number of log entries. Hence, I needed a robust storage system.
- I decided to go with MySql because it's robust, scalable, and easy for me to manage. I looked at using NoSql systems but I just found that their management and maintenance require more time than I had. Maybe down the road I will use them.
- Another thing is that I don't expect to store all my logs in MySql forever. I figure that I only need a few days' worth of logs in MySql, in case I need to debug a problem. The log processing that I have done so far works off recent logs, and these are easy to store in and get from MySql. To save older logs, I'm planning to implement a system whereby my logs are saved both in MySql and in a longer term data storage system like cloud storage sold by AWS, Microsoft Azure, or GCE, simultaneously. This way, I will not have to move logs from MySql to secondary storage. I'll just delete older logs from MySql while still having copies in cloud storage (see the cleanup sketch after this list).
- Setting up MySql on Ubuntu is very simple and I'm not going to cover it here
- Also, syslog-ng OSE just requires my database name, username, and password, and then it creates / alters / drops all the tables based on my syslog-ng configuration. This cuts my overhead tremendously.
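The cleanup I have in mind is a simple scheduled delete. A sketch, assuming the logs table and the servers_receive_date_time column that syslog-ng creates later in this post, and a hypothetical seven-day retention window:
DELETE FROM logs
WHERE servers_receive_date_time < DATE_SUB(NOW(), INTERVAL 7 DAY);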
Setting up a system to alert (email) me when log entries of interest are generated
- The first and simplest form of log processing that I wanted to do, as described above, was to get emails when an ERROR or WARN severity log entry is created.
- Since my logs are in MySql, it was very easy to write a small Java program that retrieves new ERROR or WARN log entries (since the last run of the program) and then emails me these log entries (a sketch of such a program follows this list).
- I run this Java program via a cron job every minute, and of course this Java program is also using Log4J2, so its own logs end up in syslog-ng OSE too. Now if there is a problem with this program, I won't get any emails, but at least the logs will be in MySql for me to review.
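For completeness, here is a minimal sketch of what such a program can look like. This is not my exact program: the watermark file, the SMTP settings, and the addresses are assumptions, the table and column names are taken from the syslog-ng destination configuration later in this post, and it uses plain JDBC (MySQL Connector/J) plus JavaMail:
import java.io.*;
import java.nio.file.*;
import java.sql.*;
import java.util.Properties;
import javax.mail.*;
import javax.mail.internet.*;

/**
 * Hypothetical sketch of the alerting cron job: fetch ERROR/WARN log entries
 * newer than the last run and email them. Table and column names follow the
 * syslog-ng destination configuration described later in this post.
 */
public class LogAlerter {

    // Assumed locations and credentials -- adjust for your environment
    private static final Path WATERMARK = Paths.get("/var/lib/logalerter/last_run");
    private static final String DB_URL = "jdbc:mysql://localhost:3306/logs";
    private static final String DB_USER = "<database username>";
    private static final String DB_PASS = "<database password>";

    public static void main(String[] args) throws Exception {
        String since = readWatermark();                  // timestamp of the last entry already emailed
        StringBuilder body = new StringBuilder();
        String latest = since;

        String sql = "SELECT servers_receive_date_time, severity, class_name, message "
                   + "FROM logs WHERE severity IN ('ERROR', 'WARN') "
                   + "AND servers_receive_date_time > ? "
                   + "ORDER BY servers_receive_date_time";

        try (Connection conn = DriverManager.getConnection(DB_URL, DB_USER, DB_PASS);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, since);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    latest = rs.getString(1);            // advance the watermark
                    body.append(rs.getString(1)).append(' ')
                        .append(rs.getString(2)).append(' ')
                        .append(rs.getString(3)).append(' ')
                        .append(rs.getString(4)).append('\n');
                }
            }
        }

        if (body.length() > 0) {
            sendEmail("New ERROR/WARN log entries", body.toString());
            writeWatermark(latest);
        }
    }

    private static String readWatermark() throws IOException {
        if (Files.exists(WATERMARK)) {
            return new String(Files.readAllBytes(WATERMARK)).trim();
        }
        return "1970-01-01 00:00:00";                    // first run: everything is "new"
    }

    private static void writeWatermark(String value) throws IOException {
        Files.createDirectories(WATERMARK.getParent());
        Files.write(WATERMARK, value.getBytes());
    }

    private static void sendEmail(String subject, String text) throws MessagingException {
        Properties props = new Properties();
        props.put("mail.smtp.host", "localhost");        // assumed local MTA
        Session session = Session.getInstance(props);
        Message msg = new MimeMessage(session);
        msg.setFrom(new InternetAddress("alerts@example.com"));
        msg.setRecipients(Message.RecipientType.TO,
                InternetAddress.parse("engineering@example.com"));
        msg.setSubject(subject);
        msg.setText(text);
        Transport.send(msg);
    }
}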
Setting up the core logging system – syslog-ng OSE
- I looked at many different options for a central logging system (both paid and free) – there are a ridiculously large number of products for logging out there. I was of course partial to free products.
- I ended up testing two products
  - The so-called ELK stack consisting of ElasticSearch, LogStash, and Kibana, and
  - On the recommendation of my friend and fellow CTO, Sebastien Soudan, the syslog-ng Open Source Edition (OSE)
- The ELK stack proved to be too time-consuming for me at the time. Maybe now the documentation and error checking have improved. It has a lot of great features and both ElasticSearch and LogStash are very fast and scalable. Ultimately though, the configuration and debugging proved to be quite challenging. Having three distinct moving parts didn't help either. Also, I ran into a bug that was fixed in the next GA release and there was no timeline for this next release.
- So I then tested syslog-ng OSE. My experience was fantastic to say the least! syslog-ng is clearly a tried and tested, robust, and scalable logging system with many features to process logs from varied sources, write them to varied destinations (like my MySql database), route and process log entries in complex ways, and a feature that is simple but very important for me – handling multiline log entries simply.
- There are two components to set up for syslog-ng OSE – server and client
  - The server aggregates logs over the network, processes these logs, and sends them on for storage to my MySql database.
  - I also run the alerting program (described above) on this system to email me when ERROR and WARN log entries are generated
  - The clients run on every other system; they collect log entries from one or more log files on the client systems and send these log entries over the network to the syslog-ng server using the standard and robust syslog protocol
- Since I am using Log4J2, I could have used a syslog appender to have Log4J2 send the logs directly to my syslog-ng server. However, the Log4J2 logging function call is a blocking call. This means that if the network or my server is down or slow, then my application will also block. Hence, I prefer for my application to log to a file and for syslog-ng to pick it up from the file.
- The neat thing (and a thing that cuts my maintenance headache) is that the client and server are both the same program, just configured as either client or server
- I found that the overhead of syslog-ng is negligible in terms of CPU, memory, and disk space. The latency of logs being written to the original log files, from there being transported over the network to the syslog-ng server, and from there over the network once again to the database is negligible – less than a second end-to-end.
- Another really neat feature is that I can configure the client and server to communicate securely. syslog-ng OSE supports TLS, hence my clients and server authenticate each other, and all my log entries are encrypted as they are transported over the network.
- So here is my setup
Documentation
The Ubuntu packages for syslog-ng are available from the madhouse-project.org repository (see References below)
Installing syslog-ng on a system
- Add the GPG key to APT
curl http://packages.madhouse-project.org/debian/archive-key.txt | sudo apt-key add -
- Add the repositories to /etc/apt/sources.list
deb http://packages.madhouse-project.org/ubuntu trusty syslog-ng-3.5 syslog-ng-incubator-3.5
deb-src http://packages.madhouse-project.org/ubuntu trusty syslog-ng-3.5 syslog-ng-incubator-3.5
- Update the APT cache
$ sudo apt-get update
- Install the syslog-ng packages
$ sudo apt-get install syslog-ng-core
$ sudo apt-get install syslog-ng
Install libdbd-mysql on the syslog-ng server machine
- syslog-ng requires this driver to communicate with the MySql database
$ sudo apt-get install libdbd-mysql
MySql Configuration
- Log in to MySql
$ mysql -u <username> -p
- Enter your database password at the prompt
- Create a database named logs
CREATE DATABASE logs;
- Assign users to the logs database
GRANT ALL ON logs.* TO <username>@'<IP Address of syslog-ng Server>' IDENTIFIED BY '<database password>';
- Open MySql port 3306 from <IP Address of syslog-ng Server> in the firewall guarding the MySql server
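A quick way to sanity-check the grant and the firewall rule is to connect from the syslog-ng server machine with the stock MySql client (hypothetical invocation; substitute your own host and username):
$ mysql -h <IP address or DNS name of the MySql server> -u <username> -p logs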
syslog-ng Server configuration
- The syslog-ng configuration is saved in the file /etc/syslog-ng/syslog-ng.conf
- Keep the default configuration and add the following
- Source
source s_network_ietf_syslog {
syslog(ip(0.0.0.0) port(5000) transport("tcp"));
};
- This sets up syslog-ng to listen for syslog messages on TCP port 5000. Your syslog-ng clients will communicate with the server on this port.
- Parser
parser log4j_logs_parser{
csv-parser(
columns(LOG4J.DAY_OF_MONTH, LOG4J.MONTH_NAME, LOG4J.YEAR, LOG4J.TIME, LOG4J.SEVERITY, LOG4J.CLASS_NAME, LOG4J.LOCATION_IN_SOURCE_CODE, LOG4J.THREAD, LOG4J.LOG_MESSAGE)
delimiters(" ")
flags(escape-backslash, greedy)
quote-pairs('""')
);
};
- We use syslog-ng's CSV parser to parse the log entries into fields for storage in the database
- This makes it easier to process the log entries down the line
- Earlier, in the Log4J2 configuration, I used spaces judiciously between fields to make sure this parser works correctly.
  - The line columns(…) represents the fields in my log messages. The names like LOG4J.DAY_OF_MONTH and LOG4J.CLASS_NAME are names made up by me so that I can refer to each of these parsed fields later in my configuration. The syslog-ng CSV parser will parse each log entry based on the delimiter specified and then match the parts of the parsed log entry to the column names that I have specified. If I have fewer column names than the parser actually finds, then I will lose the excess data from the log entry.
- I also surrounded certain fields, like the message, with double-quotes ("") to ensure that the CSV parser treats data within double-quotes as a single field even if it contains spaces (my delimiter). This is configured by the quote-pairs('""') setting.
- The escape-backslash flag is used to escape quotes within log entry fields
- And finally, the greedy flag is a catch-all. If by some misconfiguration or chance a log entry has more fields than my columns, then everything after the last field of the parsed log entry is added to the last column of the parser. This way I won't lose any log entry data.
- Destination
destination d_logs_mysql_server {
sql(
type(mysql)
host("<IP address or DNS name of the MySql server>")
username("<database username>")
password("<database password>")
database("logs")
table("logs")
columns("log_date_time datetime", "sender", "senders_send_date_time datetime", "servers_receive_date_time datetime", "severity", "class_name", "location_in_source_code", "thread", "message")
values("${LOG4J.ISO8601_DATETIME}", "${HOST}", "${S_UNIXTIME}", "${R_UNIXTIME}", "${LOG4J.SEVERITY}", "${LOG4J.CLASS_NAME}", "${LOG4J.LOCATION_IN_SOURCE_CODE}", "${LOG4J.THREAD}", "${LOG4J.LOG_MESSAGE}")
);
};
- The destination for the log entries is my MySql database
- You'll see that the database refers to the database I created above under the MySql configuration
- The table instructs syslog-ng to create a table, also named logs, if it does not exist in the database logs
- The columns instructs syslog-ng to create columns having the specified names. syslog-ng does a good job of inferring data types for the columns.
- The values refers to fields from my parser above or to default fields that every syslog message has. Remember that in my configuration syslog-ng is transmitting log entries from the clients to the server as syslog messages. Thus ${HOST} refers to the sender's hostname contained in every syslog message. Similarly, ${S_UNIXTIME} refers to the datetime that the sender sent the syslog message.
- The neat thing is that if I change this configuration, then syslog-ng will automatically alter the database table as necessary.
- Log Statement
log {
source(s_network_ietf_syslog);
parser(log4j_logs_parser);
destination(d_logs_mysql_server);
};
- The log statement finally ties everything together
  - It instructs syslog-ng to receive log messages from the clients, parse the log messages, and then send them on to the MySql destination
syslog-ng Client configuration
- Once again, the syslog-ng configuration is saved in the file /etc/syslog-ng/syslog-ng.conf
- Keep the default configuration and add the following
- Source
source s_tmhapp_log_file{
file (
"/var/log/tomcat7/<my_log_file_name>"
flags(no-parse)
multi-line-mode(regexp)
multi-line-prefix("[0-9]{2} [a-zA-Z]{3} [0-9]{4}")
);
};
- All my clients are writing logs to files and these files are the source of log entries on my clients
- The no-parse flag instructs syslog-ng that the log entries inside the files are not in standard syslog format and thus should not be parsed as such. This is because the log entries follow my custom format that I previously set in Log4J2. syslog-ng will thus treat the entire log entry as the payload (or message) of a syslog packet. It sends one syslog packet from the client to the server for each log entry.
- The multi-line-prefix instructs syslog-ng how to handle multi-line log entries. syslog-ng applies the specified regular expression to incoming text data from the log files to identify the start (this prefix) of a new log entry, and then treats all text data (including newlines) until the next instance of the prefix as part of the single log entry. Without this, syslog-ng treats a newline character as the end of a log entry.
- The multi-line-mode(regexp) instructs syslog-ng that the specified prefix is a regular expression and should be processed as such.
- Destination
destination d_network_ietf_syslog{
syslog(
"<IP address or DNS name of the server>"
transport("tcp")
port(5000)
localport(5000)
);
};
- The destination specifies where the server is and on which port
- I also chose to fix the outgoing port for communications to lock down my firewall rules further
- Log Statement
log {
source(s_tmhapp_log_file);
destination(d_network_ietf_syslog);
};
- Finally, as on the server, the log statement ties everything together
- Note that the log entries are parsed on the server
Debugging syslog-ng
- Start syslog-ng with the following command
$ sudo strace -o /tmp/trace.txt -s256 -ff -ttT syslog-ng -f /etc/syslog-ng/syslog-ng.conf -Fdv
- syslog-ng sends all its debug output to the console and I have found this more than enough to troubleshoot my setup
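- As an aside, it is also handy to have syslog-ng validate the configuration file before starting the daemon; syslog-ng supports a syntax-only check (flag availability may vary slightly by version):
$ sudo syslog-ng --syntax-only -f /etc/syslog-ng/syslog-ng.conf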
Starting and Stopping syslog-ng for Production
- The standard commands apply
$ sudo service syslog-ng <start> | <stop> | <restart>
Optional – Mutual TLS Encryption between syslog-ng clients and the server
- We will use the same private key and ssl certificate for both the clients and the server
  - This is because, for syslog-ng, one must store both the client and server certificates on the machines without passwords
  - Therefore, if a malicious user breaks into these machines, they'll get the certificates anyway
  - So the main protections are to restrict access to the machines and to have the data transmission encrypted
- Create the private key without a passphrase
$ openssl genrsa -out <private key name>.pem 2048
- Create the ssl certificate
$ openssl req -new -x509 -key <private key name>.pem -out <ssl certificate name>.pem -days 7300
- Fill out the required certificate data as prompted
- Copy the ssl certificate to the clients, into each client's directory /etc/syslog-ng/ca.d/
- Copy the ssl certificate to the server's directory /etc/syslog-ng/ca.d/
- In the /etc/syslog-ng/ca.d/ directory of each client and the server
  - Issue the following command on the ssl certificate
$ openssl x509 -noout -hash -in <ssl certificate name>.pem
  - The result is a hash (e.g., 6d2962a8), a series of alphanumeric characters based on the Distinguished Name of the ssl certificate.
  - Issue the following command to create a symbolic link to the ssl certificate that uses the hash returned by the previous command and the .0 suffix.
$ sudo ln -s <ssl certificate name>.pem 6d2962a8.0
- Copy the ssl certificate to the clients, into each client's directory /etc/syslog-ng/cert.d/
- Copy the ssl certificate to the server's directory /etc/syslog-ng/cert.d/
- Copy the private key to the clients, into each client's directory /etc/syslog-ng/key.d/
- Copy the private key to the server's directory /etc/syslog-ng/key.d/
- On the syslog-ng clients, configure TLS in the syslog destination to the server
  - Change transport("tcp") to transport("tls") in the syslog destination
  - And, add
tls(
ca-dir("/etc/syslog-ng/ca.d")
key-file("/etc/syslog-ng/key.d/<private key name>.pem")
cert-file("/etc/syslog-ng/cert.d/<ssl certificate name>.pem")
)
  - to the syslog destination
- On the syslog-ng server, configure TLS in the syslog source of the server
  - Change transport("tcp") to transport("tls") in the syslog source
  - Add
tls(
ca-dir("/etc/syslog-ng/ca.d")
key-file("/etc/syslog-ng/key.d/<private key name>.pem")
cert-file("/etc/syslog-ng/cert.d/<ssl certificate name>.pem")
)
  - to the syslog source
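- Optionally, you can sanity-check that the server is presenting the certificate by connecting to the syslog port from a client machine with the stock OpenSSL client (illustrative only):
$ openssl s_client -connect <IP address or DNS name of the server>:5000
- The certificate you created above should appear in the output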
References
- http://www.balabit.com/network-security/syslog-ng/opensource-logging-system
- https://help.ubuntu.com/community/UbuntuTime
- http://logging.apache.org/log4j/2.x/manual/configuration.html
- http://docs.oracle.com/javase/7/docs/api/java/util/logging/SimpleFormatter.html
- http://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html#syntax
- http://stackoverflow.com/questions/5416769/how-to-configure-the-jdk14-loggings-pattern
- http://tomcat.apache.org/tomcat-7.0-doc/logging.html
- http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-3.6-guides/en/syslog-ng-ose-v3.6-guide-admin/html/index.html
- http://asylum.madhouse-project.org/projects/debian/