Using Multiple Grok Statements to Parse a Java Stack Trace

Introduction

With the increasing deployment of Big Data applications, and to manage scalability and availability, enterprises are deploying large numbers of servers (either as physical or virtual instances). Many of these deployments host Java applications, running on enterprise application servers like IBM WebSphere, Oracle Fusion Middleware, etc. Given the nature of application development and the way users interact with applications, there is a good chance of these applications crashing at some point during their operation. If such applications handle critical enterprise functionality, it is imperative that failures are detected at the earliest and responded to with suitable corrective action, followed by re-deployment of the failed services.

To monitor applications, most enterprises follow well-defined logging guidelines, which not only log server performance parameters like CPU usage, memory consumption and disk consumption, but also make it mandatory for applications to log their own performance and progress parameters. For example, a retail web application may log information about all the items in a user's shopping cart. Such information provides a basis for detailed analysis and also allows the application to recover its state, in case it crashes mid-way through a transaction. In case of an error, a Java application generates an exception report, also known as a Java stack trace, which provides information that development teams can use to identify and resolve the error.

For any analysis or recovery, parsing the log information is of prime importance. In this document, we cover the most common methods of parsing a Java stack trace using the Logstash tool. We also share some issues faced when using these methods and propose an alternative approach to parsing the Java stack trace.

Java Stack Trace

Compared with other machine log traces generated by applications, a Java stack trace is quite peculiar: it spans multiple lines, whereas most log entries span only one line. For effective analytics and reporting, all information from the log data needs to be parsed, irrespective of whether it occupies a single line or spans multiple lines.

For the scope of this document, we will refer to the sample log shown in Table 1. It should be noted that a Java stack trace does not exist in isolation; it is usually intermingled with other log information generated by the application. Thus, it is important for any parsing method to not only parse the stack trace correctly, but also to parse the other log information.

Jan 12, 2015 9:51:08 AM org.apache.catalina.loader.WebappClassLoader validateJarFile
INFO: validateJarFile(C:\Apps\tomcat-6.0.41\webapps\Recommendation\WEB-INF\lib\servlet-api.jar) - jar not loaded. See Servlet Spec 2.3, section 9.7.2. Offending class: javax/servlet/Servlet.class
Jan 12, 2015 9:51:18 AM org.apache.catalina.core.StandardContext start
SEVERE: Error listenerStart
Jan 12, 2015 9:51:18 AM org.apache.catalina.core.StandardContext start
SEVERE: Context [/Recommendation] startup failed due to previous errors
Jan 12, 2015 9:51:19 AM org.apache.catalina.loader.WebappClassLoader clearReferencesJdbc
WARNING: JDBC driver de-registration failed for web application [/Recommendation]
java.lang.reflect.InvocationTargetException
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
   at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: PermGen space
   at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
   at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
   at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1647)
   at java.lang.Class.forName(Class.java:274)
   at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
... 1 more

Jan 12, 2015 9:51:19 AM org.apache.catalina.core.StandardContext start
SEVERE: Exception during cleanup after start failed
java.lang.OutOfMemoryError: PermGen space
   at sun.misc.Unsafe.defineClass(Native Method)
org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1642)
   at java.lang.Thread.run(Thread.java:745)


Table 1: Sample log

ELK Stack of Tools

With machines generating a lot of information, parsing the generated data is important. For a long time, enterprises have deployed custom parsers to read such data sets and enable processing on them. Nowadays, many tools are available to parse and process data: commercial tools like Splunk, and open source tools like Graylog2, Logstash and Elasticsearch. Of these tools, we have selected the ELK stack, which is short for a trio of tools, namely Elasticsearch, Logstash and Kibana. In this document, we will only cover Logstash in depth, though we provide a brief overview of all three components in the stack.

Logstash

Logstash is an open source tool that provides an integrated framework for log collection, centralization, parsing, storage and search. It has a wide variety of input mechanisms: it can take inputs from TCP/UDP, files, Syslog, Microsoft Windows EventLogs, STDIN and a variety of other sources. The Logstash server supports a large collection of filters that allow us to modify, manipulate and transform those events and give them context. Doing so makes it easier to draw conclusions and make good decisions using the log data. On the output side, Logstash supports a huge range of destinations, including TCP/UDP, email, files, HTTP, Nagios and a wide variety of network and online services.
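Every Logstash script shown in this document follows the same three-section skeleton. A minimal sketch is given below; the stdin/stdout plugins here are placeholders, not taken from the scripts discussed later.

input {
  # where events come from: TCP/UDP, files, syslog, stdin, etc.
  stdin { }
}

filter {
  # how events are parsed, enriched and transformed (grok, date, mutate, ...)
}

output {
  # where events go: Elasticsearch, files, stdout, etc.
  stdout {
    codec => "rubydebug"
  }
}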

Elasticsearch

Elasticsearch is an open-source search engine built on top of Apache Lucene, a full-text search-engine library. Lucene is a complex, advanced, high-performance, and fully featured search engine library. Elasticsearch uses Lucene internally for all of its indexing and searching, but aims to make full-text search easy by hiding Lucene's complexities behind a simple, coherent, RESTful API. In addition to full-text search, Elasticsearch also supports structured search, analytics and aggregations over the indexed data.

Kibana

Kibana is an open source analytics and visualization platform designed to work with Elasticsearch. Kibana can be used to search, view, and interact with data stored in Elasticsearch indices. Advanced data analysis and visualization of data is possible using a variety of charts, tables, and maps.

Using ELK

Typically, log information is collected and parsed from the various systems by Logstash. If the system is remote, Logstash forwarder agents have to be used to send data to the Logstash instance. Log information is parsed using Logstash scripts and inserted into an Elasticsearch index. Typically, Logstash scripts convert the plain text log information into JSON objects, which are then stored by Elasticsearch. Elasticsearch allows the business logic to search for information using suitable queries, with results generated in JSON format. Kibana, the visualization tool, understands Elasticsearch, and we can start creating visualizations simply by pointing the Kibana instance to an Elasticsearch index. Kibana fetches the index definitions from Elasticsearch, which can be used to define visualizations like column charts, pie charts and the like.
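For instance, to store parsed events in Elasticsearch rather than printing them, the output section of the scripts shown later can be pointed at an Elasticsearch instance. A minimal sketch follows; the host and index name are illustrative assumptions, not taken from the original scripts.

output {
  # Write each parsed event into a daily Elasticsearch index;
  # host and index name here are placeholders.
  elasticsearch {
    host => "localhost"
    index => "javastacktrace-%{+YYYY.MM.dd}"
  }
}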

Approach 1: using multiline codec in input

One of the most common solutions suggested for parsing a Java stack trace is to use the 'multiline' codec in the input section of the Logstash script. The pattern used appends every line that begins with whitespace to the previous line. In other words, when Logstash reads a line of input that begins with whitespace (a space or a tab), that line is merged with the previously read input. Thus, multiple lines of the trace are merged into one entry. A sample script using the 'multiline' codec is shown in Table 2. The pattern '^\s*' is intended to match any line beginning with whitespace.

input {
  tcp {
    type => "javaStackTraceMultilineCodec"
    port => 9999
    codec => multiline {
      pattern => "(^\s*)"
      what => "previous"
    }
  }
}

filter {
  if [type] == "javaStackTraceMultilineCodec" {
    grok {
      patterns_dir => "./patterns"
      match => [ "message", "%{TOMCATLOG}", "message", "%{CATALINALOG}" ]
    }
    date {
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS Z", "MMM dd, yyyy HH:mm:ss a" ]
    }
    mutate {
      gsub => ['message', "\n", " ",
               'message', "\t", " "]
    }
  }
}

output {
  stdout {
    codec => "rubydebug"
  }
}


Table 2: Multiline codec for parsing

When this script is executed on the sample log, it generates JSON records that can be inserted into an Elasticsearch index. The JSON records generated are shown in Table 3. As shown, the complete log information has been read by Logstash. In most cases, the original message is retained in the 'message' field, so as to provide an easy reference to the original data.

{
   "@timestamp" => "2015-01-12T04:21:08.000Z",
   "message" => "Jan 12, 2015 9:51:08 AM org.apache.catalina.loader.WebappClassLoader validateJarFile INFO: validateJarFile(C:\\Apps\\tomcat-6.0.41\\webapps\\Recommendation\\WEB-INF\\lib\\servlet-api.jar) - jar not loaded. See Servlet Spec 2.3, section 9.7.2. Offending class: javax/servlet/Servlet.class Jan 12, 2015 9:51:18 AM org.apach e.catalina.co re.  St an d ar dC on t ex t   s ta rt   S EV ER E :   Er r or   l i st e ne rS ta r t   J an   1 2,   2 0 15   9 :5 1 :1 8   A M   or g. ap ac he .catalina.core.StandardCo ntext   s tart   S EVERE: Context [/Recom men d at i o n]   s t ar t up   f a il e  d   du e   t o   p r ev i ou s   e r r o r s   J a n   1 2 ,   2 0 1 5   9 :5 1 : 1 9   A M   or g . a p a c h e . c at a l i n a . l o a d e r . W e b a p p C l a s s L o a d e r   c l e a r R e f e r e n c e s J d b c   W A R N I N G :   J D B C   d r i v e r   d e - r e g i s t r a t i o n   fa il e d   f o r   we b   a p p l i c a t i o n   [ / R e c om  m e n d a t i o n ]   j a v a.lang.reflect.Invocatio n Tar g e t E x c e p t i o n   a t   s u n . r e f l e c t . De l e g a t i n g M e t h o d A c c e s s o r I m p l . i n v o k e ( D e l e g a t i n g M e t h o d A c c e s s o r Im p l . j a v a : 4 3 )   a t   j a v a . l a n g . r e f l e c t . M e t h o d . i n v o k e ( M e t h o d . j a v a: 6 0 6 )   a t   o r g . a p a c h e . c a t a l i n a . u t i l . L i f e c y c l e S u p p o r t . f i r e L i f ec y c l e E v e n t ( L i f e c y c l e S u p p o r t . j a v a : 1 4 2 )   a t   j a v a . l a n g . T h r e a d . ru n ( T h r e a d . j a v a : 7 4 5 )   C a u s e d   b y :   j a v a . l a n g . O u t O f M e m o r y E r r o r :   Pe r m G e n   s p a c e   a t   j a v a . l a n g . C l a s s L o a d e r . d e f i n e C l a s s ( C l a s s L o a de r . j a v a : 8 0 0 )   a t   j a v a . l a n g . C l a s s L o a d e r . d e f i n e C l a s s ( C l a s s L o a de r . j a v a : 8 0 0 )   a t   o r g . a p a c h e . c a t a l i n a . l o a d e r . W e b a p p C l a s s L o a d er . l o a d C l a s s ( W e b a p p C l a s s L o a d e r . j a v a : 1 6 4 7 )   a t   j a v a . l a n g . C l a s s. f o r N a m e ( C l a s s . j a v a : 2 7 4 )   a t   o r g . a p a c h e . c a t a l i n a . c o r e . C onta inerBase.addChild(C ontainerBase.java:7 79) ...   1 more   Jan 12 , 2015 9:5 1:19 AM org .apache.ca talina.co re.Standar dContext   start   SEVERE:   Exception   during cl eanup afte r start f ailed   java.lang .OutOfMem oryError:   PermGen s pace   at  sun.mi sc.Unsafe.def ineClass(Nati ve Method)   org.ap ache.catalin a.core.Contai nerBase$Conta inerBackgroun dProcessor.ru n(ContainerBa se.java:1642)   at   java.lang.Th read.run(Thre ad.java:745) ",
   "@version" => "1",
   "tags" => [
   [0] "multiline"
   ],
   "host" => "127.0.0.1:56123",
   "type" => "javaStackTraceMultiline",
   "timestamp" => "Jan 12, 2015 9:51:08 AM",
   "class" => "org.apache.catalina.loader.WebappClassLoader",
   "logmessage" => "validateJarFile"
}


Table 3: Records created when using multiline codec in input section

Issues with the script

While this is the most common suggestion, there are a few issues with this solution. As shown in the output, the script has created a single JSON object containing all lines from the input file. Ideally, each line that begins with a timestamp should become a separate record, and a separate record should be created for the Java stack trace information. The problem is that the pattern '^\s*' reads extra information: because '*' allows zero occurrences, lines with no leading whitespace also match, so every line is appended to the previous one.

Another issue with the script relates to the buffer used for storing log data. Logstash has a limited buffer (whose size can be changed) for storing the input data. If the Java stack trace spans many lines, the concatenation of multiple lines into one can cause this size to be exceeded, splitting the merged event. Such splitting is not consistent, and data continuity is lost to the extent that it is not possible to extract meaningful information from the generated output. As an example, if Logstash exceeds its buffer size while parsing 'at java.lang.Class.forName(Class.java:274)', the strings inserted into Elasticsearch (as records) could be 'at', 'java.', 'lang.Class.forName' and '(Class.java:274)'. It should be noted that the behaviour is not consistent, in that the first record contains two characters, the second contains five characters, and so on. The result depends on the way the data is processed in the 'filter' section of the Logstash script.

Approach 2: using multiline in filter

Another common suggestion for parsing a Java stack trace is to use the 'multiline' filter in the 'filter' section of the script. As in approach 1, 'multiline' reads multiple lines of input as one block of text. The Logstash script using 'multiline' in the 'filter' section is shown in Table 4.

input {
  tcp {
    type => "javaStackTraceMultiline"
    port => 9999
  }
}

filter {
  if [type] == "javaStackTraceMultiline" {
    multiline {
      patterns_dir => "./patterns"
      pattern => "(^%{TOMCAT_DATESTAMP})|(^%{CATALINA_DATESTAMP})"
      negate => true
      what => "previous"
    }

    grok {
      patterns_dir => "./patterns"
      match => [ "message", "%{TOMCATLOG}", "message", "%{CATALINALOG}" ]
    }

    date {
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS Z", "MMM dd, yyyy HH:mm:ss a" ]
    }

    mutate {
      gsub => ['message', "\n", " ",
               'message', "\t", " "]
    }
  }
}

output {
  stdout {
    codec => "rubydebug"
  }
}


Table 4: Using multiline in filter

The records generated when the script is executed on the sample data are shown in Table 5. Using this approach, multiple records are created, as expected.

{
   "message" => "Jan 12, 2015 9:51:08 AM org.apache.catalina.loader.WebappClassLoader validateJarFile\nINFO: validateJarFile(C:\\Apps\\tomcat-6.0.41\\webapps\\Recommendation\\WEB-INF\\lib\\servlet-api.jar) - jar not loaded. See ServletSpec 2.3, section 9.7.2. Offending class: javax/servlet/Servlet.class",
   "@version" => "1",
   "@timestamp" => "2015-01-12T04:21:08.000Z",
   "host" => "127.0.0.1:56171",
   "type" => "javaStackTraceMultiline",
   "tags" => [
   [0] "multiline"
   ],
   "timestamp" => "Jan 12, 2015 9:51:08 AM",
   "class" => "org.apache.catalina.loader.WebappClassLoader",
   "logmessage" => "validateJarFile"
}

{
   "message" => "Jan 12, 2015 9:51:18 AM org.apache.catalina.core.StandardContext start\nSEVERE: Error listenerStart",
   "@version" => "1",
   "@timestamp" => "2015-01-12T04:21:18.000Z",
   "host" => "127.0.0.1:56171",
   "type" => "javaStackTraceMultiline",
   "tags" => [
   [0] "multiline"
   ],
   "timestamp" => "Jan 12, 2015 9:51:18 AM",
   "class" => "org.apache.catalina.core.StandardContext",
   "logmessage" => "start"
}

{
   "message" => "Jan 12, 2015 9:51:18 AM org.apache.catalina.core.StandardContext start\nSEVERE: Context [/Recommendation] startup failed due to previous errors",
   "@version" => "1",
   "@timestamp" => "2015-01-12T04:21:18.000Z",
   "host" => "127.0.0.1:56171",
   "type" => "javaStackTraceMultiline",
   "tags" => [
   [0] "multiline"
   ],
   "timestamp" => "Jan 12, 2015 9:51:18 AM",
   "class" => "org.apache.catalina.core.StandardContext",
   "logmessage" => "start"
}

{
   "message" => "Jan 12, 2015 9:51:19 AM org.apache.catalina.loader.WebappClassLoader clearReferencesJdbc\nWARNING:JDBC driver de-registration failed for web application [/Recommendation]\njava.lang.reflect.InvocationTargetException\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:606)\n\tat org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)\n\tat java.lang.Thread.run(Thread.java:745)\nCaused by: java.lang.OutOfMemoryError: PermGen space\n\tat java.lang.ClassLoader.defineClass(ClassLoader.java:800)\n\tat org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1647)\n\tat java.lang.Class.forName(Class.java:274)\n\tat org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)\n... 1 more\n",

   "@version" => "1",
   "@timestamp" => "2015-01-12T04:21:19.000Z",
   "host" => "127.0.0.1:56171",
   "type" => "javaStackTraceMultiline",
   "tags" => [
   [0] "multiline"
   ],
   "timestamp" => "Jan 12, 2015 9:51:19 AM",
   "class" => "org.apache.catalina.loader.WebappClassLoader",
   "logmessage" => "clearReferencesJdbc"
}


Table 5: Records created when using multiline in filter section

Issues with the script

This script also has issues. One issue with this method is that it skips log information if an empty line does not separate the previous stack trace from a new line containing log information. For example, for the data shown in Table 6, the data following the line '... 1 more' is not parsed properly. Thus, this script is also not without drawbacks, as we would typically not want incomplete information when performing analysis.

. . . (omitted)
   at java.lang.Class.forName(Class.java:274)
   at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
... 1 more
Jan 12, 2015 9:51:19 AM org.apache.catalina.core.StandardContext start
SEVERE: Exception during cleanup after start failed
java.lang.OutOfMemoryError: PermGen space
   at sun.misc.Unsafe.defineClass(Native Method)
. . . (omitted)


Table 6: Empty line not present in log

Approach 3: using multiple grok statements

To overcome the problems described in the previous sections, we propose a method that treats each line of the Java stack trace individually, without merging the lines into one. The script uses multiple grok statements to parse the trace information. As the script stores each line as a separate record in Elasticsearch, we use one pattern per type of input line. For this script, we have defined a few custom patterns, as shown in Table 7.

Input line | Pattern used
Jan 12, 2015 9:51:08 AM org.apache.catalina.loader.WebappClassLoader validateJarFile | CUSTOM_CATALINALOG
SEVERE: Error listenerStart | CUSTOM_WARNINGLEVEL
WARNING: JDBC driver de-registration failed for web application [/Recommendation] | CUSTOM_WARNINGLEVEL
Caused by: java.lang.OutOfMemoryError: PermGen space | CUSTOM_TRACE_CAUSED_BY
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) | CUSTOM_TRACE_OTHERS
java.lang.reflect.InvocationTargetException | CUSTOM_TRACE_EXCEPTION
... 1 more | CUSTOM_TRACE_OTHERS
Any other text | GREEDYDATA

Table 7: Patterns used

The script used to parse the log data is shown in Table 8. As depicted, we use multiple 'grok' statements, one for each type of input data. In addition to the parsed log data, each 'grok' statement adds an extra field, namely 'subType', that indicates the kind of record. As seen in the records in Table 9, the values for the 'subType' field include 'timestamp', 'warning', 'exception', 'cause', 'continuation' and 'unrecognizedText'.
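A minimal sketch of such a filter chain is given below, reconstructed from the patterns in Table 10 and the 'subType' values in Table 9; the exact field handling in the original script may differ. Each 'grok' statement is tried only if the previous one failed, which Logstash signals with the '_grokparsefailure' tag.

input {
  tcp {
    type => "javaStackTrace"
    port => 9999
  }
}

filter {
  if [type] == "javaStackTrace" {
    # Try the timestamped Catalina log line first.
    grok {
      patterns_dir => "./patterns"
      match => [ "message", "%{CUSTOM_CATALINALOG:catalinaLog}" ]
      add_field => [ "subType", "timestamp" ]
    }
    # On failure, try a warning-level line (INFO:, SEVERE:, WARNING:, ...).
    if "_grokparsefailure" in [tags] {
      mutate { remove_tag => [ "_grokparsefailure" ] }
      grok {
        patterns_dir => "./patterns"
        match => [ "message", "%{CUSTOM_WARNINGLEVEL:warningLevel}%{JAVALOGMESSAGE:messageText}" ]
        add_field => [ "subType", "warning" ]
      }
    }
    # Stages for 'exception', 'cause' and 'continuation' lines follow the
    # same shape, using CUSTOM_TRACE_EXCEPTION, CUSTOM_TRACE_CAUSED_BY and
    # CUSTOM_TRACE_OTHERS respectively (omitted here for brevity).
    # Finally, catch anything that no other pattern recognized.
    if "_grokparsefailure" in [tags] {
      mutate { remove_tag => [ "_grokparsefailure" ] }
      grok {
        match => [ "message", "%{GREEDYDATA:unrecognizedText}" ]
        add_field => [ "subType", "unrecognizedText" ]
      }
    }
  }
}

output {
  stdout {
    codec => "rubydebug"
  }
}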

Table 8: Parsing input into separate records

The records generated by the script for the sample log data are shown in Table 9.

{
   "message" => "Jan 12, 2015 9:51:08 AM org.apache.catalina.loader.WebappClassLoader validateJarFile",
   "@version" => "1",
   "@timestamp" => "2015-04-06T05:26:31.165Z",
   "host" => "127.0.0.1:56016",
   "type" => "javaStackTrace",
   "catalinaLog" => "Jan 12, 2015 9:51:08 AM org.apache.catalina.loader.WebappClassLoader validateJarFile",
   "timestamp" => "Jan 12, 2015 9:51:08 AM",
   "class" => "org.apache.catalina.loader.WebappClassLoader",
   "messageText" => "validateJarFile",
   "subType" => "timestamp",
   "uploadTimestamp" => "2015-04-06T05:26:31.165Z",
   "uploadTimestampString" => "2015-04-06 05:26:31 UTC"
}

{
   "message" => "INFO: validateJarFile(C:\\Apps\\tomcat-6.0.41\\webapps\\Recommendation\\WEB-INF\\lib\\servlet-api.jar) - jar not loaded. See Servlet Spec 2.3, section 9.7.2. Offending class: javax/servlet/Servlet.class",
   "@version" => "1",
   "@timestamp" => "2015-04-06T05:26:31.194Z",
   "host" => "127.0.0.1:56016",
   "type" => "javaStackTrace",
   "tags" => [],
   "warningLevel" => "INFO:",
   "messageText" => "validateJarFile(C:\\Apps\\tomcat-6.0.41\\webapps\\Recommendation\\WEB-INF\\lib\\servlet-api.jar) - jar not loaded. See Servlet Spec 2.3, section 9.7.2. Offending class: javax/servlet/Servlet.class",
   "subType" => "warning",
   "uploadTimestamp" => "2015-04-06T05:26:31.194Z",
   "uploadTimestampString" => "2015-04-06 05:26:31 UTC"
}

{
   "message" => "java.lang.reflect.InvocationTargetException",
   "@version" => "1",
   "@timestamp" => "2015-04-06T05:26:31.382Z",
   "host" => "127.0.0.1:56016",
   "type" => "javaStackTrace",
   "tags" => [],
   "exceptionText" => "java.lang.reflect.InvocationTargetException",
   "subType" => "exception",
   "uploadTimestamp" => "2015-04-06T05:26:31.382Z",
   "uploadTimestampString" => "2015-04-06 05:26:31 UTC"
}

{
   "message" => " at java.lang.Thread.run(Thread.java:745)",
   "@version" => "1",
   "@timestamp" => "2015-04-06T05:26:31.598Z",
   "host" => "127.0.0.1:56016",
   "type" => "javaStackTrace",
   "tags" => [],
   "messageText" => " at java.lang.Thread.run(Thread.java:745)",
   "subType" => "continuation",
   "uploadTimestamp" => "2015-04-06T05:26:31.598Z",
   "uploadTimestampString" => "2015-04-06 05:26:31 UTC"
}

{
   "message" => "Caused by: java.lang.OutOfMemoryError: PermGen space",
   "@version" => "1",
   "@timestamp" => "2015-04-06T05:26:31.642Z",
   "host" => "127.0.0.1:56016",
   "type" => "javaStackTrace",
   "tags" => [],
   "causedbyText" => "Caused by: java.lang.OutOfMemoryError: PermGen",
   "messageText" => "space",
   "subType" => "cause",
   "uploadTimestamp" => "2015-04-06T05:26:31.642Z",
   "uploadTimestampString" => "2015-04-06 05:26:31 UTC"
}

{
   "message" => "... 1 more",
   "@version" => "1",
   "@timestamp" => "2015-04-06T05:26:31.846Z",
   "host" => "127.0.0.1:56016",
   "type" => "javaStackTrace",
   "tags" => [],
   "unrecognizedText" => "... 1 more",
   "subType" => "unrecognizedText",
   "uploadTimestamp" => "2015-04-06T05:26:31.846Z",
   "uploadTimestampString" => "2015-04-06 05:26:31 UTC"
}


Table 9: Separate records created from log data

Patterns

The Logstash script used to parse the Java stack trace uses some standard patterns. Standard patterns are supplied with the Logstash installation and are stored in the 'grok-patterns' file (in the 'patterns' directory under the Logstash installation). We have also defined a few custom patterns, which have been appended to the standard pattern file. The custom pattern definitions are given in Table 10.

JAVACLASS (?:[a-zA-Z0-9-]+\.)+[A-Za-z0-9$]+
JAVALOGMESSAGE (.*)
# MMM dd, yyyy HH:mm:ss eg: Jan 9, 2014 7:13:13 AM
CATALINA_DATESTAMP %{MONTH} %{MONTHDAY}, 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) (?:AM|PM)
CUSTOM_WARNINGLEVEL (^SEVERE:)|(^INFO:)|(^WARNING:)|(^ERROR:)|(^DEBUG:)|(^Error:)
CUSTOM_TRACE_ERROR (^.+Error:)
CUSTOM_TRACE_EXCEPTION (^java.+Exception)|(^.+Exception.+)|(^.+Exception: .+)
CUSTOM_TRACE_CAUSED_BY (^\s*Caused by:.+)
CUSTOM_TEXT (?:[A-Za-z_.-]+)
CUSTOM_TRACE_OTHERS (^\s+at .+)|(^\s+... \d+ more)
CUSTOM_CATALINALOG %{CATALINA_DATESTAMP:timestamp} %{JAVACLASS:class} %{JAVALOGMESSAGE:messageText}


Table 10: Pattern definitions

Observations

While the script described in approach 3 does not leave any data unparsed, it also has a few quirks.

Order of 'grok' statements

The first quirk relates to the use of 'grok' statements. When using multiple 'grok' statements, the order in which they are specified is very important. If a line of data can be matched by more than one pattern, we need to determine which pattern best describes the log data currently being parsed. For example, a line in the log file may contain the word 'Exception' either as standalone text or as part of a named Java exception. If we have two patterns that match the text 'Exception', we need to decide which pattern takes precedence. In the most likely case, we would want 'Exception' occurring as part of a named Java exception to take precedence over 'Exception' as a word by itself.
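To illustrate the ordering, the sketch below (field names are illustrative) tries the more specific exception pattern before falling back to the catch-all:

filter {
  # More specific first: a named Java exception such as
  # 'java.lang.reflect.InvocationTargetException'.
  grok {
    patterns_dir => "./patterns"
    match => [ "message", "%{CUSTOM_TRACE_EXCEPTION:exceptionText}" ]
    add_field => [ "subType", "exception" ]
  }
  # Only if that fails, fall back to the catch-all, which would also
  # match a line that merely contains the word 'Exception'.
  if "_grokparsefailure" in [tags] {
    mutate { remove_tag => [ "_grokparsefailure" ] }
    grok {
      match => [ "message", "%{GREEDYDATA:unrecognizedText}" ]
      add_field => [ "subType", "unrecognizedText" ]
    }
  }
}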

'Related' records

The second quirk concerns 'relatedness'. As part of machine log analytics and related reporting, it is important to display all information in a coherent manner. Using the script described in approach 3, we split the information from the stack trace into multiple records that are not connected to each other.

Retrieving related records

Once all the data has been inserted into Elasticsearch, the only way to identify all the records that belong to one stack trace is to make use of the '@timestamp' field. To get all the records, we need to take the timestamp of the first record of the trace and fetch all records whose timestamp value is close to it. To make that task easier, we have added a helper field, namely 'subType', to each record. Thus, using a combination of the '@timestamp' and 'subType' fields, we are able to fetch the necessary records. It should be noted that '@timestamp' is a special field that is difficult to map to a Java class from its JSON equivalent. Hence we have added two fields representing the same value, namely 'uploadTimestamp' and 'uploadTimestampString' (see Table 9). The value of both fields is the same as that of '@timestamp', indicating the time when the record was parsed and uploaded to Elasticsearch. The field 'uploadTimestamp' is of type 'Date', while 'uploadTimestampString' is of type 'String'.
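One way such helper fields might be populated is with a 'ruby' filter. This is a sketch, not the original script; the Logstash 1.x event API and the strftime format are assumptions chosen to match the output in Table 9.

filter {
  # Copy the event time into plain fields that are easier to map to a
  # Java class than the special '@timestamp' field (illustrative only).
  ruby {
    code => "
      event['uploadTimestamp'] = event['@timestamp']
      event['uploadTimestampString'] = event['@timestamp'].time.strftime('%Y-%m-%d %H:%M:%S UTC')
    "
  }
}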

Conclusion

In this article, we have presented a few of the approaches commonly used to parse Java stack trace information. Most of these approaches try to merge all the lines that belong to the stack trace into one block of information. Due to the problems associated with those approaches, we suggest that it is better to treat the stack trace as individual pieces of information and to parse each line as a separate record. However, with one record per line, our business logic and search criteria have to become 'intelligent' when extracting all the records that match our analysis and reporting criteria.
