Handling Multiline Log formats using Regex and GROK Parser

July 20, 2023
Jason Franscisco

To effectively analyze logs from multiple products, security operations teams must first comprehend the diverse landscape of log types. We will provide an overview of common log types encountered, such as system logs, application logs, network logs, and security logs. By understanding the characteristics and formats of each log type, teams can better prepare for the complexities that lie ahead.

Security operations teams face challenges when analyzing different log types from multiple products. Some products have complicated log structures that require advanced rules and GROK patterns to extract fields from the raw message.

The Challenge of Complicated Log Structures:

Certain products generate logs with intricate structures that pose challenges for analysis. We will examine the reasons behind these complexities, including proprietary log formats, inconsistent field naming conventions, and unstructured log data. Through examples, we will showcase the difficulties faced by security operations teams and how these complicated log structures can hinder their ability to extract relevant information effectively.

Regex and GROK Patterns – Unleashing the Power of Pattern Matching and Log Parsing:

Regular expressions, or regex, are a powerful tool for pattern matching in log analysis. We will explore techniques such as using anchors, modifiers, quantifiers, and capture groups to identify and extract relevant data from multiline log entries.

GROK patterns are a powerful tool for log parsing, enabling security operations teams to extract fields from raw log messages efficiently. Through practical examples, we will demonstrate how GROK patterns can be customized to handle complex log structures and extract valuable information. We will also highlight the importance of maintaining a GROK pattern library for consistent and scalable log analysis.

Overcoming Log Analysis Challenges:

We will address the specific challenges encountered by security operations teams when analyzing logs from multiple products with diverse log structures. We will discuss issues such as data normalization, log integration, and log source identification. Moreover, we will provide strategies and techniques to overcome these challenges, including log aggregation, log enrichment, and normalization processes.

This blog explains how to analyze F5 BIG-IP logs, which mix several timestamp formats and can span multiple lines within a single log entry, and how to convert them into a queryable, readable format.

Objective: Process complex logs that have irregular or inconsistent patterns using various tools and frameworks.

  1. Understand the log format: Familiarize yourself with the structure and format of the log messages. Identify the different components, fields, and patterns within the logs.
  2. Define the parsing strategy: Determine the approach used to parse the logs. This can include regular expressions (regex), Grok patterns, or specific log parsing libraries or frameworks.
  3. Identify key fields: Identify the specific fields or information to extract from the logs. These include timestamps, log levels, error codes, user IDs, or any other relevant data.
  4. Write the parser: Define regex or Grok patterns that capture the required information and use them to extract the data in a pipeline. The pipeline processes incoming log messages by extracting the relevant information, transforming it with the parser, and taking actions based on conditions.
  5. Use log parsing libraries or frameworks: For more complex log formats, leverage tools that provide built-in parsing functionality, such as Logstash, Elasticsearch ingest pipelines, or Fluentd (often fed through a transport layer such as Apache Kafka), or language-specific log parsing libraries.
  6. Test and refine: Test the parsing strategy and patterns against sample log messages to ensure they accurately extract the desired fields. Adjust and refine the approach as needed.
  7. Process and analyse: Once the logs are parsed and the relevant fields extracted, process and analyse the data. This might involve storing the data in a database, performing aggregations or calculations, generating reports, or integrating it with other systems.
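The "test and refine" step can be sketched as a small harness that runs a candidate pattern over sample messages and reports which ones fail. This is a minimal sketch using Python's `re` module rather than a Grok engine; the helper `check_pattern`, the pattern `LEVEL_FIRST`, and the expected field values are assumptions made for illustration.

```python
import re

def check_pattern(pattern, cases):
    """Return a list of readable failures; an empty list means every case passed."""
    failures = []
    for line, expected in cases:
        match = pattern.match(line)
        if match is None:
            failures.append(f"no match: {line!r}")
            continue
        for field, want in expected.items():
            got = match.groupdict().get(field)
            if got != want:
                failures.append(f"{field}: expected {want!r}, got {got!r}")
    return failures

# Hypothetical first-draft pattern: it assumes the level always follows the log type.
LEVEL_FIRST = re.compile(r"(?P<logtype>\w+) (?P<log_level>[A-Z]+):")

CASES = [
    ("webui INFO: Deployment of configuration descriptor "
     "/etc/tomcat/Catalina/localhost/tmui.xml has finished in 55,602 ms "
     "May 10, 2023 6:34:02 AM org.apache.catalina.startup.HostConfig deployDescriptor",
     {"logtype": "webui", "log_level": "INFO"}),
    ("webui 2023-05-10T13:34:02Z ERROR  [Thread-4] controller.SubscriberServlet:subscribe"
     "     : MCP subscribe error: Unable to read POST response data "
     "java.net.ConnectException: Connection refused (Connection refused)",
     {"logtype": "webui", "log_level": "ERROR"}),
]

failures = check_pattern(LEVEL_FIRST, CASES)
for failure in failures:
    print(failure)
```

The second sample starts with an ISO 8601 timestamp, so the draft pattern does not match it. That is the kind of gap this step is meant to surface before the parser reaches production.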

F5 BIG-IP logs

An F5 BIG-IP load balancer distributes the communications evenly across the servers in a network, so that no single server is overwhelmed. The BIG-IP keeps a constant check on the incoming and outgoing traffic of the servers and it will route the user requests to the most available server that can best handle them.

It also improves application performance, scalability and reliability while enhancing security and user experience.

  • F5 BIG-IP has a complex log structure, with multiple formats in a single log type. Logs of this kind require appropriate parsing techniques and tools to extract the desired fields reliably.
  • In this blog, F5 BIG-IP WebUI logs are used as the example to parse with GROK and regex patterns.

Encountering various timestamp formats

  • F5 BIG-IP WEBUI logs consist of multiple timestamp formats.
  • Grok provides default patterns for commonly used timestamp formats, making it easier to extract timestamps without writing custom regular expressions.
  • However, a specific timestamp format may not have a default Grok pattern. In such cases, use a custom regular expression that matches the desired timestamp format. This ensures accurate extraction of timestamps from log messages.

For example, the timestamp format in “May 11, 2023 8:54:13 AM” does not have a default Grok pattern.

To extract the above timestamp, define a custom Grok pattern using the regular expression below, which matches the timestamp components (month, day, year, hour, minute, second, AM/PM) and assigns the whole match to the field vendor_timestamp.


(?<vendor_timestamp>[A-Z][a-z]{2,3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M)
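The custom timestamp pattern can be tried outside a Grok engine before it goes into a pipeline. The following is a minimal sketch using Python's `re` module, which writes named groups as `(?P<name>...)` instead of `(?<name>...)`. The function name `extract_vendor_timestamp` is an assumption, and the `strptime` format relies on English month and AM/PM names.

```python
import re
from datetime import datetime

# Python port of the custom pattern; only the named-group syntax differs.
VENDOR_TS = re.compile(
    r"(?P<vendor_timestamp>[A-Z][a-z]{2,3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M)"
)

def extract_vendor_timestamp(line):
    """Return the first vendor-style timestamp in the line as a datetime, or None."""
    match = VENDOR_TS.search(line)
    if match is None:
        return None
    # %b = abbreviated month name, %I = 12-hour clock, %p = AM/PM marker
    return datetime.strptime(match.group("vendor_timestamp"), "%b %d, %Y %I:%M:%S %p")

sample = (
    "webui INFO: Deployment of configuration descriptor "
    "/etc/tomcat/Catalina/localhost/tmui.xml has finished in 55,602 ms "
    "May 10, 2023 6:34:02 AM org.apache.catalina.startup.HostConfig deployDescriptor"
)
print(extract_vendor_timestamp(sample))
```

Converting the captured text to a `datetime` right away is a design choice for this sketch. A pipeline could just as well keep the raw string in vendor_timestamp and convert it in a later stage.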

Below are sample logs with multiple timestamp formats.


webui INFO: Deployment of configuration descriptor /etc/tomcat/Catalina/localhost/tmui.xml has finished in 55,602 ms May 10, 2023 6:34:02 AM org.apache.catalina.startup.HostConfig deployDescriptor
webui WARNING: [SetPropertiesRule]{Server/Service/Engine/Host} Setting property 'xmlValidation' to 'false' did not find a matching property. May 12, 2023 5:35:14 AM org.apache.tomcat.util.digester.SetPropertiesRule begin usage: java org.apache.catalina.startup.Catalina [ -config {pathname} ] [ -nonaming ]  { -help | start | stop } Fri May 12 05:35:11 PDT 2023
webui 2023-05-10T13:34:02Z ERROR  [Thread-4] controller.SubscriberServlet:subscribe     : MCP subscribe error: Unable to read POST response data java.net.ConnectException: Connection refused (Connection refused)
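One way to cope with the three timestamp layouts in these samples is to try a regex per layout and normalize whatever is found. The sketch below does this in Python. The labels (`iso8601`, `vendor`, `unix_date`) and the helper `find_timestamps` are hypothetical names. As a simplification, the zone abbreviation (PDT) is dropped and the results are naive datetimes.

```python
import re
from datetime import datetime

# (label, regex, strptime format). When a regex has capture groups, only the
# groups are parsed. That is how the zone abbreviation is skipped, since
# strptime's %Z accepts very few zone names.
TIMESTAMP_FORMATS = [
    ("iso8601",
     re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"),
     "%Y-%m-%dT%H:%M:%SZ"),
    ("vendor",
     re.compile(r"[A-Z][a-z]{2,3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M"),
     "%b %d, %Y %I:%M:%S %p"),
    ("unix_date",
     re.compile(r"\b([A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) [A-Z]{3,4} (\d{4})\b"),
     "%a %b %d %H:%M:%S %Y"),
]

def find_timestamps(line):
    """Return (label, datetime) pairs for every known timestamp layout in the line."""
    found = []
    for label, pattern, fmt in TIMESTAMP_FORMATS:
        for match in pattern.finditer(line):
            text = " ".join(match.groups()) if match.groups() else match.group(0)
            found.append((label, datetime.strptime(text, fmt)))
    return found

warning_line = (
    "webui WARNING: [SetPropertiesRule]{Server/Service/Engine/Host} Setting property "
    "'xmlValidation' to 'false' did not find a matching property. "
    "May 12, 2023 5:35:14 AM org.apache.tomcat.util.digester.SetPropertiesRule begin "
    "usage: java org.apache.catalina.startup.Catalina [ -config {pathname} ] "
    "[ -nonaming ]  { -help | start | stop } Fri May 12 05:35:11 PDT 2023"
)
error_line = "webui 2023-05-10T13:34:02Z ERROR  [Thread-4] controller.SubscriberServlet:subscribe"

for label, value in find_timestamps(warning_line) + find_timestamps(error_line):
    print(label, value.isoformat())
```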

Regex for Multiline log formats

  • Multiline logs often have more complex structures compared to single-line logs. They may span multiple lines and contain line breaks, making it difficult to extract information using predefined Grok patterns.
  • The regex pattern below provides more flexibility and control over capturing content that spans multiple lines.

(\t+)?(?[\w\W\.\d\(\):]+$)
  • The above pattern matches across line breaks and tabs, and captures the multiline content through to the end of the string.

Below is a sample multiline log.


webui SEVERE: Servlet.service() for servlet [org.apache.jsp.tmui.overview.welcome.introduction_jsp] in context with path [/tmui] threw exception May 24, 2023 11:14:39 PM org.apache.catalina.core.StandardWrapperValve invoke java.lang.NullPointerException 
     at com.f5.util.UsernameHolder.getUsername(UsernameHolder.java:72) 
     at com.f5.util.UsernameHolder.updateConnection(UsernameHolder.java:270)  
     at com.f5.util.UsernameHolder.updateConnection(UsernameHolder.java:245)  
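Before a multiline pattern can be applied, the continuation lines have to travel with the event they belong to. The following Python sketch shows one way to do that, under the assumption that a continuation line is any line that starts with whitespace followed by `at `, as in the stack trace above. The helpers `group_events` and `split_event` are illustrative names, not part of any F5 or Grok tooling.

```python
import re

# Assumption: stack-trace frames are indented (spaces or tabs) and begin with "at ".
CONTINUATION = re.compile(r"^\s+at\s")

def group_events(lines):
    """Merge continuation lines into the event line that precedes them."""
    events = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue
        if events and CONTINUATION.match(line):
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events

def split_event(event):
    """Split one merged event into its header line and a list of stack frames."""
    header, _, trace = event.partition("\n")
    return header, [frame.strip() for frame in trace.splitlines()]

raw_lines = [
    "webui SEVERE: Servlet.service() for servlet "
    "[org.apache.jsp.tmui.overview.welcome.introduction_jsp] in context with path [/tmui] "
    "threw exception May 24, 2023 11:14:39 PM "
    "org.apache.catalina.core.StandardWrapperValve invoke java.lang.NullPointerException ",
    "     at com.f5.util.UsernameHolder.getUsername(UsernameHolder.java:72) ",
    "     at com.f5.util.UsernameHolder.updateConnection(UsernameHolder.java:270)  ",
    "     at com.f5.util.UsernameHolder.updateConnection(UsernameHolder.java:245)  ",
    "webui 2023-05-10T13:34:02Z ERROR  [Thread-4] controller.SubscriberServlet:subscribe",
]

events = group_events(raw_lines)
header, frames = split_event(events[0])
print(len(events), "events;", len(frames), "stack frames in the first one")
```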

Regex with GROK

  • Using GREEDYDATA in the middle of a pattern first captures all the remaining characters in the log line and then backtracks to satisfy the rest of the pattern, which adds to the parsing cost.
  • Regex patterns provide the flexibility to capture only the required fields without capturing unnecessary ones, which improves the performance of the parser.
  • Customized parsing: GROK comes with a set of predefined patterns, but some log formats are unique and not covered by them.
  • By using regex within GROK, custom patterns specific to the log format can be created, allowing more precise parsing and extraction of data.
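To make the mix of Grok references and inline regex concrete, the sketch below expands a small Grok-style expression into a Python regex. It is a toy translator, not a Grok implementation: it knows only the five stock definitions listed in `GROK_PATTERNS`, and the function name `grok_to_regex` and the example expression are assumptions made for illustration.

```python
import re

# Definitions of a few stock Grok patterns.
GROK_PATTERNS = {
    "WORD": r"\b\w+\b",
    "DATA": r".*?",
    "GREEDYDATA": r".*",
    "SPACE": r"\s*",
    "NOTSPACE": r"\S+",
}

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def grok_to_regex(expression):
    """Expand %{NAME:field} references and convert inline named groups for Python."""
    def expand(ref):
        name, field = ref.group(1), ref.group(2)
        body = GROK_PATTERNS[name]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    # Grok writes inline named groups as (?<name>...); Python needs (?P<name>...).
    expression = re.sub(r"\(\?<(\w+)>", r"(?P<\1>", expression)
    return re.compile(GROK_REF.sub(expand, expression))

# Lazy DATA plus a specific inline regex, instead of GREEDYDATA in the middle.
expression = (
    r"%{WORD:logtype} %{WORD:log_level}: %{DATA:message} "
    r"(?<vendor_timestamp>[A-Z][a-z]{2,3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"%{NOTSPACE:class} %{WORD:action}"
)
line = (
    "webui INFO: Deployment of configuration descriptor "
    "/etc/tomcat/Catalina/localhost/tmui.xml has finished in 55,602 ms "
    "May 10, 2023 6:34:02 AM org.apache.catalina.startup.HostConfig deployDescriptor"
)
fields = grok_to_regex(expression).match(line).groupdict()
for name, value in fields.items():
    print(f"{name}: {value}")
```

Here the inline timestamp regex acts as the anchor: the lazy `DATA` capture stops as soon as the timestamp can match, so nothing after it is swallowed into message.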

Optional GROK

  • F5 BIG-IP WebUI logs have different formats or structures depending on the specific events or actions being logged.
  • Optional Grok patterns extract these fields when they are present, so the log parsing system can handle different log formats without encountering parsing errors.
  • The parser skips an optional pattern when its fields are absent, which makes it possible to handle different log formats with a single pattern.
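The effect of optional groups can be sketched in plain Python regex: one pattern, two differently shaped lines, and absent fields simply left out. This is a simplified stand-in for the optional `(%{...})?` groups in Grok. The pattern `EVENT`, the helper `parse`, and the fixed list of levels are assumptions for illustration.

```python
import re

EVENT = re.compile(
    r"(?P<logtype>\w+) "
    r"(?:(?P<event_created>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+)?"  # optional ISO 8601 timestamp
    r"(?P<log_level>INFO|WARNING|SEVERE|ERROR)?"                         # optional level
    r":?\s*"
    r"(?P<message>.*)",
    re.DOTALL,  # let the message run across line breaks
)

def parse(line):
    """Return the fields that were present; optional groups that did not match are dropped."""
    match = EVENT.match(line)
    if match is None:
        return {}
    return {name: value for name, value in match.groupdict().items() if value is not None}

with_timestamp = parse(
    "webui 2023-05-10T13:34:02Z ERROR  [Thread-4] controller.SubscriberServlet:subscribe"
    "     : MCP subscribe error: Unable to read POST response data "
    "java.net.ConnectException: Connection refused (Connection refused)"
)
without_timestamp = parse(
    "webui INFO: Deployment of configuration descriptor "
    "/etc/tomcat/Catalina/localhost/tmui.xml has finished in 55,602 ms "
    "May 10, 2023 6:34:02 AM org.apache.catalina.startup.HostConfig deployDescriptor"
)
print(sorted(with_timestamp))
print(sorted(without_timestamp))
```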

Below is the GROK pattern, with embedded regex, that parses logs with multiline content and different timestamp formats.


%{WORD:logtype} (%{TIMESTAMP_ISO8601:event_created})?%{SPACE}(%{LOGLEVEL:log_level})?((%{GREEDYDATA:message})? (?<vendor_timestamp>[A-Z][a-z]{2,3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) %{DATA:class} %{WORD:action})?( usage: java %{DATA:class1} \[ -config %{DATA:config_path} \] \[ -nonaming \](.*)? %{DATESTAMP_OTHER:timestamp})?(?[^\t]+)?(\t+)?(?[\w\W\.\d\(\):]+$)?

Analyzing logs from multiple products with complex log structures presents significant challenges for security operations teams. However, with the right approach, including the use of advanced rules and GROK patterns, these challenges can be overcome. By understanding diverse log types, leveraging advanced techniques, and embracing automation, security operations teams can extract valuable insights from log data, enabling them to proactively detect and respond to potential security incidents effectively.
