Handling Multiline Log formats using Regex and GROK Parser

July 20, 2023

To effectively analyze logs from multiple products, security operations teams must first comprehend the diverse landscape of log types. We will provide an overview of common log types encountered, such as system logs, application logs, network logs, and security logs. By understanding the characteristics and formats of each log type, teams can better prepare for the complexities that lie ahead.

Security Operations teams face challenges in analyzing different log types from Multiple Products. A few products have complicated log structures which requires advanced Rules and GROK pattern to extract the fields from RAW message.

The Challenge of Complicated Log Structures:

Certain products generate logs with intricate structures that pose challenges for analysis. We will examine the reasons behind these complexities, including proprietary log formats, inconsistent field naming conventions, and unstructured log data. Through examples, we will showcase the difficulties faced by security operations teams and how these complicated log structures can hinder their ability to extract relevant information effectively.

Regex and GROK Patterns – Unleashing the Power of Pattern Matching and Log Parsing:

Regular expressions, or regex, are a powerful tool for pattern matching in log analysis. We will explore techniques such as using anchors, modifiers, quantifiers, and capture groups to identify and extract relevant data from multiline log entries.

GROK patterns are a powerful tool for log parsing, enabling security operations teams to extract fields from raw log messages efficiently. Through practical examples, we will demonstrate how GROK patterns can be customized to handle complex log structures and extract valuable information. We will also highlight the importance of maintaining a GROK pattern library for consistent and scalable log analysis.

Overcoming Log Analysis Challenges:

We will address the specific challenges encountered by security operations teams when analyzing logs from multiple products with diverse log structures. We will discuss issues such as data normalization, log integration, and log source identification. Moreover, we will provide strategies and techniques to overcome these challenges, including log aggregation, log enrichment, and normalization processes.

This blog explains about analyzing and converting F5 BIG-IP logs which give different Timestamp formats and Multiple lines in one single Log into queryable/readable format.

Objective: Processing Complex log that have irregular or inconsistent patterns with various tools and frameworks.

  1. Understand the log format: Familiarize with the structure and format of the log messages. Identify the different components, fields, and patterns within the logs.
  2. Define the parsing strategy: Determine the approach used to parse the logs. This can include using regular expressions (regex), Grok patterns, or specific log parsing libraries or frameworks.
  3. Identify key fields: Identify the specific fields or information to extract from the logs. These include timestamps, log levels, error codes, user IDs, or any other relevant data.
  4. Writing Parser: Define regex/grok patterns that capture the required information and use them to extract the data using pipelines. Pipeline processes the incoming log messages by extracting relevant information, performs transformation using parser and takes actions based on condition.
  5. Utilize log parsing libraries or frameworks: For more complex log formats, leverage log parsing libraries or frameworks that provide built-in functionality to handle log parsing. Examples include Logstash, Elasticsearch, Fluentd, Apache Kafka, or specific language-specific log parsing libraries.
  6. Test and refine: Test parsing strategy and patterns against sample log messages to ensure they accurately extract the desired fields. Adjust and refine the approach as needed.
  7. Process and analyse: Once the logs are successfully parsed and extracted the relevant fields to process and analyse the data. This might involve storing the data in a database, performing aggregations or calculations, generating reports, or integrating it with other systems.

F5 BIG-IP logs

An F5 BIG-IP load balancer distributes the communications evenly across the servers in a network, so that no single server is overwhelmed. The BIG-IP keeps a constant check on the incoming and outgoing traffic of the servers and it will route the user requests to the most available server that can best handle them.

It also improves application performance, scalability and reliability while enhancing security and user experience.

Encountering various timestamps formats

For example,

“May 11, 2023, 8:54:13 AM,” the timestamp format does not have a default Grok pattern.

To extract the above timestamp, define a custom Grok pattern using below regular expression which captures the timestamp components (month, day, year, hour, minute, second, AM/PM) and assigns them to the field vendor_timestamp.


(? [A-Z][a-z]{2,3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M).
.code-block { font-family: monospace; background-color: rgb(255, 255, 255); padding: 24px; /* Block padding all around */ border-radius: 8px; overflow-x: auto; /* Enable horizontal scrolling for long lines */}

Below is the sample logs with Multiple Timestamp formats.


webui INFO: Deployment of configuration descriptor /etc/tomcat/Catalina/localhost/tmui.xml has finished in 55,602 ms May 10, 2023 6:34:02 AM org.apache.catalina.startup.HostConfig deployDescriptor
webui WARNING: [SetPropertiesRule]{Server/Service/Engine/Host} Setting property 'xmlValidation' to 'false' did not find a matching property. May 12, 2023 5:35:14 AM org.apache.tomcat.util.digester.SetPropertiesRule begin usage: java org.apache.catalina.startup.Catalina [ -config {pathname} ] [ -nonaming ] { -help | start | stop } Fri May 12 05:35:11 PDT 2023
webui 2023-05-10T13:34:02Z ERROR [Thread-4] controller.SubscriberServlet:subscribe : MCP subscribe error: Unable to read POST response data java.net.ConnectException: Connection refused (Connection refused)
.code-block { font-family: monospace; background-color: rgb(255, 255, 255; /* BG color with 6% opacity */ padding: 24px; /* Block padding all around */ border-radius: 8px; overflow-x: auto; /* Enable horizontal scrolling for long lines */}

Regex for Multiline log formats


(\t+)?(?[\w\W\.\d\(\):]+$)
.code-block { font-family: monospace; background-color: rgba(255, 255, 255, 0.06); /* BG color with 6% opacity */ border: 1px solid rgba(255, 255, 255, 0.2); /* Stroke color with 20% opacity */ padding: 24px; /* Block padding all around */ border-radius: 16px; overflow-x: auto; /* Enable horizontal scrolling for long lines */}

Below is the sample for Multiline log.


webui SEVERE: Servlet.service() for servlet [org.apache.jsp.tmui.overview.welcome.introduction_jsp] in context with path [/tmui] threw exception May 24, 2023 11:14:39 PM org.apache.catalina.core.StandardWrapperValve invoke java.lang.NullPointerException
at com.f5.util.UsernameHolder.getUsername(UsernameHolder.java:72)
at com.f5.util.UsernameHolder.updateConnection(UsernameHolder.java:270)
at com.f5.util.UsernameHolder.updateConnection(UsernameHolder.java:245)
.code-block { font-family: monospace; background-color: rgba(255, 255, 255, 0.06); /* BG color with 6% opacity */ border: 1px solid rgba(255, 255, 255, 0.2); /* Stroke color with 20% opacity */ padding: 24px; /* Block padding all around */ border-radius: 16px; overflow-x: auto; /* Enable horizontal scrolling for long lines */}

Regex with GROK

Optional GROK

Below is the GROK with REGEX pattern that parses logs with multiline and different timestamp formats.


%{WORD:logtype} (%{TIMESTAMP_ISO8601:event_created})?%{SPACE}(%{LOGLEVEL:log_level})?((%{GREEDYDATA:message})?(? [A-Z][a-z]{2,3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) %{DATA:class} %{WORD:action})?( usage: java %{DATA:class1} \[ -config %{DATA:config_path} \] \[ -nonaming \](.*)? %{DATESTAMP_OTHER:timestamp})?(?[^\t]+)?(\t+)?(?[\w\W\.\d\(\):]+$)?
.code-block { font-family: monospace; background-color: rgba(255, 255, 255, 0.06); /* BG color with 6% opacity */ border: 1px solid rgba(255, 255, 255, 0.2); /* Stroke color with 20% opacity */ padding: 24px; /* Block padding all around */ border-radius: 16px; overflow-x: auto; /* Enable horizontal scrolling for long lines */}

Analyzing logs from multiple products with complex log structures presents significant challenges for security operations teams. However, with the right approach, including the use of advanced rules and GROK patterns, these challenges can be overcome. By understanding diverse log types, leveraging advanced techniques, and embracing automation, security operations teams can extract valuable insights from log data, enabling them to proactively detect and respond to potential security incidents effectively.

About Loginsoft

For over 20 years, leading companies in Telecom, Cybersecurity, Healthcare, Banking, New Media, and more have come to rely on Loginsoft as a trusted resource for technology talent. From startups, to product and enterprises rely on our services. Whether Onsite, Offsite, or Offshore, we deliver. With a track record of successful partnerships with leading technology companies globally, and specifically in the past 6 years with Cybersecurity product companies, Loginsoft offers a comprehensive range of security offerings, including Software Supply Chain, Vulnerability Management, Threat Intelligence, Cloud Security, Cybersecurity Platform Integrations, creating content packs for Cloud SIEM, Logs onboarding and more. Our commitment to innovation and expertise has positioned us as a trusted player in the cybersecurity space. Loginsoft continues to provide traditional IT services which include Software development & Support, QA automation, Data Science & AI, etc.

Expertise in Integrations with Threat Intelligence and Security Products: Built more than 250+ integrations with leading TIP, SIEM, SOAR, and Ticketing Platforms such as Cortex XSOAR, Anomali, ThreatQ, Splunk, IBM QRadar & Resilient, Microsoft Azure Sentinel, ServiceNow, Swimlane, Siemplify, MISP, Maltego, Cryptocurrency Digital Exchange Platforms, CISCO, Datadog, Symantec, Carbonblack, F5, Fortinet, and so on. Loginsoft is a partner with industry leading technology vendors Palo Alto, Splunk, Elastic, IBM Security, etc.

In addition, Loginsoft offers Research as a service: We're more than just experts in cybersecurity; we're your accredited in-house research team focused on unraveling the complexities of cybersecurity and future technologies. From Application Security to Threat Research, our seasoned professionals have cultivated expertise in every facet of the field. We've earned the trust of over 20 security platform companies, who count on our research and analysis to strengthen their cybersecurity solutions.

Interested to learn more? Let’s start a conversation.

Get notified

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

BLOGS AND RESOURCES

Latest Articles