Centralized Log Management: Security integration development for capturing, indexing, and analyzing unstructured and structured data from security endpoints

CUSTOMER BACKGROUND: A leading US-based centralized log management product company that provides a solution for capturing, storing, and enabling real-time analysis of terabytes of machine data. More than 50,000 open source users and more than 300 enterprise customers across IT, Network, Security, and DevOps teams rely on its enterprise features to manage operations, explore data, trace errors, detect threats quickly, find meaning in data easily, and take faster action.

PROJECT GOALS:
  • Log collection from several security endpoints such as Cisco, Juniper, Zscaler, CrowdStrike, Microsoft Defender, McAfee, Blue Coat, Fortinet, Sophos, Symantec, Qualys, Tenable.io, and so on
  • Log storage, indexing, and mapping of unstructured data into structured data
  • Develop several Java plugins to integrate with GCP, Google Workspace, CarbonBlack, Azure Cloud, and more, providing the following services:
    • Fetch logs based on the log types (VPC Flow logs, Audit logs, Firewall logs) defined in the application configuration settings
    • Query the fetched logs at scheduled intervals
    • Normalize the classic plain logs into the customer’s standard format, which facilitates compression
    • Ingest the logs into the application for analysis
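The normalization step in the list above can be illustrated with a minimal sketch. The input format (whitespace-separated `key=value` pairs), the class name `LogNormalizerSketch`, and the `message` field are assumptions made for illustration only; the customer's actual standard format is not described here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LogNormalizerSketch {

    // Parses a whitespace-separated "key=value" line into an ordered map.
    // Tokens without a key are collected under the hypothetical field "message".
    static Map<String, String> normalize(String rawLine) {
        Map<String, String> fields = new LinkedHashMap<>();
        StringBuilder message = new StringBuilder();
        for (String token : rawLine.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                fields.put(token.substring(0, eq), token.substring(eq + 1));
            } else if (!token.isEmpty()) {
                if (message.length() > 0) {
                    message.append(' ');
                }
                message.append(token);
            }
        }
        if (message.length() > 0) {
            fields.put("message", message.toString());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical firewall-style log line.
        System.out.println(normalize("src=10.0.0.5 dst=8.8.8.8 action=deny connection blocked"));
    }
}
```

A structured map like this can then be serialized into whatever schema the ingesting application expects.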
THE CHALLENGE: As the customer expands its product footprint to support several network security endpoints, sourcing logs from each of the leading endpoints available in the market is practically impossible. It involves the traditional costs of sourcing devices and configuring them on-premises and in the cloud and, most importantly, requires network security analysts with the expertise to understand, scale up, configure, and generate traffic from these firewall devices to collect the log types. To maintain customer privacy, companies do not share their logs with the community. In addition, the old formats and truncated dumps found in public repositories further add to the challenge.

RESULTS: Log extraction from industry-leading security endpoints helped the customer visualize all types of logs in one place, enabling advanced analytics and seamless integration with security orchestration, compliance, and operations. From a business perspective, this shift enabled customer growth and reduced costs.

APPROACH:
Log Collection
Loginsoft’s infrastructure security team created labs at minimal to no cost, setting up a real-time environment with the latest versions of firewalls, routers, switches, and virtual PCs. Traffic is generated through the firewall by creating rules that replicate real-time malicious attacks, which are mitigated, while authentic traffic is allowed through without any interference to users. Various log types, such as Event, Firewall, Windows system, Windows security, and PCAP, are collected from the endpoints and then stored so they can be analyzed, parsed, and ingested as per the model schema. This helps in preventing attacks and security breaches in real time. User documentation with configuration steps is created to help users understand the firewall and the different types of logs it generates.
Leveraging BigQuery to analyze large volumes of enterprise data
Google Cloud Platform provides a conventional way of fetching logs through the Cloud Logging API. BigQuery builds on the Cloud Logging API by handling large data volumes with good performance while minimizing data loss and duplicates.
Integration Approach
  • A service account is used for authentication because it supports app-to-app authorization and does not involve human intervention
  • Logs streaming from the Cloud Logging API are routed to a logging sink
  • The logging sink filters the logs based on the log types configured in the application
  • The destination for the logging sink is a BigQuery dataset
  • The Java application creates the BigQuery dataset, and tables are created automatically inside the dataset when logs are available. The schema for each table is based on the log type
  • A separate table is created for each log type on a daily basis
  • The Java application queries the BigQuery tables at scheduled intervals to fetch the logs and ingest them into the application. The results of the BigQuery queries can be fetched in CSV or JSON format
  • After fetching the logs, the application deletes the tables based on table creation time. This keeps storage to a minimum
Figure: BigQuery dataset, tables, and log entry on the console
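The integration steps above can be sketched in plain Java without any cloud client library. This is an illustration under stated assumptions, not the project's actual code: the log type keys (`VPC_FLOW`, `FIREWALL`, `AUDIT`), the class and method names, and the retention window are hypothetical, and the daily table naming (log ID with non-alphanumeric characters replaced by underscores plus a `YYYYMMDD` suffix) is assumed to match how the logging sink names date-sharded tables.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GcpLogSinkSketch {

    // Assumed mapping from a configured log type to its Cloud Logging log ID.
    static final Map<String, String> LOG_IDS = new LinkedHashMap<>();
    static {
        LOG_IDS.put("VPC_FLOW", "compute.googleapis.com%2Fvpc_flows");
        LOG_IDS.put("FIREWALL", "compute.googleapis.com%2Ffirewall");
        LOG_IDS.put("AUDIT", "cloudaudit.googleapis.com");
    }

    // Builds a sink filter that keeps only the configured log types.
    static String buildSinkFilter(List<String> logTypes) {
        return logTypes.stream()
                .map(type -> {
                    String id = LOG_IDS.get(type);
                    if (id == null) {
                        throw new IllegalArgumentException("Unknown log type: " + type);
                    }
                    return "logName:\"" + id + "\"";
                })
                .collect(Collectors.joining(" OR "));
    }

    // Derives the assumed per-day table name for a log ID.
    static String dailyTableName(String logId, LocalDate day) {
        String base = logId.replace("%2F", "_").replaceAll("[^A-Za-z0-9]", "_");
        return base + "_" + day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Returns the tables whose creation time is older than the retention window.
    static List<String> tablesToDelete(Map<String, Instant> creationTimes,
                                       Instant now, Duration retention) {
        Instant cutoff = now.minus(retention);
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : creationTimes.entrySet()) {
            if (entry.getValue().isBefore(cutoff)) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        System.out.println(buildSinkFilter(Arrays.asList("VPC_FLOW", "AUDIT")));
        System.out.println(dailyTableName(LOG_IDS.get("VPC_FLOW"), LocalDate.of(2024, 1, 15)));
    }
}
```

In a real plugin, the filter string would be passed to the sink configuration and the expired table names to a BigQuery delete call; both are left out here to keep the sketch self-contained.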

Pub/Sub was an alternative approach, but BigQuery was chosen based on performance.

Google Workspace Logs


Logs are fetched through the Reports API, which programmatically retrieves activity and usage reports. Domain-wide delegation is an approach for authorizing a third-party application to access the Admin SDK Reports API. A service account in Google Cloud Platform is authorized through domain-wide delegation to access the logs from Workspace.

  • Authentication is done using a service account for app-to-app authorization
  • This service account has domain-wide delegation enabled and can access Workspace logs
  • The Java application connects using the service account and pulls the activity report. The admin activity report lists all activities of all administrators and is organized by event name
  • Users can visualize admin, drive, login, calendar, and token logs using the Reports API
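As a rough illustration of the activity report pull described above, the sketch below builds the Reports API request URL for each application type. The class and method names are hypothetical, and authentication with the delegated service account is deliberately omitted; the endpoint path and the read-only audit scope are those of the Admin SDK Reports API `activities.list` method.

```java
import java.net.URI;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

final class WorkspaceReportsSketch {

    // OAuth scope the delegated service account needs for audit activity reports.
    static final String SCOPE = "https://www.googleapis.com/auth/admin.reports.audit.readonly";

    // Application names named in the text above.
    static final List<String> APPLICATIONS =
            Arrays.asList("admin", "drive", "login", "calendar", "token");

    // Builds the activities.list URL for all users of one application,
    // restricted to events at or after startTime (RFC 3339 timestamp).
    static URI activitiesUri(String application, Instant startTime) {
        if (!APPLICATIONS.contains(application)) {
            throw new IllegalArgumentException("Unsupported application: " + application);
        }
        return URI.create(
                "https://admin.googleapis.com/admin/reports/v1/activity/users/all/applications/"
                        + application + "?startTime=" + startTime);
    }

    public static void main(String[] args) {
        Instant since = Instant.parse("2024-01-15T00:00:00Z");
        for (String application : APPLICATIONS) {
            System.out.println(activitiesUri(application, since));
        }
    }
}
```

A scheduled job could call such a helper with the timestamp of its previous run, so that each pull picks up only new activity.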

Gmail logs to BigQuery
Gmail logs can be fetched through BigQuery by configuring an export of Gmail logs into BigQuery and specifying the service account and the BigQuery dataset name. This feature is available to Enterprise and Education subscriptions with Standard and Plus subtypes.

When email logs are turned on, a template table named daily_ is created in the BigQuery dataset and serves as the schema table. The daily tables are created automatically based on the availability of logs.
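When listing the tables in such a dataset, the template table has to be told apart from the daily tables. The sketch below assumes the daily tables are named `daily_` followed by a `YYYYMMDD` date, which is an assumption about the naming rather than something stated above; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GmailDailyTablesSketch {

    // Matches the assumed daily table names, e.g. daily_20240115.
    // The bare template table "daily_" does not match.
    private static final Pattern DAILY = Pattern.compile("daily_(\\d{8})");

    // Returns the dates of all daily tables found in the given table names.
    // A name with eight digits that is not a valid date throws DateTimeParseException.
    static List<LocalDate> dailyTableDates(List<String> tableNames) {
        List<LocalDate> dates = new ArrayList<>();
        for (String name : tableNames) {
            Matcher matcher = DAILY.matcher(name);
            if (matcher.matches()) {
                dates.add(LocalDate.parse(matcher.group(1), DateTimeFormatter.BASIC_ISO_DATE));
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("daily_", "daily_20240115", "daily_20240116");
        System.out.println(dailyTableDates(tables));
    }
}
```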