Anomaly Detection Engine Update – Esopus Creek

By May 13, 2016 August 16th, 2016 Blog

Esopus Creek is a 65-mile tributary of the Hudson River and, like its namesake, the first update release of the IBM-contributed Anomaly Detection Engine for Linux Logs is now flowing into the wider open source community, if you will excuse the pun.

In this blog post, James Caffrey, the maintainer of the project, provides an update on the progress of the code base. First, to set the scene: Anomaly Detection Engine (ADE) is a code base for detecting anomalies in Linux logs, contributed by IBM via the Open Mainframe Project to the open source community at the beginning of the year.

The new delivery of ADE Esopus Creek has four upgrades contributed by the community:

  • support for MariaDB(tm)
  • verify command – is there sufficient information to create a model
  • fixes to additional SonarQube(tm) issues
  • wiki topics
    • example of how to tailor the output of ADE
    • how to contribute to ADE

MariaDB(tm) support

ADE now supports MariaDB.  The Esopus Creek version has been updated to account for the SQL differences between Derby and MariaDB.

Why was the verify command added

To understand why the verify command is important, it helps to first look at how ADE detects unusual time slices and messages:

ADE assumes that production Linux systems follow a predictable pattern and that any difference from the expected behavior is unusual. ADE detects unusual time slices and unusual messages in Linux logs by comparing the logs for the time periods of interest with the expected behavior of those logs. When the ADE train command runs, it builds a model of the expected behavior; the analyze command then uses that model to check the behavior of a given time period. Both train and analyze use statistical learning to find what is unusual.
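The train-then-analyze idea can be illustrated with a deliberately simple sketch. This is not ADE's algorithm: the functions `train` and `analyze`, the `rare_below` threshold, and the message ids are all hypothetical, and the "model" is just the fraction of training intervals in which each message id appeared.

```python
from collections import Counter

def train(intervals):
    """Toy model: fraction of training intervals in which each message id appeared."""
    seen_in = Counter()
    for message_ids in intervals:
        seen_in.update(set(message_ids))  # count each id at most once per interval
    total = len(intervals)
    return {msg_id: count / total for msg_id, count in seen_in.items()}

def analyze(model, message_ids, rare_below=0.1):
    """Return the message ids in one interval that the toy model considers unusual."""
    # An id never seen during training has frequency 0.0, so it is always flagged.
    return sorted(m for m in set(message_ids) if model.get(m, 0.0) < rare_below)

# Hypothetical training data: 20 intervals, one of which contains a rare message.
training = [["sshd.login", "cron.run"]] * 19 + [["sshd.login", "cron.run", "kernel.oom"]]
model = train(training)

# Analyze a new interval against the expected behavior.
unusual = analyze(model, ["sshd.login", "cron.run", "kernel.oom", "disk.fail"])
print(unusual)
```

In this sketch the routine messages pass unnoticed, while the rarely seen and never seen ids are reported. ADE's statistical learning is considerably more involved, but the split between building a model and checking an interval against it is the same.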

For the statistical learning algorithms to generate helpful results, they need sufficient information to build a valid model. The verify command checks whether sufficient information is available before the model is created. The Java class VerifyLinuxTraining, which verify invokes, checks whether the number of unique message ids is sufficient for the number of intervals included in the training period. The algorithm is designed to handle Linux systems with both high and low message traffic volumes.

The verify command, the Java class, and the data science have been added to https://github.com/openmainframeproject/ade.
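As a rough illustration of the kind of check verify performs, the sketch below counts the unique message ids and the intervals in a training set and compares them using a caller-supplied rule. The function names and the rule itself are hypothetical; the actual criterion used by VerifyLinuxTraining is not described here and should be looked up in the repository.

```python
def verify_training(intervals, required_ids):
    """Return (ok, unique_id_count, interval_count) for per-interval message-id lists."""
    unique_ids = set()
    for message_ids in intervals:
        unique_ids.update(message_ids)
    num_intervals = len(intervals)
    # Training data is sufficient only if it has enough distinct message ids
    # for the number of intervals it covers.
    ok = num_intervals > 0 and len(unique_ids) >= required_ids(num_intervals)
    return ok, len(unique_ids), num_intervals

def example_rule(num_intervals):
    """Hypothetical rule, for illustration only: 3 ids plus 1 per 100 intervals."""
    return 3 + num_intervals // 100

# Two hypothetical training periods of 144 intervals each.
busy = [["a", "b", "c", "d"]] * 144
quiet = [["a"]] * 144

print(verify_training(busy, example_rule))
print(verify_training(quiet, example_rule))
```

Passing the rule in as a function keeps the counting logic separate from the threshold, which is the part that would have to be tuned for systems with high or low message traffic.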

How to tailor the output of ADE

For each interval and message it analyzes, ADE writes XML to a file. ADE also provides XSLT files that convert the results into HTML so that they can be viewed with a standard web browser. ADE writes the XSLT files to the appropriate places, so all you need to do to look at the results is point your browser to the file of interest. The ADE wiki now contains examples of how to change the XSLT to:

  • sort by column values
  • change the order of the columns and remove columns which might not be useful

The XSLT files are shipped as part of ADE and can be tailored to meet your needs. See https://github.com/openmainframeproject/ade/wiki/Hints-on-how-to-update-XSLT for details. For more details on how the files are stored on disk, so that you can find the one you want, see http://openmainframeproject.github.io/ade/.

The next delivery of ADE, Poesten Kill, will focus on reducing the cost of adding new analytics to ADE. Look for ADE on Slack at #anomaly-detection and follow the Open Mainframe Project (@OpenMFProject) on Twitter for announcements about ADE.