Principal Investigator,
Oak Ridge Cyber Analytics

October 2008 - current

Collaborators: Robert Patton, Chris Symons, Xiaohui Cui, Brian Klump, Tom Potok, Jim Treadwell, Matthew Schultz

Sponsor: Advanced Systems Group, Lockheed Martin Corporation

Funding: $1,110,000 to date

Video Link: http://www.youtube.com/watch?v=EhzXWXl0jSc


Project Description:

Oak Ridge Cyber Analytics (ORCA) is a system that aggregates and analyzes cyber security data for intrusion detection. It comprises a near-real-time Correlation Engine that analyzes cyber events as they arrive and an Analysis Engine that provides operator-directed insight into retrieved collections of cyber security data. The ORCA architecture is shown in Figure 1. ORCA combines custom-built and third-party software to aggregate cyber security data from various sources. Network alerts and log files are produced by off-the-shelf intrusion detection systems (IDSs), analysis tools, and logging mechanisms. These alerts and logs are aggregated in Splunk, an off-the-shelf data warehouse product.

Last updated 03/01/2011

Figure 1: ORCA Architecture

ORCA is designed to store cyber security events in near real time and provide access to those events via a retrieval interface. It leverages Splunk, an information technology database product, to aggregate and store events logged by off-the-shelf intrusion detection systems and system monitors. In addition, ORCA includes a package called the Network Data Feature Extractor that analyzes all incoming network traffic and keeps statistics on the state of the network. A Cyber Data Fusion interface provides access to both the events stored in Splunk and the network statistics; for each event, it returns an object containing the Splunk alert and the network statistics at the time the alert occurred. The Analysis Interface provides low-level processing of the collection of retrieved events. Several analysis tools interact with the Analysis Interface to provide more advanced computation and processing of the collection of cyber events.
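
The pairing performed by the Cyber Data Fusion interface can be illustrated with a minimal Python sketch. It is not ORCA's implementation: the names Alert, NetworkStats, FusedEvent, and fuse are hypothetical, and the rule of choosing the most recent statistics snapshot taken at or before the alert time is an assumption about what "at the time the alert occurred" means.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Alert:
    """Hypothetical stand-in for an alert retrieved from Splunk."""
    timestamp: float
    text: str


@dataclass(frozen=True)
class NetworkStats:
    """Hypothetical snapshot from the Network Data Feature Extractor."""
    timestamp: float
    features: Dict[str, float]


@dataclass(frozen=True)
class FusedEvent:
    """One object per event: the alert plus the matching network statistics."""
    alert: Alert
    stats: Optional[NetworkStats]


def fuse(alerts: List[Alert], snapshots: List[NetworkStats]) -> List[FusedEvent]:
    """Pair each alert with the latest snapshot taken at or before it (assumed rule)."""
    ordered = sorted(snapshots, key=lambda s: s.timestamp)
    times = [s.timestamp for s in ordered]
    fused = []
    for alert in alerts:
        idx = bisect_right(times, alert.timestamp) - 1
        # idx is -1 when no snapshot precedes the alert; record that as None.
        fused.append(FusedEvent(alert, ordered[idx] if idx >= 0 else None))
    return fused
```

Keeping the alert and the statistics together in one immutable object means downstream analysis tools can treat each event as a single record without querying two stores.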


The Correlation Engine uses semi-supervised machine learning to more reliably distinguish the real attacks highlighted by intrusion detection system alerts from expected events. This engine is an automated process that cyclically retrieves the collection of cyber events for the most recent time slice and applies the semi-supervised machine learning model to each event. Based on the trained model, the engine determines for each event whether to issue a notification that an attack was detected.
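
As a rough illustration of this cycle, the Python sketch below trains a model from a few labeled events plus unlabeled ones, then flags the events in one time slice. The description does not name the semi-supervised algorithm ORCA uses, so self-training with a nearest-centroid classifier is swapped in here purely as a stand-in; the function names, the "attack" and "normal" labels, and the min_margin threshold are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def centroids(points: Sequence[Vector], labels: Sequence[Optional[str]]) -> Dict[str, List[float]]:
    """Mean feature vector per label, ignoring unlabeled (None) points."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for point, label in zip(points, labels):
        if label is None:
            continue
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model: Dict[str, List[float]], point: Vector) -> Tuple[str, float]:
    """Return the nearest label and the distance margin over the runner-up."""
    ranked = sorted((math.dist(point, c), label) for label, c in model.items())
    best_dist, best_label = ranked[0]
    margin = ranked[1][0] - best_dist if len(ranked) > 1 else math.inf
    return best_label, margin


def self_train(points, labels, min_margin=1.0, rounds=5):
    """Self-training: adopt confident predictions as labels, then refit."""
    labels = list(labels)
    for _ in range(rounds):
        model = centroids(points, labels)
        changed = False
        for i, point in enumerate(points):
            if labels[i] is None:
                label, margin = predict(model, point)
                if margin >= min_margin:  # only trust clear-cut cases
                    labels[i] = label
                    changed = True
        if not changed:
            break
    return centroids(points, labels)


def correlate_slice(model, events):
    """One engine cycle: return the events in the slice classified as attacks."""
    return [event for event in events if predict(model, event)[0] == "attack"]
```

In a running system, correlate_slice would be called once per time slice on the feature vectors of the newly retrieved events, and each returned event would trigger a notification.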


The Analysis Engine is a toolbox of advanced analysis techniques intended to provide an operator with insight into the underlying patterns in the cyber security data.  The Event Viewer (see Figure 2) provides a temporal view of the events and connections relative to each other.  The Cluster View (see Figure 3) applies raw text analysis and clustering algorithms to provide an organization of the events.  The Swarm View (see Figure 4) analyzes behavior patterns over time to highlight behaviors that are consistent with known attack patterns. The Categorization View (see Figure 5) provides insight into the automated classification of events based on the raw alert text and the extracted network features.
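
To give a sense of how raw alert text can be organized into groups, as the Cluster View does, here is a small Python sketch. The text analysis and clustering algorithms ORCA actually uses are not specified, so this example substitutes a simple greedy scheme based on Jaccard similarity of word sets; the function names, the sample alerts, and the 0.5 threshold are hypothetical.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of an alert message."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_alerts(alerts: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    clusters: List[List[str]] = []
    representatives: List[Set[str]] = []
    for text in alerts:
        toks = tokens(text)
        for i, rep in enumerate(representatives):
            if jaccard(toks, rep) >= threshold:
                clusters[i].append(text)
                break
        else:
            # No existing cluster matched, so this alert starts a new one.
            clusters.append([text])
            representatives.append(toks)
    return clusters


if __name__ == "__main__":
    sample = [
        "SSH brute force attempt from 10.0.0.5",
        "SSH brute force attempt from 10.0.0.9",
        "Port scan detected on subnet",
    ]
    for group in cluster_alerts(sample):
        print(group)
```

With the sample alerts, the two SSH messages differ only in the final address token and fall into one group, while the port scan message forms its own.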


ORCA is highly configurable. Operators have the flexibility to customize the terms and phrases used for categorization. In addition, operators can label events as particular types of attack or as normal behavior; these labels, in turn, are used as the example data for training the machine learners.
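
One way to picture operator-customized terms is a mapping from category names to phrases, matched against the raw alert text. The Python sketch below is an illustration only: the categorize function, the category names, and the phrases are hypothetical, and ORCA's actual configuration format is not described here.

```python
from typing import Dict, List


def categorize(alert_text: str, terms: Dict[str, List[str]]) -> List[str]:
    """Return every category with at least one phrase found in the alert text."""
    lowered = alert_text.lower()
    return [
        category
        for category, phrases in terms.items()
        if any(phrase.lower() in lowered for phrase in phrases)
    ]


if __name__ == "__main__":
    # Hypothetical operator-supplied configuration.
    operator_terms = {
        "brute_force": ["failed password", "brute force"],
        "scan": ["port scan", "nmap"],
    }
    print(categorize("Multiple FAILED PASSWORD attempts for root", operator_terms))
```

Because the terms live in plain data rather than in code, an operator could add or remove phrases without touching the matching logic.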

Figure 2: ORCA Event Viewer

Figure 3: ORCA Cluster Viewer

Figure 4: ORCA Swarm Viewer

Figure 5: ORCA Categorization Viewer