Introduction to Event Log Mining with R

Meetup Video

Event logs are everywhere and represent a prime source of Big Data. Event
log sources run the gamut from e-commerce web servers to devices participating
in globally distributed Internet of Things (IoT) architectures. Even Enterprise
Resource Planning (ERP) systems produce event logs! Given the rich and varied
data contained in event logs, mining these assets is a critical skill needed by every
Data Scientist, Business/Data Analyst, and Program/Product Manager.

At the meetup for this topic, presenter David Langer showed how easy it is to get
started mining your event logs using the open-source tools R and ProM.

David began the talk by defining which features of a dataset are important for
event log mining:

Activity: A well-defined step in some workflow/process.
Timestamp: The date and time at which something worthy of note happened.
Resource: Staff and/or other assets used/consumed in execution of an activity.
Event: At a minimum, the combination of an activity and a timestamp. Optionally,
events may have associated resources, lifecycle, and other data.
Case: A related set of events denoted, and connected, by a unique identifier where
the events can be ordered.
Event Log: A list of cases and associated events.
Trace: A distinct pattern of case activities within an event log where each activity is
present at most once per trace. Event logs typically contain many traces.
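
The definitions above map naturally onto a small data model. The following is a minimal Python sketch, not the R code from the talk. The field names (case_id, activity, timestamp, resource) and the sample events are hypothetical and were chosen only for illustration. It groups events into cases and orders each case by timestamp.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """An activity plus a timestamp, tied to a case; the resource is optional."""
    case_id: str
    activity: str
    timestamp: datetime
    resource: Optional[str] = None


def build_event_log(events):
    """Group events by case identifier and order each case by timestamp."""
    cases = defaultdict(list)
    for event in events:
        cases[event.case_id].append(event)
    return {
        case_id: sorted(case_events, key=lambda e: e.timestamp)
        for case_id, case_events in cases.items()
    }


# Hypothetical events, invented purely for illustration.
events = [
    Event("case-1", "close", datetime(2014, 1, 1, 9, 30), "agent-a"),
    Event("case-1", "open", datetime(2014, 1, 1, 9, 0), "agent-a"),
    Event("case-2", "open", datetime(2014, 1, 2, 14, 0)),
]

event_log = build_event_log(events)
for case_id, case_events in event_log.items():
    print(case_id, [e.activity for e in case_events])
```

Keeping the resource optional mirrors the definition of an event: only the activity and timestamp are required, and the case identifier is what lets events be connected and ordered.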

Below is an example of IIS Web Server data that may be used for mining:

In this example, the traces for this event log are:
1. portal, dashboard, purchaseorderreport
2. portal, help, contactus
3. portal, myteam, expensereports
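
Traces like the three listed can be derived by ordering each case's activities by time and collecting the distinct sequences. The Python sketch below does this for made-up rows. The session identifiers and timestamps are invented for illustration and are not the IIS data from the talk; only the page names come from the traces listed above.

```python
from collections import Counter, defaultdict

# (case id, ISO 8601 timestamp, activity). Hypothetical rows: the session ids
# and timestamps are invented; only the page names come from the traces above.
rows = [
    ("s1", "2016-01-01T09:02:00", "purchaseorderreport"),
    ("s1", "2016-01-01T09:00:00", "portal"),
    ("s1", "2016-01-01T09:01:00", "dashboard"),
    ("s2", "2016-01-01T09:05:00", "portal"),
    ("s2", "2016-01-01T09:06:00", "help"),
    ("s2", "2016-01-01T09:07:00", "contactus"),
    ("s3", "2016-01-01T10:00:00", "portal"),
    ("s3", "2016-01-01T10:01:00", "myteam"),
    ("s3", "2016-01-01T10:02:00", "expensereports"),
    ("s4", "2016-01-01T11:00:00", "portal"),
    ("s4", "2016-01-01T11:01:00", "dashboard"),
    ("s4", "2016-01-01T11:02:00", "purchaseorderreport"),
]


def extract_traces(rows):
    """Return a Counter mapping each distinct activity sequence to its case count."""
    cases = defaultdict(list)
    for case_id, timestamp, activity in rows:
        cases[case_id].append((timestamp, activity))
    # ISO 8601 strings in the same format sort chronologically as plain text.
    return Counter(
        tuple(activity for _, activity in sorted(case_events))
        for case_events in cases.values()
    )


trace_counts = extract_traces(rows)
for trace, count in trace_counts.most_common():
    print(count, ", ".join(trace))
```

Counting how many cases follow each trace is a common first look at an event log, since it shows which paths through the process dominate.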

David continued his talk with a live demo using the Incident Activity Records dataset from the 2014 Business Process Intelligence Challenge (BPIC).

About the Meetup

In this presentation hosted by Data Science Dojo, David Langer discussed:
• The scenarios and benefits of event log mining
• The minimum data required for event log mining
• Ingesting and analyzing event log data using R
• Process Mining with ProM
• Event log mining techniques to create features suitable for Machine Learning models
• Where you can learn more about this very handy set of tools and techniques

About the Speaker

David is a veteran BI, analytics, and data science professional. He manages a team of technical Program Managers who own the mission-critical data warehouse, BI, big data, and analytics platforms used to run Microsoft’s $10+ billion supply chain. While obsessed with everything in Data Science, David’s current passions are text analytics, event log mining, and mathematical programming. David obtained a BA in Economics and an MS in Computer Science from the University of Washington.

For more videos and tutorials by David, visit his YouTube channel.

Source Code

David’s source code can be viewed and cloned from his GitHub repository for this meetup. During the demo, he stepped through his R script to clean and process the dataset, using the R package edeaR to analyze and mine it. After cleaning the dataset, he loaded the resulting CSV file into the process mining workbench ProM for visualization. The resulting visualization offered insight into the flow of incident activities from open to close.
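
As a rough illustration of the clean-then-export step, here is a Python sketch. It is not David's R script, and the column names, timestamp format, and rows are hypothetical rather than the BPIC schema. It drops incomplete rows, normalizes timestamps, sorts events within each case, and produces CSV text that could then be loaded into a tool such as ProM.

```python
import csv
import io
from datetime import datetime

COLUMNS = ["case_id", "activity", "timestamp"]

# Hypothetical raw rows; column names and values are invented for illustration.
raw_rows = [
    {"case_id": "IM001", "activity": "Closed", "timestamp": "02-01-2014 10:15:00"},
    {"case_id": "IM001", "activity": "Open", "timestamp": "01-01-2014 08:00:00"},
    {"case_id": "IM002", "activity": "Open", "timestamp": ""},  # missing timestamp
    {"case_id": "IM002", "activity": "Open", "timestamp": "03-01-2014 12:30:00"},
]


def clean_events(rows, time_format="%d-%m-%Y %H:%M:%S"):
    """Drop rows with missing fields, convert timestamps to ISO 8601, sort by case and time."""
    cleaned = []
    for row in rows:
        if not all(row.get(column) for column in COLUMNS):
            continue  # skip incomplete events
        parsed = datetime.strptime(row["timestamp"], time_format)
        cleaned.append({
            "case_id": row["case_id"],
            "activity": row["activity"],
            "timestamp": parsed.isoformat(),
        })
    return sorted(cleaned, key=lambda r: (r["case_id"], r["timestamp"]))


def to_csv(rows):
    """Serialize cleaned events to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


cleaned = clean_events(raw_rows)
csv_text = to_csv(cleaned)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the same writer would target a file opened with newline="".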