Event logs are everywhere and represent a prime source of Big Data. Event
log sources run the gamut from e-commerce web servers to devices participating
in globally distributed Internet of Things (IoT) architectures. Even Enterprise
Resource Planning (ERP) systems produce event logs! Given the rich and varied
data contained in event logs, mining these assets is a critical skill needed by every
Data Scientist, Business/Data Analyst, and Program/Product Manager.
At the meetup for this topic, presenter David Langer showed how easy it is to get
started mining your event logs using the open-source tools R and ProM.
David began the talk by defining which features of a dataset are important for
event log mining:
Activity: A well-defined step in some workflow/process.
Timestamp: The date and time at which something worthy of note happened.
Resource: Staff and/or other assets used/consumed in execution of an activity.
Event: At a minimum, the combination of an activity and a timestamp. Optionally,
events may have associated resources, lifecycle, and other data.
Case: A set of related events that share a unique identifier and can be placed
in order.
Event Log: A list of cases and associated events.
Trace: A distinct pattern of case activities within an event log where each activity is
present at most once per trace. Event logs typically contain many traces.
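
To make these definitions concrete, here is a minimal sketch of how they might map onto a data structure. It is written in Python purely for illustration (the talk itself used R and ProM), and every name in it (Event, build_cases, trace_counts, the "order-N" case IDs, the activity names, and the timestamps) is a hypothetical assumption rather than anything shown in the talk. The sample cases never repeat an activity, so each ordered activity sequence fits the definition of a trace given above.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """An event: at minimum an activity plus a timestamp, tied to a case."""
    case_id: str                    # unique identifier connecting related events
    activity: str                   # well-defined step in a workflow/process
    timestamp: datetime             # when the activity happened
    resource: Optional[str] = None  # optional: staff/asset used by the activity


def build_cases(events):
    """Group events by case ID and order each case's events by timestamp."""
    cases = defaultdict(list)
    for event in events:
        cases[event.case_id].append(event)
    for case_events in cases.values():
        case_events.sort(key=lambda e: e.timestamp)
    return dict(cases)


def trace_counts(cases):
    """Count how many cases follow each distinct ordered activity pattern."""
    return Counter(
        tuple(e.activity for e in case_events) for case_events in cases.values()
    )


# Hypothetical event log: made-up data, deliberately listed out of order.
events = [
    Event("order-1", "Add to cart", datetime(2016, 1, 4, 9, 5), "web-01"),
    Event("order-1", "Browse", datetime(2016, 1, 4, 9, 0), "web-01"),
    Event("order-1", "Checkout", datetime(2016, 1, 4, 9, 12), "web-02"),
    Event("order-2", "Browse", datetime(2016, 1, 4, 10, 0), "web-01"),
    Event("order-2", "Add to cart", datetime(2016, 1, 4, 10, 3), "web-01"),
    Event("order-2", "Checkout", datetime(2016, 1, 4, 10, 9), "web-02"),
    Event("order-3", "Browse", datetime(2016, 1, 4, 11, 0), "web-01"),
]

cases = build_cases(events)
traces = trace_counts(cases)

# Print each distinct trace with the number of cases that follow it.
for trace, count in traces.most_common():
    print(count, " -> ".join(trace))
```

In this sketch the event log is the dictionary of cases, and a trace is represented as a tuple of activity names, so two cases that follow the same steps in the same order count toward the same trace.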
Below is an example of IIS Web Server data that may be used for mining: