Comparison of process mining techniques: application to flexible and unstructured processes
by HANANE ARIOUAT
Université Paris-Est Créteil - Master 2, 2014
The world in which we live is subject to constant, dynamic change. Today's market is highly competitive, and customers' needs and expectations keep changing. Industry requirements are also evolving, and many mergers and acquisitions are taking place. This creates many new challenges for organizations. To gain a competitive advantage and avoid losing market share, organizations must revise, change and improve their strategic business processes quickly and efficiently.
Most organizations use information systems to support the execution of their business processes. Examples of information systems supporting operational processes are Workflow Management Systems (WMS), Customer Relationship Management (CRM) systems and Enterprise Resource Planning (ERP) systems. These information systems typically provide logging capabilities that record what has been executed in the organization. The resulting logs usually contain data about the cases (i.e., process instances) that have been executed, the times at which the tasks were executed, the persons or systems that performed these tasks, and other kinds of data.
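To make the notion of an event log concrete, the following minimal Python sketch assumes one simple record per event: a case identifier, a task name, a timestamp and the performer. The `Event` class and its field names are hypothetical and do not correspond to any particular log format or ProM API. Grouping the events by case and sorting each group by timestamp yields one trace per case, and counting identical traces gives the view of the log as a multiset of traces.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One hypothetical log entry: which case, which task, when, and by whom."""
    case_id: str
    activity: str
    timestamp: datetime
    resource: str


def traces_from_events(events):
    """Group events by case and order each case's events by timestamp."""
    by_case = defaultdict(list)
    for e in events:
        by_case[e.case_id].append(e)
    return {
        case: tuple(ev.activity for ev in sorted(evts, key=lambda ev: ev.timestamp))
        for case, evts in by_case.items()
    }


def as_multiset(traces):
    """View the log as a multiset of traces: trace -> number of cases."""
    return Counter(traces.values())


events = [
    Event("c1", "register", datetime(2014, 3, 1, 9, 0), "Ann"),
    Event("c1", "check", datetime(2014, 3, 1, 10, 0), "Bob"),
    Event("c2", "register", datetime(2014, 3, 2, 9, 0), "Ann"),
    Event("c1", "pay", datetime(2014, 3, 1, 11, 0), "Ann"),
    Event("c2", "check", datetime(2014, 3, 2, 9, 30), "Bob"),
    Event("c2", "pay", datetime(2014, 3, 2, 12, 0), "Carl"),
]
traces = traces_from_events(events)
log = as_multiset(traces)
print(log)  # Counter({('register', 'check', 'pay'): 2})
```

The events of case c1 are deliberately listed out of order: the timestamps, not the position in the input, determine the order of tasks within a trace.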
These logs are the starting point for process mining and are usually called event logs. The type of data in an event log determines which perspectives of process mining can be discovered. If the log (i) provides the tasks that are executed in the process and (ii) makes it possible to infer their order of execution and to link these tasks to individual cases (or process instances), then the control-flow perspective can be mined. For many applications, the next step after obtaining the event log is to filter it. Filtering is an iterative process: coarse-grained scoping is done when the data are extracted into an event log, whereas filtering corresponds to fine-grained scoping based on initial analysis results. For example, for process discovery one can decide to focus on the 10 most frequent activities to keep the model manageable.
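The filtering example mentioned in the text (keeping only the most frequent activities) can be sketched as a projection of every trace onto the top-k activities. The function name and the decision to drop cases that end up empty are assumptions made for this illustration.

```python
from collections import Counter


def keep_most_frequent_activities(traces, k=10):
    """Project every trace onto the k most frequent activities of the log.

    traces: list of activity sequences (one per case).
    Events whose activity is not among the top k are dropped; the relative
    order of the remaining events is preserved. Ties at the cut-off follow
    Counter.most_common, which orders equal counts by first appearance.
    """
    counts = Counter(a for trace in traces for a in trace)
    keep = {a for a, _ in counts.most_common(k)}
    filtered = [tuple(a for a in trace if a in keep) for trace in traces]
    # Assumption: cases left without any event are removed from the log.
    return [t for t in filtered if t]


log = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
print(keep_most_frequent_activities(log, k=4))
# [('a', 'b', 'c', 'd'), ('a', 'c', 'b', 'd'), ('a', 'd')]
```

Here the rare activity e disappears, so the third case is reduced to the two activities it shares with the others.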
Based on the filtered log, different types of process mining can be applied, notably discovery and conformance. The primary objective of process mining is to discover process models based on available event log data. In discovery, there is no a priori model: a model is constructed from the low-level events recorded in an event log. Many techniques exist for automatically constructing process models (e.g., in the form of Petri nets) from an event log. In this thesis we focus on three of them: the Alpha, Heuristic and Genetic algorithms. In conformance checking, an a priori model exists. This model is compared with the event log, and the discrepancies between the log and the model are analyzed.
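As an illustration of both ideas, the sketch below computes the log-based ordering relations (the "footprint") on which the Alpha algorithm builds: directly-follows, causality, parallelism and choice. It covers only this first step of the Alpha algorithm, not the construction of the Petri net. Comparing the footprint of a log with that of a model gives a simple form of conformance checking; here the model is represented, hypothetically, by the set of traces it allows, and the function names are our own.

```python
from itertools import product


def footprint(traces):
    """Log-based ordering relations used by the Alpha algorithm.

    For every ordered pair (a, b) of activities:
      "->" causality:  b directly follows a somewhere, never the reverse
      "<-" reverse causality (b -> a)
      "||" parallel:   each directly follows the other somewhere
      "#"  choice:     neither directly follows the other
    """
    activities = sorted({a for t in traces for a in t})
    follows = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
    relations = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and not ba:
            relations[(a, b)] = "->"
        elif ba and not ab:
            relations[(a, b)] = "<-"
        elif ab and ba:
            relations[(a, b)] = "||"
        else:
            relations[(a, b)] = "#"
    return relations


def footprint_agreement(rel_log, rel_model):
    """Share of activity pairs on which two footprints agree (1.0 = identical)."""
    pairs = rel_log.keys() | rel_model.keys()
    same = sum(rel_log.get(p) == rel_model.get(p) for p in pairs)
    return same / len(pairs)


log = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
rel = footprint(log)
print(rel[("a", "b")], rel[("b", "c")], rel[("b", "e")])  # -> || #

# Hypothetical model, represented by the traces it allows (no activity e).
model_rel = footprint([("a", "b", "c", "d"), ("a", "c", "b", "d")])
print(footprint_agreement(rel, model_rel))  # 0.64
```

In this example b and c occur in both orders and are therefore classified as parallel, while all 9 disagreeing pairs involve the activity e, which the hypothetical model never produces (16 of 25 pairs agree).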
Many free and commercial software frameworks implementing process mining algorithms have been developed. In this thesis we use the open-source ProM Framework.
As mentioned before, there are many process mining algorithms with different theoretical foundations and aims, which raises the question of how to choose the best one for a particular situation. Most of these algorithms perform well on structured processes with few disturbances. In reality, however, it is difficult to determine the scope of a process, and there are typically all kinds of disturbances. As a result, process mining techniques tend to produce spaghetti-like models that are difficult to read and that attempt to merge unrelated cases. There is therefore a need for methods that objectively compare process mining algorithms against known characteristics of business process models and logs.
One approach to overcoming this problem is to cluster process instances (a process instance is manifested as a trace, and an event log corresponds to a multiset of traces) such that each resulting cluster corresponds to a coherent set of process instances that can be adequately represented by a process model. To this end, we use the clustering algorithm and the profile concept proposed by Song et al., and we propose a new approach to trace clustering based on logical operators. Our approach defines a different distance measure between traces and cluster centers: we use the XOR operator to compute the distance between a trace and a cluster center, and the AND operator to compute the new cluster centers.
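The clustering step described above can be sketched as a k-means-style loop. This is only our own minimal reading of the description, not the thesis implementation, and it rests on several assumptions: each trace is encoded as a binary activity profile (1 if the activity occurs at least once; the profiles of Song et al. need not be binary), the XOR distance is the number of differing bits, a new center is the bitwise AND of its members' profiles, and the first k distinct profiles serve as initial centers. All function names are hypothetical.

```python
def activity_profile(trace, activities):
    """Binary profile: position i is 1 if activities[i] occurs in the trace."""
    present = set(trace)
    return tuple(1 if a in present else 0 for a in activities)


def xor_distance(x, y):
    """Number of positions where two bit vectors differ (popcount of x XOR y)."""
    return sum(a ^ b for a, b in zip(x, y))


def and_center(profiles):
    """Bitwise AND of member profiles: keeps only activities shared by all members."""
    return tuple(int(all(bits)) for bits in zip(*profiles))


def cluster_traces(traces, k, max_iter=20):
    """k-means-style clustering with XOR distance and AND-based center update."""
    activities = sorted({a for t in traces for a in t})
    profiles = [activity_profile(t, activities) for t in traces]
    # Assumed initialisation: the first k distinct profiles become the centers.
    centers = list(dict.fromkeys(profiles))[:k]
    assignment = None
    for _ in range(max_iter):
        # Assign each trace to the nearest center (ties go to the lowest index).
        new_assignment = [
            min(range(len(centers)), key=lambda c: xor_distance(p, centers[c]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break  # assignments are stable
        assignment = new_assignment
        for c in range(len(centers)):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = and_center(members)
    return assignment, centers


traces = [("a", "b", "c"), ("a", "b", "c", "d"), ("x", "y"), ("x", "y", "z")]
assignment, centers = cluster_traces(traces, k=2)
print(assignment)  # [1, 1, 0, 0]
```

Because an AND-based center keeps only the activities shared by every member, a poorly initialised cluster can briefly get an all-zero center; in this example that happens after the first round, and the loop still separates the {a, b, c, ...} traces from the {x, y, ...} traces by the third round. The `max_iter` bound is a safeguard in case assignments keep changing.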
The rest of this thesis is organized as follows:
In the first chapter we give an overview of business processes, starting with a short definition and then covering their management, their life cycle and some business process modeling languages. We then present the main process mining concepts, such as event logs, log filtering, process mining perspectives and control-flow discovery. These two sections are followed by a section on the evaluation of process mining, in which different evaluation metrics are presented.
The second chapter is devoted to the ProM framework, a powerful process mining tool. In this chapter we mainly present some mining plug-ins