Journal of Computer Engineering & Information Technology, ISSN: 2324-9307


Research Article, J Comput Eng Inf Technol Vol: 5 Issue: 1

Process Mining as a Business Process Discovery Technique

Omar AlShathry*
Department of Information Systems, Imam Mohammed Bin Saud University, KSA, Saudi Arabia
Corresponding author : Omar AlShathry
Department of Information Systems, Imam Mohammed Bin Saud University, KSA, Saudi Arabia
Tel: 011-25 81875
E-mail: [email protected]
Received: January 28, 2016 Accepted: February 15, 2016 Published: February 22, 2016
Citation: AlShathry O (2016) Process Mining as a Business Process Discovery Technique. J Comput Eng Inf Technol 5:1. doi:10.4172/2324-9307.1000141



Business Process Management (BPM) is an important aspect of organizational excellence and global competitiveness. The main indicator of efficient BPM is how closely a process implementation conforms to its original process model. Some processes are implemented based on what IT people think rather than what the process guidelines state. Process mining is a promising technique for extracting a process model from its run-time behavior. In this research, an ITIL-compliant Incident Management Process of a renowned telecom provider was selected in order to construct its process model. The results of applying three mining algorithms showed that the process engine implements several different process scenarios. This entails further investigation to identify the scenario that is most faithful to the original business process requirements. This research is part of ongoing research to compare the efficiency of process mining tools with formal inspection techniques as a process discovery approach.

Keywords: BPM; Process mining; Process discovery.




Process discovery & conformance
Business process documentation is a major indication of healthy business process management (BPM) in organizations [1]. Documentation artifacts (process charts, activities, policies, governance, etc.) establish the process approach within the organization, which in turn has a positive impact on profit and overall value [2,3]. For optimum performance, a business process should be implemented and executed according to the process policies, or according to its stakeholder or regulatory requirements [3,4]. In some cases, however, a process designed to be fully compliant may not execute the way it was intended to, for many reasons. Some automated processes were designed based on what IT people think rather than what the guidelines state. This lack of conformance surfaces either as a complaint from a business function that depends on the process for its inputs or outputs, or as a noticeable service degradation. Therefore, a process owner should proactively monitor the compliance of the process execution with any process policies or regulations, which usually come in the form of flow charts or narrative text. The terms process discovery and conformance checking are often used interchangeably, as both relate to the same problem. Process discovery is a learning process that derives a process model from its event logs, whereas conformance checking is a diagnostic and comparative process between a process model and its behavior [5-7]. In practice, there are two methods for process discovery and conformance checking:
(1) Manual audit of the process execution
(2) Automatic analysis of the process execution based on its runtime behavior [1,8].
The first method usually relies on internal audit procedures, the so-called self-assessment, for process discovery and compliance checking, and it is widely practiced [3,9]. Despite its prominence, this method has some drawbacks that are easy to overlook. One of the main problems with this approach is its lack of objectivity, as those who conduct the self-assessment usually belong to the same organization. Also, the assessors may lack the skills the assessment requires, or may not have had sufficient exposure to them. Therefore, organizations may opt for an external assessment to obtain more credible results. However, this option is not free of drawbacks either: not only is it budget- and effort-intensive, but the knowledge about the audited process, which is essential for an accurate assessment, may be missing or fragmented among process users. It would therefore be more effective to analyze the process against its runtime execution rather than relying on its paper documentation or on interviews with domain experts. This idea relies on the fact that business processes usually run on process-aware systems such as workflow management systems (WfMS) or Business Process Management (BPM) systems. Business systems such as Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) systems, for example, record process transactions in their event log files. If properly analyzed, the log file gives accurate information about the behavior of the process execution, and hence helps in reconstructing the corresponding process model. This emerging process analysis technology is referred to as process mining.
Process Mining
Process mining is the use of data mining techniques and algorithms to uncover process workflow models and execution behavior [8,10]. It relies on analyzing voluminous system event-log data to extract the process models that are actually followed within the organization (Figure 1). Similar to data mining or business intelligence applications, which help organizations make informed business decisions, process mining is data mining with a business process focus that helps organizations identify their process models [8,11]. There may be situations where the process narrative or policies are not available within the organization, and knowledge of the processes exists only in the information system running them. Another objective of process mining is to check the conformance of an existing business model to the real system behavior [9]. In many cases the real behavior of process execution does not comply with the original requirements, or violates specific process policies [1,12]. Process activities should be executed in the same order as in the original process model and with the predefined process qualities. Process mining can also support process improvement projects by analyzing the run-time performance of a process and identifying its throughput rate, cycle time, bottlenecks, etc., which helps in improving or redesigning it. An early application of process mining to obtain the process model of an information system was conducted by Agrawal et al. [13]. Later, process mining was applied in many domains such as banking, health care, and municipalities [7,14,15]. The event-driven nature of process mining has helped promote its use as a powerful mechanism to support model-driven Workflow Management (WfM) and conventional Business Process Management.
Figure 1: Process Mining Process.
Process event logs: Event logs are the execution records that a system generates when it executes an instance of a process at a given point in time [9,13]. Every event log record contains multiple data elements related to the process execution. Figure 2 shows an example of event log records. Event logs can be generated automatically by various business systems such as ERP, CRM, or workflow management systems. MXML (Mining eXtensible Markup Language) and XES (eXtensible Event Stream) are the two common formats for storing event logs [9,16].
Figure 2: Example of Events Log file.
Table 1 shows an example of the system event log of a particular process. The table has four data elements: Case ID, which identifies a process instance; Activity, which is the task being performed; Resource, which refers to the actor responsible for performing the activity (person, group, department, etc.); and Timestamp, which indicates the time at which the event was logged in the system. The order of the event records is essential for identifying causal dependencies and hence constructing the right process model. Case ID and Activity are the only attributes required if the mining effort is interested solely in the dependencies between activities. The Resource and Timestamp attributes are used in analyzing performance-related aspects such as waiting time in the process, overall cycle time, bottlenecks, and resource shortages.
Table 1: Log File Data Columns.
Assuming the dataset in Table 1 is representative, the analysis starts by grouping the occurrences of similar cases into traces (Table 2). Each trace represents a full execution of a process instance. There are many modeling techniques that may be used to turn these traces into a process model [17]. Figure 3 shows the expected process model of Table 2 in Business Process Modeling Notation (BPMN). Because process mining relies on the transactional data of enterprise information systems, a data warehouse, if one exists, is a good source for mining business processes. However, given the magnitude and variability of the data types in data warehouses, a clear purpose for the process mining initiative needs to be identified so that the data are scoped and filtered accordingly. It is impractical to mine all the event logs in an organization's data warehouse, or to extract the entire event log of an enterprise system, as the time and effort required would be too high. Some applications, such as ERP systems, have thousands of database tables, which makes a process mining project not worthwhile unless a goal or purpose is clearly identified. In this research, a single process is selected that is easy to access and worth investigating for its compliance with its original process model.
Figure 3: Table 2 log in BPMN.
Table 2: Process Traces.
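The grouping of event records into traces can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this research: the event records, case identifiers, resource names, and the helper build_traces are hypothetical, since the contents of Table 1 and Table 2 are not reproduced here.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log rows: (case_id, activity, resource, timestamp).
# The values are invented for illustration; they are not the data of Table 1.
EVENTS = [
    ("case-2", "Assigned", "desk", "2016-01-05T09:30:00"),
    ("case-1", "Assigned", "desk", "2016-01-05T09:00:00"),
    ("case-1", "In Progress", "it-l1", "2016-01-05T10:15:00"),
    ("case-2", "In Progress", "it-l1", "2016-01-05T11:00:00"),
    ("case-1", "Resolved", "it-l1", "2016-01-05T12:40:00"),
    ("case-2", "Closed", "desk", "2016-01-05T13:00:00"),
    ("case-1", "Closed", "desk", "2016-01-05T14:05:00"),
    ("case-3", "Assigned", "desk", "2016-01-06T08:00:00"),
    ("case-3", "In Progress", "it-l1", "2016-01-06T08:30:00"),
    ("case-3", "Resolved", "it-l1", "2016-01-06T09:45:00"),
    ("case-3", "Closed", "desk", "2016-01-06T10:00:00"),
]


def build_traces(events):
    """Group events by case ID and order each group by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, _resource, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))
    return {
        case_id: tuple(activity for _, activity in sorted(rows))
        for case_id, rows in by_case.items()
    }


# One trace per case, then the number of cases that share each trace.
traces = build_traces(EVENTS)
variants = Counter(traces.values())
```

Only the case ID, the activity, and the event order are needed to obtain the traces; the resource column is carried along but unused, which mirrors the remark that Case ID and Activity suffice when only activity dependencies are of interest.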
ITIL Incident Management Process
This research applies the concept of process mining to the incident management process of the Saudi Telecom Company (STC). STC is the largest telecommunications company in the Middle East and Africa (MEA) [18], and follows the most renowned industry best practices to manage its business. One of the most common IT standards that STC has adopted is ITIL, which stands for Information Technology Infrastructure Library. ITIL is recognized globally as a collection of best practices for information technology management [19]. There are other models for IT service management, for example ISO 20000, CMMI-SVC, COBIT, PRINCE2, and eTOM [20-22]. The ITIL framework has evolved to meet the various issues facing organizations today. It began when Her Majesty's Government in the United Kingdom raised concerns about the quality of the services gained from its IT projects [23]. The core philosophy of ITIL is to respond not only to technological changes but also to the diverse needs of business in the current dynamic market. The latest version of ITIL (V3) comprises 26 processes grouped into 5 domains of the service life cycle [23]. Each core domain addresses capabilities that have a direct impact on service providers, with proper principles, methods, and tools. It provides guidance to service providers on the provisioning of quality IT services, and on the processes, functions, and other capabilities needed to support them. One of the most common ITIL processes is the Incident Management process. An incident, defined in the ITIL handbook [23] as “the unplanned interruption to an IT service”, is managed in this process by a set of procedures throughout its life cycle, from incident identification to closure. According to the ITIL V3 reference model, the incident management process consists of the steps listed below:
1. Incident Identification: this step is the trigger of the incident management process; it starts once an incident occurs and an issue is reported.
2. Incident Analysis & Classification: this step starts by logging the incident along with its technical and business description.
3. Incident Evaluation & Prioritization: the urgency of an incident, together with its impact, is the main factor in this step (high, medium, low).
4. Investigation and Diagnosis: this is where incidents are investigated for their cause, impact, and possible solutions.
5. Resolution and Recovery: once the solution to a reported issue is identified and tested, the team can start restoring the service.
6. Incident Closure: the service desk team ensures that the workaround has been given to the user.
7. Incident Monitoring: the service desk team monitors the workarounds of incidents for their reliability and efficacy.
The abstract view of the process activities shown in Figure 4 does not include the functional escalation subprocess, which is widely practiced for incidents that require further investigation or specific workarounds. Moreover, some organizations may have other incident management models for emergency incidents or for incidents that fall under specific service level agreements. In this research, the abstracted model of the incident management process is adopted.
Figure 4: ITIL Incident Management Process.
Incident Management Process in STC
The incident management process in STC is triggered within the business layers when an STC customer calls the help desk number (902/907) to report an issue (Figure 5). The call agent tries to resolve the issue by following standard instructions or by drawing on previously reported responses.
Figure 5: ITIL Incident Management.
When technical advice is required, the service desk agent logs the customer's request and passes it to the Operation and Maintenance Group (OMG). The OMG conducts a more detailed investigation of the incident and, if IT advice is needed, forwards it to the 1st level support of the IT team, which may in turn escalate it to another support level if the incident is not resolved there. The main vision of the Incident Management Process at STC is to restore services as soon as possible with minimal disruption to the business. Once an incident is detected and logged in the system, it is classified so that it can be assigned to the proper support group. The trigger of the Incident Management Process could be a service outage noticed by the help desk, a user inquiry through email or phone, or an automated system-generated incident. If an incident is classified as critical or urgent, it follows the critical Incident Management Process instead. The IT help desk checks the active incidents repository for already available resolutions or workarounds. If the IT help desk cannot find a proper resolution, the incident is passed to the 2nd level group. If the incident was wrongly assigned to a support group, or needs input from another support group, it is transferred back to the IT help desk, which reroutes the incident ticket. There are two types of escalation for unresolved incidents:
(1) Vertical escalation, when higher priority is needed from senior management
(2) Horizontal (functional) escalation, when an incident requires additional support from other groups.
After the incident is resolved, the relevant support group records the resolution or workaround in the incident knowledge base and then marks the incident as closed. For some incidents, the resolution may result in a Request for Change (RFC) submitted to the change management group, which may lead to a software or hardware change. The expected log of this process, which runs on BMC Remedy V8.1.0.1, maps to the following sequence of activities:
{Assigned → In Progress → Resolved → Closed}
Data Analysis
The data analysis starts by exporting the log file from the system application running the process. Figure 6 shows a snapshot of the incident management tool that STC uses to log incidents. The requester details (the Affected User) can be fetched automatically from the internal active directory. Some data inputs, such as Description, Priority, and Support Group, require manual entry. The event log database of BMC Remedy is usually exported as spreadsheets in a non-delimited text format, which should be converted into a delimited or Comma-Separated Values (CSV) format. Figure 7 shows a sample snapshot of the extracted event logs. For process mining analysis, the .CSV data needs to be converted into a more structured data format. One data standard that governs this conversion is the eXtensible Event Stream (XES) standard, which is commonly used by many process mining tools such as ProMimport and XESame. This research uses the RapidProM plug-in for the RapidMiner tool for the required conversion and the ProM 6.5 suite for the process mining analysis. Generally, the event log of a process should have at least the following data elements: Incident ID, Case ID, EventName (task or activity), TimeStamp, Resource, and Group. For the purpose of the process compliance check, the data columns of interest in Figure 7 are the Original request ID (the incident ID), the request ID (the instance of the incident ID), and the Log column, where the actual sequence of activities is stored. Other columns such as submitter, user, audit data, create data, etc. are not necessary for process modeling. They are, however, essential, as stated before, for analyzing other performance aspects such as cycle time and bottlenecks. The main data column, Log, which contains the sequence of activities of the incident, also holds non-delimited attributes such as the service configuration item (CI), assignee, assigned group, status, etc.
A simple Python script, together with some Excel formulas, was used to extract the required data and map it to the following data columns (Table 3).
Figure 6: STC Incident Management Tool.
Figure 7: Extracted Log File.
Table 3: Data Columns.
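The extraction step can be illustrated with a short Python sketch. It is not the script used in this research: the article reproduces neither that script nor the layout of the Log column, so the entry format matched by ENTRY, the column names, and the helpers log_to_rows and rows_to_csv are all assumptions made for illustration.

```python
import csv
import io
import re

# Assumed layout of one entry in the Log column (hypothetical; the real
# BMC Remedy layout is not shown in the article):
#   <timestamp> Status: <status>; Assignee: <name>; Assigned Group: <group>
ENTRY = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"Status:\s*(?P<activity>[^;\n]+);\s*"
    r"Assignee:\s*(?P<resource>[^;\n]+);\s*"
    r"Assigned Group:\s*(?P<group>[^;\n]+)"
)

# Hypothetical target columns, one row per logged status change.
COLUMNS = ["case_id", "activity", "timestamp", "resource", "group"]


def log_to_rows(case_id, log_text):
    """Split the free-text Log field of one incident into event rows."""
    rows = []
    for match in ENTRY.finditer(log_text):
        row = {"case_id": case_id}
        row.update({key: value.strip() for key, value in match.groupdict().items()})
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize event rows as delimited (CSV) text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample of a Log field for one incident.
sample_log = (
    "2016-01-05 09:00:00 Status: Assigned; Assignee: agent_a; Assigned Group: Service Desk\n"
    "2016-01-05 10:15:00 Status: In Progress; Assignee: agent_b; Assigned Group: IT Level 1\n"
)
rows = log_to_rows("INC-1", sample_log)
csv_text = rows_to_csv(rows)
```

Whatever the real layout is, the idea is the same: each status change recorded in the Log column becomes one delimited row that carries the case identifier, the activity, its timestamp, and the responsible resource.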
This mapping structure is a ProM requirement for converting the .CSV file into the .XES format. The resulting .CSV file was then imported into a conversion process in RapidMiner (Figure 8). After the conversion was complete, the resulting .XES log file was imported into the ProM tool to construct the process model of the incident management process.
Figure 8: RapidMiner Conversion Process.
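To make the target of the conversion concrete, the following sketch builds a small XES-style document with the Python standard library. It is only an illustration of the log/trace/event nesting and of the standard attribute keys concept:name, time:timestamp, and org:resource; it is not the RapidMiner conversion used in this research, and the sample rows and the helper rows_to_xes are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented event rows; timestamps are ISO 8601 strings with the same UTC
# offset, so sorting them as plain strings preserves chronological order.
ROWS = [
    {"case_id": "INC-1", "activity": "In Progress",
     "timestamp": "2016-01-05T10:15:00+03:00", "resource": "it-l1"},
    {"case_id": "INC-1", "activity": "Assigned",
     "timestamp": "2016-01-05T09:00:00+03:00", "resource": "desk"},
    {"case_id": "INC-2", "activity": "Assigned",
     "timestamp": "2016-01-05T09:30:00+03:00", "resource": "desk"},
    {"case_id": "INC-1", "activity": "Closed",
     "timestamp": "2016-01-05T14:05:00+03:00", "resource": "desk"},
]


def rows_to_xes(rows):
    """Nest event rows as <log> / <trace> / <event> elements."""
    by_case = defaultdict(list)
    for row in rows:
        by_case[row["case_id"]].append(row)

    log = ET.Element("log", {"xes.version": "1.0"})
    for case_id, events in by_case.items():
        trace = ET.SubElement(log, "trace")
        # The trace-level concept:name holds the case identifier.
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        for ev in sorted(events, key=lambda e: e["timestamp"]):
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string",
                          {"key": "concept:name", "value": ev["activity"]})
            ET.SubElement(event, "date",
                          {"key": "time:timestamp", "value": ev["timestamp"]})
            ET.SubElement(event, "string",
                          {"key": "org:resource", "value": ev["resource"]})
    return ET.tostring(log, encoding="unicode")


xes_text = rows_to_xes(ROWS)
```

A complete XES file would also declare the extensions that define these attribute keys; the sketch omits those declarations for brevity, so mining tools may not accept its output as is.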
Mining Process
There are many mining algorithms that help in creating the process model of a log file. The choice of algorithm depends on the nature of the log file data, the degree of variation among the events, how structured the data are, etc. A careful inspection of the log file is therefore required before selecting the proper mining algorithm, whether it is a single algorithm or a combination of several. In this research, three algorithms were used to construct the process model from the log file of the Incident Management process:
(1) Alpha Miner (Figure 9)
(2) Visual Inductive Miner (Figure 10) and
(3) Directly-follow algorithm (Figure 11).
Figure 9: Process Model in Alpha Petri net Algorithm.
Figure 10: Process Model in Inductive Visual Miner Algorithm.
Figure 11: Process Model in Directly-follow Algorithm.
In the process model in Figure 9, all process instances start with the task (Assigned) and end with the task (Closed), which implies that the process instances follow the original process life cycle. However, the process instances take different paths, or traces, during execution. For example, after the task (Assigned) completes, some incidents go through the task (Resolved) and then to (Closed). Other instances go from the task (In Progress) to (Closed), or go back to the task (Assigned) again. The second algorithm applied is the visual inductive miner, shown in Figure 10. The inductive miner simulates the process execution for every instance. Every time the process starts, it creates a token (a yellow circle) that represents one instance of the process, i.e., one incident case. There are 18 tokens at the start of the process and 18 tokens reach the end of the process; that is, the process was triggered 18 times. In task (Assigned), the number of tokens passing through is 21, whereas the original number of tokens was only 18. This indicates that 3 tokens passed through the task twice due to process rework. In task (In Progress), the number of tokens passing through is 20: one of the 18 tokens bypasses this task and goes from (Assigned) directly to (Resolved), while the 3 reworked tokens pass through it twice (17 + 3 = 20). The 18 tokens then leave the loop: 7 of them go directly to task (Closed), whereas 11 execute task (Resolved) before the final task (Closed). The advantage of the Inductive Miner algorithm over the Alpha Petri net is that the former visualizes the number of processed instances as they pass through the process tasks. This may help in identifying the right process model based on the scenario that the majority of process instances follow. For a clearer view of the execution of the process instances, the Directly-follow algorithm was applied to generate the process model (Figure 11).
Directly-follow diagrams show the number of active instances passing through the process activities and the directions in which they move. For example, as shown in the figure, task (Assigned) outputs 21 instances of the process: 20 into task (In Progress) and 1 into task (Resolved). Task (In Progress) has 3 output directions: 3 instances back into task (Assigned), 10 into task (Resolved), and 7 into task (Closed), and so on. Comparing the three models in Figure 9, Figure 10, and Figure 11, the three models suggest the same process model in terms of task dependencies and the corresponding execution scenarios. Generally, there are 5 different scenarios of the process execution, as follows:
1. {Assigned → Resolved → Closed}
2. {Assigned → In Progress → Resolved → Closed}
3. {Assigned → In Progress → Closed}
4. {Assigned → In Progress → Assigned → In Progress → Closed}
5. {Assigned → In Progress → Assigned → In Progress → Resolved → Closed}
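The counts shown in a directly-follow diagram can be derived from the scenarios by simple counting, as the Python sketch below illustrates. The five scenarios are taken from the list above, but the number of cases per scenario is not reported in this article: the frequencies in VARIANTS are hypothetical, chosen only so that the 18 cases reproduce the counts quoted for Figure 10 and Figure 11, and other splits between scenarios 4 and 5 would fit equally well.

```python
from collections import Counter

# Scenario -> number of cases. The frequencies are hypothetical (see above).
VARIANTS = {
    ("Assigned", "Resolved", "Closed"): 1,
    ("Assigned", "In Progress", "Resolved", "Closed"): 8,
    ("Assigned", "In Progress", "Closed"): 6,
    ("Assigned", "In Progress", "Assigned", "In Progress", "Closed"): 1,
    ("Assigned", "In Progress", "Assigned", "In Progress",
     "Resolved", "Closed"): 2,
}


def directly_follows(variant_counts):
    """Count how often activity a is immediately followed by activity b."""
    edges = Counter()
    for trace, freq in variant_counts.items():
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += freq
    return edges


def activity_visits(variant_counts):
    """Count how many times each activity is executed over all cases."""
    visits = Counter()
    for trace, freq in variant_counts.items():
        for activity in trace:
            visits[activity] += freq
    return visits


dfg = directly_follows(VARIANTS)
visits = activity_visits(VARIANTS)

for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

Under these assumed frequencies, (Assigned) is executed 21 times and (In Progress) 20 times, and the edge counts match those read from the directly-follow diagram, for instance 20 transitions from (Assigned) to (In Progress) and 3 from (In Progress) back to (Assigned).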
Each of the above scenarios is a candidate process model of the Incident Management Process until it is verified against the original process chart within the organization, if one exists. Referring back to the expected activity sequence given in the section on the Incident Management Process in STC, {Assigned → In Progress → Resolved → Closed}, it is easy to see that the original process conforms to Scenario (2) of the event log analysis. However, in practice, the process execution generated (4) other scenarios, which cannot be ruled out as possible process models. Assuming that no process description or reference model exists, or that the purpose of the analysis is process discovery rather than conformance checking, further analysis is required to favor one process scenario over the others, using more sample data or different mining algorithms. This entails further exploration activities and hence increases the expected cost and effort, which needs to be taken into account before a decision is made. Another approach to process discovery is to apply manual techniques such as walkthroughs and interviews. This approach requires formal interview sessions with process domain experts, such as process owners and users, and a rigorous review and analysis of any available procedures to define the right process model.
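A very simple form of this verification can be sketched in Python by comparing each discovered scenario with the expected activity sequence of the process. This is a deliberately naive check (exact sequence match plus a list of skipped and repeated activities), not one of the conformance checking techniques offered by process mining tools; the function name deviations and the report layout are hypothetical.

```python
# Expected sequence of the incident management process as logged by the tool.
EXPECTED = ("Assigned", "In Progress", "Resolved", "Closed")

# The five scenarios discovered from the event log, keyed by their number.
SCENARIOS = {
    1: ("Assigned", "Resolved", "Closed"),
    2: ("Assigned", "In Progress", "Resolved", "Closed"),
    3: ("Assigned", "In Progress", "Closed"),
    4: ("Assigned", "In Progress", "Assigned", "In Progress", "Closed"),
    5: ("Assigned", "In Progress", "Assigned", "In Progress",
        "Resolved", "Closed"),
}


def deviations(trace, expected):
    """List expected activities the trace skips and activities it repeats."""
    skipped = [activity for activity in expected if activity not in trace]
    repeated = sorted({a for a in trace if trace.count(a) > 1})
    return {"skipped": skipped, "repeated": repeated}


# Scenarios that match the expected sequence exactly.
conforming = [n for n, trace in SCENARIOS.items() if trace == EXPECTED]

# Per-scenario summary of how each trace departs from the expected sequence.
report = {n: deviations(trace, EXPECTED) for n, trace in SCENARIOS.items()}
```

Here conforming holds the numbers of the scenarios that match the expected sequence exactly, and report lists, for each scenario, which expected activities it skips and which activities it repeats, which gives a first indication of where the execution departs from the intended flow.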


This research is part of ongoing research to compare the efficiency of process mining techniques, as a process discovery mechanism, with conventional process discovery activities such as formal inspections and process walkthroughs. An ITIL-compliant incident management process of a renowned telecom provider was selected in order to discover its as-is model from its run-time behavior. A sample of the process event logs was extracted and analyzed using three mining algorithms. The analysis revealed 5 different candidate process models, 4 of which do not conform to the intended process flow. Future research will apply manual process discovery approaches to the same process to define the process model based on feedback from its stakeholders and domain experts. The findings of that second round of process discovery will be compared with the findings of this research with respect to the effort and duration required to define the process model in each case.

