International Journal of Cardiovascular Research, ISSN: 2324-8602

Research Article, Int J Cardiovasc Res Vol: 8 Issue: 1

Patterns in Risk Factors of Cardiovascular Disease using the Apriori Algorithm

Musa Karim1*, Shumaila Furnaz1, Tahir Saghir1, Nadeem Hasan Rizvi1 and Ahmed Raheem2

1National Institute of Cardiovascular Diseases (NICVD), Karachi, Pakistan

2Pathology and Laboratory Medicine, Aga Khan University Hospital, Karachi, Pakistan

*Corresponding Author: Musa Karim
Statistician, Research Department, National Institute of Cardiovascular Diseases (NICVD), Karachi, Pakistan
Tel: +923323646169
E-mail: [email protected]

Received: April 09, 2018 Accepted: May 09, 2018 Published: January 14, 2019

Citation: Karim M, Furnaz S, Saghir T, Rizvi NH, Raheem A (2019) Patterns in Risk Factors of Cardiovascular Disease using the Apriori Algorithm. Int J Cardiovasc Res 8:1. doi: 10.4172/2324-8602.1000365

Abstract

Background: Cardiovascular disease (CVD) is the leading cause of mortality worldwide. The aim of this study was to identify co-occurring risk factors of cardiovascular disease using the Apriori data mining algorithm among patients visiting the outpatient department (OPD) of a tertiary care hospital in Pakistan from January 2017 to June 2017.

Methods: This cross-sectional study included 5,164 consecutive patients visiting the OPD of the National Institute of Cardiovascular Diseases, Karachi, Pakistan, from January 2017 to June 2017. CVD risk factors were collected for all enrolled patients. Association rules were developed and assessed by applying the Apriori data mining algorithm. The following pruning criteria were applied: removal of redundant rules, a minimum rule length of two items, a minimum support of 0.20, and a minimum confidence of 0.90.

Results: Of the 5,164 patients, 51.1% were female and 42.7% were older than 50 years. The most frequently observed risk factors were hypertension, obesity, dyslipidemia, and diabetes mellitus, in that order. Hypertension was the consequent of all extracted association rules. The antecedents were, in rule order: overweight/obese; dyslipidemia; female with overweight/obese; age over 50 years; overweight/obese with dyslipidemia; age over 50 years with dyslipidemia; female over 50 years of age; and female with dyslipidemia.

Conclusion: Using the Apriori algorithm, meaningful association rules and patterns among the risk factors of cardiovascular disease (CVD) were extracted; these rules provide a feasible way to reduce the risk of CVD.

Keywords: Data mining; Apriori algorithm; Association rules; Cardiovascular disease (CVD)

Introduction

Cardiovascular disease (CVD) is the leading cause of mortality worldwide [1]. According to the WHO Global Health Estimates for 2015, CVD caused approximately 17.7 million (17,689 thousand) deaths, a crude death rate of 240.9 deaths per 100,000 population. It accounts for nearly one third (31.34%) of total global mortality [2]. By 2030, deaths due to CVD are conservatively estimated to exceed 23.3 million per annum [3]. About 70% of the risk factors attributed to CVD are potentially modifiable, such as hypertension, diabetes, and smoking [4]. These modifiable risk factors, in particular hypertension and obesity, are known to be associated with functional and structural subclinical changes in the heart chambers [5].

Implementing effective management strategies for CVD risk factors is one of the ways to reduce the global disease burden [6]. These strategies include the development of effective drug therapies for these conditions, preventive measures, and lifestyle changes. Health care systems today generate enormous amounts of complex data about patients, diseases, and diagnostic methods. Data mining techniques are useful tools for exploring underlying health phenomena in support of effective disease management. Data mining is the process of extracting hidden knowledge from large volumes of raw data [7]. It has been defined as the nontrivial extraction of previously unknown, implicit, and potentially useful information from data.

Extensive literature and data are available that assess major, modifiable, and contributing risk factors independently. Further investigation of co-occurrence and association patterns among CVD risk factors is important for practitioners seeking to optimize management strategies for cardiovascular disease. Therefore, the purpose of this study was to identify co-occurring risk factors of cardiovascular disease using the Apriori data mining algorithm among patients visiting the outpatient department of a tertiary care hospital in Pakistan from January 2017 to June 2017.

Patients and Methods

This cross-sectional study was conducted at the National Institute of Cardiovascular Diseases, Karachi, Pakistan, from January 2017 to June 2017 after approval by the institutional ethical review committee. Data were collected using a structured questionnaire after expert review. Data were collected for 7,728 conveniently selected patients visiting the OPD. Patients with incomplete data on CVD risk factors were excluded, leaving 5,164 patients with complete data for the analysis. The variables considered in this study were typical and atypical risk factors of cardiovascular disease: addiction (tobacco, alcohol, etc.), hypertension, diabetes mellitus, family history of CVD, dyslipidemia, obesity, gender, and age over 50 years.

The Apriori data mining algorithm was applied to formulate association rules using the arules package for R version 3.3.3 in RStudio. Association rules were developed and assessed using three measures: (1) support: how frequently the rule occurs in the data (closer to 1 is favorable); (2) confidence: how often the rule holds when its antecedent is present (closer to 1 is favorable); and (3) lift: the strength of association or dependency between antecedent and consequent (greater than 1 is favorable). Mathematical definitions of the three measures using discrete probability theory are presented in Table 1 for a hypothetical association rule A→B, where A is the antecedent and B is the consequent. The following pruning criteria were applied: removal of redundant rules, a minimum rule length of two items, a minimum support of 0.20, and a minimum confidence of 0.90.
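The level-wise search at the core of Apriori can be illustrated with a minimal sketch. This is not the authors' code (the study used the arules package in R); it is a plain-Python illustration in which the function name `frequent_itemsets` and the ten patient records are hypothetical. It finds every itemset whose support meets a minimum threshold, using the Apriori property that an itemset can only be frequent if all of its subsets are frequent.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search: return {itemset: support} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy records (NOT study data): one set of risk factors per patient.
patients = [
    {"hypertension", "obese", "dyslipidemia"},
    {"hypertension", "obese"},
    {"hypertension", "obese", "female"},
    {"hypertension", "dyslipidemia", "female"},
    {"hypertension", "obese", "dyslipidemia", "female"},
    {"hypertension", "female"},
    {"obese"},
    {"hypertension", "obese"},
    {"diabetes", "female"},
    {"hypertension", "dyslipidemia"},
]

frequent = frequent_itemsets(patients, min_support=0.20)
# Keep only itemsets with at least two items, mirroring the minimum-length criterion.
at_least_two = {s: v for s, v in frequent.items() if len(s) >= 2}
for itemset, s in sorted(at_least_two.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

Rule generation, the confidence threshold, and redundancy removal are deliberately left out of this sketch; only the frequent-itemset stage is shown.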

Measure | Mathematical formula
Support (A→B) | $P(A \cap B)$
Confidence (A→B) | $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$
Lift (A→B) | $\dfrac{P(A \cap B)}{P(A)\,P(B)}$

Table 1: Three measures of association rule A→B.
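As an illustration of the three measures, the following minimal Python sketch computes them for a single rule A→B by counting records. It assumes the standard definitions (support = P(A∩B), confidence = P(A∩B)/P(A), lift = confidence/P(B)); the function name `rule_measures` and the toy records are hypothetical and are not study data.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(antecedent <= set(t) for t in transactions)   # records containing A
    n_b = sum(consequent <= set(t) for t in transactions)   # records containing B
    n_ab = sum((antecedent | consequent) <= set(t) for t in transactions)  # A and B
    if n_a == 0 or n_b == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_ab / n              # P(A and B)
    confidence = n_ab / n_a         # P(B | A)
    lift = confidence / (n_b / n)   # P(B | A) / P(B)
    return support, confidence, lift


# Hypothetical toy records (NOT study data).
patients = [
    {"hypertension", "obese", "dyslipidemia"},
    {"hypertension", "obese"},
    {"hypertension", "obese", "female"},
    {"hypertension", "dyslipidemia", "female"},
    {"hypertension", "obese", "dyslipidemia", "female"},
    {"hypertension", "female"},
    {"obese"},
    {"hypertension", "obese"},
    {"diabetes", "female"},
    {"hypertension", "dyslipidemia"},
]

support, confidence, lift = rule_measures(patients, {"obese"}, {"hypertension"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

In this toy data, 5 of 10 records contain both items, 6 contain the antecedent, and 8 contain the consequent, so the lift only slightly exceeds 1: the consequent is already very common, which limits how far any rule can raise its probability.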

Results

Of the 5,164 patients, 51.1% were female and 42.7% were older than 50 years. The most frequently observed risk factors were hypertension, obesity, dyslipidemia, and diabetes mellitus, in that order. The frequencies of CVD risk factors are presented in Figure 1.

Figure 1: Frequency of risk factors.

Based on the pruning criteria, 8 unique association rules were extracted, with support ranging from 0.22 to 0.57, confidence from 0.92 to 0.97, and lift from 1.10 to 1.17. All eight potentially useful rules have hypertension as the consequent. Their antecedents are, in rule order: overweight/obese; dyslipidemia; female with overweight/obese; age over 50 years; overweight/obese with dyslipidemia; age over 50 years with dyslipidemia; female over 50 years of age; and female with dyslipidemia. The support, confidence, and lift of the extracted association rules are presented in Figure 2.

Figure 2: Extracted association rules {antecedent => consequent}.

Discussion

The aim of this study was to explore patterns in the risk factors of cardiovascular disease. All of the association rules extracted in this study revolve around hypertension, one of the major global risk factors of cardiovascular disease [8]. In our study, hypertension was observed in more than 80% of the patients. Despite the availability of effective antihypertensive therapies and intensive research efforts, only half of hypertensive patients are diagnosed with the condition and half receive treatment [9]. Local data report 77% adherence to antihypertensive medication; the factors most strongly associated with non-adherence among Pakistani patients are reported to be lack of awareness, symptomatic treatment, younger age, and monotherapy [10].

The association between overweight or obesity and hypertension has been extensively studied and confirmed across geographies, ages, and genders in past studies [11-15]. The combination of these conditions in a patient is recognized as a pre-eminent cause of cardiovascular risk [14]. In our study, association rule 1 captures this important association, with hypertension as the consequent and overweight/obesity as the antecedent. The support of this rule is 0.57, meaning that the two conditions coexist in 57% of the patients presenting to our center. The confidence is 0.92, indicating that 92% of overweight or obese patients are hypertensive, and the lift is 1.10, indicating that overweight or obese patients are 1.10 times as likely to be hypertensive as patients in the study population overall. The association is even stronger in female patients: rule 3, with hypertension as the consequent and overweight/obesity and female as the antecedent, has a confidence of 0.94 and a lift of 1.13.
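The three reported measures of a rule are linked, so they can be cross-checked against one another. Under the standard definitions (assumed here), confidence = support / P(A) and lift = confidence / P(B), which means the prevalence of the antecedent and of the consequent can be recovered from the reported values. The sketch below does this for rule 1; the helper name `implied_prevalences` is hypothetical, and because the inputs are rounded to two decimals the results are approximate.

```python
def implied_prevalences(support, confidence, lift):
    """Recover P(antecedent) and P(consequent) from a rule's reported measures."""
    p_antecedent = support / confidence   # P(A) = P(A and B) / P(B | A)
    p_consequent = confidence / lift      # P(B) = P(B | A) / lift
    return p_antecedent, p_consequent


# Rule 1 as reported: {overweight/obese} => {hypertension}.
p_overweight, p_hypertension = implied_prevalences(
    support=0.57, confidence=0.92, lift=1.10
)
print(f"implied P(overweight/obese) = {p_overweight:.2f}")
print(f"implied P(hypertension)     = {p_hypertension:.2f}")
```

The implied hypertension prevalence of roughly 0.84 agrees with the observation that more than 80% of the patients were hypertensive. It also explains why every lift in this study is close to 1: with a consequent this common, lift cannot exceed about 1/0.84, or roughly 1.2.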

Hypertension and dyslipidemia are established risk factors of CVD, and the two conditions often occur in the same patient [16-18]. The mechanism of this association is not clear; the explanation given so far in the literature is that both conditions share common pathophysiological etiologies [16]. Furthermore, dyslipidemia promotes atherosclerosis by adversely affecting arterial functional and structural properties [19]. These changes adversely alter blood pressure (BP) regulation and, as a result, predispose patients with dyslipidemia to the development of hypertension [18]. The second association rule extracted in our study captures this association, with hypertension as the consequent and dyslipidemia as the antecedent. The support of 0.39 indicates that dyslipidemia and hypertension coexist in about 39% of the study population, the confidence of 0.95 indicates that 95% of dyslipidemic patients are also hypertensive, and the lift of 1.14 indicates that dyslipidemic patients are 1.14 times as likely to be hypertensive as patients in the study population overall. The rule that dyslipidemia implies hypertension is strengthened among dyslipidemic patients who are also overweight/obese (rule 5, confidence=0.97, lift=1.17), older than 50 years (rule 6, confidence=0.95, lift=1.15), or female (rule 8, confidence=0.97, lift=1.17).

The association between hypertension and age was captured by the fourth extracted rule: 93% of patients older than 50 years presented with hypertension, and the lift of this rule was 1.12. The association was also observed in female patients older than 50 years (rule 7, confidence=0.92, lift=1.10). In the general population, the prevalence of hypertension tends to increase with age: more than 50% of people aged 60 years or older are hypertensive, and the proportion rises to approximately 66% among those aged 65 years or older [20,21]. Physiological changes and arterial structural changes due to aging aggravate the treatment dilemma in these patients. Therefore, patients in this age group need extra attention and care in treatment [22].

Strengths and Limitations

A particular strength of this study is that it is based on a sufficiently large number of patients and that an advanced, well-established data mining technique was used to extract the results.

This study was conducted in a single tertiary care hospital with a particular patient profile, so the generalizability of the findings may be limited. A second important limitation is that no diagnostic tests were performed to verify and further investigate CVD in these patients.

Conclusion

Using the Apriori algorithm, meaningful association rules and patterns among the risk factors of cardiovascular disease (CVD) were extracted; these rules provide a feasible way to reduce the risk of CVD. Targeted treatment, management strategies, and awareness campaigns can be devised for coexisting conditions.

References
