Research Article, Int J Cardiovasc Res Vol: 8 Issue: 1
Patterns in Risk Factors of Cardiovascular Disease using the Apriori Algorithm
*Corresponding Author: Musa Karim
Statistician, Research Department, National Institute of Cardiovascular Diseases (NICVD), Karachi, Pakistan
Received: April 09, 2018 Accepted: May 09, 2018 Published: January 14, 2019
Citation: Karim M, Furnaz S, Saghir T, Rizvi NH, Raheem A (2019) Patterns in Risk Factors of Cardiovascular Disease using the Apriori Algorithm. Int J Cardiovasc Res 8:1. doi: 10.4172/2324-8602.1000365
Background: Cardiovascular disease (CVD) is the leading cause of mortality worldwide. The aim of this study was to identify co-occurring risk factors of cardiovascular disease using the Apriori data mining algorithm among patients visiting the outpatient department of a tertiary care hospital in Pakistan from January 2017 to June 2017.
Methods: This cross-sectional study included 5,164 consecutive patients visiting the outpatient department (OPD) of the National Institute of Cardiovascular Diseases, Karachi, Pakistan, from January 2017 to June 2017. CVD risk factors were recorded for all enrolled patients. Association rules were developed and assessed using the Apriori data mining algorithm. Rules were pruned by removing redundant rules and by requiring a minimum length of two items, a minimum support of 0.20, and a minimum confidence of 0.90.
Results: Of the 5,164 patients, 51.1% were female and 42.7% were more than 50 years of age. The most frequently observed risk factors were, in order, hypertension, obesity, dyslipidemia, and diabetes mellitus. Hypertension was the consequent of all extracted association rules; the antecedents were, respectively, overweight/obese; dyslipidemia; female with overweight/obese; more than 50 years of age; overweight/obese with dyslipidemia; more than 50 years of age with dyslipidemia; female and more than 50 years of age; and female with dyslipidemia.
Conclusion: Using the Apriori algorithm, meaningful association rules and patterns among the risk factors of cardiovascular disease (CVD) were extracted; these rules offer a feasible basis for reducing the risk of CVD.
Keywords: Data mining; Apriori algorithm; Association rules; Cardiovascular disease (CVD)
Introduction
Cardiovascular disease (CVD) is the leading cause of mortality worldwide. According to the WHO Global Health Estimates for 2015, CVD caused approximately 17,689 thousand deaths, a crude death rate of 240.9 deaths per 100,000 population. It accounts for about one third (31.34%) of total global mortality. By 2030, a conservative estimate puts deaths due to CVD at more than 23.3 million per annum. About 70% of the risk factors attributed to CVD are potentially modifiable, such as hypertension, diabetes, and smoking. These modifiable risk factors, in particular hypertension and obesity, are known to be associated with functional and structural subclinical changes in the heart chambers.
Implementing effective management strategies for CVD risk factors is one of the most effective ways to reduce the global disease burden. These strategies include the development of effective drug therapies for these conditions, preventive measures, and lifestyle changes. Health care systems today generate enormous amounts of complex data about patients, diseases, and diagnostic methods. Data mining techniques are useful tools for exploring the underlying health phenomena in support of effective disease management. Data mining is the process of extracting hidden knowledge from large volumes of raw data; it has been defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data.
An extensive literature assesses major, modifiable, and contributing risk factors independently. Further investigation of the co-occurrence and association patterns among CVD risk factors is important for practitioners seeking to optimize management strategies for cardiovascular disease. Therefore, the purpose of this study is to identify co-occurring risk factors of cardiovascular disease using the Apriori data mining algorithm among patients visiting the outpatient department of a tertiary care hospital in Pakistan from January 2017 to June 2017.
Patients and Methods
This cross-sectional study was conducted at the National Institute of Cardiovascular Diseases, Karachi, Pakistan, from January 2017 to June 2017, after approval by the institutional ethical review committee. Data were collected using a structured questionnaire that had undergone expert review. Data were collected for 7,728 conveniently selected patients visiting the OPD. Patients with incomplete data on CVD risk factors were excluded, leaving 5,164 patients with complete data for the analysis. The variables considered were the typical and atypical risk factors of cardiovascular disease: addiction (tobacco, alcohol, etc.), hypertension, diabetes mellitus, family history of CVD, dyslipidemia, obesity, gender, and age of more than 50 years.
The Apriori data mining algorithm was applied to formulate association rules using the arules package for R version 3.3.3 in RStudio. Association rules were developed and assessed using three measures: (1) support, the proportion of patients in whom the rule occurs (closer to 1 is favorable); (2) confidence, the proportion of patients with the antecedent who also have the consequent (closer to 1 is favorable); and (3) lift, the strength of association or dependency between antecedent and consequent (greater than 1 is favorable). Mathematical definitions of the three measures in terms of discrete probability are presented in Table 1 for a hypothetical association rule A→B, where A is the antecedent and B is the consequent. Rules were pruned by removing redundant rules and by requiring a minimum length of two items, a minimum support of 0.20, and a minimum confidence of 0.90.
Table 1: Three measures of association rule A→B.
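For reference, the standard textbook definitions of the three measures for a rule A→B can be written as follows. The notation here is an assumption and may differ from that used in Table 1; P denotes the relative frequency among the patients analysed.

```latex
\begin{aligned}
\mathrm{support}(A \rightarrow B)    &= P(A \cap B) \\
\mathrm{confidence}(A \rightarrow B) &= P(B \mid A) = \frac{P(A \cap B)}{P(A)} \\
\mathrm{lift}(A \rightarrow B)       &= \frac{P(A \cap B)}{P(A)\,P(B)}
                                       = \frac{\mathrm{confidence}(A \rightarrow B)}{P(B)}
\end{aligned}
```

A lift of 1 corresponds to statistical independence of A and B, which is why values greater than 1 are read as a positive association.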
Results
Of the 5,164 patients, 51.1% were female and 42.7% were more than 50 years of age. The most frequently observed risk factors were, in order, hypertension, obesity, dyslipidemia, and diabetes mellitus. The frequencies of CVD risk factors are presented in Figure 1.
Based on the pruning criteria, eight unique association rules were extracted, with support ranging from 0.22 to 0.57, confidence from 0.92 to 0.97, and lift from 1.10 to 1.17. All eight rules had hypertension as the consequent; the antecedents were, respectively, overweight/obese; dyslipidemia; female with overweight/obese; more than 50 years of age; overweight/obese with dyslipidemia; more than 50 years of age with dyslipidemia; female and more than 50 years of age; and female with dyslipidemia. Support, confidence, and lift of the extracted association rules are presented in Figure 2.
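The mining and pruning steps behind these rules can be illustrated with a minimal Python sketch. The study itself used the arules package in R, and the code below is not that implementation. The function name `apriori_rules`, the item labels, and the ten toy patient records are hypothetical. The thresholds (minimum support 0.20, minimum confidence 0.90, at least two items per rule) follow Patients and Methods. Only single-item consequents are generated, and the removal of redundant rules is not shown.

```python
from itertools import combinations


def apriori_rules(transactions, min_support=0.20, min_confidence=0.90):
    """Mine rules A -> B (single-item consequent B) with a basic Apriori search."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of records that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s) >= min_support}

    frequent = {}  # frequent itemset -> support
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        # Join frequent k-itemsets into (k+1)-candidates and keep only those
        # whose k-subsets are all frequent (the Apriori property).
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}

    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:  # a rule needs at least two items
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            confidence = supp / frequent[antecedent]
            lift = confidence / frequent[frozenset([consequent])]
            if confidence >= min_confidence:
                rules.append((sorted(antecedent), consequent,
                              round(supp, 2), round(confidence, 2), round(lift, 2)))
    return rules


# Hypothetical toy records, NOT the study data.
patients = [
    {"hypertension", "obese", "female"},
    {"hypertension", "obese"},
    {"hypertension", "obese", "dyslipidemia"},
    {"hypertension", "dyslipidemia"},
    {"hypertension", "obese", "female"},
    {"hypertension"},
    {"diabetes"},
    {"hypertension", "obese", "dyslipidemia", "female"},
    {"female"},
    {"hypertension", "obese"},
]

rules = apriori_rules(patients)
for antecedent, consequent, supp, conf, lift in sorted(rules):
    print(f"{antecedent} -> {consequent}: support={supp}, confidence={conf}, lift={lift}")
```

Because support can only decrease as items are added to an itemset, candidates containing an infrequent subset are discarded before their support is counted; this is what keeps the search tractable when there are many risk factors.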
Discussion
The aim of this study was to explore patterns in the risk factors of cardiovascular disease. All of the association rules extracted in this study revolve around hypertension, one of the major global risk factors for cardiovascular disease. In our study, hypertension was observed in more than 80% of the patients. Despite the availability of effective antihypertensive therapies and intensive research efforts, only half of hypertensive patients are diagnosed with the condition and half receive treatment. Local data report 77% adherence to antihypertensive medication; the factors strongly associated with non-adherence among Pakistani patients are reported to be lack of awareness, symptomatic treatment, younger age, and monotherapy.
The association between overweight or obesity and hypertension has been extensively studied and confirmed across geographies, ages, and genders [11-15]. The combination of these conditions in a patient is recognized as a pre-eminent cause of cardiovascular risk. In our study, association rule 1 captures this important association, with hypertension as the consequent and overweight/obesity as the antecedent. The support of this rule is 0.57 (57%), meaning that the combination of these conditions is present in more than half of the patients presenting to our center. Its confidence is as high as 0.92, indicating that 92% of overweight or obese patients are hypertensive, and its lift is 1.10, indicating that overweight or obese patients are 1.10 times as likely to be hypertensive as patients in the study population overall. The association is even stronger in female patients: rule 3, with hypertension as the consequent and overweight/obesity and female as the antecedent, has a confidence of 0.94 and a lift of 1.13.
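As a rough consistency check on these figures, the definitions of confidence and lift imply marginal frequencies that can be recovered from the reported values. The sketch below does this for rule 1. The helper name `implied_marginals` is hypothetical, and the inputs are the rounded values reported above, so the results are approximate.

```python
def implied_marginals(support, confidence, lift):
    """Recover P(A) and P(B) from the three measures of a rule A -> B."""
    p_antecedent = support / confidence  # P(A) = P(A and B) / P(B | A)
    p_consequent = confidence / lift     # P(B) = P(B | A) / lift
    return p_antecedent, p_consequent


# Rule 1 as reported: overweight/obese -> hypertension.
p_a, p_b = implied_marginals(support=0.57, confidence=0.92, lift=1.10)
print(f"implied P(overweight/obese) = {p_a:.2f}")
print(f"implied P(hypertension)     = {p_b:.2f}")
```

Under these assumptions the implied frequency of hypertension is about 0.84, in line with the statement that hypertension was observed in more than 80% of the patients.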
Hypertension and dyslipidemia are established risk factors for CVD, and the two conditions are often seen in the same patient [16-18]. The mechanism of this association is not clear; the explanation given in the literature so far is that the two conditions share common pathophysiological etiologies. Furthermore, dyslipidemia promotes atherosclerosis by adversely affecting the functional and structural properties of the arteries. These changes impair blood pressure (BP) regulation and, as a result, predispose patients with dyslipidemia to the development of hypertension. The second association rule extracted in our study clearly outlines this association, with hypertension as the consequent and dyslipidemia as the antecedent. A support of 0.39 indicates that dyslipidemia and hypertension coexist in around 39% of the study population; a confidence of 0.95 indicates that 95% of dyslipidemic patients are also hypertensive; and a lift of 1.14 indicates that dyslipidemic patients are 1.14 times as likely to be hypertensive as patients in the study population overall. The rule that dyslipidemia implies hypertension is strengthened for dyslipidemic patients who are also overweight/obese (rule 5, confidence=0.97, lift=1.17), more than 50 years of age (rule 6, confidence=0.95, lift=1.15), or female (rule 8, confidence=0.97, lift=1.17).
The association of hypertension with age was captured by the fourth extracted rule: 93% of patients more than 50 years of age presented with hypertension, and the lift of this rule was 1.12. The association was also present among female patients in this age group (rule 7, confidence=0.92, lift=1.10), although these reported values are slightly lower than those of rule 4. In the general population, the prevalence of hypertension tends to increase with age: more than 50% of the population aged 60 years or older is hypertensive, and the proportion increases to approximately 66% among those aged 65 years or older [20,21]. Physiological changes and arterial structural changes due to aging complicate treatment in these patients. Therefore, patients in this age group need extra attention and care in treatment.
Strengths and Limitations
A particular strength of this study is that it is based on a sufficiently large number of patients and that an advanced, well-established data mining technique was used to extract the results.
This study was conducted in a single tertiary care hospital with a particular patient profile, so the generalizability of its findings may be limited. A second important limitation is that no diagnostic test was performed to verify or further investigate CVD in these patients.
Conclusion
Using the Apriori algorithm, meaningful association rules and patterns among the risk factors of cardiovascular disease (CVD) were extracted; these rules offer a feasible basis for reducing the risk of CVD. Targeted treatment, management strategies, and awareness campaigns can be devised for coexisting conditions.
References
1. Kwan GF, Mayosi BM, Mocumbi AO, Miranda JJ, Ezzati M, et al. (2016) Endemic cardiovascular diseases of the poorest billion. Circulation 133: 2561-2575.
2. Global Health Estimates (2015): Deaths by Cause, Age, Sex, by Country and by Region, 2000-2015. In: Health statistics and information systems. WHO Geneva.
3. Mathers CD, Loncar D (2006) Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med 3: e442.
4. Sardarinia M, Akbarpour S, Lotfaliany M, Bagherzadeh-Khiabani F, Bozorgmanesh M, et al. (2016) Risk Factors for Incidence of Cardiovascular Diseases and All-Cause Mortality in a Middle Eastern Population over a Decade Follow-up: Tehran Lipid and Glucose Study. PLoS One 11: e0167623.
5. Petersen SE, Sanghvi MM, Aung N, Cooper JA, Paiva JM, et al. (2017) The impact of cardiovascular risk factors on cardiac structure and function: Insights from the UK Biobank imaging enhancement study. PLoS One 12: e0185114.
6. Patel SA, Winkel M, Ali MK, Narayan KM, Mehta NK (2015) Cardiovascular mortality associated with 5 leading risk factors: national and state preventable fractions estimated from survey data. Ann Intern Med 163: 245-253.
7. Kolçe E, Frasheri N (2012) A literature review of data mining techniques used in healthcare databases. ICT Innov.
8. Kearney PM, Whelton M, Reynolds K, Muntner P, Whelton PK, et al. (2005) Global burden of hypertension: analysis of worldwide data. Lancet 365: 217-223.
9. Saleem F, Hassali AA, Shafie AA (2010) Hypertension in Pakistan: time to take some serious action. Br J Gen Pract 60: 449-450.
10. Hashmi SK, Afridi MB, Abbas K, Sajwani RA, Saleheen D, et al. (2007) Factors associated with adherence to anti-hypertensive treatment in Pakistan. PLoS One 2: e280.
11. Hall JE, do Carmo JM, da Silva AA, Wang Z, Hall ME (2015) Obesity-induced hypertension: interaction of neurohumoral and renal mechanisms. Circ Res 116: 991-1006.
12. Wang SK, Ma W, Wang S, Yi XR, Jia HY (2014) Obesity and its relationship with hypertension among adults 50 years and older in Jinan, China. PLoS One 9: e114424.
13. Macia E, Gueye L, Duboz P (2016) Hypertension and Obesity in Dakar, Senegal. PLoS One 11: e0161544.
14. Landsberg L, Aronne LJ, Beilin LJ, Burke V, Igel LI, et al. (2013) Obesity-related hypertension: pathogenesis, cardiovascular risk, and treatment: a position paper of The Obesity Society and the American Society of Hypertension. J Clin Hypertens (Greenwich) 15: 14-33.
15. Kotsis V, Grassi G (2016) The enigma of obesity-induced hypertension mechanisms in the youth. J Hypertens 34: 191-192.
16. Otsuka T, Takada H, Nishiyama Y, Kodani E, Saiki Y, et al. (2016) Dyslipidemia and the Risk of Developing Hypertension in a Working-Age Male Population. J Am Heart Assoc 5: e003053.
17. Halperin RO, Sesso HD, Ma J, Buring JE, Stampfer MJ, et al. (2006) Dyslipidemia and the risk of incident hypertension in men. Hypertension 47: 45-50.
18. Freitas MP, Loyola Filho AI, Lima-Costa MF (2011) Dyslipidemia and the risk of incident hypertension in a population of community-dwelling Brazilian elderly: the Bambui Cohort Study of Aging. Cad Saude Publica 27: S351-S359.
19. Wilkinson IB, Prasad K, Hall IR, Thomas A, MacCallum H, et al. (2002) Increased central pulse pressure and augmentation index in subjects with hypercholesterolemia. J Am Coll Cardiol 39: 1005-1011.
20. Ostchega Y, Dillon CF, Hughes JP, Carroll M, Yoon S (2007) Trends in hypertension prevalence, awareness, treatment, and control in older U.S. adults: data from the National Health and Nutrition Examination Survey 1988 to 2004. J Am Geriatr Soc 55: 1056-1065.
21. Nguyen QT, Anderson SR, Sanders L, Nguyen LD (2012) Managing hypertension in the elderly: a common chronic disease with increasing age. Am Health Drug Benefits 5: 146-153.
22. Oliva RV, Bakris GL (2012) Management of hypertension in the elderly population. J Gerontol A Biol Sci Med Sci 67: 1343-1351.