Journal of Clinical & Experimental Radiology


Research Article, J Clin Exp Radiol Vol: 1 Issue: 1

Big Data: The Next Era of Informatics and Data Science in Medical Imaging: A Literature Review

Elizabeth Filonenko* and Euclid Seeram

Department of Medical Imaging and Radiation Sciences, Monash University, Clayton, Australia

*Corresponding Author : Elizabeth Filonenko
Department of Medical Imaging and Radiation Sciences, Monash University, Clayton, Australia
Tel: +61425880594
E-mail: [email protected]

Received: August 24, 2017 Accepted: January 05, 2018 Published: January 12, 2018

Citation: Filonenko E, Seeram E (2018) Big Data: The Next Era of Informatics and Data Science in Medical Imaging: A Literature Review. J Clin Exp Radiol 1:1.


Big data has been an evolving topic of research in recent years and holds much promise in all fields, including medicine and radiology. The purpose of this literature review is to define big data and discuss its relevance to medical imaging informatics. Relevant imaging informatics concepts and tools, such as data mining and various types of artificial intelligence, will also be discussed. The literature surveyed showed two distinct types of articles: general review-type articles, and articles expanding on themes and potential applications. The latter category provides very specific research examples examining a range of problems. Potential applications of big data solutions will be discussed through multiple avenues, including their effect on treatment and diagnosis processes, research, education, and population studies. Departmental efficiency, cost reduction, and the effect on safety in the radiology department will also be discussed. The future of big data applications was found to be very promising; however, the utilisation of these solutions has a long way to go before they find practical applications in clinical centres.

Keywords: Big data; Medical imaging; Radiology informatics; Imaging informatics


Over multiple decades, hospital and medical information has been converted from an analogue, offline infrastructure to a digital, online format, leading to the creation of multiple electronic data sets. Reports from physicians in every department are stored as Electronic Medical Records (EMR), which can comprise serology and immunology results and DNA tests, to name a few [1]. A patient’s complete journey from admission to discharge can be traced electronically. Much of this information is uploaded to the hospital intranet or online servers, making an abundance of information available to an array of personnel.

Radiology is very promising in providing innovation in the future of healthcare, as there is a large influx of data from the many electronic systems incorporated into the daily activities of the department. With the introduction of the Picture Archiving and Communication System (PACS) many years ago, radiologists became 20-50% more efficient with their workflow, as images from a combination of modalities could be displayed immediately after the examination [2].

Big data is data of a large volume, often combining multiple data sets and requiring innovative forms of information technology (IT) to process it [1]. Traditional analysis software is unable to capture, store, manage and analyse big data, thus new forms of information processing must be developed for it [3]. Big data is characterised by four V’s: volume, variety, velocity and veracity [1]. Volume refers to the very large amount of data, variety to the diverse array of data being collected, and velocity to the high speed at which it is collected. Veracity describes the uncertainty and reliability associated with big data trends.

The use of big data has had a multitude of effects in a range of industries. For example, big data consumer analytics can influence marketing decisions and campaigns using information about consumer behaviour [4]. The data should be generated at high speed, so that up-to-date decisions can be made based on the information.

Such data is commonly found in the social media environment, in information online, or throughout an organisation or company. It can be used on its own, or in combination with more traditional data sources, such as surveys or registers [5]. Using big data in healthcare has been of increased interest, and its relevance to radiology is discussed in this review.

IT infrastructure in radiology has grown significantly in recent years, especially with the implementation of informatics tools. Informatics is a science that aims to engineer ways to process and handle data. Imaging informatics, or radiology informatics, is a subspecialty that uses data from medical imaging services. Data scientists who work in this field aim to develop algorithms to search for patterns in data [2]. Useful information can be extracted from this large, diverse data set and then managed, analysed and visualised [6].

The most recent “emerging science” in medical imaging informatics is the topic of “Big Data” informatics. As noted by Kansagra et al. [1] “Big Data will transform the practice of medicine. Among different specialties, radiology- which has a mature IT infrastructure and many years of available digital data- is particularly well positioned to lead and benefit from these advances. Among other applications, big data in radiology has the power to enable personalised image interpretation, discovery of new imaging markers, value quantification, and workflow characterisation”.

The purpose of this literature review is to define big data in the field of radiology and medical imaging and to critically review concepts surrounding imaging informatics. Furthermore, potential applications of big data through specific examples in literature will be discussed.


Searches of the PubMed and Scopus databases were performed. Results were limited to articles, reviews or clinical trials published in the last five years, as this is an emerging topic of research. Articles not in English or where full text was not available were excluded. Search terms used were “big data”, “imaging informatics” or “radiology informatics”, as well as “big data”, “medical imaging” or “radiology”.

Articles where medical imaging was not the primary research point were excluded, as medical imaging is the focus of this review. These excluded articles were primarily focused on bioinformatics, genomics, or histology research. These topics will be mentioned briefly, but are not the theme of this review. The articles found were then categorised under the aims of the literature review to which they were relevant, and further themes were noted under the discussion of potential applications of big data.


Two distinct types of articles were found during the review process. The first were general review-type articles, which broadly discuss ideas associated with big data and its role in radiology and imaging informatics. The second were very specific articles that used big data and imaging informatics techniques for a range of applications.

The literature encompassed ideas and research on a range of medical imaging modalities, as well as other forms of data such as departmental operational data and metadata. A broad range of pathology was also researched, with emphasis on neurovascular, chest and abdominal pathologies.

Due to this large variety of research in a range of contexts, it is not within the scope of this paper to critique individual articles. This literature review therefore will focus on big data and imaging informatics as broad themes and then categorise the specific applications into distinct subheadings and themes relating to specific issues that were outlined in the general review-type articles. The specific themes of big data applications found during this literature review are treatment and diagnosis, education and research, cost reduction and safety.


What is big data?

There are numerous types of data that can be used for big data solutions concerning radiology. Information from other medical examinations, such as serology, immunology, clinical notes, DNA results and cognitive measures, can be obtained [1,7,8] and used in conjunction with radiology data from images and their findings for analysis. These can be termed “semantic” features [9-12], which are the information about an image that reflects its content. This could include the type of image, imaging features, and the plane and anatomic structures shown [12]. The use of imaging informatics solutions to analyse data can result in faster throughput and lower variance of information, as well as higher inter-rater agreement on findings [10]. These features are often used to determine an effective way to diagnose disease [7].

Radiology is noted to be rich in data, but poor in the amount of information produced from it. A radiographic image contains a great amount of data, but it is highly specific and focused, often answering only a small number of questions for a specific clinical concern. For example, a chest x-ray screening for tuberculosis will often answer that clinical concern; however, it will not provide much other information. This leads to radiologic data being termed “insight poor” [3]. In comparison, clinical notes may be described as “insight-rich”, as they contain additional information relevant to the patient’s visit across a series of domains, which could include visual observations, medications taken by the patient and charts, amongst other knowledge [13]. There is also a distinction between “insight-rich” and “insight-focused” data. “Insight-rich” data is generated at a fairly constant speed as patients present to the hospital, while “insight-focused” data, such as imaging, appears as a series of large spikes of data over time [13].

Metadata is another type of data that can be collected to be processed for image informatics systems [11,12]. This includes information embedded computationally into data, for example dose reports encoded in the header of images [12]. The physical nature of pixels, such as resolution and aspect ratio, geolocation, as well as the equipment and techniques for image pre-processing and generation are additional information that can be acquired [11].

Metadata of the pixels can be mathematically extracted as quantitative descriptors termed “agnostic” features [10], which can be analysed statistically in several ways using imaging informatics [9]. Some statistical analyses can include first-order statistics, such as the histogram-based methods to find mean, median, mode values, second-order statistics concerned with the “texture” of images and higher-order statistics to extract patterns in data [10].
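
As an illustration of the first-order, histogram-based statistics mentioned above, the following minimal sketch computes the mean, median and mode of a small set of pixel intensities. The function name `first_order_stats` and the sample region of interest are hypothetical and are not taken from the reviewed literature; second-order (texture) and higher-order statistics are not shown.

```python
from collections import Counter
from statistics import mean, median, multimode


def first_order_stats(pixels):
    """Return histogram-based first-order statistics for a flat list of pixel values."""
    histogram = Counter(pixels)  # maps each intensity to how often it occurs
    return {
        "mean": mean(pixels),
        "median": median(pixels),
        "mode": multimode(pixels),  # list of every most-frequent intensity
        "histogram": dict(histogram),
    }


# Hypothetical 3x3 region of interest, flattened row by row.
roi = [10, 12, 12, 15, 12, 10, 20, 12, 15]
stats = first_order_stats(roi)
```

Only the standard library is used here, so the sketch ignores spatial relationships between pixels, which is exactly what distinguishes first-order statistics from texture measures.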

Other quantitative information, such as measurements, calculations and regions of features extracted from images can also be used for statistical analysis [11]. Imaging resources usage patterns can also be included in big data solutions, such as to improve departmental efficiency [1,2,13]. Internet and social media data is another form of data that has gained momentum for informatics solutions [6,13-16]. This demonstrates the large array of types of data that can be used in imaging informatics solutions.

Imaging informatics

There are a few overarching themes and concepts that are relevant to the use of big data in imaging informatics. The informatics goal is to expose and collect information into a computable format, and subsequently perform tasks which are beyond human capability [12].

Data mining, for example, is the process of grouping similar objects into large data sets, which then allows patterns to be identified in the data [10,12,14,17]. This process can be carried out through the use of artificial intelligence, as well as machine and deep learning [9,14,18,19]. Artificial intelligence aims to create computer systems with human-like intelligence, while machine learning is a subset of artificial intelligence where large amounts of data are fed into an algorithm and the machine ‘learns’ how to perform a task. Deep learning is a further subset, where certain information has different weights in the algorithm, in order to model high level abstractions in the data. The use of computational solutions can then help compensate for human fatigue and error, such as missing critical findings, and can do so at a faster pace than the human observer. Another relevant note is that features of the data need to be selected or segmented in order to use only relevant information, which can be done both manually and automatically [7,10]. Expertise is needed in the dissection of the data being used in order for the data to be meaningful. This should involve strict evaluation of data and models before implementation in the real world [9].
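
One simple way to group similar objects so that patterns emerge is clustering. The sketch below uses a one-dimensional k-means, chosen here purely for illustration; it is not a method named in the reviewed articles, and the function name `kmeans_1d`, the starting centroids and the sample measurements are all hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group scalar values around the nearest centroid (a plain k-means sketch)."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if the cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical measurements (e.g. feature sizes in millimetres).
measurements = [4.0, 5.0, 6.0, 19.0, 20.0, 21.0]
centroids, clusters = kmeans_1d(measurements, centroids=[0.0, 30.0])
```

With these inputs the values separate into a small group and a large group. Real imaging data would involve many features per object and, as the text notes, careful feature selection and evaluation before any clinical use.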

The main technological advances that make this possible are the rapid increase in computer processing power currently available at a reasonable cost [12] and the emergence of cloud computing [3,12,14]. Cloud storage is the notion of operating, managing and analysing data which is stored on the internet and can be accessed by multiple people from different locations [12]. Cloud storage can eliminate the need for an expensive advanced workstation to visualise data and can allow multiple people from many different backgrounds and locations to collaborate. For example, a vascular surgeon may ask a radiologist in another part of the world to batch images in a certain way [3]. This can lead to global image exchange and possibly an international teaching file of cases [20].

Specific applications of big data

Specific applications of big data in imaging informatics solutions can be categorised under the following subheadings:

• Efficient treatment and diagnosis processes for the patient

• Population studies using big data and their effects

• Research and education applications of big data

• Departmental efficiency solutions and cost reduction

• Safety improvements.

Efficient treatment and diagnosis processes for the patient

A large portion of research implementing big data in the clinical world is focussed on Clinical Decision Support (CDS) and Computer Aided Diagnosis (CAD). CDS can utilise many varying types of data associated with the patient, while CAD is aimed more for the identification of pathology.

CAD systems aim to introduce interfaces where multiple datasets can be viewed in order to reach a diagnosis [1,7,12,14,21]. The use of CAD can result in faster and more accurate diagnoses, which may be especially helpful for differential diagnoses [12]. The National Cancer Informatics Program (NCIP) Annotation and Image Mark-up (AIM) [11] provided a model that captured physical entities of an image, along with its characteristics, calculations, subject demographics and other observations. This model combines multiple classes or attributes for an analysis of an image. These include the AIM statement, general information, mark-up, image reference, calculation, and image semantic content.

Interfaces displaying a combination of different data types can be used for a variety of pathologies and specific reporting interfaces. Mammography has had structured reporting for more than twenty years, owing to its narrow lexicon and imaging, and its usual single primary focus of cancer detection [17]. However, this has now extended to other, more complex pathologies and methods, for example, the screening of tuberculosis [8] and detection of pulmonary nodules [22]. Other examples include patients sent for biopsies based on mammographic features during mammography screening [23], and phenotypes of chronic obstructive pulmonary disorder on computed tomography (CT) scans and their genetic links [24]; in both, linking multiple types of data provided positive results, as the cross-communication of data supported better diagnosis.

Scheduling is an avenue for CDS big data solutions. Multiple types of information are collected about a patient before an imaging exam is scheduled. This can be termed the “reason for examination” and can draw on information from the EMR, such as age, sex and symptoms [12]. The goal of scheduling is to select the exam which is most appropriate for a patient. Computerised physician order entry (CPOE), which only selects an examination from a prescribed list [2], can be extended to compare patient data to a rule set for examinations, to determine the most appropriate imaging method [12]. This can direct both practitioners and patients to the most appropriate imaging.

Data mining a large cohort or population of patients for different treatment options and their outcomes can help determine which treatment will have the best outcome for a patient in a number of domains. Comparisons can be made to similar clinical conditions and treatment profiles. This can lead to “shared decision making”, where both patients and physicians can actively discuss which imaging and interventions are appropriate for the case at hand [9], and to more appropriate selections of treatment. For example, a South Korean study [25,26] showed that neurosurgical clipping had a lower cost than endovascular coiling in the treatment of intracranial aneurysms. Cost can therefore be one basis for treatment selection for some patients.

Population studies using big data and their effects

“Radionomics” is the term given to linking radiological data and genomics [12,14]. This can be used for population research and to holistically understand certain diseases. MIMIC-III is a large publicly available database which comprises approximately sixty thousand patients, including information such as their demographics, vital signs, imaging reports, procedures and medication, and has given researchers an avenue to test algorithms for clinical objectives [13]. Real world clinical environments are also being used to understand diseases; for example, an epidemiological study investigating a variety of clinical data, including imaging, was used to reveal clinical conditions of acute cholecystitis [24]. Another example was a study [27] which aimed to create a nosology of skeletal disorders by combining genetic and radiological findings.

Prognosis of diseases can be made with this research, which has been shown to be promising for neurological diseases such as Alzheimer’s, Parkinson’s, depression and schizophrenia [7]. Using computational modelling and machine learning can help delineate patterns in CT and magnetic resonance imaging (MRI) images [1,28,29]. It must be remembered that imaging captures structural and functional changes, which are often reflective of biological and cellular pathways [30], and that this relationship can unlock hidden information that would not be possible to ascertain without informatics analysis. Oncology is yet another speciality that demonstrates the potential for big data research [9]. Functional MRI (fMRI) can provide tumour information to a molecular level and is less invasive, faster and cheaper compared to surgery or biopsy [29].

The use of population data was also shown in a study [31] to be beneficial in complex clinical decision making. The study showed that it aided trajectory planning of a physician’s practice and would hypothetically be able to provide evidence for cases outside usual evidence-based knowledge, such as a differential diagnosis case.

Research and education applications of big data

Using big data instead of traditional, hypothesis-driven research was seen to have several positives. Selecting patients, finding protocols to isolate an effect and accounting for variables can prove to be time-consuming and impractical, and the resulting studies can lack statistical power, especially if there are small effect sizes [1]. Using real world data would mean researchers could readily identify a population of interest, as well as which patients are likely to respond to a particular treatment [6]. Big data allows the data to show naturally occurring variations in the medical environment, which can be studied in aggregate for trends [1,12,23]. Recruitment of patients can occur from data already available in the healthcare environment [10] and information can also be collected without knowing what the final relevant features in the study will be [9].

For clinical trials that are already in progress, big data can be used to pool the data from multiple trials to create new data sets [1]. This can be done in the form of observational research, which means patients do not have to be exposed to additional exams [9]. This is especially relevant to radiology, as it involves the use of radiation. Exposure to radiation is widely regarded as potentially harmful [12], thus this method of research is advantageous.

The use of data mining can also credit a radiologist or physician with educational activity, which can in turn be used as continuing education credit points. For example, a web-based program called RADPEER™ allows a radiologist to review and score the previous interpretation of prior images before interpreting a new study. This is an example of self- and peer-evaluation which can be used as a form of continuing education. In fact, the actual ordering and search strategies executed by radiologists can also be analysed using informatics and be applied to educational applications.

Visualisation of big data solutions is another crucial part of its formulation before it is successful in the clinical world [11,20,31,32]. Without proper display of information, findings will not be effectively utilised [32] and can hinder a clinician or other professional’s comprehension of the information [31]. The way data is presented needs to be considered as different visualisation methods may be appropriate for different scenarios, which in turn can create different education and teaching instances. The representation of information is vital for a big data solution to be successful.

Information posted on the Internet by patients on forums [6,13] and social media outlets, such as Twitter [16], can provide a new data set for analysis. Data mining this information can show hospitals how to improve their services to create better patient satisfaction and better communication between institutions, professionals and patients. Another study [15] found that an internally designed and developed social media technology solution between professionals in the hospital setting could simplify communication and improve workflow.

Departmental efficiency solutions and cost reduction

The decrease in cost when using big data initiatives was mentioned in a number of articles [20,32]. This can be split into costs for the patient and costs for the department. By using big data to assess the cost of care for a patient, the expected cost and benefit of a study may be predicted. This can decrease the number of redundant tests on a patient, as well as the amount of wasted resources in a department or hospital. Accounting for the entire episode of care of a patient, instead of looking only at the individual risks and costs of one treatment, is an emerging model of care that is predicted to slowly gain popularity. This can be discussed between a patient and physician and can prepare a patient for nearly every encounter they may have during their stay and rehabilitation as a big picture, as opposed to only looking at the current intervention or problem being faced by the patient.

One example of departmental efficiency being implemented in a department was a recent study [33] that involved the creation of an algorithm which alerted physicians if follow-up for abnormal chest imaging results was not completed. This alert could possibly save a patient from adverse consequences, whether in cost, time or health, that might arise if follow-up had not occurred.
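
The general idea of such an alert can be sketched as a rule that flags abnormal reports with no recorded follow-up after a set period. This is not the algorithm from the cited study [33]; the record fields, the function name `overdue_followups` and the 90-day window are assumptions made only for illustration.

```python
from datetime import date, timedelta


def overdue_followups(reports, today, window_days=90):
    """Return IDs of abnormal reports with no follow-up recorded after the window has passed."""
    overdue = []
    for r in reports:
        deadline = r["report_date"] + timedelta(days=window_days)
        if r["abnormal"] and r["followup_date"] is None and today > deadline:
            overdue.append(r["id"])
    return overdue


# Hypothetical report records; the 90-day window is an assumed threshold.
reports = [
    {"id": "A", "abnormal": True, "report_date": date(2017, 1, 10), "followup_date": None},
    {"id": "B", "abnormal": True, "report_date": date(2017, 1, 10), "followup_date": date(2017, 2, 1)},
    {"id": "C", "abnormal": False, "report_date": date(2017, 1, 10), "followup_date": None},
    {"id": "D", "abnormal": True, "report_date": date(2017, 5, 1), "followup_date": None},
]
alerts = overdue_followups(reports, today=date(2017, 6, 1))
```

Here only report A is flagged: B has a follow-up, C was not abnormal, and D is still within the window.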

Another way to decrease cost is to use operational data of the department as data for imaging informatics solutions [1,12,14]. This can include the analysis of variation in imaging volumes, comparing utilisation rates of specific scanners at different time periods, as well as specific data comparing turnaround times for both radiologists and technologists. This can be used to optimise staffing in a department based on the trends seen, as well as to account for seasonal variations, for example a larger influx of chest x-rays during the influenza season. Workflow patterns throughout the day can also be monitored. Imaging appointments may soon have differing time slots depending on patient presentation; for example, an elderly cancer patient with loss of mobility can be scheduled for a longer appointment than a young ambulatory patient. The mix of patients on a given day cannot be truly predicted, but this approach makes scheduling more flexible and makes it possible to inform others in the department of a delay, allowing personnel to plan accordingly.
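
A minimal sketch of this kind of operational analysis is shown below: it groups exam records by month and modality and reports the volume and mean turnaround time for each group. The record fields, the function name `monthly_summary` and the sample values are hypothetical and do not come from any cited study.

```python
from collections import defaultdict
from statistics import mean


def monthly_summary(exams):
    """Summarise exam volume and mean report turnaround (hours) per (month, modality)."""
    grouped = defaultdict(list)
    for e in exams:
        grouped[(e["month"], e["modality"])].append(e["turnaround_h"])
    return {
        key: {"volume": len(times), "mean_turnaround_h": mean(times)}
        for key, times in grouped.items()
    }


# Hypothetical operational records.
exams = [
    {"month": "2017-01", "modality": "XR chest", "turnaround_h": 2.0},
    {"month": "2017-07", "modality": "XR chest", "turnaround_h": 3.0},
    {"month": "2017-07", "modality": "XR chest", "turnaround_h": 5.0},
    {"month": "2017-07", "modality": "CT", "turnaround_h": 6.0},
]
summary = monthly_summary(exams)
```

Comparing the per-month volumes for one modality is one simple way to look for the seasonal variation described above.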

Safety improvements

An area of interest in terms of imaging informatics solutions was the improvement of safety, more specifically the decrease of radiation dose to the patient. Specific examples of research include CT imaging studies, which is in line with the fact that CT contributes one of the highest radiation doses to patients compared to other imaging modalities, such as radiography and nuclear medicine [34]. Three articles gave examples of ways to give critical alerts and warnings of radiation overexposure in CT: two notifying doctors [20,34] and the other notifying patients [35].

Dose is usually measured in a standardised way through the CT dose index (CTDI) and dose length product (DLP) of a CT examination; however, there are disadvantages to using only these measures to estimate patient dose [34]. A possible application of big data is to find ways of making dose calculations more accurate. One study [36] aimed to do this by creating an algorithm for abdominal CT dose calculations that uses body size diameter as a variable, which is not involved in the various CTDI or DLP calculations.
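
To make the relationship between these quantities concrete, the sketch below computes DLP as the volume CTDI multiplied by scan length, and an effective body diameter as the geometric mean of the anteroposterior and lateral dimensions, which is one common way of expressing body size. It is not the algorithm of the cited study [36]. The size conversion factor is left as a caller-supplied input because published factors come from lookup tables that are not reproduced here, and the value used in the example is a placeholder, not a published figure.

```python
import math


def dose_length_product(ctdi_vol_mgy, scan_length_cm):
    """DLP (mGy*cm) = CTDIvol (mGy) x scan length (cm)."""
    return ctdi_vol_mgy * scan_length_cm


def effective_diameter_cm(ap_cm, lateral_cm):
    """Effective diameter as the geometric mean of AP and lateral dimensions."""
    return math.sqrt(ap_cm * lateral_cm)


def size_adjusted_dose(ctdi_vol_mgy, conversion_factor):
    """Scale CTDIvol by a size-dependent factor supplied by the caller."""
    return ctdi_vol_mgy * conversion_factor


# Hypothetical examination values.
dlp = dose_length_product(ctdi_vol_mgy=10.0, scan_length_cm=40.0)
diameter = effective_diameter_cm(ap_cm=25.0, lateral_cm=36.0)
# Placeholder factor only; a real factor would be looked up for the computed diameter.
adjusted = size_adjusted_dose(ctdi_vol_mgy=10.0, conversion_factor=1.2)
```

The sketch shows why body size matters: two patients with the same CTDIvol and scan length have the same DLP, yet a size-dependent factor would give them different adjusted dose estimates.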

Another aspect of big data solutions that relates to safety is the ability to prevent re-identification of data. De-identification software has been developed and tested [37], and this avenue of research is important for the progression of big data solutions into the clinical world.
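
A very simplified view of de-identification is sketched below: direct identifiers are dropped from an image header and the patient identifier is replaced with a salted hash. This is not the software evaluated in [37]; the header is modelled as a plain dictionary, and the list of identifying fields, the function name `deidentify` and the salt handling are assumptions for illustration only. A pseudonym of this kind does not by itself rule out re-identification through linked data sets.

```python
import hashlib

# Assumed, incomplete list of directly identifying header fields.
IDENTIFYING_FIELDS = {"PatientName", "PatientBirthDate", "PatientAddress"}


def deidentify(header, salt):
    """Return a copy of a header dict with direct identifiers removed and the ID pseudonymised."""
    cleaned = {k: v for k, v in header.items() if k not in IDENTIFYING_FIELDS}
    if "PatientID" in cleaned:
        digest = hashlib.sha256((salt + cleaned["PatientID"]).encode("utf-8")).hexdigest()
        cleaned["PatientID"] = digest[:16]  # shortened pseudonym, stable for a given salt
    return cleaned


# Hypothetical header values.
header = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "PatientBirthDate": "19700101",
    "Modality": "CT",
}
safe_header = deidentify(header, salt="site-secret")
```

Because the same salt yields the same pseudonym, studies from one patient can still be linked within a data set, while the original header is left unmodified.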

Implementation of big data imaging informatics tools - limitations

Implementation of imaging informatics tools using big data has several limitations, which were broadly discussed in the literature. The types of correlations made must be investigated carefully: if two (or more) variables are correlated in the findings of an imaging informatics program, this should not be assumed to be a causal relationship between these variables. Further research investigating their relationship must be conducted before causation can be fully appreciated in clinical radiology. An example of this is clearly demonstrated in an article [35], where the number of CT scans at a particular centre was monitored. The results showed a decrease following the intervention in the study; however, new abdomen/pelvis codes were introduced at the same time, which also decreased the number of CTs recorded at the centre. Correlations must be thoroughly scrutinised before they are accepted.

Similarly, reliance on artificial intelligence and deep/machine learning can under-appreciate and under-represent statistical errors, and using these models can lead to biased or miscorrelated solutions [19]. The usefulness of irrelevant information to the problem at hand can be exaggerated. A sparse data set may produce statistically unreliable results, and even when very large amounts of images or information are acquired, inaccurate information in the data set will affect subsequent analysis. Without appropriate data mining, the relationships in the data sets will not have the appropriate meaning.

Radiological images and information are visual representations of complex biological systems, which are affected by underlying physics processes during image acquisition and then undergo multiple processing steps before the final image is viewed. Factoring in these variables is difficult computationally, but they govern how the image is obtained and cannot be ignored when making big data driven solutions and creating and proving relationships between variables [21]. Lack of standardisation is a limitation which was mentioned several times. This could be further subdivided into standardisation of protocol and patient care [34,35,37], lexicon and terminology, reporting structure [30] and representation of data [9,11,32]. The standardisation of data sets will make both collection and interpretation more meaningful and less prone to errors.

Kohli et al. [2] showed that the evolution of an IT savvy business includes at least four stages: localising, standardising, optimising and reusing. Radiology is noted to be at the standardising level, where a decrease is seen in local control and flexibility. Kohli et al. also pointed out that radiology is on the right track to being an IT savvy business, where in the future, big data solutions in the clinical world may be a reality.

Another notion identified in the literature is that the use of imaging informatics may actually complicate the treatment process by overloading the user with information. Computerised physician order entry (CPOE) and voice recognition are IT solutions that were introduced into radiology several years ago, and their introduction initially showed a decrease in efficiency. This has also been shown in recent molecular biology-based techniques employing big data solutions [10]. Over time, after specific algorithms were refined, CPOE has led to an increase in efficiency. Similar refinement would need to occur if any big data solutions are to be employed in the clinical setting.

Although there are de-identification solutions in the literature, this is still an area of research that should be examined before the privacy of all information used in big data imaging informatics solutions can be guaranteed. If patient data is used, there is a risk of re-identification, especially if the data is linked to other data sets [6]. Proper authorisation of who is able to access the data should be mandatory [12,13] and the use of the data should be regulated [9,10]. There may be legal implications if the data were to fall into the wrong hands [1].

The person(s) responsible for the implementation of big data solutions is the focus of a number of articles [1,2,10,12-14,21,23]. All articles appear to suggest that the radiologist should be pivotal in the implementation of solutions; however, some believed a team of professionals should also be included, such as IT professionals [2], data and medical scientists [13,21], physicists [2], statisticians [10] and other health professionals, such as nurses or radiographers [12]. The discussion of implementation should be explored further, as there was no overall agreement in the literature. There must be a clear structure for how the implementation of a big data solution will occur, and this discussion must be held with the appropriate people in an effective manner [38].

Future directions for research

Big data and its application in radiology is an emerging topic, thus the research currently available is very sparse and preliminary. The issue with current research is that it consists either of very specific imaging informatics solutions implemented in one department, or of review-type articles that present a broad overview of big data. Specific research is currently not at a stage where it can be generalised to an entire population or provide a solution that will be appropriate for implementation at all clinical centres. This is mostly because specific imaging informatics solutions have been tailored to a distinct problem and demographic or speciality, and a particular solution may need to be adapted before it is useful at another institution.

Future research can still create more big data imaging informatics solutions, as well as test them in a wider population by including multiple clinical centres in studies. The possibilities for solutions are almost endless; however, there is still a long way to go before these are employed and standardised in the clinical environment.

The future applications of big data to radiology should focus on optimisation of structures and systems already in place, as well as developing new technology and solutions for current and prospective problems that can be found using big data itself. Improving patient care and increasing efficiency should be a focus, as well as providing a more comprehensive, but straightforward, approach to problem solving.


Big data applications using imaging informatics solutions are an emerging topic in literature and research. There are multiple avenues of applications that can be explored, and the current literature shows promise for applications in clinical centres. Future research needs to test solutions in a variety of environments before they can be standardised in the medical imaging department.

