The main topics addressed in this book are big data analytics problems in bioinformatics research, such as microarray data analysis, sequence analysis, genomics-based analytics, disease network analysis, techniques for big data analytics, and health information technology.
Bioinformatics and Medical Applications: Big Data Using Deep Learning Algorithms analyses massive biological datasets using computational approaches and the latest cutting-edge technologies to capture and interpret biological data. The book presents the bioinformatics computational methods used to identify diseases at an early stage, assembling state-of-the-art resources into a single collection that introduces the reader to topics at the intersection of computer science, mathematics, and biology. In modern biology and medicine, bioinformatics is critical for data management. This book explains the bioinformatician's most important tools and examines how they are used to evaluate biological data and advance the understanding of disease.
The editors have curated a distinguished group of perceptive and concise chapters that present the current state of medical treatments and systems and offer emerging solutions for a more personalized approach to healthcare. Applying deep learning techniques to health information enables automated, data-driven analysis that is well suited to the problems arising from medical and health-related data.
Audience
The primary audience for the book includes specialists, researchers, postgraduates, designers, experts, and engineers engaged in biometric research and security-related issues.
A. Suresh, PhD is an associate professor in the Department of Networking and Communications, SRM Institute of Science & Technology, Kattankulathur, Tamil Nadu, India. He has nearly two decades of teaching experience, and his areas of specialization are data mining, artificial intelligence, image processing, multimedia, and system software. He holds 6 patents and has published more than 100 papers in international journals.
S. Vimal, PhD is an assistant professor in the Department of Artificial Intelligence & DS, Ramco Institute of Technology, Tamil Nadu, India. He is the editor of 3 books, has guest-edited multiple journal special issues, and has more than 15 years of teaching experience.
Y. Harold Robinson, PhD is currently working in the School of Technology and Engineering, Vellore Institute of Technology, Vellore, India. He has published more than 50 papers in various international journals and presented more than 70 papers in both national and international conferences.
Dhinesh Kumar Ramaswami, BE in Computer Science, is a Senior Consultant at Capgemini America Inc. He has over 9 years of experience in software development and specializes in various .NET technologies. He has published more than 15 papers in international journals and at national and international conferences.
R. Udendhran, PhD is an assistant professor, Department of Computer Science and Engineering at Sri Sairam Institute of Technology, Chennai, Tamil Nadu, India. He has published about 20 papers in international journals.
Jaspreet Kaur¹*, Bharti Joshi² and Rajashree Shedge²
¹Ramrao Adik Institute of Technology, Nerul, Navi Mumbai, India
²Department of Computer Engineering, Ramrao Adik Institute of Technology, Nerul, Navi Mumbai, India
Big Data and Machine Learning have been used effectively in medical management, reducing treatment costs, predicting epidemic outbreaks, avoiding preventable diseases, and improving quality of life.
Prediction begins with the machine learning patterns from several existing, known datasets; the learned model is then applied to an unknown dataset to check the result. In this chapter, we investigate Ensemble Learning, which overcomes the limitations of a single algorithm, such as bias and variance, by using a multitude of algorithms. The focus is not solely on increasing the accuracy of weak classification algorithms but also on applying them to a medical dataset where they can be used effectively for analysis, prediction, and treatment. The results of the investigation indicate that ensemble techniques are powerful in improving forecast accuracy and display acceptable performance in disease prediction. Additionally, we have developed a procedure to further improve the accuracy after applying the ensemble method: focusing on the wrongly classified records, we use probabilistic optimization to select pertinent columns, increase their weight, and reclassify, which results in further improved accuracy. The accuracy achieved by our proposed method is, by far, quite competitive.
Keywords: Kaggle dataset, machine learning, probabilistic optimization, decision tree, random forest, Naive Bayes, K-means, ensemble method, confusion matrix, probability, Euclidean distance
Healthcare and biomedicine are increasingly using big data technologies for research and development. Mammoth amounts of clinical data have been generated and collected at an unparalleled scale and speed. Electronic health records (EHR) store large amounts of patient data. The quality of healthcare can be greatly improved by employing big data applications to identify trends and discover knowledge. The details generated in hospitals fall into several categories.
Pharmaceutical companies, for example, can effectively utilize this data to identify new potential drug candidates, and predictive data modeling can substantially decrease the expense of drug discovery while improving decision-making in healthcare. Predictive modeling enables faster and more targeted research on drugs and medical devices.
Machine learning depends on algorithms that can learn from data without relying on rule-based programming, while big data is the kind of data that can be fed to analytical systems so that a machine learning model can learn, in other words, improve the accuracy of its predictions. Machine learning algorithms are classified into three types: supervised, unsupervised, and reinforcement learning.
Perhaps the most popular technique in data mining is clustering, the method of identifying similar groups of data. The groups are created such that entities in one group are more similar to each other than to those in other groups. Although clustering is an unsupervised machine learning technique, the resulting group assignments can be used as features in a supervised model, as the sketch below illustrates.
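A minimal sketch of this idea with scikit-learn, using synthetic placeholder data: K-means assigns each sample a cluster index, which is appended as an extra column before a supervised classifier is trained.

```python
# Sketch: K-means cluster labels reused as a feature for a supervised model.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))             # placeholder feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # placeholder labels

# Assign every sample to one of 3 clusters and append the cluster index.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
X_aug = np.column_stack([X, kmeans.labels_])

X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("accuracy with cluster feature:", clf.score(X_te, y_te))
```

In a real pipeline, the clustering would be fitted on the training split only and then applied to unseen data, to avoid leaking test information into the features.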
Coronary illness, the primary cause of morbidity and mortality globally, is responsible for more deaths annually than any other cause [1]. Fortunately, cardiovascular failures are highly preventable, and simple lifestyle changes combined with early treatment greatly improve the prognosis. It is, nonetheless, hard to identify high-risk patients because of the many factors that contribute to the risk of coronary illness, such as diabetes, hypertension, and elevated cholesterol. This is where data mining and machine learning come to the rescue with screening tools, which are valuable because of their superiority in pattern recognition and classification compared with conventional statistical methodologies.
To explore this with the help of machine learning algorithms, we gathered a dataset on cardiovascular heart disease from Kaggle [3]. It consists of three categories of input features: objective, consisting of factual patient statistics; examination, comprising the results of clinical assessment; and subjective, covering patient-reported information.
Based on this information, we applied various machine learning algorithms and analyzed the accuracy achieved by each method. For this report, we used Naive Bayes, Decision Tree, Random Forest, and various combinations of these algorithms to further improve the accuracy. Numerous researchers have already used this dataset in their work and reported their individual outcomes. Our objective in applying these methods to the dataset is to improve the precision of our model, so we tried different algorithms on it and successfully improved our model's accuracy.
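As a baseline, the three individual classifiers can be compared as follows. This is an illustrative sketch, not the authors' code; the file name cardio_train.csv, the ';' separator, and the id/cardio column names are assumptions based on the public Kaggle cardiovascular dataset and should be adjusted to your copy.

```python
# Sketch: baseline comparison of Naive Bayes, Decision Tree, and Random Forest.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("cardio_train.csv", sep=";")   # assumed file name and format
X = df.drop(columns=["id", "cardio"])           # assumed column names
y = df["cardio"]                                # 1 = disease present

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

for name, clf in [("Naive Bayes", GaussianNB()),
                  ("Decision Tree", DecisionTreeClassifier(random_state=42)),
                  ("Random Forest", RandomForestClassifier(random_state=42))]:
    clf.fit(X_tr, y_tr)
    print(f"{name}: {clf.score(X_te, y_te):.3f}")
```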
We suggested using the ensemble method [2], which is the process of solving a particular computational intelligence problem by strategically combining multiple models, such as classifiers or experts. Additionally, we have taken the records wrongly classified by all the methods, tried to understand the reason for the misclassification, and adjusted the model mathematically to give accurate results and improve performance continuously.
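The combining step can be sketched with scikit-learn's VotingClassifier, reusing the split from the previous sketch. This shows only the generic majority-voting idea, not the authors' full pipeline with probabilistic reweighting.

```python
# Sketch: majority (hard) voting over the three baseline classifiers.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

ensemble = VotingClassifier(estimators=[
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("rf", RandomForestClassifier(random_state=42)),
], voting="hard")

ensemble.fit(X_tr, y_tr)                      # X_tr, y_tr from the sketch above
print("ensemble accuracy:", ensemble.score(X_te, y_te))

# The records the ensemble still gets wrong are the starting point for the
# reweighting-and-reclassification step described in the text.
still_wrong = X_te[ensemble.predict(X_te) != y_te]
```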
Our aim is to explore different classification and integration algorithms to detect groups in real-world, high-dimensional electronic health record data, and to find algorithms that detect clusters within reasonable computation time, scale with increasing data size and feature count, and deliver the highest possible accuracy. Diagnosis is a challenging process that, as of today, involves many human-to-human interactions. A machine would speed up diagnosis, lead to a more rapid treatment decision, and detect rare events more easily than humans can.
Over the years, many strategies have been applied to data processing and model variability in the field of cardiovascular diagnostics. Authors in [4] show that splitting the data in a 70:30 ratio for training and testing and applying logistic regression with 10-fold cross-validation improved the accuracy on the UCI dataset to 87%.
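That protocol can be sketched as follows; this is an illustrative reconstruction (reusing X and y from the earlier sketch), not the code of [4].

```python
# Sketch: 70:30 split with 10-fold cross-validated logistic regression.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

logreg = LogisticRegression(max_iter=1000)    # higher max_iter aids convergence
cv_acc = cross_val_score(logreg, X_tr, y_tr, cv=10, scoring="accuracy")
print("10-fold CV accuracy:", cv_acc.mean())

logreg.fit(X_tr, y_tr)
print("held-out 30% accuracy:", logreg.score(X_te, y_te))
```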
Authors in [5] used ensemble classification with multiple classifiers followed by score-level fusion to improve prediction accuracy. They pointed out that maximum voting produces the greatest improvement, which is further enhanced by feature selection.
A hybrid approach was proposed in [6], combining Random Forest with a linear method and achieving a precision of around 90%. In [7], a Vertical Hoeffding Decision Tree (VHDT) achieved an accuracy of 85.43% using 10-fold cross-validation.
Authors in [8] outline a multi-classifier voting system that can predict the possible presence of coronary illness in humans. It employs four classifiers, namely SGD, KNN, Random Forest, and Logistic Regression, and combines them into an ensemble whose final decision is made by majority vote, reaching 90% accuracy.
The strategy utilized in [9] selects features by way of correlation, which can enhance prediction results. The UCI coronary illness dataset is used to compare the results with [6]. Their proposed model achieved a precision of 86.94%, which outperforms the Hoeffding tree technique's reported accuracy of 85.43%.
Different classifiers, namely Decision Tree, NB, MLP, KNN, SCRL, RBF, and SVM, have been utilized in [10]. Moreover, the ensemble methods of bagging, boosting, and stacking have been applied to the database. The results of the examination demonstrate that the SVM strategy with boosting outperforms the other techniques mentioned.
It was shown in [11], after various experiments, that if we enlarge the feature space of the RF algorithm with the Naive Bayes model's predicted class and the probability of a tuple belonging to a particular class, then we can generally increase the precision achieved in identifying the categories.
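The idea in [11] can be sketched as follows (an illustrative reconstruction reusing the earlier split, not the authors' implementation): Naive Bayes class-membership probabilities are appended as extra columns before the Random Forest is trained.

```python
# Sketch: augmenting the RF feature space with NB class probabilities.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB().fit(X_tr, y_tr)

# Append NB's per-class probabilities as new feature columns.
X_tr_aug = np.column_stack([X_tr, nb.predict_proba(X_tr)])
X_te_aug = np.column_stack([X_te, nb.predict_proba(X_te)])

rf = RandomForestClassifier(random_state=42).fit(X_tr_aug, y_tr)
print("RF with NB-probability features:", rf.score(X_te_aug, y_te))
```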
Studies in [12] suggested that Naive Bayes gives the best results when combined with Random Forest. Also, when KNN is combined with RF or with RF + NB, the errors remain the same, suggesting that RF is the dominant method.
Authors in [13] compared the precision of various models in the classification of coronary disease, taking a Kaggle dataset of 70,000 records as input. The algorithms used were Random Forest, Naive Bayes, Logistic Regression, and KNN, among which Random Forest was...