
Machine Learning in Chemical Safety and Health
Description
There is a growing interest in the application of machine learning algorithms in chemical safety and health-related model development, with applications in areas including property and toxicity prediction, consequence prediction, and fault detection. This book is the first to review the current status of machine learning implementation in chemical safety and health research and to provide guidance for implementing machine learning techniques and algorithms into chemical safety and health research.
Written by an international team of authors and edited by renowned experts in the areas of process safety and occupational and environmental health, the work covers sample topics including:
* An introduction to the fundamentals of machine learning, including regression, classification and cross-validation, and an overview of software and tools
* Detailed reviews of various applications in the areas of chemical safety and health, including flammability prediction, consequence prediction, asset integrity management, predictive nanotoxicity and environmental exposure assessment, and more
* Perspective on the possible future development of this field
Machine Learning in Chemical Safety and Health serves as an essential guide on both the fundamentals and applications of machine learning for industry professionals and researchers in the fields of process safety, chemical safety, occupational and environmental health, and industrial hygiene.


Persons
Qingsheng Wang is Associate Professor of Chemical Engineering and George Armistead '23 Faculty Fellow at Texas A&M University. He has over 15 years of experience in the areas of process safety and fire protection. His experience is wide ranging, involving machine learning in chemical safety, flame retardant materials, fire and explosion dynamics, and composite manufacturing for safety and sustainability. He is a registered professional engineer (PE) and certified safety professional (CSP), and currently a principal member of the NFPA 18 and NFPA 30 committees. Professor Wang has established the Multiscale Process Safety Laboratory at Texas A&M and is currently leading the lab. He has published over 150 peer-reviewed journal publications and 6 book chapters. His work has been internationally recognized and heavily cited, and he is recognized as a world leader in the field of process safety.
Changjie Cai is Assistant Professor of Occupational and Environmental Health in the Hudson College of Public Health at the University of Oklahoma Health Sciences Center. Dr. Cai has formed an interdisciplinary research lab focusing on three major areas: (i) developing portable and cost-effective devices to identify, assess, and control safety and health hazards; (ii) integrating artificial intelligence techniques into the safety and health fields; and (iii) modeling hazard dispersion and its climate effects using chemical transport models.
Content
List of Contributors
Preface
1 Introduction
Pingfan Hu and Qingsheng Wang
1.1 Background
1.2 Current State
1.2.1 Flammability Characteristics Prediction Using Quantitative Structure-Property Relationship
1.2.2 Consequence Prediction Using Quantitative Property-Consequence Relationship
1.2.3 Machine Learning in Process Safety and Asset Integrity Management
1.2.4 Machine Learning for Process Fault Detection and Diagnosis
1.2.5 Intelligent Method for Chemical Emission Source Identification
1.2.6 Machine Learning and Deep Learning Applications in Medical Image Analysis
1.2.7 Predictive Nanotoxicology: Nanoinformatics Approach to Toxicity Analysis of Nanomaterials
1.2.8 Machine Learning in Environmental Exposure Assessment
1.2.9 Air Quality Prediction Using Machine Learning
1.3 Software and Tools
1.3.1 R
1.3.2 Python
References
2 Machine Learning Fundamentals
Yan Yan
2.1 What Is Learning?
2.1.1 Machine Learning Applications and Examples
2.1.2 Machine Learning Tasks
2.2 Concepts of Machine Learning
2.3 Machine Learning Paradigms
2.4 Probably Approximately Correct Learning
2.4.1 Deterministic Setting
2.4.2 Stochastic Setting
2.5 Estimation and Approximation
2.6 Empirical Risk Minimization
2.6.1 Empirical Risk Minimizer
2.6.2 VC-dimension Generalization Bound
2.6.3 General Loss Functions
2.7 Regularization
2.7.1 Regularized Loss Minimization
2.7.2 Constrained and Regularized Problem
2.7.3 Trade-off Between Estimation and Approximation Error
2.8 Maximum Likelihood Principle
2.8.1 Maximum Likelihood Estimation
2.8.2 Cross Entropy Minimization
2.9 Optimization
2.9.1 Linear Regression: An Example
2.9.2 Closed-form Solution
2.9.3 Gradient Descent
2.9.4 Stochastic Gradient Descent
References
3 Flammability Characteristics Prediction Using QSPR Modeling
Yong Pan and Juncheng Jiang
3.1 Introduction
3.1.1 Flammability Characteristics
3.1.2 QSPR Application
3.1.2.1 Concept of QSPR
3.1.2.2 Trends and Characteristics of QSPR
3.2 Flowchart for Flammability Characteristics Prediction
3.2.1 Dataset Preparation
3.2.2 Structure Input and Molecular Simulation
3.2.3 Calculation of Molecular Descriptors
3.2.4 Preliminary Screening of Molecular Descriptors
3.2.5 Descriptor Selection and Modeling
3.2.6 Model Validation
3.2.6.1 Model Fitting Ability Evaluation
3.2.6.2 Model Stability Analysis
3.2.6.3 Model Predictivity Evaluation
3.2.7 Model Mechanism Explanation
3.2.8 Summary of QSPR Process
3.3 QSPR Review for Flammability Characteristics
3.3.1 Flammability Limits
3.3.1.1 LFLT and LFL
3.3.1.2 UFLT and UFL
3.3.2 Flash Point
3.3.3 Auto-ignition Temperature
3.3.4 Heat of Combustion
3.3.5 Minimum Ignition Energy
3.3.6 Gas-liquid Critical Temperature
3.3.7 Other Properties
3.4 Limitations
3.5 Conclusions and Future Prospects
References
4 Consequence Prediction and Quantitative Property-Consequence Relationship Models
Zeren Jiao and Qingsheng Wang
4.1 Introduction
4.2 Conventional Consequence Prediction Methods
4.2.1 Empirical Method
4.2.2 Computational Fluid Dynamics (CFD) Method
4.2.3 Integral Method
4.3 Machine Learning and Deep Learning-Based Consequence Prediction Models
4.4 Quantitative Property-Consequence Relationship Models
4.4.1 Consequence Database
4.4.2 Property Descriptors
4.4.3 Machine Learning and Deep Learning Algorithms
4.5 Challenges and Future Directions
References
5 Machine Learning in Process Safety and Asset Integrity Management
Ming Yang, Hao Sun and Rustam Abubarkirov
5.1 Opportunities and Threats
5.2 State-of-the-Art Reviews
5.2.1 Artificial Neural Networks (ANNs)
5.2.2 Principal Component Analysis (PCA)
5.2.3 Genetic Algorithm (GA)
5.3 Case Study of Asset Integrity Assessment
5.4 Data-Driven Model of Asset Integrity Assessment
5.4.1 Condition Monitoring Data Collection
5.4.2 Data Processing and Storage
5.4.3 Data Mining for Risk Quantification and Monitoring Control
5.4.4 AIM Application
5.4.5 The Application of the Framework
5.5 Conclusion
References
6 Machine Learning for Process Fault Detection and Diagnosis
Rajeevan Arunthavanathan, Salim Ahmed, Faisal Khan and Syed Imtiaz
6.1 Background
6.2 Machine Learning Approaches in Fault Detection and Diagnosis
6.3 Supervised Methods for Fault Detection and Diagnosis
6.3.1 Neural Network
6.3.1.1 Neural Network Theory and Algorithm
6.3.1.2 Neural Network Learning for Fault Classification
6.3.1.3 Algorithm for Fault Classification Using Neural Network
6.3.2 Support Vector Machine
6.3.2.1 Support Vector Machine Theory and Algorithm
6.3.3 Support Vector Machine Model Selection and Algorithm
6.3.4 Support Vector Machine Multiclass Classification
6.4 Unsupervised Learning Models for Fault Detection and Diagnosis
6.4.1 K-Nearest Neighbors
6.4.2 One-Class Support Vector Machine
6.4.3 One-Class Neural Network
6.4.4 Comparison Between Deep Learning with Machine Learning in Fault Detection and Diagnosis
6.5 Intelligent FDD Using Machine Learning
6.5.1 Model Development
6.5.2 Data Collection
6.5.2.1 Model Development Steps
6.5.2.2 Result Comparison
6.6 Concluding Remarks
References
7 Intelligent Method for Chemical Emission Source Identification
Denglong Ma
7.1 Introduction
7.1.1 Development of Detecting Gas Emission
7.1.2 Development of Source Term Identification
7.2 Intelligent Methods for Recognizing Gas Emission
7.2.1 Leakage Recognition of Sequestrated CO2 in the Atmosphere
7.2.1.1 Gas Leakage Recognition for CO2 Geological Sequestration
7.2.1.2 Case Studies for CO2 Recognition
7.2.2 Emission Gas Identification with Artificial Olfactory
7.2.2.1 Features of Responses in AOS
7.2.2.2 Support Vector Machine Models for Gas Identification
7.2.2.3 Deep Learning Models for Gas Identification
7.3 Intelligent Methods for Identifying Emission Sources
7.3.1 Source Estimation with Intelligent Optimization Method
7.3.1.1 Principle of Source Estimation with Optimization Method
7.3.1.2 Case Studies of Source Estimation with Optimization Method
7.3.2 Source Estimation with MRE-PSO Method
7.3.2.1 Principle of PSO-MRE for Source Estimation
7.3.2.2 Case Studies
7.3.3 Source Estimation with PSO-Tikhonov Regularization Method
7.3.3.1 Principle of PSO-Tikhonov Regularization Hybrid Method
7.3.3.2 Case Study
7.3.4 Source Estimation with MCMC-MLA Method
7.3.4.1 Forward Gas Dispersion Model Based on MLA
7.3.4.2 Source Estimation with MCMC-MLA Method
7.3.4.3 Case Study
7.4 Conclusions and Future Work
7.4.1 Conclusions
7.4.2 Limitations and Future Work
References
8 Machine Learning and Deep Learning Applications in Medical Image Analysis
Pingfan Hu, Changjie Cai, Yu Feng and Qingsheng Wang
8.1 Introduction
8.1.1 Machine Learning in Medical Imaging
8.1.2 Deep Learning in Medical Imaging
8.2 CNN-Based Models for Classification
8.2.1 ResNet50
8.2.2 YOLOv4 (Darknet53)
8.2.3 Grad-CAM
8.3 Case Study
8.3.1 Background
8.3.2 Study Design
8.3.3 Training and Testing Database Preparation
8.3.4 Results
8.3.4.1 Classification Performance of the Modified ResNet50 Model
8.3.4.2 Classification Performance of the YOLOv4 Model
8.3.4.3 Post-Processing Via Grad-CAM Model and HSV
8.3.5 Conclusion
8.4 Limitations and Future Work
References
9 Predictive Nanotoxicology: Nanoinformatics Approach to Toxicity Analysis of Nanomaterials
Bilal M. Khan and Yoram Cohen
9.1 Predictive Nanotoxicology
9.1.1 Introduction
9.1.2 Nano Quantitative Structure-Activity Relationship (QSAR)
9.1.3 Importance of Data for Nanotoxicology
9.2 Machine Learning Modeling for Predictive Nanotoxicology
9.2.1 Overview
9.2.2 Unsupervised Learning
9.2.2.1 Data Exploration Via Self-Organizing Maps (SOMs)
9.2.2.2 Evaluating Associations among Sublethal Toxicity Responses
9.2.3 Supervised Learning
9.2.3.1 Random Forest Models
9.2.3.2 Support Vector Machines
9.2.3.3 Bayesian Networks
9.2.3.4 Supervised Classification and Regression-Based Models for Nano-(Q)SARs
9.2.4 Predictive Nano-(Q)SARs for the Assessment of Causal Relationships
9.3 Development of Machine Learning Based Models for Nano-(Q)SARs
9.3.1 Overview
9.3.1.1 Data-Driven Models
9.3.1.2 Mechanistic/Theoretical Models
9.3.2 Data Generation, Collection, and Preprocessing
9.3.3 Descriptor Selection
9.3.4 Model Selection and Training
9.3.5 Model Validation
9.3.5.1 Descriptor Importance
9.3.5.2 Applicability Domain
9.3.6 Model Diagnosis and Debugging
9.4 Nanoinformatics Approaches to Predictive Nanotoxicology
9.5 Summary
References
10 Machine Learning in Environmental Exposure Assessment
Gregory L. Watson
10.1 Introduction
10.2 Environmental Exposure Modeling
10.3 Machine Learning Exposure Models
10.4 Model Evaluation
10.5 Case Study
10.6 Other Topics
10.6.1 Bias and Fairness
10.6.2 Wearable Sensors
10.6.3 Interpretability
10.6.4 Extreme Events
10.7 Conclusion
References
11 Air Quality Prediction Using Machine Learning
Lan Gao, Changjie Cai and Xiao-Ming Hu
11.1 Introduction
11.2 Air Quality and Climate Data Acquisition
11.2.1 Earth Satellite Observation Datasets
11.2.1.1 Basics of Earth Satellite Observations
11.2.1.2 Earth Satellite Products
11.2.2 Ground-Based In Situ Observation Datasets
11.2.2.1 Basics of the Ground-Based In Situ Observations
11.2.2.2 Ground-Based In Situ Products
11.3 Applications of Machine Learning in Air Quality Study
11.3.1 Shallow Learning
11.3.2 Deep Learning
11.4 An Application Practice Example
11.4.1 Satellite Data Acquisition and Variable Selections
11.4.2 Machine Learning and Deep Learning Algorithms
References
12 Current Challenges and Perspectives
Changjie Cai and Qingsheng Wang
12.1 Current Challenges
12.1.1 Data Development and Cleaning
12.1.2 Hardware Issues
12.1.3 Data Confidentiality
12.1.4 Other Challenges
12.2 Perspectives
12.2.1 Real-Time Monitoring and Forecast of Chemical Hazards
12.2.2 Toolkits for Dummies
12.2.3 Physics-Informed Machine Learning
References
Index
1 Introduction
Pingfan Hu and Qingsheng Wang
Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
Machine learning (ML) is a method spanning a broad array of disciplines, involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and others. Furthermore, it is the core subset of artificial intelligence (AI). The term "machine learning" was first proposed in 1959 by Arthur Samuel (Samuel 1959). Machine learning algorithms can build mathematical models based on training data to make predictions or decisions without being explicitly programmed to do so. Bayesian and Laplace's derivations of least squares and Markov chains, which date back to the seventeenth century, have previously constituted the tools and foundations widely used in ML (Andrieu et al. 2003). Since then, the ML algorithms have developed tremendously and have been widely applied in various aspects of scientific research and everyday life. These include data mining (Mitchell 1999), computer vision (Voulodimos et al. 2018), natural language processing (Cambria and White 2014), biometric recognition (Chaki et al. 2019), medical diagnosis (Bakator and Radosav 2008), detection of credit card fraud (Modi and Dayma 2017), stock market analysis (Chong et al. 2017), speech and handwriting recognition (Nassif et al. 2019), strategy games (Robertson and Watson 2015), and robotics (Pierson and Gashler 2017).
Deep learning (DL) is a relatively new branch within the field of ML. It is an algorithm that uses artificial neural networks (ANNs) as the architecture to characterize and learn data. The concept of DL originates from the research of ANNs, and a multilayer perceptron with multiple hidden layers is a DL structure (Lecun et al. 2015). DL forms a more abstract high-level representation attribute category or feature by combining low-level features to discover distributed feature representations of data. Several DL frameworks have been utilized, including deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN).
The applications of ML algorithms in chemical safety and health studies date back to the mid-1990s (Lee et al. 1995). Some research used basic ML algorithms in toxicity classification and prediction studies. For other fields such as hazardous property prediction and consequence analysis, the implementation of ML/DL algorithms did not emerge until the late 2000s (Pan et al. 2008; Pan et al. 2009). Chemical safety and health, although an important field, has rarely been investigated using interdisciplinary research with applied ML. There are two reasons for this. First, at the early development stage of ML/DL, the algorithms were relatively primitive, and their predictive capabilities and accuracy had not yet been widely verified. Second, due to the lack of relatively simple and easy-to-use toolkits and the high skill requirements for algorithms and programming, the applications of ML/DL algorithms in chemical safety and health research have been limited. As a result, studies implementing ML were relatively rare in the field of chemical safety and health in the late twentieth century and the first decade of the twenty-first century.
However, with the rapid advancement of AI and computer science in the past 10 years, ML/DL and their advantages over traditional statistical methods and labor-intensive work have drawn increasing attention, and the methods themselves have developed significantly. There is also growing interest in academia in expanding the application of ML/DL to chemical safety and health research.
In this book, ML fundamentals as well as popular ML/DL tools for the implementation of ML/DL in chemical safety and health research are introduced (Jiao et al. 2020a). For the applications of ML/DL, the book describes flammability characteristics predictions using quantitative structure-property relationship modeling (Chapter 3), consequence prediction using quantitative property-consequence relationship modeling (Chapter 4), ML involving process safety and asset integrity management (Chapter 5), and ML for process fault detection and diagnosis (Chapter 6). Furthermore, the book describes intelligent methods for chemical emission source identification (Chapter 7), ML and DL applications in medical image analysis (Chapter 8), predictive nanotoxicology: nanoinformatics approach for toxicity analysis of nanomaterials (Chapter 9), ML in environmental exposure assessment (Chapter 10), and air quality prediction using ML (Chapter 11). This book provides useful guidance for researchers and practitioners who are interested in implementing ML/DL related to chemical safety and health. This book is an excellent reference for readers to find more information about novel ML/DL tools and algorithms.
1.1 Background
Author Tom Mitchell provides a modern definition of ML as follows: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P improves with experience E" (Jordan and Mitchell 2015). In general, there are three types of ML: supervised learning, unsupervised learning, and reinforcement learning.
Supervised learning learns a function from a given training data set. When new data (validation/test data) arrive, it can predict the results based on that function. The training set for supervised learning must include inputs (features) and outputs (targets). The targets in the training set are already labeled (with specific experimental/simulation values). Common supervised learning algorithms include regression and classification algorithms. While some algorithms are only capable of classification analysis (e.g. linear discriminant analysis, naive Bayes classification), most of them (e.g. k-nearest neighbor, random forest) are able to conduct both classification analysis and regression analysis (James et al. 2017; Witten et al. 2017).
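As a rough illustration of how one supervised algorithm can serve both purposes, the sketch below implements k-nearest neighbor in plain Python: a majority vote over the neighbors' labels gives a classification, and the mean of the neighbors' numeric targets gives a regression. The function names and the toy data are assumptions made for this example only and are not taken from the book.

```python
import math
from collections import Counter

def k_nearest(train_X, query, k):
    """Return indices of the k training points closest to query (Euclidean distance)."""
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(train_X))
    return [i for _, i in dists[:k]]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote over the labels of the k nearest neighbors."""
    votes = Counter(train_y[i] for i in k_nearest(train_X, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Regression: mean of the numeric targets of the k nearest neighbors."""
    idx = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in idx) / len(idx)

# Hypothetical labeled training set: two features per sample.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["low", "low", "low", "high", "high", "high"]   # categorical targets
values = [10.0, 12.0, 11.0, 40.0, 42.0, 38.0]            # numeric targets

pred_label = knn_classify(X, labels, (1.1, 1.0))
pred_value = knn_regress(X, values, (4.0, 4.0))
print(pred_label, pred_value)
```

The same neighbor search feeds both predictors; only the way the neighbors' targets are aggregated differs.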
The difference between supervised learning and unsupervised learning is whether or not the target of the training set is labeled. Compared with supervised learning, the training set of unsupervised learning has no artificially labeled results. Common unsupervised learning algorithms can be used for clustering (James et al. 2017; Witten et al. 2017). There is also semi-supervised learning, which combines elements of supervised learning and unsupervised learning. In reinforcement learning, the algorithm gradually adjusts its behavior as the environment changes.
For DL, the original work on neural networks was published by Warren McCulloch and Walter Pitts in 1943 (McCulloch and Pitts 1943). They introduced the McCulloch-Pitts neural model, also known as the "linear threshold gate." As the first computational model of a neuron, the McCulloch-Pitts neural model is very simplistic, generating only a binary output. The weights and threshold require hand-tuning. In the 1950s, the perceptron became the first model with the capability to autonomously learn the optimal weight coefficients, allowing the training of a single neuron (Rosenblatt 1958). With the help of the backpropagation algorithm, neural networks began to be trained with one or two hidden layers (Rumelhart et al. 1986).
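The contrast between a hand-tuned threshold unit and a perceptron that learns its own weights can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the historical formulation: the function names, the logic-gate data, the learning rate, and the epoch limit are all chosen for the example.

```python
def threshold_unit(inputs, weights, threshold):
    """McCulloch-Pitts style unit: binary output, 1 when the weighted sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

GATE_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Weights and threshold set by hand so that the unit behaves as an AND gate.
and_outputs = [threshold_unit(x, (1, 1), 2) for x in GATE_INPUTS]

def perceptron_predict(x, weights, bias):
    """Binary prediction of a single neuron with a step activation."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, targets, lr=1.0, epochs=20):
    """Perceptron learning rule: adjust weights only when a sample is misclassified."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in zip(samples, targets):
            delta = t - perceptron_predict(x, weights, bias)
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:  # every sample classified correctly; stop early
            break
    return weights, bias

# Learn an OR gate from labeled examples instead of tuning by hand.
or_targets = [0, 1, 1, 1]
w, b = train_perceptron(GATE_INPUTS, or_targets)
or_outputs = [perceptron_predict(x, w, b) for x in GATE_INPUTS]
print(and_outputs, or_outputs)
```

The update rule only converges when the classes are linearly separable, which holds for the AND and OR gates used here.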
A single hidden layer neural network consists of three layers: input layer, hidden layer, and output layer. In a neural network that is trained with supervised learning, the training set contains values for the inputs x and target outputs y. The term "hidden" refers to the fact that the true values for the nodes of this layer are not observed in the training set. As shown in Figure 1.1, the values of the input features are denoted $a^{[0]}$, where "a" stands for activation, that is, the values that each layer of the neural network passes on to the subsequent layer. After the input layer passes on the values x to the hidden layer, the hidden layer in turn generates a set of activations, $a^{[1]}$. Finally, the output layer generates a value $a^{[2]}$, a real number that serves as the predicted value of y. The hidden layer and output layer are each associated with parameters w and b. Computing the output a of each layer, which is a sigmoid function of z, $\sigma(z)$, is similar to repeatedly applying logistic regression. The calculations are shown in Eqs. 1.1 through 1.4. Besides the sigmoid function, other activation functions can be used to compute the hidden layer values. In modern neural networks, the default recommendation is to use hyperbolic tangent (tanh) or the rectified linear unit (ReLU).
Figure 1.1 Structure of a single hidden layer neural network.
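The forward computation described above can be sketched as follows, assuming the usual layer-wise form in which each layer computes z from the previous activations, a weight matrix, and a bias vector, and then applies the sigmoid. The parameter values, layer sizes, and function names are hypothetical and chosen only to make the example runnable.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, mapping any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(W, b, a_prev, activation):
    """One layer: z = W a_prev + b, then a = activation(z) element-wise."""
    z = [sum(w_ij * a_j for w_ij, a_j in zip(row, a_prev)) + b_i
         for row, b_i in zip(W, b)]
    return [activation(z_i) for z_i in z]

def forward(x, W1, b1, W2, b2):
    """Forward pass through a single hidden layer network; returns (a1, a2)."""
    a0 = x                                    # input-layer activations
    a1 = layer_forward(W1, b1, a0, sigmoid)   # hidden-layer activations
    a2 = layer_forward(W2, b2, a1, sigmoid)   # output-layer activation(s)
    return a1, a2

# Hypothetical parameters: 2 inputs, 3 hidden units, 1 output unit.
W1 = [[0.5, -0.5], [0.25, 0.75], [-1.0, 1.0]]
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5]]
b2 = [0.0]

a1, a2 = forward([1.0, 2.0], W1, b1, W2, b2)
print(a1, a2)
```

Passing `math.tanh` or a ReLU such as `lambda z: max(0.0, z)` as the hidden layer's `activation` argument is one way to try the alternative activation functions mentioned in the text.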
$$z^{[1]} = W^{[1]} a^{[0]} + b^{[1]} \tag{1.1}$$
$$a^{[1]} = \sigma(z^{[1]}) \tag{1.2}$$
$$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]} \tag{1.3}$$
$$a^{[2]} = \sigma(z^{[2]}) \tag{1.4}$$
In recent years, the ML community has determined that some cases can only be learned using DNNs rather than the single hidden layer neural networks (Hinton et al. 2006). DNNs with multiple hidden layers can use earlier layers to learn about low-level simpler features and then use the later...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.