Machine Learning in Chemical Safety and Health

Fundamentals with Applications

Qingsheng Wang, Changjie Cai (Editors)

Wiley (Publisher)

1st Edition

Published on 21 October 2022

320 pages

E-Book

ePUB with Adobe-DRM

978-1-119-81750-5 (ISBN)

€143.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

List of Contributors xiii

Preface xvii

1 Introduction 1

Pingfan Hu and Qingsheng Wang

1.1 Background 2

1.2 Current State 5

1.2.1 Flammability Characteristics Prediction Using Quantitative Structure-Property Relationship 5

1.2.2 Consequence Prediction Using Quantitative Property-Consequence Relationship 6

1.2.3 Machine Learning in Process Safety and Asset Integrity Management 6

1.2.4 Machine Learning for Process Fault Detection and Diagnosis 7

1.2.5 Intelligent Method for Chemical Emission Source Identification 7

1.2.6 Machine Learning and Deep Learning Applications in Medical Image Analysis 7

1.2.7 Predictive Nanotoxicology: Nanoinformatics Approach to Toxicity Analysis of Nanomaterials 8

1.2.8 Machine Learning in Environmental Exposure Assessment 8

1.2.9 Air Quality Prediction Using Machine Learning 8

1.3 Software and Tools 9

1.3.1 R 9

1.3.2 Python 12

References 13

2 Machine Learning Fundamentals 19

Yan Yan

2.1 What Is Learning? 19

2.1.1 Machine Learning Applications and Examples 20

2.1.2 Machine Learning Tasks 21

2.2 Concepts of Machine Learning 22

2.3 Machine Learning Paradigms 24

2.4 Probably Approximately Correct Learning 25

2.4.1 Deterministic Setting 26

2.4.2 Stochastic Setting 29

2.5 Estimation and Approximation 31

2.6 Empirical Risk Minimization 32

2.6.1 Empirical Risk Minimizer 32

2.6.2 VC-dimension Generalization Bound 33

2.6.3 General Loss Functions 34

2.7 Regularization 35

2.7.1 Regularized Loss Minimization 35

2.7.2 Constrained and Regularized Problem 36

2.7.3 Trade-off Between Estimation and Approximation Error 37

2.8 Maximum Likelihood Principle 38

2.8.1 Maximum Likelihood Estimation 39

2.8.2 Cross Entropy Minimization 40

2.9 Optimization 41

2.9.1 Linear Regression: An Example 42

2.9.2 Closed-form Solution 42

2.9.3 Gradient Descent 43

2.9.4 Stochastic Gradient Descent 45

References 46

3 Flammability Characteristics Prediction Using QSPR Modeling 47

Yong Pan and Juncheng Jiang

3.1 Introduction 47

3.1.1 Flammability Characteristics 47

3.1.2 QSPR Application 48

3.1.2.1 Concept of QSPR 48

3.1.2.2 Trends and Characteristics of QSPR 48

3.2 Flowchart for Flammability Characteristics Prediction 49

3.2.1 Dataset Preparation 51

3.2.2 Structure Input and Molecular Simulation 52

3.2.3 Calculation of Molecular Descriptors 53

3.2.4 Preliminary Screening of Molecular Descriptors 54

3.2.5 Descriptor Selection and Modeling 55

3.2.6 Model Validation 57

3.2.6.1 Model Fitting Ability Evaluation 57

3.2.6.2 Model Stability Analysis 59

3.2.6.3 Model Predictivity Evaluation 60

3.2.7 Model Mechanism Explanation 61

3.2.8 Summary of QSPR Process 61

3.3 QSPR Review for Flammability Characteristics 62

3.3.1 Flammability Limits 62

3.3.1.1 LFLT and LFL 62

3.3.1.2 UFLT and UFL 64

3.3.2 Flash Point 65

3.3.3 Auto-ignition Temperature 68

3.3.4 Heat of Combustion 69

3.3.5 Minimum Ignition Energy 70

3.3.6 Gas-liquid Critical Temperature 70

3.3.7 Other Properties 72

3.4 Limitations 72

3.5 Conclusions and Future Prospects 73

References 73

4 Consequence Prediction and Quantitative Property-Consequence Relationship Models 81

Zeren Jiao and Qingsheng Wang

4.1 Introduction 81

4.2 Conventional Consequence Prediction Methods 82

4.2.1 Empirical Method 82

4.2.2 Computational Fluid Dynamics (CFD) Method 83

4.2.3 Integral Method 84

4.3 Machine Learning and Deep Learning-Based Consequence Prediction Models 84

4.4 Quantitative Property-Consequence Relationship Models 86

4.4.1 Consequence Database 88

4.4.2 Property Descriptors 89

4.4.3 Machine Learning and Deep Learning Algorithms 89

4.5 Challenges and Future Directions 90

References 91

5 Machine Learning in Process Safety and Asset Integrity Management 93

Ming Yang, Hao Sun and Rustam Abubarkirov

5.1 Opportunities and Threats 93

5.2 State-of-the-Art Reviews 95

5.2.1 Artificial Neural Networks (ANNs) 95

5.2.2 Principal Component Analysis (PCA) 97

5.2.3 Genetic Algorithm (GA) 97

5.3 Case Study of Asset Integrity Assessment 98

5.4 Data-Driven Model of Asset Integrity Assessment 105

5.4.1 Condition Monitoring Data Collection 106

5.4.2 Data Processing and Storage 106

5.4.3 Data Mining for Risk Quantification and Monitoring Control 107

5.4.4 AIM Application 107

5.4.5 The Application of the Framework 108

5.5 Conclusion 109

References 109

6 Machine Learning for Process Fault Detection and Diagnosis 113

Rajeevan Arunthavanathan, Salim Ahmed, Faisal Khan and Syed Imtiaz

6.1 Background 113

6.2 Machine Learning Approaches in Fault Detection and Diagnosis 114

6.3 Supervised Methods for Fault Detection and Diagnosis 115

6.3.1 Neural Network 115

6.3.1.1 Neural Network Theory and Algorithm 115

6.3.1.2 Neural Network Learning for Fault Classification 117

6.3.1.3 Algorithm for Fault Classification Using Neural Network 118

6.3.2 Support Vector Machine 118

6.3.2.1 Support Vector Machine Theory and Algorithm 118

6.3.3 Support Vector Machine Model Selection and Algorithm 120

6.3.4 Support Vector Machine Multiclass Classification 121

6.4 Unsupervised Learning Models for Fault Detection and Diagnosis 122

6.4.1 K-Nearest Neighbors 122

6.4.2 One-Class Support Vector Machine 123

6.4.3 One-Class Neural Network 124

6.4.4 Comparison Between Deep Learning with Machine Learning in Fault Detection and Diagnosis 126

6.5 Intelligent FDD Using Machine Learning 127

6.5.1 Model Development 127

6.5.2 Data Collection 129

6.5.2.1 Model Development Steps 129

6.5.2.2 Result Comparison 130

6.6 Concluding Remarks 134

References 134

7 Intelligent Method for Chemical Emission Source Identification 139

Denglong Ma

7.1 Introduction 139

7.1.1 Development of Detecting Gas Emission 139

7.1.2 Development of Source Term Identification 140

7.2 Intelligent Methods for Recognizing Gas Emission 141

7.2.1 Leakage Recognition of Sequestrated CO2 in the Atmosphere 141

7.2.1.1 Gas Leakage Recognition for CO2 Geological Sequestration 142

7.2.1.2 Case Studies for CO2 Recognition 144

7.2.2 Emission Gas Identification with Artificial Olfactory 149

7.2.2.1 Features of Responses in AOS 150

7.2.2.2 Support Vector Machine Models for Gas Identification 150

7.2.2.3 Deep Learning Models for Gas Identification 155

7.3 Intelligent Methods for Identifying Emission Sources 158

7.3.1 Source Estimation with Intelligent Optimization Method 158

7.3.1.1 Principle of Source Estimation with Optimization Method 158

7.3.1.2 Case Studies of Source Estimation with Optimization Method 159

7.3.2 Source Estimation with MRE-PSO Method 159

7.3.2.1 Principle of PSO-MRE for Source Estimation 161

7.3.2.2 Case Studies 163

7.3.3 Source Estimation with PSO-Tikhonov Regularization Method 164

7.3.3.1 Principle of PSO-Tikhonov Regularization Hybrid Method 164

7.3.3.2 Case Study 167

7.3.4 Source Estimation with MCMC-MLA Method 168

7.3.4.1 Forward Gas Dispersion Model Based on MLA 168

7.3.4.2 Source Estimation with MCMC-MLA Method 169

7.3.4.3 Case Study 172

7.4 Conclusions and Future Work 173

7.4.1 Conclusions 173

7.4.2 Limitations and Future Work 177

References 178

8 Machine Learning and Deep Learning Applications in Medical Image Analysis 183

Pingfan Hu, Changjie Cai, Yu Feng and Qingsheng Wang

8.1 Introduction 183

8.1.1 Machine Learning in Medical Imaging 183

8.1.2 Deep Learning in Medical Imaging 183

8.2 CNN-Based Models for Classification 184

8.2.1 ResNet50 184

8.2.2 YOLOv4 (Darknet53) 185

8.2.3 Grad-CAM 186

8.3 Case Study 186

8.3.1 Background 186

8.3.2 Study Design 187

8.3.3 Training and Testing Database Preparation 187

8.3.4 Results 190

8.3.4.1 Classification Performance of the Modified ResNet50 Model 190

8.3.4.2 Classification Performance of the YOLOv4 Model 190

8.3.4.3 Post-Processing Via Grad-CAM Model and HSV 193

8.3.5 Conclusion 194

8.4 Limitations and Future Work 194

References 195

9 Predictive Nanotoxicology: Nanoinformatics Approach to Toxicity Analysis of Nanomaterials 199

Bilal M. Khan and Yoram Cohen

9.1 Predictive Nanotoxicology 199

9.1.1 Introduction 199

9.1.2 Nano Quantitative Structure-Activity Relationship (QSAR) 200

9.1.3 Importance of Data for Nanotoxicology 204

9.2 Machine Learning Modeling for Predictive Nanotoxicology 205

9.2.1 Overview 205

9.2.2 Unsupervised Learning 211

9.2.2.1 Data Exploration Via Self-Organizing Maps (SOMs) 211

9.2.2.2 Evaluating Associations among Sublethal Toxicity Responses 214

9.2.3 Supervised Learning 215

9.2.3.1 Random Forest Models 216

9.2.3.2 Support Vector Machines 216

9.2.3.3 Bayesian Networks 216

9.2.3.4 Supervised Classification and Regression-Based Models for Nano-(Q)SARs 218

9.2.4 Predictive Nano-(Q)SARs for the Assessment of Causal Relationships 220

9.3 Development of Machine Learning Based Models for Nano-(Q)SARs 224

9.3.1 Overview 224

9.3.1.1 Data-Driven Models 224

9.3.1.2 Mechanistic/Theoretical Models 225

9.3.2 Data Generation, Collection, and Preprocessing 225

9.3.3 Descriptor Selection 226

9.3.4 Model Selection and Training 229

9.3.5 Model Validation 230

9.3.5.1 Descriptor Importance 231

9.3.5.2 Applicability Domain 231

9.3.6 Model Diagnosis and Debugging 231

9.4 Nanoinformatics Approaches to Predictive Nanotoxicology 234

9.5 Summary 235

References 238

10 Machine Learning in Environmental Exposure Assessment 251

Gregory L. Watson

10.1 Introduction 251

10.2 Environmental Exposure Modeling 252

10.3 Machine Learning Exposure Models 254

10.4 Model Evaluation 257

10.5 Case Study 258

10.6 Other Topics 260

10.6.1 Bias and Fairness 260

10.6.2 Wearable Sensors 260

10.6.3 Interpretability 260

10.6.4 Extreme Events 260

10.7 Conclusion 261

References 261

11 Air Quality Prediction Using Machine Learning 267

Lan Gao, Changjie Cai and Xiao-Ming Hu

11.1 Introduction 267

11.2 Air Quality and Climate Data Acquisition 269

11.2.1 Earth Satellite Observation Datasets 269

11.2.1.1 Basics of Earth Satellite Observations 269

11.2.1.2 Earth Satellite Products 270

11.2.2 Ground-Based In Situ Observation Datasets 276

11.2.2.1 Basics of the Ground-Based In Situ Observations 276

11.2.2.2 Ground-Based In Situ Products 277

11.3 Applications of Machine Learning in Air Quality Study 279

11.3.1 Shallow Learning 280

11.3.2 Deep Learning 280

11.4 An Application Practice Example 281

11.4.1 Satellite Data Acquisition and Variable Selections 282

11.4.2 Machine Learning and Deep Learning Algorithms 282

References 283

12 Current Challenges and Perspectives 289

Changjie Cai and Qingsheng Wang

12.1 Current Challenges 289

12.1.1 Data Development and Cleaning 289

12.1.2 Hardware Issues 290

12.1.3 Data Confidentiality 290

12.1.4 Other Challenges 291

12.2 Perspectives 291

12.2.1 Real-Time Monitoring and Forecast of Chemical Hazards 291

12.2.2 Toolkits for Dummies 292

12.2.3 Physics-Informed Machine Learning 292

References 293

Index

1 Introduction

Pingfan Hu and Qingsheng Wang

Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA

Machine learning (ML) spans a broad array of disciplines, including probability theory, statistics, approximation theory, convex analysis, and algorithmic complexity theory. It is also a core subset of artificial intelligence (AI). The term "machine learning" was first proposed in 1959 by Arthur Samuel (Samuel 1959). ML algorithms build mathematical models from training data in order to make predictions or decisions without being explicitly programmed to do so. Much older results, such as the work of Bayes and Laplace, least squares, and Markov chains, supply tools and foundations that are still widely used in ML (Andrieu et al. 2003). Since those beginnings, ML algorithms have developed tremendously and have been widely applied in scientific research and everyday life. Applications include data mining (Mitchell 1999), computer vision (Voulodimos et al. 2018), natural language processing (Cambria and White 2014), biometric recognition (Chaki et al. 2019), medical diagnosis (Bakator and Radosav 2008), detection of credit card fraud (Modi and Dayma 2017), stock market analysis (Chong et al. 2017), speech and handwriting recognition (Nassif et al. 2019), strategy games (Robertson and Watson 2015), and robotics (Pierson and Gashler 2017).

Deep learning (DL) is a relatively new branch of ML. It covers algorithms that use artificial neural networks (ANNs) as the architecture for representing and learning from data. The concept of DL originates from research on ANNs; a multilayer perceptron with multiple hidden layers is one example of a DL structure (LeCun et al. 2015). DL builds more abstract, high-level representations (attribute categories or features) by combining low-level features, and in doing so discovers distributed feature representations of the data. Several DL architectures are in common use, including the deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN).

The application of ML algorithms in chemical safety and health studies dates back to the mid-1990s (Lee et al. 1995), when some studies used basic ML algorithms for toxicity classification and prediction. In other fields, such as hazardous property prediction and consequence analysis, ML/DL algorithms did not appear until the late 2000s (Pan et al. 2008; Pan et al. 2009). Chemical safety and health, although an important field, has rarely been investigated through interdisciplinary research with applied ML, for two reasons. First, at the early development stage of ML/DL, the algorithms were relatively primitive, and their predictive capabilities and accuracy had not yet been widely verified. Second, the lack of simple, easy-to-use toolkits and the high skill requirements for algorithms and programming limited the application of ML/DL in chemical safety and health research. As a result, studies implementing ML were relatively rare in this field in the late twentieth century and the first decade of the twenty-first century.

However, with the rapid advancement of AI and computer science over the past 10 years, ML/DL methods have developed significantly, and their advantages over traditional statistical methods and labor-intensive manual work have drawn increasing attention. Academic interest in expanding the application of ML/DL to chemical safety and health research is also growing.

This book introduces ML fundamentals as well as popular tools for implementing ML/DL in chemical safety and health research (Jiao et al. 2020a). For the applications of ML/DL, the book describes flammability characteristics prediction using quantitative structure-property relationship modeling (Chapter 3), consequence prediction using quantitative property-consequence relationship modeling (Chapter 4), ML in process safety and asset integrity management (Chapter 5), and ML for process fault detection and diagnosis (Chapter 6). It then describes intelligent methods for chemical emission source identification (Chapter 7), ML and DL applications in medical image analysis (Chapter 8), predictive nanotoxicology, a nanoinformatics approach to the toxicity analysis of nanomaterials (Chapter 9), ML in environmental exposure assessment (Chapter 10), and air quality prediction using ML (Chapter 11). The book offers guidance for researchers and practitioners who are interested in applying ML/DL to chemical safety and health, and it serves as a reference for readers seeking more information about novel ML/DL tools and algorithms.

1.1 Background

Tom Mitchell provides a modern definition of ML as follows: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E" (Jordan and Mitchell 2015). In general, there are three types of ML: supervised learning, unsupervised learning, and reinforcement learning.

Supervised learning learns a function from a given training data set. When new data (validation/test data) arrive, the learned function is used to predict the results. The training set for supervised learning must include both inputs (features) and outputs (targets), and the targets are already labeled with specific experimental or simulation values. Common supervised learning algorithms include regression and classification algorithms. While some algorithms are only capable of classification (e.g. linear discriminant analysis, naive Bayes classification), most of them (e.g. k-nearest neighbor, random forest) can perform both classification and regression (James et al. 2017; Witten et al. 2017).
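As an illustration of one algorithm handling both task types, the following is a minimal k-nearest neighbor sketch in plain Python. The toy features, labels, and values are invented for this example and are not taken from the book; the function names are likewise hypothetical.

```python
import math
from collections import Counter


def k_nearest(train_x, query, k):
    """Return the indices of the k training points closest to query (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return order[:k]


def knn_classify(train_x, train_y, query, k=3):
    """Classification: majority vote over the labels of the k nearest neighbors."""
    idx = k_nearest(train_x, query, k)
    return Counter(train_y[i] for i in idx).most_common(1)[0][0]


def knn_regress(train_x, train_y, query, k=3):
    """Regression: mean of the numeric targets of the k nearest neighbors."""
    idx = k_nearest(train_x, query, k)
    return sum(train_y[i] for i in idx) / k


# Hypothetical labeled training set: inputs (features) and outputs (targets).
features = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
labels = ["low", "low", "low", "high", "high", "high"]   # classification targets
values = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]            # regression targets

predicted_label = knn_classify(features, labels, [1.1, 1.0])
predicted_value = knn_regress(features, values, [5.0, 5.0])
print(predicted_label, predicted_value)
```

The same neighbor search serves both tasks; only the way the neighbors' targets are combined (vote versus mean) changes.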

The difference between supervised learning and unsupervised learning is whether the targets in the training set are labeled. In unsupervised learning, the training set has no manually labeled results. Common unsupervised learning algorithms are used for clustering (James et al. 2017; Witten et al. 2017). There is also semi-supervised learning, which combines elements of supervised and unsupervised learning. In reinforcement learning, by contrast, the algorithm gradually adjusts its behavior as the environment changes.
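Clustering without labels can be sketched with k-means (Lloyd's algorithm). This is one common choice rather than a method the text prescribes, and the points and starting centroids below are invented for illustration.

```python
import math


def nearest_index(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # New centroid = coordinate-wise mean of its members; keep the old one if empty.
        centroids = [
            [sum(coord) / len(members) for coord in zip(*members)] if members else old
            for members, old in zip(clusters, centroids)
        ]
    assignments = [nearest_index(p, centroids) for p in points]
    return centroids, assignments


# Hypothetical unlabeled data: no targets are supplied, only features.
points = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]
initial = [points[0], points[3]]  # assumed starting centroids

final_centroids, assignments = kmeans(points, initial)
print(assignments)
```

The algorithm recovers the two groups from the geometry of the features alone; the cluster indices it returns carry no meaning beyond group membership.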

For DL, the original work on neural networks was published by Warren McCulloch and Walter Pitts in 1943 (McCulloch and Pitts 1943). They introduced the McCulloch-Pitts neural model, also known as the "linear threshold gate." As the first computational model of a neuron, the McCulloch-Pitts neural model is very simplistic, generating only a binary output. The weights and threshold require hand-tuning. In the 1950s, the perceptron became the first model with the capability to autonomously learn the optimal weight coefficients, allowing the training of a single neuron (Rosenblatt 1958). With the help of the backpropagation algorithm, neural networks began to be trained with one or two hidden layers (Rumelhart et al. 1986).
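The contrast between a hand-tuned threshold gate and a perceptron that learns its own weights can be sketched as follows. This is a simplified illustration: the update rule shown is the classic perceptron learning rule, and the learning rate, epoch count, and AND-gate data are assumptions chosen for the example.

```python
def threshold_gate(inputs, weights, threshold):
    """McCulloch-Pitts style unit: binary output with hand-chosen weights and threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def perceptron_predict(weights, bias, inputs):
    """Binary output of a single neuron with a learned bias."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias >= 0 else 0


def train_perceptron(samples, targets, learning_rate=1.0, epochs=20):
    """Perceptron rule: shift the weights toward each misclassified sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            error = t - perceptron_predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Logical AND as a toy, linearly separable problem.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 0, 0, 1]

# Hand-tuned gate: weights and threshold picked by inspection.
gate_outputs = [threshold_gate(x, [1, 1], 2) for x in samples]

# Learned neuron: weights and bias found from the data.
weights, bias = train_perceptron(samples, targets)
learned_outputs = [perceptron_predict(weights, bias, x) for x in samples]
print(gate_outputs, learned_outputs)
```

Both units compute the same function here, but only the second finds its parameters from the data. A single neuron of this kind can only separate classes with a straight line, which is part of the motivation for the hidden layers discussed next.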

A single hidden layer neural network consists of three layers: an input layer, a hidden layer, and an output layer. When the network is trained with supervised learning, the training set contains values for the inputs x and the target outputs y. The term "hidden" refers to the fact that the true values of the nodes in this layer are not observed in the training set. As shown in Figure 1.1, the values of the input features are denoted $a^{[0]}$, where "a" stands for activation, that is, the values that one layer of the network passes on to the next. After the input layer passes the values x to the hidden layer, the hidden layer in turn generates a set of activations, $a^{[1]}$. Finally, the output layer generates a value $a^{[2]}$, a real number that represents the predicted value of y. The hidden layer and the output layer are each associated with parameters w and b. Computing the output a of a layer, which is a sigmoid function of z, $\sigma(z)$, is similar to applying logistic regression repeatedly. The calculations are shown in Eqs. 1.1 through 1.4. Besides the sigmoid function, other activation functions can be used to compute the hidden layer values. In modern neural networks, the default recommendation is to use the hyperbolic tangent (tanh) or the rectified linear unit (ReLU).

Figure 1.1 Structure of a single hidden layer neural network.

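\begin{align}
z^{[1]} &= w^{[1]} a^{[0]} + b^{[1]} \tag{1.1} \\
a^{[1]} &= \sigma\left(z^{[1]}\right) \tag{1.2} \\
z^{[2]} &= w^{[2]} a^{[1]} + b^{[2]} \tag{1.3} \\
a^{[2]} &= \sigma\left(z^{[2]}\right) \tag{1.4}
\end{align}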
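The forward computation described above can be sketched in a few lines of Python, assuming the conventional layer-by-layer form in which each layer computes z as a weighted sum of the previous activations plus a bias and then applies the sigmoid. The layer sizes and parameter values below are hypothetical, and the names W1, b1, W2, and b2 are illustrative.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, mapping any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(weights, bias, activations):
    """Compute z = W a + b for one layer; weights is a list of rows, one per node."""
    return [
        sum(w * a for w, a in zip(row, activations)) + b
        for row, b in zip(weights, bias)
    ]


def forward(x, W1, b1, W2, b2):
    """Forward pass: input activations a0 -> hidden activations a1 -> output a2."""
    a0 = x
    z1 = dense(W1, b1, a0)
    a1 = [sigmoid(z) for z in z1]
    z2 = dense(W2, b2, a1)
    a2 = [sigmoid(z) for z in z2]
    return a1, a2


# Hypothetical parameters: 2 inputs, 3 hidden nodes, 1 output node.
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.7, -0.5, 0.2]]
b2 = [0.05]

hidden, output = forward([1.0, 2.0], W1, b1, W2, b2)
print(hidden, output)

# With all parameters at zero, every z is 0 and every sigmoid output is exactly 0.5.
zero_hidden, zero_output = forward([1.0, 2.0], [[0.0, 0.0]] * 3, [0.0] * 3, [[0.0] * 3], [0.0])
```

Each hidden node performs the same weighted-sum-then-sigmoid step as a logistic regression unit, which is why the text compares the forward pass to repeated logistic regression. Swapping `sigmoid` for tanh or ReLU in the hidden layer would change only the activation line.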

In recent years, the ML community has determined that some cases can only be learned using DNNs rather than the single hidden layer neural networks (Hinton et al. 2006). DNNs with multiple hidden layers can use earlier layers to learn about low-level simpler features and then use the later...
