Deep Learning

Name: Deep Learning | From Big Data to Artificial Intelligence with R
Brand: Wiley
Price: 71.99 EUR
Availability: OnlineOnly

From Big Data to Artificial Intelligence with R

Stephane Tuffery (Author)

Wiley (Publisher)

1st edition

Published on 22 November 2022

544 pages

E-Book

ePUB with Adobe DRM

978-1-119-84503-4 (ISBN)

€71.99 incl. 7% VAT

E-book single license

Available as download

Contents

Acknowledgements

Introduction

1 From Big Data to Deep Learning

1.1 Introduction

1.2 Examples of the Use of Big Data and Deep Learning

1.3 Big Data and Deep Learning for Companies and Organizations

1.3.1 Big Data in Finance

1.3.1.1 Google Trends

1.3.1.2 Google Trends and Stock Prices

1.3.1.3 The quantmod Package for Financial Analysis

1.3.1.4 Google Trends in R

1.3.1.5 Matching Data from quantmod and Google Trends

1.3.2 Big Data and Deep Learning in Insurance

1.3.3 Big Data and Deep Learning in Industry

1.3.4 Big Data and Deep Learning in Scientific Research and Education

1.3.4.1 Big Data in Physics and Astrophysics

1.3.4.2 Big Data in Climatology and Earth Sciences

1.3.4.3 Big Data in Education

1.4 Big Data and Deep Learning for Individuals

1.4.1 Big Data and Deep Learning in Healthcare

1.4.1.1 Connected Health and Telemedicine

1.4.1.2 Geolocation and Health

1.4.1.3 The Google Flu Trends

1.4.1.4 Research in Health and Medicine

1.4.2 Big Data and Deep Learning for Drivers

1.4.3 Big Data and Deep Learning for Citizens

1.4.4 Big Data and Deep Learning in the Police

1.5 Risks in Data Processing

1.5.1 Insufficient Quantity of Training Data

1.5.2 Poor Data Quality

1.5.3 Non-Representative Samples

1.5.4 Missing Values in the Data

1.5.5 Spurious Correlations

1.5.6 Overfitting

1.5.7 Lack of Explainability of Models

1.6 Protection of Personal Data

1.6.1 The Need for Data Protection

1.6.2 Data Anonymization

1.6.3 The General Data Protection Regulation

1.7 Open Data

Notes

2 Processing of Large Volumes of Data

2.1 Issues

2.2 The Search for a Parsimonious Model

2.3 Algorithmic Complexity

2.4 Parallel Computing

2.5 Distributed Computing

2.5.1 MapReduce

2.5.2 Hadoop

2.5.3 Computing Tools for Distributed Computing

2.5.4 Column-Oriented Databases

2.5.5 Distributed Architecture and "Analytics"

2.5.6 Spark

2.6 Computer Resources

2.6.1 Minimum Resources

2.6.2 Graphics Processing Units (GPU) and Tensor Processing Units (TPU)

2.6.3 Solutions in the Cloud

2.7 R and Python Software

2.8 Quantum Computing

Notes

3 Reminders of Machine Learning

3.1 General

3.2 The Optimization Algorithms

3.3 Complexity Reduction and Penalized Regression

3.4 Ensemble Methods

3.4.1 Bagging

3.4.2 Random Forests

3.4.3 Extra-Trees

3.4.4 Boosting

3.4.5 Gradient Boosting Methods

3.4.6 Synthesis of the Ensemble Methods

3.5 Support Vector Machines

3.6 Recommendation Systems

Notes

4 Natural Language Processing

4.1 From Lexical Statistics to Natural Language Processing

4.2 Uses of Text Mining and Natural Language Processing

4.3 The Operations of Textual Analysis

4.3.1 Textual Data Collection

4.3.2 Identification of the Language

4.3.3 Tokenization

4.3.4 Part-of-Speech Tagging

4.3.5 Named Entity Recognition

4.3.6 Coreference Resolution

4.3.7 Lemmatization

4.3.8 Stemming

4.3.9 Simplifications

4.3.10 Removal of Stop Words

4.4 Vector Representation and Word Embedding

4.4.1 Vector Representation

4.4.2 Analysis on the Document-Term Matrix

4.4.3 TF-IDF Weighting

4.4.4 Latent Semantic Analysis

4.4.5 Latent Dirichlet Allocation

4.4.6 Word Frequency Analysis

4.4.7 Word2Vec Embedding

4.4.8 GloVe Embedding

4.4.9 FastText Embedding

4.5 Sentiment Analysis

Notes

5 Social Network Analysis

5.1 Social Networks

5.2 Characteristics of Graphs

5.3 Characterization of Social Networks

5.4 Measures of Influence in a Graph

5.5 Graphs with R

5.6 Community Detection

5.6.1 The Modularity of a Graph

5.6.2 Community Detection by Divisive Hierarchical Clustering

5.6.3 Community Detection by Agglomerative Hierarchical Clustering

5.6.4 Other Methods

5.6.5 Community Detection with R

5.7 Research and Analysis on Social Networks

5.8 The Business Model of Social Networks

5.9 Digital Advertising

5.10 Social Network Analysis with R

5.10.1 Collecting Tweets

5.10.2 Formatting the Corpus

5.10.3 Stemming and Lemmatization

5.10.4 Example

5.10.5 Clustering of Terms and Documents

5.10.6 Opinion Scoring

5.10.7 Graph of Terms with Their Connotation

Notes

6 Handwriting Recognition

6.1 Data

6.2 Issues

6.3 Data Processing

6.4 Linear and Quadratic Discriminant Analysis

6.5 Multinomial Logistic Regression

6.6 Random Forests

6.7 Extra-Trees

6.8 Gradient Boosting

6.9 Support Vector Machines

6.10 Single Hidden Layer Perceptron

6.11 H2O Neural Network

6.12 Synthesis of "Classical" Methods

Notes

7 Deep Learning

7.1 The Principles of Deep Learning

7.2 Overview of Deep Neural Networks

7.3 Recall on Neural Networks and Their Training

7.4 Difficulties of Gradient Backpropagation

7.5 The Structure of a Convolutional Neural Network

7.6 The Convolution Mechanism

7.7 The Convolution Parameters

7.8 Batch Normalization

7.9 Pooling

7.10 Dilated Convolution

7.11 Dropout and DropConnect

7.12 The Architecture of a Convolutional Neural Network

7.13 Principles of Deep Network Learning for Computer Vision

7.14 Adaptive Learning Algorithms

7.15 Progress in Image Recognition

7.16 Recurrent Neural Networks

7.17 Capsule Networks

7.18 Autoencoders

7.19 Generative Models

7.19.1 Generative Adversarial Networks

7.19.2 Variational Autoencoders

7.20 Other Applications of Deep Learning

7.20.1 Object Detection

7.20.2 Autonomous Vehicles

7.20.3 Analysis of Brain Activity

7.20.4 Analysis of the Style of a Pictorial Work

7.20.5 Go and Chess Games

7.20.6 Other Games

Notes

8 Deep Learning for Computer Vision

8.1 Deep Learning Libraries

8.2 MXNet

8.2.1 General Information about MXNet

8.2.2 Creating a Convolutional Network with MXNet

8.2.3 Model Management with MXNet

8.2.4 CIFAR-10 Image Recognition with MXNet

8.3 Keras and TensorFlow

8.3.1 General Information about Keras

8.3.2 Application of Keras to the MNIST Database

8.3.3 Application of Pre-Trained Models

8.3.4 Explain the Prediction of a Computer Vision Model

8.3.5 Application of Keras to CIFAR-10 Images

8.3.6 Classifying Cats and Dogs

8.4 Configuring a Machine's GPU for Deep Learning

8.4.1 Checking the Compatibility of the Graphics Card

8.4.2 NVIDIA Driver Installation

8.4.3 Installation of Microsoft Visual Studio

8.4.4 NVIDIA CUDA Toolkit Installation

8.4.5 Installation of cuDNN

8.5 Computing in the Cloud

8.6 PyTorch

8.6.1 The Python PyTorch Package

8.6.2 The R torch Package

Notes

9 Deep Learning for Natural Language Processing

9.1 Neural Network Methods for Text Analysis

9.2 Text Generation Using a Recurrent Neural Network LSTM

9.3 Text Classification Using a LSTM or GRU Recurrent Neural Network

9.4 Text Classification Using a H2O Model

9.5 Application of Convolutional Neural Networks

9.6 Spam Detection Using a Recurrent Neural Network LSTM

9.7 Transformer Models, BERT, and Its Successors

Notes

10 Artificial Intelligence

10.1 The Beginnings of Artificial Intelligence

10.2 Human Intelligence and Artificial Intelligence

10.3 The Different Forms of Artificial Intelligence

10.4 Ethical and Societal Issues of Artificial Intelligence

10.5 Fears and Hopes of Artificial Intelligence

10.6 Some Dates of Artificial Intelligence

Notes

Conclusion

Note

Annotated Bibliography

On Big Data and High Dimensional Statistics

On Deep Learning

On Artificial Intelligence

On the Use of R and Python in Data Science and on Big Data

Index

Introduction

This book is dedicated to deep learning, a recent branch of a slightly older discipline: machine learning. Deep learning is particularly well suited to the analysis of complex data, such as images and natural language. For this reason, it is at the heart of many of the artificial intelligence applications described in this book. Although deep learning today relies almost exclusively on neural networks, we first look at other machine learning methods, partly because they share concepts with neural networks that are important to understand in their generality, and partly to compare their results with those of deep learning methods. We will then be able to measure fully the effectiveness of deep learning methods in computer vision and natural language processing problems. The book recalls the theoretical foundations of these methods and shows how to implement them in concrete situations, with examples that use the open source deep learning libraries of Python and, mainly, R, as indicated below. As we will see, the prodigious development of deep learning and artificial intelligence has been made possible by new theoretical concepts and more powerful computing tools, but also by the availability of immense masses of varied data: images, video, audio, text, traces left on the Internet, and signals from connected objects. These big data feature prominently in this book.

The Structure of the Book

Chapter 1 is an overview of deep learning and big data, with their principles and applications in the main sectors: finance, insurance, industry, transport, medicine, and scientific research. A few pages are devoted to the main difficulties encountered when processing data in machine learning and deep learning, particularly with big data. We must not neglect the IT risks inherent in collecting and storing large amounts of personal data, sometimes in a cloud; news about certain social networks regularly reminds us of this. At the opposite end of the spectrum from these networks' commercial vision of big data is open data, the topic that closes the chapter.

Chapter 2 deals with concepts that data scientists must know when handling large volumes of data: parsimony in modeling, algorithmic complexity, parallel computing, and its generalization, distributed computing. We devote a few pages to the MapReduce algorithm that underlies distributed computing, to its implementation in the Hadoop system, and to the NoSQL and column-oriented database management systems that are particularly well adapted to big data. We will see that "analytical" applications such as machine learning have particular computing requirements that call for specific solutions, Spark being one of them. We then review the hardware and software resources required, whether on the user's machine or in a cloud. We discuss the processors that accelerate deep learning computations, as well as the two most widely used open source software environments in statistics, machine learning, and deep learning: R and Python. A synoptic table compares the main machine learning methods implemented in R, Python (scikit-learn library), and Spark (MLlib). We also mention quantum computing, for which specific versions of algorithms are starting to be designed, notably in linear algebra, machine learning, optimization, and cryptography. The prospects of quantum computing are still distant but very promising, with the possibility of a considerable reduction in computing time.
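
To make the MapReduce idea mentioned for Chapter 2 concrete, here is a minimal single-machine sketch in Python. It is not taken from the book and does not use Hadoop or Spark; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are illustrative assumptions. It only shows the three logical steps (map, group by key, reduce) on a word count.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word of one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Chain the three steps; a real cluster would run map and reduce in parallel."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical sample documents, for illustration only.
docs = ["big data and deep learning", "deep learning with R", "big data with R"]
counts = word_count(docs)
print(counts)
```

In a distributed setting, each call to `map_phase` could run on the node that stores the document, and only the grouped pairs would travel over the network.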

Chapter 3 recalls some essential principles of machine learning and data science: the bias-variance dilemma in modeling; complexity reduction methods; optimization algorithms such as gradient descent, Newton, and Levenberg-Marquardt; ensemble (or aggregation) methods such as random forests, Extra-Trees, and boosting; and methods useful for big data, such as incremental algorithms and the recommendation systems used by social networks and online commerce. Apart from these reminders, the reader is assumed to be familiar with machine learning methods; if required, a bibliography is given at the end of the book, and notes in each chapter provide specific references.
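
As an illustration of the simplest optimization algorithm named for Chapter 3, gradient descent, here is a short Python sketch. It is not the book's code; the function name `gradient_descent`, the learning rate, the number of steps, and the test function f(x) = (x - 3)^2 are all assumptions chosen for the example.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)
    return x


# Illustrative objective: f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
# and whose unique minimum is at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

With this learning rate, each step multiplies the distance to the minimum by 0.8, so the iterate converges geometrically; a rate above 1.0 would make this particular example diverge.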

Chapter 4 presents natural language processing methods. It introduces the principles of textual analysis, including segmentation into units (tokenization), part-of-speech tagging, named entity recognition, lemmatization, and other simplification operations that aim to reduce the volume of data and the complexity of the problem as much as possible while retaining the maximum amount of information, a constant concern in statistics and machine learning. We then describe the vector representation of words, from the classical document-term matrix to word embedding methods, which started with Word2Vec, GloVe, and fastText and whose list keeps growing. We speak of embedding because each word is associated with a point in a vector space of fairly small dimension, on the order of a few hundred, i.e. far smaller than the number of distinct terms, with the remarkable property that two semantically close words correspond to close points in the vector space, and that arithmetic operations in this space can lead to identities such as "King" - "Man" + "Woman" = "Queen". These embeddings thus preserve not only the proximity of words but also their relations. They are therefore an efficient way to transform documents for analysis, for example to classify them into categories: spam or non-spam, type of message, subject of a complaint, etc. We also discuss topic modeling, which uses methods such as latent Dirichlet allocation to detect the topics present in a corpus of documents. We present another common natural language processing task, sentiment analysis, which seeks to detect the sentiments expressed in a text, either in the binary form of positive or negative sentiment, or in a more elaborate form such as joy, fear, or anger. Neural methods applied to natural language processing are discussed in Chapter 9, after the chapter devoted to the principles of deep learning.
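
The "King" - "Man" + "Woman" = "Queen" identity can be sketched in a few lines of Python. The three-dimensional vectors below are invented for illustration and are not real Word2Vec, GloVe, or fastText embeddings (which have a few hundred dimensions); the helper names `cosine` and `analogy` are likewise assumptions. The sketch only shows the mechanism: do the arithmetic, then look for the nearest remaining word by cosine similarity.

```python
import math

# Toy embeddings, invented for illustration only (not trained vectors).
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.0, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


result = analogy("king", "man", "woman", embeddings)
print(result)
```

Excluding the three input words from the candidates is a common convention, since the nearest neighbour of the computed point is otherwise often one of the inputs themselves.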

Chapter 5 shows how to analyze social networks, starting from notions of graph theory and taking Twitter as an example. We are particularly interested in centrality and influence measures, which are very important in social networks and web search engines. We are also interested in the detection of communities, which are dense sub-graphs that can form a partition of the graph under study. The search for communities in a graph is an active field of research in various domains (biology, sociology, marketing), because the vertices of the same community tend to share interesting properties. We also consider the economic model of social networks, digital advertising, and what is called programmatic advertising.
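
As one simple example of a centrality measure, the following Python sketch computes degree centrality on a small undirected graph. The excerpt does not say which measures the book uses, so the choice of degree centrality, the function name `degree_centrality`, and the sample edge list are all assumptions made for illustration.

```python
def degree_centrality(edges):
    """Degree centrality: number of neighbours divided by (n - 1)."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical friendship graph, for illustration only.
edges = [
    ("ann", "bob"),
    ("ann", "cat"),
    ("ann", "dan"),
    ("bob", "cat"),
    ("dan", "eve"),
]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality)
```

Degree centrality only counts direct neighbours; measures such as betweenness or PageRank also take the wider structure of the graph into account.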

Chapter 6 deals with the classical problem of recognizing handwritten digits, for example on bank checks or in postal codes on envelopes. Using a well-known dataset (MNIST), it compares the machine learning methods discussed earlier in the book, in particular penalized regression, random forests, gradient boosting, and support vector machines.

Chapter 7 is a long and important chapter on deep learning. It explains the principles of deep learning and the architecture of deep neural networks, especially convolutional and recurrent networks, which are the most widely used for computer vision and natural language processing today. The many features designed to optimize their performance are presented, such as pooling, normalization, dropout, and adaptive learning, with indications on how best to use them. We review backpropagation, the fundamental learning mechanism of neural networks; the difficulties encountered when applying it to multilayer networks, notably the vanishing gradient phenomenon that led for a while to the "winter of artificial intelligence"; and the solutions found in the last ten years through new ideas and increased computing power. Particular networks are described: autoencoders for data compression, and generative neural networks, which are increasingly used to have artificial intelligence produce texts, images, or music. Illustrations show the value of deep learning for subjects ranging from object detection to strategy games.
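
The vanishing gradient phenomenon can be illustrated numerically. The Python sketch below is a deliberate simplification and not the book's treatment: it assumes a chain of one-unit layers with sigmoid activations, the same weight and the same pre-activation value in every layer, and a hypothetical helper named `backprop_factor`. Under these assumptions, the gradient reaching the first layer is the product of one factor per layer, and the sigmoid derivative never exceeds 0.25.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative of the sigmoid; its maximum is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def backprop_factor(n_layers, weight=1.0, pre_activation=0.0):
    """Product of weight * sigmoid'(z) over a chain of n_layers one-unit layers.

    Simplifying assumption: every layer has the same weight and pre-activation.
    """
    factor = 1.0
    for _ in range(n_layers):
        factor *= weight * sigmoid_derivative(pre_activation)
    return factor


for depth in (1, 5, 10, 20):
    print(depth, backprop_factor(depth))
```

The factor shrinks geometrically with depth, which is why the earliest layers of a deep sigmoid network receive almost no learning signal unless the activation function, the initialization, or the architecture is changed.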

Chapter 8 applies the methods seen in Chapter 7 to computer vision, using the MXNet, Keras-TensorFlow, and PyTorch libraries. In particular, they are applied to three classical datasets: (1) the MNIST database already discussed in Chapter 6, which allows the performance of classical and deep learning methods to be compared; (2) the CIFAR-10 image database; and (3) a database of cat and dog pictures. We apply transfer learning. We touch on the question of the explainability of machine learning algorithms by applying the LIME method to images, to find out which parts of an image the model relies on for its predictions. We show how to configure a computer running Windows to use its graphics processing unit (GPU) for deep learning computations, which are much faster on graphics processors than on conventional processors (CPUs). This configuration is not simple, and the steps indicated need to be followed carefully. The chapter concludes with examples of cloud computing, using the Google Colab platform with a Jupyter notebook running Python code.

Chapter 9 returns to natural language...
