
Mathematics of Deep Learning
Description
This course aims to provide a mathematical perspective on some key elements of so-called deep neural networks (DNNs). Much of the interest in deep learning has focused on the implementation of DNN-based algorithms. Our hope is that this compact textbook will offer a complementary point of view that emphasizes the underlying mathematical ideas. We believe that a more foundational perspective will help to answer important questions that have so far received only empirical answers.
Our goal is to introduce basic concepts from deep learning in a rigorous mathematical fashion, for example by giving mathematical definitions of deep neural networks (DNNs), loss functions, and the backpropagation algorithm.
We attempt to identify for each concept the simplest setting that minimizes technicalities but still contains the key mathematics.
The book focuses on deep learning techniques and introduces them almost immediately. Other techniques, such as regression and support vector machines (SVMs), are briefly introduced and used as a stepping stone for explaining basic ideas of deep learning.
Throughout these notes, the rigorous definitions and statements are supplemented by heuristic explanations and figures. The book is organized so that each chapter introduces a key concept. When teaching this course, some chapters could be presented as part of a single lecture, whereas others have more material and would take several lectures.
Persons
Leonid Berlyand received his Ph.D. in 1985 from Kharkiv University (Ukraine). He joined the Pennsylvania State University (PSU) in 1991, and he is currently a Professor of Mathematics and a member of the Materials Research Institute at PSU. He is a founding co-director of the PSU Centers for Interdisciplinary Mathematics and for Mathematics of Living and Mimetic Matter. He is known for his work at the interface between mathematics and other disciplines such as physics, materials science, the life sciences, and, most recently, computer science. He has co-authored three books and more than 100 publications. His interdisciplinary work has received research awards from leading research agencies in the USA, such as the NSF, the US Department of Energy, and the National Institutes of Health, as well as internationally (Bi-National Science Foundation and NATO). Most recently, his work was recognized with the Humboldt Research Award of 2021. His teaching excellence was recognized with the C.I. Noll Award for Excellence in Teaching from the Eberly College of Science at Penn State.
Pierre-Emmanuel Jabin has been a distinguished professor at the Pennsylvania State University since August 2020. He was a student at the École Normale Supérieure from 1995 to 1999; he earned his Ph.D. in 2000 and his HDR in 2003, both at Université Pierre et Marie Curie (Paris VI). Before that, he was a professor at the University of Maryland from 2011 to 2020, where he was also director of the Center for Scientific Computation and Mathematical Modeling from 2016 to 2020. Jabin's work in applied mathematics is internationally recognized, and he has made seminal contributions to the theory and applications of many-particle/multi-agent systems, together with advection and transport phenomena. Jabin was an invited speaker at the International Congress of Mathematicians in Rio de Janeiro in 2018.
Content
- Intro
- Contents
- 1 About this book
- 2 Introduction to machine learning: what and why?
- 2.1 Some motivation
- 2.2 What is machine learning?
- 3 Classification problem
- 4 The fundamentals of artificial neural networks (ANNs)
- 4.1 Basic definitions
- 4.2 ANN classifiers and the softmax function
- 4.3 The universal approximation theorem
- 4.4 Why is non-linearity in ANNs necessary?
- 4.4.1 0+0=8?
- 4.4.2 Non-linear activation functions are necessary in ANNs
- 4.5 Why do we need biases?
- 4.6 Exercises
- 5 Supervised, unsupervised, and semi-supervised learning
- 5.1 Basic definitions
- 5.2 Example of unsupervised learning: detecting bank fraud
- 5.3 Exercises
- 6 The regression problem
- 6.1 What is regression? How does it relate to ANNs?
- 6.2 Example: linear regression in dimension 1
- 6.3 Logistic regression as a single neuron ANN
- 6.3.1 1D example: studying for an exam
- 6.3.2 2D example of admittance to graduate school: separation of sets and decision boundary
- 6.3.3 Relation between ANNs and regression
- 6.3.4 Logistic regression vs. networks with many layers
- 6.4 Exercises
- 7 Support vector machine
- 7.1 Preliminaries: convex sets and their separation, geometric Hahn-Banach theorem
- 7.2 Support vector machine
- 7.3 Hard-margin SVM classifiers and support vectors
- 7.4 Soft margin SVM classifier
- 7.5 Exercises
- 8 Kernel methods
- 8.1 Kernels: what/why?
- 8.1.1 Recalling the basic principles of linear regression
- 8.1.2 How a linear kernel arises in linear regression
- 8.1.3 Linear regression in general dimension
- 8.1.4 Ridge regression
- 8.1.5 Solving the ridge regression by the dual problem
- 8.1.6 How linear kernel arises in ridge regression
- 8.1.7 Feature space and feature map in classification and regression: examples
- 8.2 Kernel: definitions and basic properties
- 8.2.1 Definitions
- 8.2.2 The "kernel trick": connecting feature maps and kernels
- 8.2.3 Choosing the right kernel: the example of the Gaussian kernel
- 8.3 Exercises
- 9 Gradient descent method in the training of DNNs
- 9.1 Deterministic gradient descent for the minimization of multivariable functions
- 9.2 Additive loss functions
- 9.3 What are SGD algorithms? When to use them?
- 9.4 Epochs in SGD
- 9.5 Weights
- 9.6 Choosing the batch size through a numerical example
- 9.7 Exercises
- 10 Backpropagation
- 10.1 Computational complexity
- 10.2 Chain rule review
- 10.3 Diagrammatic representation of the chain rule in simple examples
- 10.4 The case of a simple DNN with one neuron per layer
- 10.5 Backpropagation algorithm for general DNNs
- 10.6 Exercises
- 11 Convolutional neural networks (CNNs)
- 11.1 Convolution
- 11.1.1 Convolution of functions
- 11.1.2 Convolution of matrices
- 11.1.3 Hadamard product and feature detection
- 11.2 Convolutional layers
- 11.3 Padding layer
- 11.4 Pooling layer
- 11.5 Building CNNs
- 11.6 Equivariance and invariance
- 11.7 Summary of CNNs
- 11.8 Exercises
- A Review of the chain rule
- Bibliography
- Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (limited support on Kindle).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark is embedded in the eBook; it can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.