15 Math Concepts Every Data Scientist Should Know


Understand and learn how to apply the math behind data science algorithms

David Hoyle (Author)

Packt Publishing Limited

1st Edition

Published on 13 January 2025

510 pages

E-Book (ePUB with Adobe DRM)

E-Book (ePUB without DRM)

978-1-83763-194-0 (ISBN)

from €26.99

Contents

Cover
Title Page
Copyright
Dedication
Contributors
Table of Contents
Preface
Part 1: Essential Concepts
Chapter 1: Recap of Mathematical Notation and Terminology
Technical requirements
Number systems
Notation for numbers and fields
Complex numbers
What we learned
Linear algebra
Vectors
Matrices
What we learned
Sums, products, and logarithms
Sums and the Σ notation
Products and the Π notation
Logarithms
What we learned
Differential and integral calculus
Differentiation
Finding maxima and minima
Integration
What we learned
Analysis
Limits
Order notation
Taylor series expansions
What we learned
Combinatorics
Binomial coefficients
What we learned
Summary
Notes and further reading
Chapter 2: Random Variables and Probability Distributions
Technical requirements
All data is random
A little example
Systematic variation can be learned - random variation can't
Random variation is not just measurement error
What are the consequences of data being random?
What we learned
Random variables and probability distributions
A new concept - random variables
Summarizing probability distributions
Continuous distributions
Transforming and combining random variables
Named distributions
What we learned
Sampling from distributions
How datasets relate to random variables and probability distributions
How big is the population from which a dataset is sampled?
How to sample
Generating your own random numbers code example
Sampling from numpy distributions code example
What we learned
Understanding statistical estimators
Consistency, bias, and efficiency
The empirical distribution function
What we learned
The Central Limit Theorem
Sums of random variables
CLT code example
CLT example with discrete variables
Computational estimation of a PDF from data
KDE code example
What we learned
Summary
Exercises
Chapter 3: Matrices and Linear Algebra
Technical requirements
Inner and outer products of vectors
Inner product of two vectors
Outer product of two vectors
What we learned
Matrices as transformations
Matrix multiplication
The identity matrix
The inverse matrix
More examples of matrices as transformations
Matrix transformation code example
What we learned
Matrix decompositions
Eigen-decompositions
Eigenvector and eigenvalues
Eigen-decomposition of a square matrix
Eigen-decomposition code example
Singular value decomposition
The SVD of a complex matrix
What we learned
Matrix properties
Trace
Determinant
What we learned
Matrix factorization and dimensionality reduction
Dimensionality reduction
Principal component analysis
Non-negative matrix factorization
What we learned
Summary
Exercises
Notes and further reading
Chapter 4: Loss Functions and Optimization
Technical requirements
Loss functions - what are they?
Risk functions
There are many loss functions
Different loss functions = different end results
Loss functions for anything
A loss function by any other name
What we learned
Least Squares
The squared-loss function
OLS regression
OLS, outliers, and robust regression
What we learned
Linear models
Practical issues
The model residuals
OLS regression code example
What we learned
Gradient descent
Locating the minimum of a simple risk function
Gradient descent code example
Gradient descent is a general technique
Beyond simple gradient descent
What we learned
Summary
Exercises
Chapter 5: Probabilistic Modeling
Technical requirements
Likelihood
A simple probabilistic model
Log likelihood
Maximum likelihood estimation
What we have learned
Bayes' theorem
Conditional probability and Bayes' theorem
Priors
The posterior
What we have learned
Bayesian modeling
Bayesian model averaging
MAP estimation
As N becomes large, the prior becomes irrelevant
Least squares as an approximation to Bayesian modeling
What we have learned
Bayesian modeling in practice
Analytic approximation of the posterior
Computational sampling
MCMC code example
Probabilistic programming languages
What we have learned
Summary
Exercises
Part 2: Intermediate Concepts
Chapter 6: Time Series and Forecasting
Technical requirements
What is time series data?
What does auto-correlation mean for modeling time series data?
The auto-correlation function (ACF)
The partial auto-correlation function (PACF)
Other data science implications of time series data
What we have learned
ARIMA models
Integrated
Auto-regression
Moving average
Combining the AR(p), I(d), and MA(q) into an ARIMA model
Variants of ARIMA modeling
What we have learned
ARIMA modeling in practice
Unit root testing
Interpreting ACF and PACF plots
auto.arima
What we have learned
Machine learning approaches to time series analysis
Routine application of machine learning to time series analysis
Deep learning approaches to time series analysis
AutoML approaches to time series analysis
What we have learned
Summary
Exercises
Notes and further reading
Chapter 7: Hypothesis Testing
Technical requirements
What is a hypothesis test?
Example
The general form of a hypothesis test
The p-value
The effect of increasing sample size
The effect of decreasing noise
One-tailed and two-tailed tests
Using sample variances in the test statistic - the t-test
Computationally intensive methods for p-value estimation
Parametric versus non-parametric hypothesis tests
What we learned
Confidence intervals
What does a confidence interval really represent?
Confidence intervals for any parameter
A confidence interval code example
What we learned
Type I and Type II errors, and power
What we learned
Summary
Exercises
Notes and further reading
Chapter 8: Model Complexity
Technical requirements
Generalization, overfitting, and the role of model complexity
Overfitting
Why overfitting is bad
Overfitting increases the variability of predictions
Underfitting is also a problem
Measuring prediction error
What we learned
The bias-variance trade-off
Proof of the bias-variance trade-off formula
Double descent - a modern twist on the generalization error diagram
What we learned
Model complexity measures for model selection
Selecting between classes of models
Akaike Information Criterion
Bayesian Information Criterion
What we learned
Summary
Notes and further reading
Chapter 9: Function Decomposition
Technical requirements
Why do we want to decompose a function?
What is a decomposition of a function?
Example 1 - decomposing a one-dimensional function into symmetric and anti-symmetric parts
Example 2 - decomposing a time series into its seasonal and non-seasonal components
What we've learned
Expanding a function in terms of basis functions
What we've learned
Fourier series
What we've learned
Fourier transforms
The multi-dimensional Fourier transform
What we've learned
The discrete Fourier transform
DFT code example
Uses of the DFT
What is the difference between the DFT, Fourier series, and the Fourier transform?
What we've learned
Summary
Exercises
Chapter 10: Network Analysis
Technical requirements
Graphs and network data
Network data is about relationships
Example 1 - substituting goods in a supermarket
Example 2 - international trade
What is a graph?
What we've learned
Basic characteristics of graphs
Undirected and directed edges
The adjacency matrix
In-degree and out-degree
Centrality
What we've learned
Different types of graphs
Fully connected graphs
Disconnected graphs
Directed acyclic graphs
Small-world networks
Scale-free networks
What we've learned
Community detection and decomposing graphs
What is a community?
How to do community detection
Community detection algorithms
Community detection code example
What we've learned
Summary
Exercises
Notes and further reading
Part 3: Selected Advanced Concepts
Chapter 11: Dynamical Systems
Technical requirements
What is a dynamical system and what is an evolution equation?
Time can be discrete or continuous
Time does not have to mean chronological time
Evolution equations
What we learned
First-order discrete Markov processes
Variations of first-order Markov processes
A Markov process is a probabilistic model
The transition probability matrix
Properties of the transition probability matrix
Epidemic modeling with a first-order discrete Markov process
The transition probability matrix is a network
Using the transition matrix to generate state trajectories
Evolution of the state probability distribution
Stationary distributions and limiting distributions
First-order discrete Markov processes are memoryless
Likelihood of the state sequence
What we learned
Higher-order discrete Markov processes
Second-order discrete Markov processes
Evolution of the state probability distribution in higher-order models
A higher-order discrete Markov process is a first-order discrete Markov process in disguise
Higher-order discrete Markov processes are still memoryless
What we learned
Hidden Markov Models
Emission probabilities
Making inferences with an HMM
What we learned
Summary
Exercises
Notes and further reading
Chapter 12: Kernel Methods
Technical requirements
The role of inner products in common learning algorithms
Sometimes we need new features in our inner products
What we learned
The kernel trick
What is a kernel?
Commonly used kernels
Kernel functions for other mathematical objects
Combining kernels
Positive semi-definite kernels
Mercer's theorem and the kernel trick
Kernelized algorithms
What we learned
An example of a kernelized learning algorithm
kFDA code example
What we learned
Summary
Exercises
Chapter 13: Information Theory
Technical requirements
What is information and why is it useful?
The concept of information
The mathematical definition of information
Information theory applies to continuous distributions as well
Why we measure information on a logarithmic scale
Why is quantifying information useful?
What we've learned
Entropy as expected information
Entropy
What we've learned
Mutual information
Conditional entropy
Mutual information for continuous variables
Mutual information as a measure of correlation
Mutual information code example
What we've learned
The Kullback-Leibler divergence
Relative entropy
KL-divergence for continuous variables
Using the KL-divergence for approximation
Variational inference
What we've learned
Summary
Exercises
Notes and further reading
Chapter 14: Non-Parametric Bayesian Methods
Technical requirements
What are non-parametric Bayesian methods?
We still have parameters
The different types of non-parametric Bayesian methods
The pros and cons of non-parametric Bayesian methods
What we learned
Gaussian processes
The kernel function
Fitting GPR models
Prediction using GPR models
GPR code example
What we learned
Dirichlet processes
How do DPs differ from GPs?
The DP notation
Sampling a function from a DP
Generating a sample of data from a DP
Bayesian non-parametric inference using a DP
What we learned
Summary
Exercises
Chapter 15: Random Matrices
Technical requirements
What is a random matrix?
What we learned
Using random matrices to represent interactions in large-scale systems
What we learned
Universal behavior of large random matrices
The Wigner semicircle law
What does RMT study?
Universal is universal
The classical Gaussian matrix ensembles
What we learned
Random matrices and high-dimensional covariance matrices
The Marcenko-Pastur distribution is a bulk distribution
Universality in the singular values of X
The Marcenko-Pastur distribution and neural networks
What we learned
Summary
Exercises
Notes and further reading
Index
About PACKT
Other Books You May Enjoy
