
15 Math Concepts Every Data Scientist Should Know

Person
David Hoyle has over 30 years' experience in machine learning, statistics, and mathematical modeling. He gained a BSc degree in mathematics and physics and a Ph.D. in theoretical physics from the University of Bristol. He did research at the University of Cambridge and led his own research groups as an Associate Professor at the University of Exeter and the University of Manchester. Previously, he worked for Lloyds Banking Group, one of the UK's largest retail banks, and as joint Head of Data Science for AutoTrader UK. He now works for the global customer data science company dunnhumby, building statistical and machine learning models for the world's largest retailers, including Tesco UK and Walmart. He lives and works in Manchester, UK.
Content
- Cover
- Title Page
- Copyright
- Dedication
- Contributors
- Table of Contents
- Preface
- Part 1: Essential Concepts
- Chapter 1: Recap of Mathematical Notation and Terminology
- Technical requirements
- Number systems
- Notation for numbers and fields
- Complex numbers
- What we learned
- Linear algebra
- Vectors
- Matrices
- What we learned
- Sums, products, and logarithms
- Sums and the Σ notation
- Products and the Π notation
- Logarithms
- What we learned
- Differential and integral calculus
- Differentiation
- Finding maxima and minima
- Integration
- What we learned
- Analysis
- Limits
- Order notation
- Taylor series expansions
- What we learned
- Combinatorics
- Binomial coefficients
- What we learned
- Summary
- Notes and further reading
- Chapter 2: Random Variables and Probability Distributions
- Technical requirements
- All data is random
- A little example
- Systematic variation can be learned - random variation can't
- Random variation is not just measurement error
- What are the consequences of data being random?
- What we learned
- Random variables and probability distributions
- A new concept - random variables
- Summarizing probability distributions
- Continuous distributions
- Transforming and combining random variables
- Named distributions
- What we learned
- Sampling from distributions
- How datasets relate to random variables and probability distributions
- How big is the population from which a dataset is sampled?
- How to sample
- Generating your own random numbers code example
- Sampling from numpy distributions code example
- What we learned
- Understanding statistical estimators
- Consistency, bias, and efficiency
- The empirical distribution function
- What we learned
- The Central Limit Theorem
- Sums of random variables
- CLT code example
- CLT example with discrete variables
- Computational estimation of a PDF from data
- KDE code example
- What we learned
- Summary
- Exercises
- Chapter 3: Matrices and Linear Algebra
- Technical requirements
- Inner and outer products of vectors
- Inner product of two vectors
- Outer product of two vectors
- What we learned
- Matrices as transformations
- Matrix multiplication
- The identity matrix
- The inverse matrix
- More examples of matrices as transformations
- Matrix transformation code example
- What we learned
- Matrix decompositions
- Eigen-decompositions
- Eigenvector and eigenvalues
- Eigen-decomposition of a square matrix
- Eigen-decomposition code example
- Singular value decomposition
- The SVD of a complex matrix
- What we learned
- Matrix properties
- Trace
- Determinant
- What we learned
- Matrix factorization and dimensionality reduction
- Dimensionality reduction
- Principal component analysis
- Non-negative matrix factorization
- What we learned
- Summary
- Exercises
- Notes and further reading
- Chapter 4: Loss Functions and Optimization
- Technical requirements
- Loss functions - what are they?
- Risk functions
- There are many loss functions
- Different loss functions = different end results
- Loss functions for anything
- A loss function by any other name
- What we learned
- Least Squares
- The squared-loss function
- OLS regression
- OLS, outliers, and robust regression
- What we learned
- Linear models
- Practical issues
- The model residuals
- OLS regression code example
- What we learned
- Gradient descent
- Locating the minimum of a simple risk function
- Gradient descent code example
- Gradient descent is a general technique
- Beyond simple gradient descent
- What we learned
- Summary
- Exercises
- Chapter 5: Probabilistic Modeling
- Technical requirements
- Likelihood
- A simple probabilistic model
- Log likelihood
- Maximum likelihood estimation
- What we have learned
- Bayes' theorem
- Conditional probability and Bayes' theorem
- Priors
- The posterior
- What we have learned
- Bayesian modeling
- Bayesian model averaging
- MAP estimation
- As N becomes large the prior becomes irrelevant
- Least squares as an approximation to Bayesian modeling
- What we have learned
- Bayesian modeling in practice
- Analytic approximation of the posterior
- Computational sampling
- MCMC code example
- Probabilistic programming languages
- What we have learned
- Summary
- Exercises
- Part 2: Intermediate Concepts
- Chapter 6: Time Series and Forecasting
- Technical requirements
- What is time series data?
- What does auto-correlation mean for modeling time series data?
- The auto-correlation function (ACF)
- The partial auto-correlation function (PACF)
- Other data science implications of time series data
- What we have learned
- ARIMA models
- Integrated
- Auto-regression
- Moving average
- Combining the AR(p), I(d), and MA(q) into an ARIMA model
- Variants of ARIMA modeling
- What we have learned
- ARIMA modeling in practice
- Unit root testing
- Interpreting ACF and PACF plots
- auto.arima
- What we have learned
- Machine learning approaches to time series analysis
- Routine application of machine learning to time series analysis
- Deep learning approaches to time series analysis
- AutoML approaches to time series analysis
- What we have learned
- Summary
- Exercises
- Notes and further reading
- Chapter 7: Hypothesis Testing
- Technical requirements
- What is a hypothesis test?
- Example
- The general form of a hypothesis test
- The p-value
- The effect of increasing sample size
- The effect of decreasing noise
- One-tailed and two-tailed tests
- Using sample variances in the test statistic - the t-test
- Computationally intensive methods for p-value estimation
- Parametric versus non-parametric hypothesis tests
- What we learned
- Confidence intervals
- What does a confidence interval really represent?
- Confidence intervals for any parameter
- A confidence interval code example
- What we learned
- Type I and Type II errors, and power
- What we learned
- Summary
- Exercises
- Notes and further reading
- Chapter 8: Model Complexity
- Technical requirements
- Generalization, overfitting, and the role of model complexity
- Overfitting
- Why overfitting is bad
- Overfitting increases the variability of predictions
- Underfitting is also a problem
- Measuring prediction error
- What we learned
- The bias-variance trade-off
- Proof of the bias-variance trade-off formula
- Double descent - a modern twist on the generalization error diagram
- What we learned
- Model complexity measures for model selection
- Selecting between classes of models
- Akaike Information Criterion
- Bayesian Information Criterion
- What we learned
- Summary
- Notes and further reading
- Chapter 9: Function Decomposition
- Technical requirements
- Why do we want to decompose a function?
- What is a decomposition of a function?
- Example 1 - decomposing a one-dimensional function into symmetric and anti-symmetric parts
- Example 2 - decomposing a time series into its seasonal and non-seasonal components
- What we've learned
- Expanding a function in terms of basis functions
- What we've learned
- Fourier series
- What we've learned
- Fourier transforms
- The multi-dimensional Fourier transform
- What we've learned
- The discrete Fourier transform
- DFT code example
- Uses of the DFT
- What is the difference between the DFT, Fourier series, and the Fourier transform?
- What we've learned
- Summary
- Exercises
- Chapter 10: Network Analysis
- Technical requirements
- Graphs and network data
- Network data is about relationships
- Example 1 - substituting goods in a supermarket
- Example 2 - international trade
- What is a graph?
- What we've learned
- Basic characteristics of graphs
- Undirected and directed edges
- The adjacency matrix
- In-degree and out-degree
- Centrality
- What we've learned
- Different types of graphs
- Fully connected graphs
- Disconnected graphs
- Directed acyclic graphs
- Small-world networks
- Scale-free networks
- What we've learned
- Community detection and decomposing graphs
- What is a community?
- How to do community detection
- Community detection algorithms
- Community detection code example
- What we've learned
- Summary
- Exercises
- Notes and further reading
- Part 3: Selected Advanced Concepts
- Chapter 11: Dynamical Systems
- Technical requirements
- What is a dynamical system and what is an evolution equation?
- Time can be discrete or continuous
- Time does not have to mean chronological time
- Evolution equations
- What we learned
- First-order discrete Markov processes
- Variations of first-order Markov processes
- A Markov process is a probabilistic model
- The transition probability matrix
- Properties of the transition probability matrix
- Epidemic modeling with a first-order discrete Markov process
- The transition probability matrix is a network
- Using the transition matrix to generate state trajectories
- Evolution of the state probability distribution
- Stationary distributions and limiting distributions
- First-order discrete Markov processes are memoryless
- Likelihood of the state sequence
- What we learned
- Higher-order discrete Markov processes
- Second-order discrete Markov processes
- Evolution of the state probability distribution in higher-order models
- A higher-order discrete Markov process is a first-order discrete Markov process in disguise
- Higher-order discrete Markov processes are still memoryless
- What we learned
- Hidden Markov Models
- Emission probabilities
- Making inferences with an HMM
- What we learned
- Summary
- Exercises
- Notes and further reading
- Chapter 12: Kernel Methods
- Technical requirements
- The role of inner products in common learning algorithms
- Sometimes we need new features in our inner products
- What we learned
- The kernel trick
- What is a kernel?
- Commonly used kernels
- Kernel functions for other mathematical objects
- Combining kernels
- Positive semi-definite kernels
- Mercer's theorem and the kernel trick
- Kernelized algorithms
- What we learned
- An example of a kernelized learning algorithm
- kFDA code example
- What we learned
- Summary
- Exercises
- Chapter 13: Information Theory
- Technical requirements
- What is information and why is it useful?
- The concept of information
- The mathematical definition of information
- Information theory applies to continuous distributions as well
- Why we measure information on a logarithmic scale
- Why is quantifying information useful?
- What we've learned
- Entropy as expected information
- Entropy
- What we've learned
- Mutual information
- Conditional entropy
- Mutual information for continuous variables
- Mutual information as a measure of correlation
- Mutual information code example
- What we've learned
- The Kullback-Leibler divergence
- Relative entropy
- KL-divergence for continuous variables
- Using the KL-divergence for approximation
- Variational inference
- What we've learned
- Summary
- Exercises
- Notes and further reading
- Chapter 14: Non-Parametric Bayesian Methods
- Technical requirements
- What are non-parametric Bayesian methods?
- We still have parameters
- The different types of non-parametric Bayesian methods
- The pros and cons of non-parametric Bayesian methods
- What we learned
- Gaussian processes
- The kernel function
- Fitting GPR models
- Prediction using GPR models
- GPR code example
- What we learned
- Dirichlet processes
- How do DPs differ from GPs?
- The DP notation
- Sampling a function from a DP
- Generating a sample of data from a DP
- Bayesian non-parametric inference using a DP
- What we learned
- Summary
- Exercises
- Chapter 15: Random Matrices
- Technical requirements
- What is a random matrix?
- What we learned
- Using random matrices to represent interactions in large-scale systems
- What we learned
- Universal behavior of large random matrices
- The Wigner semicircle law
- What does RMT study?
- Universal is universal
- The classical Gaussian matrix ensembles
- What we learned
- Random matrices and high-dimensional covariance matrices
- The Marcenko-Pastur distribution is a bulk distribution
- Universality in the singular values of X
- The Marcenko-Pastur distribution and neural networks
- What we learned
- Summary
- Exercises
- Notes and further reading
- Index
- About PACKT
- Other Books You May Enjoy
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., 'flowing' text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a 'hard' copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.
File format: ePUB
Copy protection: without DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use a reader that can handle the file format ePUB, such as Adobe Digital Editions or FBReader – both free (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., 'flowing' text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook does not use copy protection or Digital Rights Management.
For more information, see our eBook Help page.