Mastering Predictive Analytics with R, Second Edition

Name: Mastering Predictive Analytics with R, Second Edition | Machine learning techniques for advanced models
Brand: Packt Publishing Limited
Availability: Online only

Machine learning techniques for advanced models

James D. Miller (Author)

Packt Publishing Limited

2nd Edition

Published on 8 July 2025

448 pages

E-Book

ePUB with Adobe-DRM

E-Book

PDF with Adobe-DRM

978-1-78712-435-6 (ISBN)

from €45.59

Available for download

Person

James D. Miller:
James D. Miller is an IBM certified expert, Master Consultant, and Application/System Architect with more than 35 years of application and system design and development experience across multiple platforms, technologies, and data formats, including big data. His experience includes IBM Planning Analytics, BI, web architecture and design, systems analysis, GUI design and testing, data modeling, and the design and development of OLAP, client/server, web, and mainframe applications and systems using Planning Analytics Workspace (PAW), IBM Watson Analytics, Cognos BI and TM1, Framework Manager, dynaSight/ArcPlan, ASP, DHTML, XML, MS Visual Basic, VBA, Perl, R, Splunk, MS SQL Server, Oracle, and others. He has authored numerous books, including Implementing Splunk - Second Edition; Mastering Splunk; Hands-On Machine Learning with IBM Watson; IBM Watson Projects; Statistics for Data Science; Mastering Predictive Analytics with R - Second Edition; and others. His project areas include data analytics, planning analytics, and FOPM, in roles ranging from architect and developer to technical and project leader.
Rui Miguel Forte:
Rui Miguel Forte is currently the chief data scientist at Workable. He was born and raised in Greece and studied in the UK. He is an experienced data scientist with over 10 years of work experience across a diverse array of industries, spanning mobile marketing, health informatics, education technology, and human resources technology. His projects have included predictive modeling of user behavior in mobile marketing promotions, speaker intent identification in an intelligent tutor, information extraction techniques for job applicant resumes, and fraud detection for job scams. He currently teaches R, MongoDB, and other data science technologies to graduate students in the Business Analytics MSc program at the Athens University of Economics and Business. In addition, he has lectured at a number of seminars, specialization programs, and R schools for working data science professionals in Athens. His core programming knowledge is in R and Java, and he has extensive experience with a variety of database technologies such as Oracle, PostgreSQL, MongoDB, and HBase. He holds a Master's degree in Electrical and Electronic Engineering from Imperial College London and is currently researching machine learning applications in information extraction and natural language processing.

Content

Cover
Copyright
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Gearing Up for Predictive Modeling
Models
Learning from data
The core components of a model
Our first model - k-nearest neighbors
Types of model
Supervised, unsupervised, semi-supervised, and reinforcement learning models
Parametric and nonparametric models
Regression and classification models
Real-time and batch machine learning models
The process of predictive modeling
Defining the model's objective
Collecting the data
Picking a model
Pre-processing the data
Exploratory data analysis
Feature transformations
Encoding categorical features
Missing data
Outliers
Removing problematic features
Feature engineering and dimensionality reduction
Training and assessing the model
Repeating with different models and final model selection
Deploying the model
Summary
Chapter 2: Tidying Data and Measuring Performance
Getting started
Tidying data
Categorizing data quality
The first step
The next step
The final step
Performance metrics
Assessing regression models
Assessing classification models
Assessing binary classification models
Cross-validation
Learning curves
Plot and ping
Summary
Chapter 3: Linear Regression
Introduction to linear regression
Assumptions of linear regression
Simple linear regression
Estimating the regression coefficients
Multiple linear regression
Predicting CPU performance
Predicting the price of used cars
Assessing linear regression models
Residual analysis
Significance tests for linear regression
Performance metrics for linear regression
Comparing different regression models
Test set performance
Problems with linear regression
Multicollinearity
Outliers
Feature selection
Regularization
Ridge regression
Least absolute shrinkage and selection operator (lasso)
Implementing regularization in R
Polynomial regression
Summary
Chapter 4: Generalized Linear Models
Classifying with linear regression
Introduction to logistic regression
Generalized linear models
Interpreting coefficients in logistic regression
Assumptions of logistic regression
Maximum likelihood estimation
Predicting heart disease
Assessing logistic regression models
Model deviance
Test set performance
Regularization with the lasso
Classification metrics
Extensions of the binary logistic classifier
Multinomial logistic regression
Predicting glass type
Ordinal logistic regression
Predicting wine quality
Poisson regression
Negative Binomial regression
Summary
Chapter 5: Neural Networks
The biological neuron
The artificial neuron
Stochastic gradient descent
Gradient descent and local minima
The perceptron algorithm
Linear separation
The logistic neuron
Multilayer perceptron networks
Training multilayer perceptron networks
The back propagation algorithm
Predicting the energy efficiency of buildings
Evaluating multilayer perceptrons for regression
Predicting glass type revisited
Predicting handwritten digits
Receiver operating characteristic curves
Radial basis function networks
Summary
Chapter 6: Support Vector Machines
Maximal margin classification
Support vector classification
Inner products
Kernels and support vector machines
Predicting chemical biodegradation
Predicting credit scores
Multiclass classification with support vector machines
Summary
Chapter 7: Tree-Based Methods
The intuition for tree models
Algorithms for training decision trees
Classification and regression trees
CART regression trees
Tree pruning
Missing data
Regression model trees
CART classification trees
C5.0
Predicting class membership on synthetic 2D data
Predicting the authenticity of banknotes
Predicting complex skill learning
Tuning model parameters in CART trees
Variable importance in tree models
Regression model trees in action
Improvements to the M5 model
Summary
Chapter 8: Dimensionality Reduction
Defining DR
Correlated data analyses
Scatterplots
Causation
The degree of correlation
Reporting on correlation
Principal component analysis
Using R to understand PCA
Independent component analysis
Defining independence
ICA pre-processing
Factor analysis
Explore and confirm
Using R for factor analysis
The output
NNMF
Summary
Chapter 9: Ensemble Methods
Bagging
Margins and out-of-bag observations
Predicting complex skill learning with bagging
Predicting heart disease with bagging
Limitations of bagging
Boosting
AdaBoost
AdaBoost for binary classification
Predicting atmospheric gamma ray radiation
Predicting complex skill learning with boosting
Limitations of boosting
The importance of variables in random forests
XGBoost
Summary
Chapter 10: Probabilistic Graphical Models
A little graph theory
Bayes' theorem
Conditional independence
Bayesian networks
The Naïve Bayes classifier
Predicting the sentiment of movie reviews
Predicting promoter gene sequences
Predicting letter patterns in English words
Summary
Chapter 11: Topic Modeling
An overview of topic modeling
Latent Dirichlet Allocation
The Dirichlet distribution
The generative process
Fitting an LDA model
Modeling the topics of online news stories
Model stability
Finding the number of topics
Topic distributions
Word distributions
LDA extensions
Modeling tweet topics
Word clouding
Summary
Chapter 12: Recommendation Systems
Rating matrix
Measuring user similarity
Collaborative filtering
User-based collaborative filtering
Item-based collaborative filtering
Singular value decomposition
Predicting recommendations for movies and jokes
Loading and pre-processing the data
Exploring the data
Evaluating binary top-N recommendations
Evaluating non-binary top-N recommendations
Evaluating individual predictions
Other approaches to recommendation systems
Summary
Chapter 13: Scaling Up
Starting the project
Data definition
Experience
Data of scale - big data
Using Excel to gauge your data
Characteristics of big data
Volume
Varieties
Sources and spans
Structure
Statistical noise
Training models at scale
Pain by phase
Specific challenges
Heterogeneity
Scale
Location
Timeliness
Privacy
Collaborations
Reproducibility
A path forward
Opportunities
Bigger data, bigger hardware
Breaking up
Sampling
Aggregation
Dimensional reduction
Alternatives
Chunking
Alternative language integrations
Summary
Chapter 14: Deep Learning
Machine learning or deep learning
What is deep learning?
An alternative to manual instruction
Growing importance
Deeper data?
Deep learning for IoT
Use cases
Word embedding
Word prediction
Word vectors
Numerical representations of contextual similarities
Netflix learns
Implementations
Deep learning architectures
Artificial neural networks
Recurrent neural networks
Summary
Index

System requirements

File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)

System requirements:

Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).

The ePub file format works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks adjust automatically to fit the small display.
This eBook uses Adobe-DRM, a "hard" form of copy protection. If the requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.

Please note: We strongly recommend that you authorise your reading software with your personal Adobe ID after installing it.

For more information, see our eBook Help page.

File format: PDF
Copy protection: Adobe-DRM (Digital Rights Management)

System requirements:

Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle only with limited support).

The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe-DRM, a "hard" form of copy protection. If the requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.

Please note: We strongly recommend that you authorise your reading software with your personal Adobe ID after installing it.

For more information, see our eBook Help page.