
Mastering Predictive Analytics with R, Second Edition
Description
Person
James D. Miller is an IBM-certified expert, master consultant, and application/system architect with more than 35 years of application and system design and development experience across multiple platforms, technologies, and data formats, including big data. His experience includes IBM Planning Analytics, BI, web architecture and design, systems analysis, GUI design and testing, data modeling, and the design and development of OLAP, client/server, web, and mainframe applications and systems using Planning Analytics Workspace (PAW), IBM Watson Analytics, Cognos BI and TM1, Framework Manager, dynaSight/ArcPlan, ASP, DHTML, XML, MS Visual Basic, VBA, Perl, R, Splunk, MS SQL Server, Oracle, and others. He has authored numerous books, including Implementing Splunk - Second Edition; Mastering Splunk; Hands-On Machine Learning with IBM Watson; IBM Watson Projects; Statistics for Data Science; Mastering Predictive Analytics with R - Second Edition; and others. His project areas include data analytics, planning analytics, and FOPM projects, in roles ranging from architect and developer to technical and project leader.
Rui Miguel Forte is currently the chief data scientist at Workable. He was born and raised in Greece and studied in the UK. He is an experienced data scientist, having over 10 years of work experience in a diverse array of industries spanning mobile marketing, health informatics, education technology, and human resources technology. His projects have included predictive modeling of user behavior in mobile marketing promotions, speaker intent identification in an intelligent tutor, information extraction techniques for job applicant resumes and fraud detection for job scams. He currently teaches R, MongoDB, and other data science technologies to graduate students in the Business Analytics MSc program at the Athens University of Economics and Business. In addition, he has lectured in a number of seminars, specialization programs, and R schools for working data science professionals in Athens. His core programming knowledge is in R and Java, and he has extensive experience working with a variety of database technologies such as Oracle, PostgreSQL, MongoDB, and HBase. He holds a Master's degree in Electrical and Electronic Engineering from Imperial College London and is currently researching machine learning applications in information extraction and natural language processing.
Content
- Cover
- Copyright
- Credits
- About the Authors
- About the Reviewer
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Gearing Up for Predictive Modeling
- Models
- Learning from data
- The core components of a model
- Our first model - k-nearest neighbors
- Types of model
- Supervised, unsupervised, semi-supervised, and reinforcement learning models
- Parametric and nonparametric models
- Regression and classification models
- Real-time and batch machine learning models
- The process of predictive modeling
- Defining the model's objective
- Collecting the data
- Picking a model
- Pre-processing the data
- Exploratory data analysis
- Feature transformations
- Encoding categorical features
- Missing data
- Outliers
- Removing problematic features
- Feature engineering and dimensionality reduction
- Training and assessing the model
- Repeating with different models and final model selection
- Deploying the model
- Summary
- Chapter 2: Tidying Data and Measuring Performance
- Getting started
- Tidying data
- Categorizing data quality
- The first step
- The next step
- The final step
- Performance metrics
- Assessing regression models
- Assessing classification models
- Assessing binary classification models
- Cross-validation
- Learning curves
- Plot and ping
- Summary
- Chapter 3: Linear Regression
- Introduction to linear regression
- Assumptions of linear regression
- Simple linear regression
- Estimating the regression coefficients
- Multiple linear regression
- Predicting CPU performance
- Predicting the price of used cars
- Assessing linear regression models
- Residual analysis
- Significance tests for linear regression
- Performance metrics for linear regression
- Comparing different regression models
- Test set performance
- Problems with linear regression
- Multicollinearity
- Outliers
- Feature selection
- Regularization
- Ridge regression
- Least absolute shrinkage and selection operator (lasso)
- Implementing regularization in R
- Polynomial regression
- Summary
- Chapter 4: Generalized Linear Models
- Classifying with linear regression
- Introduction to logistic regression
- Generalized linear models
- Interpreting coefficients in logistic regression
- Assumptions of logistic regression
- Maximum likelihood estimation
- Predicting heart disease
- Assessing logistic regression models
- Model deviance
- Test set performance
- Regularization with the lasso
- Classification metrics
- Extensions of the binary logistic classifier
- Multinomial logistic regression
- Predicting glass type
- Ordinal logistic regression
- Predicting wine quality
- Poisson regression
- Negative Binomial regression
- Summary
- Chapter 5: Neural Networks
- The biological neuron
- The artificial neuron
- Stochastic gradient descent
- Gradient descent and local minima
- The perceptron algorithm
- Linear separation
- The logistic neuron
- Multilayer perceptron networks
- Training multilayer perceptron networks
- The back propagation algorithm
- Predicting the energy efficiency of buildings
- Evaluating multilayer perceptrons for regression
- Predicting glass type revisited
- Predicting handwritten digits
- Receiver operating characteristic curves
- Radial basis function networks
- Summary
- Chapter 6: Support Vector Machines
- Maximal margin classification
- Support vector classification
- Inner products
- Kernels and support vector machines
- Predicting chemical biodegradation
- Predicting credit scores
- Multiclass classification with support vector machines
- Summary
- Chapter 7: Tree-Based Methods
- The intuition for tree models
- Algorithms for training decision trees
- Classification and regression trees
- CART regression trees
- Tree pruning
- Missing data
- Regression model trees
- CART classification trees
- C5.0
- Predicting class membership on synthetic 2D data
- Predicting the authenticity of banknotes
- Predicting complex skill learning
- Tuning model parameters in CART trees
- Variable importance in tree models
- Regression model trees in action
- Improvements to the M5 model
- Summary
- Chapter 8: Dimensionality Reduction
- Defining DR
- Correlated data analyses
- Scatterplots
- Causation
- The degree of correlation
- Reporting on correlation
- Principal component analysis
- Using R to understand PCA
- Independent component analysis
- Defining independence
- ICA pre-processing
- Factor analysis
- Explore and confirm
- Using R for factor analysis
- The output
- NNMF
- Summary
- Chapter 9: Ensemble Methods
- Bagging
- Margins and out-of-bag observations
- Predicting complex skill learning with bagging
- Predicting heart disease with bagging
- Limitations of bagging
- Boosting
- AdaBoost
- AdaBoost for binary classification
- Predicting atmospheric gamma ray radiation
- Predicting complex skill learning with boosting
- Limitations of boosting
- The importance of variables in random forests
- XGBoost
- Summary
- Chapter 10: Probabilistic Graphical Models
- A little graph theory
- Bayes' theorem
- Conditional independence
- Bayesian networks
- The Naïve Bayes classifier
- Predicting the sentiment of movie reviews
- Predicting promoter gene sequences
- Predicting letter patterns in English words
- Summary
- Chapter 11: Topic Modeling
- An overview of topic modeling
- Latent Dirichlet Allocation
- The Dirichlet distribution
- The generative process
- Fitting an LDA model
- Modeling the topics of online news stories
- Model stability
- Finding the number of topics
- Topic distributions
- Word distributions
- LDA extensions
- Modeling tweet topics
- Word clouding
- Summary
- Chapter 12: Recommendation Systems
- Rating matrix
- Measuring user similarity
- Collaborative filtering
- User-based collaborative filtering
- Item-based collaborative filtering
- Singular value decomposition
- Predicting recommendations for movies and jokes
- Loading and pre-processing the data
- Exploring the data
- Evaluating binary top-N recommendations
- Evaluating non-binary top-N recommendations
- Evaluating individual predictions
- Other approaches to recommendation systems
- Summary
- Chapter 13: Scaling Up
- Starting the project
- Data definition
- Experience
- Data of scale - big data
- Using Excel to gauge your data
- Characteristics of big data
- Volume
- Varieties
- Sources and spans
- Structure
- Statistical noise
- Training models at scale
- Pain by phase
- Specific challenges
- Heterogeneity
- Scale
- Location
- Timeliness
- Privacy
- Collaborations
- Reproducibility
- A path forward
- Opportunities
- Bigger data, bigger hardware
- Breaking up
- Sampling
- Aggregation
- Dimensional reduction
- Alternatives
- Chunking
- Alternative language integrations
- Summary
- Chapter 14: Deep Learning
- Machine learning or deep learning
- What is deep learning?
- An alternative to manual instruction
- Growing importance
- Deeper data?
- Deep learning for IoT
- Use cases
- Word embedding
- Word prediction
- Word vectors
- Numerical representations of contextual similarities
- Netflix learns
- Implementations
- Deep learning architectures
- Artificial neural networks
- Recurrent neural networks
- Summary
- Index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The ePUB file format works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small display.
This eBook uses Adobe DRM, a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.
File format: PDF
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle only to a limited extent).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe DRM, a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.