
Advanced Machine Learning with R
Description
Master machine learning techniques with real-world projects that interface TensorFlow with R, H2O, MXNet, and other languages
Key Features:
- Gain expertise in machine learning, deep learning, and other techniques
- Build intelligent end-to-end projects for finance, social media, and a variety of domains
- Implement multi-class classification, regression, and clustering
Book Description:
R is one of the most popular languages when it comes to exploring the mathematical side of machine learning and easily performing computational statistics.
This Learning Path shows you how to leverage the R ecosystem to build efficient machine learning applications that carry out intelligent tasks within your organization. You'll tackle realistic projects such as building powerful machine learning models with ensembles to predict employee attrition. You'll explore different clustering techniques to segment customers using wholesale data and use TensorFlow and Keras-R for performing advanced computations. You'll also be introduced to reinforcement learning along with its various use cases and models. Additionally, it shows you how some of these black-box models can be diagnosed and understood.
By the end of this Learning Path, you'll be equipped with the skills you need to deploy machine learning techniques in your own projects.
This Learning Path includes content from the following Packt products:
- R Machine Learning Projects by Dr. Sunil Kumar Chinnamgari
- Mastering Machine Learning with R - Third Edition by Cory Lesmeister
What you will learn:
- Develop a joke recommendation engine to recommend jokes that match users' tastes
- Build autoencoders for credit card fraud detection
- Work with image recognition and convolutional neural networks
- Make predictions for casino slot machines using reinforcement learning
- Implement NLP techniques for sentiment analysis and customer segmentation
- Produce simple and effective data visualizations for improved insights
- Use NLP to extract insights from text
- Implement tree-based classifiers, including random forests and boosted trees
Who this book is for:
If you are a data analyst, data scientist, or machine learning developer, this is an ideal Learning Path for you. Each project will help you test your skills in implementing machine learning algorithms and techniques. A basic understanding of machine learning and a working knowledge of R programming are necessary to get the most out of this Learning Path.
Cory Lesmeister has over fourteen years of quantitative experience and is currently a senior data scientist for the advanced analytics team at Cummins, Inc. in Columbus, Indiana. He has spent 16 years at Eli Lilly and Company in sales, market research, Lean Six Sigma, marketing analytics, and new product forecasting. He also has several years of experience in the insurance and banking industries, both as a consultant and as a manager of marketing analytics. A former US Army active duty and reserve officer, Cory was stationed in Baghdad, Iraq, in 2009 serving as the strategic advisor to the 29,000-person Iraqi Oil Police, succeeding where others failed by acquiring and delivering promised equipment to help the country secure and protect its oil infrastructure. He has a BBA in aviation administration from the University of North Dakota and a commercial helicopter license.

Dr. Sunil Kumar Chinnamgari has a Ph.D. in computer science and specializes in machine learning and natural language processing. He is an AI researcher with more than 14 years of industry experience. Currently, he works as a lead data scientist with a US financial giant. He has published several research papers in Scopus and IEEE journals and is a frequent speaker at various meetups. He is an avid coder and has won multiple hackathons. In his spare time, Sunil likes to teach, travel, and spend time with family.

Content
- Cover
- Title Page
- Copyright and Credits
- About Packt
- Contributors
- Table of Contents
- Preface
- Chapter 1: Preparing and Understanding Data
  - Overview
  - Reading the data
  - Handling duplicate observations
  - Descriptive statistics
  - Exploring categorical variables
  - Handling missing values
  - Zero and near-zero variance features
  - Treating the data
  - Correlation and linearity
  - Summary
- Chapter 2: Linear Regression
  - Univariate linear regression
  - Building a univariate model
  - Reviewing model assumptions
  - Multivariate linear regression
  - Loading and preparing the data
  - Modeling and evaluation - stepwise regression
  - Modeling and evaluation - MARS
  - Reverse transformation of natural log predictions
  - Summary
- Chapter 3: Logistic Regression
  - Classification methods and linear regression
  - Logistic regression
  - Model training and evaluation
  - Training a logistic regression algorithm
  - Weight of evidence and information value
  - Feature selection
  - Cross-validation and logistic regression
  - Multivariate adaptive regression splines
  - Model comparison
  - Summary
- Chapter 4: Advanced Feature Selection in Linear Models
  - Regularization overview
  - Ridge regression
  - LASSO
  - Elastic net
  - Data creation
  - Modeling and evaluation
  - Ridge regression
  - LASSO
  - Elastic net
  - Summary
- Chapter 5: K-Nearest Neighbors and Support Vector Machines
  - K-nearest neighbors
  - Support vector machines
  - Manipulating data
  - Dataset creation
  - Data preparation
  - Modeling and evaluation
  - KNN modeling
  - Support vector machine
  - Summary
- Chapter 6: Tree-Based Classification
  - An overview of the techniques
  - Understanding a regression tree
  - Classification trees
  - Random forest
  - Gradient boosting
  - Datasets and modeling
  - Classification tree
  - Random forest
  - Extreme gradient boosting - classification
  - Feature selection with random forests
  - Summary
- Chapter 7: Neural Networks and Deep Learning
  - Introduction to neural networks
  - Deep learning - a not-so-deep overview
  - Deep learning resources and advanced methods
  - Creating a simple neural network
  - Data understanding and preparation
  - Modeling and evaluation
  - An example of deep learning
  - Keras and TensorFlow background
  - Loading the data
  - Creating the model function
  - Model training
  - Summary
- Chapter 8: Creating Ensembles and Multiclass Methods
  - Ensembles
  - Data understanding
  - Modeling and evaluation
  - Random forest model
  - Creating an ensemble
  - Summary
- Chapter 9: Cluster Analysis
  - Hierarchical clustering
  - Distance calculations
  - K-means clustering
  - Gower and PAM
  - Gower
  - PAM
  - Random forest
  - Dataset background
  - Data understanding and preparation
  - Modeling
  - Hierarchical clustering
  - K-means clustering
  - Gower and PAM
  - Random forest and PAM
  - Summary
- Chapter 10: Principal Component Analysis
  - An overview of the principal components
  - Rotation
  - Data
  - Data loading and review
  - Training and testing datasets
  - PCA modeling
  - Component extraction
  - Orthogonal rotation and interpretation
  - Creating scores from the components
  - Regression with MARS
  - Test data evaluation
  - Summary
- Chapter 11: Association Analysis
  - An overview of association analysis
  - Creating transactional data
  - Data understanding
  - Data preparation
  - Modeling and evaluation
  - Summary
- Chapter 12: Time Series and Causality
  - Univariate time series analysis
  - Understanding Granger causality
  - Time series data
  - Data exploration
  - Modeling and evaluation
  - Univariate time series forecasting
  - Examining the causality
  - Linear regression
  - Vector autoregression
  - Summary
- Chapter 13: Text Mining
  - Text mining framework and methods
  - Topic models
  - Other quantitative analysis
  - Data overview
  - Data frame creation
  - Word frequency
  - Word frequency in all addresses
  - Lincoln's word frequency
  - Sentiment analysis
  - N-grams
  - Topic models
  - Classifying text
  - Data preparation
  - LASSO model
  - Additional quantitative analysis
  - Summary
- Chapter 14: Exploring the Machine Learning Landscape
  - ML versus software engineering
  - Types of ML methods
  - Supervised learning
  - Unsupervised learning
  - Semi-supervised learning
  - Reinforcement learning
  - Transfer learning
  - ML terminology - a quick review
  - Deep learning
  - Big data
  - Natural language processing
  - Computer vision
  - Cost function
  - Model accuracy
  - Confusion matrix
  - Predictor variables
  - Response variable
  - Dimensionality reduction
  - Class imbalance problem
  - Model bias and variance
  - Underfitting and overfitting
  - Data preprocessing
  - Holdout sample
  - Hyperparameter tuning
  - Performance metrics
  - Feature engineering
  - Model interpretability
  - ML project pipeline
  - Business understanding
  - Understanding and sourcing the data
  - Preparing the data
  - Model building and evaluation
  - Model deployment
  - Learning paradigm
  - Datasets
  - Summary
- Chapter 15: Predicting Employee Attrition Using Ensemble Models
  - Philosophy behind ensembling
  - Getting started
  - Understanding the attrition problem and the dataset
  - K-nearest neighbors model for benchmarking the performance
  - Bagging
  - Bagged classification and regression trees (treeBag) implementation
  - Support vector machine bagging (SVMBag) implementation
  - Naive Bayes (nbBag) bagging implementation
  - Randomization with random forests
  - Implementing an attrition prediction model with random forests
  - Boosting
  - The GBM implementation
  - Building attrition prediction model with XGBoost
  - Stacking
  - Building attrition prediction model with stacking
  - Summary
- Chapter 16: Implementing a Jokes Recommendation Engine
  - Fundamental aspects of recommendation engines
  - Recommendation engine categories
  - Content-based filtering
  - Collaborative filtering
  - Hybrid filtering
  - Getting started
  - Understanding the Jokes recommendation problem and the dataset
  - Converting the DataFrame
  - Dividing the DataFrame
  - Building a recommendation system with an item-based collaborative filtering technique
  - Building a recommendation system with a user-based collaborative filtering technique
  - Building a recommendation system based on an association-rule mining technique
  - The Apriori algorithm
  - Content-based recommendation engine
  - Differentiating between ITCF and content-based recommendations
  - Building a hybrid recommendation system for Jokes recommendations
  - Summary
  - References
- Chapter 17: Sentiment Analysis of Amazon Reviews with NLP
  - The sentiment analysis problem
  - Getting started
  - Understanding the Amazon reviews dataset
  - Building a text sentiment classifier with the BoW approach
  - Pros and cons of the BoW approach
  - Understanding word embedding
  - Building a text sentiment classifier with pretrained word2vec word embedding based on Reuters news corpus
  - Building a text sentiment classifier with GloVe word embedding
  - Building a text sentiment classifier with fastText
  - Summary
- Chapter 18: Customer Segmentation Using Wholesale Data
  - Understanding customer segmentation
  - Understanding the wholesale customer dataset and the segmentation problem
  - Categories of clustering algorithms
  - Identifying the customer segments in wholesale customer data using k-means clustering
  - Working mechanics of the k-means algorithm
  - Identifying the customer segments in the wholesale customer data using DIANA
  - Identifying the customer segments in the wholesale customers data using AGNES
  - Summary
- Chapter 19: Image Recognition Using Deep Neural Networks
  - Technical requirements
  - Understanding computer vision
  - Achieving computer vision with deep learning
  - Convolutional Neural Networks
  - Layers of CNNs
  - Introduction to the MXNet framework
  - Understanding the MNIST dataset
  - Implementing a deep learning network for handwritten digit recognition
  - Implementing dropout to avoid overfitting
  - Implementing the LeNet architecture with the MXNet library
  - Implementing computer vision with pretrained models
  - Summary
- Chapter 20: Credit Card Fraud Detection Using Autoencoders
  - Machine learning in credit card fraud detection
  - Autoencoders explained
  - Types of AEs based on hidden layers
  - Types of AEs based on restrictions
  - Applications of AEs
  - The credit card fraud dataset
  - Building AEs with the H2O library in R
  - Autoencoder code implementation for credit card fraud detection
  - Summary
- Chapter 21: Automatic Prose Generation with Recurrent Neural Networks
  - Understanding language models
  - Exploring recurrent neural networks
  - Comparison of feedforward neural networks and RNNs
  - Backpropagation through time
  - Problems and solutions to gradients in RNN
  - Exploding gradients
  - Vanishing gradients
  - Building an automated prose generator with an RNN
  - Implementing the project
  - Summary
- Chapter 22: Winning the Casino Slot Machines with Reinforcement Learning
  - Understanding RL
  - Comparison of RL with other ML algorithms
  - Terminology of RL
  - The multi-arm bandit problem
  - Strategies for solving MABP
  - The epsilon-greedy algorithm
  - Boltzmann or softmax exploration
  - Decayed epsilon greedy
  - The upper confidence bound algorithm
  - Thompson sampling
  - Multi-arm bandit - real-world use cases
  - Solving the MABP with UCB and Thompson sampling algorithms
  - Summary
- Appendix: Creating a Package
- Other Books You May Enjoy
- Index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The ePub file format works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small display.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so you need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our ebook Help page.