
R: Recipes for Analysis, Visualization and Machine Learning
Persons
Yu-Wei, Chiu (David Chiu) is the founder of LargitData (www.LargitData.com), a startup company that mainly focuses on providing big data and machine learning products. He has previously worked for Trend Micro as a software engineer, where he was responsible for building big data platforms for business intelligence and customer relationship management systems. In addition to being a startup entrepreneur and data scientist, he specializes in using Spark and Hadoop to process big data and apply data mining techniques for data analysis. Yu-Wei is also a professional lecturer and has delivered lectures on big data and machine learning in R and Python, and given tech talks at a variety of conferences. In 2015, Yu-Wei wrote Machine Learning with R Cookbook, Packt Publishing. In 2013, Yu-Wei reviewed Bioinformatics with R Cookbook, Packt Publishing. For more information, please visit his personal website at www.ywchiu.com.
Acknowledgement: I have immense gratitude for my family and friends for supporting and encouraging me to complete this book. I would like to sincerely thank my mother, Ming-Yang Huang (Miranda Huang); my mentor, Man-Kwan Shan; the proofreader of this book, Brendan Fisher; members of LargitData; Data Science Program (DSP); and other friends who have offered their support.
Gohil Atmajitsinh :
Viswanathan Shanthi :
Viswanathan Viswa :
Viswa Viswanathan is an associate professor of Computing and Decision Sciences at the Stillman School of Business in Seton Hall University. After completing his PhD in Artificial Intelligence, Viswa spent a decade in academia and then switched to a leadership position in the software industry for another decade, during which he worked for Infosys, Igate, and Starbase. He embraced academia once again in 2001. Viswa has taught extensively in fields ranging from operations research, computer science, software engineering, and management information systems to enterprise systems. In addition to university teaching, Viswa has conducted training programs for industry professionals and has written several peer-reviewed research publications in journals such as Operations Research, IEEE Software, Computers and Industrial Engineering, and International Journal of Artificial Intelligence in Education. He has authored a book titled Data Analytics with R: A hands-on approach. Viswa thoroughly enjoys hands-on software development and has single-handedly conceived, architected, developed, and deployed several web-based applications. Apart from his deep interest in technical fields such as data analytics, artificial intelligence, computer science, and software engineering, Viswa harbors a deep interest in education with special emphasis on the roots of learning and methods to foster deeper learning. He has done research in this area and hopes to pursue the subject further. Viswa would like to express deep gratitude to professors Amitava Bagchi and Anup Sen, who were inspirational forces during his early research career. He is also grateful to several extremely intelligent colleagues, notable among them being Rajesh Venkatesh, Dan Richner, and Sriram Bala, who significantly shaped his thinking. His aunt, Analdavalli; his sister, Sankari; and his wife, Shanthi, taught him much about hard work, and even the little he has absorbed has helped him immensely. His sons, Nitin and Siddarth, have helped with numerous insightful comments on various topics.
Content
- Cover
- Copyright
- Credits
- Preface
- Table of Contents
- Module 1
- Chapter 1: A Simple Guide to R
- Installing packages and getting help in R
- Data types in R
- Special values in R
- Matrices in R
- Editing a matrix in R
- Data frames in R
- Editing a data frame in R
- Importing data in R
- Exporting data in R
- Writing a function in R
- Writing if else statements in R
- Basic loops in R
- Nested loops in R
- The apply, lapply, sapply, and tapply functions
- Using par to beautify a plot in R
- Saving plots
- Chapter 2: Practical Machine Learning with R
- Introduction
- Downloading and installing R
- Downloading and installing RStudio
- Installing and loading packages
- Reading and writing data
- Using R to manipulate data
- Applying basic statistics
- Visualizing data
- Getting a dataset for machine learning
- Chapter 3: Acquire and Prepare the Ingredients - Your Data
- Introduction
- Reading data from CSV files
- Reading XML data
- Reading JSON data
- Reading data from fixed-width formatted files
- Reading data from R files and R libraries
- Removing cases with missing values
- Replacing missing values with the mean
- Removing duplicate cases
- Rescaling a variable to [0,1]
- Normalizing or standardizing data in a data frame
- Binning numerical data
- Creating dummies for categorical variables
- Chapter 4: What's in There? - Exploratory Data Analysis
- Introduction
- Creating standard data summaries
- Extracting a subset of a dataset
- Splitting a dataset
- Creating random data partitions
- Generating standard plots such as histograms, boxplots, and scatterplots
- Generating multiple plots on a grid
- Selecting a graphics device
- Creating plots with the lattice package
- Creating plots with the ggplot2 package
- Creating charts that facilitate comparisons
- Creating charts that help visualize a possible causality
- Creating multivariate plots
- Chapter 5: Where Does It Belong? - Classification
- Introduction
- Generating error/classification-confusion matrices
- Generating ROC charts
- Building, plotting, and evaluating - classification trees
- Using random forest models for classification
- Classifying using Support Vector Machine
- Classifying using the Naïve Bayes approach
- Classifying using the KNN approach
- Using neural networks for classification
- Classifying using linear discriminant function analysis
- Classifying using logistic regression
- Using AdaBoost to combine classification tree models
- Chapter 6: Give Me a Number - Regression
- Introduction
- Computing the root mean squared error
- Building KNN models for regression
- Performing linear regression
- Performing variable selection in linear regression
- Building regression trees
- Building random forest models for regression
- Using neural networks for regression
- Performing k-fold cross-validation
- Performing leave-one-out-cross-validation to limit overfitting
- Chapter 7: Can You Simplify That? - Data Reduction Techniques
- Introduction
- Performing cluster analysis using K-means clustering
- Performing cluster analysis using hierarchical clustering
- Reducing dimensionality with principal component analysis
- Chapter 8: Lessons from History - Time Series Analysis
- Introduction
- Creating and examining date objects
- Operating on date objects
- Performing preliminary analyses on time series data
- Using time series objects
- Decomposing time series
- Filtering time series data
- Smoothing and forecasting using the Holt-Winters method
- Building an automated ARIMA model
- Chapter 9: It's All About Your Connections - Social Network Analysis
- Introduction
- Downloading social network data using public APIs
- Creating adjacency matrices and edge lists
- Plotting social network data
- Computing important network metrics
- Chapter 10: Put Your Best Foot Forward - Document and Present Your Analysis
- Introduction
- Generating reports of your data analysis with R Markdown and knitr
- Creating interactive web applications with shiny
- Creating PDF presentations of your analysis with R Presentation
- Chapter 11: Work Smarter, Not Harder - Efficient and Elegant R Code
- Introduction
- Exploiting vectorized operations
- Processing entire rows or columns using the apply function
- Applying a function to all elements of a collection with lapply and sapply
- Applying functions to subsets of a vector
- Using the split-apply-combine strategy with plyr
- Slicing, dicing, and combining data with data tables
- Chapter 12: Where in the World? - Geospatial Analysis
- Introduction
- Downloading and plotting a Google map of an area
- Overlaying data on the downloaded Google map
- Importing ESRI shape files into R
- Using the sp package to plot geographic data
- Getting maps from the maps package
- Creating spatial data frames from regular data frames containing spatial and other data
- Creating spatial data frames by combining regular data frames with spatial objects
- Adding variables to an existing spatial data frame
- Chapter 13: Playing Nice - Connecting to Other Systems
- Introduction
- Using Java objects in R
- Using JRI to call R functions from Java
- Using Rserve to call R functions from Java
- Executing R scripts from Java
- Using the xlsx package to connect to Excel
- Reading data from relational databases - MySQL
- Reading data from NoSQL databases - MongoDB
- Module 2
- Chapter 1: Basic and Interactive Plots
- Introduction
- Introducing a scatter plot
- Scatter plots with texts, labels, and lines
- Connecting points in a scatter plot
- Generating an interactive scatter plot
- A simple bar plot
- An interactive bar plot
- A simple line plot
- Line plot to tell an effective story
- Generating an interactive Gantt/timeline chart in R
- Merging histograms
- Making an interactive bubble plot
- Constructing a waterfall plot in R
- Chapter 2: Heat Maps and Dendrograms
- Introduction
- Constructing a simple dendrogram
- Creating dendrograms with colors and labels
- Creating a heat map
- Generating a heat map with customized colors
- Generating an integrated dendrogram and a heat map
- Creating a three-dimensional heat map and a stereo map
- Constructing a tree map in R
- Chapter 3: Maps
- Introduction
- Introducing regional maps
- Introducing choropleth maps
- A guide to contour maps
- Constructing maps with bubbles
- Integrating text with maps
- Introducing shapefiles
- Creating cartograms
- Chapter 4: The Pie Chart and Its Alternatives
- Introduction
- Generating a simple pie chart
- Constructing pie charts with labels
- Creating donut plots and interactive plots
- Generating a slope chart
- Constructing a fan plot
- Chapter 5: Adding the Third Dimension
- Introduction
- Constructing a 3D scatter plot
- Generating a 3D scatter plot with text
- A simple 3D pie chart
- A simple 3D histogram
- Generating a 3D contour plot
- Integrating a 3D contour and a surface plot
- Animating a 3D surface plot
- Chapter 6: Data in Higher Dimensions
- Introduction
- Constructing a sunflower plot
- Creating a hexbin plot
- Generating interactive calendar maps
- Creating Chernoff faces in R
- Constructing a coxcomb plot in R
- Constructing network plots
- Constructing a radial plot
- Generating a very basic pyramid plot
- Chapter 7: Visualizing Continuous Data
- Introduction
- Generating a candlestick plot
- Generating interactive candlestick plots
- Generating a decomposed time series
- Plotting a regression line
- Constructing a box and whiskers plot
- Generating a violin plot
- Generating a quantile-quantile plot (QQ plot)
- Generating a density plot
- Generating a simple correlation plot
- Chapter 8: Visualizing Text and XKCD-style Plots
- Introduction
- Generating a word cloud
- Constructing a word cloud from a document
- Generating a comparison cloud
- Constructing a correlation plot and a phrase tree
- Generating plots with custom fonts
- Generating an XKCD-style plot
- Chapter 9: Creating Applications in R
- Introduction
- Creating animated plots in R
- Creating a presentation in R
- A basic introduction to API and XML
- Constructing a bar plot using XML in R
- Creating a very simple shiny app in R
- Module 3
- Chapter 1: Data Exploration with RMS Titanic
- Introduction
- Reading a Titanic dataset from a CSV file
- Converting types on character variables
- Detecting missing values
- Imputing missing values
- Exploring and visualizing data
- Predicting passenger survival with a decision tree
- Validating the power of prediction with a confusion matrix
- Assessing performance with the ROC curve
- Chapter 2: R and Statistics
- Introduction
- Understanding data sampling in R
- Operating a probability distribution in R
- Working with univariate descriptive statistics in R
- Performing correlations and multivariate analysis
- Operating linear regression and multivariate analysis
- Conducting an exact binomial test
- Performing student's t-test
- Performing the Kolmogorov-Smirnov test
- Understanding the Wilcoxon Rank Sum and Signed Rank test
- Working with Pearson's Chi-squared test
- Conducting a one-way ANOVA
- Performing a two-way ANOVA
- Chapter 3: Understanding Regression Analysis
- Introduction
- Fitting a linear regression model with lm
- Summarizing linear model fits
- Using linear regression to predict unknown values
- Generating a diagnostic plot of a fitted model
- Fitting a polynomial regression model with lm
- Fitting a robust linear regression model with rlm
- Studying a case of linear regression on SLID data
- Applying the Gaussian model for generalized linear regression
- Applying the Poisson model for generalized linear regression
- Applying the Binomial model for generalized linear regression
- Fitting a generalized additive model to data
- Visualizing a generalized additive model
- Diagnosing a generalized additive model
- Chapter 4: Classification (I) - Tree, Lazy, and Probabilistic
- Introduction
- Preparing the training and testing datasets
- Building a classification model with recursive partitioning trees
- Visualizing a recursive partitioning tree
- Measuring the prediction performance of a recursive partitioning tree
- Pruning a recursive partitioning tree
- Building a classification model with a conditional inference tree
- Visualizing a conditional inference tree
- Measuring the prediction performance of a conditional inference tree
- Classifying data with the k-nearest neighbor classifier
- Classifying data with logistic regression
- Classifying data with the Naïve Bayes classifier
- Chapter 5: Classification (II) - Neural Network and SVM
- Introduction
- Classifying data with a support vector machine
- Choosing the cost of a support vector machine
- Visualizing an SVM fit
- Predicting labels based on a model trained by a support vector machine
- Tuning a support vector machine
- Training a neural network with neuralnet
- Visualizing a neural network trained by neuralnet
- Predicting labels based on a model trained by neuralnet
- Training a neural network with nnet
- Predicting labels based on a model trained by nnet
- Chapter 6: Model Evaluation
- Introduction
- Estimating model performance with k-fold cross-validation
- Performing cross-validation with the e1071 package
- Performing cross-validation with the caret package
- Ranking the variable importance with the caret package
- Ranking the variable importance with the rminer package
- Finding highly correlated features with the caret package
- Selecting features using the caret package
- Measuring the performance of the regression model
- Measuring prediction performance with a confusion matrix
- Measuring prediction performance using ROCR
- Comparing an ROC curve using the caret package
- Measuring performance differences between models with the caret package
- Chapter 7: Ensemble Learning
- Introduction
- Classifying data with the bagging method
- Performing cross-validation with the bagging method
- Classifying data with the boosting method
- Performing cross-validation with the boosting method
- Classifying data with gradient boosting
- Calculating the margins of a classifier
- Calculating the error evolution of the ensemble method
- Classifying data with random forest
- Estimating the prediction errors of different classifiers
- Chapter 8: Clustering
- Introduction
- Clustering data with hierarchical clustering
- Cutting trees into clusters
- Clustering data with the k-means method
- Drawing a bivariate cluster plot
- Comparing clustering methods
- Extracting silhouette information from clustering
- Obtaining the optimum number of clusters for k-means
- Clustering data with the density-based method
- Clustering data with the model-based method
- Visualizing a dissimilarity matrix
- Validating clusters externally
- Chapter 9: Association Analysis and Sequence Mining
- Introduction
- Transforming data into transactions
- Displaying transactions and associations
- Mining associations with the Apriori rule
- Pruning redundant rules
- Visualizing association rules
- Mining frequent itemsets with Eclat
- Creating transactions with temporal information
- Mining frequent sequential patterns with cSPADE
- Chapter 10: Dimension Reduction
- Introduction
- Performing feature selection with FSelector
- Performing dimension reduction with PCA
- Determining the number of principal components using the scree test
- Determining the number of principal components using the Kaiser method
- Visualizing multivariate data using biplot
- Performing dimension reduction with MDS
- Reducing dimensions with SVD
- Compressing images with SVD
- Performing nonlinear dimension reduction with ISOMAP
- Performing nonlinear dimension reduction with Local Linear Embedding
- Chapter 11: Big Data Analysis (R and Hadoop)
- Introduction
- Preparing the RHadoop environment
- Installing rmr2
- Installing rhdfs
- Operating HDFS with rhdfs
- Implementing a word count problem with RHadoop
- Comparing the performance between an R MapReduce program and a standard R program
- Testing and debugging the rmr2 program
- Installing plyrmr
- Manipulating data with plyrmr
- Conducting machine learning with RHadoop
- Configuring RHadoop clusters on Amazon EMR
- Appendix A: Resources for R and Machine Learning
- Appendix B: Dataset - Survival of Passengers on the Titanic
- Bibliography
- Index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The ePub file format works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small display.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.
File format: PDF
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.