
Data Analysis with R
Description
- Gain a deeper understanding of the fundamentals of applied statistics
- A practical guide to performing data analysis
Book Description
Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. With over 7,000 user-contributed packages, it's easy to find support for the latest and greatest algorithms and techniques.
Starting with the basics of R and statistical reasoning, Data Analysis with R dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties of performing data analysis in practice and find solutions to working with "messy data", large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone's career as a data analyst.
What you will learn
- Navigate the R environment
- Describe and visualize the behavior of data and relationships between data
- Gain a thorough understanding of statistical reasoning and sampling
- Employ hypothesis tests to draw inferences from your data
- Learn Bayesian methods for estimating parameters
- Perform regression to predict continuous variables
- Apply powerful classification methods to predict categorical data
- Handle missing data gracefully using multiple imputation
- Identify and manage problematic data points
- Employ parallelization and Rcpp to scale your analyses to larger data
- Put best practices into effect to make your job easier and facilitate reproducibility
Who this book is for
Whether you are learning data analysis for the first time or want to deepen the understanding you already have, this book will prove to be an invaluable resource. If you are looking for a book to take you all the way from the fundamentals to the application of advanced and effective analytics methodologies, and you have some prior programming experience and a mathematical background, then this is for you.

Content
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewer
- www.PacktPub.com
- Table of Contents
- Preface
- Chapter 1: RefresheR
- Navigating the basics
- Arithmetic and assignment
- Logicals and characters
- Flow of control
- Getting help in R
- Vectors
- Subsetting
- Vectorized functions
- Advanced subsetting
- Recycling
- Functions
- Matrices
- Loading data into R
- Working with packages
- Exercises
- Summary
- Chapter 2: The Shape of Data
- Univariate data
- Frequency distributions
- Central tendency
- Spread
- Populations, samples, and estimation
- Probability distributions
- Visualization methods
- Exercises
- Summary
- Chapter 3: Describing Relationships
- Multivariate data
- Relationships between a categorical and a continuous variable
- Relationships between two categorical variables
- The relationship between two continuous variables
- Covariance
- Correlation coefficients
- Comparing multiple correlations
- Visualization methods
- Categorical and continuous variables
- Two categorical variables
- Two continuous variables
- More than two continuous variables
- Exercises
- Summary
- Chapter 4: Probability
- Basic probability
- A tale of two interpretations
- Sampling from distributions
- Parameters
- The binomial distribution
- The normal distribution
- The three-sigma rule and using z-tables
- Exercises
- Summary
- Chapter 5: Using Data to Reason About the World
- Estimating means
- The sampling distribution
- Interval estimation
- How did we get 1.96?
- Smaller samples
- Exercises
- Summary
- Chapter 6: Testing Hypotheses
- Null Hypothesis Significance Testing
- One and two-tailed tests
- When things go wrong
- A warning about significance
- A warning about p-values
- Testing the mean of one sample
- Assumptions of the one sample t-test
- Testing two means
- Don't be fooled!
- Assumptions of the independent samples t-test
- Testing more than two means
- Assumptions of ANOVA
- Testing independence of proportions
- What if my assumptions are unfounded?
- Exercises
- Summary
- Chapter 7: Bayesian Methods
- The big idea behind Bayesian analysis
- Choosing a prior
- Who cares about coin flips
- Enter MCMC - stage left
- Using JAGS and runjags
- Fitting distributions the Bayesian way
- The Bayesian independent samples t-test
- Exercises
- Summary
- Chapter 8: Predicting Continuous Variables
- Linear models
- Simple linear regression
- Simple linear regression with a binary predictor
- A word of warning
- Multiple regression
- Regression with a non-binary predictor
- Kitchen sink regression
- The bias-variance trade-off
- Cross-validation
- Striking a balance
- Linear regression diagnostics
- Second Anscombe relationship
- Third Anscombe relationship
- Fourth Anscombe relationship
- Advanced topics
- Exercises
- Summary
- Chapter 9: Predicting Categorical Variables
- k-Nearest Neighbors
- Using k-NN in R
- Confusion matrices
- Limitations of k-NN
- Logistic regression
- Using logistic regression in R
- Decision trees
- Random forests
- Choosing a classifier
- The vertical decision boundary
- The diagonal decision boundary
- The crescent decision boundary
- The circular decision boundary
- Exercises
- Summary
- Chapter 10: Sources of Data
- Relational Databases
- Why didn't we just do that in SQL?
- Using JSON
- XML
- Other data formats
- Online repositories
- Exercises
- Summary
- Chapter 11: Dealing with Messy Data
- Analysis with missing data
- Visualizing missing data
- Types of missing data
- So which one is it?
- Unsophisticated methods for dealing with missing data
- Complete case analysis
- Pairwise deletion
- Mean substitution
- Hot deck imputation
- Regression imputation
- Stochastic regression imputation
- Multiple imputation
- So how does mice come up with the imputed values?
- Multiple imputation in practice
- Analysis with unsanitized data
- Checking for out-of-bounds data
- Checking the data type of a column
- Checking for unexpected categories
- Checking for outliers, entry errors, or unlikely data points
- Chaining assertions
- Other messiness
- OpenRefine
- Regular expressions
- tidyr
- Exercises
- Summary
- Chapter 12: Dealing with Large Data
- Wait to optimize
- Using a bigger and faster machine
- Be smart about your code
- Allocation of memory
- Vectorization
- Using optimized packages
- Using another R implementation
- Use parallelization
- Getting started with parallel R
- An example of (some) substance
- Using Rcpp
- Be smarter about your code
- Exercises
- Summary
- Chapter 13: Reproducibility and Best Practices
- R Scripting
- RStudio
- Running R scripts
- An example script
- Scripting and reproducibility
- R projects
- Version control
- Communicating results
- Exercises
- Summary
- Index
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe-DRM, a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.