Mastering Data Analysis with R

Packt Publishing Limited
  • 1st edition
  • published on 30 September 2015
  • 396 pages
E-book | ePUB with Adobe DRM | System requirements
978-1-78398-203-5 (ISBN)
Gain sharp insights into your data and solve real-world data science problems with R, from data munging to modeling and visualization

About This Book
  • Handle your data with precision and care for optimal business intelligence
  • Restructure and transform your data to inform decision-making
  • Packed with practical advice and tips to help you get to grips with data mining

Who This Book Is For
If you are a data scientist or R developer who wants to explore and optimize your use of R's advanced features and tools, this is the book for you. A basic knowledge of R is required, along with an understanding of database logic.

What You Will Learn
  • Connect to and load data from R's range of powerful databases
  • Successfully fetch and parse structured and unstructured data
  • Transform and restructure your data with efficient R packages
  • Define and build complex statistical models with glm
  • Develop and train machine learning algorithms
  • Visualize social networks and graph data
  • Deploy supervised and unsupervised classification algorithms
  • Discover how to visualize spatial data with R

In Detail
R is an essential language for sharp and successful data analysis. Its numerous features and ease of use make it a powerful way of mining, managing, and interpreting large sets of data. In a world where understanding big data has become key, by mastering R you will be able to deal with your data effectively and efficiently.

This book will give you the guidance you need to build and develop your knowledge and expertise. Bridging the gap between theory and practice, this book will help you to understand and use data for a competitive advantage.

Beginning with essential data mining and management tasks such as munging, fetching, cleaning, and restructuring, the book then explores different model designs and the core components of effective analysis. You will then discover how to optimize your use of machine learning algorithms for classification and recommendation systems alongside traditional and more recent statistical methods.

Style and Approach
Covering the essential tasks and skills within data science, Mastering Data Analysis provides you with solutions to the challenges of data science. Each section gives you a theoretical overview before demonstrating how to put the theory to work with real-world use cases and hands-on examples.
  • English
  • Birmingham
  • United Kingdom
978-1-78398-203-5 (9781783982035)
1783982039 (1783982039)
Gergely Daroczi is a former assistant professor of statistics and an enthusiastic R user and package developer. He is the founder and CTO of an R-based reporting web application and a PhD candidate in sociology. He is currently working as a lead R developer/research data scientist in Los Angeles.
Besides maintaining around half a dozen R packages, mainly dealing with reporting, Gergely has coauthored the books Introduction to R for Quantitative Finance and Mastering R for Quantitative Finance (both by Packt Publishing) by providing and reviewing the R source code. He has contributed to a number of scientific journal articles, mainly in social sciences but in medical sciences as well.
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • Table of Contents
  • Preface
  • Chapter 1: Hello, Data!
  • Loading text files of a reasonable size
  • Data files larger than the physical memory
  • Benchmarking text file parsers
  • Loading a subset of text files
  • Filtering flat files before loading to R
  • Loading data from databases
  • Setting up the test environment
  • MySQL and MariaDB
  • PostgreSQL
  • Oracle database
  • ODBC database access
  • Using a graphical user interface to connect to databases
  • Other database backends
  • Importing data from other statistical systems
  • Loading Excel spreadsheets
  • Summary
  • Chapter 2: Getting Data from the Web
  • Loading datasets from the Internet
  • Other popular online data formats
  • Reading data from HTML tables
  • Reading tabular data from static Web pages
  • Scraping data from other online sources
  • R packages to interact with data source APIs
  • Socrata Open Data API
  • Finance APIs
  • Fetching time series with Quandl
  • Google documents and analytics
  • Online search trends
  • Historical weather data
  • Other online data sources
  • Summary
  • Chapter 3: Filtering and Summarizing Data
  • Drop needless data
  • Drop needless data in an efficient way
  • Drop needless data in another efficient way
  • Aggregation
  • Quicker aggregation with base R commands
  • Convenient helper functions
  • High-performance helper functions
  • Aggregate with data.table
  • Running benchmarks
  • Summary functions
  • Adding up the number of cases in subgroups
  • Summary
  • Chapter 4: Restructuring Data
  • Transposing matrices
  • Filtering data by string matching
  • Rearranging data
  • dplyr versus data.table
  • Computing new variables
  • Memory profiling
  • Creating multiple variables at a time
  • Computing new variables with dplyr
  • Merging datasets
  • Reshaping data in a flexible way
  • Converting wide tables to the long table format
  • Converting long tables to the wide table format
  • Tweaking performance
  • The evolution of the reshape packages
  • Summary
  • Chapter 5: Building Models (authored by Renata Nemeth and Gergely Toth)
  • The motivation behind multivariate models
  • Linear regression with continuous predictors
  • Model interpretation
  • Multiple predictors
  • Model assumptions
  • How well does the line fit in the data?
  • Discrete predictors
  • Summary
  • Chapter 6: Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth)
  • The modeling workflow
  • Logistic regression
  • Data considerations
  • Goodness of model fit
  • Model comparison
  • Models for count data
  • Poisson regression
  • Negative binomial regression
  • Multivariate non-linear models
  • Summary
  • Chapter 7: Unstructured Data
  • Importing the corpus
  • Cleaning the corpus
  • Visualizing the most frequent words in the corpus
  • Further cleanup
  • Stemming words
  • Lemmatisation
  • Analyzing the associations among terms
  • Some other metrics
  • The segmentation of documents
  • Summary
  • Chapter 8: Polishing Data
  • The types and origins of missing data
  • Identifying missing data
  • By-passing missing values
  • Overriding the default arguments of a function
  • Getting rid of missing data
  • Filtering missing data before or during the actual analysis
  • Data imputation
  • Modeling missing values
  • Comparing different imputation methods
  • Not imputing missing values
  • Multiple imputation
  • Extreme values and outliers
  • Testing extreme values
  • Using robust methods
  • Summary
  • Chapter 9: From Big to Small Data
  • Adequacy tests
  • Normality
  • Multivariate normality
  • Dependence of variables
  • KMO and Bartlett's test
  • Principal Component Analysis
  • PCA algorithms
  • Determining the number of components
  • Interpreting components
  • Rotation methods
  • Outlier-detection with PCA
  • Factor analysis
  • Principal Component Analysis versus Factor Analysis
  • Multidimensional Scaling
  • Summary
  • Chapter 10: Classification and Clustering
  • Cluster analysis
  • Hierarchical clustering
  • Determining the ideal number of clusters
  • K-means clustering
  • Visualizing clusters
  • Latent class models
  • Latent Class Analysis
  • LCR models
  • Discriminant analysis
  • Logistic regression
  • Machine learning algorithms
  • The K-Nearest Neighbors algorithm
  • Classification trees
  • Random forest
  • Other algorithms
  • Summary
  • Chapter 11: Social Network Analysis of the R Ecosystem
  • Loading network data
  • Centrality measures of networks
  • Visualizing network data
  • Interactive network plots
  • Custom plot layouts
  • Analyzing R package dependencies with an R package
  • Further network analysis resources
  • Summary
  • Chapter 12: Analyzing Time-series
  • Creating time-series objects
  • Visualizing time-series
  • Seasonal decomposition
  • Holt-Winters filtering
  • Autoregressive Integrated Moving Average models
  • Outlier detection
  • More complex time-series objects
  • Advanced time-series analysis
  • Summary
  • Chapter 13: Data Around Us
  • Geocoding
  • Visualizing point data in space
  • Finding polygon overlays of point data
  • Plotting thematic maps
  • Rendering polygons around points
  • Contour lines
  • Voronoi diagrams
  • Satellite maps
  • Interactive maps
  • Querying Google Maps
  • JavaScript mapping libraries
  • Alternative map designs
  • Spatial statistics
  • Summary
  • Chapter 14: Analyzing the R Community
  • R Foundation members
  • Visualizing supporting members around the world
  • R package maintainers
  • The number of packages per maintainer
  • The R-help mailing list
  • Volume of the R-help mailing list
  • Forecasting the e-mail volume in the future
  • Analyzing overlaps between our lists of R users
  • Further ideas on extending the capture-recapture models
  • The number of R users in social media
  • R-related posts in social media
  • Summary
  • Appendix: References
  • Index

File format: EPUB
Copy protection: Adobe DRM (Digital Rights Management)


Computer (Windows; MacOS X; Linux): Install the free Adobe Digital Editions software before downloading (see E-Book Help).

Tablet/Smartphone (Android; iOS): Install the free Adobe Digital Editions app before downloading (see E-Book Help).

E-book readers: Bookeen, Kobo, Pocketbook, Sony, Tolino, and many more (not Kindle)

The EPUB file format is very well suited to novels and non-fiction, that is, to "flowing" text without a complex layout. On e-readers and smartphones, line and page breaks adapt automatically to the small displays. Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will unfortunately not be able to open the e-book. You must therefore prepare your reading hardware before downloading.

Further information is available in our E-Book Help.

Download (available immediately)

€43.65
incl. 19% VAT
Download / single license