R: Data Analysis and Visualization

Name: R: Data Analysis and Visualization
Brand: Packt Publishing
Price: 60.49 EUR
Availability: OnlineOnly

Tony Fischetti (Author)

Packt Publishing

Published on 24 June 2016

1783 pages

E-Book

PDF with Adobe-DRM

System requirements

978-1-78646-048-6 (ISBN)

€60.49 incl. 7% VAT

E-Book Single Licence

Available for download

Description

Master the art of building analytical models using R

About This Book
Load, wrangle, and analyze your data using the world's most powerful statistical programming language
Build and customize publication-quality visualizations of powerful and stunning R graphs
Develop key skills and techniques with R to create and customize data mining algorithms
Use R to optimize your trading strategy and build up your own risk management system
Discover how to build machine learning algorithms, prepare data, and dig deep into data prediction techniques with R

Who This Book Is For
This course is for data scientists and quantitative analysts who want to learn R and take advantage of its powerful analytical design framework. It's a seamless journey to becoming a full-stack R developer.

What You Will Learn
Describe and visualize the behavior of data and relationships between data
Gain a thorough understanding of statistical reasoning and sampling
Handle missing data gracefully using multiple imputation
Create diverse types of bar charts using the default R functions
Produce and customize density plots and histograms with lattice and ggplot2
Get to know the top classification algorithms written in R
Familiarize yourself with algorithms written in R for spatial data mining, text mining, and so on
Understand relationships between market factors and their impact on your portfolio
Harness the power of R to build machine learning algorithms with real-world data science applications
Learn specialized machine learning techniques for text mining, big data, and more

In Detail
The R learning path created for you has five connected modules, each of which is a mini-course in its own right. As you complete each one, you'll have gained key skills and be ready for the material in the next module.

This course begins with the Data Analysis with R module, which will help you navigate the R environment. You'll gain a thorough understanding of statistical reasoning and sampling. Finally, you'll be able to put best practices into effect to make your job easier and facilitate reproducibility.

The second module is R Graphs, which will help you leverage powerful default R graphics and utilize advanced graphics systems such as lattice and ggplot2, the grammar of graphics. By inspecting large datasets with tableplot and creating stunning 3D visualizations, you will learn how to produce, customize, and publish advanced visualizations using this popular and powerful framework.

With the third module, Learning Data Mining with R, you will learn how to manipulate data with R using code snippets and be introduced to mining frequent patterns, associations, and correlations while working with R programs. You will finish this module confident that you know which data mining algorithm to apply in any situation.

The Mastering R for Quantitative Finance module pragmatically introduces both the quantitative finance concepts and their modeling in R, enabling you to build a tailor-made trading system on your own. By the end of the module, you will be well-versed in various financial techniques using R and will be able to place good bets while making financial decisions.

Finally, we'll look at the Machine Learning with R module. With this module, you'll discover all the analytical tools you need to gain insights from complex data and learn how to choose the correct algorithm for your specific needs. You'll also learn to apply machine learning methods to common tasks, including classification, prediction, forecasting, market analysis, and clustering.

Style and approach
Learn data analysis, data visualization techniques, data mining, and machine learning, all using R, and learn to build models in quantitative finance using this powerful language.

Content

Cover
TOC
RefresheR
Navigating the basics
Getting help in R
Vectors
Functions
Matrices
Loading data into R
Working with packages
The Shape of Data
Univariate data
Frequency distributions
Central tendency
Spread
Populations, samples, and estimation
Probability distributions
Visualization methods
Describing Relationships
Multivariate data
Relationships between a categorical and a continuous variable
Relationships between two categorical variables
The relationship between two continuous variables
Visualization methods
Probability
Basic probability
A tale of two interpretations
Sampling from distributions
The normal distribution
Using Data to Reason About the World
Estimating means
The sampling distribution
Interval estimation
Smaller samples
Testing Hypotheses
Null Hypothesis Significance Testing
Testing the mean of one sample
Testing two means
Testing more than two means
Testing independence of proportions
What if my assumptions are unfounded?
Bayesian Methods
The big idea behind Bayesian analysis
Choosing a prior
Who cares about coin flips
Enter MCMC - stage left
Using JAGS and runjags
Fitting distributions the Bayesian way
The Bayesian independent samples t-test
Predicting Continuous Variables
Linear models
Simple linear regression
Simple linear regression with a binary predictor
Multiple regression
Regression with a non-binary predictor
Kitchen sink regression
The bias-variance trade-off
Linear regression diagnostics
Advanced topics
Exercises
Summary
Predicting Categorical Variables
k-Nearest Neighbors
Logistic regression
Decision trees
Random forests
Choosing a classifier
Sources of Data
Relational Databases
Using JSON
XML
Other data formats
Online repositories
Dealing with Messy Data
Analysis with missing data
Analysis with unsanitized data
Other messiness
Dealing with Large Data
Wait to optimize
Using a bigger and faster machine
Be smart about your code
Using optimized packages
Using another R implementation
Use parallelization
Using Rcpp
Be smarter about your code
Reproducibility and Best Practices
R Scripting
R projects
Version control
Communicating results
R Graphics
Base graphics using the default package
Trellis graphs using lattice
Graphs inspired by Grammar of Graphics
Basic Graph Functions
Introduction
Creating basic scatter plots
Creating line graphs
Creating bar charts
Creating histograms and density plots
Creating box plots
Adjusting x and y axes' limits
Creating heat maps
Creating pairs plots
Creating multiple plot matrix layouts
Adding and formatting legends
Creating graphs with maps
Saving and exporting graphs
Beyond the Basics - Adjusting Key Parameters
Introduction
Setting colors of points, lines, and bars
Setting plot background colors
Setting colors for text elements - axis annotations, labels, plot titles, and legends
Choosing color combinations and palettes
Setting fonts for annotations and titles
Choosing plotting point symbol styles and sizes
Choosing line styles and width
Choosing box styles
Adjusting axis annotations and tick marks
Formatting log axes
Setting graph margins and dimensions
Creating Scatter Plots
Introduction
Grouping data points within a scatter plot
Highlighting grouped data points by size and symbol type
Labeling data points
Correlation matrix using pairs plots
Adding error bars
Using jitter to distinguish closely packed data points
Adding linear model lines
Adding nonlinear model curves
Adding nonparametric model curves with lowess
Creating three-dimensional scatter plots
Creating Quantile-Quantile plots
Displaying the data density on axes
Creating scatter plots with a smoothed density representation
Creating Line Graphs and Time Series Charts
Introduction
Adding customized legends for multiple-line graphs
Using margin labels instead of legends for multiple-line graphs
Adding horizontal and vertical grid lines
Adding marker lines at specific x and y values using abline
Creating sparklines
Plotting functions of a variable in a dataset
Formatting time series data for plotting
Plotting the date or time variable on the x axis
Annotating axis labels in different human-readable time formats
Adding vertical markers to indicate specific time events
Plotting data with varying time-averaging periods
Creating stock charts
Creating Bar, Dot, and Pie Charts
Introduction
Creating bar charts with more than one factor variable
Creating stacked bar charts
Adjusting the orientation of bars - horizontal and vertical
Adjusting bar widths, spacing, colors, and borders
Displaying values on top of or next to the bars
Placing labels inside bars
Creating bar charts with vertical error bars
Modifying dot charts by grouping variables
Making better, readable pie charts with clockwise-ordered slices
Labeling a pie chart with percentage values for each slice
Adding a legend to a pie chart
Creating Histograms
Introduction
Visualizing distributions as count frequencies or probability densities
Setting the bin size and the number of breaks
Adjusting histogram styles - bar colors, borders, and axes
Overlaying a density line over a histogram
Multiple histograms along the diagonal of a pairs plot
Histograms in the margins of line and scatter plots
Box and Whisker Plots
Introduction
Creating box plots with narrow boxes for a small number of variables
Grouping over a variable
Varying box widths by the number of observations
Creating box plots with notches
Including or excluding outliers
Creating horizontal box plots
Changing the box styling
Adjusting the extent of plot whiskers outside the box
Showing the number of observations
Splitting a variable at arbitrary values into subsets
Creating Heat Maps and Contour Plots
Introduction
Creating heat maps of a single Z variable with a scale
Creating correlation heat maps
Summarizing multivariate data in a single heat map
Creating contour plots
Creating filled contour plots
Creating three-dimensional surface plots
Visualizing time series as calendar heat maps
Creating Maps
Introduction
Plotting global data by countries on a world map
Creating graphs with regional maps
Plotting data on Google maps
Creating and reading KML data
Working with ESRI shapefiles
Data Visualization Using Lattice
Introduction
Creating bar charts
Creating stacked bar charts
Creating bar charts to visualize cross-tabulation
Creating a conditional histogram
Visualizing distributions through a kernel-density plot
Creating a normal Q-Q plot
Visualizing an empirical Cumulative Distribution Function
Creating a boxplot
Creating a conditional scatter plot
Data Visualization Using ggplot2
Introduction
Creating bar charts
Creating multiple bar charts
Creating a bar chart with error bars
Visualizing the density of a numeric variable
Creating a box plot
Creating a layered plot with a scatter plot and fitted line
Creating a line chart
Graph annotation with ggplot
Inspecting Large Datasets
Introduction
Multivariate continuous data visualization
Multivariate categorical data visualization
Visualizing mixed data
Zooming and filtering
Three-dimensional Visualizations
Introduction
Three-dimensional scatter plots
Three-dimensional scatter plots with a regression plane
Three-dimensional bar charts
Three-dimensional density plots
Finalizing Graphs for Publications and Presentations
Introduction
Exporting graphs in high-resolution image formats - PNG, JPEG, BMP, and TIFF
Exporting graphs in vector formats - SVG, PDF, and PS
Adding mathematical and scientific notations (typesetting)
Adding text descriptions to graphs
Using graph templates
Choosing font families and styles under Windows, Mac OS X, and Linux
Choosing fonts for PostScripts and PDFs
Warming Up
Big data
Data source
Data mining
Social network mining
Text mining
Web data mining
Why R?
Statistics
Machine learning
Data attributes and description
Data cleaning
Data integration
Data dimension reduction
Data transformation and discretization
Visualization of results
Mining Frequent Patterns, Associations, and Correlations
An overview of associations and patterns
Market basket analysis
Hybrid association rules mining
Mining sequence dataset
The R implementation
High-performance algorithms
Classification
Classification
Generic decision tree induction
High-value credit card customers classification using ID3
Web spam detection using C4.5
Web key resource page judgment using CART
Trojan traffic identification method and Bayes classification
Identify spam e-mail and Naïve Bayes classification
Rule-based classification of player types in computer games and rule-based classification
Advanced Classification
Ensemble (EM) methods
Biological traits and the Bayesian belief network
Protein classification and the k-Nearest Neighbors algorithm
Document retrieval and Support Vector Machine
Classification using frequent patterns
Classification using the backpropagation algorithm
Cluster Analysis
Search engines and the k-means algorithm
Automatic abstraction of document texts and the k-medoids algorithm
The CLARA algorithm
CLARANS
Unsupervised image categorization and affinity propagation clustering
News categorization and hierarchical clustering
Advanced Cluster Analysis
Customer categorization analysis of e-commerce and DBSCAN
Clustering web pages and OPTICS
Visitor analysis in the browser cache and DENCLUE
Recommendation system and STING
Web sentiment analysis and CLIQUE
Opinion mining and WAVE clustering
User search intent and the EM algorithm
Customer purchase data analysis and clustering high-dimensional data
SNS and clustering graph and network data
Outlier Detection
Credit card fraud detection and statistical methods
Activity monitoring - the detection of fraud involving mobile phones and proximity-based methods
Intrusion detection and density-based methods
Intrusion detection and clustering-based methods
Monitoring the performance of the web server and classification-based methods
Detecting novelty in text, topic detection, and mining contextual outliers
Collective outliers on spatial data
Outlier detection in high-dimensional data
Mining Stream, Time-series, and Sequence Data
The credit card transaction flow and STREAM algorithm
Predicting future prices and time-series analysis
Stock market data and time-series clustering and classification
Web click streams and mining symbolic sequences
Mining sequence patterns in transactional databases
Graph Mining and Network Analysis
Graph mining
Mining frequent subgraph patterns
Social network mining
Mining Text and Web Data
Text mining and TM packages
Text summarization
The question answering system
Genre categorization of web pages
Categorizing newspaper articles and newswires into topics
Web usage mining with web logs
Time Series Analysis
Multivariate time series analysis
Volatility modeling
References and reading list
Factor Models
Arbitrage pricing theory
Modeling in R
References
Forecasting Volume
Motivation
The intensity of trading
The volume forecasting model
Implementation in R
References
Big Data - Advanced Analytics
Getting data from open sources
Introduction to big data analysis in R
K-means clustering on big data
Big data linear regression analysis
References
FX Derivatives
Terminology and notations
Currency options
Exchange options
Quanto options
References
Interest Rate Derivatives and Models
The Black model
The Vasicek model
The Cox-Ingersoll-Ross model
Parameter estimation of interest rate models
Using the SMFI5 package
References
Exotic Options
A general pricing approach
The role of dynamic hedging
How R can help a lot
A glance beyond vanillas
Greeks - the link back to the vanilla world
Pricing the Double-no-touch option
Another way to price the Double-no-touch option
The life of a Double-no-touch option - a simulation
Exotic options embedded in structured products
References
Optimal Hedging
Hedging of derivatives
Hedging in the presence of transaction costs
Further extensions
References
Fundamental Analysis
The basics of fundamental analysis
Collecting data
Revealing connections
Including multiple variables
Separating investment targets
Setting classification rules
Backtesting
Industry-specific investment
References
Technical Analysis, Neural Networks, and Logoptimal Portfolios
Market efficiency
Technical analysis
Neural networks
Logoptimal portfolios
References
Asset and Liability Management
Data preparation
Interest rate risk measurement
Liquidity risk measurement
Modeling non-maturity deposits
References
Capital Adequacy
Principles of the Basel Accords
Risk measures
Risk categories
References
Systemic Risks
Systemic risk in a nutshell
The dataset used in our examples
Core-periphery decomposition
The simulation method
Possible interpretations and suggestions
References
Introducing Machine Learning
The origins of machine learning
Uses and abuses of machine learning
How machines learn
Machine learning in practice
Machine learning with R
Managing and Understanding Data
R data structures
Managing data with R
Exploring and understanding data
Lazy Learning - Classification Using Nearest Neighbors
Understanding nearest neighbor classification
Example - diagnosing breast cancer with the k-NN algorithm
Probabilistic Learning - Classification Using Naive Bayes
Understanding Naive Bayes
Example - filtering mobile phone spam with the Naive Bayes algorithm
Divide and Conquer - Classification Using Decision Trees and Rules
Understanding decision trees
Example - identifying risky bank loans using C5.0 decision trees
Understanding classification rules
Example - identifying poisonous mushrooms with rule learners
Forecasting Numeric Data - Regression Methods
Understanding regression
Example - predicting medical expenses using linear regression
Understanding regression trees and model trees
Example - estimating the quality of wines with regression trees and model trees
Black Box Methods - Neural Networks and Support Vector Machines
Understanding neural networks
Example - Modeling the strength of concrete with ANNs
Understanding Support Vector Machines
Example - performing OCR with SVMs
Finding Patterns - Market Basket Analysis Using Association Rules
Understanding association rules
Example - identifying frequently purchased groceries with association rules
Finding Groups of Data - Clustering with k-means
Understanding clustering
Example - finding teen market segments using k-means clustering
Evaluating Model Performance
Measuring performance for classification
Estimating future performance
Improving Model Performance
Tuning stock models for better performance
Improving model performance with meta-learning
Specialized Machine Learning Topics
Working with proprietary files and databases
Working with online data and services
Working with domain-specific data
Improving the performance of R
Reflect and Test Yourself Answers
Module 1: Data Analysis with R
Module 2: R Graphs
Module 4: Mastering R for Quantitative Finance
Module 5: Machine Learning with R
Bibliography
