
R: Data Analysis and Visualization
Description
All about e-books | Answers to questions about e-books, copy protection, and file formats can be found in our Info & Help section.
Content
- Cover
- TOC
- RefresheR
- Navigating the basics
- Getting help in R
- Vectors
- Functions
- Matrices
- Loading data into R
- Working with packages
- The Shape of Data
- Univariate data
- Frequency distributions
- Central tendency
- Spread
- Populations, samples, and estimation
- Probability distributions
- Visualization methods
- Describing Relationships
- Multivariate data
- Relationships between a categorical and a continuous variable
- Relationships between two categorical variables
- The relationship between two continuous variables
- Visualization methods
- Probability
- Basic probability
- A tale of two interpretations
- Sampling from distributions
- The normal distribution
- Using Data to Reason About the World
- Estimating means
- The sampling distribution
- Interval estimation
- Smaller samples
- Testing Hypotheses
- Null Hypothesis Significance Testing
- Testing the mean of one sample
- Testing two means
- Testing more than two means
- Testing independence of proportions
- What if my assumptions are unfounded?
- Bayesian Methods
- The big idea behind Bayesian analysis
- Choosing a prior
- Who cares about coin flips
- Enter MCMC - stage left
- Using JAGS and runjags
- Fitting distributions the Bayesian way
- The Bayesian independent samples t-test
- Predicting Continuous Variables
- Linear models
- Simple linear regression
- Simple linear regression with a binary predictor
- Multiple regression
- Regression with a non-binary predictor
- Kitchen sink regression
- The bias-variance trade-off
- Linear regression diagnostics
- Advanced topics
- Exercises
- Summary
- Predicting Categorical Variables
- k-Nearest Neighbors
- Logistic regression
- Decision trees
- Random forests
- Choosing a classifier
- Sources of Data
- Relational Databases
- Using JSON
- XML
- Other data formats
- Online repositories
- Dealing with Messy Data
- Analysis with missing data
- Analysis with unsanitized data
- Other messiness
- Dealing with Large Data
- Wait to optimize
- Using a bigger and faster machine
- Be smart about your code
- Using optimized packages
- Using another R implementation
- Use parallelization
- Using Rcpp
- Be smarter about your code
- Reproducibility and Best Practices
- R Scripting
- R projects
- Version control
- Communicating results
- R Graphics
- Base graphics using the default package
- Trellis graphs using lattice
- Graphs inspired by Grammar of Graphics
- Basic Graph Functions
- Introduction
- Creating basic scatter plots
- Creating line graphs
- Creating bar charts
- Creating histograms and density plots
- Creating box plots
- Adjusting x and y axes' limits
- Creating heat maps
- Creating pairs plots
- Creating multiple plot matrix layouts
- Adding and formatting legends
- Creating graphs with maps
- Saving and exporting graphs
- Beyond the Basics - Adjusting Key Parameters
- Introduction
- Setting colors of points, lines, and bars
- Setting plot background colors
- Setting colors for text elements - axis annotations, labels, plot titles, and legends
- Choosing color combinations and palettes
- Setting fonts for annotations and titles
- Choosing plotting point symbol styles and sizes
- Choosing line styles and width
- Choosing box styles
- Adjusting axis annotations and tick marks
- Formatting log axes
- Setting graph margins and dimensions
- Creating Scatter Plots
- Introduction
- Grouping data points within a scatter plot
- Highlighting grouped data points by size and symbol type
- Labeling data points
- Correlation matrix using pairs plots
- Adding error bars
- Using jitter to distinguish closely packed data points
- Adding linear model lines
- Adding nonlinear model curves
- Adding nonparametric model curves with lowess
- Creating three-dimensional scatter plots
- Creating Quantile-Quantile plots
- Displaying the data density on axes
- Creating scatter plots with a smoothed density representation
- Creating Line Graphs and Time Series Charts
- Introduction
- Adding customized legends for multiple-line graphs
- Using margin labels instead of legends for multiple-line graphs
- Adding horizontal and vertical grid lines
- Adding marker lines at specific x and y values using abline
- Creating sparklines
- Plotting functions of a variable in a dataset
- Formatting time series data for plotting
- Plotting the date or time variable on the x axis
- Annotating axis labels in different human-readable time formats
- Adding vertical markers to indicate specific time events
- Plotting data with varying time-averaging periods
- Creating stock charts
- Creating Bar, Dot, and Pie Charts
- Introduction
- Creating bar charts with more than one factor variable
- Creating stacked bar charts
- Adjusting the orientation of bars - horizontal and vertical
- Adjusting bar widths, spacing, colors, and borders
- Displaying values on top of or next to the bars
- Placing labels inside bars
- Creating bar charts with vertical error bars
- Modifying dot charts by grouping variables
- Making better, readable pie charts with clockwise-ordered slices
- Labeling a pie chart with percentage values for each slice
- Adding a legend to a pie chart
- Creating Histograms
- Introduction
- Visualizing distributions as count frequencies or probability densities
- Setting the bin size and the number of breaks
- Adjusting histogram styles - bar colors, borders, and axes
- Overlaying a density line over a histogram
- Multiple histograms along the diagonal of a pairs plot
- Histograms in the margins of line and scatter plots
- Box and Whisker Plots
- Introduction
- Creating box plots with narrow boxes for a small number of variables
- Grouping over a variable
- Varying box widths by the number of observations
- Creating box plots with notches
- Including or excluding outliers
- Creating horizontal box plots
- Changing the box styling
- Adjusting the extent of plot whiskers outside the box
- Showing the number of observations
- Splitting a variable at arbitrary values into subsets
- Creating Heat Maps and Contour Plots
- Introduction
- Creating heat maps of a single Z variable with a scale
- Creating correlation heat maps
- Summarizing multivariate data in a single heat map
- Creating contour plots
- Creating filled contour plots
- Creating three-dimensional surface plots
- Visualizing time series as calendar heat maps
- Creating Maps
- Introduction
- Plotting global data by countries on a world map
- Creating graphs with regional maps
- Plotting data on Google maps
- Creating and reading KML data
- Working with ESRI shapefiles
- Data Visualization Using Lattice
- Introduction
- Creating bar charts
- Creating stacked bar charts
- Creating bar charts to visualize cross-tabulation
- Creating a conditional histogram
- Visualizing distributions through a kernel-density plot
- Creating a normal Q-Q plot
- Visualizing an empirical Cumulative Distribution Function
- Creating a boxplot
- Creating a conditional scatter plot
- Data Visualization Using ggplot2
- Introduction
- Creating bar charts
- Creating multiple bar charts
- Creating a bar chart with error bars
- Visualizing the density of a numeric variable
- Creating a box plot
- Creating a layered plot with a scatter plot and fitted line
- Creating a line chart
- Graph annotation with ggplot
- Inspecting Large Datasets
- Introduction
- Multivariate continuous data visualization
- Multivariate categorical data visualization
- Visualizing mixed data
- Zooming and filtering
- Three-dimensional Visualizations
- Introduction
- Three-dimensional scatter plots
- Three-dimensional scatter plots with a regression plane
- Three-dimensional bar charts
- Three-dimensional density plots
- Finalizing Graphs for Publications and Presentations
- Introduction
- Exporting graphs in high-resolution image formats - PNG, JPEG, BMP, and TIFF
- Exporting graphs in vector formats - SVG, PDF, and PS
- Adding mathematical and scientific notations (typesetting)
- Adding text descriptions to graphs
- Using graph templates
- Choosing font families and styles under Windows, Mac OS X, and Linux
- Choosing fonts for PostScripts and PDFs
- Warming Up
- Big data
- Data source
- Data mining
- Social network mining
- Text mining
- Web data mining
- Why R?
- Statistics
- Machine learning
- Data attributes and description
- Data cleaning
- Data integration
- Data dimension reduction
- Data transformation and discretization
- Visualization of results
- Mining Frequent Patterns, Associations, and Correlations
- An overview of associations and patterns
- Market basket analysis
- Hybrid association rules mining
- Mining sequence dataset
- The R implementation
- High-performance algorithms
- Classification
- Classification
- Generic decision tree induction
- High-value credit card customers classification using ID3
- Web spam detection using C4.5
- Web key resource page judgment using CART
- Trojan traffic identification method and Bayes classification
- Identify spam e-mail and Naïve Bayes classification
- Rule-based classification of player types in computer games and rule-based classification
- Advanced Classification
- Ensemble (EM) methods
- Biological traits and the Bayesian belief network
- Protein classification and the k-Nearest Neighbors algorithm
- Document retrieval and Support Vector Machine
- Classification using frequent patterns
- Classification using the backpropagation algorithm
- Cluster Analysis
- Search engines and the k-means algorithm
- Automatic abstraction of document texts and the k-medoids algorithm
- The CLARA algorithm
- CLARANS
- Unsupervised image categorization and affinity propagation clustering
- News categorization and hierarchical clustering
- Advanced Cluster Analysis
- Customer categorization analysis of e-commerce and DBSCAN
- Clustering web pages and OPTICS
- Visitor analysis in the browser cache and DENCLUE
- Recommendation system and STING
- Web sentiment analysis and CLIQUE
- Opinion mining and WAVE clustering
- User search intent and the EM algorithm
- Customer purchase data analysis and clustering high-dimensional data
- SNS and clustering graph and network data
- Outlier Detection
- Credit card fraud detection and statistical methods
- Activity monitoring - the detection of fraud involving mobile phones and proximity-based methods
- Intrusion detection and density-based methods
- Intrusion detection and clustering-based methods
- Monitoring the performance of the web server and classification-based methods
- Detecting novelty in text, topic detection, and mining contextual outliers
- Collective outliers on spatial data
- Outlier detection in high-dimensional data
- Mining Stream, Time-series, and Sequence Data
- The credit card transaction flow and STREAM algorithm
- Predicting future prices and time-series analysis
- Stock market data and time-series clustering and classification
- Web click streams and mining symbolic sequences
- Mining sequence patterns in transactional databases
- Graph Mining and Network Analysis
- Graph mining
- Mining frequent subgraph patterns
- Social network mining
- Mining Text and Web Data
- Text mining and TM packages
- Text summarization
- The question answering system
- Genre categorization of web pages
- Categorizing newspaper articles and newswires into topics
- Web usage mining with web logs
- Time Series Analysis
- Multivariate time series analysis
- Volatility modeling
- References and reading list
- Factor Models
- Arbitrage pricing theory
- Modeling in R
- References
- Forecasting Volume
- Motivation
- The intensity of trading
- The volume forecasting model
- Implementation in R
- References
- Big Data - Advanced Analytics
- Getting data from open sources
- Introduction to big data analysis in R
- K-means clustering on big data
- Big data linear regression analysis
- References
- FX Derivatives
- Terminology and notations
- Currency options
- Exchange options
- Quanto options
- References
- Interest Rate Derivatives and Models
- The Black model
- The Vasicek model
- The Cox-Ingersoll-Ross model
- Parameter estimation of interest rate models
- Using the SMFI5 package
- References
- Exotic Options
- A general pricing approach
- The role of dynamic hedging
- How R can help a lot
- A glance beyond vanillas
- Greeks - the link back to the vanilla world
- Pricing the Double-no-touch option
- Another way to price the Double-no-touch option
- The life of a Double-no-touch option - a simulation
- Exotic options embedded in structured products
- References
- Optimal Hedging
- Hedging of derivatives
- Hedging in the presence of transaction costs
- Further extensions
- References
- Fundamental Analysis
- The basics of fundamental analysis
- Collecting data
- Revealing connections
- Including multiple variables
- Separating investment targets
- Setting classification rules
- Backtesting
- Industry-specific investment
- References
- Technical Analysis, Neural Networks, and Logoptimal Portfolios
- Market efficiency
- Technical analysis
- Neural networks
- Logoptimal portfolios
- References
- Asset and Liability Management
- Data preparation
- Interest rate risk measurement
- Liquidity risk measurement
- Modeling non-maturity deposits
- References
- Capital Adequacy
- Principles of the Basel Accords
- Risk measures
- Risk categories
- References
- Systemic Risks
- Systemic risk in a nutshell
- The dataset used in our examples
- Core-periphery decomposition
- The simulation method
- Possible interpretations and suggestions
- References
- Introducing Machine Learning
- The origins of machine learning
- Uses and abuses of machine learning
- How machines learn
- Machine learning in practice
- Machine learning with R
- Managing and Understanding Data
- R data structures
- Managing data with R
- Exploring and understanding data
- Lazy Learning - Classification Using Nearest Neighbors
- Understanding nearest neighbor classification
- Example - diagnosing breast cancer with the k-NN algorithm
- Probabilistic Learning - Classification Using Naive Bayes
- Understanding Naive Bayes
- Example - filtering mobile phone spam with the Naive Bayes algorithm
- Divide and Conquer - Classification Using Decision Trees and Rules
- Understanding decision trees
- Example - identifying risky bank loans using C5.0 decision trees
- Understanding classification rules
- Example - identifying poisonous mushrooms with rule learners
- Forecasting Numeric Data - Regression Methods
- Understanding regression
- Example - predicting medical expenses using linear regression
- Understanding regression trees and model trees
- Example - estimating the quality of wines with regression trees and model trees
- Black Box Methods - Neural Networks and Support Vector Machines
- Understanding neural networks
- Example - Modeling the strength of concrete with ANNs
- Understanding Support Vector Machines
- Example - performing OCR with SVMs
- Finding Patterns - Market Basket Analysis Using Association Rules
- Understanding association rules
- Example - identifying frequently purchased groceries with association rules
- Finding Groups of Data - Clustering with k-means
- Understanding clustering
- Example - finding teen market segments using k-means clustering
- Evaluating Model Performance
- Measuring performance for classification
- Estimating future performance
- Improving Model Performance
- Tuning stock models for better performance
- Improving model performance with meta-learning
- Specialized Machine Learning Topics
- Working with proprietary files and databases
- Working with online data and services
- Working with domain-specific data
- Improving the performance of R
- Reflect and Test Yourself Answers
- Module 1: Data Analysis with R
- Module 2: R Graphs
- Module 4: Mastering R for Quantitative Finance
- Module 5: Machine Learning with R
- Bibliography
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe DRM, a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.