
R All-in-One For Dummies
Description
With R All-in-One For Dummies, you get five mini-books in one, offering a complete and thorough resource on the R programming language and a road map for making sense of the sea of data we're all swimming in. Maybe you're pursuing a career in data science, maybe you're looking to infuse a little statistics know-how into your existing career, or maybe you're just R-curious. This book has your back. Along with providing an overview of coding in R and how to work with the language, this book delves into the types of projects and applications R programmers tend to tackle the most. You'll find coverage of statistical analysis, machine learning, and data management with R.
* Grasp the basics of the R programming language and write your first lines of code
* Understand how R programmers use code to analyze data and perform statistical analysis
* Use R to create data visualizations and machine learning programs
* Work through sample projects to hone your R coding skills
This is an excellent all-in-one resource for beginning coders who'd like to move into the data space by knowing more about R.
Content
Book 1: Introducing R
Chapter 1: R: What It Does and How It Does It
Chapter 2: Working with Packages, Importing, and Exporting
Book 2: Describing Data
Chapter 1: Getting Graphic
Chapter 2: Finding Your Center
Chapter 3: Deviating from the Average
Chapter 4: Meeting Standards and Standings
Chapter 5: Summarizing It All
Chapter 6: What's Normal?
Book 3: Analyzing Data
Chapter 1: The Confidence Game: Estimation
Chapter 2: One-Sample Hypothesis Testing
Chapter 3: Two-Sample Hypothesis Testing
Chapter 4: Testing More than Two Samples
Chapter 5: More Complicated Testing
Chapter 6: Regression: Linear, Multiple, and the General Linear Model
Chapter 7: Correlation: The Rise and Fall of Relationships
Chapter 8: Curvilinear Regression: When Relationships Get Complicated
Chapter 9: In Due Time
Chapter 10: Non-Parametric Statistics
Chapter 11: Introducing Probability
Chapter 12: Probability Meets Regression: Logistic Regression
Book 4: Learning from Data
Chapter 1: Tools and Data for Machine Learning Projects
Chapter 2: Decisions, Decisions, Decisions
Chapter 3: Into the Forest, Randomly
Chapter 4: Support Your Local Vector
Chapter 5: K-Means Clustering
Chapter 6: Neural Networks
Chapter 7: Exploring Marketing
Chapter 8: From the City That Never Sleeps
Book 5: Harnessing R: Some Projects to Keep You Busy
Chapter 1: Working with a Browser
Chapter 2: Dashboards -- How Dashing!
Index
Chapter 1
R: What It Does and How It Does It
IN THIS CHAPTER
Introducing statistics
Getting R and RStudio on your computer
Starting a session with R
Working with R functions
Working with R structures
So you're ready to journey into the wonderful world of R! Designed by and for statisticians and data scientists, R has a short but illustrious history.
In the 1990s, Ross Ihaka and Robert Gentleman developed R at the University of Auckland, New Zealand. The R Core Team and the R Foundation for Statistical Computing support R, which has a huge worldwide user base.
Before I tell you about R, however, I have to introduce you to the world that R lives in - the world of data and statistics.
The Statistical (and Related) Ideas You Just Have to Know
The analytical tools that R provides are based on statistical concepts I help you explore in this section. As you'll see, these concepts are based on common sense.
Samples and populations
If you watch TV on election night, you know that one of the main events is the prediction of the outcome immediately after the polls close (and before all the votes are counted). How is it that pundits almost always get it right?
The idea is to talk to a sample of voters right after they vote. If they're truthful about how they marked their ballots, and if the sample is representative of the population of voters, analysts can use the sample data to draw conclusions about the population.
That, in a nutshell, is what statistics is all about - using the data from samples to draw conclusions about populations.
Here's another example. Imagine that your job is to find the average height of 10-year-old children in the United States. Because you probably wouldn't have the time or the resources to measure every child, you'd measure the heights of a representative sample. Then you'd average those heights and use that average as the estimate of the population average.
Estimating the population average is one kind of inference that statisticians make from sample data. I discuss inference in more detail in the later section "Inferential Statistics: Testing Hypotheses."
Here's some important terminology: Properties of a population (like the population average) are called parameters, and properties of a sample (like the sample average) are called statistics. If your only concern is the sample properties (like the heights of the children in your sample), the statistics you calculate are descriptive. (I discuss descriptive statistics in Book 2.) If you're concerned about estimating the population properties, your statistics are inferential. (I discuss inferential statistics in Book 3.)
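To make the parameter-versus-statistic distinction concrete, here is a minimal R sketch of the height example. It is not from the book: the population is simulated, and the mean of 55 inches, the standard deviation of 3 inches, and all variable names are assumptions chosen only for illustration.

```r
# Hypothetical population: heights (in inches) of 100,000 ten-year-olds.
# The mean of 55 and standard deviation of 3 are made-up values.
set.seed(42)  # fix the random number generator so the run is repeatable
population_heights <- rnorm(100000, mean = 55, sd = 3)

# The parameter: the population average (unknown in real life).
population_mean <- mean(population_heights)

# Draw a random sample of 100 children and compute the statistic.
sample_heights <- sample(population_heights, size = 100)
sample_mean <- mean(sample_heights)

# The sample average serves as the estimate of the population average.
cat("Population mean (parameter):", round(population_mean, 2), "\n")
cat("Sample mean (statistic):    ", round(sample_mean, 2), "\n")
```

In a real study you would have only the sample, so the second number is all you would see. The simulation simply shows that a representative sample tends to land close to the population value.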
Now for an important convention about notation: Statisticians use Greek letters (μ, σ, ρ) to stand for parameters, and English letters (x̄, s, r) to stand for statistics. Figure 1-1 summarizes the relationship between populations and samples, and between parameters and statistics.
FIGURE 1-1: The relationship between populations, samples, parameters, and statistics.
Variables: Dependent and independent
A variable is something that can take on more than one value - like your age, the value of the dollar against other currencies, or the number of games your favorite sports team wins. Something that can have only one value is a constant. Scientists tell us that the speed of light is a constant, and we use the constant π to calculate the area of a circle.
Statisticians work with independent variables and dependent variables. In any study or experiment, you'll find both kinds. Statisticians assess the relationship between them.
For example, imagine a computerized training method designed to increase a person's IQ. How would a researcher find out whether this method does what it's supposed to do? First, the researcher would randomly assign a sample of people to one of two groups. One group would receive the training method, and the other would complete another kind of computer-based activity - like reading text on a website. Before and after each group completes its activities, the researcher measures each person's IQ. What happens next? I discuss that topic in the later section "Inferential Statistics: Testing Hypotheses."
For now, understand that the independent variable here is Type of Activity. The two possible values of this variable are IQ Training and Reading Text. The dependent variable is the change in IQ from Before to After.
A dependent variable is what a researcher measures. In an experiment, an independent variable is what a researcher manipulates. In other contexts, a researcher can't manipulate an independent variable. Instead, the researcher notes naturally occurring values of the independent variable and how they relate to the dependent variable.
In general, the objective is to find out whether changes in an independent variable are associated with changes in a dependent variable.
In examples that appear throughout this book, I show you how to use R to calculate characteristics of groups of scores, or to compare groups of scores. Whenever I show you a group of scores, I'm talking about the values of a dependent variable.
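As a rough illustration of how the IQ example might be laid out in R, the sketch below stores the independent variable as a column of group labels and derives the dependent variable from the Before and After scores. All six participants and their scores are invented for this sketch, and the column names are assumptions, not anything the book prescribes.

```r
# Hypothetical scores for six participants; every number here is made up.
iq_study <- data.frame(
  activity  = factor(c("IQ Training", "IQ Training", "IQ Training",
                       "Reading Text", "Reading Text", "Reading Text")),
  iq_before = c(100, 95, 110, 102, 98, 105),
  iq_after  = c(104, 97, 115, 103, 98, 104)
)

# Dependent variable: the change in IQ from Before to After.
iq_study$iq_change <- iq_study$iq_after - iq_study$iq_before

# Average change for each value of the independent variable (activity).
mean_change <- tapply(iq_study$iq_change, iq_study$activity, mean)
print(mean_change)
```

This only describes the two groups. Deciding whether the difference between them is more than chance is a matter for inferential statistics.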
Types of data
When you do statistical work, you can run into four kinds of data. And when you work with a variable, the way you work with it depends on what kind of data it is:
The first kind is nominal data. If a set of numbers happens to be nominal data, the numbers are labels - their values don't signify anything. On a sports team, the jersey numbers are nominal. They just identify the players.
The next kind is ordinal data. In this data type, the numbers are more than just labels. As the name ordinal might tell you, the order of the numbers is important. If I ask you to rank ten foods from the one you like best (1) to the one you like least (10), we'd have a set of ordinal data.
But the difference between your third-favorite food and your fourth-favorite food might not be the same as the difference between your ninth-favorite and your tenth-favorite. So this type of data lacks equal intervals and equal differences.
Interval data gives us equal differences. The Fahrenheit scale of temperature is a good example. The difference between 30° and 40° is the same as the difference between 90° and 100°. So each degree is an interval.
People are sometimes surprised to find out that on the Fahrenheit scale a temperature of 80° is not twice as hot as 40°. For ratio statements ("twice as much as," "half as much as") to make sense, zero has to mean the complete absence of the thing you're measuring. A temperature of 0°F doesn't mean the complete absence of heat - it's just an arbitrary point on the Fahrenheit scale. (The same holds true for Celsius.)
The fourth kind of data, ratio, provides a meaningful zero point. On the Kelvin scale of temperature, zero means absolute zero, where all molecular motion (the basis of heat) stops. So 200 kelvins is twice as hot as 100 kelvins. Another example is length. Eight inches is twice as long as 4 inches. Zero inches means a complete absence of length.
An independent variable or a dependent variable can be nominal, ordinal, interval, or ratio. The analytical tools you use depend on the type of data you work with.
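One way these four kinds of data might be represented in R is sketched below. The choice of `factor()` for nominal data and an ordered factor for ordinal data is a common convention rather than something stated in this excerpt, and the jersey numbers, rank labels, and the helper `f_to_kelvin()` are hypothetical names introduced for illustration.

```r
# Nominal: jersey numbers are labels, so store them as a factor, not numbers.
jersey <- factor(c(7, 10, 23))
is.numeric(jersey)   # FALSE: arithmetic on labels would be meaningless

# Ordinal: ranks carry order but not equal spacing, so use an ordered factor.
food_rank <- factor(c("3rd", "1st", "2nd"),
                    levels = c("1st", "2nd", "3rd"),
                    ordered = TRUE)
food_rank[2] < food_rank[1]   # TRUE: "1st" comes before "3rd"

# Interval: equal differences are meaningful on the Fahrenheit scale.
temps_f <- c(30, 40, 90, 100)
(temps_f[2] - temps_f[1]) == (temps_f[4] - temps_f[3])   # TRUE

# Ratio: Kelvin has a meaningful zero, so ratios make sense there.
f_to_kelvin <- function(f) (f - 32) * 5 / 9 + 273.15
ratio_80_to_40 <- f_to_kelvin(80) / f_to_kelvin(40)
round(ratio_80_to_40, 2)   # about 1.08, not 2
```

The last line backs up the point about Fahrenheit: once both temperatures are on a scale with a true zero, 80°F turns out to be only about 8 percent hotter than 40°F.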
A little probability
When statisticians make decisions, they use probability to express their confidence about those decisions. They can never be absolutely certain about what they decide. They can tell you only how probable their conclusions are.
What do we mean by probability? Mathematicians and philosophers might give you complex definitions. In my experience, however, the best way to understand probability is in terms of examples.
Here's a simple example: If you toss a coin, what's the probability that it turns up heads? If the coin is fair, you might figure that you have a 50-50 chance of heads and a 50-50 chance of tails. And you'd be right. In terms of the kinds of numbers associated with probability, that's ½.
Think about rolling a fair die (one member of a pair of dice). What's the probability that you roll a 4? Well, a die has six faces and one of them is 4, so that's ⅙.
Still another example: Select one card at random from a standard deck of 52 cards. What's the probability that it's a diamond? A deck of cards has four suits, so that's ¼.
These examples tell you that if you want to know the probability that an event occurs, count how many ways that event can happen and divide by the total number of equally likely outcomes. In the first two examples (heads, 4), the event you're interested in happens only one way. For the coin, we divide 1 by 2. For the die, we divide 1 by 6. In the third example (diamond), the event can happen 13 ways (Ace through King), so we divide 13 by 52 (to get ¼).
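The counting rule can be checked with a few lines of R. This is a sketch rather than code from the book: the deck is built with base R's `expand.grid()`, and the variable names are assumptions.

```r
# Probability = (ways the event can happen) / (total equally likely outcomes)
p_heads <- 1 / 2   # one way to get heads out of two sides
p_four  <- 1 / 6   # one way to roll a 4 out of six faces

# Build a standard 52-card deck: 13 ranks in each of 4 suits.
deck <- expand.grid(
  rank = c("Ace", 2:10, "Jack", "Queen", "King"),
  suit = c("Clubs", "Diamonds", "Hearts", "Spades")
)

# Count the ways to draw a diamond, then divide by the size of the deck.
ways_diamond <- sum(deck$suit == "Diamonds")   # 13
p_diamond <- ways_diamond / nrow(deck)         # 13 / 52 = 0.25
print(p_diamond)
```

Counting rows in a table of all possible outcomes is the same divide-the-ways-by-the-total reasoning, just written so that R does the counting.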
Now for a slightly more complicated example. Toss a coin and roll a die at the same time. What's the probability of tails and...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.