Simplify stats and learn how to graph, analyze, and interpret data the easy way
Statistical Analysis with R For Dummies makes stats approachable by combining clear explanations with practical applications. You'll learn how to download and use R and RStudio (two free, open-source tools) to explore statistics concepts, create graphs, test hypotheses, and draw meaningful conclusions. Get started by learning the basics of statistics and R, calculating descriptive statistics, and using inferential statistics to test hypotheses. Then visualize it all with graphs and charts. This Dummies guide is your well-marked path to sailing through statistics.
This is the perfect introduction to R for students, professionals, and the stat-curious.
Joseph Schmuller is a cognitive scientist and statistical analyst who creates online learning tools and books on data science. He is the author of R All-in-One For Dummies, all five editions of Statistical Analysis with Excel For Dummies, Statistical Analysis with R For Dummies, and R Projects For Dummies, among others.
Introduction
Part 1: Getting Started with Statistical Analysis with R
Chapter 1: Data, Statistics, and Decisions
Chapter 2: R: What It Does and How It Does It
Part 2: Describing Data
Chapter 3: Getting Graphic
Chapter 4: Finding Your Center
Chapter 5: Deviating from the Average
Chapter 6: Meeting Standards and Standings
Chapter 7: Summarizing It All
Chapter 8: What's Normal?
Part 3: Drawing Conclusions from Data
Chapter 9: The Confidence Game: Estimation
Chapter 10: One-Sample Hypothesis Testing
Chapter 11: Two-Sample Hypothesis Testing
Chapter 12: Testing More than Two Samples
Chapter 13: More Complicated Testing
Chapter 14: Regression: Linear, Multiple, and the General Linear Model
Chapter 15: Correlation: The Rise and Fall of Relationships
Chapter 16: Curvilinear Regression: When Relationships Get Complicated
Part 4: Working with Probability
Chapter 17: Introducing Probability
Chapter 18: Introducing Modeling
Chapter 19: Probability Meets Regression: Logistic Regression
Part 5: The Part of Tens
Chapter 20: Ten Tips for Excel Émigrés
Chapter 21: Ten Valuable Online R Resources
Index
Chapter 1
Data, Statistics, and Decisions
IN THIS CHAPTER
Introducing statistical concepts
Generalizing from samples to populations
Getting into probability
Testing hypotheses
Two types of error
Statistics? That's all about crunching numbers into arcane-looking formulas, right? Not really. Statistics, first and foremost, is about decision-making. Some number-crunching is involved, of course, but the primary goal is to use numbers to make decisions. Statisticians look at data and wonder what the numbers are saying. What kinds of trends are in the data? What kinds of predictions are possible? What conclusions can you make?
To make sense of data and answer these questions, statisticians have developed a wide variety of analytical tools.
About the number-crunching part: If you had to do it via pencil-and-paper (or with the aid of a pocket calculator), you'd soon grow discouraged with the amount of computation involved and the errors that might creep in. Software like R helps you crunch the data and compute the numbers. As a bonus, R can also help you comprehend statistical concepts.
Developed specifically for statistical analysis, R is a computer language that implements many of the analytical tools statisticians have developed for decision-making. I wrote this book to show you how to use these tools in your work.
The analytical tools that R provides are based on statistical concepts I help you explore in the remainder of this chapter. As you'll see, these concepts are based on common sense.
If you watch TV on election night, you know that one exciting occurrence that takes place before the main event is the prediction of the outcome immediately after the polls close (and before all the votes are counted). How is it that pundits almost always get it right?
The idea is to talk to a sample of voters right after they vote. If they're truthful about how they marked their ballots, and if the sample is representative of the population of voters, analysts can use the sample data to draw conclusions about the population.
That, in a nutshell, is what statistics is all about - using the data from samples to draw conclusions about populations.
Here's another example. Imagine that your job is to find the average height of 10-year-old children in the United States. Because you probably wouldn't have the time or the resources to measure every child, you'd measure the heights of a representative sample. Then you'd average those heights and use that average as the estimate of the population average.
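In R, that calculation is a one-liner. Here's a minimal sketch, with made-up heights (in inches) standing in for a real sample:

# hypothetical sample of 10-year-olds' heights, in inches
heights <- c(54.2, 55.1, 53.8, 56.0, 54.9, 55.5, 54.4, 55.8)

# the sample average, used as the estimate of the population average
mean(heights)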
Estimating the population average is one kind of inference that statisticians make from sample data. I discuss inference in more detail in the upcoming section "Inferential Statistics: Testing Hypotheses."
Here's some important terminology: Properties of a population (like the population average) are called parameters, and properties of a sample (like the sample average) are called statistics. If your only concern is the sample properties (like the heights of the children in your sample), the statistics you calculate are descriptive. If you're concerned about estimating the population properties, your statistics are inferential.
Now for an important convention about notation: Statisticians use Greek letters (μ, σ, ρ) to stand for parameters, and English letters (x̄, s, r) to stand for statistics. Figure 1-1 summarizes the relationship between populations and samples, and between parameters and statistics.
FIGURE 1-1: The relationship between populations, samples, parameters, and statistics.
A variable is something that can take on more than one value - like your age, the value of the dollar against other currencies, or the number of games your favorite sports team wins. Something that can have only one value is a constant. Scientists tell us that the speed of light is a constant, and we use the constant π to calculate the area of a circle.
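R has that constant built in, by the way. A quick sketch, using a made-up radius:

r <- 3       # radius of a hypothetical circle
pi * r^2     # area of the circle; pi is a built-in constant in R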
Statisticians work with independent variables and dependent variables. In any study or experiment, you'll find both kinds. Statisticians assess the relationship between them.
Imagine a computerized training method designed to increase a person's IQ. How would a researcher find out whether this method does what it's supposed to do? First, the researcher would randomly assign a sample of people to one of two groups. One group would receive the training method, and the other would complete another kind of computer-based activity - like reading text on a website. Before and after each group completes its activities, the researcher measures each person's IQ. What happens next? I discuss that topic in the upcoming section "Inferential Statistics: Testing Hypotheses."
For now, understand that the independent variable here is Type of Activity. The two possible values of this variable are IQ Training and Reading Text. The dependent variable is the change in IQ from Before to After.
A dependent variable is what a researcher measures. In an experiment, an independent variable is what a researcher manipulates. In other contexts, a researcher can't manipulate an independent variable. Instead, they note naturally occurring values of the independent variable and how they affect a dependent variable.
In general, the objective is to find out whether changes in an independent variable are associated with changes in a dependent variable.
In the examples that appear throughout this book, I show you how to use R to calculate characteristics of groups of scores, or to compare groups of scores. Whenever I show you a group of scores, I'm talking about the values of a dependent variable.
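To make that concrete, here's a minimal sketch of how the IQ-training example might look in R. The layout is standard base R; the group labels and numbers are invented purely for illustration:

# hypothetical results: Activity is the independent variable,
# IQ_change (After minus Before) is the dependent variable
study <- data.frame(
  Activity  = c("IQ Training", "IQ Training", "Reading Text", "Reading Text"),
  IQ_change = c(6, 4, 1, 2)
)

# average change in IQ for each value of the independent variable
aggregate(IQ_change ~ Activity, data = study, FUN = mean)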
When you do statistical work, you can run into four kinds of data. And when you work with a variable, the way you work with it depends on what kind of data it is. The first kind is nominal data. If a set of numbers happens to be nominal data, the numbers are labels - their values don't signify anything. On a sports team, the jersey numbers are nominal. They just identify the players.
The next kind is ordinal data. In this data type, the numbers are more than just labels. As the name ordinal might tell you, the order of the numbers is important. If I asked you to rank ten foods from the one you like best (1) to the one you like least (10), we'd have a set of ordinal data.
But the difference between your third-favorite food and your fourth-favorite food might not be the same as the difference between your ninth-favorite and your tenth-favorite. So this type of data lacks equal intervals and equal differences.
Interval data gives us equal differences. The Fahrenheit scale of temperature is a good example. The difference between 30° and 40° is the same as the difference between 90° and 100°. So each degree is an interval.
People are sometimes surprised to find out that on the Fahrenheit scale, a temperature of 80° is not twice as hot as 40°. For ratio statements ("twice as much as," "half as much as") to make sense, zero has to mean the complete absence of the thing you're measuring. A temperature of 0° F doesn't mean the complete absence of heat - it's just an arbitrary point on the Fahrenheit scale. (The same holds true for Celsius.)
The fourth kind of data, ratio, provides a meaningful zero point. On the Kelvin scale of temperature, zero means "absolute zero," where all molecular motion (the basis of heat) stops. So 200° Kelvin is twice as hot as 100° Kelvin. Another example is length. Eight inches is twice as long as 4 inches. Zero inches means "a complete absence of length."
An independent variable or a dependent variable can be either nominal, ordinal, interval, or ratio. The analytical tools you use depend on the type of data you work with.
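R mirrors these distinctions. Here's a brief sketch, with made-up values, of how each type is typically represented:

# nominal: jersey numbers are just labels, so store them as a factor
jerseys <- factor(c(12, 7, 23, 88))

# ordinal: rankings, where order matters but the intervals may not be equal
ranks <- factor(c(1, 3, 2), levels = 1:10, ordered = TRUE)

# interval and ratio: plain numeric vectors
temps_f <- c(30, 40, 90, 100)   # interval (Fahrenheit; zero is arbitrary)
lengths <- c(4, 8)              # ratio (inches; zero means no length at all)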
When statisticians make decisions, they use probability to express their confidence about those decisions. They can never be absolutely certain about what they decide. They can only tell you how probable their conclusions are.
What do I mean by probability? Mathematicians and philosophers might give you complex definitions. In my experience, however, the best way to understand probability is in terms of examples.
Here's a simple example: If you toss a coin, what's the probability that it turns up heads? If the coin is fair, you might figure that you have a 50-50 chance of heads and a 50-50 chance of tails. And you'd be right. In terms of the kinds of numbers associated with probability, that's ½.
Think about rolling a fair die (one member of a pair of dice). What's the probability that you roll a 4? Well, a die has six faces and one of them is 4, so that's ⅙. Still another example: Select 1 card at random from a standard deck of 52 cards. What's the probability that it's a diamond? A deck of cards has four suits, so that's ¼.
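You can also watch those numbers emerge from simulation. Here's a minimal sketch using base R's sample() function, with an arbitrary number of repetitions:

set.seed(1)   # make the simulation repeatable

flips <- sample(c("heads", "tails"), size = 10000, replace = TRUE)
mean(flips == "heads")      # close to 1/2

rolls <- sample(1:6, size = 10000, replace = TRUE)
mean(rolls == 4)            # close to 1/6

suits <- sample(c("clubs", "diamonds", "hearts", "spades"), size = 10000, replace = TRUE)
mean(suits == "diamonds")   # close to 1/4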
These...