Statistics is the art and science of collecting and analyzing data and understanding the nature of variability. Mathematics, especially probability, governs the underlying theory, but statistics is driven by applications to real problems.
In this chapter, we introduce several data sets that we will encounter throughout the text in the examples and exercises. These data sets are available in the R package resampledata3 or at the textbook website https://github.com/lchihara/MathStatsResamplingR.
If you have ever traveled by air, you probably have experienced the frustration of flight delays. The Bureau of Transportation Statistics maintains data on all aspects of air travel, including flight delays at departure and arrival.1
LaGuardia Airport (LGA) is one of three major airports that serve the New York City metropolitan area. In 2008, over 23 million passengers and over 375 000 planes flew in or out of LGA. United Airlines and American Airlines are two major airlines that schedule services at LGA. The data set FlightDelays contains information on all 4029 departures of these two airlines from LGA during May and June 2009 (Tables 1.1 and 1.2).
Table 1.1 Partial view of FlightDelays data.
Table 1.2 Variables in data set FlightDelays.
Each row of the data set is an observation. Each column represents a variable: some characteristic that is obtained for each observation. For instance, in the first observation listed, the flight was a United Airlines plane, flight number 403, destined for Denver, and departing on Friday between 4 and 8 a.m. This data set consists of 4029 observations and 9 variables.
Questions we might ask include the following: Are flight delay times different between the two airlines? Are flight delay times different depending on the day of the week? Are flights scheduled in the morning less likely to be delayed by more than 15 min?
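Each of these questions comes down to comparing a summary statistic across groups. The sketch below, assuming the records are available as Python dictionaries, computes the mean delay and the proportion of flights delayed by more than 15 min within each group. The field names `carrier` and `delay` and every value are hypothetical stand-ins, not taken from FlightDelays.

```python
from collections import defaultdict
from statistics import fmean

# Toy records for illustration only; these are NOT values from FlightDelays.
# The field names "carrier" and "delay" (in minutes) are hypothetical.
flights = [
    {"carrier": "UA", "delay": -3},
    {"carrier": "UA", "delay": 42},
    {"carrier": "UA", "delay": 10},
    {"carrier": "AA", "delay": 0},
    {"carrier": "AA", "delay": 18},
    {"carrier": "AA", "delay": -5},
    {"carrier": "AA", "delay": 25},
]


def summarize_by(records, key, threshold=15):
    """Return {group: (mean delay, proportion with delay > threshold)}."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["delay"])
    return {
        g: (fmean(d), sum(x > threshold for x in d) / len(d))
        for g, d in groups.items()
    }


for group, (avg, prop) in summarize_by(flights, "carrier").items():
    print(f"{group}: mean delay {avg:.1f} min, P(delay > 15 min) = {prop:.2f}")
```

Grouping by a departure-time field instead of `carrier` would address the morning-flight question in the same way.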
The birth weight of a baby is of interest to health officials since many studies have shown possible links between this weight and conditions in later life, such as obesity or diabetes. Researchers look for possible relationships between the birth weight of a baby and the age of the mother or whether or not she smoked cigarettes or drank alcohol during her pregnancy. The Centers for Disease Control and Prevention (CDC) maintains a database on all babies born in a given year,2 incorporating data provided by the US Department of Health and Human Services, the National Center for Health Statistics, and the Division of Vital Statistics. We will investigate different samples taken from the CDC's database of births.
One data set that we will investigate consists of a random sample of 1009 babies born in North Carolina during 2004 (Table 1.3). The babies in the sample had a gestation period of at least 37 weeks and were single births (i.e. not a twin or triplet).
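The restrictions above act as an eligibility rule applied to the population of births. A minimal sketch of one way such a sample could be drawn is shown below: keep only eligible records, then take a simple random sample without replacement. The records, the field names `gestation_weeks` and `plurality`, and the filter-then-sample order are illustrative assumptions, not the documented procedure behind NCBirths2004.

```python
import random

# Hypothetical birth records; field names and values are illustrative only.
population = [
    {"id": i, "gestation_weeks": 34 + i % 8, "plurality": 1 if i % 10 else 2}
    for i in range(5000)
]


def eligible(rec):
    """Inclusion rule from the text: at least 37 weeks and a single birth."""
    return rec["gestation_weeks"] >= 37 and rec["plurality"] == 1


def draw_sample(records, n, seed=None):
    """Filter to eligible records, then draw a simple random sample of size n
    without replacement (every subset of size n is equally likely)."""
    pool = [r for r in records if eligible(r)]
    rng = random.Random(seed)
    return rng.sample(pool, n)


sample = draw_sample(population, 1009, seed=1)
print(len(sample), all(eligible(r) for r in sample))
```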
Table 1.3 Variables in data set NCBirths2004.
We will also investigate a data set, Girls2004, consisting of a random sample of 40 baby girls born in Alaska and 40 baby girls born in Wyoming. These babies also had a gestation period of at least 37 weeks and were single births.
The data set TXBirths2004 contains a random sample of 1587 babies born in Texas in 2004. In this case, the sample was not restricted to single births, nor to a gestation period of at least 37 weeks. The numeric variable Number indicates whether the baby was a single birth or one of twins, triplets, and so on. The variable Multiple is a factor variable indicating whether or not the baby was a multiple birth.
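The two variables are linked by a simple rule: a baby is part of a multiple birth exactly when Number exceeds 1. The sketch below applies that rule to a few hypothetical values of Number. The level names "No" and "Yes" are an assumption; check the actual factor levels in the data set (for example with R's `levels()`) before relying on them.

```python
# Hypothetical values of Number: 1 = single birth, 2 = twin, 3 = triplet, ...
numbers = [1, 1, 2, 1, 3, 1, 2]


def multiple_flag(number):
    """Collapse the count into a two-level factor (level names assumed)."""
    return "Yes" if number > 1 else "No"


multiple = [multiple_flag(k) for k in numbers]
print(multiple)  # ['No', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes']
```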
Verizon is the incumbent local exchange carrier (ILEC), the primary local telephone company, for a large area of the Eastern United States. As such, it is responsible for providing repair service for the customers of other telephone companies in this region, known as competing local exchange carriers (CLECs). Verizon is subject to fines if the repair times (the time it takes to fix a problem) for CLEC customers are substantially worse than those for Verizon customers.
The data set Verizon contains a sample of repair times for 1664 ILEC and 23 CLEC customers (Table 1.4). The mean repair times are 8.4 h for ILEC customers and 16.5 h for CLEC customers. Could a difference this large be easily explained by chance?
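One standard way to ask whether a difference in means could arise by chance is a permutation test: if the ILEC/CLEC labels had no relation to repair time, randomly reassigning the labels should often produce differences as large as the observed one. The sketch below is a generic two-sample version under that exchangeability assumption. The function name and the short lists of repair times are hypothetical, not the Verizon data, and the test is one-sided because the concern is whether CLEC customers wait longer.

```python
import random
from statistics import fmean


def perm_test_diff_means(x, y, n_resamples=9999, seed=None):
    """One-sided permutation test for mean(y) - mean(x) > 0.

    Under the null hypothesis the group labels are exchangeable, so we
    repeatedly shuffle the pooled values, split them into groups of the
    original sizes, and count how often the shuffled difference is at
    least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = fmean(y) - fmean(x)
    pooled = list(x) + list(y)
    n_x = len(x)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = fmean(pooled[n_x:]) - fmean(pooled[:n_x])
        if diff >= observed:
            count += 1
    # The +1 terms count the observed labelling as one of the arrangements.
    p_value = (count + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical repair times in hours; NOT the Verizon data.
ilec = [2.0, 5.5, 8.1, 3.2, 12.0, 7.4, 1.1, 9.9]
clec = [15.0, 22.3, 4.8]

observed, p = perm_test_diff_means(ilec, clec, seed=1)
print(f"observed difference = {observed:.2f} h, one-sided p = {p:.4f}")
```

With the actual 1664 ILEC and 23 CLEC repair times in place of the toy lists, `observed` would be the difference between the two means quoted above, about 8.1 h.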
Table 1.4 Variables in data set Verizon.
When a person is released from prison, will he or she relapse into criminal behavior and be sent back? The state of Iowa tracks offenders over a 3-year period and records the number of days until recidivism for those who are readmitted to prison. The Department of Corrections uses these recidivism data to determine whether or not its strategies for preventing offenders from relapsing into criminal behavior are effective.
The data set Recidivism contains all offenders convicted of either a misdemeanor or felony who were released from an Iowa prison during the 2010 fiscal year (ending in June) (Table 1.5). There were 17 022 people released in that period, of whom 5386 were sent back to prison in the following 3 years (through the end of the 2013 fiscal year).3
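The two counts above determine the overall 3-year recidivism rate directly. This quick check uses only the numbers given in the text. Assuming every released person falls into one of the two age groups compared in the data set, the overall rate is a weighted average of the age-group rates and so must lie between them.

```python
# Counts reported in the text for the 2010 fiscal-year release cohort.
released = 17022
returned = 5386  # sent back to prison within the following 3 years

not_returned = released - returned
overall_rate = returned / released

print(not_returned)           # 11636
print(f"{overall_rate:.1%}")  # 31.6%

# Assuming every person is in one of the two age groups (30.6% for 25 or
# older, 36.5% for under 25), the overall rate is their weighted average.
assert 0.306 <= overall_rate <= 0.365
```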
Table 1.5 Variables in data set Iowa Recidivism.
The recidivism rate for those under the age of 25 years was 36.5% compared to 30.6% for those 25 years or older. Does this indicate a real difference in the behavior of those in these age groups, or could this be explained by chance variability?
In analyzing data, we need to determine whether the data represent a population or a sample. A population represents all the individual cases, whether they are babies, fish, cars, or coin flips. The data from the flight delays case study in Section 1.1 are all the flight departures of United Airlines and...