Statistics with JMP

Name: Statistics with JMP | Hypothesis Tests, ANOVA and Regression
Brand: Wiley
Price: 65.99 EUR
Availability: OnlineOnly

Hypothesis Tests, ANOVA and Regression

Peter Goos, David Meintrup (Authors)

Wiley (Publisher)

Published on 29 March 2016

648 pages

E-Book

ePUB with Adobe-DRM

978-1-119-09716-7 (ISBN)

€65.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Dedication
Preface
Acknowledgements
Part One: Estimators and tests
1 Estimating population parameters
2 Interval estimators
3 Hypothesis tests
Part Two: One population
4 Hypothesis tests for a population mean, proportion or variance
5 Two hypothesis tests for the median of a population
6 Hypothesis tests for the distribution of a population
Part Three: Two populations
7 Independent versus paired samples
8 Hypothesis tests for means, proportions and variances of two independent samples
9 A nonparametric hypothesis test for the medians of two independent samples
10 Hypothesis tests for the population mean of two paired samples
11 Two nonparametric hypothesis tests for paired samples
Part Four: More than two populations
12 Hypothesis tests for more than two population means: one-way analysis of variance
13 Nonparametric alternatives to an analysis of variance
14 Hypothesis tests for more than two population variances
Part Five: More useful tests and procedures
15 Design of experiments and data collection
16 Testing equivalence
17 Estimation and testing of correlation and association
18 An introduction to regression modeling
19 Simple linear regression
A Binomial distribution
B Standard normal distribution
C χ²-distribution
D Student's t-distribution
E Wilcoxon signed-rank test
F Critical values for the Shapiro-Wilk test
G Fisher's F-distribution
H Wilcoxon rank-sum test
I Studentized range or Q-distribution
J Two-sided Dunnett test
K One-sided Dunnett test
L Kruskal-Wallis test
M Rank correlation test
Index

1 Estimating Population Parameters

I don't know how long I stand there. I don't believe I've ever stood there mourning faithfully in a downpour, but statistically speaking it must have been spitting now and then, there must have been a bit of a drizzle once or twice.

(from The Misfortunates, Dimitri Verhulst, pp. 125-126)

A major goal in statistics is to make statements about populations or processes. Often, the interest is in specific parameters of the distributions or densities of the populations or processes under study. For instance, researchers in political science want to make statements about the proportion of a population that votes for a certain political party. Industrial engineers want to make statements about the proportion of defective smartphones produced by a production process. Bioscience engineers are interested in comparing the mean amounts of growth resulting from applying two or more different fertilizers. Economists are interested in income inequality and may want to compare the variance in income across different groups.

To be able to make such statements, the proportions, means, and variances under study need to be quantified. In statistical jargon, we say that these parameters need to be estimated. It is also important to quantify how reliable each of the estimates is, in order to judge the confidence we can have in any statement we make. This chapter discusses the properties of the most important sample statistics that are used to make statements about population and process means, proportions, and variances.

1.1 Introduction: Estimators Versus Estimates

In practice, population parameters such as µ, σ², p, and λ (see our book Statistics with JMP: Graphs, Descriptive Statistics and Probability) are rarely known. For example, if we study the arrival times of the customers of a bank, we know that the number of arrivals per unit of time often follows a Poisson distribution. However, we do not know the exact value of the distribution's parameter λ. One way or another, we therefore need to estimate this parameter. This estimate will be based on a number of measurements or observations, x₁, x₂, ..., xₙ, that we perform in the bank; in other words, on the sample data we collect.

The estimate for the unknown λ will be a function of the sample values x₁, x₂, ..., xₙ; for example, the sample mean x̄. Every researcher who faces the same problem, studying the arrival pattern of customers, will obtain different sample values, and thus a different sample mean and another estimate. The reason for this is that the number of arrivals in the bank in a given time interval is a random variable. We can express this explicitly by using uppercase letters X₁, X₂, ..., Xₙ for the sample observations. The fact that each researcher obtains another estimate for λ can also be made more explicit by using a capital letter to denote the sample mean: X̄. The sample mean is then interpreted as a random variable, and it is called an estimator instead of an estimate. In short, an estimate is always a real number, while an estimator is a random variable the value of which is not yet known.
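The distinction can be made concrete with a small simulation. The following is only a sketch: the "true" arrival rate of 4 customers per time unit, the 30 observations per researcher, and all function names are assumptions chosen for illustration, not values from the text. Arrival counts are generated by accumulating exponentially distributed gaps between arrivals, which yields Poisson-distributed counts per time unit.

```python
import random
import statistics

def arrivals_in_unit_time(lam, rng):
    """Count arrivals in one time unit when gaps between arrivals are exponential(lam)."""
    count = 0
    t = rng.expovariate(lam)      # time of the first arrival
    while t <= 1.0:
        count += 1
        t += rng.expovariate(lam)  # time of the next arrival
    return count

def researcher_estimate(lam, n, rng):
    """One researcher observes n time units and reports the sample mean as the estimate."""
    sample = [arrivals_in_unit_time(lam, rng) for _ in range(n)]
    return statistics.mean(sample)

TRUE_LAMBDA = 4.0                  # assumed value, unknown to the researchers
rng = random.Random(2016)          # arbitrary seed, for reproducibility only

# Five researchers apply the same estimator (the sample mean) to their own data.
estimates = [researcher_estimate(TRUE_LAMBDA, 30, rng) for _ in range(5)]
print(estimates)
```

The function `researcher_estimate` plays the role of the estimator: before the data are collected, its outcome is random. Each number in `estimates` is an estimate: a fixed real number obtained by one researcher.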

The sample mean is, of course, only one of many possible functions of the sample observations X₁, X₂, ..., Xₙ, and thus only one of many possible estimators. Obviously, a researcher is not interested in an arbitrary function of the sample observations, but wants to get a good idea of the unknown parameter. In other words, the researcher wishes to obtain an estimate that, on average, is equal to the unknown parameter, and that, ideally, is guaranteed to be close to the unknown parameter. Statisticians translate these requirements into "the estimator should be unbiased" and "the estimator should have a small variance". These requirements will be clarified in the next section.

1.2 Estimating a Mean Value

The requirements for a good estimator can best be illustrated by means of two simulation studies. The first study simulates data from a normally distributed population, while the second one simulates data from an exponentially distributed population.

1.2.1 The Mean of a Normally Distributed Population

We first assume that a normally distributed population with mean µ = 3000 and standard deviation σ = 100 is studied by 1000 (fictitious) students. The students are unaware of the µ value and wish to estimate it. To this end, each of these students performs five measurements. A first option to estimate the unknown value µ is to calculate the sample mean. In this way, we obtain 1000 sample means, shown in the histogram in Figure 1.1, at the top left. The mean of these 1000 sample means is 2998.33, while the standard deviation is 43.38.

Figure 1.1 Histograms and descriptive statistics for 1000 sample means and medians calculated based on samples of five observations from a normally distributed population with mean 3000 and standard deviation 100.
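As a rough plausibility check on the standard deviation of the 1000 sample means, one can use a standard result that is not derived in this excerpt and is quoted here as an assumption: the mean of n independent observations with standard deviation σ has standard deviation σ/√n. With σ = 100 and n = 5:

```latex
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{5}} \approx 44.72
```

The simulated value of 43.38 is close to this theoretical value; the remaining gap is plausibly due to the limited number of 1000 samples.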

Another possibility to estimate the unknown µ is to calculate the median. For a normally distributed population, both the median and the expected value are equal to the parameter µ, so this approach makes sense. Based on the samples that the students have gathered, the 1000 medians can also be calculated and displayed in a histogram. The resulting histogram is shown in Figure 1.1, at the top right. The attentive reader will notice immediately that the second histogram is just a bit wider than the first. Among other things, this is reflected by the fact that the standard deviation of the 1000 medians is 53.43. The mean of the 1000 medians is equal to 2999.08. In Figure 1.1, it can also be seen that the minimum (2841.78) and the first quartile (2962.22) of the sample medians are smaller than the minimum (2867.56) and the first quartile (2969.25) of the sample means. Also, the maximum (3161.64) and the third quartile (3033.51) of the sample medians are greater than the maximum (3140.35) and the third quartile (3027.80) of the sample means. This suggests that the sample medians are, in general, further away from the population mean µ = 3000 than the sample means.

It is striking that both the mean of the 1000 sample means (2998.33) and that of the 1000 medians (2999.08) are very close to 3000. If the number of samples is raised significantly (theoretically, an infinite number of samples could be taken), the mean of the sample means and that of the sample medians will converge to the unknown µ = 3000. Therefore, both the sample mean and the sample median are called unbiased estimators of the mean of a normally distributed population.

The fact that the range, the interquartile range, the standard deviation, and the variance of the 1000 sample means are smaller than those of the 1000 sample medians means that the sample mean is a more reliable estimator of the unknown population mean than the sample median. The larger variance of the medians indicates that the medians are generally further away from µ = 3000 than the sample means. In short, a researcher should have more confidence in the sample mean because it is usually closer to the unknown µ. In such a case, we say that one estimator (here, the sample mean) is more efficient or precise than the other (here, the median).
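The simulation study described in this section can be approximated in a few lines of code. This is a minimal sketch in Python rather than JMP, which the book itself uses; the helper name `simulate_estimators` and the seed are assumptions made for illustration. Because the random draws differ from those behind Figure 1.1, the printed numbers will not match the book's values exactly.

```python
import random
import statistics

def simulate_estimators(draw, n_students=1000, n_obs=5, seed=1):
    """Return (sample means, sample medians), one pair per fictitious student."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(n_students):
        # Each student performs n_obs independent measurements.
        sample = [draw(rng) for _ in range(n_obs)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return means, medians

MU, SIGMA = 3000, 100  # population parameters from the text

means, medians = simulate_estimators(lambda rng: rng.gauss(MU, SIGMA))

# Mean and standard deviation of the 1000 values of each estimator.
for label, values in (("sample means", means), ("sample medians", medians)):
    print(label, round(statistics.mean(values), 2), round(statistics.stdev(values), 2))
```

Both averages should land near 3000, while the standard deviation of the medians should come out larger than that of the means, in line with the comparison of efficiency made above.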

1.2.2 The Mean of an Exponentially Distributed Population

We now investigate an exponentially distributed population with parameter λ = 1/100. The "unknown" population mean is therefore µ = 1/λ = 100 (see Statistics with JMP: Graphs, Descriptive Statistics and Probability). Each of the 1000 fictitious students performs five measurements. A first option to estimate the unknown value µ is again to calculate the sample mean. A histogram of the 1000 sample means is shown in Figure 1.2, at the top left. The mean of these 1000 sample means is 99.2417, while the standard deviation is 44.10.

Figure 1.2 Histograms and descriptive statistics for 1000 sample means and sample medians calculated based on samples of five observations from an exponentially distributed population with parameter λ = 1/100.

Based on the samples that the students have gathered, the 1000 medians can also be calculated and displayed in a histogram. This histogram is shown in Figure 1.2, at the top right. The mean of the 1000 medians is only 77.0114.

These calculations indicate that the population mean µ = 1/λ = 100 can be approximated fairly well by using the sample means, with a mean of 99.2417. This is not the case for the medians, the mean value of which is far away from µ. This remains the case if the number of samples is increased. In this example, for an exponentially distributed population, the median is not an unbiased but a biased estimator of the population mean.
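The size of this bias can be checked with a standard result on exponential order statistics, which is not part of this excerpt and is quoted here as an assumption: for n independent exponential observations with parameter λ, the k-th smallest observation has expected value (1/λ) times the sum of 1/(n - i) for i = 0, ..., k - 1. The median of five observations is the third smallest, so with λ = 1/100:

```latex
E\left[X_{(3)}\right]
  = \frac{1}{\lambda}\left(\frac{1}{5} + \frac{1}{4} + \frac{1}{3}\right)
  = 100 \cdot \frac{47}{60}
  \approx 78.33
```

This value lies well below µ = 100 and is close to the simulated mean of 77.0114, which is consistent with the bias not disappearing when more samples are drawn.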

In addition, Figure 1.2 also shows that the standard deviation of the sample medians (46.13) is greater than that of the sample means (44.10).
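The exponential case can be sketched in the same way. Again, this is an illustrative Python approximation and not the JMP analysis behind Figure 1.2; the seed and the variable names are assumptions, and the resulting numbers will differ somewhat from those in the text.

```python
import random
import statistics

LAMBDA = 1 / 100           # parameter of the exponential population (from the text)
TRUE_MEAN = 1 / LAMBDA     # population mean, 100
N_STUDENTS, N_OBS = 1000, 5

rng = random.Random(7)     # arbitrary seed, for reproducibility only
means, medians = [], []
for _ in range(N_STUDENTS):
    # Each student draws five observations from the exponential population.
    sample = [rng.expovariate(LAMBDA) for _ in range(N_OBS)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Estimated bias: average of the estimator minus the true population mean.
bias_mean = statistics.mean(means) - TRUE_MEAN
bias_median = statistics.mean(medians) - TRUE_MEAN

print("estimated bias of sample mean:  ", round(bias_mean, 2))
print("estimated bias of sample median:", round(bias_median, 2))
print("std dev of means / medians:     ",
      round(statistics.stdev(means), 2), round(statistics.stdev(medians), 2))
```

The estimated bias of the sample mean should be close to zero, while that of the sample median should be clearly negative. The two standard deviations are close to each other, so which one comes out larger may depend on the seed.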

1.3 Criteria for Estimators

Key properties of estimators are their expected values and their variances. These statistics are related to the concepts of bias and efficiency, respectively.

1.3.1 Unbiased Estimators

An ideal estimator that always produces the exact value of an...
