
Statistics with JMP
Reviews / Votes
"Masters and advanced students in applied statistics, industrial engineering, business engineering, civil engineering and bio-science engineering will find this book beneficial. It also provides a useful resource for teachers of statistics particularly in the area of engineering." (Zentralblatt MATH 2016)

Persons
David Meintrup, Department of Mathematics, Statistics and Actuarial Sciences, Faculty of Applied Economics of the University of Antwerp, Belgium.
Content
1 Estimating Population Parameters
I don't know how long I stand there. I don't believe I've ever stood there mourning faithfully in a downpour, but statistically speaking it must have been spitting now and then, there must have been a bit of a drizzle once or twice.
(from The Misfortunates, Dimitri Verhulst, pp. 125-126)
A major goal in statistics is to make statements about populations or processes. Often, the interest is in specific parameters of the distributions or densities of the populations or processes under study. For instance, researchers in political science want to make statements about the proportion of a population that votes for a certain political party. Industrial engineers want to make statements about the proportion of defective smartphones produced by a production process. Bioscience engineers are interested in comparing the mean amounts of growth resulting from applying two or more different fertilizers. Economists are interested in income inequality and may want to compare the variance in income across different groups.
To be able to make such statements, the proportions, means, and variances under study need to be quantified. In statistical jargon, we say that these parameters need to be estimated. It is also important to quantify how reliable each of the estimates is, in order to judge the confidence we can have in any statement we make. This chapter discusses the properties of the most important sample statistics that are used to make statements about population and process means, proportions, and variances.
1.1 Introduction: Estimators Versus Estimates
In practice, population parameters such as µ, σ², p, and λ (see our book Statistics with JMP: Graphs, Descriptive Statistics and Probability) are rarely known. For example, if we study the arrival times of the customers of a bank, we know that the number of arrivals per unit of time often follows a Poisson distribution. However, we do not know the exact value of the distribution's parameter λ. One way or another, we therefore need to estimate this parameter. This estimate will be based on a number of measurements or observations, x₁, x₂, …, xₙ, that we perform in the bank; in other words, on the sample data we collect.
The estimate for the unknown λ will be a function of the sample values x₁, x₂, …, xₙ; for example, the sample mean x̄. Every researcher who faces the same problem, studying the arrival pattern of customers, will obtain different sample values, and thus a different sample mean and a different estimate. The reason for this is that the number of arrivals in the bank in a given time interval is a random variable. We can express this explicitly by using uppercase letters X₁, X₂, …, Xₙ for the sample observations. The fact that each researcher obtains a different estimate for λ can also be made more explicit by using a capital letter to denote the sample mean: X̄. The sample mean is then interpreted as a random variable, and it is called an estimator instead of an estimate. In short, an estimate is always a real number, while an estimator is a random variable whose value is not yet known.
The sample mean is, of course, only one of many possible functions of the sample observations X₁, X₂, …, Xₙ, and thus only one of many possible estimators. Obviously, a researcher is not interested in an arbitrary function of the sample observations, but wants to get a good idea of the unknown parameter. In other words, the researcher wishes to obtain an estimate that, on average, is equal to the unknown parameter, and that, ideally, is guaranteed to be close to the unknown parameter. Statisticians translate these requirements into "the estimator should be unbiased" and "the estimator should have a small variance". These requirements will be clarified in the next section.
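The distinction between an estimator and an estimate can be illustrated with a small simulation. The following Python sketch is not taken from the book (which works in JMP); the rate of 4 arrivals per unit of time, the 30 observed intervals, and the five researchers are hypothetical values chosen purely for illustration. Each researcher applies the same estimator, the sample mean, but ends up with a different estimate.

```python
import random
import statistics


def poisson_count(rate, rng):
    """Count arrivals in one unit of time when the gaps between
    arrivals are exponentially distributed with the given rate."""
    count = 0
    elapsed = rng.expovariate(rate)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def collect_sample(rate, n_obs, rng):
    """One researcher's data: arrival counts for n_obs time intervals."""
    return [poisson_count(rate, rng) for _ in range(n_obs)]


rng = random.Random(1)  # fixed seed so the run is reproducible
TRUE_RATE = 4.0         # hypothetical lambda, unknown to the researchers
N_OBS = 30              # hypothetical number of intervals per researcher

# The estimator is the rule "take the sample mean"; each number
# computed below is one estimate produced by that rule.
estimates = [
    statistics.mean(collect_sample(TRUE_RATE, N_OBS, rng))
    for _ in range(5)
]

for i, estimate in enumerate(estimates, start=1):
    print(f"researcher {i}: estimate of lambda = {estimate:.2f}")
```

The printed values differ from researcher to researcher, although they all scatter around the rate that was used to generate the data.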
1.2 Estimating a Mean Value
The requirements for a good estimator can best be illustrated by means of two simulation studies. The first study simulates data from a normally distributed population, while the second one simulates data from an exponentially distributed population.
1.2.1 The Mean of a Normally Distributed Population
We first assume that a normally distributed population with mean µ = 3000 and standard deviation σ = 100 is studied by 1000 (fictitious) students. The students are unaware of the µ value and wish to estimate it. To this end, each of these students performs five measurements. A first option to estimate the unknown value µ is to calculate the sample mean. In this way, we obtain 1000 sample means, shown in the histogram in Figure 1.1, at the top left. The mean of these 1000 sample means is 2998.33, while the standard deviation is 43.38.
Figure 1.1 Histograms and descriptive statistics for 1000 sample means and medians calculated based on samples of five observations from a normally distributed population with mean 3000 and standard deviation 100.
Another possibility to estimate the unknown µ is to calculate the median. For a normally distributed population, both the median and the expected value are equal to the parameter µ, so this choice makes sense. Based on the samples that the students have gathered, the 1000 medians can also be calculated and displayed in a histogram. The resulting histogram is shown in Figure 1.1, at the top right. The attentive reader will notice immediately that the second histogram is just a bit wider than the first. Among other things, this is reflected by the fact that the standard deviation of the 1000 medians is 53.43. The mean of the 1000 medians is equal to 2999.08. In Figure 1.1, it can also be seen that the minimum (2841.78) and the first quartile (2962.22) of the sample medians are smaller than the minimum (2867.56) and the first quartile (2969.25) of the sample means. Also, the maximum (3161.64) and the third quartile (3033.51) of the sample medians are greater than the maximum (3140.35) and the third quartile (3027.80) of the sample means. This suggests that the sample medians are, in general, further away from the population mean µ = 3000 than the sample means.
It is striking that both the mean of the 1000 sample means (2998.33) and that of the 1000 medians (2999.08) are very close to 3000. If the number of samples is raised significantly (theoretically, an infinite number of samples could be taken), the mean of the sample means and that of the sample medians will converge to the unknown µ = 3000. Therefore, both the sample mean and the sample median are called unbiased estimators of the mean of a normally distributed population.
The fact that the range, the interquartile range, the standard deviation, and the variance of the 1000 sample means are smaller than those of the 1000 sample medians means that the sample mean is a more reliable estimator of the unknown population mean than the sample median. The larger variance of the medians indicates that the medians are generally further away from µ = 3000 than the sample means. In short, a researcher should have more confidence in the sample mean because it is usually closer to the unknown µ. In such a case, we say that one estimator (here, the sample mean) is more efficient or precise than the other (here, the median).
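The simulation study described above can be reproduced approximately outside JMP. The following Python sketch is an illustration rather than the authors' procedure: the seed is arbitrary, so the figures it prints will be similar to, but not identical with, those reported in Figure 1.1. For reference, the theoretical standard deviation of the mean of five independent observations is σ/√5 = 100/√5 ≈ 44.7, which is in line with the simulated value of 43.38.

```python
import random
import statistics

rng = random.Random(2024)    # arbitrary seed, for reproducibility only
MU, SIGMA = 3000.0, 100.0    # population mean and standard deviation
N_STUDENTS, N_OBS = 1000, 5  # 1000 students, five measurements each

means, medians = [], []
for _ in range(N_STUDENTS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N_OBS)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Both averages should be close to MU (unbiasedness), while the
# medians should show the larger spread (lower efficiency).
print(f"sample means:   mean = {statistics.mean(means):.2f}, "
      f"sd = {statistics.stdev(means):.2f}")
print(f"sample medians: mean = {statistics.mean(medians):.2f}, "
      f"sd = {statistics.stdev(medians):.2f}")
```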
1.2.2 The Mean of an Exponentially Distributed Population
We now investigate an exponentially distributed population with parameter λ = 1/100. The "unknown" population mean is therefore µ = 1/λ = 100 (see Statistics with JMP: Graphs, Descriptive Statistics and Probability). Each of the 1000 fictitious students performs five measurements. A first option to estimate the unknown value µ is again to calculate the sample mean. A histogram of the 1000 sample means is shown in Figure 1.2, at the top left. The mean of these 1000 sample means is 99.2417, while the standard deviation is 44.10.
Figure 1.2 Histograms and descriptive statistics for 1000 sample means and sample medians calculated based on samples of five observations from an exponentially distributed population with parameter λ = 1/100.
Based on the samples that the students have gathered, the 1000 medians can also be calculated and displayed in a histogram. This histogram is shown in Figure 1.2, at the top right. The mean of the 1000 medians is only 77.0114.
These calculations indicate that the population mean µ = 1/λ = 100 can be approximated fairly well by using the sample means, with a mean of 99.2417. This is not the case for the medians, whose mean value of 77.0114 is far away from µ. This remains the case if the number of samples is increased. In this example, for an exponentially distributed population, the median is not an unbiased but a biased estimator of the population mean.
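The size of this bias can also be worked out exactly. The calculation below relies on a standard result for order statistics of independent exponential observations, which is not stated in the text above: the k-th smallest of n observations has expected value (1/λ)(1/n + 1/(n−1) + … + 1/(n−k+1)). For n = 5, the median is the third smallest observation, so that

\[
E\left[X_{(3)}\right] = \frac{1}{\lambda}\left(\frac{1}{5} + \frac{1}{4} + \frac{1}{3}\right) = 100 \cdot \frac{47}{60} \approx 78.3 .
\]

This agrees well with the simulated mean of the medians (77.0114) and confirms that taking more samples does not move that mean toward µ = 100.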
In addition, Figure 1.2 also shows that the standard deviation of the sample medians (46.13) is greater than that of the sample means (44.10).
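The second simulation study can be sketched in the same way. Again, this Python code is only an illustration under assumptions: the seed is arbitrary, and the printed values will resemble but not exactly match those in Figure 1.2. Note that `expovariate` takes the rate λ = 1/100 as its argument, not the mean.

```python
import random
import statistics

rng = random.Random(7)       # arbitrary seed, for reproducibility only
RATE = 1 / 100               # lambda; the population mean is 1 / RATE = 100
N_STUDENTS, N_OBS = 1000, 5  # 1000 students, five measurements each

means, medians = [], []
for _ in range(N_STUDENTS):
    sample = [rng.expovariate(RATE) for _ in range(N_OBS)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# The average of the sample means should be close to 100, whereas the
# average of the sample medians should fall clearly below it (bias).
print(f"sample means:   mean = {statistics.mean(means):.2f}, "
      f"sd = {statistics.stdev(means):.2f}")
print(f"sample medians: mean = {statistics.mean(medians):.2f}, "
      f"sd = {statistics.stdev(medians):.2f}")
```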
1.3 Criteria for Estimators
Key properties of estimators are their expected values and their variances. These statistics are related to the concepts of bias and efficiency, respectively.
1.3.1 Unbiased Estimators
An ideal estimator that always produces the exact value of an...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)