
Statistics
Contents
Preface xiii
About the Author xix
Supplementary Material xxi
Part 1: The Basics 1
1. Introduction 3
2. Application to Process Control 5
2.1 Benefit Estimation 5
2.2 Inferential Properties 7
2.3 Controller Performance Monitoring 7
2.4 Event Analysis 8
2.5 Time Series Analysis 9
3. Process Examples 11
3.1 Debutaniser 11
3.2 De-ethaniser 11
3.3 LPG Splitter 12
3.4 Propane Cargoes 17
3.5 Diesel Quality 17
3.6 Fuel Gas Heating Value 18
3.7 Stock Level 19
3.8 Batch Blending 22
4. Characteristics of Data 23
4.1 Data Types 23
4.2 Memory 24
4.3 Use of Historical Data 24
4.4 Central Value 25
4.5 Dispersion 32
4.6 Mode 33
4.7 Standard Deviation 35
4.8 Skewness and Kurtosis 37
4.9 Correlation 46
4.10 Data Conditioning 47
5. Probability Density Function 51
5.1 Uniform Distribution 55
5.2 Triangular Distribution 57
5.3 Normal Distribution 59
5.4 Bivariate Normal Distribution 62
5.5 Central Limit Theorem 65
5.6 Generating a Normal Distribution 69
5.7 Quantile Function 70
5.8 Location and Scale 71
5.9 Mixture Distribution 73
5.10 Combined Distribution 73
5.11 Compound Distribution 75
5.12 Generalised Distribution 75
5.13 Inverse Distribution 76
5.14 Transformed Distribution 76
5.15 Truncated Distribution 77
5.16 Rectified Distribution 78
5.17 Noncentral Distribution 78
5.18 Odds 79
5.19 Entropy 80
6. Presenting the Data 83
6.1 Box and Whisker Diagram 83
6.2 Histogram 84
6.3 Kernel Density Estimation 90
6.4 Circular Plots 95
6.5 Parallel Coordinates 97
6.6 Pie Chart 98
6.7 Quantile Plot 98
7. Sample Size 105
7.1 Mean 105
7.2 Standard Deviation 106
7.3 Skewness and Kurtosis 107
7.4 Dichotomous Data 108
7.5 Bootstrapping 110
8. Significance Testing 113
8.1 Null Hypothesis 113
8.2 Confidence Interval 116
8.3 Six-Sigma 118
8.4 Outliers 119
8.5 Repeatability 120
8.6 Reproducibility 121
8.7 Accuracy 122
8.8 Instrumentation Error 123
9. Fitting a Distribution 127
9.1 Accuracy of Mean and Standard Deviation 130
9.2 Fitting a CDF 131
9.3 Fitting a QF 134
9.4 Fitting a PDF 135
9.5 Fitting to a Histogram 138
9.6 Choice of Penalty Function 141
10. Distribution of Dependent Variables 147
10.1 Addition and Subtraction 147
10.2 Division and Multiplication 148
10.3 Reciprocal 153
10.4 Logarithmic and Exponential Functions 153
10.5 Root Mean Square 162
10.6 Trigonometric Functions 164
11. Commonly Used Functions 165
11.1 Euler's Number 165
11.2 Euler-Mascheroni Constant 166
11.3 Logit Function 166
11.4 Logistic Function 167
11.5 Gamma Function 168
11.6 Beta Function 174
11.7 Pochhammer Symbol 174
11.8 Bessel Function 176
11.9 Marcum Q-Function 178
11.10 Riemann Zeta Function 180
11.11 Harmonic Number 180
11.12 Stirling Approximation 182
11.13 Derivatives 183
12. Selected Distributions 185
12.1 Lognormal 186
12.2 Burr 189
12.3 Beta 191
12.4 Hosking 195
12.5 Student t 204
12.6 Fisher 208
12.7 Exponential 210
12.8 Weibull 213
12.9 Chi-Squared 216
12.10 Gamma 221
12.11 Binomial 225
12.12 Poisson 231
13. Extreme Value Analysis 235
14. Hazard Function 245
15. Cusum 253
16. Regression Analysis 259
16.1 F Test 275
16.2 Adjusted R² 278
16.3 Akaike Information Criterion 279
16.4 Artificial Neural Networks 281
16.5 Performance Index 286
17. Autocorrelation 291
18. Data Reconciliation 299
19. Fourier Transform 305
Part 2: Catalogue of Distributions 315
20. Normal Distribution 317
20.1 Skew-Normal 317
20.2 Gibrat 320
20.3 Power Lognormal 320
20.4 Logit-Normal 321
20.5 Folded Normal 321
20.6 Lévy 323
20.7 Inverse Gaussian 325
20.8 Generalised Inverse Gaussian 329
20.9 Normal Inverse Gaussian 330
20.10 Reciprocal Inverse Gaussian 332
20.11 Q-Gaussian 334
20.12 Generalised Normal 338
20.13 Exponentially Modified Gaussian 345
20.14 Moyal 347
21. Burr Distribution 349
21.1 Type I 349
21.2 Type II 349
21.3 Type III 349
21.4 Type IV 350
21.5 Type V 351
21.6 Type VI 351
21.7 Type VII 353
21.8 Type VIII 354
21.9 Type IX 354
21.10 Type X 355
21.11 Type XI 356
21.12 Type XII 356
21.13 Inverse 357
22. Logistic Distribution 361
22.1 Logistic 361
22.2 Half-Logistic 364
22.3 Skew-Logistic 365
22.4 Log-Logistic 367
22.5 Paralogistic 369
22.6 Inverse Paralogistic 370
22.7 Generalised Logistic 371
22.8 Generalised Log-Logistic 375
22.9 Exponentiated Kumaraswamy-Dagum 376
23. Pareto Distribution 377
23.1 Pareto Type I 377
23.2 Bounded Pareto Type I 378
23.3 Pareto Type II 379
23.4 Lomax 381
23.5 Inverse Pareto 381
23.6 Pareto Type III 382
23.7 Pareto Type IV 383
23.8 Generalised Pareto 383
23.9 Pareto Principle 385
24. Stoppa Distribution 389
24.1 Type I 389
24.2 Type II 389
24.3 Type III 391
24.4 Type IV 391
24.5 Type V 392
25. Beta Distribution 393
25.1 Arcsine 393
25.2 Wigner Semicircle 394
25.3 Balding-Nichols 395
25.4 Generalised Beta 396
25.5 Beta Type II 396
25.6 Generalised Beta Prime 399
25.7 Beta Type IV 400
25.8 Pert 401
25.9 Beta Rectangular 403
25.10 Kumaraswamy 404
25.11 Noncentral Beta 407
26. Johnson Distribution 409
26.1 SN 409
26.2 SU 410
26.3 SL 412
26.4 SB 412
26.5 Summary 413
27. Pearson Distribution 415
27.1 Type I 416
27.2 Type II 416
27.3 Type III 417
27.4 Type IV 418
27.5 Type V 424
27.6 Type VI 425
27.7 Type VII 429
27.8 Type VIII 433
27.9 Type IX 433
27.10 Type X 433
27.11 Type XI 434
27.12 Type XII 434
28. Exponential Distribution 435
28.1 Generalised Exponential 435
28.2 Gompertz-Verhulst 435
28.3 Hyperexponential 436
28.4 Hypoexponential 437
28.5 Double Exponential 438
28.6 Inverse Exponential 439
28.7 Maxwell-Jüttner 439
28.8 Stretched Exponential 440
28.9 Exponential Logarithmic 441
28.10 Logistic Exponential 442
28.11 Q-Exponential 442
28.12 Benktander 445
29. Weibull Distribution 447
29.1 Nukiyama-Tanasawa 447
29.2 Q-Weibull 447
30. Chi Distribution 451
30.1 Half-Normal 451
30.2 Rayleigh 452
30.3 Inverse Rayleigh 454
30.4 Maxwell 454
30.5 Inverse Chi 458
30.6 Inverse Chi-Squared 459
30.7 Noncentral Chi-Squared 460
31. Gamma Distribution 463
31.1 Inverse Gamma 463
31.2 Log-Gamma 463
31.3 Generalised Gamma 467
31.4 Q-Gamma 468
32. Symmetrical Distributions 471
32.1 Anglit 471
32.2 Bates 472
32.3 Irwin-Hall 473
32.4 Hyperbolic Secant 475
32.5 Arctangent 476
32.6 Kappa 477
32.7 Laplace 478
32.8 Raised Cosine 479
32.9 Cardioid 481
32.10 Slash 481
32.11 Tukey Lambda 483
32.12 Von Mises 486
33. Asymmetrical Distributions 487
33.1 Benini 487
33.2 Birnbaum-Saunders 488
33.3 Bradford 490
33.4 Champernowne 491
33.5 Davis 492
33.6 Fréchet 494
33.7 Gompertz 496
33.8 Shifted Gompertz 497
33.9 Gompertz-Makeham 498
33.10 Gamma-Gompertz 499
33.11 Hyperbolic 499
33.12 Asymmetric Laplace 502
33.13 Log-Laplace 504
33.14 Lindley 506
33.15 Lindley-Geometric 507
33.16 Generalised Lindley 509
33.17 Mielke 509
33.18 Muth 510
33.19 Nakagami 512
33.20 Power 513
33.21 Two-Sided Power 514
33.22 Exponential Power 516
33.23 Rician 517
33.24 Topp-Leone 517
33.25 Generalised Tukey Lambda 519
33.26 Wakeby 521
34. Amoroso Distribution 525
35. Binomial Distribution 529
35.1 Negative-Binomial 529
35.2 Pólya 531
35.3 Geometric 531
35.4 Beta-Geometric 535
35.5 Yule-Simon 536
35.6 Beta-Binomial 538
35.7 Beta-Negative Binomial 540
35.8 Beta-Pascal 541
35.9 Gamma-Poisson 542
35.10 Conway-Maxwell-Poisson 543
35.11 Skellam 546
36. Other Discrete Distributions 549
36.1 Benford 549
36.2 Borel-Tanner 552
36.3 Consul 555
36.4 Delaporte 556
36.5 Flory-Schulz 558
36.6 Hypergeometric 559
36.7 Negative Hypergeometric 561
36.8 Logarithmic 561
36.9 Discrete Weibull 563
36.10 Zeta 564
36.11 Zipf 565
36.12 Parabolic Fractal 567
Appendix 1 Data Used in Examples 569
Appendix 2 Summary of Distributions 577
References 591
Index 593
Preface
There are those who take a very cynical view of statistics. One only has to search the Internet to find quotations such as these from the author Mark Twain:
There are three kinds of lies: lies, damned lies, and statistics.
Facts are stubborn, but statistics are more pliable.
From the American humourist Evan Esar:
Statistics is the science of producing unreliable facts from reliable figures.
From George Canning, the UK's shortest-serving prime minister at the time of writing:
I can prove anything by statistics except the truth.
And my personal favourite, attributed to many - all quoting different percentages!
76.3% of statistics are made up.
However, in the hands of a skilled process control engineer, statistics are an invaluable tool. Although advanced control technology is well established in the process industry, the majority of site managers still do not fully appreciate its potential to improve process profitability. An important part of the engineer's job is to present strong evidence that such improvements are achievable or have been achieved. Perhaps one of the most insightful quotations is this one from the physicist Ernest Rutherford:
If your experiment needs statistics, you ought to have done a better experiment.
Paraphrasing for the process control engineer:
If you need statistics to demonstrate that you have improved control of the process, you ought to have installed a better control scheme.
Statistics is certainly not an exact science. Like all the mathematical techniques applied to process control, or indeed to any branch of engineering, statistical methods need to be used alongside good engineering judgement. The process control engineer has a responsibility to ensure that they are properly applied. Misapplied, they can make a sceptical manager even more sceptical about the economic value of improved control. Properly used, they can turn a sceptic into a champion. The engineer needs to be well versed in their application, and this book should help to ensure that.
After I wrote the first edition of Process Control: A Practical Approach, it soon became apparent that it gave too little attention to statistics. Statistical methods are applied extensively at every stage of a process control project, from the estimation of potential benefits, through control design, to performance monitoring. In the second edition this was partially addressed by the inclusion of an additional chapter. However, in writing it, I quickly realised that the subject is huge. Just as the quantity of published process control theory far outstrips the more practical texts, the same applies to statistics, but to a much greater extent. For example, the publisher of this book currently offers over 2,000 titles on statistics but fewer than a dozen covering process control. As with process control theory, most published statistical theory has little application to the process industry, but hidden within it are a few very valuable techniques.
Of course, there are already many statistical methods routinely applied by control engineers, often as part of a software product. While many use these methods quite properly, there are numerous examples where the resulting conclusion later proves to be incorrect. This typically arises because the engineer is not fully aware of the assumptions underlying the method, which may not hold for the data in question. There are also too many occasions where the methods are grossly misapplied, or where licence fees are unnecessarily incurred for software that could easily be replicated by the control engineer using a spreadsheet package.
This book therefore has two objectives. The first is to ensure that the control engineer properly understands the techniques with which he or she might already be familiar. With the rapidly widening range of statistical software products (and the enthusiastic marketing of their developers), the risk of misapplication is growing proportionately. A user who misapplies them may reach the wrong conclusion about, for example, the economic value of a proposed control improvement or whether it is performing well after commissioning. The second objective is to extract, from the vast array of less well-known statistical techniques, those that a control engineer should find of practical value. These offer the opportunity to greatly increase the benefits captured by improved control.
A key intent in writing this book was to avoid taking the reader unnecessarily into theoretical detail. However, the reader is encouraged to brave the mathematics involved. A deeper understanding of the available techniques should at least be of interest, and is potentially of great value in better understanding the services and products that might be offered to the control engineer. While it may be daunting at first, the reader will get the full value from the book by reading it from cover to cover. At first glance some of the mathematics might appear complex, and there are symbols with which the reader may not be familiar. The reader should not be discouraged: the mathematics involved should be within the capabilities of a high school student. Chapters 4 to 6 take the reader through a step-by-step approach, introducing each term and explaining its use in a context that should be familiar to even the least experienced engineer. Chapter 11 specifically introduces the commonly used mathematical functions and their symbols. Once the reader's initial apprehension is overcome, all are shown to be quite simple. And, in any case, almost all exist as functions in the commonly used spreadsheet packages.
It is the nature of almost any engineering subject that the real gems of useful information get buried among the background detail. Listed here are the main items worthy of special attention by the engineer because of the impact they can have on the effectiveness of control design and performance.
- Control engineers often use the terms 'accuracy' and 'precision' synonymously when describing the confidence they might have in a process measurement or inferential property. As explained in Chapter 4, not understanding the difference between these terms is probably the most common cause of poorly performing quality control schemes.
- The histogram is commonly used to help visualise the variation of a process measurement. To draw one, both the width of the bins and the starting point of the first bin must be chosen. Although there are techniques (described in this book) that help with the initial selection, they provide only a guide, and some adjustment by trial and error is required to ensure the resulting chart represents the data properly. Kernel density estimation, described in Chapter 6, is a simple-to-apply but little-known technique that removes the need for this selection. Further, it generates a continuous curve rather than the discontinuous staircase shape of a histogram, which helps greatly in determining whether the data fit a particular continuous distribution.
- Control engineers typically use a few months' historical data for statistical analysis. While adequate for some applications, the sample can be far too small for others. For example, control schemes are often assessed by comparing the average operation after commissioning with that before. Because the assessed improvement is the difference between two averages that are usually close in value, small errors in each average cause much larger relative errors in the improvement. Chapter 7 provides a methodology for assessing the accuracy of any conclusion drawn from the chosen sample size.
- While many engineers understand the principles of significance testing, it is commonly misapplied. Chapter 8 takes the reader through the subject from first principles, describing the problems in identifying outliers and properly explaining the impact of repeatability and reproducibility of measurements.
- When assessing process behaviour, it is quite common for the engineer simply to calculate the mean and standard deviation of process data using the standard formulae. Even if the data are normally distributed, plotting the distribution of the actual data against that assumed will often reveal a poor fit. Because each deviation from the mean is squared, a single data point well away from the mean will cause the standard deviation to be substantially overestimated. Excluding such points as outliers is very subjective and risks the wrong conclusion being drawn from the analysis. Curve fitting, using all the data, produces a much more reliable estimate of the mean and standard deviation. There is a range of methods for doing this, described in Chapter 9.
- Engineers tend to judge whether a distribution fits the data well by superimposing the continuous distribution on the discontinuous histogram. Such a comparison can be very unreliable. Chapter 6 describes quantile-quantile plots, a much more effective alternative that is simple to apply.
- The assumption that process data follow the normal (Gaussian) distribution has become the de facto standard in estimating the benefits of improved control. The assumption is valid for many datasets, but for many others a different distribution is a much better choice. Choosing the wrong distribution can easily result in a benefit estimate that is half or double the true value. This can lead to poor decisions about the scope of an improved control project, or indeed about whether it should proceed at all. Chapter 10...
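The distinction drawn above between accuracy and precision can be made concrete by comparing an inferential property with laboratory results. The sketch below assumes the common definitions, accuracy as the average offset (bias) of the prediction errors and precision as their scatter, which may differ in detail from the treatment in Chapter 4; the function name `accuracy_and_precision` and all the readings are hypothetical.

```python
import statistics


def accuracy_and_precision(predicted, measured):
    """Return (bias, scatter) of the prediction errors (hypothetical helper).

    bias    -- mean error: how far off the predictions are on average (accuracy)
    scatter -- sample standard deviation of the errors: how repeatable they are (precision)
    """
    errors = [p - m for p, m in zip(predicted, measured)]
    return statistics.mean(errors), statistics.stdev(errors)


# Hypothetical laboratory results and two hypothetical inferentials predicting them
lab = [95.1, 94.8, 95.6, 95.0, 94.7, 95.3]
precise_but_biased = [x + 0.8 for x in lab]  # constant offset, no scatter
accurate_but_noisy = [95.6, 94.2, 96.1, 94.5, 95.3, 94.8]  # no offset, random scatter

for name, pred in (("precise but biased", precise_but_biased),
                   ("accurate but noisy", accurate_but_noisy)):
    bias, scatter = accuracy_and_precision(pred, lab)
    print(f"{name}: bias {bias:+.2f}, scatter {scatter:.2f}")
```

A constant bias can be removed by recalibrating the inferential, whereas scatter cannot, which is one reason the two properties need to be assessed separately.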
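Kernel density estimation, mentioned above as an alternative to the histogram, places a small kernel on every data point and averages them, so no bin width or starting point is needed. A minimal sketch, assuming a Gaussian kernel and a bandwidth from Silverman's rule of thumb (a bandwidth still has to be chosen, and neither choice is necessarily the one recommended in Chapter 6); the function names and data are hypothetical:

```python
import math
import statistics


def silverman_bandwidth(data):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel (an assumed choice)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def kde(data, x, h):
    """Density estimate at x: the average of Gaussian bumps of width h centred on each point."""
    total = 0.0
    for xi in data:
        u = (x - xi) / h
        total += math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return total / (len(data) * h)


# Hypothetical process measurements (illustrative values only)
data = [10.2, 10.8, 9.7, 10.1, 11.3, 10.5, 9.9, 10.4, 10.0, 10.6]
h = silverman_bandwidth(data)
for x in (9.5, 10.0, 10.5, 11.0, 11.5):
    print(f"{x:5.1f}  {kde(data, x, h):.3f}")
```

Evaluating the estimate on a fine grid of x values gives the continuous curve described above; a larger bandwidth gives a smoother curve, a smaller one a more detailed but noisier one.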
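The effect of sample size on a before-and-after comparison can be shown with a little arithmetic. The standard error of a mean is s/√n, and the standard error of the difference between two independent means combines both. This is a minimal sketch, not necessarily the methodology of Chapter 7: it assumes independent samples (real process data are usually autocorrelated, which widens the uncertainty further), and the function name `stderr_of_difference` and all figures are hypothetical.

```python
import math


def stderr_of_difference(s_before, n_before, s_after, n_after):
    """Standard error of (mean_after - mean_before), assuming independent samples."""
    return math.sqrt(s_before ** 2 / n_before + s_after ** 2 / n_after)


# Hypothetical example: daily averages of a product yield (%), 90 days before and after
s_before, n_before = 2.0, 90
s_after, n_after = 1.5, 90
improvement = 0.5  # hypothetical observed change in the mean (%)

se_each = s_before / math.sqrt(n_before)
se_diff = stderr_of_difference(s_before, n_before, s_after, n_after)
print(f"standard error of the 'before' mean: {se_each:.3f}")
print(f"standard error of the improvement:  {se_diff:.3f}")

# Approximate 95% interval using the normal multiplier 1.96
low, high = improvement - 1.96 * se_diff, improvement + 1.96 * se_diff
print(f"95% interval for the improvement: {low:.2f} to {high:.2f}")
```

With these illustrative figures each average is known to within about ±0.4 (at 95% confidence), yet the interval for a 0.5 improvement stretches from slightly below zero to about twice the observed value.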
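The sensitivity of the standard deviation to a single outlier, and the robustness of fitting a curve to all the data, can be seen in the sketch below. It assumes a normal distribution and fits its cumulative distribution function (CDF) to the empirical CDF by least squares, using the plotting position (i + 0.5)/n and a crude grid search purely to avoid external dependencies. Chapter 9 covers the fitting methods and penalty functions the book actually recommends; the function name `fit_normal_cdf` and the data here are hypothetical.

```python
import statistics
from statistics import NormalDist


def fit_normal_cdf(data, mu_grid, sigma_grid):
    """Least-squares fit of a normal CDF to the empirical CDF by grid search (hypothetical helper)."""
    xs = sorted(data)
    n = len(xs)
    # Empirical cumulative probability of each sorted point (assumed plotting position)
    probs = [(i + 0.5) / n for i in range(n)]
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            dist = NormalDist(mu, sigma)
            sse = sum((dist.cdf(x) - p) ** 2 for x, p in zip(xs, probs))
            if best is None or sse < best[0]:
                best = (sse, mu, sigma)
    return best[1], best[2]


# Hypothetical data: 19 well-behaved readings (exact normal quantiles, mean 50, sd 1)
# plus a single spurious reading of 60.
clean = [NormalDist(50, 1).inv_cdf((i + 0.5) / 19) for i in range(19)]
data = clean + [60.0]

mu_grid = [48 + 0.02 * i for i in range(201)]      # 48.00 to 52.00
sigma_grid = [0.5 + 0.02 * i for i in range(126)]  # 0.50 to 3.00
mu_fit, sigma_fit = fit_normal_cdf(data, mu_grid, sigma_grid)

print(f"formula:  mean {statistics.mean(data):.2f}, std dev {statistics.stdev(data):.2f}")
print(f"CDF fit:  mean {mu_fit:.2f}, std dev {sigma_fit:.2f}")
```

With these illustrative data the standard formula returns a standard deviation more than twice the true spread of the 19 well-behaved readings, while the fitted value stays close to it, even though no point was excluded.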