Engineering Biostatistics

Name: Engineering Biostatistics | An Introduction using MATLAB and WinBUGS
Brand: Wiley
Price: 115.99 EUR
Availability: OnlineOnly

An Introduction using MATLAB and WinBUGS

Brani Vidakovic (Author)

Wiley (Publisher)

1st Edition

Published on 7 December 2017

984 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-119-16898-0 (ISBN)

€115.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Preface

1 Introduction

Chapter References

2 The Sample and Its Properties

2.1 Introduction

2.2 A MATLAB Session on Univariate Descriptive Statistics

2.3 Location Measures

2.4 Variability Measures

2.4.1 Ranks

2.5 Displaying Data

2.6 Multidimensional Samples: Fisher's Iris Data and Body Fat Data

2.7 Multivariate Samples and Their Summaries

2.8 Principal Components of Data

2.9 Visualizing Multivariate Data

2.10 Observations as Time Series

2.11 About Data Types

2.12 Big Data Paradigm

2.13 Exercises

Chapter References

3 Probability, Conditional Probability, and Bayes' Rule

3.1 Introduction

3.2 Events and Probability

3.3 Odds

3.4 Venn Diagrams

3.5 Counting Principles

3.6 Conditional Probability and Independence

3.6.1 Pairwise and Global Independence

3.7 Total Probability

3.8 Reassessing Probabilities: Bayes' Rule

3.9 Bayesian Networks

3.10 Exercises

Chapter References

4 Sensitivity, Specificity, and Relatives

4.1 Introduction

4.2 Notation

4.2.1 Conditional Probability Notation

4.3 Combining Two or More Tests

4.4 ROC Curves

4.5 Exercises

Chapter References

5 Random Variables

5.1 Introduction

5.2 Discrete Random Variables

5.2.1 Jointly Distributed Discrete Random Variables

5.3 Some Standard Discrete Distributions

5.3.1 Discrete Uniform Distribution

5.3.2 Bernoulli and Binomial Distributions

5.3.3 Hypergeometric Distribution

5.3.4 Poisson Distribution

5.3.5 Geometric Distribution

5.3.6 Negative Binomial Distribution

5.3.7 Multinomial Distribution

5.3.8 Quantiles

5.4 Continuous Random Variables

5.4.1 Joint Distribution of Two Continuous Random Variables

5.4.2 Conditional Expectation

5.5 Some Standard Continuous Distributions

5.5.1 Uniform Distribution

5.5.2 Exponential Distribution

5.5.3 Normal Distribution

5.5.4 Gamma Distribution

5.5.5 Inverse Gamma Distribution

5.5.6 Beta Distribution

5.5.7 Double Exponential Distribution

5.5.8 Logistic Distribution

5.5.9 Weibull Distribution

5.5.10 Pareto Distribution

5.5.11 Dirichlet Distribution

5.6 Random Numbers and Probability Tables

5.7 Transformations of Random Variables

5.8 Mixtures

5.9 Markov Chains

5.10 Exercises

Chapter References

6 Normal Distribution

6.1 Introduction

6.2 Normal Distribution

6.2.1 Sigma Rules

6.2.2 Bivariate Normal Distribution

6.3 Examples with a Normal Distribution

6.4 Combining Normal Random Variables

6.5 Central Limit Theorem

6.6 Distributions Related to Normal

6.6.1 Chi-square Distribution

6.6.2 t-Distribution

6.6.3 Cauchy Distribution

6.6.4 F-Distribution

6.6.5 Noncentral χ², t, and F Distributions

6.6.6 Lognormal Distribution

6.7 Delta Method and Variance Stabilizing Transformations

6.8 Exercises

Chapter References

7 Point and Interval Estimators

7.1 Introduction

7.2 Moment Matching and Maximum Likelihood Estimators

7.2.1 Unbiasedness and Consistency of Estimators

7.3 Estimation of a Mean, Variance, and Proportion

7.3.1 Point Estimation of Mean

7.3.2 Point Estimation of Variance

7.3.3 Point Estimation of Population Proportion

7.4 Confidence Intervals

7.4.1 Confidence Intervals for the Normal Mean

7.4.2 Confidence Interval for the Normal Variance

7.4.3 Confidence Intervals for the Population Proportion

7.4.4 Confidence Intervals for Proportions When X = 0

7.4.5 Designing the Sample Size with Confidence Intervals

7.5 Prediction and Tolerance Intervals

7.6 Confidence Intervals for Quantiles

7.7 Confidence Intervals for the Poisson Rate

7.8 Exercises

Chapter References

8 Bayesian Approach to Inference

8.1 Introduction

8.2 Ingredients for Bayesian Inference

8.3 Conjugate Priors

8.4 Point Estimation

8.4.1 Normal-Inverse Gamma Conjugate Analysis

8.5 Prior Elicitation

8.6 Bayesian Computation and Use of WinBUGS

8.6.1 Zero Tricks in WinBUGS

8.7 Bayesian Interval Estimation: Credible Sets

8.8 Learning by Bayes' Theorem

8.9 Bayesian Prediction

8.10 Consensus Means

8.11 Exercises

Chapter References

9 Testing Statistical Hypotheses

9.1 Introduction

9.2 Classical Testing Problem

9.2.1 Choice of Null Hypothesis

9.2.2 Test Statistic, Rejection Regions, Decisions, and Errors in Testing

9.2.3 Power of the Test

9.2.4 Fisherian Approach: p-Values

9.3 Bayesian Approach to Testing

9.3.1 Criticism and Calibration of p-Values

9.4 Testing the Normal Mean

9.4.1 z-Test

9.4.2 Power Analysis of a z-Test

9.4.3 Testing a Normal Mean When the Variance Is Not Known: t-Test

9.4.4 Power Analysis of t-Test

9.5 Testing Multivariate Mean: T-Square Test*

9.5.1 T-Square Test

9.5.2 Test for Symmetry

9.6 Testing the Normal Variances

9.7 Testing the Proportion

9.7.1 Exact Test for Population Proportions

9.7.2 Bayesian Test for Population Proportions

9.8 Multiplicity in Testing, Bonferroni Correction, and False Discovery Rate

9.9 Exercises

Chapter References

10 Two Samples

10.1 Introduction

10.2 Means and Variances in Two Independent Normal Populations

10.2.1 Confidence Interval for the Difference of Means

10.2.2 Power Analysis for Testing Two Means

10.2.3 More Complex Two-Sample Designs

10.2.4 A Bayesian Test for Two Normal Means

10.3 Testing the Equality of Normal Means When Samples Are Paired

10.3.1 Sample Size in Paired t-Test

10.3.2 Difference-in-Differences (DiD) Tests

10.4 Two Multivariate Normal Means

10.4.1 Confidence Intervals for Arbitrary Linear Combinations of Mean Differences

10.4.2 Profile Analysis With Two Independent Groups

10.4.3 Paired Multivariate Samples

10.5 Two Normal Variances

10.6 Comparing Two Proportions

10.6.1 The Sample Size

10.7 Risk Differences, Risk Ratios, and Odds Ratios

10.7.1 Risk Differences

10.7.2 Risk Ratio

10.7.3 Odds Ratios

10.7.4 Two Proportions from a Single Sample

10.8 Two Poisson Rates

10.9 Equivalence Tests

10.10 Exercises

Chapter References

11 ANOVA and Elements of Experimental Design

11.1 Introduction

11.2 One-Way ANOVA

11.2.1 ANOVA Table and Rationale for F-Test

11.2.2 Testing Assumption of Equal Population Variances

11.2.3 The Null Hypothesis Is Rejected. What Next?

11.2.4 Bayesian Solution

11.2.5 Fixed- and Random-Effect ANOVA

11.3 Welch's ANOVA

11.4 Two-Way ANOVA and Factorial Designs

11.4.1 Two-way ANOVA: One Observation Per Cell

11.5 Blocking

11.6 Repeated Measures Design

11.6.1 Sphericity Tests

11.7 Nested Designs

11.8 Power Analysis in ANOVA

11.9 Functional ANOVA

11.10 Analysis of Means (ANOM)

11.11 Gauge R&R ANOVA

11.12 Testing Equality of Several Proportions

11.13 Testing the Equality of Several Poisson Means

11.14 Exercises

Chapter References

12 Models for Tables

12.1 Introduction

12.2 Contingency Tables: Testing for Independence

12.2.1 Measuring Association in Contingency Tables

12.2.2 Power Analysis for Contingency Tables

12.2.3 Cohen's Kappa

12.3 Three-Way Tables

12.4 Fisher's Exact Test

12.5 Stratified Tables: Mantel-Haenszel Test

12.5.1 Testing Conditional Independence or Homogeneity

12.5.2 Odds Ratio from Stratified Tables

12.6 Paired Tables: McNemar's Test

12.7 Risk Differences, Risk Ratios, and Odds Ratios for Paired Tables

12.7.1 Risk Differences

12.7.2 Risk Ratios

12.7.3 Odds Ratios

12.7.4 Liddell's Procedure

12.7.5 Garth Test

12.7.6 Stuart-Maxwell Test

12.7.7 Cochran's Q Test*

12.8 Exercises

Chapter References

13 Correlation

13.1 Introduction

13.2 The Pearson Coefficient of Correlation

13.2.1 Inference About ρ

13.2.2 Bayesian Inference for Correlation Coefficients

13.3 Spearman's Coefficient of Correlation

13.4 Kendall's Tau

13.5 Cum hoc ergo propter hoc

13.6 Exercises

Chapter References

14 Regression

14.1 Introduction

14.2 Simple Linear Regression

14.2.1 Inference in Simple Linear Regression

14.3 Calibration

14.4 Testing the Equality of Two Slopes

14.5 Multiple Regression

14.5.1 Matrix Notation

14.5.2 Residual Analysis, Influential Observations, Multicollinearity, and Variable Selection

14.6 Sample Size in Regression

14.7 Linear Regression That Is Nonlinear in Predictors

14.8 Errors-In-Variables Linear Regression

14.9 Analysis of Covariance

14.9.1 Sample Size in ANCOVA

14.9.2 Bayesian Approach to ANCOVA

14.10 Exercises

Chapter References

15 Regression for Binary and Count Data

15.1 Introduction

15.2 Logistic Regression

15.2.1 Fitting Logistic Regression

15.2.2 Assessing the Logistic Regression Fit

15.2.3 Probit and Complementary Log-Log Links

15.3 Poisson Regression

15.4 Log-linear Models

15.5 Exercises

Chapter References

16 Inference for Censored Data and Survival Analysis

16.1 Introduction

16.2 Definitions

16.3 Inference with Censored Observations

16.3.1 Parametric Approach

16.3.2 Nonparametric Approach: Kaplan-Meier or Product-Limit Estimator

16.3.3 Comparing Survival Curves

16.4 The Cox Proportional Hazards Model

16.5 Bayesian Approach

16.5.1 Survival Analysis in WinBUGS

16.6 Exercises

Chapter References

17 Goodness of Fit Tests

17.1 Introduction

17.2 Probability Plots

17.2.1 Q-Q Plots

17.2.2 P-P Plots

17.2.3 Poissonness Plots

17.3 Pearson's Chi-Square Test

17.4 Kolmogorov-Smirnov Tests

17.4.1 Kolmogorov's Test

17.4.2 Smirnov's Test to Compare Two Distributions

17.5 Cramér-von Mises and Watson's Tests

17.5.1 Rosenblatt's Test

17.6 Moran's Test

17.7 Departures from Normality

17.7.1 Elimination of Unknown Parameters by Transformations

17.8 Exercises

Chapter References

18 Distribution-Free Methods

18.1 Introduction

18.2 Sign Test

18.3 Wilcoxon Signed-Rank Test

18.4 Wilcoxon Sum Rank Test and Mann-Whitney Test

18.5 Kruskal-Wallis Test

18.6 Friedman's Test

18.7 Resampling Methods

18.7.1 The Jackknife

18.7.2 Bootstrap

18.7.3 Bootstrap Versions of Some Popular Tests

18.7.4 Randomization and Permutation Tests

18.7.5 Discussion

18.8 Exercises

Chapter References

19 Bayesian Inference Using Gibbs Sampling - BUGS Project

19.1 Introduction

19.2 Step-by-Step Session

19.3 Built-in Functions and Common Distributions in WinBUGS

19.4 MATBUGS: A MATLAB Interface to WinBUGS

19.5 Exercises

Chapter References

Index

Chapter 1
Introduction

Many people were at first surprised at my using the new words "Statistics" and "Statistical," as it was supposed that some term in our own language might have expressed the same meaning. But in the course of a very extensive tour through the northern parts of Europe, which I happened to take in 1786, I found that in Germany they were engaged in a species of political inquiry to which they had given the name of "Statistics" ... I resolved on adopting it, and I hope that it is now completely naturalised and incorporated with our language.

- Sinclair, 1791; Vol XX

WHAT IS COVERED IN THIS CHAPTER

What is the subject of statistics?
Population, sample, data
Appetizer examples

The problems confronting health professionals today often involve fundamental aspects of device and system analysis, and their design and application. As such they are of extreme importance to engineers and scientists.

Because many aspects of engineering and scientific practice involve nondeterministic outcomes, an understanding of statistics is important to every engineer and scientist. Statistics is a guide to the unknown. It is a science that deals with designing experimental protocols; collecting, summarizing, and presenting data; and, most important, making inferences and aiding decisions in the presence of variability and uncertainty. For example, R. A. Fisher's 1943 elucidation of the human blood-group system Rhesus in terms of the three linked loci C, D, and E, as described in Fisher (1947) or Edwards (2007), is a brilliant example of building a coherent structure of new knowledge guided by a statistical analysis of available experimental data.

The uncertainty that statistical science addresses derives mainly from two sources: (1) from observing only a part of an existing, fixed, but large population or (2) from having a process that results in nondeterministic outcomes. At least a part of the process needs to be either a black box or inherently stochastic, so the outcomes cannot be predicted with certainty.

A population is a statistical universe. It is defined as a collection of existing attributes of some natural phenomenon or a collection of potential attributes when a process is involved. In the case of a process, the underlying population is called hypothetical, for obvious reasons. Thus, populations can be either finite or infinite. A subset of a population selected by some relevant criteria is called a subpopulation.

Often we think about a population as an assembly of people, animals, items, events, times, etc., in which the attribute of interest is measurable. For example, the population of all US citizens older than 21 is an example of a population for which many attributes can be assessed. Attributes might be a history of heart disease, weight, political affiliation, level of blood sugar, etc.

A sample is an observed part of a population. Selection of a sample is a rich methodology in itself, but, unless otherwise specified, it is assumed that the sample is selected at random. The randomness ensures that the sample is representative of its population.
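The distinction between a population and a randomly selected sample can be sketched in a few lines of Python. This is only an illustration: the population below is synthetic (10,000 made-up body weights), and the seed, the sample size, and all variable names are assumptions introduced here, not material from the book.

```python
import random
import statistics

rng = random.Random(2017)  # fixed seed so the sketch is reproducible

# Synthetic finite "population": 10,000 made-up body weights in kg.
# These values are an assumption for illustration, not real data.
population = [rng.gauss(70.0, 12.0) for _ in range(10_000)]

# Simple random sample without replacement:
# every subset of size n has the same chance of being selected.
n = 200
sample = rng.sample(population, n)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean:     {population_mean:.2f} kg")
print(f"sample mean (n={n}): {sample_mean:.2f} kg")
```

In practice the population mean is unknown and only the sample is observed; the sketch computes both solely to show that a random sample tends to track the population attribute it was drawn from.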

The sampling process depends on the nature of the problem and the population. For example, a sample may be obtained via a retrospective study (usually existing historical outcomes over some period of time), an observational study (an observer monitors the process or population in real time), a sample survey (a researcher administers a questionnaire to measure the characteristics and/or attitudes of subjects), or a designed study (a researcher makes deliberate changes in controllable variables to induce a cause/effect relationship), to name just a few.

Example 1.1. Ohm's Law Measurements. A student constructed a simple electric circuit in which the resistance R and voltage E were controllable. The output of interest is current I, and according to Ohm's law it is

$$I = \frac{E}{R}.$$

This is a mechanistic, theoretical model. In a finite number of measurements under an identical R, E setting, the measured current varies. The population here is hypothetical: an infinite collection of all potentially obtainable measurements of its attribute, current I. The observed sample is finite. In the presence of sample variability, one establishes an empirical (statistical) model for currents from the population as either

$$I = \frac{E}{R} + \epsilon \quad \text{or} \quad I = \epsilon \cdot \frac{E}{R}.$$

On the basis of a sample, one may first select the model and then proceed with the inference about the nature of the discrepancy, $\epsilon$.
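A small simulation may make the idea of a hypothetical population concrete. The sketch below draws repeated current measurements at one fixed R, E setting under two assumed ways of attaching a discrepancy to Ohm's law: an additive form (I = E/R + discrepancy) and a multiplicative form (I = discrepancy × E/R). The circuit values, the noise distributions, and their spreads are all illustrative assumptions, not values from the book.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility

E = 12.0          # volts (hypothetical setting)
R = 4.0           # ohms (hypothetical setting)
I_theory = E / R  # current predicted by Ohm's law, in amperes

n = 1000           # number of simulated measurements
sigma_add = 0.05   # assumed sd of the additive discrepancy, in amperes
sigma_mult = 0.02  # assumed sd of the log of the multiplicative discrepancy

# Additive model: the discrepancy is centered at 0 and added to E/R.
additive = [I_theory + rng.gauss(0.0, sigma_add) for _ in range(n)]

# Multiplicative model: the discrepancy is positive, centered near 1,
# and scales E/R (a lognormal factor is one possible choice).
multiplicative = [I_theory * rng.lognormvariate(0.0, sigma_mult) for _ in range(n)]

for name, data in (("additive", additive), ("multiplicative", multiplicative)):
    print(f"{name:>14}: mean = {statistics.fmean(data):.4f} A, "
          f"sd = {statistics.stdev(data):.4f} A")
```

Each list is a finite sample from an infinite hypothetical population of measurements; choosing between the two forms from such a sample is the model-selection step the example describes.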

Example 1.2. Cell Counts. In a quantitative engineering physiology laboratory, a team of four students was asked to make a LabVIEW© program to automatically count MC3T3-E1 cells in a hemocytometer (Fig. 1.1). This automatic count was to be compared with the manual count collected through an inverted bright field microscope. The manual count is considered the gold standard.

Fig. 1.1 Cells on a hemocytometer plate.

The experiment consisted of placing 10 µL of cell solutions at two levels of cell confluency: 20% and 70%. There were n1 = 12 pairs of measurements (automatic and manual counts) at 20% and n2 = 10 pairs at 70%, as in the table below.

20% confluency
Automated   34  44  40  62  53  51  30  33  38  51  26  48
Manual      30  43  34  53  49  39  37  42  30  50  35  54

70% confluency
Automated   72  82 100  94  83  94  73  87 107 102
Manual      76  51  92  77  74  81  72  87 100 104

The students wish to answer the following questions:

Are the automated and manual counts significantly different for a fixed confluency level? What are the confidence intervals for the population differences if normality of the measurements is assumed?
If the difference between automated and manual counts constitutes an error, are the errors comparable for the two confluency levels?

We will revisit this example later in the book (Exercise 10.20) and see that for the 20% confluency level there is no significant difference between the automated and manual counts, whereas for the 70% level the difference is significant. We will also see that the errors for the two confluency levels significantly differ. The statistical design for comparison of errors is called a difference in differences (DiD) and is quite common in biomedical data analysis.
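The first two questions can be previewed with a paired t analysis of the tabulated counts. The sketch below is one possible treatment, assuming normal paired differences; it is not the book's solution to Exercise 10.20. The function name `paired_summary` is introduced here for illustration, and the critical values are the standard two-sided 5% t quantiles for 11 and 9 degrees of freedom. The final Welch-type statistic for comparing the errors at the two levels is likewise an assumed choice; the book's DiD procedure may differ.

```python
import math
import statistics

# Counts copied from the table in Example 1.2.
auto_20 = [34, 44, 40, 62, 53, 51, 30, 33, 38, 51, 26, 48]
manual_20 = [30, 43, 34, 53, 49, 39, 37, 42, 30, 50, 35, 54]
auto_70 = [72, 82, 100, 94, 83, 94, 73, 87, 107, 102]
manual_70 = [76, 51, 92, 77, 74, 81, 72, 87, 100, 104]

# 0.975 quantiles of the t distribution, keyed by degrees of freedom.
T_CRIT = {11: 2.201, 9: 2.262}


def paired_summary(auto, manual):
    """Return differences, their mean, the paired t statistic, and a 95% CI."""
    d = [a - m for a, m in zip(auto, manual)]
    n = len(d)
    mean_d = statistics.fmean(d)
    se = statistics.stdev(d) / math.sqrt(n)  # standard error of the mean difference
    t_stat = mean_d / se
    half_width = T_CRIT[n - 1] * se
    return d, mean_d, t_stat, (mean_d - half_width, mean_d + half_width)


d20, mean20, t20, ci20 = paired_summary(auto_20, manual_20)
d70, mean70, t70, ci70 = paired_summary(auto_70, manual_70)

print(f"20%: mean diff = {mean20:.2f}, t = {t20:.3f}, 95% CI = ({ci20[0]:.2f}, {ci20[1]:.2f})")
print(f"70%: mean diff = {mean70:.2f}, t = {t70:.3f}, 95% CI = ({ci70[0]:.2f}, {ci70[1]:.2f})")

# Comparing the errors at the two levels: Welch-type statistic on the two
# sets of differences (an assumed choice, not necessarily the book's test).
se_did = math.sqrt(statistics.variance(d20) / len(d20)
                   + statistics.variance(d70) / len(d70))
t_did = (mean70 - mean20) / se_did
print(f"difference in mean errors = {mean70 - mean20:.2f}, Welch t = {t_did:.3f}")
```

Under these assumptions the t statistic is about 0.55 at 20% confluency (below the critical value 2.201) and about 2.41 at 70% (above 2.262), which agrees with the conclusions stated for the two fixed confluency levels.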

Example 1.3. Rana Pipiens. Students in a quantitative engineering physiology laboratory were asked to expose the gastrocnemius muscle of the northern leopard frog (Rana pipiens) and stimulate the sciatic nerve to observe contractions in the skeletal muscle. Students were interested in modeling the length-tension relationship. The force used was the active force, calculated by subtracting the measured passive force (no stimulation) from the total force (with stimulation).

The active force represents the dependent variable. The length of the muscle begins at 35 mm and stretches in increments of 0.5 mm, until a maximum length of 42.5 mm is achieved. The velocity at which the muscle was stretched was held constant at 0.5 mm/s.

Reading Change in Length (in %) Passive force Total...

Schweitzer Fachinformationen
