Testing Statistical Assumptions in Research

Name: Testing Statistical Assumptions in Research
Brand: Wiley
Price: 99.99 EUR
Availability: OnlineOnly

J. P. Verma, Abdel-Salam G. Abdel-Salam (Authors)

Wiley (Publisher)

1st Edition

Published on 4 March 2019

224 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-119-52839-5 (ISBN)

€99.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description


Comprehensively teaches the basics of testing statistical assumptions in research and the importance of doing so

This book helps researchers check the assumptions of the statistical tests used in their research by focusing on the importance of checking assumptions when using statistical methods, showing them how to check assumptions, and explaining what to do if assumptions are not met.

Testing Statistical Assumptions in Research discusses the concepts of hypothesis testing and statistical errors in detail, as well as the concepts of power, sample size, and effect size. It introduces SPSS functionality and shows how to segregate data, draw random samples, split files, and create variables automatically. It then covers the different assumptions required in survey studies and the importance of survey design in producing sound findings. The book presents various parametric tests and their related assumptions and shows the procedures for testing these assumptions using SPSS software. To motivate readers to check assumptions, it includes many situations where violation of assumptions affects the findings. Assumptions required for nonparametric tests such as the chi-square, Mann-Whitney, Kruskal-Wallis, and Wilcoxon signed-rank tests are also discussed. Finally, it looks at assumptions in nonparametric correlations, such as biserial correlation, tetrachoric correlation, and the phi coefficient.

* An excellent reference for graduate students and research scholars of any discipline for testing the assumptions of statistical tests before using them in their research

* Shows readers, through various illustrations, the adverse effects that violating assumptions has on findings

* Describes different assumptions associated with different statistical tests commonly used by research scholars

* Contains examples using SPSS, which help readers understand the procedure involved in testing assumptions

* Looks at commonly used assumptions in statistical tests, such as z, t and F tests, ANOVA, correlation, and regression analysis

Testing Statistical Assumptions in Research is a valuable resource for graduate students of any discipline who are writing a thesis or dissertation based on empirical studies as part of their coursework, as well as for data analysts.

Content

Preface ix

Acknowledgments xi

About the Companion Website xii

1 Importance of Assumptions in Using Statistical Techniques 1

1.1 Introduction 1

1.2 Data Types 2

1.2.1 Nonmetric Data 2

1.2.2 Metric Data 2

1.3 Assumptions About Type of Data 3

1.4 Statistical Decisions in Hypothesis Testing Experiments 4

1.4.1 Type I and Type II Errors 5

1.4.2 Understanding Power of Test 6

1.4.3 Relationship Between Type I and Type II Errors 7

1.4.4 One-Tailed and Two-Tailed Tests 8

1.5 Sample Size in Research Studies 8

1.6 Effect of Violating Assumptions 11

Exercises 12

Answers 16

2 Introduction of SPSS and Segregation of Data 17

2.1 Introduction 17

2.2 Introduction to SPSS 17

2.2.1 Data File Preparation 19

2.2.2 Importing the Data Set from Excel 21

2.3 Data Cleaning 23

2.3.1 Interpreting Descriptive Statistics Output 26

2.3.2 Interpreting Frequency Statistic Output 27

2.4 Data Management 27

2.4.1 Sorting Data 28

2.4.1.1 Sort Cases 28

2.4.1.2 Sort Variables 29

2.4.2 Selecting Cases Using Condition 31

2.4.2.1 Selecting Data of Males with Agree Response 32

2.4.3 Drawing Random Sample of Cases 34

2.4.4 Splitting File 36

2.4.5 Computing Variable 36

Exercises 40

Answers 42

3 Assumptions in Survey Studies 45

3.1 Introduction 45

3.2 Assumptions in Survey Research 46

3.2.1 Data Cleaning 46

3.2.2 About Instructions in Questionnaire 46

3.2.3 Respondent's Willingness to Answer 47

3.2.4 Receiving Correct Information 47

3.2.5 Seriousness of the Respondents 47

3.2.6 Prior Knowledge of the Respondents 48

3.2.7 Clarity About Items in the Questionnaire 48

3.2.8 Ensuring Survey Feedback 48

3.2.9 Nonresponse Error 48

3.3 Questionnaire's Reliability 49

3.3.1 Temporal Stability 49

3.3.1.1 Test-Retest Method 49

3.3.2 Internal Consistency 50

3.3.2.1 Split-Half Test 50

3.3.2.2 Kuder-Richardson Test 52

3.3.2.3 Cronbach's Alpha 55

Exercise 60

Answers 63

4 Assumptions in Parametric Tests 65

4.1 Introduction 65

4.2 Common Assumptions in Parametric Tests 66

4.2.1 Normality 66

4.2.1.1 Testing Normality with SPSS 67

4.2.1.2 What if the Normality Assumption Is Violated? 71

4.2.1.3 Using Transformations for Normality 72

4.2.2 Randomness 74

4.2.2.1 Runs Test for Testing Randomness 75

4.2.3 Outliers 76

4.2.3.1 Identifying Outliers with SPSS 77

4.2.4 Homogeneity of Variances 79

4.2.4.1 Testing Homogeneity with Levene's Test 79

4.2.5 Independence of Observations 82

4.2.6 Linearity 82

4.3 Assumptions in Hypothesis Testing Experiments 82

4.3.1 Comparing Means with t-Test 83

4.3.2 One Sample t-Test 83

4.3.2.1 Testing Assumption of Randomness 84

4.3.2.2 Testing Normality Assumption in t-Test 85

4.3.2.3 What if the Normality Assumption Is Violated? 88

4.3.3 Sign Test 88

4.3.4 Paired t-Test 88

4.3.4.1 Effect of Violating Normality Assumption in Paired t-Test 91

4.3.5 Rank Test 91

4.3.6 Independent Two-Sample t-Test 92

4.3.6.1 Two-Sample t-Test with SPSS and Testing Assumptions 92

4.3.6.2 Effect of Violating Assumption of Homogeneity 96

4.4 F-Test for Comparing Variability 97

4.4.1 Analysis of Variance (ANOVA) 98

4.4.2 ANOVA Assumptions 99

4.4.2.1 Checking Assumptions Using SPSS 99

4.4.3 One-Way ANOVA Using SPSS 105

4.4.4 What to Do if Assumption Violates? 109

4.4.5 What if the Assumptions in ANOVA Are Violated? 109

4.5 Correlation Analysis 118

4.5.1 Karl Pearson's Coefficient of Correlation 118

4.5.2 Testing Assumptions with SPSS 119

4.5.2.1 Testing for Linearity 119

4.5.3 Coefficient of Determination 122

4.6 Regression Analysis 125

4.6.1 Simple Linear Regression 126

4.6.2 Assumptions in Linear Regression Analysis 128

4.6.2.1 Testing Assumptions with SPSS 128

Exercises 136

Answers 139

5 Assumptions in Nonparametric Tests 141

5.1 Introduction 141

5.2 Common Assumptions in Nonparametric Tests 141

5.2.1 Randomness 142

5.2.2 Independence 142

5.2.2.1 Testing Assumptions Using SPSS 142

5.2.2.2 Runs Test for Randomness Using SPSS 143

5.3 Chi-square Tests 144

5.3.1 Goodness-of-Fit Test 145

5.3.1.1 Assumptions About Data 145

5.3.1.2 Performing Chi-square Goodness-of-Fit Test Using SPSS 146

5.3.2 Testing for Independence 148

5.3.2.1 Assumptions About Data 148

5.3.2.2 Performing Chi-square Test of Independence Using SPSS 148

5.3.3 Testing for Homogeneity 152

5.3.3.1 Assumptions About Data 153

5.3.3.2 Performing Chi-square Test of Homogeneity Using SPSS 153

5.3.4 What to Do if Assumption Violates? 155

5.4 Mann-Whitney U Test 156

5.4.1 Assumption About Data 157

5.4.2 Mann-Whitney Test Using SPSS 157

5.4.3 What to Do if Assumption Violates? 159

5.5 Kruskal-Wallis Test 161

5.5.1 Assumptions About Data 162

5.5.2 Kruskal-Wallis H Test Using SPSS 162

5.5.3 Dealing with Data When Assumption Is Violated 166

5.6 Wilcoxon Signed-Rank Test 168

5.6.1 Assumptions About Data 168

5.6.2 Wilcoxon Signed-Rank Test Using SPSS 168

5.6.3 Remedy if Assumption Violates 172

Exercises 172

Answers 174

6 Assumptions in Nonparametric Correlations 175

6.1 Introduction 175

6.2 Spearman Rank-Order Correlation 175

6.3 Biserial Correlation 178

6.4 Tetrachoric Correlation 182

6.4.1 Assumptions for Tetrachoric Correlation Coefficient 182

6.4.1.1 Testing Significance 183

6.5 Phi Coefficient (Φ) 184

6.6 Assumptions About Data 188

6.7 What if the Assumptions Are Violated? 188

Exercises 188

Answers 190

Appendix Statistical Tables 193

Bibliography 203

Index 209

1 Importance of Assumptions in Using Statistical Techniques

1.1 Introduction

All research is conducted under certain assumptions. The validity and accuracy of the findings depend on whether all the assumptions of the data and of the statistical techniques used in the analysis have been fulfilled. For instance, in drawing a sample, simple random sampling requires the population to be homogeneous, while stratified sampling assumes it to be heterogeneous. In any research, certain research questions are framed, and the study is conducted to answer them. To answer these questions, we frame hypotheses that are tested with the help of the data generated in the study. These hypotheses are tested using statistical tests, and the choice of test depends on whether the data are nonmetric or metric: different statistical tests are used for nonmetric and metric data to answer the same research questions. More specifically, we use nonparametric tests for nonmetric data and parametric tests for metric data. Thus, it is essential for researchers to understand the type of data generated in their studies. Parametric tests generally provide more accurate findings than nonparametric tests, but they rest on a common assumption of normality in addition to the specific assumptions associated with each test. If the normality assumption is severely violated, parametric tests may distort the findings. Thus, in research studies, assumptions are focused on two spheres, the data and the statistical tests, in addition to methodological issues.

Nowadays, many statistical packages such as IBM SPSS® Statistics software ("SPSS"), Minitab, Statistica, and Statistical Analysis System (SAS) are available for analyzing both nonmetric and metric data, but they do not check the assumptions automatically. However, these packages do provide output for testing the assumptions associated with each statistical test. We shall now discuss the different types of data that can be generated in research studies. Knowing these, one can decide on a relevant strategy for answering one's research questions.

1.2 Data Types

Data are classified into two categories: nonmetric and metric. Nonmetric data are also termed qualitative, and metric data quantitative. Nonmetric data are categorical measurements expressed by means of a natural-language description, which is why they are often called "categorical" data. Values such as Student's Specialization = "Economics", Response = "Agree", and Gender = "Male" are examples of nonmetric data. Nonmetric data can be measured on two different scales: nominal and ordinal.

1.2.1 Nonmetric Data

Nominal data are obtained by categorizing an individual or object into two or more categories, but these categories are not graded. For example, an individual can be classified into the male or female category, but we cannot say that one category ranks higher than the other, whatever their frequencies in the data set. Another example of nominal data is eye color: a person can be classified into the blue, black, or brown eye category. With this type of data, one can only compute percentages and proportions to describe the characteristics of the data. Furthermore, the mode is the appropriate measure of central tendency for such data.
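As a minimal sketch of the only summaries that make sense for nominal data, the Python snippet below counts categories, converts counts to proportions, and picks the mode. The eye-color responses are hypothetical values invented for illustration, not data from the book.

```python
from collections import Counter

# Hypothetical nominal data: eye color of ten respondents (illustrative only).
eye_color = ["brown", "black", "brown", "blue", "brown",
             "black", "blue", "brown", "black", "brown"]

counts = Counter(eye_color)  # frequency of each category
n = len(eye_color)

# Proportions are meaningful for nominal data; any ordering of categories is not.
proportions = {color: count / n for color, count in counts.items()}

# The mode is simply the most frequent category.
mode_category, mode_count = counts.most_common(1)[0]

print(proportions)
print("Mode:", mode_category)
```

Note that no mean or median is computed: the categories carry no order, so only frequencies and the mode are reported.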

In ordinal data, on the other hand, the categories are graded. The order of items is often defined by assigning numbers to them to show their relative position. Here also, we classify a person, response, or object into one of many categories, but the categories can be ranked in some order. For example, variables that assess performance (excellent, very good, good, etc.) are ordinal variables. Similarly, attitude (agree, can't say, disagree) and nature (very good, good, bad, etc.) are also ordinal variables. The order of an ordinal variable tells us which value is better or worse on the measured phenomenon, but not by how much, because the distance between ordered categories is not measurable. Consequently, arithmetic operations such as addition and subtraction are not meaningful for ordinal data. The median and quartile deviation are the appropriate measures of central tendency and variability, respectively, for such data.
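The following Python sketch shows one way to find the median of an ordinal variable, assuming the categories are coded by rank only. The responses are hypothetical. `statistics.median_low` is used so that the result is always an observed category rather than an interpolated value "between" two categories, which would have no meaning on an ordinal scale.

```python
import statistics

# Ordered categories for the attitude item in the text, lowest to highest.
LEVELS = ["disagree", "can't say", "agree"]
RANK = {level: i for i, level in enumerate(LEVELS)}

# Hypothetical responses from nine people (illustrative only).
responses = ["agree", "disagree", "can't say", "agree", "agree",
             "disagree", "can't say", "agree", "disagree"]

# The numeric codes record order only; the gaps between codes mean nothing.
codes = [RANK[r] for r in responses]

# median_low always returns one of the observed codes, so the result maps back
# to a real category.
median_category = LEVELS[statistics.median_low(codes)]
print("Median response:", median_category)
```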

1.2.2 Metric Data

Metric data are always associated with a scale of measurement and are therefore also known as scale data. Such data are obtained by measuring some phenomenon. Metric data can be measured on two types of scale: interval and ratio. Data measured on interval and ratio scales are termed interval data and ratio data, respectively. Interval data are obtained by measuring a phenomenon along a scale on which successive positions are equidistant, so equal differences between scale values represent equal differences in the measured attribute. The only problem with this scale is that the doubling principle breaks down, as there is no true zero on the scale. For instance, on a 10-point scale, eight marks given to an individual for creativity do not imply that his or her creativity is twice that of a person with four marks. Thus, variables measured on an interval scale have values whose differences are uniform and meaningful but whose ratios are not. Interval data may be obtained if attributes such as motivation or level of adjustment are rated on a scale of 1-10.

Data measured on a ratio scale have a meaningful zero and equal intervals (i.e. the difference between 30 and 40 is the same as the difference between 60 and 70). Because a true zero exists in ratio data, 80 marks obtained by person A on a skill test may be considered twice the 40 marks obtained by person B on the same test. In other words, the doubling principle holds for ratio data. All arithmetic operations can be performed with such data. Examples of ratio data are weight, height, distance, and salary.
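The contrast between interval and ratio scales can be checked numerically. In this Python sketch, the creativity marks 8 and 4 come from the text; the shift of origin by 2 units and the two body weights are hypothetical values chosen only for illustration. Moving an arbitrary zero preserves differences but changes ratios, whereas changing the unit of a scale with a true zero (kilograms to pounds) leaves ratios intact.

```python
import math

# Interval scale: creativity marks 8 and 4 from the text, on a 1-10 rating.
# Suppose (hypothetically) the same ratings are re-expressed on a scale whose
# zero point sits 2 units higher, so every score drops by 2.
high, low = 8, 4
shift = 2  # hypothetical change of origin, for illustration only

diff_before = high - low                      # 4
diff_after = (high - shift) - (low - shift)   # 4: differences survive the shift
ratio_before = high / low                     # 2.0
ratio_after = (high - shift) / (low - shift)  # 3.0: ratios do not

# Ratio scale: weight has a true zero, so changing the unit (kg -> lb) rescales
# every value by the same factor and leaves ratios unchanged.
KG_TO_LB = 2.20462
a_kg, b_kg = 70.0, 35.0  # hypothetical weights
ratio_kg = a_kg / b_kg
ratio_lb = (a_kg * KG_TO_LB) / (b_kg * KG_TO_LB)

print(diff_before, diff_after, ratio_before, ratio_after)
print(ratio_kg, math.isclose(ratio_kg, ratio_lb))
```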

1.3 Assumptions About Type of Data

We know that parametric statistics are calculated for metric data, while nonparametric statistics are used for nonmetric data. If we violate these assumptions, the findings may be misleading. We shall show this by means of an example, but first let us elaborate on data assumptions a little more. If the data are nominal, the mode is a suitable measure of central tendency, and if the data are ordinal, we compute the median. Since both nominal and ordinal data are nonmetric, we use nonparametric statistics (mode and median). On the other hand, if the data are metric (interval/ratio), we should use parametric statistics such as the mean and standard deviation. However, parametric statistics are appropriate for metric data only when the assumption of normality holds. If normality is violated, we should use nonparametric statistics such as the median and quartile deviation. The assumptions about data in using measures of central tendency are summarized in Table 1.1.

Table 1.1 Assumptions about data in computing measures of central tendency.

Data type | Nature of variable | Appropriate measure of central tendency
Nonmetric | Nominal data | Mode
Nonmetric | Ordinal data | Median
Metric | Interval/ratio (if symmetrical or nearly symmetrical) | Mean
Metric | Interval/ratio (if skewed) | Median
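The decision rule in Table 1.1 can be written as a small lookup. The Python function below is a hypothetical helper (`central_tendency_measure` is not from the book). It assumes the analyst has already judged whether interval/ratio data are skewed; how that judgment is made is left open here.

```python
def central_tendency_measure(scale: str, skewed: bool = False) -> str:
    """Return the measure of central tendency suggested by Table 1.1.

    scale  -- one of "nominal", "ordinal", "interval", "ratio"
    skewed -- for interval/ratio data only: whether the distribution is skewed
              (the test used to decide this is left to the analyst)
    """
    scale = scale.lower()
    if scale == "nominal":
        return "mode"
    if scale == "ordinal":
        return "median"
    if scale in ("interval", "ratio"):
        # Metric data: mean if (nearly) symmetrical, median if skewed.
        return "median" if skewed else "mean"
    raise ValueError(f"unknown scale: {scale!r}")


print(central_tendency_measure("ordinal"))
print(central_tendency_measure("ratio", skewed=True))
```

For the skewed examination marks discussed next, this rule would point to the median rather than the mean.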

Let us see what happens if we violate the assumption for metric data. Consider the marks obtained by students in an examination, shown in Table 1.2. These are metric data, so, without bothering about the normality assumption, let us compute the parametric statistic, the mean. The mean of this data set is 46 (460/10). Can we say that the class average is 46 and report this finding in our research report? Certainly not, as eight of the ten marks are below 46.

Table 1.2 Marks for the students in an examination.

Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Marks | 35 | 40 | 30 | 32 | 35 | 39 | 33 | 32 | 91 | 93

Let us see why this situation has arisen. As shown in Figure 1.1, the distribution of the data is skewed toward the positive side: the two very high marks (91 and 93) pull the mean well above the bulk of the scores. Since the distribution is positively skewed, we can conclude that the normality assumption has been severely violated.

Figure 1.1 Distribution of the data in Table 1.2.

When the normality assumption is violated, we can instead use a nonparametric statistic such as the median, as indicated in Table 1.1. The median of this data set is 35 (the average of the fifth and sixth ordered marks, both 35), which can rightly be claimed as the average, since most of the scores lie around 35 rather than 46. Thus, if the data are skewed, one should report the median and quartile deviation as the measures of central tendency and variability, respectively, instead of the mean and standard deviation in the project report.
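The Table 1.2 figures can be reproduced with Python's standard `statistics` module, as a minimal sketch. Two assumptions to note: quartiles come from `statistics.quantiles`, whose default "exclusive" method may differ from the quartile convention in SPSS or the book, so the quartile deviation here is indicative only; and skewness is the simple moment coefficient m3 / m2^1.5, one of several skewness formulas in use.

```python
import statistics

# Marks of the ten students in Table 1.2.
marks = [35, 40, 30, 32, 35, 39, 33, 32, 91, 93]

mean = statistics.mean(marks)              # 46
median = statistics.median(marks)          # 35.0
below_mean = sum(m < mean for m in marks)  # marks falling below the mean

# Quartile deviation (semi-interquartile range) = (Q3 - Q1) / 2.
# statistics.quantiles uses the "exclusive" method by default; other software
# may use a different quartile convention and give slightly different values.
q1, _, q3 = statistics.quantiles(marks, n=4)
quartile_deviation = (q3 - q1) / 2

# Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments);
# a positive value indicates a positively (right) skewed distribution.
n = len(marks)
m2 = sum((m - mean) ** 2 for m in marks) / n
m3 = sum((m - mean) ** 3 for m in marks) / n
skewness = m3 / m2 ** 1.5

print(f"mean={mean}, median={median}, below mean={below_mean}")
print(f"Q1={q1}, Q3={q3}, quartile deviation={quartile_deviation}")
print(f"skewness={skewness:.2f}")
```

The positive skewness coefficient and the gap between mean and median both reflect the two extreme marks, which is why the median is the better summary here.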

1.4 Statistical Decisions in Hypothesis Testing Experiments

In hypothesis testing experiments, a population parameter is tested for some of its characteristics on the basis of a sample drawn from the population of interest, so some errors are bound to occur. These errors are known as statistical errors. We shall...

Schweitzer Fachinformationen
