In the data analysis section of a research project, the researcher begins by describing the data gathered during data collection. As the name suggests, descriptive statistics describe the data, and the description can be both visual and statistical. Data can be displayed visually to reveal their distribution, trends, anomalies, and outliers. Visual displays can take the form of graphs, histograms, tables, plots, and other diagrams. This stage is a precursor to any statistical procedures that a researcher might use to test the research hypotheses. This raises the question of why the researcher should not jump in and immediately begin testing hypotheses using statistical analysis. In this post we explain the importance of using descriptive statistics to check that the data meet the required assumptions before a parametric test is used.
Assumptions: The Importance of Describing Data
There are numerous advantages to describing data. One of the most important is determining whether the data meet the assumptions required for the use of parametric statistical procedures. Parametric procedures include, but are not limited to, regression, correlation, ANOVA, and the t-test. The assumptions that the data must meet differ depending on which test is being considered. Most parametric tests require that the data meet the assumption of normality. Normality means a normal distribution of data which, when graphed as frequencies, bears a resemblance to a bell shape. Other common assumptions, depending on the statistical procedure employed, concern levels of measurement, sample size, homogeneity of variance, independence, absence of outliers, and linearity (Field, 2005). It is important that the researcher understands the assumptions of any parametric statistical procedure being considered and determines whether they are met before employing the procedure in a research study. The researcher should not use a parametric statistical procedure if the data do not meet its assumptions, because doing so can produce erroneous results. Fortunately, there are corresponding non-parametric tests that the researcher can use when the data do not meet the assumptions for parametric tests.
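As one illustration of checking an assumption from the list above, the following is a minimal sketch of screening for outliers. It assumes the common 1.5 x IQR rule (Tukey's fences), which is only one of several conventions and is not prescribed by this post; the function names and the scores are hypothetical.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical test scores, invented for illustration only.
scores = [52, 55, 57, 58, 60, 61, 61, 63, 64, 66, 68, 97]

print("Fences:", tukey_fences(scores))
print("Flagged as possible outliers:", flag_outliers(scores))
```

A flagged value is only a candidate for closer inspection. Whether it is a data-entry error or a genuine extreme observation is a judgment the researcher still has to make.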
Non-parametric tests also have assumptions that data must meet, but they are fewer and less rigid. An example of a parametric procedure for correlation is Pearson’s correlation coefficient (Pearson’s r), while a parallel non-parametric test for correlation is Spearman’s rank correlation coefficient (Spearman’s rho). An example of a causal-comparative parametric procedure is ANOVA, while a corresponding non-parametric causal-comparative test is Kruskal-Wallis. Given that non-parametric tests do not require as many assumptions to be met, some students question why non-parametric tests are not always used. The reason is that parametric tests are generally more powerful than non-parametric tests when their assumptions are met, and in that case they should be used. A parametric test is more likely than a non-parametric test to detect a true effect when one exists, and therefore to reject the null hypothesis. It is advisable that researchers conduct both parametric and non-parametric tests if they are not certain which is most appropriate. If both tests lead to the same conclusion, there is nothing more to worry about. If the result is statistically significant for the parametric test and non-significant for the non-parametric test, the researcher should take a closer look at whether the assumptions were met.
Assumption of Normality
Assumptions are evaluated both visually and statistically.
As mentioned earlier, a normal distribution of data is the most commonly required assumption for parametric statistical tests. The following explains how the assumption of normality can be described and tested. A normal distribution of data exhibits the characteristics of a bell-shaped curve. In a perfect normal curve, the frequency distribution is symmetrical about the center; the mean, median, and mode are all equal; and the tails of the curve approach but do not touch the x-axis. These are all preliminary indicators that a curve may represent a normal distribution, but there are additional factors to consider.
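The preliminary indicators described above (symmetry and agreement among the mean, median, and mode) can be checked with a few lines of code. The sketch below uses only Python's standard library. The two small samples are hypothetical, and skewness is computed here as the average cubed z-score, which is one common definition among several.

```python
import statistics
from collections import Counter


def describe_shape(values):
    """Summarize the center and symmetry of a sample."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Skewness as the average cubed z-score: 0 for a symmetric sample,
    # positive when the right tail is longer.
    skewness = sum(((v - mean) / sd) ** 3 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "skewness": skewness,
    }


def text_histogram(values):
    """Print one row of '#' marks per distinct value (a crude visual check)."""
    counts = Counter(values)
    for value in sorted(counts):
        print(f"{value:>3} | {'#' * counts[value]}")


# Hypothetical samples, invented for illustration only.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10]

for name, sample in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    print(name, describe_shape(sample))
    text_histogram(sample)
```

In the symmetric sample the mean, median, and mode coincide and the skewness is zero, while in the right-skewed sample the mean is pulled above the median and the skewness is clearly positive. These checks are descriptive only. A formal normality test, such as the Shapiro-Wilk test available in most statistical packages, would be a separate step.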
Statistical procedures used to test hypotheses also carry assumptions about the scales on which the data are measured. Data can be measured on nominal, ordinal, interval, or ratio scales. It is important to identify the measurement-scale assumption of any statistical procedure being considered. For instance, Pearson’s r assumes that the data are measured at the interval or ratio level. Researchers who verify that these assumptions are met can have greater confidence in the validity and reliability of their results.
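To make the contrast between Pearson’s r and its rank-based counterpart concrete, here is a minimal sketch that computes both coefficients by hand using only Python's standard library. Spearman’s rho is computed as Pearson’s r applied to the ranks of the data, with tied values given their average rank. The function names and the data are hypothetical, and in practice a statistical package would normally be used instead.

```python
import statistics


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: a perfectly monotonic but non-linear relationship.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [h ** 3 for h in hours]

print("Pearson's r:   ", round(pearson_r(hours, score), 3))
print("Spearman's rho:", round(spearman_rho(hours, score), 3))
```

Because Spearman’s rho depends only on the ordering of the values, it suits ordinal data and monotonic relationships, whereas Pearson’s r uses the actual distances between values and therefore needs interval or ratio measurement. Running both on the same data, as in this sketch, is one way to see whether the choice of test changes the conclusion.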