Research Article

The power and type I error of Wilcoxon-Mann-Whitney, Welch's t, and student's t tests for Likert-type data

Year 2023, Volume: 10 Issue: 1, 114 - 128, 20.03.2023
https://doi.org/10.21449/ijate.1183622

Abstract

The Likert-type item is the most popular response format for collecting data through scales or questionnaires in social, educational, and psychological studies. However, there is no consensus on whether parametric or non-parametric tests should be preferred when analyzing Likert-type data. This study examined the statistical power of parametric and non-parametric tests when each Likert-type item is analyzed independently in survey studies. Its main purpose was to compare the statistical power of three pairwise comparison tests for Likert-type data: the Wilcoxon-Mann-Whitney test, Welch's t-test, and Student's t-test. To this end, a Monte Carlo simulation study was conducted in which the selected tests were examined under varying conditions of sample size, group size ratio, and effect size. The results showed that the Wilcoxon-Mann-Whitney test was superior to its counterparts, especially for small samples and unequal group sizes. However, Student's t-test had statistical power similar to that of the Wilcoxon-Mann-Whitney test when group sizes were equal and the sample size was 200 or more. Consistent with these empirical results, practical recommendations are provided for researchers on what to consider when collecting and analyzing Likert-type data.
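
To make the design described in the abstract concrete, the following is a minimal sketch of a Monte Carlo comparison of the three tests on 5-point Likert-type responses. It is not the author's simulation code. The data-generating step (cutting a latent normal variable at fixed thresholds), the threshold values, and the names `simulate_likert` and `rejection_rates` are assumptions made for illustration. Only the three tests and the varied factors (sample size, group size ratio, effect size) come from the abstract.

```python
import numpy as np
from scipy import stats


def simulate_likert(rng, n, shift=0.0, cuts=(-1.5, -0.5, 0.5, 1.5)):
    """Draw n responses on a 5-point scale.

    Assumption: responses arise by cutting a latent normal variable at
    fixed thresholds; `shift` moves the latent mean (the effect size on
    the latent scale, not on the observed 1..5 scale).
    """
    latent = rng.normal(loc=shift, scale=1.0, size=n)
    return np.digitize(latent, cuts) + 1  # categories 1..5


def rejection_rates(n1, n2, shift, reps=2000, alpha=0.05, seed=0):
    """Estimate how often each test rejects H0 at level alpha.

    With shift == 0 the rates estimate the type I error;
    with shift != 0 they estimate statistical power.
    """
    rng = np.random.default_rng(seed)
    hits = {"student": 0, "welch": 0, "wmw": 0}
    for _ in range(reps):
        x = simulate_likert(rng, n1)          # reference group
        y = simulate_likert(rng, n2, shift)   # shifted group
        p_values = {
            "student": stats.ttest_ind(x, y, equal_var=True).pvalue,
            "welch": stats.ttest_ind(x, y, equal_var=False).pvalue,
            "wmw": stats.mannwhitneyu(x, y, alternative="two-sided").pvalue,
        }
        for name, p in p_values.items():
            # A NaN p-value (degenerate sample) counts as no rejection.
            if p < alpha:
                hits[name] += 1
    return {name: count / reps for name, count in hits.items()}
```

Calling, for example, `rejection_rates(20, 60, shift=0.5)` would estimate power for a small sample with a 1:3 group size ratio, and setting `shift=0.0` would estimate the type I error under the same condition. The estimates depend on the assumed thresholds and on the number of replications, so they should not be expected to reproduce the figures reported in the article.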

Details

Primary Language: English
Subjects: Studies on Education
Section: Articles
Authors

Ahmet Salih Şimşek 0000-0002-9764-3285

Publication Date: March 20, 2023
Submission Date: October 3, 2022
Published in Issue: Year 2023, Volume: 10, Issue: 1

Cite

APA Şimşek, A. S. (2023). The power and type I error of Wilcoxon-Mann-Whitney, Welch’s t, and student’s t tests for Likert-type data. International Journal of Assessment Tools in Education, 10(1), 114-128. https://doi.org/10.21449/ijate.1183622
