Anova1way ========================================== `Anova1way` performs a single factor between analysis of variance. The analysis automatically calculates descriptive, performs the O'Brien test for heterosphericity (H0 is that the variances are equal), performs post-hoc power analyses, and post-hoc pairwise comparisons. This ANOVA method is robust to non-equivalent sample sizes. The observed power estimates have been validated against `G*Power `_. Post-hoc comparisons can be made using the Tukey Test or the Newman-Keuls Test. If you are unfamiliar with these post-hoc multiple comparisons the idea is that the ANOVA will tell you if the data suggests that something is going on between the groups (not from the same population), but it doesn't tell you which groups are different from one another. The post-hoc comparisons compare the groups to one another and try and identify which pairs are different. Using the Anova1way object directly ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Example data from .. _Abdi, H. & Williams, L. J. (2010):http://www.utdallas.edu/~herve/abdi-NewmanKeuls2010-pretty.pdf. By default Anova1way will use the Tukey test for pairwise comparisons. .. sourcecode:: python from pyvttbl.stats import Anova1way d = [[21.0, 20.0, 26.0, 46.0, 35.0, 13.0, 41.0, 30.0, 42.0, 26.0], [23.0, 30.0, 34.0, 51.0, 20.0, 38.0, 34.0, 44.0, 41.0, 35.0], [35.0, 35.0, 52.0, 29.0, 54.0, 32.0, 30.0, 42.0, 50.0, 21.0], [44.0, 40.0, 33.0, 45.0, 45.0, 30.0, 46.0, 34.0, 49.0, 44.0], [39.0, 44.0, 51.0, 47.0, 50.0, 45.0, 39.0, 51.0, 39.0, 55.0]] conditions_list = 'Contact Hit Bump Collide Smash'.split() D=Anova1way() D.run(d, conditions_list=conditions_list) print(D) :: Anova: Single Factor on Measure SUMMARY Groups Count Sum Average Variance ========================================== Contact 10 300 30 116.444 Hit 10 350 35 86.444 Bump 10 380 38 122.222 Collide 10 410 41 41.556 Smash 10 460 46 33.333 O'BRIEN TEST FOR HOMOGENEITY OF VARIANCE Source of Variation SS df MS F P-value eta^2 Obs. power ======================================================================================== Treatments 68081.975 4 17020.494 1.859 0.134 0.142 0.498 Error 412050.224 45 9156.672 ======================================================================================== Total 480132.199 49 ANOVA Source of Variation SS df MS F P-value eta^2 Obs. power ============================================================================ Treatments 1460 4 365 4.562 0.004 0.289 0.837 Error 3600 45 80 ============================================================================ Total 5060 49 POSTHOC MULTIPLE COMPARISONS Tukey HSD: Table of q-statistics Bump Collide Contact Hit Smash ========================================================== Bump 0 1.061 ns 2.828 ns 1.061 ns 2.828 ns Collide 0 3.889 + 2.121 ns 1.768 ns Contact 0 1.768 ns 5.657 ** Hit 0 3.889 + Smash 0 ========================================================== + p < .10 (q-critical[5, 45] = 3.59038343675) * p < .05 (q-critical[5, 45] = 4.01861178004) ** p < .01 (q-critical[5, 45] = 4.89280842987) Using the Newman-Keuls Test ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. sourcecode:: python from pyvttbl.stats import Anova1way d = [[21.0, 20.0, 26.0, 46.0, 35.0, 13.0, 41.0, 30.0, 42.0, 26.0], [23.0, 30.0, 34.0, 51.0, 20.0, 38.0, 34.0, 44.0, 41.0, 35.0], [35.0, 35.0, 52.0, 29.0, 54.0, 32.0, 30.0, 42.0, 50.0, 21.0], [44.0, 40.0, 33.0, 45.0, 45.0, 30.0, 46.0, 34.0, 49.0, 44.0], [39.0, 44.0, 51.0, 47.0, 50.0, 45.0, 39.0, 51.0, 39.0, 55.0]] conditions_list = 'Contact Hit Bump Collide Smash'.split() D=Anova1way() D.run(d, conditions_list=conditions_list, posthoc='SNK') print(D) :: Anova: Single Factor on Measure SUMMARY Groups Count Sum Average Variance ========================================== Contact 10 300 30 116.444 Hit 10 350 35 86.444 Bump 10 380 38 122.222 Collide 10 410 41 41.556 Smash 10 460 46 33.333 O'BRIEN TEST FOR HOMOGENEITY OF VARIANCE Source of Variation SS df MS F P-value eta^2 Obs. power ======================================================================================== Treatments 68081.975 4 17020.494 1.859 0.134 0.142 0.498 Error 412050.224 45 9156.672 ======================================================================================== Total 480132.199 49 ANOVA Source of Variation SS df MS F P-value eta^2 Obs. power ============================================================================ Treatments 1460 4 365 4.562 0.004 0.289 0.837 Error 3600 45 80 ============================================================================ Total 5060 49 POSTHOC MULTIPLE COMPARISONS SNK: Step-down table of q-statistics Pair i |diff| q range df p Sig. ===================================================================== Contact vs. Smash 1 16.000 5.657 5 45 0.002 ** Collide vs. Contact 2 11.000 3.889 4 45 0.041 * Hit vs. Smash 3 11.000 3.889 4 45 0.041 * Bump vs. Smash 4 8.000 2.828 3 45 0.124 ns Bump vs. Contact 5 8.000 2.828 3 45 0.124 ns Collide vs. Hit 6 6.000 2.121 2 45 0.141 ns Collide vs. Smash 7 5.000 - - - - ns Contact vs. Hit 8 5.000 - - - - ns Bump vs. Collide 9 3.000 - - - - ns Bump vs. Hit 10 3.000 - - - - ns + p < .10, * p < .05, ** p < .01, *** p < .001 Running Single Factor ANOVA with :class:`DataFrame` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The examples above pass a list of lists to :class:`Anova1Way`. The :class:`DataFrame` object also has a wrapper method for running a single factor ANOVA. It assumes data is in the stacked format with one observation per row. Let's begin by making up some data. >>> from pyvttbl import DataFrame >>> from random import random >>> sample = lambda mult, N : [random()*mult for i in xrange(N)] >>> df = DataFrame(zip(['IV','DV'], [['A']*10, sample(1, 10)])) >>> df.attach(DataFrame(zip(['IV','DV'], [['B']*10, sample(2, 10)]))) >>> df.attach(DataFrame(zip(['IV','DV'], [['C']*10, sample(3, 10)]))) >>> print(df) IV DV ========== A 0.779 A 0.706 A 0.418 A 0.388 A 0.542 A 0.014 A 0.941 A 0.058 A 0.830 A 0.110 B 1.263 B 1.559 B 1.069 B 1.524 B 1.700 B 1.187 B 1.980 B 1.657 B 1.145 B 0.103 C 2.264 C 1.863 C 2.374 C 0.972 C 2.257 C 0.467 C 1.077 C 1.001 C 2.984 C 2.422 Now we can run the analysis >>> aov = df.anova1way('DV', 'IV') >>> print(aov) :: Anova: Single Factor on DV SUMMARY Groups Count Sum Average Variance ============================================ A 10 4.785 0.478 0.114 B 10 13.185 1.319 0.265 C 10 17.681 1.768 0.685 O'BRIEN TEST FOR HOMOGENEITY OF VARIANCE Source of Variation SS df MS F P-value eta^2 Obs. power =============================================================================== Treatments 1.749 2 0.875 3.697 0.038 0.215 0.566 Error 6.388 27 0.237 =============================================================================== Total 8.137 29 ANOVA Source of Variation SS df MS F P-value eta^2 Obs. power =================================================================================== Treatments 8.569 2 4.285 12.083 1.787e-04 0.472 0.900 Error 9.574 27 0.355 =================================================================================== Total 18.143 29 POSTHOC MULTIPLE COMPARISONS Tukey HSD: Table of q-statistics A B C =========================== A 0 2.443 ns 3.751 * B 0 1.308 ns C 0 =========================== + p < .10 (q-critical[3, 27] = 3.0301664694) * p < .05 (q-critical[3, 27] = 3.50576984879) ** p < .01 (q-critical[3, 27] = 4.49413305084)