Anova1way
Anova1way performs a single-factor between-subjects analysis of variance. The analysis automatically calculates descriptive statistics, performs the O’Brien test for heteroscedasticity (H0 is that the group variances are equal), runs a post-hoc power analysis, and makes post-hoc pairwise comparisons.
This ANOVA method is robust to unequal sample sizes. The observed power estimates have been validated against G*Power.
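As a rough illustration of what the O’Brien test involves, the sketch below applies O’Brien's transformation (with the conventional weight w = 0.5) to every score and then partitions the transformed scores as in an ordinary one-way ANOVA. The helper names `obrien_transform` and `sums_of_squares` are hypothetical and are not part of pyvttbl, and it is an assumption that pyvttbl computes the test in exactly this way. The data are the example data used in the next section.

```python
def mean(xs):
    return sum(xs) / float(len(xs))

def obrien_transform(group, w=0.5):
    """Return O'Brien's r-scores for one group (hypothetical helper)."""
    n = len(group)
    m = mean(group)
    var = sum((x - m) ** 2 for x in group) / (n - 1.0)  # unbiased variance
    return [((w + n - 2.0) * n * (x - m) ** 2 - w * var * (n - 1.0))
            / ((n - 1.0) * (n - 2.0)) for x in group]

def sums_of_squares(groups):
    """Return (SS_treatments, SS_error) for a list of groups."""
    grand = mean([x for g in groups for x in g])
    ss_treat = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_treat, ss_error

d = [[21.0, 20.0, 26.0, 46.0, 35.0, 13.0, 41.0, 30.0, 42.0, 26.0],
     [23.0, 30.0, 34.0, 51.0, 20.0, 38.0, 34.0, 44.0, 41.0, 35.0],
     [35.0, 35.0, 52.0, 29.0, 54.0, 32.0, 30.0, 42.0, 50.0, 21.0],
     [44.0, 40.0, 33.0, 45.0, 45.0, 30.0, 46.0, 34.0, 49.0, 44.0],
     [39.0, 44.0, 51.0, 47.0, 50.0, 45.0, 39.0, 51.0, 39.0, 55.0]]

# Transform each group, then run the usual ANOVA partition on the r-scores.
r = [obrien_transform(g) for g in d]
ss_treat, ss_error = sums_of_squares(r)
df_treat = len(d) - 1
df_error = sum(len(g) for g in d) - len(d)
f_obrien = (ss_treat / df_treat) / (ss_error / df_error)

# SS treatments, SS error, and F for the transformed scores
print('%.3f %.3f %.3f' % (ss_treat, ss_error, f_obrien))
```

A useful property of the transformation is that the mean of the r-scores in each group equals that group's sample variance, so an ANOVA on the r-scores is in effect a test of whether the group variances differ. For these data the sums of squares and F agree with the O'BRIEN TEST table in the example output.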
Post-hoc comparisons can be made using the Tukey test or the Newman-Keuls test. If you are unfamiliar with post-hoc multiple comparisons, the idea is this: a significant ANOVA tells you that the data suggest the groups do not all come from the same population, but it does not tell you which groups differ from one another. The post-hoc comparisons test the groups against one another, pair by pair, and try to identify which pairs differ.
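To make the distinction concrete, here is a minimal sketch of the omnibus computation on a small made-up data set. The function name `one_way_anova` and the toy scores are hypothetical and are not taken from pyvttbl. The result is a single F ratio and effect size for the whole design, with nothing in it that points at a particular pair of groups.

```python
def mean(xs):
    return sum(xs) / float(len(xs))

def one_way_anova(groups):
    """Return (F, eta_squared) for a list of groups (hypothetical helper)."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    # Between-groups and within-groups sums of squares
    ss_treat = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_treat = len(groups) - 1
    df_error = len(scores) - len(groups)
    f = (ss_treat / df_treat) / (ss_error / df_error)
    eta2 = ss_treat / (ss_treat + ss_error)
    return f, eta2

# Made-up scores for three groups with means 2, 3 and 6
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f, eta2 = one_way_anova(toy)
print('F = %.3f, eta^2 = %.4f' % (f, eta2))  # F = 13.000, eta^2 = 0.8125
```

The large F says that at least one group differs, but deciding whether the first two groups differ from each other requires a pairwise procedure such as the ones described above.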
Using the Anova1way object directly
Example data from Abdi, H. & Williams, L. J. (2010) (http://www.utdallas.edu/~herve/abdi-NewmanKeuls2010-pretty.pdf). By default, Anova1way uses the Tukey test for pairwise comparisons.
from pyvttbl.stats import Anova1way

# One list of scores per group, in the same order as conditions_list
d = [[21.0, 20.0, 26.0, 46.0, 35.0, 13.0, 41.0, 30.0, 42.0, 26.0],
     [23.0, 30.0, 34.0, 51.0, 20.0, 38.0, 34.0, 44.0, 41.0, 35.0],
     [35.0, 35.0, 52.0, 29.0, 54.0, 32.0, 30.0, 42.0, 50.0, 21.0],
     [44.0, 40.0, 33.0, 45.0, 45.0, 30.0, 46.0, 34.0, 49.0, 44.0],
     [39.0, 44.0, 51.0, 47.0, 50.0, 45.0, 39.0, 51.0, 39.0, 55.0]]
conditions_list = 'Contact Hit Bump Collide Smash'.split()

D = Anova1way()
D.run(d, conditions_list=conditions_list)  # Tukey comparisons by default
print(D)
Anova: Single Factor on Measure
SUMMARY
Groups Count Sum Average Variance
==========================================
Contact 10 300 30 116.444
Hit 10 350 35 86.444
Bump 10 380 38 122.222
Collide 10 410 41 41.556
Smash 10 460 46 33.333
O'BRIEN TEST FOR HOMOGENEITY OF VARIANCE
Source of Variation SS df MS F P-value eta^2 Obs. power
========================================================================================
Treatments 68081.975 4 17020.494 1.859 0.134 0.142 0.498
Error 412050.224 45 9156.672
========================================================================================
Total 480132.199 49
ANOVA
Source of Variation SS df MS F P-value eta^2 Obs. power
============================================================================
Treatments 1460 4 365 4.562 0.004 0.289 0.837
Error 3600 45 80
============================================================================
Total 5060 49
POSTHOC MULTIPLE COMPARISONS
Tukey HSD: Table of q-statistics
          Bump        Collide     Contact     Hit         Smash
==================================================================
Bump      0           1.061 ns    2.828 ns    1.061 ns    2.828 ns
Collide               0           3.889 +     2.121 ns    1.768 ns
Contact                           0           1.768 ns    5.657 **
Hit                                           0           3.889 +
Smash                                                     0
==================================================================
+ p < .10 (q-critical[5, 45] = 3.59038343675)
* p < .05 (q-critical[5, 45] = 4.01861178004)
** p < .01 (q-critical[5, 45] = 4.89280842987)
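The q-statistics in the Tukey table can be recovered from the summary output alone. The sketch below assumes equal group sizes and uses the usual studentized range statistic, q = |mean_i - mean_j| / sqrt(MS_error / n), with the group averages, MS Error, and group size read off the tables above. The variable names are illustrative and are not part of pyvttbl.

```python
from itertools import combinations
from math import sqrt

# Group averages from the SUMMARY table
means = {'Contact': 30.0, 'Hit': 35.0, 'Bump': 38.0,
         'Collide': 41.0, 'Smash': 46.0}
ms_error = 80.0  # MS for Error from the ANOVA table
n = 10           # observations per group (equal sizes assumed)

se = sqrt(ms_error / n)  # standard error used by the studentized range

q = {}
for a, b in combinations(sorted(means), 2):
    q[(a, b)] = abs(means[a] - means[b]) / se
    print('%-8s vs. %-8s q = %.3f' % (a, b, q[(a, b)]))
```

Each q is then compared with the q-critical values listed under the table. For example, Contact vs. Smash gives q = 5.657, which exceeds the .01 critical value of about 4.893, so that pair is flagged with `**`.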
Using the Newman-Keuls Test
from pyvttbl.stats import Anova1way
d = [[21.0, 20.0, 26.0, 46.0, 35.0, 13.0, 41.0, 30.0, 42.0, 26.0],
[23.0, 30.0, 34.0, 51.0, 20.0, 38.0, 34.0, 44.0, 41.0, 35.0],
[35.0, 35.0, 52.0, 29.0, 54.0, 32.0, 30.0, 42.0, 50.0, 21.0],
[44.0, 40.0, 33.0, 45.0, 45.0, 30.0, 46.0, 34.0, 49.0, 44.0],
[39.0, 44.0, 51.0, 47.0, 50.0, 45.0, 39.0, 51.0, 39.0, 55.0]]
conditions_list = 'Contact Hit Bump Collide Smash'.split()

D = Anova1way()
# posthoc='SNK' selects the Newman-Keuls test instead of the default Tukey test
D.run(d, conditions_list=conditions_list, posthoc='SNK')
print(D)
Anova: Single Factor on Measure
SUMMARY
Groups Count Sum Average Variance
==========================================
Contact 10 300 30 116.444
Hit 10 350 35 86.444
Bump 10 380 38 122.222
Collide 10 410 41 41.556
Smash 10 460 46 33.333
O'BRIEN TEST FOR HOMOGENEITY OF VARIANCE
Source of Variation SS df MS F P-value eta^2 Obs. power
========================================================================================
Treatments 68081.975 4 17020.494 1.859 0.134 0.142 0.498
Error 412050.224 45 9156.672
========================================================================================
Total 480132.199 49
ANOVA
Source of Variation SS df MS F P-value eta^2 Obs. power
============================================================================
Treatments 1460 4 365 4.562 0.004 0.289 0.837
Error 3600 45 80
============================================================================
Total 5060 49
POSTHOC MULTIPLE COMPARISONS
SNK: Step-down table of q-statistics
Pair i |diff| q range df p Sig.
=====================================================================
Contact vs. Smash 1 16.000 5.657 5 45 0.002 **
Collide vs. Contact 2 11.000 3.889 4 45 0.041 *
Hit vs. Smash 3 11.000 3.889 4 45 0.041 *
Bump vs. Smash 4 8.000 2.828 3 45 0.124 ns
Bump vs. Contact 5 8.000 2.828 3 45 0.124 ns
Collide vs. Hit 6 6.000 2.121 2 45 0.141 ns
Collide vs. Smash 7 5.000 - - - - ns
Contact vs. Hit 8 5.000 - - - - ns
Bump vs. Collide 9 3.000 - - - - ns
Bump vs. Hit 10 3.000 - - - - ns
+ p < .10, * p < .05, ** p < .01, *** p < .001
Running Single Factor ANOVA with DataFrame
The examples above pass a list of lists to Anova1way. The DataFrame object also has a wrapper method for running a single factor ANOVA. It assumes the data are in the stacked format, with one observation per row.
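For readers unsure how the two layouts relate, the following sketch regroups stacked records into the list-of-lists form used in the earlier examples. It uses only the standard library. The `rows` data are made up for illustration, and the snippet does not call pyvttbl.

```python
from collections import OrderedDict

# Hypothetical stacked records: one (factor level, score) pair per row
rows = [('A', 0.5), ('A', 0.7),
        ('B', 1.2), ('B', 1.6),
        ('C', 2.1), ('C', 2.4)]

# Collect the scores for each level, keeping first-seen order of the levels
groups = OrderedDict()
for level, score in rows:
    groups.setdefault(level, []).append(score)

conditions_list = list(groups.keys())
d = list(groups.values())

print(conditions_list)  # ['A', 'B', 'C']
print(d)                # [[0.5, 0.7], [1.2, 1.6], [2.1, 2.4]]
```

The resulting `d` and `conditions_list` have the same shape as the arguments passed to `D.run` above.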
Let’s begin by making up some data.
>>> from pyvttbl import DataFrame
>>> from random import random
>>> sample = lambda mult, N : [random()*mult for i in range(N)]
>>> df = DataFrame(zip(['IV','DV'], [['A']*10, sample(1, 10)]))
>>> df.attach(DataFrame(zip(['IV','DV'], [['B']*10, sample(2, 10)])))
>>> df.attach(DataFrame(zip(['IV','DV'], [['C']*10, sample(3, 10)])))
>>> print(df)
IV DV
==========
A 0.779
A 0.706
A 0.418
A 0.388
A 0.542
A 0.014
A 0.941
A 0.058
A 0.830
A 0.110
B 1.263
B 1.559
B 1.069
B 1.524
B 1.700
B 1.187
B 1.980
B 1.657
B 1.145
B 0.103
C 2.264
C 1.863
C 2.374
C 0.972
C 2.257
C 0.467
C 1.077
C 1.001
C 2.984
C 2.422
Now we can run the analysis, naming the dependent variable column first and the factor column second.
>>> aov = df.anova1way('DV', 'IV')
>>> print(aov)
Anova: Single Factor on DV
SUMMARY
Groups Count Sum Average Variance
============================================
A 10 4.785 0.478 0.114
B 10 13.185 1.319 0.265
C 10 17.681 1.768 0.685
O'BRIEN TEST FOR HOMOGENEITY OF VARIANCE
Source of Variation SS df MS F P-value eta^2 Obs. power
===============================================================================
Treatments 1.749 2 0.875 3.697 0.038 0.215 0.566
Error 6.388 27 0.237
===============================================================================
Total 8.137 29
ANOVA
Source of Variation SS df MS F P-value eta^2 Obs. power
===================================================================================
Treatments 8.569 2 4.285 12.083 1.787e-04 0.472 0.900
Error 9.574 27 0.355
===================================================================================
Total 18.143 29
POSTHOC MULTIPLE COMPARISONS
Tukey HSD: Table of q-statistics
          A           B           C
==========================================
A         0           2.443 ns    3.751 *
B                     0           1.308 ns
C                                 0
==========================================
+ p < .10 (q-critical[3, 27] = 3.0301664694)
* p < .05 (q-critical[3, 27] = 3.50576984879)
** p < .01 (q-critical[3, 27] = 4.49413305084)