Anova1way

Anova1way performs a single-factor, between-subjects analysis of variance. The analysis automatically calculates descriptive statistics, performs the O’Brien test for heteroscedasticity (H0 is that the group variances are equal), runs a post-hoc power analysis, and makes post-hoc pairwise comparisons.
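To make the O’Brien step concrete, here is a minimal sketch in plain Python of the usual textbook form of the test: each score is replaced by O’Brien's transformed value r, and an ordinary one-way ANOVA is then run on the transformed scores. Whether Anova1way uses exactly this formula internally is an assumption. The names obrien_transform and between_ss, and the two small data sets, are made up for this illustration.

```python
def obrien_transform(group):
    """Return O'Brien's r-transform of one group of scores (needs n > 2)."""
    n = float(len(group))
    mean = sum(group) / n
    var = sum((x - mean) ** 2 for x in group) / (n - 1)   # sample variance
    return [((n - 1.5) * n * (x - mean) ** 2 - 0.5 * var * (n - 1))
            / ((n - 1) * (n - 2)) for x in group]


def between_ss(groups):
    """Treatment (between-groups) sum of squares for a list of groups."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / float(len(scores))
    return sum(len(g) * (sum(g) / float(len(g)) - grand_mean) ** 2
               for g in groups)


# Two small made-up groups with clearly different spreads.
narrow = [9.0, 10.0, 11.0, 10.0]
wide = [2.0, 18.0, 6.0, 14.0]
transformed = [obrien_transform(narrow), obrien_transform(wide)]

# The mean of each transformed group equals that group's sample variance,
# so an ANOVA on the transformed scores compares variances across groups.
group_means = [sum(r) / len(r) for r in transformed]
print(group_means)
print(between_ss(transformed))
```

A large treatment sum of squares on the transformed scores, relative to the error term, is what would lead the test to reject H0 of equal variances.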

This ANOVA method is robust to unequal sample sizes. The observed power estimates have been validated against G*Power.

Post-hoc comparisons can be made using the Tukey test or the Newman-Keuls test. If you are unfamiliar with post-hoc multiple comparisons, the idea is this: the ANOVA tells you whether the data suggest that the groups differ (that is, that they are not all drawn from the same population), but it does not tell you which groups differ from one another. The post-hoc comparisons compare the groups pairwise and try to identify which pairs are different.
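As a rough illustration of what a pairwise comparison involves, the sketch below computes a studentized range statistic q for every pair of groups and compares it with a critical value. The group means, the error mean square (80), the group size (10), and the .05 critical value are copied from the Tukey example output further down this page. The variable names are illustrative, equal group sizes are assumed, and this is not the library's own code.

```python
from itertools import combinations
from math import sqrt

means = {'Contact': 30.0, 'Hit': 35.0, 'Bump': 38.0,
         'Collide': 41.0, 'Smash': 46.0}
ms_error = 80.0              # error mean square from the ANOVA table
n_per_group = 10             # observations per group (equal sizes assumed)
q_critical = 4.01861178004   # q-critical[5, 45] at p < .05, from the output

# Standard error of a group mean, shared by every pair.
std_error = sqrt(ms_error / n_per_group)

# q = |difference between two group means| / standard error
q_stats = {}
for a, b in combinations(sorted(means), 2):
    q_stats[(a, b)] = abs(means[a] - means[b]) / std_error

# Pairs whose q exceeds the critical value differ at the .05 level.
significant = sorted(pair for pair, q in q_stats.items() if q > q_critical)

for pair in sorted(q_stats):
    print('%-8s vs. %-8s q = %.3f' % (pair[0], pair[1], q_stats[pair]))
print(significant)
```

The Tukey test compares every q against the same critical value. The Newman-Keuls test uses the same q statistics but lets the critical value shrink as the two means get closer in rank, which is why it can flag pairs that Tukey does not.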

Using the Anova1way object directly

The example data are from Abdi, H. & Williams, L. J. (2010) (http://www.utdallas.edu/~herve/abdi-NewmanKeuls2010-pretty.pdf). By default, Anova1way uses the Tukey test for pairwise comparisons.

from pyvttbl.stats import Anova1way
d = [[21.0, 20.0, 26.0, 46.0, 35.0, 13.0, 41.0, 30.0, 42.0, 26.0],
     [23.0, 30.0, 34.0, 51.0, 20.0, 38.0, 34.0, 44.0, 41.0, 35.0],
     [35.0, 35.0, 52.0, 29.0, 54.0, 32.0, 30.0, 42.0, 50.0, 21.0],
     [44.0, 40.0, 33.0, 45.0, 45.0, 30.0, 46.0, 34.0, 49.0, 44.0],
     [39.0, 44.0, 51.0, 47.0, 50.0, 45.0, 39.0, 51.0, 39.0, 55.0]]
conditions_list = 'Contact Hit Bump Collide Smash'.split()  # one label per row of d
D = Anova1way()
D.run(d, conditions_list=conditions_list)  # posthoc defaults to the Tukey test
print(D)
Anova: Single Factor on Measure

SUMMARY
Groups    Count   Sum   Average   Variance
==========================================
Contact      10   300        30    116.444
Hit          10   350        35     86.444
Bump         10   380        38    122.222
Collide      10   410        41     41.556
Smash        10   460        46     33.333

O'BRIEN TEST FOR HOMOGENEITY OF VARIANCE
Source of Variation       SS       df      MS         F     P-value   eta^2   Obs. power
========================================================================================
Treatments             68081.975    4   17020.494   1.859     0.134   0.142        0.498
Error                 412050.224   45    9156.672
========================================================================================
Total                 480132.199   49

ANOVA
Source of Variation    SS    df   MS      F     P-value   eta^2   Obs. power
============================================================================
Treatments            1460    4   365   4.562     0.004   0.289        0.837
Error                 3600   45    80
============================================================================
Total                 5060   49

POSTHOC MULTIPLE COMPARISONS

Tukey HSD: Table of q-statistics
          Bump   Collide    Contact      Hit       Smash
==========================================================
Bump      0      1.061 ns   2.828 ns   1.061 ns   2.828 ns
Collide          0          3.889 +    2.121 ns   1.768 ns
Contact                     0          1.768 ns   5.657 **
Hit                                    0          3.889 +
Smash                                             0
==========================================================
  + p < .10 (q-critical[5, 45] = 3.59038343675)
  * p < .05 (q-critical[5, 45] = 4.01861178004)
 ** p < .01 (q-critical[5, 45] = 4.89280842987)
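The ANOVA table above can be checked by hand. The following sketch, in plain Python with no pyvttbl dependency, recomputes the sums of squares, degrees of freedom, F, and eta^2 from the same data. The helper name one_way_anova is made up for this illustration, and the p-value and power columns are left out because they require an F distribution.

```python
def one_way_anova(groups):
    """Return (ss_treat, ss_error, df_treat, df_error, F, eta_sq)."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / float(len(scores))
    group_means = [sum(g) / float(len(g)) for g in groups]

    # Between-groups SS weights each squared deviation by the group size,
    # so groups of different sizes are handled as well.
    ss_treat = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))
    # Within-groups SS: squared deviations from each group's own mean.
    ss_error = sum((x - m) ** 2
                   for g, m in zip(groups, group_means) for x in g)

    df_treat = len(groups) - 1
    df_error = len(scores) - len(groups)
    f_stat = (ss_treat / df_treat) / (ss_error / df_error)
    eta_sq = ss_treat / (ss_treat + ss_error)
    return ss_treat, ss_error, df_treat, df_error, f_stat, eta_sq


d = [[21.0, 20.0, 26.0, 46.0, 35.0, 13.0, 41.0, 30.0, 42.0, 26.0],
     [23.0, 30.0, 34.0, 51.0, 20.0, 38.0, 34.0, 44.0, 41.0, 35.0],
     [35.0, 35.0, 52.0, 29.0, 54.0, 32.0, 30.0, 42.0, 50.0, 21.0],
     [44.0, 40.0, 33.0, 45.0, 45.0, 30.0, 46.0, 34.0, 49.0, 44.0],
     [39.0, 44.0, 51.0, 47.0, 50.0, 45.0, 39.0, 51.0, 39.0, 55.0]]

ss_treat, ss_error, df_treat, df_error, f_stat, eta_sq = one_way_anova(d)
print('SS treatments = %.0f, SS error = %.0f' % (ss_treat, ss_error))
print('F(%d, %d) = %.4f, eta^2 = %.3f' % (df_treat, df_error, f_stat, eta_sq))
```

Here eta^2 is the treatment sum of squares divided by the total sum of squares, which is how the value in the eta^2 column relates to the SS column.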

Using the Newman-Keuls Test

from pyvttbl.stats import Anova1way
d = [[21.0, 20.0, 26.0, 46.0, 35.0, 13.0, 41.0, 30.0, 42.0, 26.0],
     [23.0, 30.0, 34.0, 51.0, 20.0, 38.0, 34.0, 44.0, 41.0, 35.0],
     [35.0, 35.0, 52.0, 29.0, 54.0, 32.0, 30.0, 42.0, 50.0, 21.0],
     [44.0, 40.0, 33.0, 45.0, 45.0, 30.0, 46.0, 34.0, 49.0, 44.0],
     [39.0, 44.0, 51.0, 47.0, 50.0, 45.0, 39.0, 51.0, 39.0, 55.0]]
conditions_list = 'Contact Hit Bump Collide Smash'.split()  # one label per row of d
D = Anova1way()
D.run(d, conditions_list=conditions_list, posthoc='SNK')  # 'SNK' selects the Newman-Keuls test
print(D)
Anova: Single Factor on Measure

SUMMARY
Groups    Count   Sum   Average   Variance
==========================================
Contact      10   300        30    116.444
Hit          10   350        35     86.444
Bump         10   380        38    122.222
Collide      10   410        41     41.556
Smash        10   460        46     33.333

O'BRIEN TEST FOR HOMOGENEITY OF VARIANCE
Source of Variation       SS       df      MS         F     P-value   eta^2   Obs. power
========================================================================================
Treatments             68081.975    4   17020.494   1.859     0.134   0.142        0.498
Error                 412050.224   45    9156.672
========================================================================================
Total                 480132.199   49

ANOVA
Source of Variation    SS    df   MS      F     P-value   eta^2   Obs. power
============================================================================
Treatments            1460    4   365   4.562     0.004   0.289        0.837
Error                 3600   45    80
============================================================================
Total                 5060   49

POSTHOC MULTIPLE COMPARISONS

SNK: Step-down table of q-statistics
       Pair           i    |diff|     q     range   df     p     Sig.
=====================================================================
Contact vs. Smash      1   16.000   5.657       5   45   0.002   **
Collide vs. Contact    2   11.000   3.889       4   45   0.041   *
Hit vs. Smash          3   11.000   3.889       4   45   0.041   *
Bump vs. Smash         4    8.000   2.828       3   45   0.124   ns
Bump vs. Contact       5    8.000   2.828       3   45   0.124   ns
Collide vs. Hit        6    6.000   2.121       2   45   0.141   ns
Collide vs. Smash      7    5.000       -       -    -       -   ns
Contact vs. Hit        8    5.000       -       -    -       -   ns
Bump vs. Collide       9    3.000       -       -    -       -   ns
Bump vs. Hit          10    3.000       -       -    -       -   ns
  + p < .10,   * p < .05,   ** p < .01,   *** p < .001

Running Single Factor ANOVA with DataFrame

The examples above pass a list of lists to Anova1way. The DataFrame object also has a wrapper method for running a single-factor ANOVA. It assumes the data are in stacked format, with one observation per row.
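For readers new to the stacked (long) layout, this small sketch in plain Python turns a list of lists like the one used in the earlier examples into one (condition, score) row per observation. It deliberately avoids the pyvttbl API; the function name to_stacked and the shortened data are illustrative only.

```python
def to_stacked(groups, labels):
    """Flatten one list per condition into (label, score) rows."""
    if len(groups) != len(labels):
        raise ValueError('need exactly one label per group')
    return [(label, score)
            for label, group in zip(labels, groups)
            for score in group]


wide = [[21.0, 20.0, 26.0],    # first three Contact scores
        [23.0, 30.0, 34.0]]    # first three Hit scores
rows = to_stacked(wide, ['Contact', 'Hit'])

for label, score in rows:
    print('%-8s %5.1f' % (label, score))
```

The two positions in each row play the roles of the factor and the dependent variable, which are named 'IV' and 'DV' in the DataFrame example that follows.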

Let’s begin by making up some data.

>>> from pyvttbl import DataFrame
>>> from random import random
>>> sample = lambda mult, N : [random()*mult for i in xrange(N)]
>>> df = DataFrame(zip(['IV','DV'], [['A']*10, sample(1, 10)]))
>>> df.attach(DataFrame(zip(['IV','DV'], [['B']*10, sample(2, 10)])))
>>> df.attach(DataFrame(zip(['IV','DV'], [['C']*10, sample(3, 10)])))
>>> print(df)
IV    DV
==========
A    0.779
A    0.706
A    0.418
A    0.388
A    0.542
A    0.014
A    0.941
A    0.058
A    0.830
A    0.110
B    1.263
B    1.559
B    1.069
B    1.524
B    1.700
B    1.187
B    1.980
B    1.657
B    1.145
B    0.103
C    2.264
C    1.863
C    2.374
C    0.972
C    2.257
C    0.467
C    1.077
C    1.001
C    2.984
C    2.422

Now we can run the analysis:

>>> aov = df.anova1way('DV', 'IV')
>>> print(aov)
Anova: Single Factor on DV

SUMMARY
Groups   Count    Sum     Average   Variance
============================================
A           10    4.785     0.478      0.114
B           10   13.185     1.319      0.265
C           10   17.681     1.768      0.685

O'BRIEN TEST FOR HOMOGENEITY OF VARIANCE
Source of Variation    SS     df    MS       F     P-value   eta^2   Obs. power
===============================================================================
Treatments            1.749    2   0.875   3.697     0.038   0.215        0.566
Error                 6.388   27   0.237
===============================================================================
Total                 8.137   29

ANOVA
Source of Variation     SS     df    MS       F       P-value    eta^2   Obs. power
===================================================================================
Treatments             8.569    2   4.285   12.083   1.787e-04   0.472        0.900
Error                  9.574   27   0.355
===================================================================================
Total                 18.143   29

POSTHOC MULTIPLE COMPARISONS

Tukey HSD: Table of q-statistics
    A      B          C
===========================
A   0   2.443 ns   3.751 *
B       0          1.308 ns
C                  0
===========================
  + p < .10 (q-critical[3, 27] = 3.0301664694)
  * p < .05 (q-critical[3, 27] = 3.50576984879)
 ** p < .01 (q-critical[3, 27] = 4.49413305084)