Multivariate Analysis of Variance (MANOVA)

In MANOVA, there are in general g groups of observations, of sizes \({n_1},{n_2},...,{n_g}\), and p variables \({X_1},{X_2},...,{X_p}\) that describe the observations. It is useful to express the variables as deviations, x's, from the grand mean or centroid (the mean over all groups). If \({{\bf{X}}_{ki}}\) is the vector of the p variables for the ith observation in the kth group, then its deviation from the grand centroid \({\bf{m}}\) is \({{\bf{x}}_{ki}} = {{\bf{X}}_{ki}} - {\bf{m}}\), and this deviation can be decomposed into two components:

\[{{\bf{x}}_{ki}} = \left( {{{\bf{m}}_k} - {\bf{m}}} \right) + \left( {{{\bf{X}}_{ki}} - {{\bf{m}}_k}} \right)\]

where \(\left( {{{\bf{m}}_k} - {\bf{m}}} \right)\) is the deviation between the centroid of the kth group and the grand centroid, and \(\left( {{{\bf{X}}_{ki}} - {{\bf{m}}_k}} \right)\) is the deviation between the ith observation in the kth group and the centroid for that group. The first term can be thought of as analogous to the systematic component of the data, while the second term can be thought of as the irregular or unpredictable component. As in univariate analysis of variance, the total sum of squares of the dependent variables (the x's) can be decomposed into two parts:
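As a small sketch with hypothetical numbers, the decomposition of a single observation vector can be verified directly (the centroids and observation below are made up for illustration):

```python
import numpy as np

# Hypothetical values for p = 2 variables
m = np.array([1.0, 2.0])     # grand centroid
m_k = np.array([2.0, 3.0])   # centroid of group k
X_ki = np.array([2.5, 2.8])  # i-th observation in group k

# Deviation from the grand centroid, split into the two components:
# (group centroid - grand centroid) + (observation - group centroid)
x_ki = (m_k - m) + (X_ki - m_k)

print(x_ki)  # identical to X_ki - m, since the m_k terms cancel
```

The group-centroid terms cancel algebraically, so the two-component sum always reproduces the simple deviation from the grand centroid.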

\[\sum\limits_{k = 1}^g {\sum\limits_{i = 1}^{{n_k}} {{{\bf{x}}_{ki}}{{{\bf{x'}}}_{ki}}} } = \sum\limits_{k = 1}^g {\sum\limits_{i = 1}^{{n_k}} {\left( {{{\bf{m}}_k} - {\bf{m}}} \right){{\left( {{{\bf{m}}_k} - {\bf{m}}} \right)}^\prime }} } + \sum\limits_{k = 1}^g {\sum\limits_{i = 1}^{{n_k}} {\left( {{{\bf{X}}_{ki}} - {{\bf{m}}_k}} \right){{\left( {{{\bf{X}}_{ki}} - {{\bf{m}}_k}} \right)}^\prime }} } \]

Each of the individual terms is a matrix, e.g.: \[{\bf{T}} = \sum\limits_{k = 1}^g {\sum\limits_{i = 1}^{{n_k}} {{{\bf{x}}_{ki}}{{{\bf{x'}}}_{ki}}} } \]

is the “total” sums-of-squares matrix,

\[{\bf{A}} = \sum\limits_{k = 1}^g {\sum\limits_{i = 1}^{{n_k}} {\left( {{{\bf{m}}_k} - {\bf{m}}} \right){{\left( {{{\bf{m}}_k} - {\bf{m}}} \right)}^\prime }} } \]

is the “among-groups” sum-of-squares matrix, and

\[{\bf{W}} = \sum\limits_{k = 1}^g {\sum\limits_{i = 1}^{{n_k}} {\left( {{{\bf{X}}_{ki}} - {{\bf{m}}_k}} \right){{\left( {{{\bf{X}}_{ki}} - {{\bf{m}}_k}} \right)}^\prime }} } \]

is the “within-groups” sums-of-squares matrix, and so

\[{\bf{T}} = {\bf{A}} + {\bf{W}}\]
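The identity \({\bf{T}} = {\bf{A}} + {\bf{W}}\) can be checked numerically. The sketch below uses hypothetical simulated data (g = 3 groups, p = 2 variables, unequal group sizes); note that because the among-groups summand does not depend on i, its inner sum over i just multiplies each group's term by \({n_k}\):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: three bivariate groups with different centroids
groups = [rng.normal(loc=c, size=(n, 2))
          for c, n in zip([0.0, 1.0, 2.0], [8, 10, 12])]

X = np.vstack(groups)
m = X.mean(axis=0)                          # grand centroid

p = X.shape[1]
T = np.zeros((p, p))                        # total
A = np.zeros((p, p))                        # among groups
W = np.zeros((p, p))                        # within groups
for Xk in groups:
    mk = Xk.mean(axis=0)                    # group centroid
    d = mk - m
    A += len(Xk) * np.outer(d, d)           # n_k copies of the same outer product
    for row in Xk:
        W += np.outer(row - mk, row - mk)
        T += np.outer(row - m, row - m)

print(np.allclose(T, A + W))                # True
```

Each of T, A, and W is a p-by-p matrix; the diagonal entries are ordinary sums of squares for each variable, and the off-diagonal entries are sums of cross-products.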

A statistic that can be used to test the null hypothesis that the individual group centroids (the \({{\bf{m}}_k}\)'s) are all equal is Wilks' lambda,

\[\Lambda = \frac{{\left| {\bf{W}} \right|}}{{\left| {\bf{T}} \right|}}\]

where \(\left| {\bf{W}} \right|\) and \(\left| {\bf{T}} \right|\) are the determinants of the within-groups and total sums-of-squares matrices, respectively. As the within-groups sums of squares get smaller relative to the total sums of squares, the value of \(\Lambda\) decreases, and in practice smaller values of \(\Lambda\) correspond to smaller P-values. In other words, as \(\Lambda\) decreases we should be more inclined to reject the null hypothesis that the individual group centroids (the \({{\bf{m}}_k}\)'s) are all equal.
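Continuing the sketch above, Wilks' lambda is just the ratio of the two determinants. The data here are again hypothetical (three bivariate groups whose centroids are deliberately separated, so \(\Lambda\) should come out well below 1):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: three bivariate groups with separated centroids
groups = [rng.normal(loc=c, size=(15, 2)) for c in (0.0, 0.5, 2.0)]

X = np.vstack(groups)
m = X.mean(axis=0)

# Within-groups and total sums-of-squares matrices (cross-product form)
W = sum((Xk - Xk.mean(axis=0)).T @ (Xk - Xk.mean(axis=0)) for Xk in groups)
T = (X - m).T @ (X - m)

Lambda = np.linalg.det(W) / np.linalg.det(T)
print(Lambda)  # lies in (0, 1]; smaller values favor rejecting equal centroids
```

In practice \(\Lambda\) is converted to an approximate F (or chi-squared) statistic to obtain the P-value, for example via the transformations implemented in standard statistical software.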