The covariance matrix generalizes the concept of variance to multiple dimensions and captures how all the variables in a dataset may change together.
In statistics, it is defined as the square matrix containing the covariance between each pair of elements of a random vector. This matrix is symmetric and positive semi-definite, and its main diagonal holds the variance of each element of the vector. Figure 1 shows its definition:
Figure 1 – Covariance matrix
where:
V is the n × n covariance matrix,
N is the number of scores in each of the n data sets,
n is the number of data sets,
X̄ and Ȳ are the means of the N scores in two of the data sets, X and Y, respectively,
Xi and Yi are the ith raw scores from those two data sets, respectively,
xi = Xi − X̄ and yi = Yi − Ȳ are the corresponding ith deviation scores,
i = 1 … N indexes the scores and j = 1 … n indexes the data sets.
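As a sketch of the usual definition written with the symbols of the legend above: the covariance of two data sets is the average product of their deviation scores, and each entry of V is the covariance of one pair of data sets. This assumes the population form, which divides by N; the sample form divides by N − 1 instead. The superscripts (j) and (k), used here to label two of the n data sets, are notation introduced for illustration only.

```latex
\operatorname{cov}(X, Y)
  = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})
  = \frac{1}{N} \sum_{i=1}^{N} x_i \, y_i ,
\qquad
V_{jk} = \operatorname{cov}\!\left(X^{(j)}, X^{(k)}\right),
\quad j, k = 1, \dots, n .
```

With j = k the entry reduces to the variance of the jth data set, which is why the variances appear on the main diagonal.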
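The sign of the covariance indicates whether two variables tend to move in the same direction (positive) or in opposite directions (negative), so that one tends to decrease as the other increases. The magnitude of the covariance is difficult to interpret because it depends on the units of the variables. In general, a value close to zero indicates that the two variables have little or no linear relationship; it does not by itself show that they are independent.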
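The role of the sign can be illustrated with a small NumPy sketch. The toy values and the helper name cov are hypothetical, chosen only for illustration, and the helper uses the population form (dividing by N). The last case shows that a covariance of zero rules out only a linear relationship: the variable square is completely determined by x.

```python
import numpy as np

# Hypothetical toy data, chosen only for illustration.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

up = 2.0 * x + 1.0   # rises as x rises
down = -3.0 * x      # falls as x rises
square = x ** 2      # fully determined by x, but not linearly


def cov(a, b):
    """Population covariance: mean product of the deviation scores."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))


cov_up = cov(x, up)          # positive: the variables move together
cov_down = cov(x, down)      # negative: they move in opposite directions
cov_square = cov(x, square)  # zero, although square depends on x

print(cov_up, cov_down, cov_square)
```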
The covariance matrix is a basic tool that is applied in many different areas. In statistics, it is used in the principal component analysis method and the Karhunen–Loève transform, which are techniques employed in image processing and data analysis.
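As a minimal sketch of how principal component analysis relies on the covariance matrix, the example below takes the eigenvectors of the covariance matrix as the principal directions and the eigenvalues as the variance along each of them. The data are synthetic and every variable name is an assumption made for illustration; this is one common way to compute the components, not the only one.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 500 rows, 2 correlated features.
z = rng.standard_normal((500, 2))
data = z @ np.array([[2.0, 0.0],
                     [1.5, 0.5]])

# Center the data and estimate the covariance matrix (columns are variables).
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh is intended for symmetric matrices; it returns eigenvalues in
# ascending order, so reorder them from largest to smallest.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the data onto the principal directions.
scores = centered @ eigvecs

# The projected variables are uncorrelated: their covariance matrix is
# diagonal, with the eigenvalues on the diagonal.
scores_cov = np.cov(scores, rowvar=False)

# Share of the total variance carried by each component.
explained = eigvals / eigvals.sum()
print(explained)
```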
It is also widely applied in finance, particularly in portfolio theory. In machine learning, it is used to decorrelate variables, to transform them, and to generate correlated random variables.
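One common way to generate correlated random variables, sketched here under stated assumptions, is to factor a target covariance matrix with a Cholesky decomposition and apply the factor to uncorrelated samples. Undoing the same transform decorrelates the data again. The matrix target_cov and the sample size are hypothetical values picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable

# Hypothetical target covariance matrix (symmetric, positive definite).
target_cov = np.array([[4.0, 1.2],
                       [1.2, 1.0]])

# Lower-triangular factor L with L @ L.T equal to target_cov.
L = np.linalg.cholesky(target_cov)

# Uncorrelated samples with unit variance, one observation per row.
z = rng.standard_normal((100_000, 2))

# Applying L to each observation gives samples whose covariance
# approximates target_cov.
correlated = z @ L.T
sample_cov = np.cov(correlated, rowvar=False)

# Decorrelating is the reverse step: solve L @ w = observation.
whitened = np.linalg.solve(L, correlated.T).T
whitened_cov = np.cov(whitened, rowvar=False)

print(sample_cov)
print(whitened_cov)
```

The sample covariance only approximates the target, with an error that shrinks as the number of samples grows.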
Conceptually speaking, covariance matrices are easy to understand and calculate. However, when the data set includes a large number of features and rows of data, the calculations can become computationally expensive. In addition, we may need to estimate covariance matrices several times in order to find the right model for a problem.
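For a small data set the calculation takes a few lines. The sketch below uses NumPy's np.cov on synthetic data (the shape and values are assumptions for illustration), compares it with the same calculation written out by hand, and checks the properties mentioned earlier: symmetry, variances on the diagonal, and no negative eigenvalues. Note that np.cov divides by N − 1 by default.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Hypothetical data set: 1,000 rows and 5 features.
data = rng.standard_normal((1000, 5))
data[:, 1] += 0.8 * data[:, 0]  # make two of the features correlated

# rowvar=False tells NumPy that the columns are the variables.
V = np.cov(data, rowvar=False)

# The same estimate written out: products of deviation scores over N - 1.
centered = data - data.mean(axis=0)
V_manual = centered.T @ centered / (data.shape[0] - 1)

is_symmetric = np.allclose(V, V.T)
diag_is_variance = np.allclose(np.diag(V), data.var(axis=0, ddof=1))
min_eigenvalue = np.linalg.eigvalsh(V).min()  # non-negative up to rounding

print(V.shape, is_symmetric, diag_is_variance, min_eigenvalue)
```

The matrix has one row and one column per feature, so its size grows with the square of the number of features.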
These difficulties are solved by LogicPlum’s platform through automation, where users don’t need to intervene as the machine does all the calculations for them.
For those who want to understand the covariance matrix and its calculation in Python, see:
Janakiev, N. Understanding the Covariance Matrix. Available at https://datascienceplus.com/understanding-the-covariance-matrix/
© 2020 LogicPlum, Inc. All rights reserved.