In statistics, a power transform is a family of functions used to stabilize the variance of data, usually by making its distribution closer to normal. Among these functions, a well-known one is the Box Cox transformation, which was developed by statisticians George Box and David Roxbee Cox in 1964.
Mathematically, the Box Cox transformation depends on the exponent lambda (λ). In principle λ can be any real number, but in practice values from -5 to 5 are usually considered. It was originally defined as:
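One-parameter transformation:
$$
y_i^{(\lambda)} =
\begin{cases}
\dfrac{y_i^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\
\ln y_i & \text{if } \lambda = 0.
\end{cases}
$$
This transformation is valid for yi > 0, since both the logarithm and non-integer powers require strictly positive values.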
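Similarly, there is a two-parameter transformation that shifts the data first and can therefore also handle zero or negative values of yi:
$$
y_i^{(\boldsymbol{\lambda})} =
\begin{cases}
\dfrac{(y_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1} & \text{if } \lambda_1 \neq 0, \\
\ln (y_i + \lambda_2) & \text{if } \lambda_1 = 0,
\end{cases}
$$
where yi > -λ2.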
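The two definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `box_cox` and the `shift` parameter (standing in for λ2) are hypothetical names chosen for this example, and only the standard library is used.

```python
import math


def box_cox(y, lam, shift=0.0):
    """Box Cox transform of a single value.

    `lam` is the exponent (lambda, or lambda1 in the two-parameter form).
    `shift` plays the role of lambda2; with shift=0 this reduces to the
    one-parameter form.
    """
    z = y + shift
    if z <= 0:
        # The transform is only defined for strictly positive shifted values.
        raise ValueError("y + shift must be strictly positive")
    if lam == 0:
        return math.log(z)
    return (z ** lam - 1.0) / lam


values = [0.5, 1.0, 2.0, 4.0]
print([box_cox(v, 0.5) for v in values])   # square-root-like transform
print([box_cox(v, 0.0) for v in values])   # natural logarithm
print(box_cox(-1.0, 2.0, shift=3.0))       # negative input, shifted by lambda2 = 3
```

Note that a negative input is accepted only when the shift makes it strictly positive; otherwise the sketch raises an error instead of returning a meaningless value.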
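Common Box Cox transformations, each equivalent to a familiar function up to a linear rescaling, are:
λ = -2: 1/y²
λ = -1: 1/y
λ = -0.5: 1/√y
λ = 0: ln(y)
λ = 0.5: √y
λ = 1: y (shape unchanged)
λ = 2: y²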
To find the optimal transformation, several λ values are usually tried, and the one whose transformed data most closely follows a normal distribution is selected.
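One way to make this search concrete is a grid search over λ. The sketch below assumes that "closest to normal" is measured by the profile log-likelihood of the transformed data under a normal model, which is a common criterion but not the only possible one. The function names and the sample data are hypothetical, and the grid from -5 to 5 in steps of 0.1 is an arbitrary choice.

```python
import math


def box_cox(y, lam):
    """One-parameter Box Cox transform of a strictly positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def profile_log_likelihood(data, lam):
    """Normal profile log-likelihood of the data transformed with `lam`."""
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # maximum-likelihood variance
    # The second term is the log of the Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(y) for y in data)


def best_lambda(data, grid):
    """Return the grid value with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_log_likelihood(data, lam))


# Right-skewed sample built so that its logarithms are evenly spaced
# and symmetric around zero.
data = [math.exp(x / 4.0) for x in range(-8, 9)]
grid = [k / 10.0 for k in range(-50, 51)]  # -5.0 to 5.0 in steps of 0.1
print(best_lambda(data, grid))
```

A finer grid or a numerical optimizer would give a more precise λ, at the cost of more evaluations.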
The Box Cox transformation is widely used when a dataset is not normally distributed but is close to being so, for example when it contains outliers or is skewed. Transforming the dataset so that its distribution is closer to Gaussian can improve the performance of many machine learning and deep learning algorithms. However, this transformation doesn’t always work, particularly when the data distribution is very different from a Gaussian distribution.
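The effect on skewness can be checked numerically. The following sketch compares the sample skewness of a small right-skewed dataset before and after a Box Cox transformation with λ = 0. The dataset is made up for illustration, and the moment-based skewness formula is one of several definitions in use.

```python
import math


def box_cox(y, lam):
    """One-parameter Box Cox transform of a strictly positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def skewness(values):
    """Moment-based sample skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data with a long upper tail.
raw = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 21, 40]
transformed = [box_cox(y, 0.0) for y in raw]

print("skewness before:", skewness(raw))
print("skewness after: ", skewness(transformed))
```

A skewness closer to zero after the transformation indicates a more symmetric distribution, although symmetry alone does not guarantee normality.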
The Box Cox transformation provides an important tool for modeling. However, trying different coefficients to find the optimal transformation may require time and resources. LogicPlum’s platform resolves this problem through automation, where all calculations are done without human intervention. This allows any user to create models without the need for expertise in mathematics or programming.
For those wanting to read the original paper by Box and Cox:
Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26, 211-252. Available online: https://www.ime.usp.br/~abe/lista/pdfQWaCMboK68.pdf
© 2020 LogicPlum, Inc. All rights reserved.