International Journal of Psychological Research

ISSN: 2011-2084

ijpr@usbmed.edu.co

Universidad de San Buenaventura, Colombia

Olivier, Jake; Norberg, Melissa M.

Positively Skewed Data: Revisiting the Box-Cox Power Transformation.

International Journal of Psychological Research, vol. 3, no. 1, 2010, pp. 68-75

Universidad de San Buenaventura, Medellín, Colombia

Available in: http://www.redalyc.org/articulo.oa?id=299023509016

Scientific Information System

Network of Scientific Journals from Latin America, the Caribbean, Spain and Portugal. Non-profit academic project, developed under the open access initiative.

International Journal of Psychological Research, 2010, Vol. 3, No. 1.

ISSN (printed) 2011-2084

ISSN (electronic) 2011-2079

Olivier, J., & Norberg, M. M. (2010). Positively Skewed Data: Revisiting the Box-Cox Power Transformation. International Journal of Psychological Research, 3(1), 68-75.

Positively Skewed Data: Revisiting the Box-Cox Power Transformation

Jake Olivier

University of New South Wales

Melissa M. Norberg

University of New South Wales

ABSTRACT

Although the normal probability distribution is the cornerstone of applied statistical methodology, data do not always meet the necessary normal distribution assumptions. In these cases, researchers often transform non-normal data to a distribution that is approximately normal. Power transformations constitute a family of transformations that includes logarithmic and fractional exponent transforms. The Box-Cox method offers a simple way of choosing the most appropriate power transformation. Another option for positively skewed data, often used when measuring reaction times, is the ex-Gaussian distribution, which is a combination of the exponential and normal distributions. In this paper, the Box-Cox power transformation and the ex-Gaussian distribution are discussed and compared in the context of positively skewed data. This discussion demonstrates that the Box-Cox power transformation is simpler to apply and easier to interpret than the ex-Gaussian distribution.

Key words: logarithmic transformations, geometric mean analysis, ex-Gaussian distribution, log-normal distribution.


Article received: December 15, 2009. Article accepted: March 15, 2009.

Mail address: Jake Olivier, School of Mathematics and Statistics, NSW Injury Risk Management Research Centre, University of New South Wales, Sydney NSW 2052, Australia. Email: j.olivier@unsw.edu.au

Melissa M. Norberg, National Cannabis Prevention and Information Centre, University of New South Wales, Randwick NSW 2031, Australia

INTERNATIONAL JOURNAL OF PSYCHOLOGICAL RESEARCH is included in PSERINFO, CENTRO DE INFORMACIÓN PSICOLÓGICA DE COLOMBIA, OPEN JOURNAL SYSTEM, BIBLIOTECA VIRTUAL DE PSICOLOGIA (ULAPSY-BIREME), DIALNET and GOOGLE SCHOLARS. Some of its articles are in SOCIAL SCIENCE RESEARCH NETWORK, and it is in the process of inclusion in a variety of sources and international databases.


INTRODUCTION

Numerous significance tests, such as t-tests, chi-square tests and F-tests, assume that data are normally distributed. This is often reasonable, as many real-world measurements and observations follow a normal distribution; however, there are several situations in which the normal distribution assumption is not appropriate (e.g., immunologic and reaction time data). In these cases, data transformation is a common technique used to modify non-normal data to a distribution that makes the normal assumption more reasonable and, in turn, makes significance tests based on a normal assumption more appropriate (Olivier, Johnson & Marshall, 2008).

The normal distribution, sometimes called the Gaussian distribution, is characterised by a symmetric, bell-like shape. Significance tests based on a normal assumption are not appropriate for asymmetric data. This is because skewness is most strongly reflected in the tails of a distribution, which is where p-values are calculated. This usually results in p-values that are smaller than they should be (and thus more likely to be incorrectly significant). When a data set does not follow the shape of a normal distribution, an appropriate function is sometimes chosen that transforms the data to a distribution that is reasonably normal-shaped.
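As a rough illustration of how a transformation can change the shape of a data set, the following sketch computes a moment-based sample skewness before and after taking logarithms. The data are hypothetical (simulated log-normal draws, chosen only because they are right-skewed), and the function name `sample_skewness` is an assumption introduced for this example rather than anything defined in the paper.

```python
import math
import random


def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5 (close to 0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical data: 2000 simulated log-normal draws (positively skewed).
rng = random.Random(0)
raw = [rng.lognormvariate(0.0, 0.75) for _ in range(2000)]

# The logarithm of a log-normal variable is normal, so skewness should shrink.
logged = [math.log(v) for v in raw]

skew_raw = sample_skewness(raw)
skew_logged = sample_skewness(logged)
print(f"skewness before log transform: {skew_raw:.2f}")
print(f"skewness after log transform:  {skew_logged:.2f}")
```

For these simulated data the logarithm is the ideal transformation by construction; with real data the appropriate function is not known in advance, which is the problem the rest of the paper addresses.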

A common misconception in statistics is that data must be sampled from a normal distribution for significance tests based on a normal assumption to be appropriate. In truth, the normal assumption applies to the distribution of the sample mean x̄, called a sampling distribution, and not to the distribution from which the data are sampled. There is partial truth in the misconception, because sampling data from a normal distribution implies that x̄ is also normally distributed. When data are not sampled from a normal distribution, the central limit theorem (CLT) ensures that x̄ is approximately normally distributed for a large enough sample size. Although many texts state that samples above a certain size (30 is a commonly quoted rule of thumb) are large enough for the CLT to apply, there exists no "magic" sample size for every situation. There are instances where data sampled from a highly skewed distribution would require a sample size of 1000 for x̄ to be approximately normally distributed. In practice, a data analyst would be wise to check for normality with the understanding that data need only look reasonably normal to be suitable. A visual method for testing normality will be discussed later in the paper.

There are a few broad, well-established guidelines for choosing the most appropriate transformation. For instance, a logarithmic transformation is recommended for positively skewed data, while negatively skewed data are often transformed using the square root function. However, these guidelines are not appropriate for every situation. Box and Cox (1964) introduced a method for choosing the best transformation from a family of power transformations. In this approach, the data are raised to a power chosen to best approximate a normal distribution. If the data have been transformed to a distribution that is reasonably normal, the data analyst would then perform significance test(s) on the transformed data using methods based on a normal assumption.

The ex-Gaussian distribution is another method, often used for non-negative, positively skewed data. This distribution is defined by three parameters (conventionally denoted μ, σ and τ), and an ex-Gaussian analysis involves the estimation of these parameters, usually by either the method of moments or maximum likelihood estimation (Heathcote, 1996). It should be noted that this is only a method for estimating a probability distribution and does not lend itself easily to statistical inference.
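The Box-Cox idea described above, choosing the power that brings the data closest to normality, can be sketched in a few lines. The introduction does not spell out the selection criterion, so this example assumes the usual one from Box and Cox (1964): maximise the normal profile log-likelihood of the power λ, where the transform is (x^λ − 1)/λ for λ ≠ 0 and log x for λ = 0. The function names, the grid of candidate powers and the simulated data are all hypothetical choices made for illustration.

```python
import math
import random


def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or log(x) when lam is 0."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def profile_log_likelihood(values, lam):
    """Normal profile log-likelihood of lam, up to an additive constant."""
    n = len(values)
    transformed = box_cox(values, lam)
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n
    # Jacobian term of the transformation, needed to compare different lam.
    log_jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(variance) + log_jacobian


def choose_lambda(values, grid):
    """Return the candidate power with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))


# Hypothetical data: strictly positive, simulated log-normal draws, for which
# a power near 0 (the log transformation) is expected to be selected.
rng = random.Random(1)
data = [rng.lognormvariate(1.0, 0.5) for _ in range(1000)]
grid = [i / 10 for i in range(-20, 21)]  # candidate powers from -2.0 to 2.0

best_lam = choose_lambda(data, grid)
print("selected lambda:", best_lam)
```

A grid search is used here only to keep the sketch dependency-free; a numerical optimiser would serve equally well. Note that the transform is defined only for strictly positive data.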

Another option is to utilise nonparametric statistical techniques such as the Mann-Whitney or Kruskal-Wallis tests. Nonparametric methods do not make explicit distributional assumptions and are less powerful than parametric tests when a distributional assumption is reasonable (nonparametric tests are more powerful when distributional assumptions are inappropriate for parametric analysis). A full discussion of nonparametric statistics is beyond the scope of this paper; Conover (1998) provides a good overview.

This paper will first introduce basic statistical methodology that is essential in understanding the transformation of data and the ex-Gaussian distribution, then the Box-Cox family of transformations and methods
