Outliers

What is an outlier in data?

Intuitively, an outlier can be seen as a value that is much smaller or larger than most of the other values in a data set. More formally, an outlier is an observation that lies at an abnormal distance from the other values in a random sample taken from a population.

There is some subjectivity in this definition, as “abnormal distance” can mean different things to different people and in different problems. Therefore, before identifying outliers, it is necessary to define what a “normal distance” is.
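One simple way to make “normal distance” concrete, assuming the data are roughly symmetric, is to measure each value's distance from the mean in units of the standard deviation (its z-score) and treat anything beyond a threshold k as abnormal. This rule is not named in the text above; it is offered as an illustration. The helper name zscore_outliers, the sample readings, and the default k = 3 are all assumptions, not a fixed standard.

```python
from statistics import mean, stdev

def zscore_outliers(values, k=3.0):
    """Return the values lying more than k sample standard deviations from the mean.

    Hypothetical helper: k encodes what we decide counts as an "abnormal
    distance"; 3 is a common convention, not a universal rule.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [v for v in values if abs(v - mu) > k * sigma]

# Made-up readings with one suspicious value.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 50]
print(zscore_outliers(readings))  # [50]
```

Keep in mind that the outlier itself inflates both the mean and the standard deviation. With small samples or several extreme values, this can hide outliers (an effect known as masking), which is one reason rank-based rules are often preferred.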

There are several ways to identify an outlier, ranging from plain observation to more systematic techniques such as scatter plots, box plots, Grubbs’ test (which assumes the data are normally distributed), or the Mahalanobis distance.
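The box-plot approach can be sketched with Tukey's fences: a value is flagged if it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. Many box-plot implementations use this convention to decide which points to draw individually. This is a minimal sketch using Python's standard library; iqr_outliers and the sample data are hypothetical, and the 1.5 multiplier is a convention rather than a requirement.

```python
from statistics import quantiles

def iqr_outliers(values, whisker=1.5):
    """Return values outside Tukey's fences [Q1 - w*IQR, Q3 + w*IQR].

    Hypothetical helper; `whisker` (w) defaults to the customary 1.5.
    """
    # quantiles() sorts internally and returns the three quartile cut points
    # (default method="exclusive").
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]

data = [12, 14, 14, 15, 16, 17, 18, 19, 21, 95]
print(iqr_outliers(data))  # [95]
```

Because quartiles depend on ranks rather than on the size of the extreme values, a single very large observation barely moves the fences, which makes this rule less prone to masking than a mean-and-standard-deviation rule.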

Note that an outlier is different from noise. An outlier is a data point that differs from the majority and can be compared with the other values, whereas noise is random error.

There are several ways to classify outliers. By the number of variables involved, an outlier is either univariate (one variable) or multivariate (two or more variables). By scope, an outlier can be global (a data point that differs from the rest of the data), contextual (a value that is unusual only within a specific context), or collective (a subset of data points that, taken together, differ from the rest of the data).
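To illustrate a multivariate outlier, the sketch below computes the Mahalanobis distance of a point from a two-variable sample, using the sample mean and covariance matrix. The data are made up, and mahalanobis_2d is a hypothetical helper written with the standard library only so that the example is self-contained; in practice, libraries such as SciPy offer scipy.spatial.distance.mahalanobis, which takes the inverse covariance matrix as an argument.

```python
from statistics import mean

def mahalanobis_2d(points, p):
    """Mahalanobis distance of point p from a sample of (x, y) points.

    Hypothetical helper; assumes the sample covariance matrix is invertible.
    """
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Inverse of a symmetric 2x2 matrix
    det = sxx * syy - sxy ** 2
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dx, dy = p[0] - mx, p[1] - my
    d2 = dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy
    return d2 ** 0.5

# Two strongly correlated variables (illustrative data).
sample = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9),
          (5, 5.1), (6, 5.8), (7, 7.2), (8, 7.9)]
print(round(mahalanobis_2d(sample, (7, 7)), 1))  # 1.0
print(round(mahalanobis_2d(sample, (2, 7)), 1))  # 32.4
```

The point (2, 7) is unremarkable on each variable separately, since x = 2 and y = 7 both fall inside the observed ranges, yet it breaks the strong relationship between the two variables. That is exactly the kind of multivariate outlier a one-variable check would miss.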

Why are outliers in data important?

The identification of outliers is an important step in data analysis. For example, an outlier may indicate a wrong value due to an experimental problem or incorrect coding. But an outlier is not always a negative thing: it may indicate something unexpected that can be confirmed by further studies, or it may reveal errors in our model.

Outliers in data + LogicPlum

Although expert knowledge may not be necessary to identify some outliers, interpreting them often requires it. That is why LogicPlum provides not only a modeling platform but also expert advice on the matter. Our philosophy is “working together” to attain the best results.

Additional Resources

For those wanting to expand their knowledge, a good reference book is:

Hawkins, D. M. (1980). Identification of Outliers. London: Chapman and Hall.
