As a father and an avowed market researcher, it is truly a good day when these two roles overlap. They are currently in conjunction as my youngest daughter Lilly is studying statistics as part of her 8th grade mathematics course. All researchers, including those of us involved in consumer and B2B marketing research, will come across numeric data as part of our quantitative research efforts. This data may derive from survey questions, financial documents, census data, or transactions from our CRM. Regardless of the source, we must be able to assess the quality and nature of the data.
Identifying outliers is a critical part of any survey data analysis project. Outliers can lead to unstable results from a number of statistical tests, most notably regression analysis. However, the question remains: “How do we identify outliers in our data set?”
There is a simple rule you can follow to identify outliers in your quantitative data. If you have ever seen a box-plot then you have seen this rule in action. It is known as the 1.5 IQR Rule. IQR stands for Inter-Quartile Range. This rule says that any observation falling below the lower boundary or above the upper boundary established by the following formulas is an outlier and should be considered for removal or modification before progressing further with your analysis. The boundaries are established as:
Lower: Quartile 1 – (1.5 x IQR)
Upper: Quartile 3 + (1.5 x IQR)
*IQR is defined as the range created by subtracting Quartile 1 from Quartile 3
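For readers who would rather let software do the arithmetic, here is a minimal Python sketch of the rule. The function names iqr_bounds and find_outliers are my own, made up for illustration. Also note that statistical packages compute quartiles in slightly different ways, so the "inclusive" method chosen here is an assumption and your tool of choice may place the boundaries a little differently.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) outlier boundaries for the k x IQR rule."""
    # quantiles(..., n=4) returns three cut points: Q1, the median, and Q3.
    # method="inclusive" is one of several quartile conventions (an assumption).
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # Quartile 3 minus Quartile 1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the observations that fall outside the boundaries."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


# Made-up data: nine ordinary values and one suspicious one.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_bounds(sample))
print(find_outliers(sample))
```

The multiplier is left as a parameter (k) only so the same sketch can be reused with a stricter or looser cutoff; 1.5 is the value the rule is named for.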
For example, let’s take a look at salaries for all sales people working for Acme, Inc.
Q1 = $87,000
Q3 = $127,000
IQR = $127,000 – $87,000 = $40,000
Lower: $87,000 – (1.5 x $40,000) = $27,000
Upper: $127,000 + (1.5 x $40,000) = $187,000
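The Acme arithmetic can be checked in a few lines of Python. This is just a sketch that plugs in the quartiles quoted above; the variable names are mine.

```python
# Quartiles from the Acme, Inc. salary example.
q1 = 87_000
q3 = 127_000

iqr = q3 - q1            # Quartile 3 minus Quartile 1
lower = q1 - 1.5 * iqr   # lower boundary
upper = q3 + 1.5 * iqr   # upper boundary

print(f"IQR:   ${iqr:,}")        # IQR:   $40,000
print(f"Lower: ${lower:,.0f}")   # Lower: $27,000
print(f"Upper: ${upper:,.0f}")   # Upper: $187,000
```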
Sales people earning above $187,000 or below $27,000 would be considered outliers and would need to be reviewed before proceeding. Outliers can be caused by data entry errors, so checking the flagged records against their source is a good starting point for follow up. Other options include deleting the observation or replacing its value with the mean or median.
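To make those options concrete, here is a rough Python sketch of the three choices: flag for review, delete, or substitute the median. The salary figures are invented for illustration, the function names are hypothetical, and computing the median from the non-outlying values only is my assumption; some analysts use the median of the full data set instead.

```python
from statistics import median

# Boundaries taken from the Acme example.
LOWER, UPPER = 27_000, 187_000


def is_outlier(salary):
    return salary < LOWER or salary > UPPER


def flag_for_review(salaries):
    """Option 1: list the suspicious values so they can be checked at the source."""
    return [s for s in salaries if is_outlier(s)]


def drop_outliers(salaries):
    """Option 2: delete the outlying observations."""
    return [s for s in salaries if not is_outlier(s)]


def substitute_median(salaries):
    """Option 3: replace each outlier with the median of the remaining values."""
    typical = median(drop_outliers(salaries))  # assumption: median of non-outliers
    return [typical if is_outlier(s) else s for s in salaries]


# Invented salaries: 9,500 and 950,000 look like possible data entry errors.
salaries = [9_500, 87_000, 95_000, 110_000, 127_000, 950_000]

print(flag_for_review(salaries))
print(drop_outliers(salaries))
print(substitute_median(salaries))
```

Whichever option you choose, it is worth keeping the original values somewhere so the change can be explained or reversed later.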
Outliers happen and can seriously degrade the results of your analysis. It is therefore wise to dig into your quantitative data and review the distributions before proceeding with the fun stuff.