Detect outliers using boxplot methods. Boxplots are a popular and easy method for identifying outliers. There are two categories of outliers: (1) outliers and (2) extreme points.

Values above `Q3 + 1.5 * IQR` or below `Q1 - 1.5 * IQR` are considered outliers. Values above `Q3 + 3 * IQR` or below `Q1 - 3 * IQR` are considered extreme points (or extreme outliers).

Q1 and Q3 are the first and third quartiles, respectively. IQR is the interquartile range (IQR = Q3 - Q1).
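The fence rule can be sketched in base R as follows. This is an illustration only: `flag_outliers()` is a hypothetical helper, not part of the documented functions, and it assumes the default quantile algorithm of `quantile()` (type 7).

```r
# Hypothetical helper (illustration only): flags values that fall outside
# the fences Q1 - coef * IQR and Q3 + coef * IQR.
flag_outliers <- function(x, coef = 1.5) {
  # First and third quartiles, using base R's default quantile type (assumption)
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 16, 50)

# coef = 1.5 marks outliers, coef = 3 marks extreme points
data.frame(
  x = x,
  outlier = flag_outliers(x, coef = 1.5),
  extreme = flag_outliers(x, coef = 3)
)
```

For this vector, Q1 = 3.25, Q3 = 7.75 and IQR = 4.5, so the upper fences are 14.5 and 21.25: the value 16 is an outlier but not an extreme point, while 50 is both.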

Generally speaking, data points labelled as outliers in boxplots are
considered less troublesome than extreme points and might
even be ignored. Note that any `NA` and `NaN` values are automatically removed
before the quantiles are computed.
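As a rough illustration of what removing missing values before computing the quantiles means, the base R sketch below compares quartiles with and without `NA`/`NaN` present. It shows the behaviour of base R's `quantile()` with `na.rm = TRUE`; that the documented functions do something equivalent internally is an assumption based on the note above.

```r
x <- c(2, 4, NA, 6, NaN, 8)

# With na.rm = TRUE, NA and NaN are dropped before the quartiles are computed,
# so the result matches the quartiles of the complete values only.
q_with_missing <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
q_complete <- quantile(c(2, 4, 6, 8), probs = c(0.25, 0.75), names = FALSE)
all.equal(q_with_missing, q_complete)

# Without na.rm = TRUE, quantile() stops with an error on missing values.
result <- tryCatch(quantile(x), error = function(e) "error")
result
```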

    identify_outliers(data, ..., variable = NULL)

    is_outlier(x, coef = 1.5)

    is_extreme(x)

| Argument | Description |
|---|---|
| data | a data frame |
| ... | One unquoted expression (or variable name). Used to select a variable of interest. Alternative to the argument `variable`. |
| variable | variable name for detecting outliers |
| x | a numeric vector |
| coef | coefficient specifying how far an outlier should be from the edge of its box. Possible values are 1.5 (for outliers) and 3 (for extreme points only). Default is 1.5. |

- `identify_outliers()`: returns a data frame containing the rows flagged as outliers, with two additional columns, "is.outlier" and "is.extreme", which hold logical values.
- `is_outlier()` and `is_extreme()`: return logical vectors.

- `identify_outliers`: takes a data frame and extracts the rows suspected to be outliers according to a numeric column. The columns "is.outlier" and "is.extreme" are added.
- `is_outlier`: detects outliers in a numeric vector. Returns a logical vector.
- `is_extreme`: detects extreme points in a numeric vector. An alias of `is_outlier()` with `coef = 3`. Returns a logical vector.
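A short usage sketch of the two vector functions, assuming they are loaded from the rstatix package. The input vector is made up for illustration, and no output is shown because the exact flags depend on the quartiles the functions compute.

```r
library(rstatix)  # assumption: is_outlier() and is_extreme() come from rstatix

# Made-up numeric vector with one moderate and one very large value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 16, 50)

is_outlier(x)            # coef = 1.5 by default
is_extreme(x)            # documented as is_outlier() with coef = 3
is_outlier(x, coef = 3)  # per the description above, should match is_extreme(x)

# Use the logical vector to keep only the flagged values
x[is_outlier(x)]
```

Because both functions return logical vectors of the same length as `x`, they can be used directly for subsetting or inside a data frame column.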

    library(rstatix)

    # Generate demo data
    set.seed(123)
    demo.data <- data.frame(
      sample = 1:20,
      score = c(rnorm(19, mean = 5, sd = 2), 50),
      gender = rep(c("Male", "Female"), each = 10)
    )

    # Identify outliers according to the variable score
    demo.data %>%
      identify_outliers(score)
    #>   sample score gender is.outlier is.extreme
    #> 1     20    50 Female       TRUE       TRUE

    # Identify outliers by groups
    demo.data %>%
      group_by(gender) %>%
      identify_outliers("score")
    #> # A tibble: 2 x 5
    #>   gender sample score is.outlier is.extreme
    #>   <fct>   <int> <dbl> <lgl>      <lgl>
    #> 1 Female     18  1.07 TRUE       FALSE
    #> 2 Female     20 50    TRUE       TRUE