How To Create Bins in R Programming
Noting outliers in data subsets often means binning data into groups. Here’s how you can do it in R programming
I wrote about presenting data within a bin, using the cut() and cut2() functions in R programming. Bins help analysts better understand their datasets, profiling trends that can impact decisions based on the data.
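As a quick refresher, here is a minimal sketch of binning with base R's cut(). The ages vector, the break points, and the labels are made-up values for illustration, not data from the earlier article. cut2() lives in the Hmisc package, so it is left out to keep the example runnable without installing anything.

```r
# Hypothetical sample data: ages of ten survey respondents
ages <- c(21, 25, 34, 37, 42, 48, 53, 59, 64, 71)

# cut() assigns each value to an interval defined by `breaks`.
# Intervals are right-closed by default, so a value of 40
# would land in the first bin, (20, 40].
age_bins <- cut(ages,
                breaks = c(20, 40, 60, 80),
                labels = c("20-40", "40-60", "60-80"))

# Count how many observations fall into each bin
bin_counts <- table(age_bins)
print(bin_counts)
```

With these made-up values, the first two bins hold four observations each and the last holds two. Changing the breaks vector is usually the quickest way to see how sensitive a summary is to the choice of bin edges.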
Binning is often one of many steps in exploratory data analysis (EDA). EDA is a pre-modeling activity that covers handling outliers and missing values, binning values, and encoding categorical features. These steps add a “signal” to a data model. “Signal” is shorthand for what the model is trying to learn.
Now I want to explain concepts that can help you display binned data so that you and your stakeholders can easily visualize the insights behind it. The sneaky idea in this explanation, one that is simple yet influential, is how best to address outliers.
Highlighting The Outliers
In mathematical terms, outliers are observations whose values lie extremely far from the mean or median of a given dataset. Outliers are not just observations that stick out. They skew the distribution of the data, masking where the typical observations actually fall. It’s like having that smart kid in…