How To Create Bins in R Programming

Noting outliers in data subsets often means binning data into groups. Here’s how you can do it in R programming

Pierre DeBois
6 min read · Oct 1, 2023

I previously wrote about presenting data within bins using the cut() and cut2() functions in R (cut() is part of base R; cut2() comes from the Hmisc package). Bins help analysts understand their datasets by profiling trends that can affect decisions based on the data.
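As a quick refresher, here is a minimal sketch of binning with base R's cut(). The sample values, break points, and labels are all assumptions made up for illustration; they are not taken from any real dataset.

```r
# Hypothetical sample data: order values in dollars (made up for illustration)
order_values <- c(12, 18, 25, 31, 47, 52, 68, 75, 90, 240)

# Bin edges and labels are assumptions; choose breaks that suit your own data
bin_breaks <- c(0, 25, 50, 100, Inf)
bin_labels <- c("0-25", "26-50", "51-100", "100+")

# cut() assigns each value to an interval; right = TRUE makes each
# interval closed on the right, so 25 falls in the first bin
order_bins <- cut(order_values, breaks = bin_breaks,
                  labels = bin_labels, right = TRUE)

# table() counts how many observations land in each bin
bin_counts <- table(order_bins)
print(bin_counts)
```

The open-ended last break (Inf) is one way to keep an extreme value such as 240 in its own bin rather than letting it stretch the other intervals.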

Binning is often one of many steps in exploratory data analysis (EDA). EDA is a pre-modeling activity that covers handling outliers and missing values, binning values, and encoding categorical features. These steps add “signal” to a data model, where “signal” is shorthand for what the model is trying to learn.

Now I want to explain concepts that can help you display binned data so that you and your stakeholders can easily visualize the insights behind it. The sneaky idea in this explanation, one that is simple yet influential, is how best to address outliers.

Highlighting The Outliers

In mathematical terms, outliers are observations whose values lie extremely far from the mean or median of a given dataset. Outliers are not just observations that stick out. They skew the distribution of the data, masking how the other observations are valued. It’s like having that smart kid in…
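To make the skew concrete, here is a small sketch that compares the mean and median of a sample containing one extreme value, then flags that value. The data are hypothetical, and the 1.5 × IQR rule (Tukey's fences) is a common convention chosen here as an assumption, not necessarily the approach this article goes on to use.

```r
# Hypothetical sample data with one extreme value (made up for illustration)
order_values <- c(12, 18, 25, 31, 47, 52, 68, 75, 90, 240)

# The extreme value pulls the mean upward; the median moves far less
mean_all <- mean(order_values)
median_all <- median(order_values)

# Tukey's fences: flag values more than 1.5 * IQR beyond the quartiles
quartiles <- quantile(order_values, probs = c(0.25, 0.75))
iqr_value <- IQR(order_values)
fence_low <- quartiles[[1]] - 1.5 * iqr_value
fence_high <- quartiles[[2]] + 1.5 * iqr_value

is_outlier <- order_values < fence_low | order_values > fence_high
outliers <- order_values[is_outlier]

print(c(mean = mean_all, median = median_all))
print(outliers)
```

The gap between the mean and the median is a quick signal that a few observations are pulling the distribution to one side.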


Written by Pierre DeBois

#analytics |#datascience |#JS |#rstats |#marketing services for #smallbiz | #retail | #nonprofits Contrib @CMSWire @smallbiztrends #blackbusiness #BLM
