How to Use The Cut Function In R Programming

Basic subset functions in R provide ways to categorize your data. Here is a simple function for identifying ranges that your data fits into.

Pierre DeBois
4 min read · Mar 20


Image via MidJourney by author

The cut() function is an often overlooked base R function for sorting the elements of a numeric vector into groups. The groups are bins: ranges of values that you set for convenience. cut() returns a factor that records which bin each element falls into.
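Here is a minimal sketch of the idea. It uses a handful of made-up fuel-economy values rather than any real dataset. cut() returns a factor the same length as the input, and each element's level names the bin it landed in.

```r
# Hypothetical city-mpg readings, used only to illustrate cut()
mpg <- c(12, 16, 21, 14, 27)

# Divide the values into three bins: (0,15], (15,20] and (20,30]
bins <- cut(mpg, breaks = c(0, 15, 20, 30))

print(bins)          # one bin label per input value
print(levels(bins))  # the bins themselves, in order
```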

The cut() function differs from the slice() function in the dplyr package. With slice(), you subset a data frame's rows by their position, and the positions you pass can come from an expression (for example, n() for the last row), so a calculation can decide where the subset falls. With cut(), no rows are removed. Instead, you position, and can name, the boundaries of each bin yourself, supplying just the range of numbers you want.
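To make the contrast concrete, here is a sketch on a small hypothetical data frame. To keep it free of package dependencies, it uses base R positional indexing, df[1:2, ], as a stand-in for dplyr's slice(df, 1:2). That substitution is mine, not the article's.

```r
# Hypothetical data frame; values are illustrative, not from gtcars
df <- data.frame(car = c("A", "B", "C", "D", "E"),
                 mpg = c(12, 16, 21, 14, 27))

# Positional subsetting (the idea behind dplyr::slice(df, 1:2)):
# only the selected rows survive
first_two <- df[1:2, ]
print(nrow(first_two))

# cut() drops nothing: every row keeps its value and gains a bin label
df$band <- cut(df$mpg, breaks = c(0, 15, 20, 30))
print(nrow(df))
print(df)
```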

As an example, I am taking the gtcars dataset from the gt package and storing it in an R object as a data frame. I plan on creating bins based on the city miles-per-gallon rating of the listed vehicles. Below is the dataset; I included a view of it so you can see the mpg_c column from which the bins will be created.
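If you want to follow along, a sketch of the loading step might look like this. It assumes the gt package is installed (install.packages("gt")). gtcars ships with gt as a tibble, which as.data.frame() turns into a plain data frame. The object name gt_cars is my own choice. The requireNamespace() check simply lets the snippet run without error on a machine that lacks gt.

```r
# Assumes the gt package is installed; gtcars is a dataset bundled with it
if (requireNamespace("gt", quietly = TRUE)) {
  # gtcars is a tibble; convert it to a plain data frame
  gt_cars <- as.data.frame(gt::gtcars)

  # Show a few identifying columns next to the city mpg rating (mpg_c)
  print(head(gt_cars[, c("mfr", "model", "mpg_c")]))

  # Range of the column the bins will be built from
  print(summary(gt_cars$mpg_c))
} else {
  message("Install the gt package to load gtcars: install.packages(\"gt\")")
}
```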

Now I will use the cut() function on the mpg_c column of this data frame to create objects indicating which category of fuel efficiency each observation falls into.

The cut() function takes the object you want to bin as its first argument, followed by a comma and then breaks =. This is where you supply a numeric vector of the points where the bins divide, such as c(0, 15, 20, 25).
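The breaks argument accepts more than a hand-typed vector. The sketch below uses hypothetical values to show two equivalent ways to spell out the same boundaries. It also shows the single-number form, where cut() instead splits the range of the data into that many equal-width intervals.

```r
# Hypothetical mpg values for illustration
mpg <- c(12, 16, 21, 14, 27, 19)

# Boundaries typed out by hand
explicit <- cut(mpg, breaks = c(0, 15, 20, 25, 30))

# The same boundaries built with seq(): 0, then 15 to 30 in steps of 5
generated <- cut(mpg, breaks = c(0, seq(15, 30, by = 5)))

# A single number asks for that many equal-width bins across range(mpg)
three_bins <- cut(mpg, breaks = 3)

print(identical(explicit, generated))
print(levels(three_bins))
```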

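So for the gtcars example, I create an object called mileage, with breaks that divide the set into bins of miles per gallon: 0–15, 15–20, 20–25, and so forth. By default, cut() closes each bin on the right, so a car rated exactly 15 mpg lands in the 0–15 bin rather than the 15–20 bin. The code is shown below.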
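A sketch of that step follows. Because the article's code appears only as an image, the exact breaks here are an assumption: 0, 15, 20, 25 and 30, with Inf as a catch-all top edge. A few hypothetical mpg values also stand in for the real column. With the data loaded, the first argument would be the mpg_c column of the gtcars data frame. The second call sets right = FALSE, which closes each bin on the left instead.

```r
# Hypothetical city-mpg values standing in for the mpg_c column
mpg_c <- c(11, 15, 16, 20, 22, 25, 28, 13)

# Assumed breaks; Inf catches anything above 30 mpg
mpg_breaks <- c(0, 15, 20, 25, 30, Inf)

# Default bins are closed on the right: (0,15], (15,20], ...
mileage <- cut(mpg_c, breaks = mpg_breaks)

# right = FALSE closes them on the left instead: [0,15), [15,20), ...
mileage_left <- cut(mpg_c, breaks = mpg_breaks, right = FALSE)

# Compare where the boundary values 15, 20 and 25 land
print(data.frame(mpg_c, mileage, mileage_left))
```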

The resulting object, mileage, is a factor that records the bin each observation falls into. Here is what it looks like when you print the mileage object.
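A printed factor can be hard to scan, so a common follow-up is to count the observations in each bin with table(). This sketch also uses the labels argument to give the bins readable names, which is one way to name the bins. The label names and mpg values are hypothetical.

```r
# Hypothetical city-mpg values
mpg_c <- c(11, 15, 16, 20, 22, 25, 28, 13)

# Four bins, each given a descriptive (made-up) name via labels
mileage <- cut(mpg_c,
               breaks = c(0, 15, 20, 25, Inf),
               labels = c("thirsty", "average", "efficient", "frugal"))

# Count how many observations fall in each bin
counts <- table(mileage)
print(counts)
```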

