Using the Cut2 Function in R Programming
Another function, cut2(), in the Hmisc library will help group the observations of your dataset. Here’s how.
I explained in an earlier post how the cut() function in R programming works. Yet more than one version of that function is available. A second one, cut2(), is included in the Hmisc library. It provides more nuanced options for identifying bins within a vector of data. This post will explain some of the details so you can appreciate the options and be imaginative in the
The cut2() function takes a numeric vector (x), a set of cut points (cuts), the minimum number of observations in a group (m), the number of quantile groups (g), and some additional parameters.
One difference from cut() is how the cutpoints for each bin are handled. The “cutpoints” are the start and end points of the bins.
In cut(), each bin has a starting point, lower, and an end point, upper. By default the bins appear as (lower, upper], which indicates a bin range that excludes the lower number and includes the upper one.
In cut2(), the bins include the lower endpoint and exclude the upper endpoint. The bins appear as [lower, upper), with the exception that the last interval is completely inclusive: [lower, upper]. This means that, by default, cut2() ensures the given cutpoints cover the entire data range.
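To make the endpoint difference concrete, here is a minimal sketch that bins the same small vector with both functions. The vector x and the breakpoints are made up for illustration and are not taken from the gas mileage example; the sketch assumes the Hmisc package is installed.

```r
library(Hmisc)  # provides cut2(); assumed to be installed

# Hypothetical toy data, chosen so that values fall exactly on the cutpoints
x <- c(10, 15, 20, 25, 30)
breakpoints <- c(10, 20, 30)

# Base cut(): bins are (lower, upper] by default, so the value 10 sits on
# the excluded lower edge of the first bin and is returned as NA
base_bins <- cut(x, breaks = breakpoints)

# cut2(): bins are [lower, upper), with the last bin closed on both sides,
# so both 10 and 30 are assigned to a bin
hmisc_bins <- cut2(x, cuts = breakpoints)

# Side-by-side view of where each value landed
data.frame(x, base_bins, hmisc_bins)
```

Note where the value 20 ends up: cut() places it in the first bin, because the upper edge is included, while cut2() places it in the second bin, because the lower edge is included.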
If cutpoints are not given, the cut2() function will cut the data into quantile groups (g) or groups with a given minimum number of observations (m).
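As a rough sketch of those two options, the code below bins a made-up vector of the integers 1 to 100 without supplying any cutpoints. The data and the chosen values of g and m are assumptions for illustration only, and the sketch again assumes Hmisc is installed.

```r
library(Hmisc)  # provides cut2(); assumed to be installed

# Hypothetical data: 100 distinct values
x <- 1:100

# g = 4 asks for four quantile groups (roughly quartiles)
by_quantile <- cut2(x, g = 4)
table(by_quantile)

# m = 50 asks for groups of about 50 or more observations each.
# The Hmisc documentation describes m as a desired minimum rather than
# a guarantee, so it is worth checking the group sizes.
by_min_size <- cut2(x, m = 50)
table(by_min_size)
```

Because cut2() derives the cutpoints from the data here, the group sizes can be slightly uneven, so table() is a quick way to check what you actually got.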
For example, I am recreating the cuts example I made in the post How to Use The Cut Function In R Programming. In that example, I made city gas mileage bins of the vehicles in the dataset.
Below is the same example, but created with the cut2() function.
The cut2() function yields a set of bins similar to the cut example, with the cuts parameter set to a vector. Yet, the [lower, upper)…