Using Count Functions for Better Data Models in R Programming
Counts help you see the categories in your data, and R offers many ways to produce them.
In any data model, you will inevitably need to look at the number of fields in the columns and rows, checking that your data table holds the number of fields you expect before forming a data model. This is part of exploratory data analysis (I explain the concept in this post; I am planning an updated version of the EDA post, based on my CodePalousa 2022 presentation, with new data and libraries soon).
R offers many ways to conduct this count, so in this post I will cover a few of the functions you can use for counting your fields. The functions highlighted here will help you quickly audit what is in your R objects.
Because the same functions can be reused in other parts of your data model, you also have an opportunity to automate the counts. This speeds up exploratory data analysis for machine learning models such as classification, sentiment analysis, and Markov chain models.
These examples use a data frame of cars and SUVs, a simple 12-vehicle list. The data frame columns contain the vehicle make & model, body type, highway mileage, and fuel capacity. This is the data frame being created in R…
…and this is a view of that data frame, called out in RStudio (you can either run View() on the data frame, with a capital V, or click on the table icon in the Environment pane).
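To follow along, a data frame of the shape described above could be built roughly like this. It is only a sketch: apart from Type, the column names are assumptions, and every value is an illustrative placeholder rather than the data used in this post.

```r
# Stand-in data frame: 12 vehicles, 4 columns.
# Only the Type column name comes from the post; the other names and
# all values are hypothetical placeholders.
vehicles <- data.frame(
  Model        = paste("Vehicle", 1:12),
  Type         = c("Sedan", "SUV", "SUV", "Hatchback", "Sedan", "SUV",
                   "Sedan", "Hatchback", "SUV", "Sedan", "SUV", "Sedan"),
  HighwayMPG   = c(38, 29, 27, 40, 36, 25, 35, 41, 28, 37, 26, 34),
  FuelCapacity = c(13, 18, 19, 12, 14, 21, 14, 11, 18, 13, 20, 15),
  stringsAsFactors = FALSE
)

# Quick structural audit: row and column counts, then column types
dim(vehicles)
str(vehicles)

# Open the spreadsheet-style viewer only in an interactive session
if (interactive()) View(vehicles)
```

The dim() and str() calls already give a first count of rows and columns before any of the counting functions come into play.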
Now let's look at the functions available.
The dplyr library provides a count() function. It quickly tallies the number of rows for each unique value in one or more columns.
You can pipe a data frame into the function and pass it a column name. In the data frame example, the Type column is passed to count how many vehicles fall under each body type in the object.
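A minimal sketch of that piped call, assuming dplyr is installed. The small vehicles data frame here is a hypothetical stand-in with placeholder values, not the 12-vehicle table from this post.

```r
library(dplyr)

# Hypothetical stand-in data; only the Type column name comes from the post
vehicles <- data.frame(
  Model = paste("Vehicle", 1:6),
  Type  = c("Sedan", "SUV", "SUV", "Hatchback", "Sedan", "SUV"),
  stringsAsFactors = FALSE
)

# Pipe the data frame into count(); the result has one row per body type
# and a column n holding the number of vehicles of that type
type_counts <- vehicles %>% count(Type)
print(type_counts)

# sort = TRUE puts the most frequent body type first
type_counts_sorted <- vehicles %>% count(Type, sort = TRUE)
print(type_counts_sorted)
```

The result is itself a data frame, so it can be filtered, joined, or plotted like any other object later in the analysis.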