Datasets and Libraries for Developing Data Models on R (and Python)
You can use the following datasets in R programming, as well as share them with Python using a function from the reticulate package
In previous posts, I shared a few datasets for practicing functions and creating trial statistical data models. I link to those posts at the end of this one.
But before you get that summary, I am sharing some recent resources. A few of these were introduced when their respective libraries were updated. One is an API wrapper that pulls data from an online repository.
No matter the source, these datasets can help you get more familiar with programming syntax and plan better data models, whether regressions, decision trees, or visualizations.
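As a rough illustration of sharing an R dataset with Python, here is a minimal sketch using reticulate. It assumes the reticulate package is installed and can find a Python installation that has pandas; the built-in mtcars data frame is used only as a stand-in, and the object names are hypothetical.

```r
# install.packages("reticulate")  # one-time install, if needed
library(reticulate)

# Convert an R data frame to a Python object.
# Assumption: pandas is available, so the result is a pandas DataFrame.
py_cars <- r_to_py(mtcars)
class(py_cars)

# Convert it back to an R data frame to confirm the round trip
r_cars <- py_to_r(py_cars)
head(r_cars)
```

The same two calls should work for any data frame loaded from the libraries discussed in this post.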
Regressions
The carData library holds a wide variety of datasets designed for regression models, so it is a good place to learn how the parameters of the lm and glm functions work. One example is Highway1, a dataset on highway accidents and safety. Other carData sets include Arrests (arrests for marijuana possession), CanPop (Canadian population), Davis (self-reports of weight and height), and MplsDemo (Minneapolis demographics).
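To see how these datasets feed into lm and glm, here is a minimal sketch using Davis and Arrests. The choice of predictors is only an illustrative assumption, not a recommended model, and the object names fit_lm and fit_glm are hypothetical.

```r
# install.packages("carData")  # one-time install, if needed
library(carData)

# Davis: measured and self-reported weight (kg) and height (cm)
data("Davis", package = "carData")
str(Davis)

# Linear regression: measured weight explained by reported weight.
# Rows with a missing repwt are dropped by lm's default NA handling.
fit_lm <- lm(weight ~ repwt, data = Davis)
summary(fit_lm)

# Arrests: 'released' is a Yes/No factor, so a logistic regression fits
data("Arrests", package = "carData")
fit_glm <- glm(released ~ employed + checks,
               data = Arrests, family = binomial)
summary(fit_glm)
```

Swapping the formula or the family argument is a quick way to see how each parameter changes the fitted model.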