
Creating Synthetic Datasets with R Programming

Use rep(), seq(), replicate() and other built-in functions to create trial data sequences for your data frames, data tables, and matrices

6 min read · Oct 26, 2022

Sometimes the data you need is not available. But you may know patterns within that data that would be useful for testing a calculation in your R script. For example, you may have a dataset of metrics with known properties, such as miles per gallon, temperature, or rate of spending. You know a temperature can be negative, but miles per gallon cannot. Or you may know that a spending rate falls within a certain interval. The point is that you have enough ideas to start building a mock set of data.

When you have a framework like this in mind, you can create a synthetic dataset using the following built-in functions in R. Each function offers a way to generate a series of data elements for a vector, matrix, data frame, or data table. You can then use those patterns to better understand advanced functions and library choices.

sample()

sample() is a built-in R function that draws random samples from a population of values. The first argument specifies the population: either a vector of values or a single positive integer n, in which case the population is 1:n. The second argument, size, sets the number of samples to draw. By default sample() draws without replacement, so the sample size cannot exceed the population size; set replace = TRUE to draw a sample larger than the population.
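As a minimal sketch of how sample() might fill a small mock dataset, the example below draws three columns and combines them into a data frame. The column names (mpg, temperature, spend_level), the value ranges, and the seed are assumptions chosen for illustration, not values from a real dataset.

```r
# Fix the random seed so the draws are repeatable (the seed value is arbitrary).
set.seed(42)

# Population given as a single integer: draws 5 distinct values from 1:50.
mpg <- sample(50, 5)

# Population given as a vector: temperatures may be negative.
temperature <- sample(-10:35, 5)

# replace = TRUE allows a sample larger than the population.
spend_level <- sample(c("low", "medium", "high"), 5, replace = TRUE)

# Combine the vectors into a small mock data frame.
mock <- data.frame(mpg, temperature, spend_level)
print(mock)

# Without replacement, asking for more values than the population holds fails.
too_many <- tryCatch(sample(3, 5), error = function(e) conditionMessage(e))
print(too_many)
```

Calling set.seed() first is optional, but it makes the same "random" values appear on every run, which helps when you want to compare results while experimenting.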


Written by Pierre DeBois
