How to Use Stringr Functions in R Programming

Programming often involves automatically replacing characters within a body of text. Developers have long used regular expressions to do this. They are especially helpful for finding and replacing text when the code runs to hundreds of lines.
For applying filters and regular expressions in R programming, there is a library called Stringr. It contains functions designed to identify characters and character patterns. Stringr can still be intimidating to learn because of the vast number of possible pattern combinations. There is a whole cheat sheet covering the regular expressions it supports, but the implementation is best learned by combining it with the Dplyr functions. Starting with a few basics makes it easier to understand how regular expressions and filters work against the object structures within R.
In this post, I will show some of the basics of using the Stringr functions. These can improve how you plan your data filtering, especially with the Dplyr functions available to you.
The function variants in the Stringr library
The Stringr library has several key functions to detect characters. Each function takes at least two arguments: the object and the desired character pattern (string) to look for in that object. Depending on the function, the result is a character count, an index showing where the text sits within a data object, or the positions where a series of characters repeats. This is helpful when the script is long and you need to replace each instance of a section of text or syntax.
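To make these variants concrete, here is a minimal sketch of a few of them. The `cars` vector is hypothetical example data invented for illustration. The functions themselves (`str_count()`, `str_which()`, `str_locate()`, `str_replace_all()`) are part of Stringr.

```r
library(stringr)

# Hypothetical example data: a small character vector of car descriptions
cars <- c("Chevrolet Malibu", "Ford Fusion", "Chevrolet Chevrolet Cruze")

# Count how many times the pattern occurs in each element
counts <- str_count(cars, "Chevrolet")

# Index of the elements that contain the pattern at least once
hits <- str_which(cars, "Chevrolet")

# Start and end position of the first match in each element
# (rows with no match hold NA)
positions <- str_locate(cars, "Chevrolet")

# Replace every occurrence of the pattern in every element
renamed <- str_replace_all(cars, "Chevrolet", "Chevy")
```

Each call follows the same shape: the object first, the pattern second, and any extra argument (such as the replacement text) after that.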
For example, say I want to use the word “Chevrolet” as my text string and identify every place where “Chevrolet” appears in an object.
The str_detect() function detects the presence of a pattern in a string. The syntax is str_detect(object, pattern), where object is the object being examined (Stringr's own documentation names this argument string) and pattern is the set of characters to look for in it.
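A short sketch of str_detect() in use, with the word “Chevrolet” as the pattern. The `makes` vector is hypothetical data made up for this illustration.

```r
library(stringr)

# Hypothetical example data, not taken from a real dataset
makes <- c("Chevrolet Malibu", "Ford Fusion", "Chevrolet Cruze", "Honda Civic")

# str_detect() returns one TRUE/FALSE value per element of the object
has_chevy <- str_detect(makes, "Chevrolet")

# The logical vector can be used directly to subset the original object
chevy_only <- makes[has_chevy]

# Summing the logical vector counts how many elements contain the pattern
n_chevy <- sum(has_chevy)
```

Because the result is a logical vector, the same call can also sit inside Dplyr's filter(), for example filter(df, str_detect(column, "Chevrolet")), where df and column are placeholder names.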
As an example, I read a Word document, an article I am working on, into R. I am interested in the number of times “SEO” appears in the text. The text is placed in a data frame, search_df, in a column called text. There are 44 elements, each from the paragraphs in the…