How to Make a Violin Plot in R Programming
Violin plots are not often seen in data visualization but they can be very useful. A violin plot is essentially a boxplot variant, combining a boxplot and a density curve to give a picture of a distribution of data.
In this post, I will show how to make violin plots with ggplot2 and explain where they are useful in exploratory data analysis.
Violin Plot Basics
First let’s cover the purpose of a violin plot.
To appreciate the violin plot, think about what a standard box plot does. A box plot summarizes a set of observations by its quartiles. It shows how the data is distributed: the median, the first and third quartiles, and where any outliers fall relative to the rest of the distribution.
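As a rough illustration of what a box plot encodes, the sketch below computes those summary values directly in base R. The sample is simulated and purely hypothetical (the seed, mean, and the two extreme values are assumptions chosen for the example, not data from this post).

```r
# Hypothetical sample: 200 normal draws plus two deliberately extreme values
set.seed(42)
x <- c(rnorm(200, mean = 50, sd = 10), 120, 125)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five)

# boxplot.stats() returns the values that boxplot() would draw
bs <- boxplot.stats(x)
print(bs$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$out)    # points more than 1.5 * IQR beyond the hinges (drawn as outliers)
```

Everything a box plot shows comes from these few numbers, which is why it says nothing about the shape of the distribution between the quartiles.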
But your data set may contain more than one mode, that is, a value or range of values that appears in the distribution with relatively high frequency. A distribution can be bimodal, with two groupings of high-frequency observations, or multimodal, with three or more peaks. Whatever their number, the position of the modes can help explain a skew. A box plot can hint at a skew, but it does not show the modes that influence where the skew is forming.
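To make the difference concrete, here is a minimal sketch that draws the same bimodal sample as a box plot and as a violin plot. It assumes the ggplot2 package is installed. The data frame `df` and its two clusters are simulated for illustration only.

```r
library(ggplot2)

# Hypothetical bimodal sample: two normal clusters centred at 20 and 60
set.seed(1)
df <- data.frame(
  group = "sample",
  value = c(rnorm(300, mean = 20, sd = 5), rnorm(300, mean = 60, sd = 5))
)

# Box plot: shows median and quartiles, but the two clusters are not visible
p_box <- ggplot(df, aes(x = group, y = value)) +
  geom_boxplot()

# Violin plot: the density outline bulges at each mode;
# a narrow box plot is overlaid so the quartiles stay visible
p_violin <- ggplot(df, aes(x = group, y = value)) +
  geom_violin(fill = "grey80") +
  geom_boxplot(width = 0.1)

print(p_box)
print(p_violin)
```

In the box plot the median falls in the sparse region between the two clusters, while the violin outline should show a bulge around each cluster.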