When analyzing a questionnaire, one often wants to view the correlation between two or more Likert questionnaire item’s (for example: two ordered categorical vectors ranging from 1 to 5).
When dealing with several such Likert variable’s, a clear presentation of all the pairwise relation’s between our variable can be achieved by inspecting the (Spearman) correlation matrix (easily achieved in R by using the “cor.test” command on a matrix of variables). Yet, a challenge appears once we wish to plot this correlation matrix. The challenge stems from the fact that the classic presentation for a correlation matrix is a scatter plot matrix – but scatter plots don’t (usually) work well for ordered categorical vectors since the dots on the scatter plot often overlap each other.
There are four solution for the point-overlap problem that I know of:
Jitter the data a bit to give a sense of the “density” of the points
Use a color spectrum to represent when a point actually represent “many points”
Use different points sizes to represent when there are “many points” in the location of that point
Add a LOWESS (or LOESS) line to the scatter plot – to show the trend of the data
In this post I will offer the code for the a solution that uses solution 3-4 (and possibly 2, please read this post comments). Here is the output (click to see a larger image):
I guess this is not the number one post I would like to start with on this blog, but I feel the time is right for it (community-wise).
I’ll move on to the subject matter in a moment, but first a short intro: This blog is written by Tal Galili. I am an aspiring statistician who also loves to use R for his work. At the same time I am also a WordPress blogger, writing mainly at www.TalGalili.com where I can use my native language (Hebrew) for self expression.
This combination of statistics and blogging will lead me to sometimes much less statistical, but more Web/Open-Source oriented posts like this one. So for the statisticians in the audience I extend my apologies and invite you to wait for future posts which will be more fully focused on Statistics and R.