An interesting way to show correlations between pairs of attributes is a correlation matrix. It’s a very simple thing to set up and incredibly easy for the user to understand.
Full dashboard on Tableau Public (downloadable):
The dashboard above works as follows. There are six attributes in total; in this case they are various Federal Reserve Economic Data indicators (e.g. imports and exports). It’s not really necessary to understand the data itself. Each square displays a number that indicates how strongly a pair of indicators correlates. The closer the number is to 1.00, the more closely the two indicators move together. In addition, hovering over a pair in the matrix displays the corresponding scatterplot to the right for a more detailed analysis.
The first step is to properly format your data. My main data (shown below) is monthly data over 10 years, with one row for each date and indicator. If I attempt to set up a matrix by simply using it as is, you’ll notice that I only get the option to add values down the diagonal.
The reason is that, to form a matrix of every combination, Tableau essentially needs a Cartesian join of every possible pair available in the data source itself. That’s not a terribly hard thing to do: just join the data to itself within Tableau, as an original and a duplicate, on whatever field will cause those pairs to form. In my case, I join on the date.
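To make the self-join idea concrete outside of Tableau, here is a minimal Python sketch. The rows, indicator names, and the `self_join_on_date` helper are all made up for illustration; the real join is done in Tableau's data source pane, not in code.

```python
from collections import defaultdict

# Hypothetical long-format data: one row per (date, indicator), as in the post.
rows = [
    ("2010-01", "Imports", 100.0),
    ("2010-01", "Exports", 90.0),
    ("2010-02", "Imports", 110.0),
    ("2010-02", "Exports", 95.0),
]

def self_join_on_date(rows):
    """Pair every row with every row sharing its date (original x duplicate)."""
    by_date = defaultdict(list)
    for date, indicator, value in rows:
        by_date[date].append((indicator, value))

    joined = []
    for date, items in by_date.items():
        for ind_a, val_a in items:        # "original" side of the join
            for ind_b, val_b in items:    # "duplicate" side of the join
                joined.append((date, ind_a, val_a, ind_b, val_b))
    return joined

joined = self_join_on_date(rows)

# Every indicator pair now exists, not just the diagonal.
pairs = sorted({(a, b) for _, a, _, b, _ in joined})
print(pairs)
```

With two indicators the join yields four indicator pairs per date, which is what lets the off-diagonal cells of the matrix fill in.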
Now when we try to form the matrix, we’ll be able to. Just be sure that one of the indicators you add to the columns/rows is from the duplicate.
To calculate the correlation we use a built-in function. In the case of SQL Server, this will NOT work in a live connection; the data must be an extract. This is just one of those funky cases in which SQL Server won’t support the function and you have to pull the data out to use it. I’ve hit this with other functions in the past.
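For anyone curious what the number in each square represents, here is a rough Python sketch. It assumes the built-in function computes a Pearson correlation coefficient, and the `series` values and the `pearson` helper are invented for the example rather than taken from the dashboard.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical indicators, with values aligned by date.
series = {
    "Imports": [100.0, 110.0, 120.0, 130.0],
    "Exports": [90.0, 95.0, 100.0, 105.0],
    "Rates": [5.0, 4.0, 4.5, 3.0],
}

# One cell per (row indicator, column indicator), like the matrix in the dashboard.
matrix = {(a, b): pearson(series[a], series[b]) for a in series for b in series}

for (a, b), r in matrix.items():
    print(f"{a:8} {b:8} {r:6.2f}")
```

The diagonal is always 1.00 because every indicator correlates perfectly with itself, and the matrix is symmetric, so each pair shows up twice.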
Now we can put that onto the text, set the coloring and sizing, and there you go. For the scatterplot, I’ve used action filters on the matrix that pass along the indicator and the duplicate indicator. You’ll notice that I put those two dimensions on the detail for the scatterplot to allow this to work.
So – here’s the problem with a correlation matrix, in my opinion. After I did all of this work I realized the data itself probably wasn’t the best to use for this. This data relies on dates, so what is the correlation itself telling us? Well – it’s giving us an idea of whether or not all the values over all of time are similar. What it’s NOT taking into account is whether or not they correlate OVER time, e.g. an indicator that tends to go up in 2010 but down in 2013. So if your data relies on dates, be careful; you may want to avoid this approach. I’ve created another example of the same dashboard that takes date into account but isn’t dealing with statistical correlation. I haven’t found a great way to take dates into account with a correlation function.
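One way to see why the date ordering gets lost: a correlation coefficient only looks at the paired values, not at the order they arrive in. The sketch below assumes a Pearson correlation and uses made-up numbers; shuffling the dates (the same way for both indicators) leaves the result unchanged.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical monthly values for two indicators, in date order.
imports = [100.0, 110.0, 120.0, 130.0, 125.0]
exports = [90.0, 95.0, 100.0, 105.0, 99.0]

in_order = pearson(imports, exports)

# Scramble the dates, keeping each month's pair of values together.
order = [3, 0, 4, 2, 1]
shuffled = pearson([imports[i] for i in order], [exports[i] for i in order])

print(in_order, shuffled)  # the two values agree up to floating-point rounding
```

In other words, the function has no idea which observation came first, so any trend over time is invisible to it.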
Dashboard above can be downloaded at Tableau Public here:
To get the second version to work, you’ll notice that I have to index the data. This concept is simple but unfamiliar to a lot of people. Indexing brings the highs and lows of disparate data closer together (similar to using a logarithmic axis). To see how to do this, check out the following link: https://chandoo.org/wp/indexed-charts-in-excel/
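As a quick illustration of indexing, here is a small Python sketch. It assumes the common convention of rebasing each series so that its first value becomes 100; the `index_series` helper and the sample numbers are hypothetical, and the calculation in the dashboard itself may differ.

```python
def index_series(values, base=100.0):
    """Rebase a series so its first value equals `base` (assumed convention)."""
    first = values[0]
    return [v / first * base for v in values]

# Two hypothetical indicators on very different scales.
imports = [2000.0, 2200.0, 1800.0]
rate = [4.0, 4.4, 3.6]

print(index_series(imports))
print(index_series(rate))
```

Both series come out at roughly 100, 110, and 90, so they can share an axis even though the raw values are orders of magnitude apart.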
Lastly – yes, you can add the scatterplots directly to the matrix instead of the number. However, this gets extremely busy for a 6×6 matrix and is visually a bit confusing, in my opinion.