Issue 65: Heatmaps - Determining correlations between values (aka pivot, aka scatterfacet for non-numeric values)
I'd like to be able to see that x% of the records which contain valueA in columnX also contain valueB in columnY. This would allow me to spot high correlations between non-numeric data, and to narrow down on outliers for any necessary data cleaning.

This can be done manually by faceting on columnX and making a note of the count of valueA, then filtering by valueA, adding an additional facet on columnY, and making a note of the count of valueB. Unfortunately, this only allows me to look at one value combination at a time.

I'd like a representation, similar to the scatterfacet, that displays this for all combinations of values in columnX and columnY (plus 'empty'). Along the x-axis are the values of columnX and along the y-axis the values of columnY. As the data is non-numeric, the graph is split into cells. The value of each cell is the percentage count(records with valueA and valueB) / count(records with valueA). This could be drawn as a heatmap, varying the cell brightness between 0 and 255 in scale with the percentage value.

Clicking a cell would get me the corresponding rows. (Bonus points for also being able to click through to get the inverse: records with valueA which don't contain valueB.) An overview of all possible heatmaps could also be generated, similar to the overview of all possible scatterfacets.
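To make the requested calculation concrete, here is a rough Python sketch of how the cell values and brightness could be computed. It is only an illustration of the arithmetic, not a proposal for the actual implementation: the function names, the `(empty)` label and the sample `country`/`currency` data are all made up for this example.

```python
from collections import Counter

EMPTY = "(empty)"  # hypothetical label for blank or missing cells


def cooccurrence_percentages(rows, col_x, col_y):
    """Map (value_x, value_y) to the percentage of rows with value_x
    in col_x that also have value_y in col_y."""
    pair_counts = Counter()
    x_counts = Counter()
    for row in rows:
        x = row.get(col_x) or EMPTY
        y = row.get(col_y) or EMPTY
        pair_counts[(x, y)] += 1
        x_counts[x] += 1
    return {
        (x, y): 100.0 * n / x_counts[x]
        for (x, y), n in pair_counts.items()
    }


def brightness(percent):
    """Scale a percentage (0-100) to a cell brightness of 0-255."""
    return round(percent * 255 / 100)


# Made-up sample data: columnX is "country", columnY is "currency".
rows = [
    {"country": "UK", "currency": "GBP"},
    {"country": "UK", "currency": "GBP"},
    {"country": "UK", "currency": "GBP"},
    {"country": "UK", "currency": "EUR"},
    {"country": "FR", "currency": "EUR"},
    {"country": "FR", "currency": ""},
]

pct = cooccurrence_percentages(rows, "country", "currency")
for (x, y), p in sorted(pct.items()):
    print(f"{x:>3} / {y:<8} {p:5.1f}%  brightness={brightness(p)}")
```

Combinations that never occur are simply absent from the result and would be drawn as 0%. In this sample the UK/EUR cell stands out as a low-percentage outlier, which is the kind of row I would want to click through to.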
Oct 12, 2010