One of the challenges of building a tool to index all environmental sensor data
is that we have to work with a huge variety of data formats and structures.
One of these structures is two-dimensional data (2D data). We tend to
understand 2D data as any data that can fit on a table, where each cell, being
at the intersection of a row and a column, contains a value. But 2D data can
mean multiple things, and only some 2D datasets are suitable for a heatmap.
Let's give some examples from the environmental sensor data we work with on a
daily basis at Planet OS.
Let's say we have a table showing a temperature value for an array of sensors
for each day of the week. To better visualize this grid of data, we can color
each cell according to its value. This can be called a "color matrix" and is
sometimes called a heatmap.
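As a rough illustration of coloring each cell according to its value, here is a minimal sketch in plain JavaScript. It is not how Cirrus.js does it: the sample temperatures, the function names (`interpolateRgb`, `makeColorScale`, `toColorMatrix`) and the white-to-dark-blue ramp are all assumptions made up for this example.

```javascript
// Hypothetical sample: rows are sensors, columns are days (degrees Celsius).
const temperatures = [
  [12.1, 13.4, 15.0],
  [11.8, 12.9, 14.2],
];

// Linearly interpolate between two RGB colors; t is expected in [0, 1].
function interpolateRgb(from, to, t) {
  const channel = (i) => Math.round(from[i] + (to[i] - from[i]) * t);
  return `rgb(${channel(0)}, ${channel(1)}, ${channel(2)})`;
}

// Build a function mapping a value in [min, max] to a color.
function makeColorScale(min, max, from, to) {
  const span = max - min || 1; // avoid dividing by zero on flat data
  return (value) => interpolateRgb(from, to, (value - min) / span);
}

// Turn a table of values into a table of colors of the same shape.
// The default ramp (white to dark blue) is an arbitrary choice.
function toColorMatrix(table, from = [255, 255, 255], to = [0, 0, 139]) {
  const flat = table.flat();
  const scale = makeColorScale(Math.min(...flat), Math.max(...flat), from, to);
  return table.map((row) => row.map(scale));
}

const colors = toColorMatrix(temperatures);
```

Each entry of `colors` is a CSS color string that could be used as a cell background, with the lowest reading mapped to the start of the ramp and the highest to its end.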
And now for the definition. What I call a "cartesian heatmap" is a color
matrix showing a grid of discrete values from a continuous 2D space. What
does that mean?
One way to understand what data is suitable for any chart type is to
describe the dimensions and measures as continuous or discrete. To learn
more about this conceptual framework, maybe this paper can help. Let's take
the example dataset used for the first figure: temperature of 7 sensors over
7 days. Temperature is continuous, sensor ID and day are both discrete
(categorical). This grid data example could fit on a table or on a color
matrix.
Let's take another example: water temperature for 7 sensors at 7 depth
steps. Here temperature is still continuous and sensor ID is still discrete
(categorical), but depth is also discrete. That can be confusing, since depth
is a continuum, but the data we have is a series of temperature readings
taken at fixed depth steps. We treat temperature as continuous because it can
take any arbitrary value over a continuum, and depth as discrete because it
is a series of distinct depth levels. This data can be visualized on a color
matrix, each cell holding a temperature value encoded as color, at the
intersection of a sensor ID on the X axis and a depth step on the Y axis.
But it could also be visualized as a line chart, with one line per sensor,
depth on the X axis and temperature on the Y axis.
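To make the two readings of the same dataset concrete, here is a small sketch in plain JavaScript that reshapes one grid of readings either into color matrix cells or into line chart series. The sample values, the sensor names and the helper names `toCells` and `toSeries` are hypothetical and only serve the illustration.

```javascript
// Hypothetical readings: grid[depthIndex][sensorIndex] is a water temperature.
const depths = [0, 10, 20]; // discrete depth steps, in metres
const sensors = ["A", "B"];
const grid = [
  [18.2, 17.9],
  [15.1, 14.8],
  [11.4, 11.0],
];

// Color matrix view: one cell per (sensor, depth) pair, value encoded as color.
function toCells(grid, sensors, depths) {
  return grid.flatMap((row, d) =>
    row.map((value, s) => ({ sensor: sensors[s], depth: depths[d], value }))
  );
}

// Line chart view: one series per sensor, depth on X and temperature on Y.
function toSeries(grid, sensors, depths) {
  return sensors.map((sensor, s) => ({
    sensor,
    points: depths.map((depth, d) => ({ x: depth, y: grid[d][s] })),
  }));
}

const cells = toCells(grid, sensors, depths);
const series = toSeries(grid, sensors, depths);
```

The data is identical in both shapes. What changes is which variable gets a position and which gets a color or a separate line.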
We tend to see the line chart as multiple lines, each showing a measure
varying over a dimension. Comparing values along one axis is different from
comparing them along the other. For example, we can see the variation of
temperature for each single sensor over time, but comparing temperature
values across multiple sensors at the same time slice is not the same
perceptual task.
A heatmap is a bit closer to a "permutation matrix" or to "stacked
sparklines".
We know how to build a line chart (continuous measure on Y axis, discrete
ordinal on X axis) and a color matrix (continuous measure as color, discrete
categorical/ordinal on both X and Y axes). But how to build a heatmap?
Continuous measure as color, discrete dimensions on both X and Y axes, but
preferably discrete steps from a continuous dimension. One example is
temperature for each latitude/longitude. Another would be temperature for
each depth step for each day of the month. Unlike a line chart, a heatmap
uses the same visual encoding for both axes, so comparing along either
dimension uses the same perceptual task: comparing colors. The visual
metaphor is closer to the idea of a homogeneous 2D grid of values.
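One way to picture "discrete steps from a continuous dimension" is to sample a continuous field at regular intervals along both axes. The sketch below does that in plain JavaScript. The `field` function is a made-up stand-in for real measurements, and `steps` and `sampleGrid` are hypothetical helper names, not part of Cirrus.js.

```javascript
// Evenly spaced steps from start to stop, both included (count must be >= 2).
function steps(start, stop, count) {
  const size = (stop - start) / (count - 1);
  return Array.from({ length: count }, (_, i) => start + i * size);
}

// Sample a continuous field f(x, y) at every (x, y) step pair.
// The result is indexed as grid[yIndex][xIndex].
function sampleGrid(field, xs, ys) {
  return ys.map((y) => xs.map((x) => field(x, y)));
}

// Made-up field: temperature dropping as we move away from the equator.
const field = (lon, lat) => 30 - 0.5 * Math.abs(lat);

const lons = steps(-10, 10, 5); // discrete longitude steps
const lats = steps(0, 40, 5);   // discrete latitude steps
const heatmapGrid = sampleGrid(field, lons, lats);
```

Because both axes are steps taken from a continuum, neighbouring cells are also neighbours in the underlying space, which is what makes the grid read as one homogeneous surface.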
Another visual metaphor that comes for free with the name "heatmap" is the
idea of heat. Users often have in mind rainbow-colored thermal
imagery when they look at a heatmap. They will expect to see colored zones
with smooth transitions between them, which is not always the case in grid
data. Also, we have to pick a color scheme that will not mislead
the user into thinking, for example, that red means very hot and blue very
cold. Choosing the right color scale for a heatmap is more difficult than
it looks. Maybe it could be the subject of a future blog post?
In the meantime, please enjoy the new heatmap we are sharing today as a
new chart type in Cirrus.js. In conclusion, a heatmap is perfect for
visualizing a 2D grid of data representing discrete values from a
continuous 2D space. It was simple enough to expand the library to add a
grid component with the current datavis pipeline architecture, just adding
a color scale and some minor tweaks like a single tooltip. Cirrus.js is
under heavy development, but we hope sharing insights about how we build
it gives you an idea of how we are solving visualization problems at
Planet OS.