Why Graphical Data Analysis is Important
Always Be Graphing
Exploratory data analysis is important, and a key aspect of it should be graphical data analysis. This is because distinct datasets with otherwise similar descriptive statistics can show very different relationships when graphed. Francis Anscombe demonstrated this with a quartet of datasets that share nearly identical descriptive statistics yet, when graphed, show dramatically different relationships between x and y.
Anscombe's quartet is included among the Seaborn package's example datasets, and loading it helps to illustrate the moral here. Below is a screen grab of a Jupyter Notebook in which I called up the graphs.
Dataset I shows a fairly linear relationship, with strong correlation between x and y and some scatter. Dataset III is almost perfectly linear, yet a single outlier pulls the fitted line away from the rest of the data. In Dataset IV, all but one point share the same x value, so that one point determines the fit; this should make us very cautious about relying on a linear model in that case. Dataset II shows a clear relationship, but a curved rather than a linear one.
All this is a salutary lesson: understanding the underlying structure of the data matters, and humans often grasp that structure most readily through graphical representation.
Here's the thing, though: Anscombe's 1973 paper 'Graphs in Statistical Analysis' was written before we had the graphing packages we have today and, more pertinently, the choice among them. We perhaps also have a richer vernacular of graphs and charts with which to conduct graphical analysis. Given the risks of skipping graphical data analysis, and the tools at our disposal, we can ill afford not to do it.