Why Graphical Data Analysis is Important


Always Be Graphing

Exploratory data analysis is important, and graphical data analysis should be a key part of it. This is because distinct datasets with near-identical descriptive statistics can show very different relationships when graphed. Francis Anscombe demonstrated this with a quartet of datasets that share the same means and variances of x and y, the same correlation between x and y, and the same fitted linear regression line (to two or three decimal places), yet show dramatically different relationships between x and y when graphed.

Anscombe's datasets are included among the Seaborn package's sample datasets, and loading and plotting them helps to illustrate the moral here. Below is a screen grab of a Jupyter Notebook in which I plotted the four datasets.

 

[Screen grab: Anscombe's quartet plotted in a Jupyter Notebook]

Same Statistics, Different Relationships

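Dataset I shows a fairly linear relationship, with a strong correlation between x and y plus some random error. Dataset II shows a clear relationship between x and y, but it is a smooth curve rather than a straight line, so a linear model is simply the wrong choice. Dataset III is almost perfectly linear, yet a single outlier pulls the fitted line away from the other ten points. In Dataset IV, x takes the same value for every point but one, and that single point alone determines the slope of the fitted line, which should make us very cautious about relying on a linear model in that case.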

All this is a salutary lesson: understanding the underlying structure of the data matters, and humans often grasp that structure most readily through graphical representation.

Here's the thing, though: Anscombe's 1973 paper 'Graphs in Statistical Analysis' was written at a time when we did not have the graphing packages we have today, nor, more pertinently, so much choice among them. Perhaps we also now have a richer vernacular of graphs and charts with which to conduct graphical analysis. Given the risks of skipping graphical data analysis, and the tools at our disposal, we can ill afford not to do it.
