The Art of Data Analysis

Daniel Kahneman’s recent book Thinking, Fast and Slow has many examples of our difficulties with probabilistic and statistical reasoning.

Statistical software and spreadsheet programs have made it relatively straightforward to carry out the science of data analysis. The art of data analysis involves answering questions such as: How should I frame my question quantitatively? Which statistics will provide a convincing qualitative answer? These questions still challenge veteran data analysts in every field.

Some key questions to consider are:

1. How good are the data?
2. Could chance or bias explain the findings?
3. How do the current results compare with what is already known?
4. What theory or process might account for the findings?
5. What are the business or scientific implications?
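Question 2 can be probed for chance (though not for bias) with a permutation test: shuffle the group labels many times and count how often a difference in means at least as large as the observed one turns up. The sketch below uses only the Python standard library. The function name `permutation_p_value` and the two groups of numbers are hypothetical illustrations, not data from any study.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate how often a difference in means at least as large as the
    observed one arises when the group labels are shuffled at random.

    A small value suggests chance alone is an unlikely explanation; it says
    nothing about bias in how the data were collected.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical illustration values, not real measurements.
control = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
treatment = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0]
print(permutation_p_value(control, treatment))
```

A permutation test is used here because it makes few distributional assumptions and is easy to explain; a t-test from a statistics package would be a common alternative.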

Some cautions:

1. Don’t rely on a single statistic such as the average. Examine several measures of central tendency, such as the mean and the median.
2. Don’t check central tendency without also checking variability, using measures such as the standard deviation or the interquartile range. Here too, examine more than one measure.
3. A large sample can’t fix poor-quality data, or data that was never collected.
4. If you don’t find a result in a small sample, even when the data is high quality, you might need more data to see it. Statisticians sometimes say the data is too noisy to reveal the signal.
5. Don’t be overconfident in your interpretations; actively seek alternative ones. If you don’t, others might.
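Cautions 1 and 2 are easy to see with a handful of numbers. The sketch below, assuming Python 3.8 or later for `statistics.quantiles`, reports two measures of central tendency and two of variability; the helper name `summarize` and the order values are hypothetical, chosen so that one extreme value distorts the mean and standard deviation.

```python
import statistics


def summarize(values):
    """Report two measures of central tendency and two of variability."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,                     # interquartile range
    }


# Hypothetical order values: one unusually large order skews the mean.
orders = [20, 22, 23, 25, 26, 27, 29, 30, 31, 400]
for name, value in summarize(orders).items():
    print(f"{name:>6}: {value:.1f}")
```

Here the mean lands far above nine of the ten values while the median does not, and the standard deviation is inflated by the single outlier while the interquartile range reflects the spread of the typical orders. Quartile conventions differ between tools (Python's default `exclusive` method, spreadsheet functions, and so on), so the IQR may differ slightly from one program to another.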
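Caution 4 can be made concrete with a small simulation. The sketch below assumes a hypothetical real effect of half a standard deviation between two normally distributed groups with known standard deviation 1, and uses a simple two-sided z-test at the 5% level; the function `detection_rate` and all of its parameters are illustrative choices, not a recommended analysis.

```python
import math
import random
import statistics


def detection_rate(n_per_group, true_effect=0.5, n_trials=2_000, seed=1):
    """Fraction of simulated experiments in which a real difference is detected.

    Assumes both groups are normal with a known standard deviation of 1 and
    uses a two-sided z-test at the 5% level (|z| > 1.96).
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)  # standard error of the difference in means
    hits = 0
    for _ in range(n_trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        z = (statistics.fmean(b) - statistics.fmean(a)) / se
        if abs(z) > 1.96:
            hits += 1
    return hits / n_trials


for n in (10, 100):
    print(n, detection_rate(n))
```

Under these assumptions, experiments with 10 observations per group should detect the effect only about a fifth of the time, while those with 100 per group should detect it well over 90% of the time. The effect is equally real in both cases; the small samples are simply too noisy to reveal it.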
