External data validation

In all cases, before analysis starts, we need to be comfortable with the data and have strong confidence that it is clean and valid. The best ways to build this confidence are to:

Compare the data with another source.

  • E.g., compare revenue numbers with the P&L and customer counts with the 10-Q.
  • If you don't have external sources of data, ask for them.
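A minimal sketch of this comparison in Python, using only the standard library. It assumes the dataset has been loaded as a list of dict records with hypothetical `month` and `revenue` fields, and that the reference totals (e.g. transcribed from the P&L) are supplied by hand as a dict keyed the same way. The 0.5% relative tolerance is an arbitrary assumption; agree on a threshold with whoever owns the external numbers.

```python
from collections import defaultdict
from math import isclose


def reconcile_totals(rows, reference, key="month", value="revenue", rel_tol=0.005):
    """Aggregate `value` by `key` and compare each total with an external reference.

    Returns (key, our_total, reference_total) for every key whose totals differ
    by more than `rel_tol`, or that appears on only one side (shown as None).
    Field names and tolerance are hypothetical defaults, not a standard.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])

    mismatches = []
    for k in sorted(set(totals) | set(reference)):
        ours = totals.get(k)        # .get() does not trigger the default factory
        theirs = reference.get(k)
        if ours is None or theirs is None or not isclose(ours, theirs, rel_tol=rel_tol):
            mismatches.append((k, ours, theirs))
    return mismatches


# Toy numbers for illustration only.
rows = [
    {"month": "2023-01", "revenue": 100.0},
    {"month": "2023-01", "revenue": 50.0},
    {"month": "2023-02", "revenue": 210.0},
]
pnl = {"2023-01": 150.0, "2023-02": 200.0, "2023-03": 180.0}
print(reconcile_totals(rows, pnl))
# [('2023-02', 210.0, 200.0), ('2023-03', None, 180.0)]
```

Listing keys that are missing on either side matters as much as listing value differences: a month absent from the data is often the first sign of an incomplete extract.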

Send summary statistics of the data to your project sponsors, the people who know this data better than you do, and ask them whether the numbers look right.
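A minimal sketch of the kind of per-column summary worth sending, again stdlib-only Python. It assumes the data is a list of dict records and treats `None` or a missing key as a missing value; the particular statistics (count, missing, min, max, mean, total for numeric columns, distinct count otherwise) are an assumption, chosen because sponsors can usually check totals and ranges against what they already know.

```python
from numbers import Number
from statistics import mean


def summarize(rows):
    """Return {column: {stat_name: value}} for a list of dict records."""
    columns = sorted({col for row in rows for col in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]          # missing key -> None
        present = [v for v in values if v is not None]
        stats = {"count": len(present), "missing": len(values) - len(present)}
        # Treat a column as numeric only if every present value is a number
        # (bools excluded, since bool is a subclass of int).
        numeric = [v for v in present if isinstance(v, Number) and not isinstance(v, bool)]
        if numeric and len(numeric) == len(present):
            stats.update(min=min(numeric), max=max(numeric),
                         mean=mean(numeric), total=sum(numeric))
        else:
            stats["distinct"] = len(set(present))
        summary[col] = stats
    return summary


def print_summary(summary):
    """Print one line per column, suitable for pasting into an email."""
    for col, stats in summary.items():
        print(f"{col}: " + ", ".join(f"{k}={v}" for k, v in stats.items()))


# Toy records for illustration only.
rows = [
    {"customer_id": "C1", "region": "EU", "revenue": 120.0},
    {"customer_id": "C2", "region": "US", "revenue": 80.0},
    {"customer_id": "C3", "region": None, "revenue": 100.0},
]
print_summary(summarize(rows))
```

Reporting missing counts explicitly, rather than silently dropping `None`, gives the sponsor a chance to say "that field should never be empty".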

Ignore this step at your own peril! In data science, once the numbers have been questioned or proven wrong, nothing else matters: nothing can be trusted, and all your work is ruined. If this happens, the best you can do is come back to this step and verify that the underlying data is correct.

  • Me