External data validation
Before any analysis starts, we need to be comfortable with the data: confident that it is clean and valid. The best ways to get this confidence are to:
Compare the data with another source.
- E.g. compare revenue numbers with the P&L, or customer counts with the 10-Q.
- If you don't have external sources of data, ask for them.
Send summary stats of the data to your project sponsors, the people who know this data better than you do.
- First, you should already have nurtured a working relationship with the client (see Data Analysis - Client working relationship).
- In most data analysis projects there is a person (or several people) who knows what the data should look like and what it should roll up to. They might not know the structure of the data, and they definitely don't know how to do what we're about to do (108.20 Data Analysis - Customer analysis), but they know the data.
- Send the raw data understanding (see Data Analysis - Raw data understanding) and summary statistics to your business sponsor, and get their sign-off before proceeding.
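The two checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`customer_id`, `quarter`, `revenue`), the sample figures, the 1% tolerance, and the function names `revenue_by_quarter`, `reconcile`, and `summary_stats` are all hypothetical and not part of any particular project. In practice the reference totals would come from the P&L, 10-Q, or whatever external source the sponsor provides.

```python
from statistics import mean, median

# Hypothetical transaction-level rows extracted from the client's data.
transactions = [
    {"customer_id": "C1", "quarter": "2023Q1", "revenue": 1200.0},
    {"customer_id": "C2", "quarter": "2023Q1", "revenue": 800.0},
    {"customer_id": "C1", "quarter": "2023Q2", "revenue": 1500.0},
    {"customer_id": "C3", "quarter": "2023Q2", "revenue": 700.0},
]

# Hypothetical totals taken from an external source (e.g. the P&L).
reference_revenue = {"2023Q1": 2000.0, "2023Q2": 2500.0}


def revenue_by_quarter(rows):
    """Roll transaction revenue up to one total per quarter."""
    totals = {}
    for row in rows:
        totals[row["quarter"]] = totals.get(row["quarter"], 0.0) + row["revenue"]
    return totals


def reconcile(computed, reference, tolerance=0.01):
    """Return (key, ours, theirs, relative_diff) for every key that disagrees.

    A key missing from either side is reported with relative_diff=None.
    The tolerance is an assumed relative threshold (1% by default).
    """
    mismatches = []
    for key in sorted(set(computed) | set(reference)):
        ours = computed.get(key)
        theirs = reference.get(key)
        if ours is None or theirs is None:
            mismatches.append((key, ours, theirs, None))
            continue
        if theirs == 0:
            rel_diff = 0.0 if ours == 0 else float("inf")
        else:
            rel_diff = abs(ours - theirs) / abs(theirs)
        if rel_diff > tolerance:
            mismatches.append((key, ours, theirs, rel_diff))
    return mismatches


def summary_stats(rows):
    """Headline numbers a business sponsor can sanity-check at a glance."""
    revenues = [row["revenue"] for row in rows]
    return {
        "row_count": len(rows),
        "distinct_customers": len({row["customer_id"] for row in rows}),
        "total_revenue": sum(revenues),
        "min_revenue": min(revenues),
        "max_revenue": max(revenues),
        "mean_revenue": mean(revenues),
        "median_revenue": median(revenues),
    }


computed = revenue_by_quarter(transactions)
mismatches = reconcile(computed, reference_revenue)
stats = summary_stats(transactions)

for key, ours, theirs, rel_diff in mismatches:
    print(f"{key}: data says {ours}, reference says {theirs}")
for name, value in stats.items():
    print(f"{name}: {value}")
```

Any quarter reported by `reconcile` is a question to take back to the sponsor before proceeding, and the `summary_stats` output is the kind of short table worth sending for sign-off.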
Ignore this step at your own peril! In data science, once the numbers have been questioned or proven wrong, nothing else matters: nothing can be trusted and all your work is ruined. If this happens, the best you can do is come back to this step and verify that the underlying data is correct.