Step 2: Ingest/Clean/Validate
This is the most time-consuming, frustrating, and unstructured part of the analysis. It is also where the most can go wrong, which makes it the most important step to get right.
1. Ingest raw data into the database
108.10.20.10 Data Analysis - Ingest raw data into database
2. Understand
108.40.20.20 Data Analysis - Raw data understanding and summary statistics
3. Clean / map
108.10.20.20 Data Analysis - Clean and map data
4. Confirm
Never move on to analysis before signing off on data quality.
108.10.20.30 Data Analysis - Summary statistics and confirmation
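A minimal sketch of the four steps, assuming a small CSV extract and Python's built-in `sqlite3` module standing in for the real database. The linked notes mention S3 and Redshift; this sketch does not use them. The table names `raw_sales` and `clean_sales`, the columns, `STATE_MAP`, and the 5% null threshold are all hypothetical, chosen only for illustration.

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical raw extract; in practice this is the file received from the client.
RAW_CSV = """customer_id,state,amount
1,ca,100.50
2,CA,
3,Calif.,75.00
4,NY,abc
5,ny,20.00
"""

# Hypothetical mapping table for the clean/map step (raw spelling -> standard code).
STATE_MAP = {"ca": "CA", "calif.": "CA", "ny": "NY"}


def ingest(conn, raw_text):
    """Step 1: load raw rows untouched (everything as TEXT) into a raw table."""
    conn.execute("CREATE TABLE raw_sales (customer_id TEXT, state TEXT, amount TEXT)")
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    conn.executemany("INSERT INTO raw_sales VALUES (:customer_id, :state, :amount)", rows)
    return len(rows)


def understand(conn):
    """Step 2: profile the raw table: row count, blank amounts, distinct raw states."""
    total = conn.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
    blank = conn.execute(
        "SELECT COUNT(*) FROM raw_sales WHERE amount IS NULL OR TRIM(amount) = ''"
    ).fetchone()[0]
    states = [r[0] for r in conn.execute("SELECT DISTINCT state FROM raw_sales")]
    return {"rows": total, "blank_amounts": blank, "distinct_states": states}


def to_float(text):
    """Return a float, or None when the value is blank or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def clean_and_map(conn):
    """Step 3: write a separate clean table; the raw table is never modified."""
    conn.execute("CREATE TABLE clean_sales (customer_id INTEGER, state TEXT, amount REAL)")
    raw_rows = conn.execute("SELECT customer_id, state, amount FROM raw_sales").fetchall()
    for cid, state, amount in raw_rows:
        mapped = STATE_MAP.get(state.strip().lower())  # None if the spelling is unmapped
        conn.execute(
            "INSERT INTO clean_sales VALUES (?, ?, ?)", (int(cid), mapped, to_float(amount))
        )


def confirm(conn, expected_rows, max_null_share=0.05):
    """Step 4: summary statistics plus checks that must all pass before analysis."""
    total = conn.execute("SELECT COUNT(*) FROM clean_sales").fetchone()[0]
    unmapped = conn.execute("SELECT COUNT(*) FROM clean_sales WHERE state IS NULL").fetchone()[0]
    amounts = [r[0] for r in conn.execute(
        "SELECT amount FROM clean_sales WHERE amount IS NOT NULL")]
    nulls = total - len(amounts)
    summary = {
        "rows": total,
        "null_amounts": nulls,
        "min_amount": min(amounts) if amounts else None,
        "max_amount": max(amounts) if amounts else None,
        "mean_amount": statistics.mean(amounts) if amounts else None,
    }
    checks = {
        "row_count_matches_raw": total == expected_rows,
        "all_states_mapped": unmapped == 0,
        "amounts_non_negative": all(a >= 0 for a in amounts),
        "null_share_ok": total > 0 and nulls / total <= max_null_share,
    }
    return summary, checks


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    n_raw = ingest(conn, RAW_CSV)
    print(understand(conn))
    clean_and_map(conn)
    summary, checks = confirm(conn, n_raw)
    print(summary)
    print(checks)
    if not all(checks.values()):
        print("Data quality not signed off; do not start analysis.")
```

Keeping the raw and clean tables separate means the clean/map step can be rerun after a mapping fix without re-ingesting. The sample data deliberately fails the null-share check (two of five amounts are blank or non-numeric), so the script refuses to sign off, which is the behavior step 4 calls for.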
Graph:
- 108.10.20 Data Analysis - Step 2 ingest clean validate to 108.10.20.10 Data Analysis - Ingest raw data into database
- 108.10.20 Data Analysis - Step 2 ingest clean validate to 108.40.20.20 Data Analysis - Raw data understanding and summary statistics
- 108.10.20 Data Analysis - Step 2 ingest clean validate to 108.10.20.20 Data Analysis - Clean and map data
- 108.10.20 Data Analysis - Step 2 ingest clean validate to 108.10.20.30 Data Analysis - Summary statistics and confirmation
- 108.10 Data Analysis - Phases of data analysis to 108.10.20 Data Analysis - Step 2 ingest clean validate
- 108.10.20.10 Data Analysis - Ingest raw data into database to 108.10.20 Data Analysis - Step 2 ingest clean validate
- 108.10.20.10.10 Data Analysis - Receiving raw data to 108.10.20 Data Analysis - Step 2 ingest clean validate
- 108.10.20.10.20 Data Analysis - Standardized process to ingest raw data to 108.10.20 Data Analysis - Step 2 ingest clean validate
- 108.10.20.10.20.10 Data Analysis - Moving data from disk to S3 to 108.10.20 Data Analysis - Step 2 ingest clean validate
- 108.10.20.10.20.20 Data Analysis - Commands to move data from S3 to Redshift to 108.10.20 Data Analysis - Step 2 ingest clean validate
- 108.10.20.20.20 Data Analysis - Describe the data in words to 108.10.20 Data Analysis - Step 2 ingest clean validate
- 108.10.20.30 Data Analysis - Summary statistics and confirmation to 108.10.20 Data Analysis - Step 2 ingest clean validate
- 108.10.70.20 Data Analysis - Teamwork for data cleaning mapping analysis to 108.10.20 Data Analysis - Step 2 ingest clean validate
- 108.30 Data Analysis - Categories of data analysis to 108.10.20 Data Analysis - Step 2 ingest clean validate