Folder:
108 Data Analysis
File:
108.10.20.10.20 Data Analysis - Standardized process to ingest raw data
Standardize the process of moving data from disk to cloud to database
Have a standardized, documented procedure for moving data into the database. This part is fairly easy, but it takes a newcomer a long time to figure out, and doing it wrong causes headaches later.
Moving data from local disk (or virtual machine) to S3
108.10.20.10.20.10 Data Analysis - Moving data from disk to S3
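A minimal sketch of what a standardized disk-to-S3 step could look like, assuming the AWS CLI is installed and credentials are already configured. The helper name, file path, bucket, and prefix are hypothetical placeholders, not values from this note. It only builds the `aws s3 cp` command so the destination key follows one fixed convention (prefix plus original file name).

```python
import shlex
from pathlib import PurePosixPath


def build_s3_upload_command(local_path: str, bucket: str, prefix: str) -> list[str]:
    """Return the argument list for `aws s3 cp` that uploads one local file."""
    # Keep the original file name so the object can be traced back to its source file.
    file_name = PurePosixPath(local_path).name
    # PurePosixPath normalizes the prefix, so "ingest/events" and "ingest/events/" give the same key.
    key = str(PurePosixPath(prefix) / file_name)
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}"]


# Hypothetical file, bucket, and prefix, for illustration only.
command = build_s3_upload_command("/data/raw/events.csv", "example-raw-data", "ingest/events")
print(shlex.join(command))
```

To actually run the upload, the list can be passed to `subprocess.run(command, check=True)`. Building the command in one place means every upload follows the same key layout, which is the part a newcomer is most likely to get wrong.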
Sample commands to move data from S3 to Redshift
108.10.20.10.20.20 Data Analysis - Commands to move data from S3 to Redshift
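A rough sketch of generating a Redshift COPY statement from a template, assuming the source files are CSV with a single header row and that the cluster loads through an IAM role. The table name, S3 URI, and role ARN are hypothetical placeholders; the file format and options would need to match the real data.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads CSV files with one header row."""
    # Values are interpolated directly into the SQL text, so pass only trusted,
    # internally defined names here (never user input).
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


# Hypothetical table, S3 location, and role ARN, for illustration only.
statement = build_copy_statement(
    table="raw.events",
    s3_uri="s3://example-raw-data/ingest/events/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(statement)
```

The printed statement can be pasted into a SQL client or executed through whichever database driver is already in use. Pointing FROM at a prefix rather than a single object loads every file under that prefix.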
Ingestion tools
(more detail necessary)
- S3
- Redshift COPY
- Spark
Source:
- Me
Graph:
- 108.10.20.10 Data Analysis - Ingest raw data into database >> 108.10.20.10.20 Data Analysis - Standardized process to ingest raw data
- 108.10.20.10.20 Data Analysis - Standardized process to ingest raw data >> 108.10.20.10.20.20 Data Analysis - Commands to move data from S3 to Redshift
- 108.10.20.10.20 Data Analysis - Standardized process to ingest raw data >> 108.10.20.10.20.10 Data Analysis - Moving data from disk to S3
- 108.10.20.10.20 Data Analysis - Standardized process to ingest raw data >> 108.10.20 Data Analysis - Step 2 ingest clean validate