Standardize the process of moving data from disk to cloud to database
Have a standardized, documented procedure for moving data into the database. The task itself is fairly easy, but a newcomer can take a long time to figure it out, and doing it wrong causes headaches later.
Moving data from local disk (or virtual machine) to S3
108.10.20.10.20.10 Data Analysis - Moving data from disk to S3
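A minimal sketch of a standardized disk-to-S3 upload, assuming Python with boto3 installed and AWS credentials already configured. The key layout `raw/<dataset>/<YYYY-MM-DD>/<relative path>`, the function names, and the bucket and dataset names are hypothetical conventions for illustration. Only `boto3.client("s3")` and its `upload_file(Filename, Bucket, Key)` method are real boto3 API. Planning the keys separately from uploading lets you review the layout before anything is written.

```python
import datetime
from pathlib import Path


def plan_uploads(local_dir, dataset, run_date):
    """Map every file under local_dir to an S3 key.

    Hypothetical convention: raw/<dataset>/<YYYY-MM-DD>/<path relative to local_dir>.
    """
    root = Path(local_dir)
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()  # S3 keys use forward slashes
            key = f"raw/{dataset}/{run_date.isoformat()}/{rel}"
            plan.append((str(path), key))
    return plan


def upload_directory(local_dir, bucket, dataset, run_date=None):
    """Upload every planned file; credentials come from the standard AWS config chain."""
    import boto3  # imported here so plan_uploads works without boto3 installed

    run_date = run_date or datetime.date.today()
    s3 = boto3.client("s3")
    for filename, key in plan_uploads(local_dir, dataset, run_date):
        s3.upload_file(filename, bucket, key)
        print(f"uploaded {filename} -> s3://{bucket}/{key}")
```

Usage, with hypothetical names: `upload_directory("/data/exports/orders", "my-raw-bucket", "orders")`. For a one-off copy, `aws s3 cp <dir> s3://<bucket>/<prefix> --recursive` from the AWS CLI does the same job; the script is mainly useful for enforcing one key convention across everyone's uploads.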
Sample commands to move data from S3 to Redshift
108.10.20.10.20.20 Data Analysis - Commands to move data from S3 to Redshift
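A sketch of how the Redshift `COPY` statement can be generated from one template so every load uses the same options. It assumes CSV files with one header row and an IAM role attached to the cluster. `build_copy_sql` and all example values are hypothetical. `COPY ... FROM ... IAM_ROLE ... FORMAT AS CSV IGNOREHEADER ... REGION ...` is standard Redshift syntax. The values are interpolated directly into the string, so they should come from trusted configuration, never from user input.

```python
def build_copy_sql(table, s3_uri, iam_role_arn, region=None, csv=True, ignore_header=1):
    """Build a Redshift COPY statement that loads every file under an S3 prefix."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    parts = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",            # prefix: all matching objects are loaded
        f"IAM_ROLE '{iam_role_arn}'",  # role the cluster assumes to read S3
    ]
    if csv:
        parts.append("FORMAT AS CSV")
        if ignore_header:
            parts.append(f"IGNOREHEADER {int(ignore_header)}")  # skip header rows
    if region:
        parts.append(f"REGION '{region}'")  # needed if the bucket is in another region
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    print(build_copy_sql(
        "staging.orders",
        "s3://my-raw-bucket/raw/orders/2024-01-02/",
        "arn:aws:iam::123456789012:role/MyRedshiftRole",
        region="us-east-1",
    ))
```

The generated SQL can be run from any Redshift client (Query Editor, psql, or a Python driver). If a load fails, the system table `STL_LOAD_ERRORS` shows which file and line were rejected.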
Ingestion tools
(more detail necessary)
- S3
- Redshift COPY
- Spark