Moving data from disk to S3
The first step to getting data into a Redshift database is to put the raw files on S3. From here it can be COPY
‘d onto Redshift. Getting a file onto S3 can be accomplished in a couple of different ways:
- Use the S3 web interface (https://console.aws.amazon.com) for small files
- Use the AWS CLI for large files. This is preferred and easier once you get it.
- E.g.
aws s3 cp raw_data.csv s3://bucket-name/raw_data.csv
- E.g.
S3 can also act like a long term data repository. The files on local and virtual machines are subject to being deleted occasionally and S3 is a much safer place to store data.
Graph:
- 108.10.20.10.20.10 Data Analysis - Moving data from disk to S3 to 108.10.20 Data Analysis - Step 2 ingest clean validate
- 108.10.20.10.20 Data Analysis - Standardized process to ingest raw data to 108.10.20.10.20.10 Data Analysis - Moving data from disk to S3