Standardized working environment
At Lucky8, I quickly realized that once a project gets underway, things get chaotic. Everybody works in their own tools and their own environment, with local folders and opaque scripts. That makes it very difficult to consistently produce high-quality, error-free work.
Once a project is underway, it quickly gets buried in CSV files, Excel spreadsheets, SQL files, Python scripts, communications, Jupyter notebooks, and more. It gets worse when several people are working on the project at the same time.
It is important to have a standard organization for all projects. Everything an analyst needs should be in one place, well documented, and in a format they are used to seeing. Because every project is so different, the environment we work in should at least be as standardized as possible. Data integrity and a shared workflow would be a massive improvement over the way we used to do things at Lucky8.
Standardized working environment and tools
- Everybody should use the same tools
- Early in a project, a Jupyter Notebook is preferable to a SQL client because the history of queries and results is preserved
- Every project gets its own Git repo and cloud file storage location
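One way to make "every project gets its own repo and storage location" concrete is a small scaffolding script that every project starts from. The sketch below is a minimal example that assumes Python, a made-up folder layout (`data/raw`, `notebooks`, `sql`, and so on), and a hypothetical `scaffold_project` helper. It only creates local folders and a README; initializing the Git repo and linking the cloud storage location remain separate steps.

```python
from pathlib import Path

# Hypothetical standard layout: these folder names are an assumption,
# not an established Lucky8 convention.
STANDARD_DIRS = [
    "data/raw",        # untouched source extracts (csv, xlsx, sql dumps)
    "data/processed",  # cleaned data produced by scripts
    "notebooks",       # Jupyter notebooks, numbered in run order
    "sql",             # saved queries
    "scripts",         # reusable Python scripts
    "output",          # Tableau workbooks, charts, pptx, Excel deliverables
    "notes",           # meeting notes and analysis decisions
]


def scaffold_project(root: Path, name: str) -> Path:
    """Create the standard folder layout and a README for a new project."""
    project = root / name
    for rel in STANDARD_DIRS:
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe
        (project / rel).mkdir(parents=True, exist_ok=True)
    readme = project / "README.md"
    if not readme.exists():
        # Only write the README template once so existing notes are never overwritten
        readme.write_text(
            f"# {name}\n\n"
            "Purpose:\n\n"
            "Data sources:\n\n"
            "How to reproduce:\n"
        )
    return project
```

For example, `scaffold_project(Path.home() / "projects", "client-churn")` (a hypothetical project name) creates the skeleton, after which `git init` can be run inside the new folder. Because every project starts from the same layout, anyone joining knows where raw data, queries, and deliverables live.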
Shared knowledge and content
- Scripts, Jupyter notebooks, data manipulations, and the notes on all of them should be shared property, stored in a central location that anybody on a project can see, refer to, and reuse from a previous job.
- The goal is transparency, reusability, and reproducibility: eliminate redundancy (doing the same thing twice, probably differently!) and ensure consistent, high-quality output.
- Data consistency is key: everybody works on the same data, and everybody follows the same highly standardized best practices for saving and documenting work.
- “How can I keep the code, output, formatting, analysis, notes, data quality, all together in one place? Along with files (Tableau, excel, graphs, pptx)?” (me, 20161017)
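One possible way to check that everybody really is working on the same data is a hash manifest: record a SHA-256 digest for every file in the shared data folder, commit the manifest to the project repo, and have each analyst compare their local copy against it. This is an assumed technique, not something the notes above prescribe, and the function names are hypothetical. The manifest file should live outside the data folder so it does not end up hashing itself.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large extracts don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir, POSIX style) to its SHA-256 hash."""
    return {
        p.relative_to(data_dir).as_posix(): file_sha256(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest: dict[str, str], path: Path) -> None:
    """Write the manifest as sorted, indented JSON so diffs in Git stay readable."""
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def load_manifest(path: Path) -> dict[str, str]:
    """Read a manifest previously written by save_manifest."""
    return json.loads(path.read_text())


def diff_manifest(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """List files that are missing, unexpected, or changed relative to the shared manifest."""
    return {
        "missing": sorted(expected.keys() - actual.keys()),
        "extra": sorted(actual.keys() - expected.keys()),
        "changed": sorted(
            k for k in expected.keys() & actual.keys() if expected[k] != actual[k]
        ),
    }
```

Typical use, assuming a `data` folder and a `manifest.json` next to it: after pulling the project, an analyst runs `diff_manifest(load_manifest(Path("manifest.json")), build_manifest(Path("data")))`. Three empty lists mean their local data matches what the team agreed on.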
Graph:
- 108.40.10.10 Data Analysis - Standardized working environment to 108.40.10 Data Analysis - Working environment
- 108.40.10 Data Analysis - Working environment to 108.40.10.10 Data Analysis - Standardized working environment
- 108.40.10.20 Data Analysis - Excel is amazing and it sucks too to 108.40.10.10 Data Analysis - Standardized working environment