It all starts with a problem to be solved for your business or a question to be answered. The first task is to define, together with the problem owner(s), the key parameters by which the result will be evaluated.
From there, a number of steps are typically taken that are commonly referred to as data wrangling, data preparation or data munging. The purpose of these steps is to arrive at a usable set of data and to create a solid base for future data collection.
- Identify what data is available. Build an understanding of the data.
- Organize / structure the data.
- Clean the data. This is an important step, as raw data typically contains a number of anomalies that have to be fixed because they can influence the outcome of the analysis.
- Identify where data is missing that can be added immediately, possibly from different data sources.
- Ensure consistency and quality of the data.
- Create output for further processing, analysis and/or visualization.
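The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample data, the column names (`customer_id`, `region`, `revenue`), the `REGION_LOOKUP` second source and the helper names `wrangle` and `to_csv` are all hypothetical, and a real project would likely use a dedicated data library rather than the standard `csv` module.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, inconsistent casing,
# a repeated row, a missing region and a non-numeric revenue.
RAW_CSV = """customer_id,region,revenue
 101 ,north,1200
102,SOUTH, 950
102,SOUTH, 950
103,,400
104,East,n/a
"""

# Hypothetical second data source used to fill gaps in the first.
REGION_LOOKUP = {"103": "west"}


def wrangle(raw_text, region_lookup):
    """Return (clean_rows, rejected_rows) built from raw CSV text."""
    clean, rejected, seen = [], [], set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Organize / structure: trim whitespace, normalize casing.
        record = {key: (value or "").strip() for key, value in row.items()}
        record["region"] = record["region"].lower()

        # Clean: skip rows whose customer_id was already seen.
        if record["customer_id"] in seen:
            continue
        seen.add(record["customer_id"])

        # Fill missing data from another source where possible.
        if not record["region"]:
            record["region"] = region_lookup.get(record["customer_id"], "")

        # Ensure consistency and quality: revenue must be numeric
        # and region must be known; otherwise set the row aside.
        try:
            record["revenue"] = float(record["revenue"])
        except ValueError:
            rejected.append(record)
            continue
        if not record["region"]:
            rejected.append(record)
            continue
        clean.append(record)
    return clean, rejected


def to_csv(rows):
    """Create output for further processing, analysis or visualization."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["customer_id", "region", "revenue"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


clean_rows, rejected_rows = wrangle(RAW_CSV, REGION_LOOKUP)
print(to_csv(clean_rows))
print("rejected:", rejected_rows)
```

Rejected rows are kept rather than silently dropped, so they can be reviewed with the problem owner(s) and fed into the next iteration.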
The steps above usually require a number of iterations. Throughout, the key objective is to make the data usable.
The deliverables that are created in these steps are used for further analysis to get the desired business outcome.
