It all starts with a business problem to be solved or a question to be answered. The first task is to define, together with the problem owner(s), the key parameters against which the result will be evaluated.

From there, a number of steps are typically taken that are commonly referred to as data wrangling, data preparation or data munging. The purpose of these steps is to get to a usable set of data and create a solid base for future data collection.

  1. Identify what data is available. Build an understanding of the data.
  2. Organize / structure the data.
  3. Clean the data. This is an important step, as raw data typically contains a number of anomalies that have to be fixed because they can influence the outcome of the analysis.
  4. Identify where data is missing that can be added immediately, possibly from different data sources.
  5. Ensure consistency and quality of the data.
  6. Create output for further processing, analysis and/or visualization.
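
As a rough illustration, the six steps above can be sketched in a few lines of Python using only the standard library. Everything in this sketch is an assumption made for the example: the sample order data, the column names, the `CUSTOMER_REGIONS` lookup that stands in for a second data source, and the helper functions `count_missing`, `wrangle` and `to_csv`. A real project would likely use a dedicated data library and much richer rules.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, inconsistent casing,
# a duplicate row, a missing region and a non-numeric amount.
RAW_ORDERS = """order_id,customer,region,amount
1, Alice ,north,100.50
2,bob,,80
2,bob,,80
3,Carol,SOUTH,n/a
4,dave,east,42.00
"""

# Hypothetical second data source used to fill in missing regions (step 4).
CUSTOMER_REGIONS = {"bob": "west"}

COLUMNS = ["order_id", "customer", "region", "amount"]


def count_missing(rows):
    # Step 1: a very small profile, counting empty values per column.
    return {col: sum(1 for r in rows if not (r[col] or "").strip()) for col in COLUMNS}


def wrangle(raw_text, region_lookup):
    # Step 2: structure the raw text as a list of dicts keyed by column name.
    rows = list(csv.DictReader(io.StringIO(raw_text)))

    cleaned, rejected, seen_ids = [], [], set()
    for row in rows:
        # Step 3: normalise whitespace and casing.
        record = {col: (row[col] or "").strip() for col in COLUMNS}
        record["customer"] = record["customer"].lower()
        record["region"] = record["region"].lower()

        # Step 3: skip rows whose order id has already been seen.
        if record["order_id"] in seen_ids:
            continue
        seen_ids.add(record["order_id"])

        # Step 4: fill a missing region from the second source, if known.
        if not record["region"]:
            record["region"] = region_lookup.get(record["customer"], "")

        # Step 5: enforce types; set failing rows aside instead of guessing.
        try:
            record["order_id"] = int(record["order_id"])
            record["amount"] = float(record["amount"])
        except ValueError:
            rejected.append(record)
            continue
        cleaned.append(record)
    return rows, cleaned, rejected


def to_csv(records):
    # Step 6: write the cleaned records out for analysis or visualization.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


raw_rows, cleaned, rejected = wrangle(RAW_ORDERS, CUSTOMER_REGIONS)
print("missing values per column:", count_missing(raw_rows))
print(to_csv(cleaned))
print(f"{len(rejected)} row(s) set aside for review")
```

Rows that fail the type check are collected in `rejected` rather than silently dropped, so they can be reviewed with the problem owner in a later iteration.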

The steps above usually require a number of iterations. Repeating them is key to making the data usable.

The deliverables created in these steps are then used in further analysis to reach the desired business outcome.