Each problem domain has its own kinds of problems that can be analyzed and solved with a data-centric approach. However, analysis is a means to an end rather than an end in itself, so the problem statement and the success criteria for the solution must be clearly defined first. This gives the analysis project a direction as well as defined start and end points, and helps ensure that its outcomes are both time-bound and measurable.
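To make "measurable and time-bound" concrete, the following minimal sketch pairs a problem statement with a target metric and a deadline, so that "done" can be checked mechanically. The class `ProblemStatement`, the churn metric, the target value and the dates are all hypothetical illustrations, not part of any specific methodology.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProblemStatement:
    """A problem statement with a measurable, time-bound success criterion (hypothetical)."""
    description: str
    metric_name: str        # e.g. "churn_rate" (hypothetical metric)
    target_value: float     # value the metric must reach
    higher_is_better: bool  # direction in which the metric improves
    deadline: date          # defined end point of the analysis project

    def is_met(self, observed: float, on: date) -> bool:
        """True if the observed metric meets the target on or before the deadline."""
        if on > self.deadline:
            return False
        if self.higher_is_better:
            return observed >= self.target_value
        return observed <= self.target_value


problem = ProblemStatement(
    description="Reduce monthly customer churn",
    metric_name="churn_rate",
    target_value=0.05,
    higher_is_better=False,
    deadline=date(2025, 6, 30),
)
print(problem.is_met(0.045, date(2025, 5, 1)))  # True: target reached in time
print(problem.is_met(0.045, date(2025, 7, 1)))  # False: past the deadline
```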
The next step is to identify the various sources and silos from which data relevant to the problem statement can be extracted. Extraction can then proceed, followed by transformation and massaging (cleaning, type conversion, joining) to prepare the data for use by statistical and machine learning algorithms.
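The extract-and-transform step can be sketched with the standard library alone. The two "silos" below (a billing CSV export and a list of CRM records), their field names and the cleaning rules are hypothetical; real pipelines would read from databases or APIs and apply domain-specific rules.

```python
import csv
import io

# Hypothetical silo 1: a CSV export from a billing system.
BILLING_CSV = """customer_id,monthly_spend
C1,120.50
C2,
C3,80.00
"""

# Hypothetical silo 2: records returned by a CRM system, with inconsistent formatting.
CRM_RECORDS = [
    {"customer_id": "c1", "region": " North "},
    {"customer_id": "c3", "region": "south"},
    {"customer_id": "c4", "region": "East"},
]


def extract_billing(text: str) -> list[dict]:
    """Extract: read the raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(billing_rows: list[dict], crm_rows: list[dict]) -> list[dict]:
    """Transform: normalise keys, convert types, drop incomplete rows and join on customer_id."""
    regions = {
        r["customer_id"].strip().upper(): r["region"].strip().title()
        for r in crm_rows
    }
    cleaned = []
    for row in billing_rows:
        cid = row["customer_id"].strip().upper()
        spend = row["monthly_spend"].strip()
        if not spend or cid not in regions:
            continue  # drop rows with missing values or no matching CRM record
        cleaned.append({"customer_id": cid, "monthly_spend": float(spend), "region": regions[cid]})
    return cleaned


dataset = transform(extract_billing(BILLING_CSV), CRM_RECORDS)
print(dataset)
```

Here C2 is dropped for its missing spend and C4 for having no billing record, leaving a clean, joined data set ready for modelling.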
Suitable analysis and learning algorithms are then applied to and trained on the collected data to explore ways of solving the defined problem. The selected trained models are then deployed to production environments for ongoing analysis of live transactional (rather than historical) data within the solution domain.
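A minimal sketch of the train, select and deploy loop, assuming a toy regression problem: two candidate models (a mean baseline and an ordinary least-squares line) are trained on historical data, the one with the lower mean absolute error on a held-out validation set is selected, and that model then scores new "live" records. The data set, feature names and the choice of these two models are hypothetical.

```python
from statistics import mean

# Hypothetical historical data: (monthly_spend, support_tickets) pairs.
history = [(50.0, 1.0), (80.0, 2.0), (120.0, 3.0), (150.0, 4.0), (200.0, 5.0), (220.0, 6.0)]
train, validation = history[:4], history[4:]


def fit_mean(rows):
    """Baseline model: always predict the mean of the training targets."""
    m = mean(y for _, y in rows)
    return lambda x: m


def fit_linear(rows):
    """Ordinary least-squares fit of y = a + b * x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def mae(model, rows):
    """Mean absolute error of a model on (x, y) rows."""
    return mean(abs(model(x) - y) for x, y in rows)


# Train every candidate, then select the one that generalises best to unseen data.
candidates = {"mean": fit_mean(train), "linear": fit_linear(train)}
best_name = min(candidates, key=lambda name: mae(candidates[name], validation))
deployed = candidates[best_name]

# "Production": score live transactional records as they arrive.
for spend in (60.0, 180.0):
    print(best_name, spend, round(deployed(spend), 2))
```

Selecting on held-out data rather than on the training data is what keeps the deployed model from simply memorising history.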
Visual reports and dashboards are deployed so that authorized users can explore the data sets themselves in an intuitive way. Users can typically filter, sort, and slice and dice the data to uncover further nuances, building an understanding of the domain beyond what the machine learning algorithms extract on their own.
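The operations a dashboard exposes can be illustrated on a small in-memory table. In the sketch below, "slice" fixes one dimension to a single value and "dice" restricts several dimensions at once, following common OLAP usage; the sales records and the `pivot` helper are hypothetical, and a real dashboard tool would perform these steps interactively.

```python
from collections import defaultdict

# Hypothetical sales records behind a dashboard.
sales = [
    {"region": "North", "product": "A", "quarter": "Q1", "amount": 100},
    {"region": "North", "product": "B", "quarter": "Q1", "amount": 40},
    {"region": "South", "product": "A", "quarter": "Q1", "amount": 70},
    {"region": "North", "product": "A", "quarter": "Q2", "amount": 130},
    {"region": "South", "product": "B", "quarter": "Q2", "amount": 90},
]

# Filter and sort: North region only, largest sales first.
top_north = sorted(
    (r for r in sales if r["region"] == "North"),
    key=lambda r: r["amount"],
    reverse=True,
)

# Slice: fix a single dimension (quarter == "Q1").
q1 = [r for r in sales if r["quarter"] == "Q1"]

# Dice: restrict several dimensions at once (product A, quarters Q1 and Q2).
diced = [r for r in sales if r["product"] == "A" and r["quarter"] in {"Q1", "Q2"}]


def pivot(rows, row_key, col_key, value_key="amount"):
    """Sum value_key into a row_key x col_key grid, as a dashboard table might show it."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_key]][r[col_key]] += r[value_key]
    return {k: dict(v) for k, v in table.items()}


print([r["amount"] for r in top_north])
print(pivot(diced, "region", "quarter"))
```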