When implementing data science projects, we constantly have to decide on the best implementation method so that each component integrates smoothly with the pipeline. Because the overall design is always complex, the goal is to keep each individual implementation as simple as possible. We focus on simplifying our approaches so that we can keep track of every step and modify it easily, with minimal implementation and modification time.
Some tools are more productive than others. Over the course of building optimal machine-learning pipelines in production, we have come to appreciate the raw strength of combining SAP HANA with SAP Data Services. Reformulating an approach to use this combination can save a considerable amount of time compared to a vanilla approach that relies on Python for data wrangling, cleaning, discovery, and normalization, all of which are major parts of machine learning pipeline development.
In today’s post, we will walk you through our journey across the ocean of NULLs. This is the story of how we moved from the mystical (our initial expectations and assumptions) to the practical (an actual problem-solving methodology that became an integral, reusable part of our data science framework).