The data science lifecycle is based on the fundamental scientific process of:
Once you start assembling data sets, exploring, refining, and cleaning them is paramount to achieving a reproducible outcome you can stand behind. As with a building that needs solid foundations, you need to understand what you're feeding into the pipeline.
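As a rough illustration of what exploring and cleaning can look like in practice, the sketch below profiles a small data set for missing values and exact duplicates, then removes them. The records, field names (`sensor`, `temp_c`), and the helper functions `profile` and `clean` are all hypothetical, and dropping incomplete rows is only one of several possible cleaning policies.

```python
from collections import Counter

# Hypothetical records; None marks a missing reading.
records = [
    {"sensor": "A", "temp_c": 21.5},
    {"sensor": "A", "temp_c": 21.5},   # exact duplicate of the first row
    {"sensor": "B", "temp_c": None},   # missing value
    {"sensor": "C", "temp_c": 19.0},
]


def row_key(row):
    """Build a hashable key so identical rows can be detected."""
    return tuple(sorted(row.items()))


def profile(rows):
    """Count rows, missing values per field, and exact duplicate rows."""
    missing = Counter()
    seen = set()
    duplicates = 0
    for row in rows:
        for field, value in row.items():
            if value is None:
                missing[field] += 1
        key = row_key(row)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"rows": len(rows), "missing": dict(missing), "duplicates": duplicates}


def clean(rows):
    """Drop exact duplicates and rows with any missing value, keeping order."""
    seen = set()
    cleaned = []
    for row in rows:
        key = row_key(row)
        if key in seen or any(value is None for value in row.values()):
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


report = profile(records)
cleaned = clean(records)
```

Profiling before cleaning keeps a record of what was wrong with the raw data, which helps when someone later asks why the row counts changed.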
Feature engineering is the addition of new fields (i.e., facts) to the data sets, derived from fields already present. These can come from subject matter experts (e.g., scientists, engineers) who can provide insight into the processes and phenomena that may or may not be captured in the data.
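A minimal sketch of deriving new fields from existing ones might look like the following. The field names, the `add_features` helper, and the 10 °C threshold are assumptions made for illustration; in a real project the rule would come from a subject matter expert who knows the process.

```python
# Hypothetical process records with two measured fields.
rows = [
    {"inlet_temp_c": 20.0, "outlet_temp_c": 35.0},
    {"inlet_temp_c": 22.0, "outlet_temp_c": 30.0},
]


def add_features(row, overheat_threshold_c=10.0):
    """Return a copy of the row with two derived fields added.

    delta_temp_c: temperature rise across the process.
    overheating:  True when the rise exceeds the (assumed) expert threshold.
    """
    delta = row["outlet_temp_c"] - row["inlet_temp_c"]
    return {
        **row,
        "delta_temp_c": delta,
        "overheating": delta > overheat_threshold_c,
    }


engineered = [add_features(row) for row in rows]
```

Returning a new dictionary rather than mutating the input leaves the original records untouched, so the derived fields can be recomputed if the expert's rule changes.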
Fundamentally, you are doing this work to answer the question at hand. What is the modeling technique telling you? Can you verify the outcomes with subject matter experts in the relevant domains?