GOALS
Specify the key variables that will serve as model targets and whose related metrics will be used to determine the success of the project.
Identify the relevant data sources that the business has access to or needs to obtain.
Data Accessibility
Is the data available? Points to consider:
- What do we have to do to make it available for online deployment of the model?
- How much will it cost?
- How long will it take to make the data available?
- Could it become a roadblock for deployment?
Data Consistency
What is the profile of the data that we have available?
- How does it behave?
- Does it present any outliers?
- Is it influenced by external factors that are not [yet] measured?
- Can the behavior/outliers/external factors potentially be roadblocks for our initiative?
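As a minimal sketch of what a first profile and outlier screen might look like, the Python below computes summary statistics and flags points outside Tukey's IQR fences. It assumes the data is a plain list of numeric sensor readings; the function names, the sample values, and the k = 1.5 fence multiplier are illustrative choices, not project requirements.

```python
import statistics


def profile(values):
    """Basic profile of a numeric series: count, mean, stdev and quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).
    Assumption: k = 1.5 is the conventional default, not a project setting."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical readings with one obvious spike.
readings = [10.1, 10.3, 9.8, 10.0, 10.2, 9.9, 25.0, 10.1]
print(profile(readings))
print(iqr_outliers(readings))  # → [25.0]
```

A flagged point is only a candidate: whether it is a sensor fault or a real event driven by an unmeasured external factor still has to be judged with the process team.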
Data Completeness
- Is the data workable?
- How much wrangling does it require?
- Do we have enough data points to build a model with a reasonable confidence level (also considering accuracy)?
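One rough way to approach the completeness and data-volume questions is sketched below: the share of non-missing values per field, plus the classic sample-size formula n = (zσ/E)² for estimating a mean within a margin E. This assumes missing values are recorded as `None` and that σ can be estimated from historical data; the field names and numbers are hypothetical, and passing this check does not by itself guarantee that a model will reach a target accuracy.

```python
import math
from statistics import NormalDist


def completeness(rows, fields):
    """Share of non-missing values (anything other than None) per field, in [0, 1]."""
    n = len(rows)
    return {f: sum(r.get(f) is not None for r in rows) / n for f in fields}


def min_sample_size(sigma, margin, confidence=0.95):
    """Samples needed to estimate a mean within +/- margin: n = (z * sigma / margin)^2.
    Assumption: sigma is an estimate taken from historical data."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical records; None marks a missing reading.
rows = [
    {"temp": 80.1, "pressure": 2.1},
    {"temp": None, "pressure": 2.0},
    {"temp": 79.8, "pressure": None},
    {"temp": 80.4, "pressure": 2.2},
]
print(completeness(rows, ["temp", "pressure"]))  # → {'temp': 0.75, 'pressure': 0.75}
print(min_sample_size(sigma=5.0, margin=1.0))    # → 97
```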
Data Frequency
- Are there any infrastructure barriers or developments that need to be addressed now?
- Do we need to engineer a new pipeline?
- What tools and resources can we explore to extract data (e.g., Dariva's solutions, ChemTech, Central PIMS, AT team)?
- Does the event I am trying to capture occur at a frequency that can be captured by the available sensors?
- Is the data online?
- If necessary, can I change the data collection frequency so I can capture the event?
- Is this data stored in my databases? If not, where is it stored?
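The sensor-frequency question can be checked with two simple rules, sketched below under stated assumptions: the Nyquist criterion for periodic behavior (sample faster than twice the event's frequency) and, for a transient event, a maximum sampling interval that still places a few points inside the event. The `min_samples=3` default, the function names, and the example values are hypothetical.

```python
def nyquist_ok(event_freq_hz: float, sampling_rate_hz: float) -> bool:
    """Nyquist criterion: the sampling rate must exceed twice the event frequency."""
    return sampling_rate_hz > 2 * event_freq_hz


def max_sampling_interval(event_duration_s: float, min_samples: int = 3) -> float:
    """Longest interval between samples that still yields about `min_samples`
    points inside an event of the given duration.
    Assumption: min_samples=3 is an illustrative default, not a standard."""
    return event_duration_s / min_samples


# Hypothetical example: a 30 s pressure spike logged by a sensor every 60 s.
current_interval_s = 60.0
needed_interval_s = max_sampling_interval(30.0)  # 10.0 s
print(current_interval_s <= needed_interval_s)   # → False: sampling is too slow

# Hypothetical oscillation at 0.05 Hz (20 s period) sampled once per minute vs once per second.
print(nyquist_ok(0.05, 1 / 60))  # → False
print(nyquist_ok(0.05, 1.0))     # → True
```

When a check like this fails, the follow-up questions above apply: whether the collection frequency can be raised, and where the higher-volume data would be stored.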