GOALS

Identify the relevant data sources that the business has access to or needs to obtain.

Data Accessibility

Is the data available? Points to consider:

  • What do we need to do to make it available for the model's online deployment?
  • How much will it cost?
  • How long will it take for us to have the data available?
  • Can it be a roadblock for the deployment?

Data Consistency

What is the profile of the data that we have available?

  • How does it behave?
  • Does it present any outliers?
  • Is it influenced by external factors that are not [yet] measured?
  • Can the behavior/outliers/external factors potentially be roadblocks for our initiative?
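One way to start answering the profiling questions above is a quick per-tag summary with an interquartile-range (IQR) outlier flag. The sketch below uses only Python's standard library. The function name `profile_series`, the 1.5 * IQR fence and the sample readings are illustrative assumptions, not a prescribed method. Flagged points still need a process review before they are treated as errors, since they may reflect real operating events or unmeasured external factors.

```python
import statistics

def profile_series(values, k=1.5):
    """Summarize one numeric tag and flag points outside the IQR fences.

    values: readings for a single tag (at least two are required).
    k: fence multiplier; 1.5 is a common convention, not a fixed rule.
    """
    # Quartiles from the standard library ("exclusive" method by default).
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "median": median,
        "fences": (low, high),
        "outliers": [v for v in values if v < low or v > high],
    }

# Hypothetical readings from one sensor, with a single spike.
readings = [10.1, 10.3, 9.9, 10.0, 10.2, 10.4, 9.8, 25.0]
summary = profile_series(readings)
print(summary["outliers"])  # [25.0]
```

Running this per tag and per time window gives a first view of how each variable behaves before deciding whether the outliers are a roadblock.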

Data Completeness

  • Is the data workable?
  • Does it require much wrangling?
  • Do we have enough data points to create a model with a reasonable confidence level, also taking accuracy into account?
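A rough first check for workability and data volume is to measure the share of missing values per field and the number of fully complete records. The sketch below assumes records arrive as a list of Python dicts. `completeness_report` and its `min_rows` threshold are hypothetical. The right minimum depends on the model and on the confidence and accuracy targets, so treat the value as a placeholder.

```python
import math

def completeness_report(rows, fields, min_rows=1000):
    """Report missing values per field and whether enough complete rows exist.

    rows: list of dicts, one per record; None or NaN counts as missing.
    fields: the fields the model would need.
    min_rows: hypothetical placeholder threshold, to be set per project.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    total = len(rows)
    missing_fraction = {
        f: (sum(is_missing(r.get(f)) for r in rows) / total) if total else 1.0
        for f in fields
    }
    # A row is usable only if every required field is present.
    complete_rows = sum(all(not is_missing(r.get(f)) for f in fields) for r in rows)
    return {
        "missing_fraction": missing_fraction,
        "complete_rows": complete_rows,
        "enough_data": complete_rows >= min_rows,
    }

# Hypothetical records with one gap in each field.
records = [
    {"temperature": 80.1, "pressure": 2.3},
    {"temperature": None, "pressure": 2.4},
    {"temperature": 79.8, "pressure": float("nan")},
    {"temperature": 80.5, "pressure": 2.2},
]
report = completeness_report(records, ["temperature", "pressure"], min_rows=3)
print(report["complete_rows"], report["enough_data"])  # 2 False
```

A high missing fraction in a single field usually means more wrangling (imputation or a different source). Too few complete rows overall points back to the data-volume question.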

Data Frequency

  • Are there any infrastructure barriers or developments that need to be addressed now?
  • Do we need to engineer a new pipeline?
  • What tools and resources can we explore to extract data (e.g., Dariva's solutions, ChemTech, Central PIMS, AT team)?
  • Does the event I am trying to capture occur at a frequency that can be captured by the available sensors?
  • Is the data online?
  • If necessary, can I change the data collection frequency so I can capture the event?
  • Do I store this data in my databases? If not, where is it stored?
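You can check whether an event can be captured by comparing the sampling interval with the event's duration. The sketch below estimates the typical interval from stored timestamps and checks it against the shortest event of interest. `can_capture_event` is a hypothetical helper. The default of at least two samples per event is an assumed rule of thumb, not a standard. The interval seen in stored data may also differ from the sensor's actual scan rate if the historian compresses or filters values.

```python
import statistics
from datetime import datetime, timedelta

def can_capture_event(timestamps, event_duration, min_samples_per_event=2):
    """Estimate the sampling interval and check it against an event's duration.

    timestamps: sorted datetimes of the stored readings for one tag.
    event_duration: timedelta of the shortest event we want to detect.
    min_samples_per_event: assumed rule of thumb, not a standard.
    Returns (median interval in seconds, whether the event is likely captured).
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    intervals = [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
    median_interval = statistics.median(intervals)
    if median_interval <= 0:
        raise ValueError("timestamps must be increasing")
    samples_per_event = event_duration.total_seconds() / median_interval
    return median_interval, samples_per_event >= min_samples_per_event

# Hypothetical tag recorded once per minute.
start = datetime(2024, 1, 1)
stamps = [start + timedelta(minutes=i) for i in range(10)]
print(can_capture_event(stamps, timedelta(minutes=5)))   # (60.0, True)
print(can_capture_event(stamps, timedelta(seconds=90)))  # (60.0, False)
```

If the check fails, consider increasing the collection frequency or finding another source for that variable.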