Data Journey¶

Mission Statement¶

The Data Journey is a cycle of courses on Data Science offered by the Braskem Digital Factory, a team of scientists in computing, statistics and applied mathematics.

The objective of the journey is clear: change your professional life.

Here you will find, in four different courses, the content to develop practical and theoretical skills in the field of Data Science. You will learn about data collection, transport, protection, storage, processing, analysis, modeling and sharing, so that by the end of the journey you can offer solutions to problems in your area of expertise.

Syllabus¶

Track 1

Introduction to Data Science

Who should take it? Everyone! If you wish to become a Data Scientist, or if you just want a more comprehensive overview of this field, this should be your starting point.

Effort Required: Minimal; plan to attend a 2-hour workshop.

By the end of this track you will be able to:

Understand what Data Science is and how you can derive value from it in your business;
Have a basic intuition of how a Data Science process is conducted;
Get familiar with Braskem's initiatives and other market benchmarks.

Content

What is Data Science
Braskem's Initiatives & Market Outlook
Effort
Workshop

Track 2

Business Analyst

Who should take it? Analysts, Coordinators or Managers in business areas who wish to get familiar with the development of solutions. Professionals seeking tools to enhance productivity or to help them develop business solutions with data.

Effort Required: Duration: TBD

By the end of this track you will be able to:

Have practical knowledge of the tools for developing business solutions;
Ingest and process data in Python;
Know the concepts behind different databases and cloud solutions, such as data lakes and data pipelines;
Have a basic understanding of the process involved in developing and deploying a digital product.

Content

Developing Tools
Anaconda Navigator
Jupyter Notebook
Spyder / Sublime / Visual Studio
Introduction to Python
Objects & Structure
Logic Operators & Loops
Classes, Methods & Functions
Libraries (Pandas, NumPy & Matplotlib)
Data Wrangling
Importing Data
Data Structure
Tidying Data
Combining Data
Cleaning Data
Software & Data Engineering | Azure
Microsoft Azure Introduction
SQL & NoSQL Databases
Data Pipeline
Standard Tables Architecture
DevOps & Continuous Deployment
Big Data with PySpark & Databricks (optional)

Track 3

Data Analyst

Who should take it? Professionals seeking a technical approach to data-driven decisions. Business units undergoing digital transformation, where more data-driven reports and analyses are required.

Effort Required: Duration: TBD

By the end of this track you will be able to:

Derive insights from data;
Perform basic statistical analysis with available data;
Build graphs and reports in Python;
Transform and choose the best features to enhance data analysis;
Provide your team with more data-driven insights.

Content

Exploratory Data Analysis I
Types of Variables
Variable Summary | Statistical Moments
Correlation Matrix & ANOVA
Feature Engineering
Feature Selection
Feature Creation & Transformation
Data Normalization & Balance
Data Interpolation
Data Visualization
Matplotlib & Seaborn
Graph & Figure Types
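
As a small taste of the "Variable Summary | Statistical Moments" and "Correlation Matrix" topics listed above, the following is a minimal sketch that uses only the Python standard library. The function names (`summarize`, `pearson`) and the sample data are hypothetical and chosen for illustration; the course materials themselves may rely on other libraries.

```python
import math
import statistics


def summarize(values):
    """Return the first four statistical moments of a numeric sample."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    # Standardized third and fourth central moments.
    skewness = sum(((v - mean) / std) ** 3 for v in values) / n
    kurtosis = sum(((v - mean) / std) ** 4 for v in values) / n
    return {
        "mean": mean,
        "variance": std ** 2,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally sized samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data, invented for this example only.
temperature = [20.0, 22.0, 24.0, 26.0, 28.0]
yield_pct = [50.0, 54.0, 58.0, 62.0, 66.0]

summary = summarize(temperature)
r = pearson(temperature, yield_pct)
print(summary)
print(f"correlation: {r:.3f}")
```

Because the second variable is an exact linear function of the first, the correlation here is essentially 1; real data will rarely be this clean.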

Track 4

Data Scientist

Who should take it? Any professional aspiring to specialize in Data Science through the development of Machine Learning algorithms and scientific data analysis.

Effort Required: Duration: TBD

By the end of this track you will be able to:

Conduct scientific processes for data analysis;
Develop Machine Learning algorithms;
Deploy and deliver Data Science based digital products;
Provide the business with substantial data-driven insights and strategies;
Understand the intuition, math and code behind the main Machine Learning algorithms;
Conduct explanatory data analysis to extract causation and enhance business decisions;
Develop, deploy and maintain any Data Science project;
Understand which approaches and algorithms best suit each business problem.

Content

Linear Algebra
Matrices & Vectors with NumPy
Alternate Coordinate Systems
Probability Theory
Probability Models & Axioms
Conditioning & Independence
Counting
Discrete & Continuous Variables Distributions
Bayesian Inference
Monte Carlo Simulation with SciPy
Statistical Inference
Sample Means & Central Limit Theorem
Confidence Intervals & Hypothesis Testing
Causality Analysis
Linear Modeling & Experimental Design
Exploratory Data Analysis II
Hypothesis Creation & Testing
Endogenous Variables Transformation
Time Series Analysis
Calculus (optional)
Limits & Continuity
Differentials & Derivatives
Univariate Integration
Multivariate Calculus
Learning Techniques
Supervised & Unsupervised Learning
Regressions, Classifications & Clusterings
Train & Test Sets | Dimensionality
Model Scoring
Regression Metrics
Classification Metrics
Clustering Metrics
Cross Validation
Backpropagation & Hyperparameters
Parameters vs Hyperparameters
Grid Search & Hyperparameter Settings
Backpropagation
Main Algorithms Intuition
Linear Regression (Regression)
SVM (Regression & Classification)
Random Forest (Regression & Classification)
Logistic Regression (Classification)
K-Means (Clustering)
Special Models
Boosting
Neural Networks (Deep Learning)
Reinforcement Learning
Natural Language Processing (NLP)
Computer Vision
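
To illustrate how several of the topics above fit together ("Linear Regression", "Regression Metrics" and "Cross Validation"), here is a minimal sketch written with only the Python standard library. It is an assumption-laden illustration, not the course's implementation: the helper names (`fit_line`, `mse`, `k_fold_mse`) and the synthetic data are hypothetical, and the actual track may use dedicated Machine Learning libraries instead.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return intercept, slope


def mse(model, xs, ys):
    """Mean squared error of a fitted line on the given points."""
    intercept, slope = model
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds (k-fold cross validation)."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        # Fit on the training folds only, then score on the held-out fold.
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / k


# Synthetic data (invented for this example): y = 2 + 3x plus small Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 0.1) for x in xs]

intercept, slope = fit_line(xs, ys)
cv_error = k_fold_mse(xs, ys, k=5)
print(f"intercept={intercept:.2f}, slope={slope:.2f}, cv_mse={cv_error:.4f}")
```

The intercept and slope are parameters learned from the data, while `k` is a setting chosen before training, which mirrors the "Parameters vs Hyperparameters" distinction in the outline.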

Disclaimer: The materials produced for the Data Journey do not contain Braskem sensitive data.

Content¶

Track 0 - Introduction to Data Journey¶

Pt-br

Track 0 - Introduction to the Data Journey

title	video	slides	script	exercises
Data Journey - T0V1 - Data Journey - Presentation
Data Journey - T0V2 - Data Journey - Presentation

En

Track 0 - Introduction to Data Journey

TBD

Track 1 - Introduction to Data Science¶

Pt-br

Track 1 - Introduction to Data Science

title	video	slides	script	exercises
Data Journey - T1V1 - Introduction to DS - Presentation
Data Journey - T1V2 - Introduction to DS - What is Data Science
Data Journey - T1V3 - Introduction to DS - Initiatives at Braskem
Data Journey - T1V9 - Introduction to DS - Exercise 1 - questions
Data Journey - T1V9 - Introduction to DS - Exercise 1 - answers

En

Track 1 - Introduction to Data Science

TBD

Track 2 - Business Analyst¶

Pt-br

Track 2 - Business Analyst

title	video	slides	script	exercises
Data Journey - T2V1 - Business Analyst - Presentation
Data Journey - T2V2 - Business Analyst - Development tools
Data Journey - T2V3 - Business Analyst - Anaconda
Data Journey - T2V4 - Business Analyst - Exercises 1

En

Track 2 - Business Analyst

TBD

Track 3 - Data Analyst¶

Pt-br

Track 3 - Data Analyst

TBD

En

Track 3 - Data Analyst

TBD

Track 4 - Data Scientist¶

Pt-br

Track 4 - Data Scientist

TBD

En

Track 4 - Data Scientist

TBD