Data Journey¶
Mission Statement¶
The Data Journey is a cycle of courses on Data Science offered by the Braskem Digital Factory, a team of scientists in computing, statistics and applied mathematics.
The objective of the journey is clear: change your professional life.
Here you will find, in four different courses, the content to develop practical and theoretical skills in the field of Data Science. You will learn about data collection, transport, protection, storage, processing, analysis, modeling and sharing, so that by the end of the journey you can offer solutions to problems in your own area of expertise.
Syllabus¶
Introduction to Data Science
Who should take it? Everyone! If you wish to become a Data Scientist, or if you just want a more comprehensive overview of this field, this should be your starting point.
Effort Required: Minimal; plan to attend a 2-hour workshop.
By the end of this track you will be able to:
- Understand what Data Science is and how you can derive value from it in your business;
- Have a basic intuition of how a Data Science process is conducted;
- Get familiar with Braskem's initiatives and other market benchmarks.
Content
- What is Data Science
- Braskem's Initiatives & Market Outlook
- Effort
- Workshop
Business Analyst
Who should take it? Analysts, Coordinators or Managers in business areas who wish to become familiar with solution development. Professionals seeking tools to enhance productivity or to help them develop business solutions with data.
Effort Required: Duration: TBD
By the end of this track you will be able to:
- Have practical knowledge of the tools used to develop business solutions;
- Ingest and process data in Python;
- Know the concepts behind different databases and cloud solutions, such as data lakes and data pipelines;
- Understand the basics of the process of developing and deploying a digital product.
Content
- Developing Tools
  - Anaconda Navigator
  - Jupyter Notebook
  - Spyder / Sublime / Visual Studio
- Introduction to Python
  - Objects & Structure
  - Logic Operators & Loops
  - Classes, Methods & Functions
  - Libraries (Pandas, Numpy & Matplotlib)
- Data Wrangling
  - Importing Data
  - Data Structure
  - Tidying Data
  - Combining Data
  - Cleaning Data
- Software & Data Engineering | Azure
  - Microsoft Azure Introduction
  - SQL & NoSQL Databases
  - Data Pipeline
  - Standard Tables Architecture
  - DevOps & Continuous Deployment
- Big Data with PySpark & Databricks (optional)
Data Analyst
Who should take it? Professionals seeking a technical approach to data-driven decisions. Business units undergoing digital transformation that require more data-driven reports and analyses.
Effort Required: Duration: TBD
By the end of this track you will be able to:
- Derive insights from data;
- Perform basic statistical analysis with available data;
- Build graphs and reports in Python;
- Transform and choose the best features to enhance data analysis;
- Provide your team with more data-driven insights.
Content
- Exploratory Data Analysis I
  - Types of Variables
  - Variable Summary | Statistical Moments
  - Correlation Matrix & ANOVA
- Feature Engineering
  - Feature Selection
  - Feature Creation & Transformation
  - Data Normalization & Balance
  - Data Interpolation
- Data Visualization
  - Matplotlib & Seaborn
  - Graph & Figure Types
Data Scientist
Who should take it? Any professional aspiring to specialize in Data Science through the development of Machine Learning algorithms and scientific data analysis.
Effort Required: Duration: TBD
By the end of this track you will be able to:
- Conduct scientific processes for data analysis;
- Develop Machine Learning algorithms;
- Deploy and deliver Data Science based digital products;
- Provide the business with significant data-driven insights and strategies;
- Fully understand the intuition, math and code behind the main Machine Learning algorithms;
- Conduct explanatory data analysis to extract causation and enhance business decisions;
- Develop, deploy and maintain any Data Science project;
- Understand which approaches and algorithms are best for each business problem.
Content
- Linear Algebra
  - Matrices & Vectors with Numpy
  - Alternate Coordinate Systems
- Probability Theory
  - Probability Models & Axioms
  - Conditioning & Independence
  - Counting
  - Discrete & Continuous Variables Distributions
  - Bayesian Inference
  - The Monte Carlo Simulation with Scipy
- Statistical Inference
  - Sample Means & Central Limit Theorem
  - Confidence Intervals & Hypothesis Testing
  - Causality Analysis
  - Linear Modeling & Experimental Design
- Exploratory Data Analysis II
  - Hypothesis Creation & Testing
  - Endogenous Variables Transformation
  - Time Series Analysis
- Calculus (optional)
  - Limits & Continuity
  - Differentials & Derivatives
  - Univariate Integration
  - Multivariate Calculus
- Learning Techniques
  - Supervised & Unsupervised Learning
  - Regressions, Classifications & Clusterings
  - Train & Test Sets | Dimensionality
- Model Scoring
  - Regression Metrics
  - Classification Metrics
  - Clustering Metrics
  - Cross Validation
- Back Propagation & Hyper Parameters
  - Parameters vs Hyper Parameters
  - Grid Search & Hyper Parameters Settings
  - Back Propagation
- Main Algorithms Intuition
  - Linear Regression (Regression)
  - SVM (Regression & Classification)
  - Random Forest (Regression & Classification)
  - Logistic Regression (Classification)
  - K-Means (Clustering)
- Special Models
  - Boosting
  - Neural Networks (Deep Learning)
  - Reinforcement Learning
  - Natural Language Processing (NLP)
  - Computer Vision
Disclaimer: The materials produced for the Data Journey do not contain Braskem sensitive data.
Content¶
Track 0 - Introduction to Data Journey¶
Track 0 - Introduction to Data Journey
TBD
Track 1 - Introduction to Data Science¶
Track 1 - Introduction to Data Science (Portuguese version: Introdução à Ciência de Dados)
Track 1 - Introduction to Data Science
TBD
Track 2 - Business Analyst¶
Track 2 - Business Analyst (Portuguese version: Analista de Negócios)
Track 2 - Business Analyst
TBD
Track 3 - Data Analyst¶
Track 3 - Data Analyst (Portuguese version: Analista de Dados)
TBD
Track 3 - Data Analyst
TBD
Track 4 - Data Scientist¶
Track 4 - Data Scientist (Portuguese version: Cientista de Dados)
TBD
Track 4 - Data Scientist
TBD