Course: Machine Learning using Apache Spark | Timmins

Course Content

Day 1 – ML Foundations & Spark Introduction

ML vs Statistics vs Data Science

Data Preprocessing: Encoding, Missing Values, Outliers

Python and R for ML: NumPy, Pandas, ggplot2

Spark Basics: RDDs, DataFrames, SparkR, MLlib

Lab: S&P 500 stock data analysis

Day 2 – Supervised Learning: Regression
Linear, Multiple Linear Regression

Ridge, Lasso, ElasticNet, Cross-Validation

Gradient Boosting for Regression

Lab: Power demand prediction, Housing price regression

Day 3 – Supervised Learning: Classification
Decision Trees, Random Forests

Logistic Regression, Support Vector Machines

Evaluation: Confusion Matrix, ROC-AUC

Lab: Customer segmentation, Credit risk analysis, UCI wine dataset

Day 4 – Unsupervised Learning & NLP
Clustering: K-Means, Hierarchical

Feature Engineering and PCA

Text Analytics: TF-IDF, Part-of-Speech (POS) Tagging, Lemmatization, Sentiment Analysis

Lab: Movie genre clustering, IMDB comment classification

Day 5 – Scalable ML & Deployment
Spark MLlib & ML Pipelines

Saving and Serving Models

Optional: PredictionIO for streaming models

Lab: Stack Overflow dataset processing and community detection

Offices

Kuala Lumpur

Taman Zeta@Zetapark, C-11-01, Komplek Danau Kota,
67, Jln Taman Ibu Kota, Setapak,
53300 Kuala Lumpur

Penang

Timmins Training Center
1-3-6 Jalan Mayang Pasir 3, Elit Avenue,
Bayan Lepas, 11950 Pulau Pinang
