
HDFS, YARN, NameNode/DataNode Architecture
MapReduce Basics & Hive Setup
ETL with Hive, Hive SerDes, Partitioning
Lab: HDFS commands, MapReduce job, Hive ETL queries
Spark vs Hadoop, Lambda Architecture
RDD Operations, Caching, Checkpointing
Spark Internals: DAG, Partitions, Shuffling
Performance Tuning and Cluster Setup
Lab: RDD manipulation, caching, metrics analysis
DataFrame APIs, Catalyst Optimizer, Tungsten Engine
Working with CSV, JSON, XML, Parquet, ORC
Advanced SQL: Joins, Window Functions, Aggregations
Performance Optimizations in Spark SQL
Lab: SQL queries, UDFs, Hive Integration
DStream, Windowing, Fault Tolerance
Kafka Architecture and Cluster Configuration
Kafka-Spark Streaming Integration
Lab: Real-time analysis with Twitter & Kafka streams
Iceberg Architecture, Schema Evolution, Time Travel
Integration with Hive and Hadoop
Performance Optimizations & GDPR Compliance
Lab: Create, Query, Optimize Iceberg Tables using Spark SQL