
Overview of Hadoop and Big Data use cases
HDFS architecture and file operations
Sqoop for RDBMS-Hadoop data transfer
Labs: HDFS commands, Sqoop import/export jobs
MapReduce algorithms and Java-based job execution
Job orchestration with Oozie
YARN architecture and performance tuning
Cluster sizing and planning
Labs: MapReduce via Eclipse, Oozie workflows, YARN tuning
Data transformation and analytics using Pig
Hive SQL: partitions, SerDe, table formats
Labs: Yahoo Finance data processing with Hive/Pig
Choosing optimal file formats and compression codecs
HBase architecture and REST API
Hive-HBase integration and bulk loading
Labs: Data format transformations, HBase table operations
Kafka architecture, producers and consumers
Multi-node Kafka setup and integration
Spark architecture: RDDs, DAG, Streaming
Labs: Kafka ingestion, Spark processing of SFPD crime data