Section outline

    • Overview of Hadoop and Big Data use cases

    • HDFS Architecture and File Operations

    • Sqoop for RDBMS-Hadoop data transfer

    • Labs: HDFS commands, Sqoop import/export jobs

    • MapReduce algorithms and Java-based job execution

    • Job orchestration with Oozie

    • YARN Architecture and Performance Tuning

    • Cluster sizing and planning

    • Labs: MapReduce via Eclipse, Oozie workflows, YARN tuning

    • Data Transformation and Analytics using Pig

    • Hive SQL: Partitions, SerDe, Table formats

    • Labs: Yahoo Finance data processing with Hive/Pig

    • Choosing optimal file formats and compression codecs

    • HBase architecture and REST API

    • Hive-HBase integration and bulk loading

    • Labs: Data format transformations, HBase table operations

    • Kafka Architecture, Producers/Consumers

    • Multi-node Kafka setup and integration

    • Spark Architecture: RDDs, DAG, Streaming

    • Labs: Kafka ingestion, Spark processing of SFPD crime data