HRDC Reg. No: 10001548585
Course Duration: 35 Hours (5 Days)
Course Overview
This hands-on course focuses on using Apache Spark with Java to develop large-scale distributed data applications. Participants will learn how to process batch and streaming data using RDDs, DataFrames, Spark SQL, and Structured Streaming. The course also explores integration with Hadoop, Hive, Kafka, and Delta Lake to build robust real-time and versioned data pipelines.
Who Should Attend
- Java Developers transitioning into Big Data roles
- Data Engineers and Architects
- ETL Developers working with the Hadoop/Spark stack
- Backend Developers integrating Spark into existing systems
- Engineers developing real-time analytics pipelines
Why Choose This Course
HRDC Claimable. This course delivers practical, Java-centric expertise in Spark-based Big Data applications. It includes real-world lab exercises using IntelliJ IDEA, Apache Kafka, Delta Lake, and Hive, preparing participants for data-intensive enterprise environments.
Learning Outcomes
Participants will be able to:
- Understand Spark’s distributed architecture and execution model
- Develop Spark applications in Java using IntelliJ IDEA
- Leverage RDDs, DataFrames, and Spark SQL
- Integrate Spark with Hive, Kafka, Delta Lake, and Hadoop
- Optimize Spark jobs through performance tuning
- Build real-time applications using Structured Streaming and Kafka
- Implement ACID-compliant data lakes using Delta Lake
Prerequisites
- Strong Java programming knowledge
- Familiarity with the Linux OS
- Understanding of databases and data pipelines
- Basic exposure to Big Data and messaging systems is helpful
Lab Setup Requirements
Hardware:
Software & Tools:
- Java JDK 11+
- IntelliJ IDEA
- Apache Spark 3.x, Hadoop, Hive
- Apache Kafka, MySQL/PostgreSQL
- NoSQL: HBase or Cassandra (optional)
- Preconfigured VM or Docker image (provided)
Teaching Methodology
- Instructor-led theory and live demonstrations
- IntelliJ-based Java development
- Hands-on labs with real-world data
- Optional integration with BI tools and databases