HRDC Reg. No: 10001547565
Duration: 3 Days (24 Hours)
Course Overview
Apache Iceberg is a high-performance open table format designed for analytic workloads on cloud object stores and distributed data lakes. This hands-on course explores Iceberg’s architecture, table design, time travel, partitioning, schema evolution, and integration with modern big data tools such as Spark, Flink, Trino, and Presto. Real-world labs focus on streaming ingestion, rollback, governance, and security.
Who Should Attend
- Data Engineers
- Data Platform Architects
- Big Data Developers
- Lakehouse Engineers
- DevOps Engineers
Targeted Industries
- Cloud-Native SaaS Platforms
- Financial Services and Banking
- Retail and E-Commerce Analytics
- Telecommunications and IoT
- Healthcare and Pharma
- Government & Public Sector Data Platforms
Why Choose This Course
HRDC Claimable – [TBD]
Master Apache Iceberg, the open table format powering modern data lakes and lakehouses, through hands-on, real-world training, ideal for secure, efficient, and scalable analytics on cloud-native infrastructure.
Learning Outcomes
Participants will be able to:
- Understand Iceberg’s architecture and benefits over Hive, Hudi, and Delta Lake
- Perform schema/partition evolution, rollback, and metadata pruning
- Ingest batch and streaming data with Spark and Flink
- Optimize Iceberg performance via compaction and predicate pushdown
- Secure data with Apache Ranger and encryption
- Deploy Iceberg in multi-engine environments (Spark, Trino, Flink)
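Several of these outcomes can be previewed in a few lines of Spark SQL. The sketch below is illustrative rather than part of the course labs; it assumes an Iceberg catalog registered as `demo`, and the snapshot ID shown is a placeholder:

```sql
-- Create a partitioned Iceberg table (hidden partitioning via the days() transform)
CREATE TABLE demo.db.events (
    id BIGINT,
    category STRING,
    ts TIMESTAMP
) USING iceberg
PARTITIONED BY (days(ts));

-- Schema evolution: add a column as a metadata-only change, without rewriting data
ALTER TABLE demo.db.events ADD COLUMN source STRING;

-- Time travel: query the table as of an earlier snapshot (placeholder snapshot ID)
SELECT * FROM demo.db.events VERSION AS OF 1234567890123456789;

-- Rollback: restore the table to a previous snapshot via a stored procedure
CALL demo.system.rollback_to_snapshot('db.events', 1234567890123456789);
```

Because every write produces a new immutable snapshot, time travel and rollback are metadata operations against the table's snapshot history rather than data copies.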
Prerequisites
- Basic understanding of OLAP/OLTP and SQL (recommended)
- Familiarity with Hadoop, Linux, and Python
- Awareness of ETL processes and the Java ecosystem
Lab Setup
Tools & Stack:
- Apache Iceberg (latest), Spark 3.x or Flink 1.14+
- Trino or Presto, MinIO/S3 emulation
- Kafka (for streaming), Docker or cloud (optional)
- Jupyter, Zeppelin, VS Code, Apache Ranger
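As one possible starting point for the lab environment, the snippet below launches the Spark SQL shell with the Iceberg runtime and a local Hadoop-type catalog. The runtime version, catalog name (`demo`), and warehouse path are placeholders to adjust for your own setup:

```shell
spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.demo.type=hadoop \
  --conf spark.sql.catalog.demo.warehouse=/tmp/iceberg-warehouse
```

A Hadoop-type catalog backed by a local path keeps the labs self-contained; pointing the warehouse at a MinIO/S3 bucket instead exercises the cloud object store scenarios covered in the course.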
Teaching Methodology
- Instructor-led walkthroughs with diagrams
- Hands-on lab sessions using real-world datasets
- Daily knowledge checks and a capstone project