
Cloud Computing
HRDC Reg. No: 10001547565
Duration: 3 Days (24 Hours)
Apache Iceberg is a high-performance open table format designed for analytic workloads on cloud object stores and distributed data lakes. This hands-on course explores Iceberg’s architecture, table design, time travel, partitioning, schema evolution, and integration with modern big data tools such as Spark, Flink, Trino, and Presto. Real-world labs focus on streaming ingestion, rollback, governance, and security.
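To give a feel for the lab work, here is a minimal PySpark sketch that creates a partitioned Iceberg table and queries an earlier snapshot (time travel). The catalog name, warehouse path, and table are illustrative assumptions, not the exact lab configuration.

```python
# Minimal PySpark + Iceberg sketch; catalog name, warehouse path, and table are illustrative.
# Requires the iceberg-spark-runtime package on the Spark classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# Create a partitioned Iceberg table and insert a couple of rows.
spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, category STRING) "
          "USING iceberg PARTITIONED BY (category)")
spark.sql("INSERT INTO demo.db.events VALUES (1, 'clicks'), (2, 'orders')")

# Inspect the table's snapshots, then time-travel to the first one
# (VERSION AS OF needs Spark 3.3+ with a recent Iceberg release).
snapshots = spark.sql(
    "SELECT snapshot_id FROM demo.db.events.snapshots ORDER BY committed_at").collect()
first_snapshot = snapshots[0].snapshot_id
spark.sql(f"SELECT * FROM demo.db.events VERSION AS OF {first_snapshot}").show()

spark.stop()
```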
Data Engineers
Data Platform Architects
Big Data Developers
Lakehouse Engineers
DevOps Engineers
Cloud-Native SaaS Platforms
Financial Services and Banking
Retail and E-Commerce Analytics
Telecommunications and IoT
Healthcare and Pharma
Government & Public Sector Data Platforms
HRDC Claimable – [TBD]
Master the open table format powering modern data lakes and lakehouses, with real-world training in Apache Iceberg—ideal for secure, efficient, and scalable analytics on cloud-native infrastructure.
Participants will be able to:
Understand Iceberg’s architecture and benefits over Hive, Hudi, and Delta Lake
Perform schema/partition evolution, rollback, and metadata pruning
Ingest batch and streaming data with Spark and Flink
Optimize Iceberg performance via compaction and predicate pushdown
Secure data with Apache Ranger and encryption
Deploy Iceberg in multi-engine environments (Spark, Trino, Flink)
Basic understanding of OLAP/OLTP and SQL (recommended)
Familiarity with Hadoop, Linux, and Python
Awareness of ETL and Java stack concepts
Tools & Stack:
Apache Iceberg (latest), Spark 3.x or Flink 1.14+
Trino or Presto, MinIO for S3-compatible object storage
Kafka (for streaming), Docker or a cloud environment (optional)
Jupyter, Zeppelin, VS Code, Apache Ranger
Instructor-led walkthroughs with diagrams
Hands-on lab sessions using real-world datasets
Daily knowledge checks and a capstone project
Cloud Computing
HRDC Reg. No: 10001548574
Duration: 4 Days (28 Hours)
Apache Flink is a high-throughput, fault-tolerant, real-time stream processing framework. This course equips Data Engineers and Developers to master Flink’s DataStream and Table APIs, stateful stream processing, windowing, checkpointing, and integration with systems like Kafka, JDBC, S3, and Elasticsearch. Participants will gain hands-on experience through code labs and a capstone project using real-world data processing scenarios.
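As a minimal illustration of the DataStream API covered here, the PyFlink sketch below keys a small in-memory stream and maintains a running count per key; the labs build on this same pattern with Kafka sources, event-time windows, and watermarks. The sample data and job name are assumptions.

```python
# Keyed, stateful rolling count with the PyFlink DataStream API.
# The in-memory source and job name are illustrative; labs use Kafka sources,
# event-time windowing, and watermarks on top of this pattern.
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(1)

# Bounded in-memory source for the demo.
events = env.from_collection(
    [("user_a", 1), ("user_b", 1), ("user_a", 1)],
    type_info=Types.TUPLE([Types.STRING(), Types.INT()]),
)

# Key by user and keep a running sum per key (Flink manages the state).
running_counts = (
    events
    .key_by(lambda event: event[0], key_type=Types.STRING())
    .reduce(lambda a, b: (a[0], a[1] + b[1]))
)

running_counts.print()
env.execute("rolling-count-demo")
```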
Data Engineers
Backend Developers
Streaming Engineers
Big Data Architects
DevOps and Platform Engineers
FinTech and Banking (fraud detection, payments)
Telecommunications and IoT (real-time telemetry)
E-Commerce and Retail (clickstream analytics)
Logistics and Manufacturing (sensor and vehicle tracking)
Media and Advertising (audience measurement, engagement)
HRDC Claimable – [TBD]
Get up to speed with production-grade stream processing using Apache Flink—ideal for building scalable, low-latency pipelines with integrations into real-time and batch ecosystems.
Participants will be able to:
Understand Flink’s architecture and distributed processing model
Develop streaming and batch applications using Flink APIs
Apply windowing, event-time processing, and state management
Integrate Flink with Kafka, S3, JDBC, Elasticsearch, and others
Secure Flink deployments with SSL, Kerberos, and RBAC
Optimize and deploy real-world streaming pipelines
Basic database and SQL understanding (recommended)
Knowledge of Python and Linux shell
Familiarity with ETL workflows and optionally Hadoop
JVM/Java knowledge is helpful but not mandatory
Minimum Requirements:
RAM: 8 GB (16 GB recommended)
CPU: Quad-core
OS: Ubuntu/CentOS preferred; Windows with WSL
Tools: Flink, Kafka, Python 3.8+, IntelliJ/VS Code, Docker (optional), sample datasets
Instructor-led architecture and code walkthroughs
Hands-on labs and scenario-based exercises
Daily practical assignments
Final capstone project and quiz
Cloud Computing
HRDC Reg. No: 10001547674
Duration: 4 Days (28 Hours)
This hands-on course provides a comprehensive guide to designing and implementing Big Data and Machine Learning solutions on AWS. It covers key AWS services such as EMR, Redshift, Glue, Kinesis, Athena, DynamoDB, and Airflow. Participants will learn to build scalable, secure, and cost-effective data pipelines and architectures using cloud-native services for ingestion, transformation, analysis, and orchestration.
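For a sense of the hands-on work, here is a hedged boto3 sketch that runs an Athena query against a Glue-catalogued table and prints the results; the region, database, table, and S3 output location are illustrative assumptions and valid AWS credentials are required.

```python
# Run an Athena query with boto3 and print the result rows.
# Region, database, table, and S3 output bucket are illustrative assumptions.
import time

import boto3

athena = boto3.client("athena", region_name="ap-southeast-1")

response = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status",
    QueryExecutionContext={"Database": "demo_db"},
    ResultConfiguration={"OutputLocation": "s3://demo-athena-results/"},
)
execution_id = response["QueryExecutionId"]

# Poll until the query finishes.
state = "QUEUED"
while state in ("QUEUED", "RUNNING"):
    time.sleep(1)
    state = athena.get_query_execution(QueryExecutionId=execution_id)[
        "QueryExecution"]["Status"]["State"]

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```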
Data Engineers
Cloud Architects
Data Analysts and Scientists
DevOps Engineers
ETL Developers
Finance and Insurance
Retail and E-Commerce
Healthcare and Life Sciences
Telecom and Media
Government and Defense
Logistics and Manufacturing
HRDC Claimable – [TBD]
Designed for teams moving their data workloads to AWS, this course equips participants with both foundational knowledge and advanced practical skills to harness the full potential of AWS Big Data services.
By the end of this course, participants will be able to:
Understand AWS cloud architecture and Big Data ecosystem
Ingest and analyze structured/unstructured data using AWS tools
Build data pipelines with Kinesis, Glue, Athena, and EMR
Optimize storage with S3 and databases with Redshift, RDS, and DynamoDB
Automate workflows using Airflow and Lambda
Secure and monitor AWS Big Data environments
Familiarity with Hadoop, Spark, Hive, and HDFS
Programming in Python
Knowledge of SQL/NoSQL and database design
Access & Infrastructure:
Free-tier AWS account recommended
Outbound connectivity to AWS over HTTP/HTTPS and SSH
Public IP whitelisting and AWS infrastructure readiness guidance provided
Lab Activities:
Guided exercises with S3, Athena, Glue, EMR, Redshift, and Airflow
Real-world data sets and project simulations
Instructor-led sessions
Hands-on labs and use-case-driven exercises
Daily knowledge checks and real-time demos
Cloud Computing
HRDC Reg. No: 10001547563
Duration: 5 days (35 hours)
This intensive hands-on course provides a comprehensive understanding of Apache Kafka and the Confluent Platform. Participants will gain expertise in building scalable, real-time streaming architectures, managing Kafka clusters, creating custom producers/consumers, integrating Kafka with Spark, and leveraging Confluent tools like Kafka Connect and ksqlDB for streamlined data pipeline development.
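The course labs implement producers and consumers in Java; purely as a compact illustration of the produce/consume round-trip, the sketch below uses the confluent-kafka Python client. The broker address, topic, and consumer group are assumptions.

```python
# Minimal produce/consume round-trip with the confluent-kafka Python client.
# Broker, topic, and group id are assumptions; the course labs build the
# equivalent producer and consumer in Java.
from confluent_kafka import Consumer, Producer

BOOTSTRAP = "localhost:9092"
TOPIC = "payments"

producer = Producer({"bootstrap.servers": BOOTSTRAP})
producer.produce(TOPIC, key="txn-1", value=b'{"amount": 42.50}')
producer.flush()  # block until the message is delivered

consumer = Consumer({
    "bootstrap.servers": BOOTSTRAP,
    "group.id": "demo-group",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])

msg = consumer.poll(timeout=10.0)  # wait up to 10 s for one record
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```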
Data Engineers
Backend Developers
DevOps Engineers
Solution Architects
System Integrators
Financial Services & Banking
Telecommunications
E-Commerce & Retail
Media & Entertainment
Government & Smart Infrastructure
Logistics & Manufacturing
HRDC Claimable – [TBD]
A practical guide to building enterprise-grade, real-time data platforms with Kafka and Confluent tools, optimized for mission-critical event-driven systems and data pipelines.
By the end of this course, participants will be able to:
Deploy and manage multi-node Kafka clusters
Create custom Kafka producers and consumers using Java
Build and manage streaming pipelines using Spark and Kafka Streams
Secure Kafka with SSL, SASL, and ACLs
Use Kafka Connect for integration with external systems
Query and process real-time streams using ksqlDB
Monitor and tune Kafka for performance and reliability
Knowledge of distributed computing
Basic understanding of Hadoop and Spark
Programming experience in Java or Python
Familiarity with Linux and command-line tools
Awareness of enterprise architecture concepts
Each participant will receive a dedicated environment with:
3-node Kafka cluster (includes ZooKeeper, Kafka, Spark, and connectors)
Hardware Requirements:
Processor: Intel i5 (8 cores)
RAM: 32 GB
Storage: 200 GB SSD (2,000 IOPS, 100 Mbps)
OS: Ubuntu 22.04
Software: IntelliJ, PyCharm, Docker, Java 8/11, Maven, Python 3.8+, Chrome
Access: Internet (GitHub, Google Drive), SSH, sudo access
Note: AWS setup, IP whitelisting, and proxy configuration as needed
Instructor-led architecture deep dives
Hands-on coding and labs with real-time data
Project simulations and daily quizzes
Scenario-based exercises using Twitter or finance data streams
Cloud Computing
HRDC Reg. No: 10001548281
Duration: 3 Days (24 hours)
This hands-on course is designed for Data Engineers and Big Data Architects aiming to adopt Apache Ozone—a high-performance, scalable object store that is a next-generation replacement for HDFS. Participants will explore architecture, cluster setup, security, integrations with big data tools (Hadoop, Hive, Spark, Kafka, Flink), and deployment of real-world data pipelines. Labs and a capstone project reinforce each learning objective.
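As a small example of the S3 compatibility covered in the course, the sketch below talks to an Ozone cluster through its S3 Gateway using boto3; the endpoint, credentials, and bucket name are assumptions about the lab environment (the gateway typically listens on port 9878).

```python
# Accessing Ozone through its S3-compatible gateway with boto3.
# Endpoint, credentials, and bucket name are assumptions about the lab cluster.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://ozone-s3g.example.com:9878",
    aws_access_key_id="testuser",
    aws_secret_access_key="testsecret",
)

# Create a bucket and write a small object through the S3 API.
s3.create_bucket(Bucket="clickstream")
s3.put_object(Bucket="clickstream", Key="2024/01/events.json",
              Body=b'{"page": "/home", "user": "u1"}')

# List what was written.
for obj in s3.list_objects_v2(Bucket="clickstream").get("Contents", []):
    print(obj["Key"], obj["Size"])
```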
Data Engineers
Hadoop Administrators
Platform Architects
Big Data Engineers
Cloud and Storage Architects
Banking and Financial Services
Telecom and Media
Healthcare and Pharma
Retail and E-Commerce
Government and Defense
Industrial IoT and Manufacturing
HRDC Claimable – [TBD]
Future-proof your data architecture with hands-on training in Apache Ozone, a cloud-native object store compatible with HDFS and S3, and optimized for next-gen data lakes and hybrid big data platforms.
By the end of this course, participants will be able to:
Understand the evolution and advantages of Apache Ozone over HDFS and S3-compatible object stores
Deploy and manage multi-node Ozone clusters
Perform volume and object-level operations using CLI and APIs
Tune performance and enable high availability configurations
Secure Ozone using Kerberos, Ranger, and TLS
Integrate Ozone with Spark, Hive, Flink, Kafka, and Presto
Implement enterprise-grade data lake pipelines using Ozone
Basic knowledge of Big Data and distributed storage systems
Familiarity with HDFS, YARN, and Linux file operations
Understanding of Hive, SQL, and object stores like AWS S3, MinIO
Basic CLI skills and networking concepts (SSH, ports)
Experience with Spark or Flink (recommended)
Pre-configured Lab Environment (provided):
3-node Ozone + Hadoop cluster
Hive, Spark, Flink, Kafka pre-installed
Configured with Kerberos, Ranger, Prometheus, Grafana
Sample datasets for log, financial, and clickstream analysis
Manual Deployment Requirements:
RAM: 16 GB per node
vCPUs: 4
OS: Linux (Ubuntu/CentOS)
Java: 8 or 11
Access: SSH, sudo, open ports for Web UI/S3
Instructor-led conceptual and practical sessions
Scenario-based labs and integrations
Real-world use cases
End-of-course capstone project and certification quiz
Cloud Computing
HRDC Reg. No: 10001547630
Duration: 5 Days (35 Hours)
Apache Cassandra is a fault-tolerant, distributed NoSQL database designed for large-scale data management. This training focuses on both open-source Cassandra and the DataStax Enterprise (DSE) version, equipping participants with the knowledge to deploy, manage, and integrate Cassandra with enterprise tools like Spark, Kafka, and Java SDKs. Topics include architecture, replication, security, monitoring, advanced querying, and data modeling.
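To illustrate the data modeling and CQL topics, here is a minimal round-trip using the DataStax Python driver; the contact point, keyspace, table, and replication settings are illustrative, and the course labs also cover the equivalent workflow with the Java SDK.

```python
# A minimal CQL round-trip with the DataStax Python driver (cassandra-driver).
# Contact point, keyspace, and replication settings are illustrative.
from decimal import Decimal

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS shop
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("shop")

# Partition by customer, cluster orders newest-first within each partition.
session.execute("""
    CREATE TABLE IF NOT EXISTS orders (
        customer_id text,
        order_id    timeuuid,
        total       decimal,
        PRIMARY KEY (customer_id, order_id)
    ) WITH CLUSTERING ORDER BY (order_id DESC)
""")

# Prepared statements are parsed once and reused for efficient writes.
insert = session.prepare(
    "INSERT INTO orders (customer_id, order_id, total) VALUES (?, now(), ?)")
session.execute(insert, ("cust-001", Decimal("129.90")))

for row in session.execute(
        "SELECT order_id, total FROM orders WHERE customer_id = %s", ("cust-001",)):
    print(row.order_id, row.total)

cluster.shutdown()
```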
Java Developers
Database Administrators
Data Architects
Big Data Engineers
DevOps and System Engineers
Telecommunications
Banking and Financial Services
E-commerce & Retail
Healthcare & Life Sciences
Public Sector & Defense
Media and Streaming Services
HRDC Claimable – [TBD]
Ideal for organizations looking to adopt scalable NoSQL systems, this course blends architecture mastery with real-world integration skills using Java, Spark, and DevOps tools.
By the end of this course, participants will be able to:
Design and administer Cassandra clusters
Perform replication, data modeling, and backup/restore
Write efficient queries using CQL
Secure and monitor Cassandra deployments
Use OpsCenter, nodetool, and Prometheus for administration
Integrate Cassandra with Java SDK, Apache Spark, and Kafka
Familiarity with Linux and shell commands
Basic Java programming knowledge
Awareness of Big Data tools and SQL concepts
Minimum System Requirements:
Processor: Intel i5 (8 cores, 2.5GHz+)
RAM: 32 GB
Storage: 200 GB SSD (2,000 IOPS, 100 Mbps bandwidth)
Internet: Access to GitHub, Google Drive
OS: Ubuntu 22.04
Software: IntelliJ, PyCharm, VirtualBox, Docker & Compose, Java 8/11, Maven 3.6+, Python 3.8+, Chrome, Git Bash, PuTTY (Windows)
Administrative Access: Required
AWS Labs (optional): SSH access, Elastic IP whitelisting, proxy setup (if applicable)
Instructor-led demos and lectures
Hands-on labs with real-world datasets
Integration projects with Java and Spark
Performance monitoring and debugging sessions
Cloud Computing
This practical course introduces Big Data concepts and the Hadoop ecosystem. Participants will learn to process, store, transform, and analyze large-scale data using tools such as HDFS, Hive, Pig, Sqoop, Oozie, Kafka, HBase, and Spark. Delivered via Cloudera's environment, the course combines lectures and hands-on labs with real-world datasets.
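As a taste of the lab exercises, the PySpark sketch below runs a classic word count over a file stored in HDFS inside the Cloudera environment; the HDFS path is an assumption about the lab dataset.

```python
# Word count over a file in HDFS, run with spark-submit inside the Cloudera VM.
# The HDFS path is an assumption about the lab dataset.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-wordcount").getOrCreate()

lines = spark.sparkContext.textFile("hdfs:///user/cloudera/input/crime_data.csv")

counts = (
    lines.flatMap(lambda line: line.split(","))         # split each record into tokens
         .map(lambda token: (token.strip().lower(), 1)) # normalise and pair with a count
         .reduceByKey(lambda a, b: a + b)               # sum counts per token
)

# Print the ten most frequent tokens.
for token, count in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(token, count)

spark.stop()
```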
Data Architects
Enterprise Architects
Developers and Engineers
System Administrators
Data Analysts
Technical Architects
HRDC Claimable (subject to HRDC registration). Participants gain end-to-end understanding of how to architect and build Big Data pipelines using Hadoop tools and frameworks. This course blends foundational theory with hands-on practice for enterprise readiness.
Upon completion, participants will:
Understand Big Data frameworks and ecosystem tools
Use HDFS, MapReduce, and YARN
Perform ETL with Pig and Hive
Build workflows using Oozie
Utilize HBase for NoSQL storage
Use Kafka for real-time data ingestion
Analyze and process data using Apache Spark
Basic Linux command-line skills
Understanding of databases and SQL
Familiarity with Java is beneficial but not required
Hardware:
CPU: Intel i5 or higher
RAM: Minimum 8 GB (16 GB recommended)
Disk: 20+ GB free
Software:
Cloudera QuickStart VM
Java JDK 8+, IntelliJ/Eclipse
MySQL for Sqoop
Kafka, Spark pre-installed
Real datasets: Yahoo Finance, SFPD Crime Data
Instructor-led architecture walkthroughs
Hands-on labs with real data
Use of Cloudera VM sandbox
Daily guided exercises and demonstrations
Cloud Computing
This hands-on course focuses on using Apache Spark with Java to develop large-scale distributed data applications. Participants will learn how to process batch and streaming data using RDDs, DataFrames, Spark SQL, and Structured Streaming. The course also explores integration with Hadoop, Hive, Kafka, and Delta Lake to build robust real-time and versioned data pipelines.
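The course develops its examples in Java; purely as a compact sketch of the Structured Streaming pattern it covers, the PySpark snippet below reads JSON orders from Kafka and aggregates them to the console. The broker, topic, schema, and checkpoint path are assumptions, and the Spark Kafka connector package must be on the classpath.

```python
# Structured Streaming from Kafka, sketched in PySpark for brevity; the course
# labs implement the same pipeline in Java. Broker, topic, schema, and
# checkpoint path are illustrative. Submit with the spark-sql-kafka package
# matching your Spark version.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "orders")
    .load()
)

# Parse the JSON payload and keep a running total per order.
orders = raw.select(
    F.from_json(F.col("value").cast("string"), schema).alias("o")).select("o.*")
totals = orders.groupBy("order_id").agg(F.sum("amount").alias("total"))

query = (
    totals.writeStream.outputMode("complete")
    .format("console")
    .option("checkpointLocation", "/tmp/chk/orders")
    .start()
)
query.awaitTermination()
```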
Java Developers transitioning into Big Data roles
Data Engineers and Architects
ETL Developers working with Hadoop/Spark stack
Backend Developers integrating Spark into systems
Engineers developing real-time analytics pipelines
HRDC Claimable. This course delivers practical, Java-centric expertise in Spark-based Big Data applications. It includes real-world lab exercises using IntelliJ IDE, Apache Kafka, Delta Lake, and Hive, preparing participants for data-intensive enterprise environments.
Participants will be able to:
Understand Spark’s distributed architecture and execution model
Develop Spark applications in Java using IntelliJ
Leverage RDDs, DataFrames, and Spark SQL
Integrate Spark with Hive, Kafka, Delta Lake, and Hadoop
Optimize Spark jobs through performance tuning
Build real-time applications using Structured Streaming and Kafka
Implement ACID-compliant data lakes using Delta Lake
Strong Java programming knowledge
Familiarity with Linux OS
Understanding of databases and data pipelines
Basic exposure to Big Data and messaging systems helpful
Hardware:
Intel i5 CPU or higher
8 GB RAM minimum (16 GB recommended)
25 GB free disk space
Software & Tools:
Java JDK 11+
IntelliJ IDEA
Apache Spark 3.x, Hadoop, Hive
Apache Kafka, MySQL/PostgreSQL
NoSQL: HBase or Cassandra (optional)
Preconfigured VM or Docker image (provided)
Instructor-led theory and live demonstrations
IntelliJ-based Java development
Hands-on labs with real-world data
Optional integration with BI tools and databases
Cloud Computing
This intensive hands-on course covers Apache Spark with Python (PySpark), designed to equip participants with the skills to build and scale data processing applications for Big Data. Through real-world labs, learners will explore Spark Core, SQL, Streaming, and advanced topics like Apache Iceberg for scalable, fault-tolerant data lakes.
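As a small preview of the ETL-style labs, the PySpark sketch below reads a CSV file, cleans it, aggregates it with Spark SQL, and writes the result as Parquet; the file path and column names are illustrative assumptions.

```python
# A small batch ETL in PySpark: read CSV, clean, aggregate with Spark SQL,
# write Parquet. File path and column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-etl").getOrCreate()

sales = (
    spark.read.option("header", True).option("inferSchema", True)
    .csv("/data/sales.csv")
    .dropna(subset=["region", "amount"])            # drop incomplete records
    .withColumn("amount", F.col("amount").cast("double"))
)

sales.createOrReplaceTempView("sales")

summary = spark.sql("""
    SELECT region,
           COUNT(*)              AS orders,
           ROUND(SUM(amount), 2) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC
""")

summary.show()
summary.write.mode("overwrite").parquet("/data/out/sales_by_region")
spark.stop()
```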
Data Engineers
Big Data Developers
ETL Developers
DevOps professionals working in Hadoop/Spark ecosystems
Data Analysts building distributed data solutions
HRDC Claimable (HRDC Registration Number required). Gain in-demand data engineering skills using PySpark through an immersive training experience. The course offers real-world use cases, from ETL pipelines to real-time analytics with Kafka and Iceberg integration.
Participants will learn to:
Understand Spark architecture and integrate with Hadoop
Develop applications using PySpark and Spark SQL
Perform distributed data processing, aggregations, and ETL tasks
Stream data with Spark Streaming and Kafka
Manage modern data lakes using Apache Iceberg
Optimize Spark applications for performance and scalability
Basic Python and Linux skills
Familiarity with databases and ETL workflows
Basic knowledge of Hadoop and SQL recommended but not essential
Hardware Requirements:
CPU: Intel i5 or higher
RAM: 8 GB minimum (16 GB recommended)
Storage: 20 GB free space
Software Environment:
Preconfigured VM with Hadoop, Spark 3.x, Hive, Kafka, Iceberg
Python 3.8+, Jupyter Notebook or VS Code
Additional libraries: PySpark, pandas, numpy, matplotlib, kafka-python
Instructor-led sessions
Guided hands-on lab exercises
Real-world scenarios and architecture walkthroughs
Use of VMs or containers for practical exposure
Cloud Computing
This Microsoft Azure expert-level course provides hands-on training for designing and implementing Azure-based infrastructure solutions. Participants will learn how to design governance, compute, storage, networking, security, and data integration solutions on Azure. The course aligns with industry best practices and prepares professionals for the AZ-305 certification exam.
This course is ideal for:
Cloud Architects – Designing scalable and secure Azure infrastructure.
IT Professionals – Managing enterprise cloud environments.
Solution Architects – Creating cloud-based solutions for businesses.
Database Administrators – Designing scalable and available data solutions.
HRDC Claimable (Check with HRDC for eligibility)
Covers real-world Azure infrastructure solutions
Hands-on training with practical design scenarios
Prepares for Microsoft AZ-305 certification
Knowledge of Azure Active Directory
Understanding of Azure Compute Technologies (VMs, Containers, Serverless)
Familiarity with Azure Virtual Networking and Load Balancers
Understanding of Azure Storage (structured & unstructured data)
Basic knowledge of application design concepts (messaging, high availability)
Cloud Computing
This advanced course provides a comprehensive understanding of Azure SQL Database (formerly SQL Azure), focusing on performance optimization, security, scalability, and automation. Designed for database administrators, developers, and IT professionals, it equips participants with the skills needed to manage and optimize cloud-based databases effectively.
Ideal for:
By the end of this course, participants will be able to:
Cloud Computing
This 5-day course provides a comprehensive understanding of Microsoft Azure architecture, focusing on designing and implementing secure, scalable, and efficient cloud solutions. Participants will learn about Azure infrastructure, networking, security, compute, storage, data management, and application services using best practices and Azure tools.
Ideal for:
Upon completing this course, participants will be able to:
Cloud Computing
This 1-day course provides a high-level overview of Amazon Web Services (AWS) and its key services, focusing on cloud computing benefits for business decision-makers. It covers financial advantages, security and compliance, and cloud migration strategies, helping participants make informed decisions on cloud adoption.
Ideal for:
By the end of the course, participants will be able to:
Cloud Computing
This one-day course offers a comprehensive introduction to AWS Cloud concepts, services, security, architecture, pricing, and support. Designed for business professionals and those new to cloud technologies, it covers key AWS services and foundational principles of cloud computing.
Ideal for:
By the end of this course, participants will be able to:
Cloud Computing
This comprehensive 5-day course covers AWS Cloud Computing concepts, focusing on essential services, architecture best practices, and hands-on experience with AWS tools. Participants will gain the skills to design, deploy, and manage secure, scalable, and resilient cloud-based solutions using Amazon Web Services (AWS).
Ideal for:
By the end of the course, participants will be able to:
Cloud Computing
This 5-day comprehensive course provides an in-depth exploration of Google Cloud Platform (GCP), focusing on core cloud services, data management, security, DevOps practices, and machine learning. Participants will gain hands-on experience in building, deploying, and managing scalable applications using GCP tools and best practices.
Ideal for:
By the end of the course, participants will be able to:
Cloud Computing
This 5-day course explores the principles, design patterns, and best practices essential for building scalable, resilient, and cost-efficient cloud architectures. Participants will gain hands-on experience designing solutions using AWS, Azure, and Google Cloud while addressing real-world architectural challenges.
Ideal for:
By the end of this course, participants will be able to:
Cloud Computing
This 3-day course is designed for developers who want to design, develop, and deploy applications using Google Cloud Platform (GCP) services. Participants will gain hands-on experience with cloud storage, compute engines, Kubernetes, serverless services, and APIs, enabling them to build scalable, secure, and cloud-native applications.
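As a brief illustration of the kind of service integration covered, the sketch below uploads and lists objects in Cloud Storage with the official Python client; the bucket and file names are assumptions, and Application Default Credentials are expected to be configured (for example via gcloud auth).

```python
# Upload and list objects in Cloud Storage with the google-cloud-storage client.
# Bucket and file names are assumptions; Application Default Credentials must
# be configured in the environment.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("demo-app-assets")

# Upload a local file as an object.
blob = bucket.blob("uploads/report.csv")
blob.upload_from_filename("report.csv")

# List what is stored under the prefix.
for item in client.list_blobs("demo-app-assets", prefix="uploads/"):
    print(item.name, item.size)
```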
Ideal for:
By the end of this course, participants will be able to:
Cloud Computing
This 4-day course provides a deep dive into Microsoft Azure, focusing on developing, deploying, and managing cloud applications. Participants will gain expertise in Azure compute, storage, networking, and security services, ensuring they can build scalable, secure, and resilient Azure-based solutions.
Ideal for:
By the end of this course, participants will be able to:
Cloud Computing
This 35-hour comprehensive course is designed to equip cloud architects and engineers with the foundational knowledge and practical skills required to design, build, and manage API gateways and microservices using Google Cloud Apigee. Participants will explore the essential concepts of API design, development, security, and deployment while understanding the architecture and best practices of Apigee within Google Cloud Platform (GCP).
This HRDC-claimable course (HRDC Reg. No: 10001468849) provides hands-on experience in designing, securing, and managing APIs using Google Cloud Apigee, with real-world case studies and projects.
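As a simple illustration of consuming an Apigee-managed API, the sketch below calls a proxied endpoint with a developer app's consumer key using the requests library; the host, path, and the name of the API key parameter are assumptions that depend on how the proxy and its VerifyAPIKey policy are configured.

```python
# Calling an API exposed through an Apigee proxy with a developer app's
# consumer key. The host, path, and "apikey" parameter name are assumptions
# that depend on the proxy's VerifyAPIKey policy configuration.
import requests

APIGEE_HOST = "https://demo-org-eval.apigee.net"  # illustrative environment hostname
API_KEY = "YOUR_CONSUMER_KEY"

response = requests.get(
    f"{APIGEE_HOST}/v1/weather/forecast",
    params={"apikey": API_KEY, "city": "Kuala Lumpur"},
    timeout=10,
)

response.raise_for_status()
print(response.status_code, response.json())
```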
By the end of this course, participants will be able to:
Understand Apigee architecture and its components
Design, secure, and manage APIs using Google Cloud Apigee
Implement API policies for rate-limiting, quotas, security, and caching
Utilize OAuth 2.0, SAML, and OpenID Connect for securing APIs
Monitor, troubleshoot, and optimize API performance on Apigee
Integrate Apigee with other Google Cloud services
Implement and manage API monetization models
Design scalable, secure, and reliable API gateways for enterprise solutions
Instructor-led Lectures – In-depth explanations of Apigee and API concepts
Hands-on Labs & Exercises – Practical implementation of API management techniques
Group Discussions & Problem-Solving – Collaborative approach to real-world API issues
Case Studies & Project-Based Assignments – Industry use cases and applications
Continuous Assessments – Quizzes, feedback sessions, and a final project