Section outline

    Session 1: Data Collection and Preparation
    • Importing and exporting data
    • Reading data from CSV, Excel, SQL databases, and APIs
    • Mathematical and statistical operations on NumPy arrays
    • Reshaping and stacking arrays
    • Hands-on Exercise: Solving mathematical problems using NumPy

    Session 2: Data Manipulation and Transformation
    • Filtering and selecting data from DataFrames
    • Sorting and ranking data
    • Grouping data and applying aggregate functions (groupby(), agg())
    • Merging, joining, and concatenating datasets
    • Hands-on Exercise: Checking for missing values, modifying data types, and sorting DataFrames

    Session 3: Data Aggregation & Multi-Indexing
    • Aggregating data using groupby()
    • Multi-indexing for hierarchical data
    • Hands-on Exercise:
      • Combining two datasets using joins
      • Grouping data by categorical variables and computing statistics
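
As a rough illustration of the Session 3 exercise, the sketch below joins two small tables and then computes group statistics. The data, table names, and column names are hypothetical and chosen only for the example; the course may use different datasets.

```python
import pandas as pd

# Hypothetical toy data; table and column names are illustrative only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": [25.0, 40.0, 15.0, 30.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["North", "South", "North"],
})

# Inner join on the shared key column.
merged = orders.merge(customers, on="customer_id", how="inner")

# Group by the categorical column and compute several statistics at once.
summary = merged.groupby("region")["amount"].agg(["count", "sum", "mean"])
print(summary)
```

Passing a list of function names to agg() returns one column per statistic, which is convenient when several summaries of the same variable are needed.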

    Session 4: Exploratory Data Analysis (EDA)
    • Understanding the structure of datasets
    • Displaying and summarizing data using Pandas (head(), describe(), info())
    • Handling missing data (imputation, dropping rows/columns)
    • Data Cleaning: Removing duplicates, correcting data types
    • Feature Engineering: Creating new features, scaling, encoding categorical variables
    • Hands-on Exercise: Univariate, Bivariate, and Multivariate Data Analysis Case Studies

    Session 5: Data Visualization with Matplotlib & Seaborn
    • Creating basic plots: Line, bar, histogram, scatter, etc.
    • Customizing plots: Titles, labels, legends, colors
    • Creating histograms, box plots, and heatmaps
    • Hands-on Exercise:
      • Creating line and bar plots from datasets
      • Creating a heatmap to visualize correlations

    Session 6: Statistical Analysis & Hypothesis Testing
    • Descriptive statistics: Mean, median, mode, variance, standard deviation
    • Probability distributions: Normal, Binomial, Poisson distributions
    • Inferential statistics: Confidence intervals, p-values
    • Introduction to hypothesis testing
    • Types of hypothesis tests: t-test, chi-squared test, ANOVA
    • Hands-on Exercise:
      • One-sample t-test, chi-squared test, and ANOVA using SciPy
      • Performing hypothesis testing on a real-world dataset
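
A minimal sketch of the one-sample t-test from the Session 6 exercise, assuming SciPy's scipy.stats.ttest_1samp. The sample is synthetic, and the hypothesized mean (mu0) and significance level (alpha) are arbitrary values picked for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic sample; the distribution parameters are arbitrary assumptions.
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=52.0, scale=5.0, size=40)

# H0: the population mean equals mu0 (hypothetical value).
mu0 = 50.0
result = stats.ttest_1samp(sample, popmean=mu0)
t_stat, p_value = result.statistic, result.pvalue

# Compare the p-value with a chosen significance level.
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, decision: {decision}")
```

The test is two-sided by default, so the p-value covers deviations from mu0 in either direction.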

    Session 7: Optimizing DataFrames using Vectorized Operations
    • Using apply() vs. vectorized operations
    • Performance considerations when working with large datasets
    • Hands-on Exercise: Optimizing code performance using vectorization
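
To make the apply() versus vectorization comparison concrete, here is a small sketch on synthetic data. The column names and sizes are hypothetical, and the measured timings will vary by machine, so no specific speedup is claimed.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data; column names and sizes are illustrative assumptions.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "price": rng.uniform(1.0, 100.0, size=10_000),
    "quantity": rng.integers(1, 10, size=10_000),
})

# Row-wise apply: calls a Python function once per row.
start = time.perf_counter()
total_apply = df.apply(lambda row: row["price"] * row["quantity"], axis=1)
apply_seconds = time.perf_counter() - start

# Vectorized: a single operation over whole columns.
start = time.perf_counter()
total_vectorized = df["price"] * df["quantity"]
vectorized_seconds = time.perf_counter() - start

# Both approaches should give the same values.
print(np.allclose(total_apply, total_vectorized))
print(f"apply: {apply_seconds:.4f}s, vectorized: {vectorized_seconds:.4f}s")
```

The vectorized form avoids per-row Python function calls, which is typically where the time goes on large DataFrames.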

    Final Project: Real-World Data Analytics Case Study
    • Working with a real-world dataset
    • End-to-End Data Analysis Process: Cleaning, manipulation, visualization, and hypothesis testing
    • Hands-on Exercise:
      • Apply learned concepts to analyze and visualize business data