Section outline

    • Text indexing and IR fundamentals

    • Regex, BeautifulSoup for HTML extraction

    • Textract for PDFs and Word documents

    • spaCy setup and Jupyter usage

    • NumPy and pandas for data handling

    • Tokenization, Stemming, Lemmatization

    • POS Tagging and Named Entity Recognition

    • Bag of Words, TF-IDF, and N-gram models

    • Logistic Regression for Text Classification

    • Project: IMDB Comment Classifier

    • Twitter API integration and tweet processing

    • Sentiment prediction using trained TF-IDF models

    • Text summarization using scoring models

    • Word2Vec and pretrained models (GloVe)

    • Text generation basics using neural nets

    • Creating a Rasa assistant

    • Entities, intents, stories, forms, rules

    • Model evaluation and confidence interpretation

    • Introduction to Rasa X and Conversation-Driven Development (CDD)