
Text indexing and information retrieval (IR) fundamentals
Regex and BeautifulSoup for HTML extraction
Textract for PDFs and Word documents
spaCy setup and Jupyter usage
NumPy and pandas for data handling
Tokenization, stemming, and lemmatization
POS tagging and named entity recognition
Bag of Words, TF-IDF, and N-gram models
Logistic regression for text classification
Project: IMDB comment classifier
Twitter API integration and tweet processing
Sentiment prediction using trained TF-IDF models
Text summarization using scoring models
Word2Vec and pretrained embeddings (GloVe)
Text generation basics using neural nets
Creating a Rasa assistant
Entities, intents, stories, forms, rules
Model evaluation and confidence interpretation
Introduction to Rasa X and Conversation-Driven Development (CDD)
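
As a taste of the Bag of Words and TF-IDF topics listed above, here is a minimal sketch in plain Python. It assumes whitespace tokenization and the unsmoothed textbook weighting tf * log(N / df); the function names and sample documents are hypothetical, and libraries such as scikit-learn apply smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical helper: lowercase and split on whitespace only.
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # bag-of-words counts for this document
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Made-up sample documents, for illustration only.
docs = ["great movie great cast", "boring movie", "great fun"]
weights = tf_idf(docs)
```

Terms that occur in fewer documents receive larger weights: in the second sample document, "boring" outweighs "movie" because "movie" also appears in the first.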