Logistic Regression
In this assignment, I developed a sentiment polarity analyzer that uses logistic regression to classify restaurant reviews as positive or negative. The work consisted of the following parts:
Feature Engineering:
- Implemented a Python program (feature.py) to process raw text data and convert it into numerical features using GloVe word embeddings.
- Transformed each review into a 300-dimensional vector by averaging the GloVe embeddings of its in-vocabulary words; out-of-vocabulary words were ignored.
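The averaging step can be sketched in a few lines. This is a minimal illustration, not the original feature.py. It assumes the GloVe vectors are already loaded into a dict that maps each word to a list of floats, and that reviews are tokenized by splitting on whitespace. The function name `review_to_feature` is hypothetical. Returning a zero vector for a review with no in-vocabulary words is also an assumption. The demo uses 3-dimensional toy vectors instead of 300 for readability.

```python
from typing import Dict, List

Vector = List[float]


def review_to_feature(review: str, embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the embeddings of the in-vocabulary tokens of a review.

    Assumptions (not from the original project): whitespace tokenization,
    and a zero vector when no token is found in the embedding dictionary.
    """
    total = [0.0] * dim
    count = 0
    for token in review.split():
        vec = embeddings.get(token)
        if vec is None:
            continue  # ignore out-of-vocabulary words
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total  # no known words: fall back to the zero vector
    return [value / count for value in total]


if __name__ == "__main__":
    toy = {"good": [1.0, 0.0, 2.0], "food": [3.0, 2.0, 0.0]}
    # "zzz" is out of vocabulary, so only "good" and "food" are averaged.
    print(review_to_feature("good food zzz", toy, dim=3))  # [2.0, 1.0, 1.0]
```

Repeated words count once per occurrence here, so a word that appears twice pulls the average toward its embedding twice as strongly.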
Logistic Regression Classifier:
- Built a logistic regression model in Python (lr.py) using stochastic gradient descent (SGD) for optimization.
- Added support for an intercept (bias) term, along with a configurable learning rate and number of training epochs.
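The following hedged sketch shows a plain-Python SGD trainer of this kind. It is not the original lr.py. The intercept is folded into the weight vector by prepending a constant 1.0 feature, so `theta[0]` acts as the bias. Several details are assumptions of this sketch: zero initialization, visiting examples in their given order each epoch (no shuffling), and the helper names `sigmoid`, `with_intercept`, `train_sgd`, and `predict`. The update uses the per-example gradient of the negative log-likelihood, (σ(θᵀx) − y)·x.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def with_intercept(x: Sequence[float]) -> List[float]:
    """Prepend a constant 1.0 so theta[0] plays the role of the intercept."""
    return [1.0] + list(x)


def train_sgd(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    learning_rate: float,
    num_epochs: int,
) -> List[float]:
    """Fit logistic regression by SGD on the negative log-likelihood.

    Assumed: weights start at zero and examples are visited in order.
    """
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(num_epochs):
        for xi, yi in zip(X, y):
            xb = with_intercept(xi)
            p = sigmoid(sum(t * v for t, v in zip(theta, xb)))
            # Per-example NLL gradient is (p - y) * x; step against it.
            for j in range(len(theta)):
                theta[j] -= learning_rate * (p - yi) * xb[j]
    return theta


def predict(theta: Sequence[float], x: Sequence[float]) -> int:
    """Predict 1 when P(y=1 | x) >= 0.5, which is equivalent to theta . x >= 0."""
    z = sum(t * v for t, v in zip(theta, with_intercept(x)))
    return 1 if z >= 0 else 0
```

Treating the intercept as the weight on a constant feature lets one update rule handle every parameter, so the bias needs no special case.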
Training and Evaluation:
- Trained the classifier on a dataset of labeled restaurant reviews.
- Evaluated the model by computing error rates on the training and test sets and by tracking the average negative log-likelihood over epochs.
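Both metrics can be computed in a single pass over a labeled set. To track the loss over training, you would call such a routine after each epoch. The sketch below is hedged and makes these assumptions: labels are in {0, 1}, the first weight is the intercept, and predictions use a 0.5 probability threshold. The names `evaluate` and `log1p_exp` are hypothetical. For a logit z = θᵀx, the per-example loss is −log p(y | x) = −y·z + log(1 + e^z).

```python
import math
from typing import Sequence, Tuple


def log1p_exp(z: float) -> float:
    """Compute log(1 + exp(z)) without overflow for large z."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def evaluate(
    theta: Sequence[float],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
) -> Tuple[float, float]:
    """Return (error_rate, average negative log-likelihood) on a labeled set.

    Assumes theta[0] is the intercept and labels are 0 or 1.
    """
    mistakes = 0
    total_nll = 0.0
    for xi, yi in zip(X, y):
        z = theta[0] + sum(t * v for t, v in zip(theta[1:], xi))
        # -log p(y | x) = -y*z + log(1 + e^z) for y in {0, 1}
        total_nll += -yi * z + log1p_exp(z)
        prediction = 1 if z >= 0 else 0  # P(y=1|x) >= 0.5
        mistakes += int(prediction != yi)
    n = len(y)
    return mistakes / n, total_nll / n
```

This version computes the loss directly from the logit z instead of from σ(z). That avoids evaluating log(0) when the predicted probability saturates at 0 or 1.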
This project enhanced my skills in text preprocessing, feature engineering with embeddings, and implementing machine learning models from scratch. It also improved my understanding of optimization methods, regularization, and the practical challenges of building sentiment analysis systems.
- Machine Learning
- Logistic Regression
- Sentiment Analysis
- Text Classification
- Natural Language Processing
- GloVe Embeddings
- Feature Engineering
- Python
- Stochastic Gradient Descent