Logistic Regression

In this assignment, I developed a sentiment polarity analyzer using logistic regression to classify restaurant reviews as positive or negative. The tasks involved the following:

Feature Engineering:

Implemented a Python program (feature.py) to process raw text data and convert it into numerical features using GloVe word embeddings.
Transformed each review into a 300-dimensional vector by averaging the embeddings of words present in the GloVe dictionary, while ignoring out-of-vocabulary words.
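
A minimal sketch of the averaging step described above, not the actual contents of feature.py. The function name `average_embedding`, the whitespace tokenization, the zero-vector fallback for reviews with no known words, and the toy 3-dimensional dictionary are all assumptions made for illustration; the real features are 300-dimensional GloVe vectors.

```python
import numpy as np

def average_embedding(review, glove, dim=300):
    """Average the embeddings of the words found in `glove`.

    Out-of-vocabulary words are skipped. Returning a zero vector when no
    word is known is an assumption, not something the write-up specifies.
    """
    vectors = [glove[word] for word in review.split() if word in glove]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

# Toy 3-dimensional "embeddings" standing in for the GloVe dictionary.
toy_glove = {
    "good": np.array([1.0, 0.0, 2.0]),
    "food": np.array([3.0, 2.0, 0.0]),
}

# "zzz" is not in the dictionary, so only "good" and "food" are averaged.
features = average_embedding("good food zzz", toy_glove, dim=3)
```

Because the result is a mean rather than a sum, reviews of different lengths map to vectors on a comparable scale.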

Logistic Regression Classifier:

Built a logistic regression model in Python (lr.py) using stochastic gradient descent (SGD) for optimization.
Added support for an intercept term, a configurable learning rate, and a configurable number of training epochs.
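
The training loop can be sketched roughly as follows. This is an illustrative version, not the code in lr.py: the names `train_sgd` and `predict`, the zero initialization, the fixed (unshuffled) example order, and the 0.5 decision threshold are assumptions. The intercept is handled by prepending a constant 1 to every feature vector.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_intercept(X):
    """Prepend a column of ones so theta[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def train_sgd(X, y, learning_rate=0.1, num_epochs=100):
    """Fit logistic regression with per-example SGD updates."""
    X = add_intercept(X)
    theta = np.zeros(X.shape[1])  # assumed zero initialization
    for _ in range(num_epochs):
        for x_i, y_i in zip(X, y):  # assumed fixed order, no shuffling
            # Gradient of the per-example negative log-likelihood.
            gradient = (sigmoid(theta @ x_i) - y_i) * x_i
            theta -= learning_rate * gradient
    return theta

def predict(theta, X):
    """Label an example 1 when its predicted probability is at least 0.5."""
    return (sigmoid(add_intercept(X) @ theta) >= 0.5).astype(int)

# Tiny linearly separable toy problem with one feature.
X_toy = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y_toy = np.array([0, 0, 1, 1])
theta = train_sgd(X_toy, y_toy, learning_rate=0.1, num_epochs=100)
labels = predict(theta, X_toy)
```

Folding the intercept into the weight vector keeps the update rule identical for every parameter, at the cost of one extra feature column.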

Training and Evaluation:

Trained the classifier on a dataset of labeled restaurant reviews.
Evaluated the model on the training and test datasets by computing the error rate on each and tracking the average negative log-likelihood over epochs.
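
The two metrics can be computed along these lines. This is a sketch under assumptions: the function names and the clipping constant `eps` are illustrative and not taken from the project code.

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Fraction of examples whose predicted label differs from the true label."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def avg_negative_log_likelihood(y_true, probs, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples.

    `probs` holds the predicted probability of the positive class.
    Clipping by `eps` (an assumed safeguard) avoids log(0).
    """
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Two of the four toy predictions are wrong.
toy_error = error_rate([1, 0, 1, 1], [1, 1, 1, 0])

# A model that always outputs 0.5 has a loss of log(2) per example.
toy_nll = avg_negative_log_likelihood([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

Recording the average negative log-likelihood after each epoch on both datasets gives the curves needed to see whether training is still improving or starting to overfit.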

This project enhanced my skills in text preprocessing, feature engineering with embeddings, and implementing machine learning models from scratch. It also improved my understanding of optimization methods, regularization, and the practical challenges of building sentiment analysis systems.

Machine Learning
Logistic Regression
Sentiment Analysis
Text Classification
Natural Language Processing
GloVe Embeddings
Feature Engineering
Python
Stochastic Gradient Descent

Phone:

+(886) 909 756 966

Email:

moneychien20639@gmail.com

Arthur Chien
