Sports vs Politics | Zenith (M25CSA032)

Live Classifier Demo

This demo runs a client-side keyword heuristic, derived from a TF-IDF analysis of the training data, to approximate the full model's predictions.
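As a rough illustration of how such a keyword heuristic could work, the sketch below sums per-class keyword weights and picks the larger total. The keyword lists, weights, and the `classify` function are all hypothetical; the source does not list the terms the demo actually uses.

```python
import re

# Hypothetical keyword weights; the real demo's terms and weights are not
# given in the source, so these are placeholders for illustration only.
SPORTS_TERMS = {"game": 1.0, "team": 1.0, "hockey": 2.0, "baseball": 2.0, "season": 1.0}
POLITICS_TERMS = {"government": 2.0, "gun": 1.5, "israel": 2.0, "law": 1.0, "president": 1.5}


def classify(text):
    """Return 'Sports', 'Politics', or 'Unknown' from summed keyword weights."""
    tokens = re.findall(r"[a-z]+", text.lower())
    sports = sum(SPORTS_TERMS.get(t, 0.0) for t in tokens)
    politics = sum(POLITICS_TERMS.get(t, 0.0) for t in tokens)
    if sports == politics:
        return "Unknown"  # no keywords matched, or an exact tie
    return "Sports" if sports > politics else "Politics"
```

A heuristic like this needs no model download or server call, which is presumably why it suits a client-side demo, but it cannot reproduce the trained model's behaviour on text that avoids the listed keywords.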

Project Overview

This system classifies text documents into two categories: Sports and Politics. It was built as part of the CSL 7640 coursework to compare different feature representations and machine learning algorithms.
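To make the feature representations concrete, here is a minimal sketch of bag-of-words counts, optional bigrams, and TF-IDF weighting. The function names are illustrative, and the smoothed IDF formula, ln((1 + n) / (1 + df)) + 1, is one common variant chosen as an assumption; the source does not say which variant or library the project used, and no length normalisation is applied here.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=1):
    """Lowercase word tokens, plus word n-grams up to n_max (2 adds bigrams)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    feats = list(tokens)
    for n in range(2, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


def bow(docs, n_max=1):
    """Bag-of-words: raw term counts per document."""
    return [Counter(ngrams(d, n_max)) for d in docs]


def tfidf(docs, n_max=1):
    """TF-IDF: term count scaled by a smoothed inverse document frequency."""
    counts = bow(docs, n_max)
    df = Counter(term for c in counts for term in c)  # documents containing each term
    n = len(docs)
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in c.items()}
        for c in counts
    ]
```

In this sketch, a term that appears in every document keeps a weight equal to its raw count, while rarer terms are weighted up, which is the property that lets TF-IDF emphasise class-specific vocabulary.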

The Data

The dataset is a subset of the 20 Newsgroups corpus.

Sports Class: rec.sport.baseball, rec.sport.hockey
Politics Class: talk.politics.guns, talk.politics.mideast, talk.politics.misc
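The grouping above amounts to collapsing five newsgroups into two labels. A minimal sketch follows; the 0/1 encoding and the `to_binary_label` helper are assumptions for illustration, not taken from the project.

```python
SPORTS_GROUPS = {"rec.sport.baseball", "rec.sport.hockey"}
POLITICS_GROUPS = {"talk.politics.guns", "talk.politics.mideast", "talk.politics.misc"}


def to_binary_label(newsgroup):
    """Map a newsgroup name to 0 (Sports) or 1 (Politics); encoding is assumed."""
    if newsgroup in SPORTS_GROUPS:
        return 0
    if newsgroup in POLITICS_GROUPS:
        return 1
    raise ValueError(f"newsgroup not in the selected subset: {newsgroup}")
```

Raising on unknown names, rather than silently defaulting, guards against accidentally including newsgroups outside the chosen subset.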

Performance

95.2% Best Accuracy

0.95 Best F1-Score

3 ML Techniques

TF-IDF Best Feature

Key Finding: Multinomial Naive Bayes with TF-IDF features achieved the highest accuracy (0.9520), and Naive Bayes outperformed both LinearSVC and RandomForest on every feature representation tested.
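To show how little machinery a Naive Bayes text classifier needs, the sketch below implements multinomial Naive Bayes on raw word counts with Laplace smoothing, using only the standard library. It is an illustration of the technique, not the project's code; the function names, the tokeniser, and the `alpha=1.0` smoothing value are assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train_nb(docs, labels, alpha=1.0):
    """Fit class priors and Laplace-smoothed word log-likelihoods."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_like": {
                w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
            },
        }
    return model


def predict_nb(model, text):
    """Pick the class with the highest log-score; unseen words are skipped."""
    scores = {}
    for label, params in model.items():
        scores[label] = params["log_prior"] + sum(
            params["log_like"][t] for t in tokenize(text) if t in params["log_like"]
        )
    return max(scores, key=scores.get)
```

Training is a single pass over the word counts, and prediction is a sum of log-probabilities per class, which is one reason a model this simple can be competitive on topic classification where vocabulary differs strongly between classes.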

Experimental Results

Configuration	Accuracy	F1-Score
TF-IDF + MultinomialNB	0.9520	0.9515
BoW + MultinomialNB	0.9480	0.9475
TF-IDF (Bigram) + MultinomialNB	0.9460	0.9455
TF-IDF + LinearSVC	0.9450	0.9442
BoW + LinearSVC	0.9390	0.9385
TF-IDF (Bigram) + LinearSVC	0.9410	0.9405
TF-IDF + RandomForest	0.9150	0.9120
BoW + RandomForest	0.9020	0.8980
TF-IDF (Bigram) + RandomForest	0.9100	0.9080
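For reference, the two metrics in the table can be derived from confusion-matrix counts as sketched below. The `binary_metrics` helper is hypothetical, and it computes F1 for a single positive class; the source does not state whether its F1 figures are per-class, macro-averaged, or weighted, so this may not match the reported numbers exactly.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, f1) for the positive class from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1
```

With made-up counts such as `binary_metrics(90, 10, 10, 90)`, precision and recall are both 0.9, so accuracy and F1 both come out at 0.9. When the two classes are close to balanced, accuracy and F1 tend to track each other closely, as they do in the table.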

Visualisation Gallery

Accuracy Ranking

Best Model Profile

Confusion Matrix

Class Balance