Bradley Robb | Satirical Headline Classifier

Satirical Headline Predictor

This predictor was built using NLTK's Naive Bayes classifier and a dataset of around 26,000 headlines, split roughly 55/45 between real (NYTimes, BBC, etc.) and satirical (The Onion).
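As a rough illustration of this setup, the sketch below trains NLTK's `NaiveBayesClassifier` on a handful of made-up headlines. The `headline_features` helper and the bag-of-words feature scheme are assumptions made for the example, since the feature extraction actually used is not described here, and the toy headlines are invented rather than taken from the dataset.

```python
import nltk


def headline_features(headline):
    """Hypothetical feature extractor: one boolean feature per lowercased word."""
    return {f"contains({word})": True for word in headline.lower().split()}


# Invented toy headlines, standing in for the real ~26,000-headline dataset.
TOY_HEADLINES = [
    ("Area man hoping to finish sandwich", "satire"),
    ("Area woman hoping nobody noticed", "satire"),
    ("Local dog hoping for second dinner", "satire"),
    ("Senate passes budget bill", "not satire"),
    ("Markets close higher after rate decision", "not satire"),
    ("Court hears appeal in budget case", "not satire"),
]

# NLTK expects a list of (feature dict, label) pairs.
train_set = [(headline_features(text), label) for text, label in TOY_HEADLINES]
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Words never seen in training (here "nap") are ignored at prediction time.
prediction = classifier.classify(headline_features("Area man hoping to nap"))
print(prediction)
```

With a real dataset, the same pattern applies: build one feature dict per headline, hold some headlines back for testing, and train on the rest.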

This classifier does not generalize to "fake news" or "sarcasm" detection. It is instead intended to determine, based on the words in a headline, whether the headline comes from a straight news site or is intended to get a laugh, à la The Onion. The use case would be to automatically apply a satire label when such a headline appears on a social networking site.

The classifier trained to 86% accuracy on this dataset, and skews towards 'not satire'.
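A single accuracy figure does not show which way the errors lean. One way to check for a skew in predictions is to count (true label, predicted label) pairs alongside the overall accuracy, as sketched below. The `summarize` helper is hypothetical and the label lists are invented for illustration. They are not results from this classifier.

```python
from collections import Counter


def summarize(gold, predicted):
    """Return overall accuracy and a Counter of (gold, predicted) label pairs."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    accuracy = correct / len(gold)
    confusion = Counter(zip(gold, predicted))
    return accuracy, confusion


# Invented labels: every error here is a satirical headline called 'not satire'.
gold = ["satire", "satire", "satire", "not satire",
        "not satire", "not satire", "not satire", "satire"]
predicted = ["satire", "not satire", "satire", "not satire",
             "not satire", "not satire", "not satire", "not satire"]

accuracy, confusion = summarize(gold, predicted)
print(f"accuracy: {accuracy:.2f}")
for (true_label, guess), count in sorted(confusion.items()):
    print(f"true={true_label!r} predicted={guess!r}: {count}")
```

If most misclassifications land in the (satire, not satire) cell, the classifier is leaning towards 'not satire'. For the overall score alone, `nltk.classify.accuracy(classifier, test_set)` gives the same number directly from a trained classifier and a labeled test set.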

Two words did appear as most indicative of satire: "area" and "hoping". A headline like "Area Man Hoping to..." will therefore almost always be classified as satire.
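NLTK can report which features a trained Naive Bayes model finds most telling, which is one plausible way a finding like this could be surfaced. The sketch below uses `most_informative_features`, which ranks (feature, value) pairs by how unevenly their likelihood is split between labels. The toy headlines are invented and deliberately constructed so that "area" and "hoping" dominate, and the `headline_features` helper is an assumed bag-of-words scheme, not necessarily the one used here.

```python
import nltk


def headline_features(headline):
    """Hypothetical feature extractor: one boolean feature per lowercased word."""
    return {f"contains({word})": True for word in headline.lower().split()}


# Invented toy data: "area" and "hoping" each appear in three satirical
# headlines and one real headline; every other word appears exactly once.
TOY_HEADLINES = [
    ("Area man hoping to win lottery", "satire"),
    ("Area woman hoping nobody saw that", "satire"),
    ("Area dad unsure grill still works", "satire"),
    ("Local cat hoping for second breakfast", "satire"),
    ("Flooding closes roads in coastal area", "not satire"),
    ("Officials hoping talks resume soon", "not satire"),
    ("Senate passes budget bill", "not satire"),
    ("Markets close higher after rate decision", "not satire"),
]

train_set = [(headline_features(text), label) for text, label in TOY_HEADLINES]
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Each entry is a (feature name, feature value) pair; a value of None means
# "this word is absent", which NLTK also treats as evidence.
top_features = classifier.most_informative_features(4)
top_names = {name for name, _value in top_features}
print(sorted(top_names))
```

For a human-readable table with likelihood ratios per label, `classifier.show_most_informative_features(n)` prints the same ranking.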

The original dataset can be found here.