Web Page Summarizer
This was a fun project to really get a handle on some practical skills.
The first skill was web scraping. The function is pretty basic: given a webpage URL, extract the text contents of that page.
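A minimal sketch of that step, assuming requests for fetching and Beautiful Soup for parsing (the function name and error handling are illustrative, not the project's actual code):

```python
import requests
from bs4 import BeautifulSoup


def extract_text(url: str) -> str:
    """Fetch a webpage and return its visible text content."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")

    # Drop elements that hold no readable content.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()

    # A separator keeps adjacent blocks of text from running together.
    return soup.get_text(separator=" ")
```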
Contents in hand, I then needed to perform some basic text transformations to normalize the human-readable content into a form the summarizers could work with.
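The exact transformations are assumptions on my part, but the normalization amounts to something along these lines: collapsing whitespace and dropping the short navigation fragments that scraping leaves behind.

```python
import re


def normalize_text(raw: str) -> str:
    """Collapse whitespace and strip stray fragments left over from scraping."""
    # Collapse runs of spaces, tabs, and newlines into single spaces.
    text = re.sub(r"\s+", " ", raw).strip()

    # Keep only reasonably sentence-like chunks; one-word nav links add noise.
    sentences = [s.strip() for s in text.split(". ") if len(s.split()) > 3]
    return ". ".join(sentences)
```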
With the normalized text in hand, I summarized it two ways. First, using Sumy’s LSA (Latent Semantic Analysis) model, I performed extractive text summarization - basically pulling out the three sentences with the most ‘value.’
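Roughly how that Sumy call looks - the helper function and sentence count here are a sketch, not the project's exact code:

```python
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer


def extractive_summary(text: str, sentence_count: int = 3) -> str:
    """Pick the highest-value sentences straight out of the source text."""
    parser = PlaintextParser.from_string(text, Tokenizer("english"))
    summarizer = LsaSummarizer()
    sentences = summarizer(parser.document, sentence_count)
    return " ".join(str(sentence) for sentence in sentences)
```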
That same normalized text was then sent to GPT-3 with a prompt to summarize whatever it was given in no more than 100 words. This is an abstractive text summary, meaning the words and sentences in the summary are new, authored by the model rather than copied from the source.
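A hedged sketch of that call against the legacy GPT-3 Completions API - the model name, prompt wording, and parameters are assumptions, not the project's actual settings:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder


def abstractive_summary(text: str) -> str:
    """Ask GPT-3 to write a fresh summary of the normalized text."""
    prompt = f"Summarize the following in no more than 100 words:\n\n{text}"
    response = openai.Completion.create(
        model="text-davinci-003",  # assumed GPT-3 completion model
        prompt=prompt,
        max_tokens=200,
        temperature=0.5,
    )
    return response.choices[0].text.strip()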
Seeing the two results side-by-side shows that the abstractive summary usually has an edge. However, I’ll probably revisit this in the near future to try to tweak the results for the best possible output.
About the Project
Often, the gist is enough. So, feed this a website, get back the gist, two ways.
Built with:
Python, Django, Beautiful Soup/Web Scraping, Sumy, Latent Semantic Analysis, GPT-3, APIs