Keyword extraction is an important topic in NLP: for each article in a collection of news stories, extract the keywords that best represent it.
Given the attached news.xml file, the script extracts the 5 most important keywords of each article and prints them below the article's title.
It relies mainly on scikit-learn and NLTK to process the article texts and find the most frequent keywords.
Output:
Brain Disconnects During Sleep:
sleep cortex consciousness tononi tm
New Portuguese skull may be an early relative of Neandertals:
skull fossil europe trait genus
Living by the coast could improve mental health:
health coast mental living household
Did you knowingly commit a crime? Brain scans could tell:
brain suitcase study security scenario
Computer learns to detect skin cancer more accurately than doctors:
dermatologist skin melanoma cnn lesion
US economic growth stronger than expected despite weak demand:
rate growth quarter economy investment
Microsoft becomes third listed US firm to be valued at $1tn:
microsoft share cloud market company
Apple's Siri is a better rapper than you:
siri rhyme smooth rizzo producer
Netflix viewers like comedy for breakfast and drama at lunch:
netflix day comedy viewer tv
Loneliness May Make Quitting Smoking Even Tougher:
smoking loneliness smoke quit lead
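
For readers who want to see the idea end to end, here is a minimal sketch of the title-plus-keywords flow described at the top. It is not the script itself: it uses only the Python standard library and a hand-rolled TF-IDF score, where the real script would use scikit-learn (for example `TfidfVectorizer`) and NLTK (for example tokenization and lemmatization). The XML layout (`news` elements containing `value` elements named `head` and `text`), the sample articles, the tiny stopword list, and the function names are all assumptions made for illustration.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for news.xml; the real file's layout is an assumption.
SAMPLE_XML = """<data><corpus>
<news><value name="head">Sleep study</value>
<value name="text">Sleep changes the cortex. The cortex rests during sleep.</value></news>
<news><value name="head">Coast study</value>
<value name="text">Living near the coast improves health. The coast is calm.</value></news>
</corpus></data>"""

# Tiny illustrative stopword list; NLTK ships a much fuller one.
STOPWORDS = {"the", "a", "an", "is", "during", "near", "of", "and", "in", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def read_articles(xml_text):
    """Return a list of (title, body) pairs from the assumed XML layout."""
    root = ET.fromstring(xml_text)
    articles = []
    for news in root.iter("news"):
        fields = {v.get("name"): (v.text or "") for v in news.findall("value")}
        articles.append((fields["head"], fields["text"]))
    return articles


def top_keywords(articles, n=5):
    """Rank each article's terms by a smoothed TF-IDF score and keep the top n."""
    docs = [Counter(tokenize(text)) for _, text in articles]
    # Document frequency: in how many articles each term appears.
    df = Counter(term for doc in docs for term in doc)
    results = []
    for (title, _), counts in zip(articles, docs):
        scores = {
            term: count * (math.log((1 + len(docs)) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Highest score first; ties are broken by reverse alphabetical order.
        ranked = sorted(scores, key=lambda t: (scores[t], t), reverse=True)
        results.append((title, ranked[:n]))
    return results


for title, words in top_keywords(read_articles(SAMPLE_XML)):
    print(f"{title}:")
    print(" ".join(words))
```

The sketch prints each title followed by a colon and then the keywords on one line, matching the output format shown above. The tie-breaking rule and the smoothing in the score are design choices of this sketch, not confirmed details of the script.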