Corpus Gathering and Analysis Framework
Project Melk is a tool that allows digital humanities instructors and students without significant technical backgrounds to easily collect large datasets about specific research topics from social networks and other online media sources. The tool is pedagogical in nature: rather than attempting to provide fine-grained control over every aspect of data collection, it gives students a simple, approachable interface for collecting datasets with which to explore digital text analysis methods.
The ultimate goal of this project is to make an important research method accessible to students and researchers without backgrounds in computer science. We hope it will enable professors to introduce their students to the novel insights made possible by computational analysis methods without requiring those students to spend a prohibitive amount of time wrestling with the mechanics of data collection.
- The New York Times
- Reddit
- Twitter
- Local datasets, including:
  - Billboard Top 100 Song Lyrics Archive
  - Poetry Foundation Archive
  - State of the Union Archive
Project Melk is primarily intended to be used via its web interface, which is currently under development.