melk

Corpus Gathering and Analysis Framework

License: CC0-1.0 · Python version: 3.9 · Code style: black

Project Description

Project Melk is a tool that allows digital humanities instructors and students without significant technical backgrounds to easily collect large datasets about specific research topics from social networks and other online media sources. The tool is pedagogical in nature: rather than attempting to provide fine-grained control over every aspect of data collection, it gives students a simple, approachable interface for collecting datasets with which they can explore digital text analysis methods.

The ultimate goal of this project is to make an important research method accessible to students and researchers without backgrounds in computer science. We hope it will enable professors to introduce their students to the novel insights that computational analysis methods make possible, without requiring those students to spend a prohibitive amount of time wrestling with the mechanics of data collection.

Supported Sources

  • The New York Times

  • Reddit

  • Twitter

  • Local datasets, including:

    • Billboard Top 100 Song Lyrics Archive
    • Poetry Foundation Archive
    • State of the Union Archive

Usage

Project Melk is primarily intended to be used via its web interface, which is currently under development.