be more principled about which texts to include #2

Open
aparrish opened this issue Aug 13, 2018 · 0 comments

Comments

@aparrish
Owner

I believe that appropriative "remix" artwork, especially such artwork that "punches up" and/or uses material in the public domain, is fundamentally progressive: a way to loosen the stranglehold of power structures established in culture. In that spirit, the original intention of this corpus was to provide an ecumenical source of copyright-free "raw material" for evocative poetic text generation that has the cadence and form of stereotypical Poetry-with-a-capital-P.

Of course, calling material "raw" can obscure the (sometimes problematic) ways in which that material came into existence, and textual raw material is no different: the texts in this corpus carry with them the politics and points of view of the people who originally authored them. Though I've made some effort to mitigate this, text that you get by randomly sampling this corpus will in some cases contain offensive content, or come from works and authors whose viewpoints are unacceptable. The demographic of authors included in the corpus is also very particular (mostly dead white men from America or Great Britain).

It's impossible to completely circumvent this problem, of course (there's no such thing as a neutral corpus), but I do think it's possible to mitigate it, and to appropriately set expectations for users of the corpus, by being more principled about which source texts to include. (This might include introducing texts that are not presently in Project Gutenberg.) I'd like to come up with a list of criteria that determine whether or not a text should be included, with "in the public domain" being the cornerstone.
