Spring 2018, Mondays 3–6 p.m. UTA 1.210A
Instructor: Tanya Clement
Office hours: Mondays 1–3 p.m., UTA 5.558
Prerequisites: advanced-level undergraduate or graduate coursework in the humanities; little or no programming experience preferred.
In the data, information, knowledge, wisdom (DIKW) hierarchy that circulates through Knowledge Management (KM) and Information Science (IS) discussions, data appears at the base of a pyramid of which wisdom is the pinnacle. In this schematic, data is “raw” and lacking in meaning, while information, the next higher level of the pyramid—just below knowledge and then wisdom—represents the presence of added links and relationships; information is higher up on the wisdom chain because it is data made meaningful. In the humanities, students are taught that data is not found in the “raw” but has rather been cooked all along, taken and constructed and seasoned according to our situated contexts including access issues (Where is the data?); media, format, and technology constraints (How is the data?); and perspectives (What is the data? Who is involved in and impacted by its creation and use?).
Learning to think critically about data as information means rejecting common illusions about data more generally, including its objectivity, impersonality, atemporality, and authorlessness. To teach students to think about information from this more critical perspective means first understanding how a culture tends to understand what is informative.
The aim of this course is to encourage students to generate high-quality scholarship that applies computational and quantitative methods to the study of cultural artifacts (text, image, sound) at significantly larger scales than traditional methods allow. The final research paper is expected to combine critical theory, computational methods, and grounding in a particular humanities field towards the crafting of novel, thought-provoking arguments in the humanities.
Towards these ends, this course takes on “data wrangling” in the context of humanist perspectives.
Learning goals:
- Exploring the cultural implications of large-scale data analysis with cultural materials.
- Writing using perspectives in critical data studies.
- Gaining familiarity with scripting-style programming in Python and Unix-like systems, with an emphasis on gaining critical perspectives on the use of freely available data sets in the humanities and on free and open source software; with techniques for collecting, transforming, and analyzing media and metadata available on the Web; with commonly used data models and their standard formats, including CSV, JSON, and XML; with text analysis techniques such as natural language processing (NLP), sentiment analysis, and machine learning classification; and with tools for analyzing cultural data via visualization and statistical tests.
- Writing critically about data requires knowledge of data and data wrangling as well as knowledge of thinking and writing from the critical perspectives learned in cultural studies. While this course does not teach cultural studies per se, an understanding of and experience in humanities theory and research and the principles of cultural studies are essential for success in the course.
- Imitating and modifying others’ code is essential in learning to program. You can find many examples and explanations on Stack Exchange and similar online forums. Taking one or two lines without attribution is OK; if you use a longer chunk of code found online, add a #comment with the source’s URL.
- Begin assignments early. If you realize what you had in mind is more difficult than expected, talk to the instructor about choosing an alternative.
- We’ll be focusing on a scripting approach to programming. This course is not oriented toward developing large, complex programs or writing perfectly optimized code.
- Learning to code takes trial and error. Work through weekly programming tutorials before class and continue polishing in-class coding assignments at home.
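As a rough illustration of the attribution guideline above, a borrowed chunk of code might be marked as in the sketch below. The URL and the function name `count_words` are hypothetical placeholders chosen for this example, not a real source or a required format.

```python
from collections import Counter

# Adapted from an online forum answer.
# NOTE: the URL below is a hypothetical placeholder used only to show
# where the attribution comment goes; it does not point to a real post.
# Source: https://example.com/questions/12345/count-words-in-a-string
def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    words = text.lower().split()  # split on whitespace after lowercasing
    return Counter(words)


if __name__ == "__main__":
    sample = "Data is not raw data"
    # Print each word with its count, most frequent first.
    for word, count in count_words(sample).most_common():
        print(word, count)
```

Placing the comment directly above the borrowed function keeps the source visible to anyone who later reads or reuses the script.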
There is one required text for this course:
Montfort, Nick. Exploratory Programming for the Arts and Humanities. Cambridge, MA: The MIT Press, 2016.
All other readings will either be available online and linked below or posted on Canvas.
- danah boyd & Kate Crawford (2012) "Critical Questions for Big Data," Information, Communication & Society, 15:5, 662-679.
- Piper, Andrew. "There will be Numbers." Journal of Cultural Analytics 1, no. 1 (May 23, 2016). http://culturalanalytics.org/2016/05/there-will-be-numbers/
- Bod, Rens. "Introduction: the Quest for Principles and Patterns." A New History of the Humanities: The Search for Principles and Patterns from Antiquity to the Present. Oxford University Press, 2013, pp. 1–12. Note: You must be logged in as a UT student to retrieve this text: http://catalog.lib.utexas.edu/record=b8902003~S29
- Montfort, chp. 1 "Introduction"; "Installation and Setup" (for your information); chp. 1 "Modifying a Program"; chp. 2 "Calculating"
- “The Jupyter Notebook.” http://jupyter-notebook.readthedocs.io/en/latest/notebook.html
- Booth, Wayne C., et al., chp. 3 "From Topics to Questions". The Craft of Research, Third Ed. University Of Chicago Press, 2008.
- Borgman chp. 1 "Provocations" Big Data, Little Data, No Data: Scholarship in the Networked World. The MIT Press, 2015.
- Marche, Stephen. “Literature Is not Data: Against Digital Humanities.” Los Angeles Review of Books, October 28th, 2012. https://lareviewofbooks.org/essay/literature-is-not-data-against-digital-humanities
- Read in order:
- Manovich, Lev. 2008. “The Next Big Thing in Humanities, Arts, and Social Science Computing: Cultural Analytics”. HPC Wire, July 29.
- Manovich, Lev. 2009. “Cultural Analytics: Visualizing Cultural Patterns in the Era of ‘More Media.’” Software Studies Initiative website.
- Hall, Gary. “Toward a Postdigital Humanities: Cultural Analytics and the Computational Turn to Data-Driven Scholarship.” American Literature 85, no. 4 (January 1, 2013): 781–809.
- Padilla, T. "On a Collections as Data Imperative."
- Kenner, Hugh. Sentences. Harvard Book Review No. 13/14 (Summer - Fall, 1989), pp. 3-4.
- Posner, Miriam. “Humanities Data: A Necessary Contradiction.” Miriam Posner’s Blog, June 25, 2015. http://miriamposner.com/blog/humanities-data-a-necessary-contradiction
- Gallinger, M. and Daniel Chudnov "Library of Congress Lab: Library of Congress Digital Scholars Lab Pilot Project Report."
- Montfort, Chp. 3 "Double, Double"
- Allardice, Simon. “Foundations of Programming: Fundamentals,” parts 1–3; part 5, just “Breaking your code apart”; and part 14, just “Python” and “Libraries and frameworks.” http://www.lynda.com/JavaScript-tutorials/Foundations-of-Programming-Fundamentals/83603-2.html [To access Lynda.com, follow the links below, click “Log in,” then “Organizational Login,” and enter your UT EID and password.]
- Introna, L. D. “The Enframing of Code: Agency, Originality and the Plagiarist.” Theory, Culture & Society 28, no. 6 (November 1, 2011): 113–41.
- Montfort “Why Program?” (p.267–77)
- Vee, Annette. "Sociomaterialities of Programming and Writing." Coding Literacy: How Computer Programming Is Changing Writing. The MIT Press, 2017.
- Shieber, Stuart M., Programming for Humanists pages 1–4, 2014. [http://blogs.harvard.edu/programmingforhumanists/files/2014/12/proghum.pdf]
- Montfort, chp. 4 "Programming Fundamentals"; Montfort, chp. 5 "Standard Starting Points"
- Zhuang, Atima Han, Ishita Vedvyas, and Rishikesh Dole. “Tutorial: OpenRefine,” 2013. http://casci.umd.edu/wp-content/uploads/2013/12/OpenRefine-tutorial-v1.5.pdf
- Borgman, chp 2 "What are Data?"
- Krumme, Coco. “What Data Doesn’t Do.” In Beautiful Data: The Stories behind Elegant Data Solutions, edited by Toby Segaran and Jeff Hammerbacher, 1st ed. Beijing ; Sebastopol, CA: O’Reilly, 2009.
- Rosenberg, Daniel. "Data Before the Fact." In "Raw Data" Is an Oxymoron, edited by Lisa Gitelman. Cambridge: MIT Press, 2013.
- Borges, Jorge Luis. "The Analytical Language of John Wilkins."
- Nunberg, Geoffrey. "Google's Book Search: A Disaster for Scholars." Chronicle of Higher Education. 31 Aug. 2009.
- Fortune, Stephen. “A Brief History of Databases.” Avant, February 27th 2014. https://web.archive.org/web/20150220031213/http://avant.org/media/history-of-databases
- Pechenick, Eitan Adam, et al. “Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution.” PLOS ONE, vol. 10, no. 10, Oct. 2015, p. e0137041.
- Zhang, Sarah. “The Pitfalls Of Using Google Ngram To Study Language.” Wired. 12 October 2015.
- OpenRefine
- Montfort chp. 6 "Text I" (some hints on possible errors)
- “HTML Introduction” and “HTML5 Introduction”, W3Schools.
- Borgman chp. 3 "Data Scholarship"
- Liu, Alan. “Drafts for Against the Cultural Singularity.” Alan Liu, 2 May 2016.
- Read in order:
- Winner, Langdon. “Do Artifacts Have Politics?” Daedalus 109, no. 1 (1980): 121–36.
- Joerges, B. “Do Politics Have Artefacts?” Social Studies of Science 29, no. 3 (June 1, 1999): 411–31.
- Sacasas, Michael. “Do Artifacts Have Ethics?” The Frailest Thing, November 29, 2014. http://thefrailestthing.com/2014/11/29/do-artifacts-have-ethics
REQUIRED Discussion post, 4 points
In-class presentations of Data Set Reviews
- Downloading with Wget
- Fetching and Parsing Data from the Web with OpenRefine
- Montfort chp. 7 "Text II" (some hints on possible errors)
- Borgman chp. 4 "Data Diversity"
- Hitchcock, Tim. “Digital Searching and the Re-formulation of Historical Knowledge.” In The Virtual Representation of the Past, edited by Mark Greengrass and Lorna Hughes, 81–90. Ashgate, 2008.
- Piper, A. Think Small: On Literary Modeling. PMLA, Volume 132, Number 3, May 2017, pp. 651–658.
- Pound, Scott. “Kenneth Goldsmith and the Poetics of Information.” PMLA, vol. 130, no. 2, Mar. 2015, pp. 315–30.
- Neff, Gina, Anissa Tanweer, Brittany Fiore-Gartland, and Laura Osburn. “Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science.” Big Data 5, no. 2, 2017.
- Oualline, Steve. “The End of Line Puzzle.” The Practical Programmer.
- Getting Files
- Revisiting the basics
- Montfort chp. 8 "Image I"
- Albon, Chris. “String Operations.” http://chrisalbon.com/python/string_operations.html
- Christen, Kim. “Does Information Really Want to be Free? Indigenous Knowledge Systems and the Question of Openness.” International Journal of Communication 6 (2012), 2870–2893.
- Day, Ronald E. “Governing Expression: Social Big Data and Neoliberalism.” In Indexing It All: The Subject in the Age of Documentation, Information, and Data, 123–44. History and Foundations of Information Science. Cambridge, Massachusetts: The MIT Press, 2014.
- Pomerantz, Jeffrey. “The Future of Metadata.” In Metadata. The MIT Press Essential Knowledge Series. Cambridge, MA ; London, England: The MIT Press, 2015.
- Peters, Justin. The Idealist: Aaron Swartz and the Rise of Free Culture on the Internet, Chapters 7 and 8. New York: Scribner, 2016.
- O’Sullivan, Michael. “Aaron Swartz, New Technologies, and the Myth of Open Access.” In Academic Barbarism, Universities and Inequality. Palgrave Critical University Studies. Houndmills, Basingstoke, Hampshire ; New York, NY: Palgrave Macmillan, 2016.
- CSV Input/Output in Python
- Kreiss, D., M. Finn, and F. Turner. “The Limits of Peer Production: Some Reminders from Max Weber for the Network Society.” New Media & Society 13, no. 2 (March 1, 2011): 243–59.
- Swartz, Aaron. “Building a Platform: Providing APIs.” In Aaron Swartz’s ‘A Programmable Web’: An Unfinished Work, 31–39. San Rafael, CA: Morgan & Claypool Publishers, 2013.
- van Hooland, Seth, and Ruben Verborgh. “Modelling.” In Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish Your Metadata, 11–70. Chicago: Neal-Schuman, 2014.
- Kelly, Chelsea Emelie. “Beyond Digital: Open Collections & Cultural Institutions,” 2014.
- Manzo, Christina, Geoff Kaufman, Sukdith Punjasthitkul, and Mary Flanagan. “‘By the People, For the People’: Assessing the Value of Crowdsourced, User-Generated Metadata.” Digital Humanities Quarterly 9, no. 1 (2015). http://www.digitalhumanities.org/dhq/vol/9/1/000204/000204.html
- Veltman, Noah. Web APIs for non-programmers. November 18, 2013. School of Data.
Proposal due Friday, March 23 at 11:59pm; Peer reviews due by class March 26 at 3pm
- Using the Google Books REST API
- New York Times article scrape
- Scraping and Parsing XML
- Fetching and Parsing Data from the Web with OpenRefine, APIs
- Montfort chp. 10 "Text III"
- Final project directory: Booth chp. 4 "From Questions to a Problem"
- Conley, Tara L. "Decoding Black Feminist Hashtags as Becoming." The Black Scholar Vol. 47, Iss. 3, 2017.
- Klein, Lauren. "Distant Reading after Moretti". MLA Conference, January 2018.
- Ramsay, Stephen. “Chapter 1: An Algorithmic Criticism.” In Reading Machines: Toward an Algorithmic Criticism, 1–17. Topics in the Digital Humanities. Urbana: University of Illinois Press, 2011.
- Underwood, T. "Theorizing Research Practices We Forgot to Theorize Twenty Years Ago." Representations, Vol. 127 No. 1, Summer 2014, pp. 64–72.
- Risam, R. "Beyond the Margins: Intersectionality and the Digital Humanities" Digital Humanities Quarterly, Vol. 9 No. 2, 2015.
- Presner, Todd. "Critical Theory and the Mangle of Digital Humanities" [draft version, 2012]
- Unsupervised learning: Latent Dirichlet allocation (LDA) topic modeling
- Supervised learning: Naive Bayes classification
- Fetching and Parsing Data from the Web with OpenRefine, Advanced APIs
- “Working With Text Data.” scikit-learn. http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
- Khan, Khairullah, Baharudin, Baharum, Lam Hong Lee. “A Review of Machine Learning Algorithms for Text-Documents Classification.” Journal of Advances in Information Technology 1, no. 1 (February 1, 2010).
- Wolfram, S. Machine Learning for Middle Schoolers. Stephen Wolfram Blog. 11 May 2017. http://blog.stephenwolfram.com/2017/05/machine-learning-for-middle-schoolers/#comments
- Griffiths, Devin. "The Comparative Method and the History of the Modern Humanities" History of Humanities, Volume 2, Number 2, 2017.
- Schmidt, B. "Do Digital Humanists Need to Understand Algorithms?"
- Seaver, Nick "Algorithms as culture: Some tactics for the ethnography of algorithmic systems" Big Data and Society. 9 Nov. 2017
- Underwood, Ted. "A Genealogy of Distant Reading". Digital Humanities Quarterly Volume 11, Number 2, 2017.
- Borgman chp. 7
- Berendt, Bettina, and Soren Preibusch. “Toward Accountable Discrimination-Aware Data Mining: The Importance of Keeping the Human in the Loop—and Under the Looking Glass.” Big Data, Volume 5, Number 2, 2017.
- Moretti, F. "Conjectures on World Literature." New Left Review 1, January-February 2000.
- Unsupervised learning with K-Means Clustering
- Supervised learning with multiple classifiers: Naive Bayes, k-nearest neighbor, Logistic Regression, Support Vector Machine (SVM), Multi-layer perceptron classifier
- Montfort chp. 11 “Statistics and Visualization.”
- Brew, Chris. “Language Processing: Statistical Methods.” In Encyclopedia of Language & Linguistics, edited by Keith Brown, 2nd ed., 12:597–604. Elsevier, 2006.
- Gries, Stefan. “Useful statistics for corpus linguistics.”
- McCandless, David. Information is Beautiful.
- Norvig, Peter. “Natural Language Corpus Data.” In Beautiful Data: The Stories Behind Elegant Data Solutions, edited by Toby Segaran and Jeff Hammerbacher, 1st ed. Beijing ; Sebastopol, CA: O’Reilly, 2009.
- Burrows, John. “Textual Analysis.” In Companion to Digital Humanities, edited by Susan Schreibman, Ray Siemens, and John Unsworth.
- D’Ignazio, Catherine, and Lauren F. Klein. “Feminist Data Visualization.” IEEE VIS Conference, Baltimore, October 23–28, 2016.
- Manovich, Lev. "Cultural Data: Possibilities and Limitations of the Digital Data Universe." In Museum and Archive on the Move: Changing Cultural Institutions in the Digital Era, edited by Oliver Grau with Wendy Coones and Viola Rühse, 259–276. Berlin, Boston: De Gruyter, 2017.
- Stack, John. "Exploring museum collections online: Some background reading." Science Museum Group Digital Lab, January 23, 2018.
- Thompson, Clive. “The Surprising History of the Infographic.”
- Moretti, Franco. “Graphs.” In Graphs, Maps, Trees: Abstract Models for Literary History, 3–33. London ; New York: Verso, 2007.
- Ramsay, Stephen. “Chapter 3: Potential Readings.” In Reading Machines: Toward an Algorithmic Criticism, 33–57. Topics in the Digital Humanities. Urbana: University of Illinois Press, 2011.
- Matplotlib
- Simple Viz for Sentiment Analysis
- Much more Viz
- Tableau
- Kazil, Jacqueline, and Katharine Jarmul. “PDFs and Problem Solving in Python.” In Data Wrangling with Python: Tips and Tools to Make Your Life Easier, 91–126. O’Reilly, 2016.
- Albon, Chris. “Parse JSON File.” http://chrisalbon.com/python/json_parse_file.html
- Lundh, Fredrik. “Elements and Element Trees.” http://effbot.org/zone/element.htm [Python XML tutorial]
- Beazley, David, and Brian K. Jones. “Chapter 6: Data Encoding and Processing.” In Python Cookbook: Recipes for Mastering Python 3, 3rd ed., 175–216. Beijing: O’Reilly, 2013.
- Brown and Mandell, "The Identity Issue: An Introduction." Journal of Cultural Analytics 13 February 2018.
- Hammond, Adam. “The double bind of validation: distant reading and the digital humanities’ ‘trough of disillusionment.’” Literature Compass 14, no. 8 (August 1, 2017): no. pg.
- Witmore, Michael. 2016. “Latour, the Digital Humanities, and the Divided Kingdom of Knowledge.” New Literary History 47 (2): 353–75.
- Scroll through these examples of neural networks inventing things based on examples such as insect names, thesis titles, guinea pig names, and pie.
- “Transparency” through end-user parameter modification or Change your algorithm for what Spotify picks for you
- Clement, T. and McLaughlin, S. “Measured Applause: Toward a Cultural Analysis of Audio Collections.” Cultural Analytics, vol. 1, no. 1, 2016.
- Liu, Alan. “The Meaning of the Digital Humanities.” PMLA 128, no. 2 (March 2013): 409–23.
5/7: Final Project due
- Jeroen Janssens Seven Command Line Tools for Data Science (2013) workbench.
- Karsdorp, Folgert. Python Programming for the Humanities
- Marini, Joe. “Up and Running with Python.” Lynda.com.
- Shieber, Stuart M., Programming for Humanists pages 1–4, 2014. [http://blogs.harvard.edu/programmingforhumanists/files/2014/12/proghum.pdf]
- Williamson, Evan Peter. Fetching and Parsing Data from the Web with OpenRefine
- Code of Best Practices in Fair Use for Academic and Research Libraries. Association of Research Libraries, 2012. http://www.arl.org/storage/documents/publications/code-of-best-practices-fair-use.pdf
- “The Digital Public Library of America Policy Statement on Metadata,” 2013. http://dp.la/info/wp-content/uploads/2013/04/DPLAMetadataPolicy.pdf
- “Creative Commons: About the Licenses.” https://creativecommons.org/licenses/
- DRM article: http://infojustice.org/wp-content/uploads/2015/03/band03102015.pdf
- Juola, P. and Ramsay, S. Six Septembers: Mathematics for the Humanist. Zea E-Books.
- American Civil Liberties Union. "First Amendment Lawsuit Brought on Behalf of Academic Researchers and Journalists Who Fear Prosecution Under the Computer Fraud and Abuse Act."
- Sanger, David E., and Eric Schmitt. “Snowden Used Low-Cost Tool to Best N.S.A.” The New York Times. February 8, 2014. http://www.nytimes.com/2014/02/09/us/snowden-used-low-cost-tool-to-best-nsa.html
- Sims, Nancy. “Library Licensing and Criminal Law: The Aaron Swartz Case.” College & Research Libraries News 72, no. 9 (2011): 534–37.
- Python tutorials at Programiz.