That's cool. Computational sociolinguistic methods for investigating individual lexico-grammatical variation

Script for data retrieval and processing

Hans-Jörg Schmid (1),
Quirin Würschinger (1),
Sebastian Fischer (2),
Helmut Küchenhoff (2)

(1) Department of English and American Studies, LMU Munich, Germany
(2) Department of Statistics, LMU Munich, Germany

Functionality

  • This notebook parses the XML version of BNC2014,
  • calculates total counts for texts, speakers and words in the corpus,
  • queries the corpus for the target pattern "that's ADJ" and stores all hits,
  • merges hits with semantic category descriptions from the USAS tagset,
  • merges hits with metadata for speakers and conversations from the spreadsheets provided by BNC2014.
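
The steps above can be illustrated with a minimal sketch. It is not the code from the notebook: the XML element and attribute names (u, w, who, class, usas), the tiny sample text, the USAS label table, and the speaker metadata are all assumptions made up for illustration, and the function name process is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature sample. The element and attribute names are
# assumptions about the corpus XML, not taken from the notebook.
SAMPLE = """
<text id="T1">
  <u n="1" who="S01">
    <w lemma="that" class="PRON" usas="Z8">that</w>
    <w lemma="be" class="VERB" usas="A3">'s</w>
    <w lemma="cool" class="ADJ" usas="A5">cool</w>
  </u>
  <u n="2" who="S02">
    <w lemma="yeah" class="INTERJ" usas="Z4">yeah</w>
    <w lemma="that" class="PRON" usas="Z8">that</w>
    <w lemma="be" class="VERB" usas="A3">'s</w>
    <w lemma="fine" class="ADJ" usas="A5">fine</w>
  </u>
</text>
"""

# Illustrative lookup tables standing in for the USAS category
# descriptions and the speaker metadata spreadsheet (values are made up).
USAS_LABELS = {"A5": "Evaluation"}
SPEAKERS = {"S01": {"age": "19-29"}, "S02": {"age": "30-39"}}


def process(xml_string):
    """Count words per speaker and collect all "that's ADJ" hits."""
    text = ET.fromstring(xml_string)
    words_per_speaker = Counter()
    hits = []
    for utterance in text.iter("u"):
        speaker = utterance.get("who")
        tokens = list(utterance.iter("w"))
        words_per_speaker[speaker] += len(tokens)
        # Slide a three-token window over the utterance: that + 's + ADJ.
        for first, second, third in zip(tokens, tokens[1:], tokens[2:]):
            if (
                (first.text or "").lower() == "that"
                and second.text == "'s"
                and third.get("class") == "ADJ"
            ):
                hit = {
                    "text": text.get("id"),
                    "speaker": speaker,
                    "adjective": third.text,
                    "usas": third.get("usas"),
                    # Merge in the semantic category description.
                    "usas_label": USAS_LABELS.get(third.get("usas")),
                }
                # Merge in speaker metadata, if any is available.
                hit.update(SPEAKERS.get(speaker, {}))
                hits.append(hit)
    return words_per_speaker, hits


counts, hits = process(SAMPLE)
for hit in hits:
    print(hit["speaker"], hit["adjective"], hit["usas_label"], hit["age"])
```

The sketch uses only the standard library and plain dictionaries for the two merges. For the full corpus, a table-based merge on speaker and text identifiers would probably be more practical.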

Contents

  • The code is provided as a notebook with comments in IndVarBNC.ipynb.
  • Exported versions of the notebook for viewing can be found in IndVarBNC.html and IndVarBNC.pdf.
  • Output files are stored in the directory out/.

Correspondence

If you would like to adapt or use the script, please contact us via email.