- CQPweb is a Web-based user interface to the IMS Corpus Workbench
- we will use the local server of the CCL group at https://corpora.linguistik.uni-erlangen.de/cqpweb/
-
goal: understand Donald Trump's terminology, rhetoric and phraseology (in case he comes back …)
-
take a look at a corpus of Trump's tweets from 2009 to January 2021 (when he was finally banned from Twitter) to illustrate corpus-linguistic research
-
simplest use: search for `make america great again`, then explain the KWIC and context displays
-
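A KWIC (key word in context) display lists each hit with the node in the middle and a few words of left and right context. The function below is a minimal sketch of that idea on whitespace-tokenised toy tweets. The name `kwic`, the context `width` and the case-insensitive matching are assumptions for illustration, not a description of how CQPweb runs its queries.

```python
def kwic(tokens, query, width=3):
    """Return (left, node, right) triples for every match of a multi-word query.

    Matching is case-insensitive (an assumption); context is cut off
    after `width` tokens on each side.
    """
    q = [w.lower() for w in query.split()]
    n = len(q)
    lines = []
    for i in range(len(tokens) - n + 1):
        if [t.lower() for t in tokens[i:i + n]] == q:
            left = " ".join(tokens[max(0, i - width):i])
            node = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, node, right))
    return lines


# Hypothetical toy tweets (whitespace-tokenised), not taken from the corpus
tweets = ["We will Make America Great Again !",
          "let's make america great again together"]
for tweet in tweets:
    for left, node, right in kwic(tweet.split(), "make america great again"):
        # KWIC layout: left context right-aligned, node in the middle
        print(f"{left:>20} | {node} | {right}")
```

Each tweet is searched separately, so context never runs across tweet boundaries.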
step 1: select a suitable subcorpus (only original tweets, no retweets etc.)
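Conceptually, building the Originals subcorpus is a filter over tweet metadata. A minimal sketch, assuming each tweet is a record with the hypothetical fields `text` and `is_retweet`; the real corpus has its own metadata categories, and other non-original tweet types (the "etc.") are not handled here.

```python
from typing import TypedDict


class Tweet(TypedDict):
    text: str
    is_retweet: bool  # hypothetical metadata field


def originals(tweets: list[Tweet]) -> list[Tweet]:
    """Keep original tweets only: drop flagged retweets and manual 'RT @user' copies."""
    return [t for t in tweets
            if not t["is_retweet"] and not t["text"].startswith("RT @")]


# Hypothetical toy records
sample: list[Tweet] = [
    {"text": "An original statement", "is_retweet": False},
    {"text": "RT @someone: a copied tweet", "is_retweet": False},
    {"text": "A shared post", "is_retweet": True},
]
print([t["text"] for t in originals(sample)])
```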
-
step 2: lemma frequency list to select relevant terms ➞ not very interesting
- option: look at hashtags (prefix `#`)
- option: use POS-disambiguated lemmatisation (not available for all corpora) to filter by part of speech (suffixes `_N`, `_Z`, `_J`, `_V`)
- results are still very general high-frequency words and often not particularly characteristic
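Both filtering options of step 2 amount to restricting a frequency list by a prefix or suffix on each entry. A minimal sketch over a list of already lemmatised tokens; the function name and the toy tokens are hypothetical, and the POS suffixes are treated as plain strings.

```python
from collections import Counter


def filtered_freqs(items, prefix=None, suffix=None):
    """Frequency list (most frequent first), optionally restricted to entries
    starting with `prefix` (e.g. '#') or ending in `suffix` (e.g. '_J')."""
    counts = Counter(items)
    return [(item, f) for item, f in counts.most_common()
            if (prefix is None or item.startswith(prefix))
            and (suffix is None or item.endswith(suffix))]


# Hypothetical toy data: POS-disambiguated lemmas and word forms
lemmas = ["great_J", "news_N", "fake_J", "news_N", "be_V", "great_J", "great_J"]
words = ["#MAGA", "great", "#MAGA", "#KAG"]
print(filtered_freqs(lemmas, suffix="_J"))  # entries tagged _J only
print(filtered_freqs(words, prefix="#"))    # hashtags only
```

`Counter.most_common()` keeps ties in the order they were first encountered, so the list is stable for equal frequencies.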
-
step 3: keyword analysis = frequency comparison against a reference corpus (➞ English tweets)
- use the default settings, but change the keyness measure to Log-Likelihood (or Log Ratio, conservative estimate) and show positive keywords only (there are too many negative ones!)
- compare the tabular view with the visualisation options; click on thank to display its concordance
- focus on the salient keyword fake and the very frequent great
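Both keyness measures compare a word's relative frequency in the target corpus with its relative frequency in the reference corpus. A minimal sketch, assuming raw frequencies and corpus sizes in tokens: `log_likelihood` uses the common two-cell G² formula, and `log_ratio` is the plain binary log of the ratio of relative frequencies, with zero counts replaced by 0.5 (an assumption). The conservative-estimate variant is not reproduced, and CQPweb's exact computation may differ.

```python
import math


def log_likelihood(a, b, c1, c2):
    """Two-cell log-likelihood G2 for a word with frequency a in the target
    corpus (c1 tokens) and b in the reference corpus (c2 tokens)."""
    e1 = c1 * (a + b) / (c1 + c2)  # expected frequency in target
    e2 = c2 * (a + b) / (c1 + c2)  # expected frequency in reference
    g2 = 0.0
    if a > 0:  # a zero observed frequency contributes nothing
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def log_ratio(a, b, c1, c2, zero=0.5):
    """Binary log of the ratio of relative frequencies (1.0 = twice as frequent).
    Zero counts are replaced by `zero` to avoid division by zero (assumption)."""
    a = a or zero
    b = b or zero
    return math.log2((a / c1) / (b / c2))


def positive_keywords(target, reference, c1, c2):
    """Words relatively more frequent in the target than in the reference,
    ranked by log-likelihood; negative keywords are dropped."""
    rows = []
    for word, a in target.items():
        b = reference.get(word, 0)
        if a / c1 > b / c2:
            rows.append((word, log_likelihood(a, b, c1, c2), log_ratio(a, b, c1, c2)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical toy frequency lists and corpus sizes
target = {"fake": 30, "the": 50}
reference = {"fake": 5, "the": 500}
for word, ll, lr in positive_keywords(target, reference, 1_000, 10_000):
    print(f"{word}: LL={ll:.1f}, LR={lr:.2f}")
```

In the toy data, "the" has the same relative frequency in both corpora and is therefore not a positive keyword.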
-
step 4: click on the concordance for great (randomised order, context view)
- sort + frequency breakdown on 1R ➞ used quite generally with different nouns
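Randomising and sorting are simple operations on the list of concordance lines. A minimal sketch, assuming lines are (left, node, right) string triples as in a KWIC display; the function name, the sample size `k` and the fixed seed are hypothetical choices.

```python
import random


def sample_and_sort(lines, k, seed=0):
    """Draw a random sample of up to k concordance lines, then sort it
    alphabetically by the first word of the right context (position 1R)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(lines, min(k, len(lines)))

    def first_right(line):
        right = line[2].split()
        return right[0].lower() if right else ""  # empty right context sorts first

    return sorted(sample, key=first_right)


# Hypothetical toy concordance lines for the node "great"
lines = [("did a", "great", "job !"),
         ("our", "great", "country"),
         ("make America", "great", "again"),
         ("it was", "great", "")]
for left, node, right in sample_and_sort(lines, k=3):
    print(f"{left:>15} | {node} | {right}")
```

Sorting on 1R groups identical right neighbours together, which makes recurring patterns easy to spot by eye before counting them.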
-
step 5: click on the concordance for fake
- the suspicion that it is mostly fake news is confirmed by a frequency breakdown on 1R (fake news: 75%)
- conclusion: fake news as a salient unit of meaning
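A frequency breakdown on 1R counts the word forms directly to the right of the node and expresses each as a share of all hits. A minimal sketch on hypothetical (left, node, right) triples; lowercasing and skipping hits with no right context are assumptions.

```python
from collections import Counter


def breakdown_1r(lines):
    """Frequency breakdown of the word at position 1R: (word, count, percentage).
    Hits without any right context are left out of the total."""
    counts = Counter(line[2].split()[0].lower() for line in lines if line[2].split())
    total = sum(counts.values())
    return [(word, f, 100 * f / total) for word, f in counts.most_common()]


# Hypothetical toy concordance lines for the node "fake"
lines = [("the", "fake", "news media"),
         ("more", "fake", "News !"),
         ("very", "fake", "polls"),
         ("a", "fake", "book")]
for word, f, pct in breakdown_1r(lines):
    print(f"fake {word}: {f} ({pct:.0f}%)")
```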
-
step 6: query `fake news` (subcorpus: Originals)
- quick look at concordance ➞ more than 900 hits
- a quantitative analysis is still needed to get an overview
-
step 7: distribution analysis for fake news (the distribution across years is especially interesting)
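Raw hit counts per year can mislead when the years contribute very different amounts of text, so a distribution across years is best read as normalised frequencies, e.g. per million words. A minimal sketch with hypothetical year labels and subcorpus sizes; the function name and data are illustrative only.

```python
from collections import Counter


def per_year(hit_years, tokens_per_year):
    """Map each year to (hits, hits per million words of that year's text)."""
    hits = Counter(hit_years)  # missing years count as 0
    return {year: (hits[year], 1_000_000 * hits[year] / size)
            for year, size in sorted(tokens_per_year.items())}


# Hypothetical toy data: year of each query hit, and subcorpus size per year in tokens
hit_years = [2017, 2017, 2018]
tokens_per_year = {2016: 500_000, 2017: 1_000_000, 2018: 2_000_000}
for year, (n, pmw) in per_year(hit_years, tokens_per_year).items():
    print(f"{year}: {n} hits, {pmw:.2f} per million words")
```

In the toy data, 2018 has fewer hits than 2017 in absolute terms and an even lower normalised frequency, because its subcorpus is twice as large.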
-
step 8: collocation analysis for fake news reveals usage and phraseology
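A collocation analysis counts which words co-occur with the node inside a window of a few tokens and ranks them with an association measure. A minimal sketch, assuming tokenised tweets and using logDice as one possible measure (CQPweb lets you choose among several statistics, and its settings may differ). The window `span`, the threshold `min_freq` and the function name are hypothetical.

```python
import math
from collections import Counter


def collocates(tweets, node, span=3, min_freq=2):
    """Window-based collocates of a (multi-word) node, ranked by logDice.

    tweets: list of token lists; windows never cross tweet boundaries.
    logDice = 14 + log2(2 * f_xy / (f_x + f_y)).
    """
    q = [w.lower() for w in node.split()]
    n = len(q)
    word_freq = Counter()  # f_y: overall frequency of each word
    co_freq = Counter()    # f_xy: co-occurrences with the node inside the window
    node_freq = 0          # f_x: frequency of the node
    for toks in tweets:
        toks = [t.lower() for t in toks]
        word_freq.update(toks)
        for i in range(len(toks) - n + 1):
            if toks[i:i + n] == q:
                node_freq += 1
                window = toks[max(0, i - span):i] + toks[i + n:i + n + span]
                co_freq.update(window)
    rows = []
    for word, fxy in co_freq.items():
        if fxy >= min_freq:
            score = 14 + math.log2(2 * fxy / (node_freq + word_freq[word]))
            rows.append((word, fxy, score))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical toy tweets, already tokenised
toy_tweets = [["the", "fake", "news", "media"],
              ["fake", "news", "media", "is", "bad"],
              ["great", "news"]]
for word, fxy, score in collocates(toy_tweets, "fake news", span=2, min_freq=2):
    print(f"{word}: {fxy} co-occurrences, logDice={score:.2f}")
```

A minimum co-occurrence frequency keeps one-off neighbours from cluttering the list.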