politicaltweets: Classify political tweets

The politicaltweets R package provides functions to preprocess and classify tweets data according to whether or not they are political based on a pre-trained ensemble classifier.

Installation

remotes::install_github("haukelicht/politicaltweets")

Note that all but one package dependencies are distributed via CRAN. The one exeption is the laserize package, which can be installed from GitHub.

Usage

To classify a tweet, five steps are required

Query the tweet data from the Twitter API using rtweet's lookup_statuses() (with parse = TRUE).
pass the parsed tweets data to argument x of create_tweet_features() to create data frame of tweet features
pass the parsed tweets data to argument x of create_tweet_text_representations() with .compute.pcs = FALSE to create obtain tweet text embeding representations¹
combine the tweet features and text representation objects in a data frame.
pass the resulting data frame to argument x of classify_tweets()

A minimal workin example:

library(dplyr)
library(politicaltweets)

# instead of querying data from the Tweet API (step 1)
# below we use a prototypical tweets data frame
glimpse(tweets.df.prototype)

# step 2
tfeats <- create_tweet_features(tweets.df.prototype, .as.data.table = FALSE)

# step 3
ttreps <- create_tweet_text_representations(tweets.df.prototype, .compute.pcs = FALSE)

# step 4
temp <- as_tibble(tfeats) %>%
  left_join(mutate(as_tibble(ttreps$ics), status_id = rownames(ttreps$ics)))
  
# step 5
preds <- classify_tweets(temp, .debug = TRUE) 

# inspect the result
cbind(temp[, c("text", "lang")], preds)

Details

Required format of `x`

All functions exported by politicaltweets expect that data passed to their arguments x conforms the naming and typing conventions of tweets data frames set by the rtweet package.

A prototypical tweets data frame is distributed with the politicaltweets package, see ?tweets.df.prototype. (Moreover, politicaltweets::required.tweets.df.cols maps required columns to the accepted classes.)

Using `classify_tweets()` with a pre-trained ensemble classifier

classify_tweets() can handle two types of model input:

By default, classify_tweets() uses a list of four pre-trained models (see ?constituent.modles for details) "blends" them into an ensemble classifier using blend.by = "PR-AUC" (maximize the area under the precision-recall curve).

More generally, classify_tweets() can handle two types of model inputs:

Lists of pre-trained base learner models: if the input to argument model is a 'caretList' object (i.e., a list of pre-trained base learners). In this case, the base learners are first "blended" into a greedy ensemble classifier, and the resulting ensemble model is then used to classify samples in x.
Pre-trained ensemble classifiers: If the input to argument model is a 'caretEnsemble' object, this ensemble model is directly used to classify samples in x.

Thus you can train o

It obtains tweet text LASER embedding representations using the laserize package and projects tweets LASER representations onto a pre-defined independent component space. ↩

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
R		R
data-raw		data-raw
data		data
man		man
tests		tests
.Rbuildignore		.Rbuildignore
.gitattributes		.gitattributes
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.md		README.md
politicaltweets.Rproj		politicaltweets.Rproj
renv.lock		renv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

politicaltweets: Classify political tweets

Installation

Usage

Details

Required format of `x`

Using `classify_tweets()` with a pre-trained ensemble classifier

About

Releases 1

Packages

Languages

haukelicht/politicaltweets

Folders and files

Latest commit

History

Repository files navigation

politicaltweets: Classify political tweets

Installation

Usage

Details

Required format of x

Using classify_tweets() with a pre-trained ensemble classifier

Footnotes

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Required format of `x`

Using `classify_tweets()` with a pre-trained ensemble classifier

Packages