added files to create presidential_speech data #40
base: develop
We don't need the |
That sounds fine. I'll drop the pyc files and raw speeches, and resubmit.
We also don't need the |
wrd.var <- apply(dtm.mat.log,2,var)
top.wrd.var <- names(sort(wrd.var,decreasing = TRUE)[1:75])
dtm.mat.log <- dtm.mat.log[,colnames(dtm.mat.log) %in% top.wrd.var]
saveRDS(dtm.mat.log,"presidential_speech.rds")
Missing line ending
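A check along these lines could flag files that lack a final newline before they are committed. This is only a sketch: the helper names `missing_final_newline` and `files_missing_final_newline` are made up for illustration and are not part of this PR.

```python
from pathlib import Path


def missing_final_newline(data: bytes) -> bool:
    """Return True when non-empty content does not end with a newline byte."""
    return len(data) > 0 and not data.endswith(b"\n")


def files_missing_final_newline(paths):
    """Return the paths (as strings) whose file contents lack a trailing newline.

    Hypothetical helper: reads each file fully, which is fine for small scripts.
    """
    return [str(p) for p in paths if missing_final_newline(Path(p).read_bytes())]
```

Passing the changed R and shell scripts to `files_missing_final_newline` would list the ones that still need a trailing newline.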
library(SnowballC)
library(parallel)
library(Matrix)
library(tidyverse)
Are all of these dependencies used? I only see tm, stringr, and tidyverse below.
@@ -0,0 +1,11 @@
#!/bin/bash
This file appears to be identical to move_inaug_results.sh. Is that intentional?
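One way to confirm whether two scripts really are byte-for-byte identical is to group files by content hash. The sketch below assumes the scripts are small enough to read into memory. The function names and the `other_script.sh` file name in the usage note are hypothetical; only move_inaug_results.sh comes from this PR.

```python
import hashlib
from pathlib import Path


def find_duplicates(contents):
    """Group file names by SHA-256 of their bytes.

    `contents` maps a file name to its raw bytes. Returns a list of sorted
    name groups, one per set of files sharing identical content.
    """
    groups = {}
    for name, data in contents.items():
        digest = hashlib.sha256(data).hexdigest()
        groups.setdefault(digest, []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def find_duplicate_files(paths):
    """Hypothetical convenience wrapper: read each path, then group by content."""
    return find_duplicates({str(p): Path(p).read_bytes() for p in paths})
```

For example, `find_duplicate_files(["move_inaug_results.sh", "other_script.sh"])` would return one group containing both names if the files match, and an empty list otherwise.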
class InaugTextSpider(scrapy.Spider):
    name = "inaug_text"
    allowed_domains = ["http://www.presidency.ucsb.edu/"]
    start_urls = (
Add a note explaining what these URLs are and how to keep this list up to date (if possible)
    name = "sou_text"
    allowed_domains = ["http://www.presidency.ucsb.edu"]
    start_urls = (
        'http://www.presidency.ucsb.edu/ws/index.php?pid=123408',
Explanatory note needed.
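One possible shape for that note is to keep the document ids in a single annotated table and build the URLs from it, so each entry carries its own explanation. This is a sketch only: `SPEECH_PIDS` and `build_start_urls` are hypothetical names, the note text is a placeholder, and the only pid taken from the diff is 123408.

```python
from urllib.parse import urlencode

# Base of the document URLs, as it appears in the spider above.
BASE_URL = "http://www.presidency.ucsb.edu/ws/index.php"

# pid -> human-readable note. The note is a placeholder to be filled in by
# the author; it is not a description taken from the site.
SPEECH_PIDS = {
    123408: "TODO: say which address this pid refers to and where it was found",
}


def build_start_urls(pids):
    """Return a tuple of document URLs, one per pid, in iteration order."""
    return tuple(f"{BASE_URL}?{urlencode({'pid': pid})}" for pid in pids)


start_urls = build_start_urls(SPEECH_PIDS)
```

Keeping the ids and notes together means updating the list is a one-line change per new speech, and the comment can record where the ids were looked up.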
@jjn13 Any updates on this PR?
Created raw-data directory with code for creating the presidential_speech dataset.