User lookup #30

Open: wants to merge 2 commits into base: master

JanaLasser

Added a new function "infer_ids" and corresponding helper functions that use the GET /users/lookup endpoint to request information for up to 100 user IDs at a time.

@@ -265,3 +353,54 @@ def process_twitter(self, data):
            "output": pred[id]
        }
        return output


    def process_twitter_batch(self, data_list, batch_size, num_workers):
Member

This function seems to largely overlap with the non-batched version. Could you call the non-batched version and then aggregate the results?
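One way to read this suggestion, as a minimal sketch: the batched function becomes a thin wrapper that calls the single-item function for each element and collects the results. The names `process_batch` and `fake_process_twitter` are hypothetical stand-ins, not functions from this PR, and the sketch assumes the per-item function has no batch-only side effects. Whether this keeps the benefit of `batch_size` and `num_workers` depends on what `process_twitter` actually does.

```python
from typing import Any, Callable, Dict, List


def process_batch(
    data_list: List[Dict[str, Any]],
    process_one: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Run the single-item function on every item and aggregate the results."""
    return [process_one(data) for data in data_list]


# Hypothetical stand-in for the non-batched process_twitter method.
def fake_process_twitter(data: Dict[str, Any]) -> Dict[str, Any]:
    return {"input": data, "output": {"id": data["id"]}}


results = process_batch([{"id": 1}, {"id": 2}], fake_process_twitter)
```

The preprocessing logic then lives in one place, and the batched path only adds aggregation on top of it.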

for id in id_list:
    # If a json file exists, we'll use that. Otherwise go get the data.
    try:
        with open("{}/{}.json".format(self.cache_dir, id), "r") as fh:
Member

os.path.join might be better?
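A small sketch of what that could look like. The helper name `cache_path` and the example directory are assumptions for illustration; only the `"{}.json"` file naming comes from the diff above.

```python
import os


def cache_path(cache_dir: str, user_id: str) -> str:
    """Build the path of the cached JSON file for one user ID."""
    # os.path.join inserts the platform's path separator and avoids a
    # doubled separator when cache_dir already ends with one.
    return os.path.join(cache_dir, "{}.json".format(user_id))


path = cache_path("twitter_cache", "12345")
```

Inside the loop, the `open` call would then take `cache_path(self.cache_dir, id)` instead of the hand-formatted string.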

if len(id_list) > 0:
    # the twitter API handles a maximum of 100 user IDs per request. Chunk up the user
    # id list into batches of max 100 IDs and run them through the pipeline sequentially
    API_batch_size = 100
Member

This could be defined as a global variable for easy tracking?
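For illustration, a module-level constant combined with a chunking helper might look like the sketch below. The names `TWITTER_LOOKUP_MAX_IDS` and `chunk_ids` are hypothetical; the limit of 100 IDs per request is taken from the code comment in the diff.

```python
from typing import Iterator, List

# Maximum number of user IDs per lookup request, per the comment in the diff.
TWITTER_LOOKUP_MAX_IDS = 100


def chunk_ids(
    id_list: List[str], size: int = TWITTER_LOOKUP_MAX_IDS
) -> Iterator[List[str]]:
    """Yield consecutive slices of id_list with at most `size` IDs each."""
    for start in range(0, len(id_list), size):
        yield id_list[start:start + size]


batches = list(chunk_ids([str(i) for i in range(250)]))
```

With the limit defined once at module level, a change to the API limit only has to be made in one place, and the `len(id_list) > 0` guard becomes unnecessary because an empty list simply yields no batches.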

@@ -193,6 +193,34 @@ def _twitter_api(self,id=None,screen_name=None):
        return self.process_twitter(r.json())


    def _twitter_api_lookup(self, ids=None, batch_size=16, num_workers=4):
Member

We could probably reuse the existing _twitter_api function by extending its id parameter to accept a list of ids?
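One possible shape for this, sketched under assumptions: a small helper normalizes the argument so that a single function can accept either one ID or a sequence of IDs. `normalize_ids` is a hypothetical helper and not part of this PR; how the result feeds into the actual API call is left open.

```python
from typing import List, Sequence, Union

IdLike = Union[str, int]


def normalize_ids(ids: Union[None, IdLike, Sequence[IdLike]] = None) -> List[str]:
    """Return a list of ID strings for None, a single ID, or a sequence of IDs."""
    if ids is None:
        return []
    if isinstance(ids, (str, int)):
        # A single ID is wrapped so callers can always iterate over a list.
        return [str(ids)]
    return [str(i) for i in ids]
```

A single-ID call and a multi-ID call could then share one code path: `normalize_ids("12")` gives a one-element list, and `normalize_ids([1, 2])` gives a list of two ID strings.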

2 participants