Is there a way to scrape the stories themselves? #2

Open
depthfirst opened this issue Nov 22, 2015 · 8 comments

@depthfirst
Collaborator

Working on topic modeling with LDA, and I'd like to try it on a sample of the stories, not just the summaries. Do you have code to do that, or can you write a function to return the text of a story given ?

@MrTyton
Owner

MrTyton commented Nov 22, 2015

I don't have code to do that. I'll see if I can write something up for it for tomorrow; otherwise, you could try using http://www.mobileread.com/forums/showthread.php?t=259221

(Sent by email in reply to John Blackmore's message of Sat, Nov 21, 2015 at 9:30 PM.)

@MrTyton
Owner

MrTyton commented Nov 25, 2015

pip install FanFicFare

(Sent by email in reply to John Blackmore's notification of Tue, Nov 24, 2015 at 11:03 PM assigning #2 to @MrTyton.)

@depthfirst
Collaborator Author

Can you give me a little more, like the function I'm asking for? I just thought this would be a lot easier for you since you scraped everything else.

@MrTyton
Owner

MrTyton commented Nov 25, 2015

Sorry, the meds I'm taking are fucking me up some. That plugin does all the nice epub formatting and stuff, and scrapes it all. Give me half an hour and I'll have a basic thing, but it'll still have the formatting tags within the story.

(Sent by email in reply to John Blackmore's message of Wed, Nov 25, 2015 at 7:29 AM.)
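Since the scraped story text still carries the epub formatting tags, they have to come out before the text is fed to LDA. A minimal standard-library sketch, assuming each story body is an HTML fragment; `strip_tags` and the tag sets below are illustrative choices, not part of this project or of FanFicFare:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, dropping tags and anything inside script/style."""

    _SKIP = {"script", "style"}
    # Block-level tags that get a space around them so words don't run together.
    _BREAK = {"p", "br", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &nbsp; etc. into characters
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        elif tag in self._BREAK:
            self._parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self._BREAK:
            self._parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Collapse every run of whitespace (including non-breaking spaces) to one space.
        return " ".join("".join(self._parts).split())


def strip_tags(html):
    """Return the plain text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(strip_tags("<p>Harry <em>ran</em>.</p><p>Then&nbsp;stopped.</p>"))
# Harry ran. Then stopped.
```

Inline tags such as `<em>` are dropped without adding a space, so "ran" and the following period stay joined; only block-level tags act as word boundaries.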

@MrTyton
Owner

MrTyton commented Nov 25, 2015

Done. You have to do it from the Story class. Do you want me to rewrite the init function so that you can just build it from the SQL row, instead of only getting it from the initial scrape?

(Sent by email in reply to Joshua Gang's message of Wed, Nov 25, 2015 at 7:36 AM.)
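One way to let a Story be built from a SQL row without breaking the path that builds it from the scraped XML is a pair of alternate constructors around a plain field-list `__init__`. This is only a sketch: the `Story` fields, the XML tag names, and the `from_xml`/`from_row` method names are hypothetical, not this project's code.

```python
import xml.etree.ElementTree as ET


class Story:
    """Hypothetical story record; field names are illustrative only."""

    def __init__(self, story_id, title, summary):
        self.story_id = story_id
        self.title = title
        self.summary = summary

    @classmethod
    def from_xml(cls, element):
        # Build from a scraped XML element (tag names are assumptions).
        return cls(
            int(element.findtext("id")),
            element.findtext("title"),
            element.findtext("summary"),
        )

    @classmethod
    def from_row(cls, row):
        # Build from a database row whose columns follow __init__'s parameter order.
        return cls(*row)


xml_story = Story.from_xml(
    ET.fromstring("<story><id>7</id><title>A</title><summary>B</summary></story>")
)
row_story = Story.from_row((7, "A", "B"))
print(xml_story.title == row_story.title)  # True
```

Keeping `__init__` as a plain list of fields means both sources end up in the same constructor, so neither path needs to know how the other one parses its input.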

@depthfirst
Collaborator Author

Does the init take a bit of time? If so, then yes, that'd be great. Thanks.

@MrTyton
Owner

MrTyton commented Nov 25, 2015

Init doesn't take time, but right now it literally only works when you give it an XML document. Hold on...

(Sent by email in reply to John Blackmore's message of Wed, Nov 25, 2015 at 8:14 AM.)

@MrTyton
Owner

MrTyton commented Nov 25, 2015

Done. It now works with the SQL row from stories; just make sure that you expand it, i.e. Story(*row).

(Sent by email in reply to Joshua Gang's message of Wed, Nov 25, 2015 at 8:16 AM.)
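`Story(*row)` uses Python's argument unpacking: each row comes back from the database as a tuple, and `*` spreads it into positional arguments, so the column order has to match the constructor's parameter order. A minimal sketch with `sqlite3`, assuming a `stories` table with `id`, `title` and `summary` columns and a stand-in `Story` class (both hypothetical; the real schema and class aren't shown in this thread):

```python
import sqlite3


class Story:
    """Hypothetical stand-in: parameters must follow the selected column order."""

    def __init__(self, story_id, title, summary):
        self.story_id = story_id
        self.title = title
        self.summary = summary


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (id INTEGER, title TEXT, summary TEXT)")
conn.execute("INSERT INTO stories VALUES (?, ?, ?)", (1, "Example", "A summary."))

# Name the columns explicitly so their order matches Story.__init__,
# rather than relying on SELECT * and the table's declaration order.
rows = conn.execute("SELECT id, title, summary FROM stories").fetchall()

# Story(*row) unpacks each row tuple into positional arguments.
stories = [Story(*row) for row in rows]
conn.close()

print(stories[0].title)  # Example
```

Passing the row without the `*` would hand the whole tuple to `story_id` and fail with a missing-argument `TypeError`, which is why the row has to be expanded.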
