
Early suite exit #7

Open
nrabinowitz opened this issue Dec 15, 2011 · 1 comment

@nrabinowitz
Owner

Use case:

Let's call http://www.example.com/ "root". "root" contains links to root.1, root.2, root.3 ... root.250 (see hermitageart.com for an actual example with 260 links!). Each of these 250 pages contains links to other pages. If my feature of interest was found only in root.3 and root.102, then ideally I would like root.4, root.5, ..., root.250 not to be accessed, i.e. page.open should never be called on them.

I think this could be addressed by setting a flag (maybe on the _pjs.state object?) that ends the suite early: the page completion callback would check the flag and, if it is set, empty out the array of still-to-scrape pages. Question: this only affects the current level of recursion. Is that good? Do we need an early exit from the entire suite?
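A minimal sketch of the flag idea, modelled outside PhantomJS so it can run anywhere. `ScrapeState`, `stopSuite`, `Scraper` and `runSuite` are hypothetical names standing in for something like `_pjs.state` and the real page-loading loop; `scrape()` replaces `page.open` plus extraction and just reports whether the feature was found.

```typescript
// Hypothetical shared state; stands in for a flag on something like _pjs.state.
interface ScrapeState {
  stopSuite: boolean;
}

// Stand-in for page.open + scraping: returns true if the feature is found.
type Scraper = (url: string) => boolean;

// Scrape one suite level, stopping as soon as the flag is set.
function runSuite(urls: string[], scrape: Scraper, state: ScrapeState): string[] {
  const pending = urls.slice(); // still-to-scrape pages
  const visited: string[] = [];

  while (pending.length > 0) {
    const url = pending.shift()!;
    visited.push(url);
    const found = scrape(url);
    // Page completion callback: set the flag when the feature is found...
    if (found) state.stopSuite = true;
    // ...and check it, emptying the queue so no further pages are opened.
    if (state.stopSuite) pending.length = 0;
  }
  return visited;
}

// The use case above: 250 child pages, feature present in root.3 and root.102.
const urls = Array.from({ length: 250 }, (_, i) => `root.${i + 1}`);
const state: ScrapeState = { stopSuite: false };
const visited = runSuite(urls, (u) => u === "root.3" || u === "root.102", state);
console.log(visited.join(", ")); // root.1, root.2, root.3
```

Because the flag lives on shared state rather than on the queue, it stays set after this level returns, so a caller could also check it before starting the next level; whether that is desirable is exactly the open question about recursion.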

@nrabinowitz
Owner Author

Better option here:

  • add a "complete" callback option, run in the PhantomJS scope, with the page as an argument
  • add page.manager as a pointer to the SuiteManager
  • add SuiteManager.endSuite() and SuiteManager.endAllSuites() to control flow.
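A rough model of how the three pieces could fit together. This is a sketch under assumptions, not pjscrape's actual SuiteManager: `Suite`, `Page.found`, `detect`, `add()` and `run()` are hypothetical, and page loading is simulated synchronously. `endSuite()` empties the current suite's urls array and `endAllSuites()` does that plus clearing the queued suites, as described in the comment.

```typescript
// Hypothetical model of the proposal; not pjscrape's real implementation.
interface Suite {
  urls: string[]; // pages still to scrape in this suite
}

interface Page {
  url: string;
  found: boolean; // stand-in for whatever the scraper extracted
  manager: SuiteManager; // the proposed page.manager back-pointer
}

type CompleteCallback = (page: Page) => void;

class SuiteManager {
  suites: Suite[] = [];
  current: Suite | null = null;
  visited: string[] = [];

  constructor(private detect: (url: string) => boolean,
              private complete: CompleteCallback) {}

  add(urls: string[]): void {
    this.suites.push({ urls: urls.slice() });
  }

  // End the current suite: set its urls array to [].
  endSuite(): void {
    if (this.current) this.current.urls = [];
  }

  // End all suites: the above, plus setting suites = [].
  endAllSuites(): void {
    this.endSuite();
    this.suites = [];
  }

  run(): void {
    while (this.suites.length > 0) {
      const suite = this.suites.shift()!;
      this.current = suite;
      while (suite.urls.length > 0) {
        const url = suite.urls.shift()!;
        this.visited.push(url);
        const page: Page = { url, found: this.detect(url), manager: this };
        this.complete(page); // may call page.manager.endSuite() / endAllSuites()
      }
    }
    this.current = null;
  }
}

const manager = new SuiteManager(
  (url) => url === "root.3" || url === "root.102",
  (page) => { if (page.found) page.manager.endAllSuites(); },
);
manager.add(["root.1", "root.2", "root.3", "root.4"]);
manager.add(["other.1", "other.2"]);
manager.run();
console.log(manager.visited.join(", ")); // root.1, root.2, root.3
```

Calling `page.manager.endSuite()` instead would stop only the first suite, so other.1 and other.2 would still be scraped afterwards.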

It's relatively simple to end the current suite (set its urls array to []) and to end all suites (that, plus setting suites = []). Killing an ancestor of the current suite in a recursive scrape might be more difficult; it's worth thinking about whether I'd want or need an actual tree structure to manage the suites if I wanted more fine-grained control.
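If suites were kept in a tree, killing an ancestor would amount to walking up from the current suite and ending that node's whole subtree. A minimal sketch, assuming each recursive scrape creates a child suite; `SuiteNode`, `end()` and `ancestor()` are hypothetical names, not existing pjscrape API.

```typescript
// Hypothetical tree of suites: each recursive scrape spawns a child suite.
class SuiteNode {
  children: SuiteNode[] = [];

  constructor(public urls: string[], public parent: SuiteNode | null = null) {
    if (parent) parent.children.push(this);
  }

  // End this suite and all of its descendants.
  end(): void {
    this.urls = [];
    for (const child of this.children) child.end();
  }

  // Walk `levels` steps up the tree, stopping at the root.
  ancestor(levels: number): SuiteNode {
    let node: SuiteNode = this;
    while (levels > 0 && node.parent) {
      node = node.parent;
      levels--;
    }
    return node;
  }
}

const root = new SuiteNode(["root.1", "root.2", "root.3"]);
const level1 = new SuiteNode(["root.3.a", "root.3.b"], root);
const level2 = new SuiteNode(["root.3.a.x"], level1);
const sibling = new SuiteNode(["root.2.a"], root);

// Found the feature in level2: end its parent suite (and level2 with it),
// leaving root and the sibling branch untouched.
level2.ancestor(1).end();
console.log(level1.urls.length, level2.urls.length, root.urls.length, sibling.urls.length); // 0 0 3 1
```

Calling `ancestor(n).end()` with a large enough `n` reaches the root and behaves like ending the entire recursive scrape.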
