Issues: ScaleUnlimited/flink-crawler
Include UrlDBFunction's domain scores in saved state
bug
in progress
#163 opened Sep 5, 2018 by Schmed
Sync up the total active URLs value with state in UrlDBFunction
enhancement
#162 opened Aug 16, 2018 by vmagotra
CrawlTopologyTest.testAsync now fails for me
in progress
task
#151 opened May 9, 2018 by Schmed
Try using stream harness support for unit testing
in progress
#133 opened Apr 11, 2018 by Schmed
Restore pre-fetching state during restore of UrlDBFunction state
bug
#121 opened Mar 23, 2018 by kkrugler
Support checkpointing of SeedUrlSource
in progress
#108 opened Mar 16, 2018 by kkrugler
Auto-retry https URLs that fail with IOException using http protocol
enhancement
#83 opened Dec 21, 2017 by kkrugler
CommonCrawlFetcher should limit bytes requested to max content size
cleanup
#77 opened Nov 10, 2017 by kkrugler
Fix parsing errors during processing of Common Crawl data
cleanup
#73 opened Nov 9, 2017 by kkrugler
Set refetch time for blocked URLs based on robots.txt reload/retry time
enhancement
#53 opened Oct 15, 2017 by kkrugler
Use CommonCrawl robots.txt data if we're in common crawl mode
enhancement
#44 opened Sep 20, 2017 by kkrugler
Use one FetchFunction and one ParseFunction for all types of URLs
cleanup
#34 opened Apr 19, 2017 by kkrugler
Calculate a DomainRank as part of the crawling process
enhancement
#25 opened Feb 2, 2017 by kkrugler