This unit describes how to deploy Scrapy spiders to Scrapy Cloud and how to get the most out of the platform.
- Introduction to Scrapy Cloud
- Deploying spiders to Scrapy Cloud
- Controlling spiders via command line
- UI walkthrough
Check out the slides for this unit
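For reference while following the slides: besides the shub command-line client, jobs on Scrapy Cloud can also be controlled programmatically with the python-scrapinghub library. The snippet below is only a rough sketch of that workflow; the API key, project ID and spider name are placeholders, not values used in this unit.

```python
# Minimal sketch of controlling Scrapy Cloud jobs with python-scrapinghub
# (pip install scrapinghub). All credentials/IDs below are placeholders.
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient("YOUR_SC_APIKEY")
project = client.get_project(123456)      # numeric Scrapy Cloud project ID

# Schedule a run of an already-deployed spider
job = project.jobs.run("books")
print("scheduled:", job.key)

# Check the job's state later (pending -> running -> finished)
print("state:", job.metadata.get("state"))

# Iterate over the items scraped by the job once it has finished
for item in job.items.iter():
    print(item)
```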
- A simple project to demonstrate deployment: p1_first_deploy
- A project to deploy with dependencies: p2_dependencies
- A project to deploy with Python scripts: p3_scripts
Deploy the crawler for books.toscrape.com built in unit 2 to Scrapy Cloud.
a. Run the spider without touching any settings.
b. Run the spider again, this time with DOWNLOAD_DELAY = 1 set via the web UI (a programmatic alternative is sketched after this exercise).
Check out the project once you're done.
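The DOWNLOAD_DELAY override in step (b) is meant to be set on the spider's settings page in the web UI. As an aside, the same one-off override can also be passed when scheduling a job with python-scrapinghub, via its job_settings argument; this is just a hedged sketch with placeholder credentials and spider name, not part of the exercise itself.

```python
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient("YOUR_SC_APIKEY")   # placeholder API key
project = client.get_project(123456)           # placeholder project ID

# job_settings overrides project/spider settings for this run only
job = project.jobs.run("books", job_settings={"DOWNLOAD_DELAY": 1})
print("scheduled with DOWNLOAD_DELAY=1:", job.key)
```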
Create a crawler to fetch the 100 hottest submissions from reddit.com/r/programming (to run on Scrapy Cloud).
Then create a CLI app that fetches the scraped data from Scrapy Cloud and lists the top 10 submissions from the latest crawl, ranked by the score defined below:
new_score = S * C * K
- S → current score on reddit
- C → number of comments
- K → original poster's comment karma
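One possible shape for the CLI part of this exercise is sketched below, assuming the spider exports fields named title, score, num_comments and op_comment_karma — those field names, the spider name and the project ID are illustrative assumptions, not requirements. The script fetches the items of the latest finished job from Scrapy Cloud with python-scrapinghub and ranks them with the formula above.

```python
"""List the top 10 submissions from the latest r/programming crawl on Scrapy Cloud."""
from scrapinghub import ScrapinghubClient

API_KEY = "YOUR_SC_APIKEY"      # placeholder
PROJECT_ID = 123456             # placeholder
SPIDER = "reddit_programming"   # placeholder spider name


def latest_finished_job(project, spider):
    # jobs.iter() yields the most recent jobs first; take the newest finished one
    for summary in project.jobs.iter(spider=spider, state="finished", count=1):
        return project.jobs.get(summary["key"])
    return None


def new_score(item):
    # new_score = S * C * K
    return item["score"] * item["num_comments"] * item["op_comment_karma"]


def main():
    client = ScrapinghubClient(API_KEY)
    project = client.get_project(PROJECT_ID)
    job = latest_finished_job(project, SPIDER)
    if job is None:
        print("No finished crawls found.")
        return
    top10 = sorted(job.items.iter(), key=new_score, reverse=True)[:10]
    for rank, item in enumerate(top10, start=1):
        print(f"{rank:2d}. {item['title']}  (new_score={new_score(item)})")


if __name__ == "__main__":
    main()
```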