Unit 3: Running Spiders in the Cloud

This unit describes how to deploy Scrapy spiders to Scrapy Cloud and how to make the most of the platform.

Topics

  • Introduction to Scrapy Cloud
  • Deploying spiders to Scrapy Cloud
  • Controlling spiders via command line
  • UI walkthrough

Check out the slides for this unit

Sample Spiders

  1. A simple project to demonstrate deploy: p1_first_deploy
  2. A project to deploy with dependencies: p2_dependencies
  3. A project to deploy with Python Scripts: p3_scripts

Hands-on

1. Deploy the books crawler

Deploy the crawler for books.toscrape.com built in unit 2 to Scrapy Cloud.

a. Run the spider without touching any settings
b. Run the spider again, but now with DOWNLOAD_DELAY = 1 set via the web UI

Check out the project once you're done.
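The deployment itself can be done with the shub command-line tool. A minimal sketch of the workflow (the project ID 12345 is a placeholder for your own Scrapy Cloud project ID):

```shell
# Install Zyte's deploy tool and authenticate with your API key
pip install shub
shub login

# From the project's root directory, deploy to Scrapy Cloud
# (12345 is a placeholder project ID)
shub deploy 12345

# Schedule a run of the books spider on Scrapy Cloud
shub schedule 12345/books
```

For step (b), DOWNLOAD_DELAY can be set per project or per spider in the Settings page of the web UI, so no redeploy is needed between the two runs.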

2. Reddit Ranker

Create a crawler to fetch the 100 hottest submissions from reddit.com/r/programming (to run on Scrapy Cloud).

After that, create a CLI app to fetch the scraped data from Scrapy Cloud and list the top 10 submissions from the latest crawl, based on the score below:

new_score = S * C * K

    S → current score on reddit
    C → number of comments
    K → original poster's comments karma
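As a sketch of the ranking step only: assuming each scraped item is a dict with `score`, `num_comments`, and `op_comment_karma` fields (hypothetical names, use whatever your spider emits), the CLI app could rank the latest crawl's items like this. Fetching the items from Scrapy Cloud (e.g. with the python-scrapinghub client) is left out; made-up sample data stands in for it.

```python
def new_score(item):
    """Composite score: reddit score * number of comments * OP's comment karma."""
    return item["score"] * item["num_comments"] * item["op_comment_karma"]

def top_submissions(items, n=10):
    """Return the n highest-ranked submissions by the composite score."""
    return sorted(items, key=new_score, reverse=True)[:n]

# Made-up sample data standing in for items fetched from Scrapy Cloud:
items = [
    {"title": "A", "score": 100, "num_comments": 10, "op_comment_karma": 5},
    {"title": "B", "score": 50, "num_comments": 40, "op_comment_karma": 2},
    {"title": "C", "score": 10, "num_comments": 5, "op_comment_karma": 1},
]
for item in top_submissions(items, n=2):
    print(item["title"], new_score(item))
```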

Check out the project once you're done.

References