Unit 2: Navigating websites with Scrapy

This unit builds upon the previous one and covers how to crawl websites with Scrapy. Crawling a website essentially means following the links found on its pages, so that the spider visits every page it needs.
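
To make this concrete, here is a minimal sketch of a link-following spider for quotes.toscrape.com; the selectors reflect that site's markup at the time of writing and may need adjusting:

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com"]

    def parse(self, response):
        # Scrape the quotes on the current page
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow the "Next" pagination link, if there is one;
        # response.follow resolves relative URLs for us
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```

The spider keeps yielding new requests until no "Next" link is found, which is the essence of crawling.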

Topics

  • Link crawling
  • Crawling settings (see the sketch after this list)
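
As a rough illustration of the crawling settings topic, the sketch below overrides a few crawl-related settings for a single spider via `custom_settings`; the values shown are arbitrary examples, not recommendations:

```python
import scrapy


class PoliteSpider(scrapy.Spider):
    name = "polite"
    start_urls = ["http://quotes.toscrape.com"]

    # Per-spider overrides of the project-wide settings
    custom_settings = {
        "DOWNLOAD_DELAY": 1.0,                # wait ~1 second between requests
        "CONCURRENT_REQUESTS_PER_DOMAIN": 2,  # limit parallelism per domain
        "DEPTH_LIMIT": 3,                     # don't follow links deeper than 3 hops
        "ROBOTSTXT_OBEY": True,               # respect the site's robots.txt
    }

    def parse(self, response):
        yield {"url": response.url}
```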

Check out the slides for this unit

Sample Spiders

  1. Spider that follows pagination links to scrape quotes.toscrape.com: spider_1_quotes_pagination.py
  2. Spider that extracts authors' data from detail pages in quotes.toscrape.com (this pattern is sketched below): spider_2_authors_details.py
  3. Spider that extracts the quotes alongside the authors' information from quotes.toscrape.com: spider_3_quotes_authors.py
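
As a rough approximation of the detail-page pattern used by the second spider (not the repository's actual code), a spider can follow the "(about)" link next to each author and parse the resulting page in a separate callback; Scrapy's duplicate filter takes care of authors who appear more than once:

```python
import scrapy


class AuthorsSpider(scrapy.Spider):
    name = "authors"
    start_urls = ["http://quotes.toscrape.com"]

    def parse(self, response):
        # Follow each author's "(about)" link to their details page;
        # duplicate requests are dropped automatically by Scrapy
        for href in response.css(".author + a::attr(href)").getall():
            yield response.follow(href, callback=self.parse_author)
        # Keep paginating through the quote listing
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

    def parse_author(self, response):
        yield {
            "name": response.css("h3.author-title::text").get(),
            "birth_date": response.css(".author-born-date::text").get(),
        }
```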

Hands-on

1. Books Crawler

Build a spider to extract the title, price (as a float) and stock availability of all 1,000 books listed on books.toscrape.com.

Check out the spider once you're done.
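
If you get stuck, here is one possible starting point; the selectors match books.toscrape.com's markup at the time of writing, but verify them against the live site:

```python
import scrapy


class BooksSpider(scrapy.Spider):
    name = "books"
    start_urls = ["http://books.toscrape.com"]

    def parse(self, response):
        # Follow every book on the listing page to its details page
        for href in response.css("article.product_pod h3 a::attr(href)").getall():
            yield response.follow(href, callback=self.parse_book)
        # Then move on to the next listing page, if any
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

    def parse_book(self, response):
        # Availability text looks like "In stock (22 available)"
        stock = response.css("p.availability::text").re_first(r"\d+")
        yield {
            "title": response.css("div.product_main h1::text").get(),
            # Price arrives as e.g. "£51.77"; strip the symbol and cast
            "price": float(
                response.css("div.product_main p.price_color::text").get().replace("£", "")
            ),
            "stock": int(stock) if stock else 0,
        }
```

You could run it with `scrapy runspider books.py -o books.json` (filename assumed) and check that the output contains 1,000 items.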

2. Blog.scrapinghub.com Crawler

Build a spider that extracts the following data from blog.scrapinghub.com posts: post title, URL, author name, date and tags. Your spider should skip posts that have no associated tags.

Check out the spider once you're done.
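
A rough skeleton for this exercise is sketched below. Be warned that every CSS class in it is an assumption for illustration: blog.scrapinghub.com's markup has changed over time, so inspect the live pages and adjust the selectors accordingly:

```python
import scrapy


class BlogSpider(scrapy.Spider):
    name = "blog"
    start_urls = ["https://blog.scrapinghub.com"]

    def parse(self, response):
        # NOTE: all class names below are hypothetical placeholders;
        # replace them with the ones found in the real markup
        for post in response.css("div.post-item"):
            tags = post.css("a.tag::text").getall()
            if not tags:
                continue  # the exercise asks us to skip untagged posts
            yield {
                "title": post.css("h2 a::text").get(),
                "url": post.css("h2 a::attr(href)").get(),
                "author": post.css("span.author::text").get(),
                "date": post.css("span.date::text").get(),
                "tags": tags,
            }
        next_page = response.css("a.next-posts-link::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```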
