GitHub - DesiSanou/data-scraping: scrape e-commerce site products information

E-commerce Website Scraper

This scraper collects several pieces of information about the products listed on an e-commerce site.
It is built on Scrapy. Parameters are defined in a YAML file (data_config.yml), and the process is run from the Scrapy command line. For more information on Scrapy, see https://docs.scrapy.org/en/latest/index.html

Information collected for each product:

Url of the product
Title
Brand
Price
Rating
Number of ratings
Price without discount (no_discount_price)

Note: an additional field called no_discount_price was added.
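As an illustration of the fields listed above, a scraped product could be represented as a small record type. This is a minimal sketch using only the standard library. Apart from no_discount_price, the class name, attribute names and types are assumptions and are not taken from the project code, which may well use a Scrapy item or a plain dict instead.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ProductRecord:
    """Hypothetical container for one scraped product (names are assumptions)."""

    url: str
    title: str
    brand: Optional[str] = None
    price: Optional[float] = None
    rating: Optional[float] = None
    number_of_ratings: Optional[int] = None
    # Price before any discount is applied; the only field name given in the README.
    no_discount_price: Optional[float] = None


# Example values are made up for illustration only.
example = ProductRecord(
    url="https://example.com/p/1",
    title="Cordless drill",
    price=79.9,
    no_discount_price=99.9,
)

# Convert to a plain dict, e.g. before writing the record to a file.
row = asdict(example)
```

Fields that a product page does not provide simply stay at None, so a missing brand or rating does not prevent the record from being built.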

Tested by scraping manomano.fr to study the drill section of its catalog.

It can be adapted to collect other information or to scrape other sections, but this has not been tested yet. The scraped data is saved in a folder called data.
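The README does not say in which format the data is written to the data folder. The sketch below assumes CSV output and uses only the standard library. The function name save_products, the file name products.csv and the column names are hypothetical, and the project itself may rely on Scrapy's own export mechanisms instead.

```python
import csv
from pathlib import Path

# Column names are assumptions derived from the field list in this README.
FIELDNAMES = [
    "url",
    "title",
    "brand",
    "price",
    "rating",
    "number_of_ratings",
    "no_discount_price",
]


def save_products(rows, output_dir="data", filename="products.csv"):
    """Write an iterable of dicts to <output_dir>/<filename> and return the path."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    path = out / filename
    with path.open("w", newline="", encoding="utf-8") as handle:
        # Keys missing from a row are written as empty cells (DictWriter default).
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Calling save_products([{"url": "...", "title": "..."}]) would create data/products.csv relative to the current working directory.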

Python version used: 3.7.5

To run the code, assuming you are in the project root folder:

> pip3 install -r requirements.txt # install scrapy and pyaml
> cd mano_scraper/mano_scraper/spiders
> scrapy crawl products # run the spider 'products'
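The same steps can also be driven from a Python script, for example for scheduled runs. This is a minimal sketch that assumes the scrapy executable is on the PATH and that the folder layout matches the commands above. The helper names spiders_dir, build_crawl_command and run_spider are hypothetical and not part of the project.

```python
import subprocess
from pathlib import Path


def spiders_dir(project_root):
    """Return the folder from which the README runs the crawl command."""
    return Path(project_root) / "mano_scraper" / "mano_scraper" / "spiders"


def build_crawl_command(spider_name="products"):
    """Build the argument list for `scrapy crawl <spider_name>`."""
    return ["scrapy", "crawl", spider_name]


def run_spider(project_root=".", spider_name="products"):
    """Run the spider from the spiders folder; raises CalledProcessError on failure."""
    return subprocess.run(
        build_crawl_command(spider_name),
        cwd=spiders_dir(project_root),
        check=True,
    )


if __name__ == "__main__":
    run_spider()
```

Passing the command as a list avoids shell quoting issues, and check=True makes a failed crawl surface as an exception rather than passing silently.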

