How to configure the PyCharm Debugger for Scrapy

Criamos edited this page Apr 5, 2022 · 4 revisions

Setting up PyCharm's debugger for Scrapy

  1. Run your currently selected crawler once via PyCharm (Shift + F10). This gives you a baseline run configuration that you can modify

  2. Open the configuration via Main Menu -> Run -> Edit Configurations

  3. Click on Script path, select Module name from the drop-down menu, and fill in the following fields:

    1. Module name: scrapy.cmdline
    2. Parameters: runspider <crawler_name.py> or crawl <spider_name>
      1. for example: runspider serlo_spider.py or crawl serlo_spider
  4. Set the Working directory to the spiders/ folder of your project

    1. for example: /home/<your_username>/PycharmProjects/oeh-search-etl/converter/spiders
  5. You should now be able to use PyCharm's debugger by pressing Shift + F9
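
As an alternative sketch of the configuration above, a small runner script can pass the same parameters to scrapy.cmdline, so that a plain Script path configuration can be debugged as well. The file name run_debug.py and the helper names build_argv and run are assumptions made for illustration; scrapy.cmdline.execute is the function behind the scrapy command itself.

```python
"""run_debug.py (hypothetical file name): start a spider so PyCharm can debug it."""


def build_argv(command, target, extra=()):
    """Build the argument list that scrapy.cmdline expects.

    command: "crawl" (followed by a spider name) or
             "runspider" (followed by the path to a spider file).
    extra:   further options, e.g. ["-O", "spidername.json"].
    """
    if command not in ("crawl", "runspider"):
        raise ValueError(f"unsupported command: {command!r}")
    # The first entry only plays the role of the program name (argv[0]).
    return ["scrapy", command, target, *extra]


def run(argv):
    """Hand the arguments to Scrapy.

    Scrapy is imported lazily so that build_argv also works without it.
    """
    from scrapy.cmdline import execute

    execute(argv)


# Usage, assuming the working directory is the spiders/ folder:
# run(build_argv("crawl", "serlo_spider"))
# run(build_argv("runspider", "serlo_spider.py"))
```

Breakpoints set in the spider are hit in the same way as with the Module name configuration, because both routes end up in the same Scrapy entry point.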


After you're done, your debugger configuration should roughly look like this:

[Screenshot: finished debugger configuration (scrapy_pycharm_debugger_3)]

Additional configurations

You can customize your debug run with additional CLI options, for example to dump the collected scrapy.Item objects into a .json file or to save a new logfile with each run of the debugger.

Saving a JSON output of each scraped item

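If you want a JSON dump of all scraped items, add the -O (--overwrite-output) option to your parameters. On the command line this looks as follows (in the PyCharm Parameters field, leave out the leading scrapy, because the module scrapy.cmdline already takes its place):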

scrapy crawl <spider_name> -O "spidername.json"

Reminder:

  • -O overwrites the .json file with each new crawl
  • -o appends to the existing file
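
To inspect the dump after a debug run, the file can be loaded with Python's json module. This is a minimal sketch: the helper names load_items and field_counts are illustrative, and it assumes the file was written with -O, so that it holds a single JSON array. Repeated -o runs into the same .json file tend to leave several arrays back to back, which json.load rejects.

```python
import json
from collections import Counter
from pathlib import Path


def load_items(path):
    """Return the list of scraped items stored in a Scrapy JSON feed file."""
    with Path(path).open(encoding="utf-8") as handle:
        items = json.load(handle)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    return items


def field_counts(items):
    """Count how often each top-level field appears across all items."""
    return Counter(key for item in items for key in item)


# Usage, assuming the crawl was started with -O "spidername.json":
# items = load_items("spidername.json")
# print(len(items), field_counts(items).most_common(5))
```

Counting the fields is a quick way to spot items where an expected field was never filled.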

Saving a logfile

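By default, Scrapy appends the current run to the scrapy.log file in the project root folder (this presumes a LOG_FILE setting in the project; plain Scrapy logs to the console only). To customize the filename or the location, use

scrapy crawl <spider_name> --logfile "./logs/filename.log"

in the debug configuration parameters (again without the leading scrapy, because the module scrapy.cmdline already takes its place).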
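
If every debug run should get its own logfile, the file name needs a changing part such as a timestamp. The sketch below builds such a path. The helper names timestamped_logfile and logfile_args, the logs directory, and the name pattern are assumptions; the result is meant to be passed on as the --logfile value, for example from a small runner script.

```python
from datetime import datetime
from pathlib import Path


def timestamped_logfile(spider_name, log_dir="logs", now=None):
    """Return a path such as logs/serlo_spider_20220405_101500.log."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(log_dir) / f"{spider_name}_{stamp}.log"


def logfile_args(spider_name, log_dir="logs", now=None):
    """Build the extra command-line arguments for one logfile per run."""
    return ["--logfile", str(timestamped_logfile(spider_name, log_dir, now))]


# Usage: append the result to the crawl arguments, e.g.
# ["crawl", "serlo_spider", *logfile_args("serlo_spider")]
```

The now parameter exists only to make the helper predictable in tests. The target folder has to exist before the run, since this sketch does not create it.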

The gist of it (TL;DR):

If you just want to use both options at the same time, here is a short version for the Parameters field that you can copy and adapt to your needs. You can use either the runspider or the crawl command:

runspider <spider_name.py> -O "spider_name.json" --logfile "spider_name.log"

or:

crawl <spider_name> -O "spider_name.json" --logfile "spider_name.log"
