# destatis_crawler

A basic crawler that builds a full index of all statistics in the Destatis (Federal Statistical Office of Germany) Genesis database. You can use the output for further processing.

## Requirements

To get this working, install the Scrapy framework via pip (`pip install scrapy`), through your favorite package manager, or directly from the Scrapy GitHub repository.

## How to

To run the crawler, enter the command:

    scrapy crawl destatis

To get CSV output, enter the command:

    scrapy crawl destatis -o output.csv

For JSON output:

    scrapy crawl destatis -o output.json
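For further processing, the file written by `-o output.json` can be loaded directly in Python, since Scrapy's JSON feed export is a single JSON array of item objects. The following is a minimal sketch; the helper names `load_items` and `field_names` are hypothetical, and because the item fields produced by the spider are not documented here, the code inspects them rather than assuming any names.

```python
import json
from pathlib import Path


def load_items(path):
    """Load a Scrapy JSON feed export (one JSON array of item objects)."""
    with Path(path).open(encoding="utf-8") as fh:
        items = json.load(fh)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    return items


def field_names(items):
    """Return the sorted set of keys that occur across all items."""
    return sorted({key for item in items for key in item})
```

A typical use would be `items = load_items("output.json")` followed by `print(len(items), field_names(items))` to see how many statistics were indexed and which fields each entry carries.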

## Handler

With the `scrapy_handler` class in `scrapy_handler.py`, you can call the Scrapy spider and receive a dict of links (either the full list of all statistics or a direct download link) for further processing, without having to read the output from a separate file.

Call `get_list()` to receive a list of all statistics, and call `download_csv(url)`, with the URL of a statistic overview page as its argument, to receive the direct download link.
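As a rough illustration of how the handler might be used in a script, the sketch below resolves download links for several overview pages by calling `download_csv(url)` once per URL. The helper `resolve_downloads` and the `StubHandler` class are hypothetical: the stub only stands in for the real `scrapy_handler` class so the example runs offline, and the values it returns are invented placeholders, not the real return format.

```python
class StubHandler:
    """Offline stand-in for the real scrapy_handler class (assumption:
    the real class exposes download_csv(url) as an instance method)."""

    def download_csv(self, url):
        # Placeholder result; the real handler's return shape may differ.
        return {"download": url + "/download.csv"}


def resolve_downloads(handler, overview_urls):
    """Call handler.download_csv(url) for each overview URL.

    Returns a dict mapping each overview URL to whatever the handler
    returned for it, without interpreting that value.
    """
    results = {}
    for url in overview_urls:
        results[url] = handler.download_csv(url)
    return results
```

With the real module, the same function could presumably be given an instance created via `from scrapy_handler import scrapy_handler`, assuming the class can be constructed without arguments.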
