This sample demonstrates how to crawl a website in Python to find pages that return 404 errors.
pip install beautifulsoup4
- Run `404crawler.py` with the target page, the crawl depth, and the link filter. For example, to crawl the website `https://www.dynamsoft.com` with a depth of 1, run the following command:

      python 404crawler.py -l https://www.dynamsoft.com -d 1 -f dynamsoft.com

  The default depth is `0`, which means only the target page will be checked. If the depth is `-1`, it will crawl all the pages on the website.
- Press `Ctrl+C` to stop the program.
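
The depth and filter behavior described above can be illustrated with a minimal sketch. This is not the code in `404crawler.py`. It is an assumed breadth-first crawl that follows the description: depth `0` checks only the target page, depth `-1` is unlimited, and a link is followed only if it contains the filter string. The names `LinkParser`, `fetch_page`, and `find_404s` are hypothetical, and the sketch uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra packages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_page(url):
    """Return (status_code, html); status_code is None if the request fails."""
    try:
        request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(request, timeout=10) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; keep the status code.
        return err.code, ""
    except OSError:
        # DNS failures, refused connections, timeouts.
        return None, ""


def find_404s(start_url, depth=0, link_filter="", fetch=fetch_page):
    """Crawl breadth-first from start_url and return the URLs that answer 404.

    depth=0 checks only start_url; depth=-1 removes the limit (assumed
    semantics, taken from the usage notes above).
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    broken = []
    while queue:
        url, level = queue.popleft()
        status, html = fetch(url)
        if status == 404:
            broken.append(url)
            continue
        if depth != -1 and level >= depth:
            continue  # depth limit reached: check this page but do not expand it
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments before deduplicating.
            link = urldefrag(urljoin(url, href)).url
            if link.startswith("http") and link_filter in link and link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return broken
```

Under these assumptions, `find_404s("https://www.dynamsoft.com", depth=1, link_filter="dynamsoft.com")` would roughly correspond to the command shown above. The `fetch` parameter exists only so that the crawl logic can be exercised without network access.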