Crawling Broken Links with 404 Error in Python

This sample demonstrates how to crawl a website in Python to find pages that return a 404 error.

Installation

pip install beautifulsoup4

How to Run

  1. Run 404crawler.py with the target page, the crawl depth, and a link filter:

    For example, to crawl the website https://www.dynamsoft.com with a depth of 1, run the following command:

    python 404crawler.py -l https://www.dynamsoft.com -d 1 -f dynamsoft.com

    The default depth is 0, which means only the target page is checked. A depth of -1 crawls every page on the website. A minimal sketch of how such a crawler can work is shown after this list.

  2. Press Ctrl+C to stop the program.
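
The repository's 404crawler.py contains the actual implementation; as a rough illustration, the sketch below shows one way such a crawler could be built with only the standard library and BeautifulSoup. The flag names (-l, -d, -f) follow the usage above, but the breadth-first traversal and the helper names (get_status_and_links, crawl) are assumptions for this example, not the repository's exact code.

```python
import argparse
import urllib.error
import urllib.request
from urllib.parse import urljoin, urldefrag

from bs4 import BeautifulSoup


def get_status_and_links(url):
    """Fetch a URL and return its HTTP status code and the links found on the page."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            html = response.read()
            status = response.status
    except urllib.error.HTTPError as e:
        return e.code, []          # e.g. 404: report the code, nothing to parse
    except (urllib.error.URLError, ValueError):
        return None, []            # unreachable or malformed URL

    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        # Resolve relative links and drop fragments like #section.
        link, _ = urldefrag(urljoin(url, a["href"]))
        links.append(link)
    return status, links


def crawl(start_url, max_depth, link_filter):
    visited = set()
    queue = [(start_url, 0)]       # breadth-first crawl: (url, depth)

    while queue:
        url, depth = queue.pop(0)
        if url in visited:
            continue
        visited.add(url)

        status, links = get_status_and_links(url)
        if status == 404:
            print(f"[404] {url}")
        elif status is None:
            print(f"[error] {url}")
        else:
            print(f"[{status}] {url}")

        # max_depth == -1 means no limit; otherwise stop expanding at max_depth.
        if max_depth != -1 and depth >= max_depth:
            continue
        for link in links:
            if link_filter in link and link not in visited:
                queue.append((link, depth + 1))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Find pages that return a 404 error.")
    parser.add_argument("-l", "--link", required=True, help="target page to start from")
    parser.add_argument("-d", "--depth", type=int, default=0,
                        help="crawl depth (-1 for no limit)")
    parser.add_argument("-f", "--filter", default="",
                        help="only follow links containing this string")
    args = parser.parse_args()

    try:
        crawl(args.link, args.depth, args.filter)
    except KeyboardInterrupt:
        print("Stopped.")
```

The link filter keeps the crawl on the target site: with -f dynamsoft.com, only URLs containing that string are queued, so external links are checked no further than the page that references them.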

Blog

How to Check Broken Links with 404 Error in Python
