Please note that this is a work in progress. If you just want to use the script, use this.
- The user logs in with their credentials, uploads a file containing a list of domains in the format described below, and schedules a crawl.
- Once crawling is done, the user can download a zip file containing all of the ads.txt content in CSV format.
- On the backend, every scheduled crawl is registered as a job in a RabbitMQ queue.
- Multiple workers are spawned to process crawl jobs in parallel (a minimal worker sketch follows below).
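For context, a worker in this setup might look roughly like the minimal sketch below, which uses the `pika` client. The queue name `crawl_jobs` and the one-domain-per-message format are assumptions for illustration, not the app's actual contract; spawning several such processes is what provides the parallelism described above.

```python
# Minimal worker sketch, assuming a RabbitMQ queue named "crawl_jobs"
# whose messages each carry one domain name. Both assumptions are
# illustrative; the real job format may differ.
import pika

def handle_job(channel, method, properties, body):
    domain = body.decode("utf-8").strip()
    print(f"Crawling ads.txt for {domain}")
    # ... run the spider for this domain here ...
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="crawl_jobs", durable=True)
channel.basic_qos(prefetch_count=1)  # hand each worker one job at a time
channel.basic_consume(queue="crawl_jobs", on_message_callback=handle_job)
channel.start_consuming()
```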
NOTE: The list of domains must have one domain per line, for example:
domain1.com
domain2.in
www.domain3.net
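To make the expected format concrete, the sketch below reads such a file, skipping blank lines. The `load_domains` helper and its loose validation regex are illustrative and not part of the app.

```python
# Sketch of reading the uploaded domain list: one domain per line,
# blank lines ignored. The validation regex is an assumption, not
# the app's actual check.
import re

DOMAIN_RE = re.compile(r"^(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}$")

def load_domains(path):
    domains = []
    with open(path) as f:
        for line in f:
            domain = line.strip()
            if not domain:
                continue  # allow blank lines
            if not DOMAIN_RE.match(domain):
                raise ValueError(f"Invalid domain: {domain!r}")
            domains.append(domain)
    return domains

print(load_domains("domains.txt"))
```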
adstxt/ --- Helper scripts, spiders, and other Scrapy files.
adstxtui/ --- All UI-related files.
archives/ --- Old archived code, kept for reference.
crawl.sh --- Shell script to run an individual spider (see the sketch after this listing).
docs/ --- Reference documents.
requirements.txt --- List of Python libraries required by this app.
LICENSE --- License file.
pencilproject/ --- Rudimentary wireframes made with Pencil Project.
setup_app.sh --- Shell script to set up the entire web application.
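As a rough illustration of what a single crawl produces, the sketch below fetches a domain's ads.txt with `requests` and writes its comma-separated entries to a CSV file. The real spiders are Scrapy-based and live in adstxt/, so treat this as a simplified stand-in; the HTTPS-only URL and the `<domain>.csv` filename are assumptions.

```python
# Simplified stand-in for one domain's crawl: fetch ads.txt and write
# its entries as CSV rows. The real spiders in adstxt/ use Scrapy;
# the URL scheme and output filename here are assumptions.
import csv
import requests

def crawl_ads_txt(domain):
    response = requests.get(f"https://{domain}/ads.txt", timeout=10)
    response.raise_for_status()
    with open(f"{domain}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for raw in response.text.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments
            if line:
                # ads.txt records are already comma-separated fields
                writer.writerow([field.strip() for field in line.split(",")])

crawl_ads_txt("domain1.com")
```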