Project for the Architecture of Computer Systems course.
We are developing several components of the web crawler in parallel:
- Website backend
- Elasticsearch database backend
- Two crawlers (one in Python, one in Rust)
- Language detection backend in Rust and Python
Each component is designed to run in its own Docker container, so we can freely combine them, run as many instances of each as we need, and spread them across different computers or servers.
Progress can be tracked over here.
Launch each container on its own by following the instructions in its directory, or launch all of them together with Docker Compose:
```shell
# Download the file of crawled websites and save it as out.txt in the project root,
# or crawl the websites yourself and write the results there:
# https://drive.google.com/file/d/1XsnWbmk4YzLmZqWjRaMXDzMC_-Rv0Zwm/view
docker-compose build
docker-compose up
```
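Because the combined setup expects the crawled websites in `out.txt` at the project root, a quick pre-flight check can catch a missing or empty file before the containers are built. The helper below is a minimal sketch and is not part of the repository; the function name `check_crawl_file` is hypothetical, and it only verifies that the file exists and is non-empty, not that its contents are in the format the components expect.

```shell
# Hypothetical helper (not part of the repository): fail early if the
# crawled-websites file is missing or empty.
check_crawl_file() {
    file="${1:-out.txt}"           # defaults to out.txt in the current directory
    if [ ! -s "$file" ]; then      # -s: file exists and has a size greater than zero
        echo "error: $file is missing or empty" >&2
        return 1
    fi
    # wc -l counts newline characters; tr strips the padding some wc versions add
    lines=$(wc -l < "$file" | tr -d ' ')
    echo "ok: $file has $lines lines"
}
```

Run from the project root, it can gate the Compose commands, for example `check_crawl_file out.txt && docker-compose build && docker-compose up`, so nothing is built when the input file is absent.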