A small project to re-familiarise myself with Ruby. I have written a web crawler before, so I thought it would be a good exercise to do so again.
That earlier crawler was written in Go, a language that lends itself well to crawling thanks to its built-in concurrency. An interesting challenge will be seeing how I design a crawler in Ruby.
To initialise the project, ensure bundler is installed and run:

```sh
bundle install
```
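For reference, a minimal Gemfile could look like the sketch below. The exact gem list is an assumption: rspec is implied by the test command further down, and nokogiri is only needed if you follow the HTML-parsing approach sketched under V1.

```ruby
# Gemfile (a minimal sketch; the exact gems and groups are assumptions)
source "https://rubygems.org"

gem "nokogiri"            # HTML parsing, assumed by the crawler sketches below
gem "rspec", group: :test # test runner, used by the rspec command below
```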
To run all tests, run:

```sh
rspec
```
For V1, we want this project:

- to be able to start crawling from a given URL (this URL can be hardcoded)
- to fetch all URLs from this starting URL
- to return a list of all URLs found on this base URL (a sketch follows below)
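A minimal sketch of what V1 could look like, using net/http from the standard library plus the nokogiri gem for HTML parsing. Both the gem choice and the `START_URL` value are assumptions for illustration, not the project's actual implementation:

```ruby
# A rough V1 sketch: fetch one page and list every URL linked from it.
# Assumes the nokogiri gem; START_URL is a placeholder.
require "net/http"
require "uri"
require "nokogiri"

START_URL = "https://example.com" # hardcoded, as the V1 requirement allows

# Fetch a page and return every linked URL, resolved to absolute form.
def fetch_urls(url)
  html = Net::HTTP.get(URI(url))
  Nokogiri::HTML(html).css("a[href]").filter_map do |anchor|
    URI.join(url, anchor["href"]).to_s
  rescue URI::InvalidURIError
    nil # skip hrefs that are not valid URIs
  end
rescue StandardError
  [] # treat unreachable or unparsable pages as having no links
end

puts fetch_urls(START_URL)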
For V2, we want this project:

- to be able to start crawling from a given URL (this URL can be hardcoded)
- to fetch all URLs from this starting URL
- for each URL returned, to go one level deeper and fetch all URLs on each child page
- to return all fetched URLs, preserving the tree structure so we know which child URL belongs to which parent (a sketch follows below)
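A correspondingly minimal sketch of V2, reusing the hypothetical `fetch_urls` helper and `START_URL` from the V1 sketch above. Modelling the tree as a hash of parent URL to child URLs is just one reasonable representation:

```ruby
# A rough V2 sketch, building on the fetch_urls helper above.
# The tree is a hash mapping each parent URL to the URLs found on its page.
def crawl_one_level_deep(start_url)
  children = fetch_urls(start_url)
  tree = { start_url => children }
  children.each do |child_url|
    # Going exactly one level deeper: fetch the links on each child page.
    tree[child_url] = fetch_urls(child_url)
  end
  tree
end

# Print the tree, indenting each child under its parent.
crawl_one_level_deep(START_URL).each do |parent, child_urls|
  puts parent
  child_urls.each { |child| puts "  #{child}" }
end
```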