Web crawler for checking links.
The purpose of Spider Crab is to provide a small, portable, and fast static website checker that can be used in CI pipelines to detect broken links.
If Spider Crab finds any of the following, it returns a non-zero exit code:
- A referenced URL/page returns an unsuccessful HTTP status code
- An <a> or <link> element without an href attribute, or an href attribute that is blank (href="")
- An <img> element without a src attribute, or a src attribute that is empty
- A <script> element without a src attribute and no content between the tags
If Spider Crab does not find any issues, it returns a 0 exit code.
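The element rules above can be sketched as a few small predicates. This is a minimal illustration and not Spider Crab's actual implementation: every function name here is hypothetical, treating a whitespace-only script body as "no content" is an assumption, and so is the specific non-zero value 1.

```rust
/// Hypothetical helper: true when an attribute is missing or empty.
fn attr_is_blank(value: Option<&str>) -> bool {
    match value {
        None => true,
        Some(v) => v.is_empty(),
    }
}

/// <a> and <link> elements need a non-blank href attribute.
fn link_is_broken(href: Option<&str>) -> bool {
    attr_is_blank(href)
}

/// <img> elements need a non-empty src attribute.
fn img_is_broken(src: Option<&str>) -> bool {
    attr_is_blank(src)
}

/// <script> elements are reported only when they have no src attribute
/// AND nothing between the tags.
/// Assumption: whitespace-only content counts as "no content".
fn script_is_broken(src: Option<&str>, body: &str) -> bool {
    src.is_none() && body.trim().is_empty()
}

/// Map the number of findings to a process exit code:
/// 0 when clean, non-zero otherwise (1 is an assumed value).
fn exit_code(error_count: usize) -> i32 {
    if error_count == 0 {
        0
    } else {
        1
    }
}
```

In a CI pipeline only the exit code matters: any non-zero value fails the job.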
Usage: spider-crab.exe [OPTIONS] <url>
Arguments:
<url> URL of the webpage to check.
Options:
-d, --depth <depth> Depth of links to check. Default is -1 which is unlimited. [default: -1]
-q Silence logging output.
-v... Print more log messages.
-o, --dot <dot> Save output to file in Graphviz DOT format.
-h, --help Print help
Example:
spider-crab -v https://example.com
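To make the --depth default concrete, here is a small sketch of how a depth limit with -1 meaning "unlimited" could be evaluated. The function name, the choice of counting the start page as depth 0, and the inclusive comparison are all assumptions for illustration; Spider Crab's internal logic may differ.

```rust
/// Decide whether a link found at `link_depth` should still be checked.
/// Assumptions: the start page is depth 0, the limit is inclusive,
/// and any negative limit (such as the default -1) means unlimited.
fn within_depth(link_depth: u32, max_depth: i32) -> bool {
    max_depth < 0 || i64::from(link_depth) <= i64::from(max_depth)
}
```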
If you do not want Spider Crab to check a link/element on your webpage, add the scrab-skip CSS class to that element.
Example:
<a href="https://non-existent-website.net" class="scrab-skip my-custom-class" >This link will not be checked by Spider Crab!</a>
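Because an element may carry several classes, as in the example above, a skip check has to look at each class name separately. The following is a sketch of one way to do that; the function name is hypothetical and exact, case-sensitive matching is an assumption.

```rust
/// Returns true when a class attribute value contains the `scrab-skip` class.
/// Assumption: class names are compared exactly (case-sensitive).
fn has_skip_class(class_attr: &str) -> bool {
    class_attr
        .split_whitespace()
        .any(|class| class == "scrab-skip")
}
```

Splitting on whitespace avoids false positives from class names that merely contain the text scrab-skip.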
If you want to ignore specific errors on specific pages, you can write a .spidercrab-ignore file and place it in your working directory.
When spider-crab launches, it reads the file line by line, expecting an ignore-rule target-url pairing on each line, separated by any amount of whitespace.
Lines starting with a # are comments and will be ignored.
The names of rules to ignore are printed between the parentheses () of an error report when you run Spider Crab.
For example, to ignore this error:
ERROR - SpiderError (missing-title): Page at "https://example-page.com/somewhere/something.html" does not have a title!
We would need to add this line to our .spidercrab-ignore file:
missing-title https://example-page.com/somewhere/something.html
Here is a more complete example of a .spidercrab-ignore file:
# This line is a comment
# Ignore that this page doesn't have a title. It's an archived page that we won't fix due to historic reasons
missing-title https://old-website.com/somewhere/something.html
# Ignore the 400 HTTP status code this website returns. It's an external website that blocks spider crab
http-error https://another-website-somewhere.org/
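The file format described above (one rule/URL pair per line, whitespace-separated, # for comments) can be parsed with a few lines of standard-library Rust. This is only a sketch of the format, not the parser Spider Crab ships: the IgnoreRule type and parse_ignore_file function are hypothetical, and silently skipping blank or malformed lines is an assumption.

```rust
/// One rule from a `.spidercrab-ignore` file (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct IgnoreRule {
    rule: String,
    url: String,
}

/// Parse the contents of an ignore file into rule/URL pairs.
/// Assumption: blank lines and lines without both fields are skipped.
fn parse_ignore_file(contents: &str) -> Vec<IgnoreRule> {
    contents
        .lines()
        .map(str::trim)
        // Drop empty lines and comment lines.
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| {
            // Fields are separated by any amount of whitespace.
            let mut parts = line.split_whitespace();
            match (parts.next(), parts.next()) {
                (Some(rule), Some(url)) => Some(IgnoreRule {
                    rule: rule.to_string(),
                    url: url.to_string(),
                }),
                _ => None,
            }
        })
        .collect()
}
```

Applied to the example file above, this would yield two entries: one for missing-title and one for http-error.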
Since version 1.0.0, spider-crab uses the Conventional Commits 1.0.0 standard for commit messages.
However, if you make contributions that do not follow the Conventional Commits standard, then a maintainer will squash your commits and make a merge commit that follows the Conventional Commits standard.
spider-crab uses the default cargo fmt formatter and cargo clippy linter.
To run the integration tests, run: cargo test
To generate source-based code coverage reports, use the following commands:
- Install llvm-tools-preview and grcov
rustup component add llvm-tools-preview
cargo install grcov
- Clean the build
cargo clean
- Run the tests with RUSTFLAGS set to create profile files
CARGO_INCREMENTAL=0 RUSTFLAGS='-Cinstrument-coverage' LLVM_PROFILE_FILE='cargo-test-%p-%m.profraw' cargo test
For Windows users (Command Prompt):
set CARGO_INCREMENTAL=0
set RUSTFLAGS=-C instrument-coverage
set LLVM_PROFILE_FILE=cargo-test-%p-%m.profraw
cargo test
- Generate an HTML report file with grcov:
grcov . --binary-path ./target/debug/deps/ -s . -t html --branch --ignore-not-existing --ignore '../*' --ignore "/*" --ignore 'target/*/build/*5ever' -o target/coverage/html
This repo uses two workflows in .github/workflows:
- lint_fmt_test.yml - Checks Pull Requests, runs formatter, clippy, cargo check, cargo test, and ensures that the version number in Cargo.toml was bumped.
- release.yml - Performs a cargo build --release and uploads the compiled binary to GitHub.
To make a new release, perform the following:
- Make a tag on the main branch that matches the version number in Cargo.toml. Update Cargo.toml if necessary.
- Push the tag.
- Watch for the GitHub workflow to finish and upload the build artifacts.
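Before pushing a tag, it can help to confirm that it really matches the version in Cargo.toml. The sketch below does this with a deliberately naive line scan instead of a real TOML parser. Both function names are hypothetical, and it assumes the tag is the bare version string with no prefix such as "v"; adjust it if the project's tags follow a different convention.

```rust
/// Naively extract the `version` value from the `[package]` table
/// of a Cargo.toml given as a string. Not a full TOML parser.
fn package_version(cargo_toml: &str) -> Option<String> {
    let mut in_package = false;
    for line in cargo_toml.lines().map(str::trim) {
        if line.starts_with('[') {
            // Track whether we are inside the [package] table.
            in_package = line == "[package]";
            continue;
        }
        if in_package {
            if let Some((key, value)) = line.split_once('=') {
                if key.trim() == "version" {
                    return Some(value.trim().trim_matches('"').to_string());
                }
            }
        }
    }
    None
}

/// Assumption: the tag must equal the version string exactly.
fn tag_matches(tag: &str, cargo_toml: &str) -> bool {
    package_version(cargo_toml).as_deref() == Some(tag)
}
```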