Check for broken links

CDM edited this page Feb 11, 2021 · 2 revisions

The following is one method for checking for broken links across the site; other methods could be used instead.

Set up

  • Download and install GNU Wget - see download options

  • Make sure you can run wget from the command line (test with wget --version)
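
As a minimal sketch, the check above can be scripted so that it fails early when wget is missing. The helper name have_cmd is hypothetical; it relies only on the POSIX command -v lookup.

```shell
# have_cmd NAME: succeed if NAME can be run from the current shell
# (hypothetical helper, not part of wget).
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd wget; then
  wget --version | head -n 1   # print only the first line of the version banner
else
  echo "wget not found on PATH; install it before continuing" >&2
fi
```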

Check for broken links

Run the following:

wget --spider --debug -nd -nv -o wget.log -e robots=off -r https://niem.github.io/

This may take a few minutes to run.

Options

Option          Description
--spider        Check that pages exist without downloading them.
--debug         Turn on debug output. This is needed to capture which page links to each broken URL.
-e robots=off   Ignore robots.txt exclusion rules.
-o wget.log     Write the log to a file named "wget.log".
-nd             No directories. Does not create a hierarchy of directories when retrieving recursively.
-nv             Non-verbose. Prints only basic information and error messages; a level between quiet and verbose.
-r              Recursive. Follows links from page to page; the default maximum depth is 5.

Review the results

There will be a list of broken links at the end of the log file:

[Screenshot: broken links summary]

However, the summary does not tell you which page contains the broken link.
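
One way to jump to that summary without scrolling is sketched here. It assumes the summary begins with a line of the form "Found N broken links."; check your own log, since the wording can vary by wget version and locale. The function name show_summary is hypothetical.

```shell
# show_summary LOGFILE: print the summary header and everything after it.
# Assumes the summary starts with a line beginning "Found ... broken link".
show_summary() {
  # sed -n '/re/,$p' prints from the first matching line to the end of the file.
  sed -n '/^Found .*broken link/,$p' "$1"
}

# Example: show_summary wget.log
```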

Search the log file for "404 Not Found" for more information:

[Screenshot: HEAD and Referer lines above the "404 Not Found" error]

About a dozen lines above each "404 Not Found" line is a "---request begin---" marker. The two lines that follow the marker identify the problem:

  • "HEAD ..." - This is the broken link
  • "Referer: ..." - This is the page that contains the broken link
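
The manual search described above can be approximated with a short awk filter. This is a sketch that assumes the log layout described here: a "---request begin---" marker, followed by "HEAD" and "Referer:" lines, with "404 Not Found" appearing later for the same request. The function name list_broken_links is hypothetical.

```shell
# list_broken_links LOGFILE: print "broken-link<TAB>referring-page" for each
# request in a wget --debug log that was answered with 404 Not Found.
list_broken_links() {
  awk '
    { sub(/\r$/, "") }                 # drop a trailing carriage return, if any
    /^---request begin---/ { link = ""; referer = "(none)"; reported = 0 }
    /^HEAD /               { link = $2 }      # requested path
    /^Referer: /           { referer = $2 }   # page that contains the link
    /404 Not Found/ && link != "" && !reported {
      print link "\t" referer
      reported = 1                     # report each request only once
    }
  ' "$1"
}

# Example: list_broken_links wget.log
```

Requests that carry no Referer header (such as the start URL) are reported with "(none)" in the second column.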

Note: glogg is a nice tool for searching through log files. You can search for a string ("404 Not Found"), see a list of matches with line numbers in the panel at the bottom, and click on each one to jump to the line in the file.

[Screenshot: glogg display]

General process

  • Check for broken links on your fork of niem.github.io before updating the main site. You may catch two kinds of broken links:
    • Links that will be broken on niem.github.io.
    • Links that are only broken on your fork. Your fork is served from a subpath (FORK.github.io/niem.github.io instead of niem.github.io), so links written relative to the site root break there. Use Jekyll's relative_url filter, which prepends the site's base URL, to convert links for your fork: {{ url | relative_url }}
  • Check for broken links on niem.github.io once the site has been updated.
    • Note: It may take 10 minutes once the changes have been committed for the site to update.
  • Check niem.gov for broken links in case any pages refer to niem.github.io content that is no longer available.
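
The process above can be sketched as a small wrapper that runs the same wget command against each site and keeps a separate log per site. The helper names log_name and check_site are hypothetical, and the fork URL is a placeholder that you must replace with your own.

```shell
# log_name URL: derive a log file name from a site URL (hypothetical helper),
# e.g. https://niem.github.io/ -> wget-niem.github.io.log
log_name() {
  printf 'wget-%s.log\n' "$(printf '%s' "$1" |
    sed -e 's|^https\{0,1\}://||' -e 's|/*$||' -e 's|[^A-Za-z0-9._-]|_|g')"
}

# check_site URL: run the link check from this page against URL,
# writing the log to the file named by log_name.
check_site() {
  wget --spider --debug -nd -nv -o "$(log_name "$1")" -e robots=off -r "$1"
}

# Suggested order (uncomment to run; the first URL is a placeholder):
# check_site "https://YOUR-FORK-URL/"     # before updating the main site
# check_site "https://niem.github.io/"    # after the update has deployed
# check_site "https://niem.gov/"          # pages that refer to niem.github.io
```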