Check for broken links
The following is one method for checking for broken links across the site; other methods could be used instead.
- Download and install GNU Wget - see download options
- Make sure you can run wget from the command line (test with wget --version)
Run the following:
wget --spider --debug -nd -nv -o wget.log -e robots=off -r https://niem.github.io/
This may take a few minutes to run.
Options
Option | Description |
---|---|
--spider | Check that pages exist without downloading them |
--debug | Turn on debug output. This is needed to record which page links to each broken URL. |
-e robots=off | Ignore robots.txt exclusion rules |
-o wget.log | Output results to a file named "wget.log" |
-nd | No directories. Does not create a hierarchy of directories when retrieving recursively. |
-nv | No verbose. Prints basic info and error messages. Output option between quiet and verbose. |
-r | Recursive. Default maximum depth is 5. |
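If the check is run often, the command above can be wrapped in a small script. The following is a minimal sketch, not part of the original procedure: the function names `build_wget_command` and `run_link_check` are hypothetical, and the script only assembles the same options listed in the table.

```python
import shutil
import subprocess
from typing import List, Optional


def build_wget_command(url: str, log_file: str = "wget.log") -> List[str]:
    """Return the wget argument list described in the options table."""
    return [
        "wget",
        "--spider",          # check pages without downloading them
        "--debug",           # needed to see which page referred to a broken link
        "-nd",               # do not create a directory hierarchy
        "-nv",               # basic info and error messages only
        "-o", log_file,      # write the log to this file
        "-e", "robots=off",  # ignore robot exclusion rules
        "-r",                # recurse through the site
        url,
    ]


def run_link_check(url: str, log_file: str = "wget.log") -> Optional[int]:
    """Run wget if it is on the PATH; return its exit code, or None if missing."""
    if shutil.which("wget") is None:
        return None
    # check=False: a non-zero exit code is expected when broken links exist.
    completed = subprocess.run(build_wget_command(url, log_file), check=False)
    return completed.returncode


if __name__ == "__main__":
    code = run_link_check("https://niem.github.io/")
    print("wget not found" if code is None else f"wget exited with {code}")
```

wget generally exits with a non-zero status when the server returns error responses such as 404, so a non-zero code here usually means broken links were found rather than that the script failed.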
The end of the log file lists the broken links.
However, this summary does not tell you which page contains each broken link.
Search the log file for "404 Not Found" for more information:
About a dozen lines above each "404 Not Found" line is a "---request begin---" marker. The two lines following the marker provide more information:
- "HEAD ..." - This is the broken link
- "Referer: ..." - This is the page that contains the broken link
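The manual search described above can also be scripted. The sketch below assumes the log layout described in this section: a "---request begin---" marker followed by the request line and a "Referer:" header, with "404 Not Found" appearing later in the log. The "---request end---" marker and the function name `find_broken_links` are assumptions, so adjust the script if your wget version formats its debug output differently.

```python
from typing import Iterable, List, Tuple


def find_broken_links(log_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each 404 response with the request line and Referer that preceded it."""
    results: List[Tuple[str, str]] = []
    request_line = None
    referer = None
    in_request = False
    reported = False  # avoid counting the same request twice

    for raw in log_lines:
        line = raw.strip()
        if line == "---request begin---":
            in_request = True
            request_line, referer, reported = None, None, False
            continue
        if line == "---request end---":
            in_request = False
            continue
        if in_request:
            if request_line is None and line:
                request_line = line  # e.g. "HEAD /some/page/ HTTP/1.1"
            elif line.lower().startswith("referer:"):
                referer = line.split(":", 1)[1].strip()
            continue
        if "404 Not Found" in line and request_line is not None and not reported:
            results.append((request_line, referer or "(no referer)"))
            reported = True
    return results


if __name__ == "__main__":
    with open("wget.log", encoding="utf-8", errors="replace") as log:
        for request, page in find_broken_links(log):
            print(f"{request}  <-  {page}")
```

Each output line shows the failing request followed by the page that referred to it.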
Note: glogg is a nice tool for searching through log files. You can search for a string ("404 Not Found"), see a list of matches with line numbers in the panel at the bottom, and click on each one to jump to the line in the file.
- Check for broken links on your fork of niem.github.io before updating the main site. You may catch two kinds of broken links:
- Links that will be broken on niem.github.io.
- Links that are only broken on your fork. Your fork is served at FORK.github.io/NIEM.github.io instead of niem.github.io, so use Jekyll's relative_url filter to convert links for your fork:
{{ url | relative_url }}
- Check for broken links on niem.github.io once the site has been updated.
- Note: It may take 10 minutes once the changes have been committed for the site to update.
- Check niem.gov for broken links in case any pages refer to niem.github.io content that is no longer available.
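To see why links can break only on a fork, it may help to model what prepending a base URL does. The sketch below is a rough approximation of the idea behind relative_url, not Jekyll's actual implementation. The helper name `prepend_baseurl` and the "/niem.github.io" path prefix are assumptions made for illustration.

```python
def prepend_baseurl(path: str, baseurl: str) -> str:
    """Approximate the effect of prefixing a site-relative path with a base URL."""
    if "://" in path:
        return path  # absolute URLs are left unchanged
    base = baseurl.strip("/")
    rest = path.lstrip("/")
    if not base:
        return "/" + rest
    return "/" + base + "/" + rest


# On the main site there is no path prefix, so the link is unchanged.
print(prepend_baseurl("/reference/", ""))
# On a fork served under a path prefix (assumed here), the prefix is added.
print(prepend_baseurl("/reference/", "/niem.github.io"))
```

A hard-coded "/reference/" link resolves on the main site but skips the prefix on the fork, which is the second kind of broken link described above.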