From 27332f40b18a999ab52ec8c022006a5705cb8f3e Mon Sep 17 00:00:00 2001 From: henrik molnes Date: Tue, 10 May 2016 09:02:12 +0200 Subject: [PATCH 01/29] Update README.md --- README.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index bb68713..b31f859 100644 --- a/README.md +++ b/README.md @@ -1,15 +1,12 @@ # LinkCrawler Simple C# console application that will crawl the given webpage for image-tags and hyperlinks. If some of them are not working, info will be sent to output. -| Branch | Build status | -| :----- | :---------------------------------------| -| develop | [![Build status](https://ci.appveyor.com/api/projects/status/syw3l7xeicy7xc0b/branch/develop?svg=true)](https://ci.appveyor.com/project/hmol/linkcrawler/branch/develop) | -| master | [![Build status](https://ci.appveyor.com/api/projects/status/syw3l7xeicy7xc0b/branch/master?svg=true)](https://ci.appveyor.com/project/hmol/linkcrawler/branch/master) | + ## Why? Because it could be useful to know when a webpage you have responsibility for displays broken links to its users. I have this running continuously, but you don't have to. For instance, after upgrading your CMS, changing database schema, migrating content, etc., it can be relevant to know if this did or did not introduce broken links. Just run this tool one time and you will know exactly how many links are broken, where they link to, and where they are located. -## App.Settings +## AppSettings | Key | Usage | | :-------------------------- | :---------------------------------------| @@ -22,13 +19,20 @@ Because it could be useful to know when a webpage you have responsibility for di | ```Slack.WebHook.Bot.IconEmoji``` | Custom Emoji for slack bot | | ```OnlyReportBrokenLinksToOutput``` | If true, only broken links will be reported to output. 
| | ```Slack.WebHook.Bot.MessageFormat``` | String format message that will be sent to slack | -| ```Csv.Enabled``` | Enable/disable CSV output | | ```Csv.FilePath``` | File path for the CSV file | | ```Csv.Overwrite``` | Whether to overwrite or append (if file exists) | | ```Csv.Delimiter ``` | Delimiter between columns in the CSV file (like ',' or ';') | +There is also a `````` that controls what output should be used. + ## Build Clone repo :point_right: open solution in Visual Studio :point_right: build :facepunch: +AppVeyor is used as CI, so when code is pushed to this repo the solution will get built and all tests will be run. + | Branch | Build status | | :----- | :---------------------------------------| | develop | [![Build status](https://ci.appveyor.com/api/projects/status/syw3l7xeicy7xc0b/branch/develop?svg=true)](https://ci.appveyor.com/project/hmol/linkcrawler/branch/develop) | | master | [![Build status](https://ci.appveyor.com/api/projects/status/syw3l7xeicy7xc0b/branch/master?svg=true)](https://ci.appveyor.com/project/hmol/linkcrawler/branch/master) | ## Output to console ![Example run on www.github.com](http://henrikm.com/content/images/2016/Feb/linkcrawler_example.PNG "Example run on www.github.com") From 08ce6e2925f80b363ea1653ff3db7f0f283c08ef Mon Sep 17 00:00:00 2001 From: henrik molnes Date: Tue, 10 May 2016 09:03:04 +0200 Subject: [PATCH 02/29] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index b31f859..b4b7cd9 100644 --- a/README.md +++ b/README.md @@ -27,6 +27,7 @@ There is also a `````` that controls what output should be used. ## Build Clone repo :point_right: open solution in Visual Studio :point_right: build :facepunch: + AppVeyor is used as CI, so when code is pushed to this repo the solution will get built and all tests will be run. 
| Branch | Build status | From d4ce3ab69511a011d6e0b791c26644c26a14dc7f Mon Sep 17 00:00:00 2001 From: henrik molnes Date: Tue, 10 May 2016 12:14:35 +0200 Subject: [PATCH 03/29] Update README.md --- README.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index b4b7cd9..9a9da66 100644 --- a/README.md +++ b/README.md @@ -6,6 +6,16 @@ Simple C# console application that will crawl the given webpage for image-tags a ## Why? Because it could be useful to know when a webpage you have responsibility for displays broken links to its users. I have this running continuously, but you don't have to. For instance, after upgrading your CMS, changing database schema, migrating content, etc., it can be relevant to know if this did or did not introduce broken links. Just run this tool one time and you will know exactly how many links are broken, where they link to, and where they are located. +## Build +Clone repo :point_right: open solution in Visual Studio :point_right: build :facepunch: + +AppVeyor is used as CI, so when code is pushed to this repo the solution will get built and all tests will be run. + +| Branch | Build status | +| :----- | :---------------------------------------| +| develop | [![Build status](https://ci.appveyor.com/api/projects/status/syw3l7xeicy7xc0b/branch/develop?svg=true)](https://ci.appveyor.com/project/hmol/linkcrawler/branch/develop) | +| master | [![Build status](https://ci.appveyor.com/api/projects/status/syw3l7xeicy7xc0b/branch/master?svg=true)](https://ci.appveyor.com/project/hmol/linkcrawler/branch/master) | + ## AppSettings | Key | Usage | | :-------------------------- | :---------------------------------------| @@ -25,16 +35,6 @@ Because it could be useful to know when a webpage you have responsibility for di There is also a `````` that controls what output should be used. 
-## Build -Clone repo :point_right: open solution in Visual Studio :point_right: build :facepunch: - -AppVeyor is used as CI, so when code is pushed to this repo the solution will get built and all tests will be run. - -| Branch | Build status | -| :----- | :---------------------------------------| -| develop | [![Build status](https://ci.appveyor.com/api/projects/status/syw3l7xeicy7xc0b/branch/develop?svg=true)](https://ci.appveyor.com/project/hmol/linkcrawler/branch/develop) | -| master | [![Build status](https://ci.appveyor.com/api/projects/status/syw3l7xeicy7xc0b/branch/master?svg=true)](https://ci.appveyor.com/project/hmol/linkcrawler/branch/master) | - ## Output to console ![Example run on www.github.com](http://henrikm.com/content/images/2016/Feb/linkcrawler_example.PNG "Example run on www.github.com") From 41e936b8b1c721f0fee4c4facf54ef166221b83f Mon Sep 17 00:00:00 2001 From: Henrik Molnes Date: Wed, 11 May 2016 09:52:32 +0200 Subject: [PATCH 04/29] Made write to csv file thread safe, issue #15 --- LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs b/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs index 6e59dff..b0bec70 100644 --- a/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs +++ b/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs @@ -1,14 +1,14 @@ -using System; -using System.IO; -using LinkCrawler.Models; +using LinkCrawler.Models; using LinkCrawler.Utils.Settings; +using System; +using System.IO; namespace LinkCrawler.Utils.Outputs { public class CsvOutput : IOutput, IDisposable { private readonly ISettings _settings; - private StreamWriter _writer; + private TextWriter _writer; public CsvOutput(ISettings settings) { @@ -20,7 +20,9 @@ private void Setup() { var fileMode = _settings.CsvOverwrite ? 
FileMode.Create : FileMode.Append; var file = new FileStream(_settings.CsvFilePath, fileMode, FileAccess.Write); - _writer = new StreamWriter(file); + + var streamWriter = new StreamWriter(file); + _writer = TextWriter.Synchronized(streamWriter); if (fileMode == FileMode.Create) { From 7d4622155f52f3d0d3c1c3bbf68bd68def45374c Mon Sep 17 00:00:00 2001 From: henrik molnes Date: Wed, 11 May 2016 10:09:22 +0200 Subject: [PATCH 05/29] Update README.md --- README.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/README.md b/README.md index 9a9da66..db9cdb9 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,5 @@ # LinkCrawler -Simple C# console application that will crawl the given webpage for image-tags and hyperlinks. If some of them are not working, info will be sent to output. - - +Simple C# console application that will crawl the given webpage for broken image-tags and hyperlinks. The result of this will be written to output. Right now we have these outputs: console, csv, slack. ## Why? Because it could be useful to know when a webpage you have responsibility for displays broken links to its users. I have this running continuously, but you don't have to. For instance, after upgrading your CMS, changing database schema, migrating content, etc., it can be relevant to know if this did or did not introduce broken links. Just run this tool one time and you will know exactly how many links are broken, where they link to, and where they are located. From d4ba0260fb1b50b0b9ec9eaa1e11677f33e4acca Mon Sep 17 00:00:00 2001 From: henrik molnes Date: Wed, 11 May 2016 14:23:27 +0200 Subject: [PATCH 06/29] Update README.md --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index db9cdb9..2cf690e 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,8 @@ # LinkCrawler Simple C# console application that will crawl the given webpage for broken image-tags and hyperlinks. The result of this will be written to output. 
Right now we have these outputs: console, csv, slack. +Example run with console output: +![Example run with console output](http://henrikm.com/content/images/2016/May/linkcrawler.gif "Example run with console output") ## Why? Because it could be useful to know when a webpage you have responsibility for displays broken links to its users. I have this running continuously, but you don't have to. For instance, after upgrading your CMS, changing database schema, migrating content, etc., it can be relevant to know if this did or did not introduce broken links. Just run this tool one time and you will know exactly how many links are broken, where they link to, and where they are located. From f42cbb3b5c9ebc1f3bf79d802b258392aee70628 Mon Sep 17 00:00:00 2001 From: henrik molnes Date: Thu, 12 May 2016 18:38:46 +0200 Subject: [PATCH 07/29] Update README.md --- README.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/README.md b/README.md index 2cf690e..e494c31 100644 --- a/README.md +++ b/README.md @@ -35,9 +35,6 @@ AppVeyor is used as CI, so when code is pushed to this repo the solution will ge There is also a `````` that controls what output should be used. -## Output to console -![Example run on www.github.com](http://henrikm.com/content/images/2016/Feb/linkcrawler_example.PNG "Example run on www.github.com") ## Output to file ```LinkCrawler.exe >> crawl.log``` will save output to file. 
![Slack](http://henrikm.com/content/images/2016/Feb/as-file.png "Output to file") From 3d7aa87f22c9de4b93d2eb0c32fcd6a1544e6e57 Mon Sep 17 00:00:00 2001 From: Henrik Molnes Date: Sat, 14 May 2016 22:15:15 +0200 Subject: [PATCH 08/29] added AutoFlush=true to StreamWriter, would not write to file if this is not set --- LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs b/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs index b0bec70..0b6ed79 100644 --- a/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs +++ b/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs @@ -8,7 +8,7 @@ namespace LinkCrawler.Utils.Outputs public class CsvOutput : IOutput, IDisposable { private readonly ISettings _settings; - private TextWriter _writer; + public TextWriter _writer; public CsvOutput(ISettings settings) { @@ -21,9 +21,9 @@ private void Setup() var fileMode = _settings.CsvOverwrite ? 
FileMode.Create : FileMode.Append; var file = new FileStream(_settings.CsvFilePath, fileMode, FileAccess.Write); - var streamWriter = new StreamWriter(file); + var streamWriter = new StreamWriter(file) {AutoFlush = true}; _writer = TextWriter.Synchronized(streamWriter); - + if (fileMode == FileMode.Create) { _writer.WriteLine("Code{0}Status{0}Url{0}Referer", _settings.CsvDelimiter); From aea6305f3c15ca58eb8dc6b7aaca3b316abf45cc Mon Sep 17 00:00:00 2001 From: Henrik Molnes Date: Wed, 15 Jun 2016 20:52:01 +0200 Subject: [PATCH 09/29] added new extensionmethod to support baseurls with segments --- LinkCrawler/LinkCrawler/LinkCrawler.cs | 1 - LinkCrawler/LinkCrawler/LinkCrawler.csproj | 1 + .../Utils/Extensions/StringExtensions.cs | 10 ++++++++++ .../Utils/Extensions/UriExtensions.cs | 16 ++++++++++++++++ .../LinkCrawler/Utils/Parsers/ValidUrlParser.cs | 3 ++- 5 files changed, 29 insertions(+), 2 deletions(-) create mode 100644 LinkCrawler/LinkCrawler/Utils/Extensions/UriExtensions.cs diff --git a/LinkCrawler/LinkCrawler/LinkCrawler.cs b/LinkCrawler/LinkCrawler/LinkCrawler.cs index 1751888..04ffad1 100644 --- a/LinkCrawler/LinkCrawler/LinkCrawler.cs +++ b/LinkCrawler/LinkCrawler/LinkCrawler.cs @@ -7,7 +7,6 @@ using RestSharp; using System; using System.Collections.Generic; -using LinkCrawler.Utils.Outputs; namespace LinkCrawler { diff --git a/LinkCrawler/LinkCrawler/LinkCrawler.csproj b/LinkCrawler/LinkCrawler/LinkCrawler.csproj index 56ff5bc..bc226d1 100644 --- a/LinkCrawler/LinkCrawler/LinkCrawler.csproj +++ b/LinkCrawler/LinkCrawler/LinkCrawler.csproj @@ -79,6 +79,7 @@ + diff --git a/LinkCrawler/LinkCrawler/Utils/Extensions/StringExtensions.cs b/LinkCrawler/LinkCrawler/Utils/Extensions/StringExtensions.cs index d9298ca..658976e 100644 --- a/LinkCrawler/LinkCrawler/Utils/Extensions/StringExtensions.cs +++ b/LinkCrawler/LinkCrawler/Utils/Extensions/StringExtensions.cs @@ -20,5 +20,15 @@ public static bool ToBool(this string str) bool.TryParse(str, out parsed); return 
parsed; } + + public static string TrimEnd(this string input, string suffixToRemove) + { + if (input != null && suffixToRemove != null + && input.EndsWith(suffixToRemove)) + { + return input.Substring(0, input.Length - suffixToRemove.Length); + } + return input; + } } } diff --git a/LinkCrawler/LinkCrawler/Utils/Extensions/UriExtensions.cs b/LinkCrawler/LinkCrawler/Utils/Extensions/UriExtensions.cs new file mode 100644 index 0000000..a098d24 --- /dev/null +++ b/LinkCrawler/LinkCrawler/Utils/Extensions/UriExtensions.cs @@ -0,0 +1,16 @@ +using System; +using System.Linq; + +namespace LinkCrawler.Utils.Extensions +{ + public static class UriExtensions + { + public static string RemoveSegments(this Uri uri) + { + var uriString = uri.ToString(); + var segments = string.Join("/", uri.Segments.Where(x => x != "/")); + + return uriString.TrimEnd(segments); + } + } +} diff --git a/LinkCrawler/LinkCrawler/Utils/Parsers/ValidUrlParser.cs b/LinkCrawler/LinkCrawler/Utils/Parsers/ValidUrlParser.cs index 20f95a3..54d9097 100644 --- a/LinkCrawler/LinkCrawler/Utils/Parsers/ValidUrlParser.cs +++ b/LinkCrawler/LinkCrawler/Utils/Parsers/ValidUrlParser.cs @@ -12,7 +12,8 @@ public class ValidUrlParser : IValidUrlParser public ValidUrlParser(ISettings settings) { Regex = new Regex(settings.ValidUrlRegex); - BaseUrl = settings.BaseUrl; + var baseUri = new Uri(settings.BaseUrl); + BaseUrl = baseUri.RemoveSegments(); } public bool Parse(string url, out string validUrl) From 61d5de21dff68367d990adfe845307d18fe31522 Mon Sep 17 00:00:00 2001 From: Henrik Molnes Date: Wed, 15 Jun 2016 21:24:59 +0200 Subject: [PATCH 10/29] corrected removing of segments in uri --- LinkCrawler/LinkCrawler/Utils/Extensions/UriExtensions.cs | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/LinkCrawler/LinkCrawler/Utils/Extensions/UriExtensions.cs b/LinkCrawler/LinkCrawler/Utils/Extensions/UriExtensions.cs index a098d24..8b62be9 100644 --- 
a/LinkCrawler/LinkCrawler/Utils/Extensions/UriExtensions.cs +++ b/LinkCrawler/LinkCrawler/Utils/Extensions/UriExtensions.cs @@ -1,5 +1,4 @@ using System; -using System.Linq; namespace LinkCrawler.Utils.Extensions { @@ -8,8 +7,7 @@ public static class UriExtensions public static string RemoveSegments(this Uri uri) { var uriString = uri.ToString(); - var segments = string.Join("/", uri.Segments.Where(x => x != "/")); - + var segments = string.Join(string.Empty, uri.Segments); return uriString.TrimEnd(segments); } } From a50b0240063ea6a0f0bd35e493882b3da546a08c Mon Sep 17 00:00:00 2001 From: Paul Trott Date: Thu, 16 Jun 2016 22:00:02 -0500 Subject: [PATCH 11/29] Added a MockSettings class for dependency injection unit testing purposes. Added the SlackClientTests class and tests for the default constructor --- .../LinkCrawler.Tests.csproj | 5 ++- .../ClientsTests/SlackClientTests.cs | 38 ++++++++++++++++ LinkCrawler/LinkCrawler/LinkCrawler.csproj | 1 + .../Utils/Settings/MockSettings.cs | 44 +++++++++++++++++++ 4 files changed, 87 insertions(+), 1 deletion(-) create mode 100644 LinkCrawler/LinkCrawler.Tests/UtilsTests/ClientsTests/SlackClientTests.cs create mode 100644 LinkCrawler/LinkCrawler/Utils/Settings/MockSettings.cs diff --git a/LinkCrawler/LinkCrawler.Tests/LinkCrawler.Tests.csproj b/LinkCrawler/LinkCrawler.Tests/LinkCrawler.Tests.csproj index 2f7ce1b..ecbef52 100644 --- a/LinkCrawler/LinkCrawler.Tests/LinkCrawler.Tests.csproj +++ b/LinkCrawler/LinkCrawler.Tests/LinkCrawler.Tests.csproj @@ -50,6 +50,7 @@ + @@ -60,7 +61,9 @@ - + + + {db53303b-f9fb-4d77-b656-d05db0420e6a} diff --git a/LinkCrawler/LinkCrawler.Tests/UtilsTests/ClientsTests/SlackClientTests.cs b/LinkCrawler/LinkCrawler.Tests/UtilsTests/ClientsTests/SlackClientTests.cs new file mode 100644 index 0000000..1908a18 --- /dev/null +++ b/LinkCrawler/LinkCrawler.Tests/UtilsTests/ClientsTests/SlackClientTests.cs @@ -0,0 +1,38 @@ +using LinkCrawler.Utils.Clients; +using NUnit.Framework; +using 
LinkCrawler.Utils.Settings; + +namespace LinkCrawler.Tests.UtilsTests.ClientsTests { + + [TestFixture] + public class SlackClientTests { + + //MethodName_StateUnderTest_ExpectedBehaviour + [Test] + public void SlackClient_InstantiationWithWebHookUrl_InstantiatedCorrectlyWithWebHookUrl() { + MockSettings settings = new MockSettings(true); + SlackClient sc = new SlackClient(settings); + + Assert.AreEqual(@"https://hooks.slack.com/services/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", sc.WebHookUrl); + Assert.AreEqual("Homer Bot", sc.BotName); + Assert.AreEqual(":homer:", sc.BotIcon); + Assert.AreEqual("*Doh! There is a link not working* Url: {0} Statuscode: {1} The link is placed on this page: {2}", sc.MessageFormat); + Assert.IsTrue(sc.HasWebHookUrl); + + } + + [Test] + public void SlackClient_InstantiationWithoutWebHookUrl_InstantiatedCorrectlyWithoutWebHookUrl() { + MockSettings settings = new MockSettings(false); + SlackClient sc = new SlackClient(settings); + + Assert.AreEqual("", sc.WebHookUrl); + Assert.AreEqual("Homer Bot", sc.BotName); + Assert.AreEqual(":homer:", sc.BotIcon); + Assert.AreEqual("*Doh! 
There is a link not working* Url: {0} Statuscode: {1} The link is placed on this page: {2}", sc.MessageFormat); + Assert.IsFalse(sc.HasWebHookUrl); + + } + } + +} diff --git a/LinkCrawler/LinkCrawler/LinkCrawler.csproj b/LinkCrawler/LinkCrawler/LinkCrawler.csproj index bc226d1..c1ccabe 100644 --- a/LinkCrawler/LinkCrawler/LinkCrawler.csproj +++ b/LinkCrawler/LinkCrawler/LinkCrawler.csproj @@ -84,6 +84,7 @@ + diff --git a/LinkCrawler/LinkCrawler/Utils/Settings/MockSettings.cs b/LinkCrawler/LinkCrawler/Utils/Settings/MockSettings.cs new file mode 100644 index 0000000..1cf9c74 --- /dev/null +++ b/LinkCrawler/LinkCrawler/Utils/Settings/MockSettings.cs @@ -0,0 +1,44 @@ +using LinkCrawler.Utils.Extensions; +using System.Net; + +namespace LinkCrawler.Utils.Settings { + public class MockSettings : ISettings { + + public string BaseUrl => "https://github.com"; + + public bool CheckImages => true; + + public string CsvDelimiter => ";"; + + public string CsvFilePath => @"C:\tmp\output.csv"; + + public bool CsvOverwrite => true; + + public bool OnlyReportBrokenLinksToOutput => false; + + public string SlackWebHookBotIconEmoji => ":homer:"; + + public string SlackWebHookBotMessageFormat => "*Doh! There is a link not working* Url: {0} Statuscode: {1} The link is placed on this page: {2}"; + + public string SlackWebHookBotName => "Homer Bot"; + + private bool IncludeWebHookUrl { get; set; } + public string SlackWebHookUrl + { + get + { + return IncludeWebHookUrl ? 
@"https://hooks.slack.com/services/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" : ""; + } + } + + public string ValidUrlRegex => @"(^http[s]?:\/{2})|(^www)|(^\/{1,2})"; + + public bool IsSuccess(HttpStatusCode statusCode) { + return statusCode.IsSuccess("1xx,2xx,3xx"); + } + + public MockSettings(bool includeWebHookUrl) { + this.IncludeWebHookUrl = includeWebHookUrl; + } + } +} From df51a2a3e48ef58d6990bd2c7bc6aad6c8b3514a Mon Sep 17 00:00:00 2001 From: Thomas Wright Date: Fri, 23 Sep 2016 16:28:43 +0100 Subject: [PATCH 12/29] Added option to print elapsed milliseconds to console at end of processing. --- LinkCrawler/LinkCrawler/App.config | 1 + LinkCrawler/LinkCrawler/LinkCrawler.cs | 34 +++++++++++++++++++ .../Utils/Outputs/ConsoleOutput.cs | 7 +++- .../LinkCrawler/Utils/Outputs/CsvOutput.cs | 5 +++ .../LinkCrawler/Utils/Outputs/IOutput.cs | 1 + .../LinkCrawler/Utils/Outputs/SlackOutput.cs | 5 +++ .../LinkCrawler/Utils/Settings/Constants.cs | 1 + .../LinkCrawler/Utils/Settings/ISettings.cs | 2 ++ .../LinkCrawler/Utils/Settings/Settings.cs | 3 ++ 9 files changed, 58 insertions(+), 1 deletion(-) diff --git a/LinkCrawler/LinkCrawler/App.config b/LinkCrawler/LinkCrawler/App.config index fbedffb..a87ddf3 100644 --- a/LinkCrawler/LinkCrawler/App.config +++ b/LinkCrawler/LinkCrawler/App.config @@ -13,6 +13,7 @@ + diff --git a/LinkCrawler/LinkCrawler/LinkCrawler.cs b/LinkCrawler/LinkCrawler/LinkCrawler.cs index 04ffad1..80523ea 100644 --- a/LinkCrawler/LinkCrawler/LinkCrawler.cs +++ b/LinkCrawler/LinkCrawler/LinkCrawler.cs @@ -7,6 +7,7 @@ using RestSharp; using System; using System.Collections.Generic; +using System.Diagnostics; namespace LinkCrawler { @@ -19,7 +20,9 @@ public class LinkCrawler public IValidUrlParser ValidUrlParser { get; set; } public bool OnlyReportBrokenLinksToOutput { get; set; } public static List VisitedUrlList { get; set; } + public static List CompletedUrlList { get; set; } private ISettings _settings; + private Stopwatch timer; public 
LinkCrawler(IEnumerable outputs, IValidUrlParser validUrlParser, ISettings settings) { @@ -28,13 +31,17 @@ public LinkCrawler(IEnumerable outputs, IValidUrlParser validUrlParser, ValidUrlParser = validUrlParser; CheckImages = settings.CheckImages; VisitedUrlList = new List(); + CompletedUrlList = new List(); RestRequest = new RestRequest(Method.GET).SetHeader("Accept", "*/*"); OnlyReportBrokenLinksToOutput = settings.OnlyReportBrokenLinksToOutput; _settings = settings; + this.timer = new Stopwatch(); } public void Start() { + this.timer.Start(); + VisitedUrlList.Add(BaseUrl); SendRequest(BaseUrl); } @@ -91,6 +98,33 @@ public void WriteOutput(IResponseModel responseModel) output.WriteInfo(responseModel); } } + + CheckIfFinal(responseModel); + } + + private void CheckIfFinal(IResponseModel responseModel) + { + if (!CompletedUrlList.Contains(responseModel.RequestedUrl)) + { + CompletedUrlList.Add(responseModel.RequestedUrl); + + if ((CompletedUrlList.Count == VisitedUrlList.Count) && (VisitedUrlList.Count > 1)) + FinaliseSession(); + } + } + + private void FinaliseSession() + { + this.timer.Stop(); + if (this._settings.PrintSummary) + { + string message = @" +Processing completed in " + this.timer.ElapsedMilliseconds.ToString() + "ms"; + foreach (var output in Outputs) + { + output.WriteInfo(message); + } + } } } } \ No newline at end of file diff --git a/LinkCrawler/LinkCrawler/Utils/Outputs/ConsoleOutput.cs b/LinkCrawler/LinkCrawler/Utils/Outputs/ConsoleOutput.cs index 1301c28..ccdef3a 100644 --- a/LinkCrawler/LinkCrawler/Utils/Outputs/ConsoleOutput.cs +++ b/LinkCrawler/LinkCrawler/Utils/Outputs/ConsoleOutput.cs @@ -13,7 +13,12 @@ public void WriteError(IResponseModel responseModel) public void WriteInfo(IResponseModel responseModel) { - Console.WriteLine(responseModel.ToString()); + WriteInfo(responseModel.ToString()); + } + + public void WriteInfo(String Info) + { + Console.WriteLine(Info); } } } diff --git a/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs 
b/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs index 0b6ed79..68fc7a1 100644 --- a/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs +++ b/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs @@ -40,6 +40,11 @@ public void WriteInfo(IResponseModel responseModel) Write(responseModel); } + public void WriteInfo(String Info) + { + // Do nothing - string info is only for console + } + private void Write(IResponseModel responseModel) { _writer?.WriteLine("{1}{0}{2}{0}{3}{0}{4}", diff --git a/LinkCrawler/LinkCrawler/Utils/Outputs/IOutput.cs b/LinkCrawler/LinkCrawler/Utils/Outputs/IOutput.cs index c924c13..888fb2a 100644 --- a/LinkCrawler/LinkCrawler/Utils/Outputs/IOutput.cs +++ b/LinkCrawler/LinkCrawler/Utils/Outputs/IOutput.cs @@ -6,5 +6,6 @@ public interface IOutput { void WriteError(IResponseModel responseModel); void WriteInfo(IResponseModel responseModel); + void WriteInfo(string InfoString); } } diff --git a/LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs b/LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs index 9454a69..4269ed7 100644 --- a/LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs +++ b/LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs @@ -21,5 +21,10 @@ public void WriteInfo(IResponseModel responseModel) { // Write nothing to Slack } + + public void WriteInfo(string Info) + { + // Write nothing to Slack + } } } diff --git a/LinkCrawler/LinkCrawler/Utils/Settings/Constants.cs b/LinkCrawler/LinkCrawler/Utils/Settings/Constants.cs index 33d4adc..434ad5b 100644 --- a/LinkCrawler/LinkCrawler/Utils/Settings/Constants.cs +++ b/LinkCrawler/LinkCrawler/Utils/Settings/Constants.cs @@ -17,6 +17,7 @@ public static class AppSettings public const string CsvDelimiter = "Csv.Delimiter"; public const string SuccessHttpStatusCodes = "SuccessHttpStatusCodes"; public const string OutputProviders = "outputProviders"; + public const string PrintSummary = "PrintSummary"; } public static class Response diff --git 
a/LinkCrawler/LinkCrawler/Utils/Settings/ISettings.cs b/LinkCrawler/LinkCrawler/Utils/Settings/ISettings.cs index 8f82816..cc03c8f 100644 --- a/LinkCrawler/LinkCrawler/Utils/Settings/ISettings.cs +++ b/LinkCrawler/LinkCrawler/Utils/Settings/ISettings.cs @@ -27,5 +27,7 @@ public interface ISettings string CsvDelimiter { get; } bool IsSuccess(HttpStatusCode statusCode); + + bool PrintSummary { get; } } } diff --git a/LinkCrawler/LinkCrawler/Utils/Settings/Settings.cs b/LinkCrawler/LinkCrawler/Utils/Settings/Settings.cs index b9a6c7d..a560270 100644 --- a/LinkCrawler/LinkCrawler/Utils/Settings/Settings.cs +++ b/LinkCrawler/LinkCrawler/Utils/Settings/Settings.cs @@ -39,6 +39,9 @@ public class Settings : ISettings public string CsvDelimiter => ConfigurationManager.AppSettings[Constants.AppSettings.CsvDelimiter]; + public bool PrintSummary => + ConfigurationManager.AppSettings[Constants.AppSettings.PrintSummary].ToBool(); + public bool IsSuccess(HttpStatusCode statusCode) { var configuredCodes = ConfigurationManager.AppSettings[Constants.AppSettings.SuccessHttpStatusCodes] ?? ""; From 7402ddef4d83f396148a3bf9ec0275cb929ee0a4 Mon Sep 17 00:00:00 2001 From: Henrik Molnes Date: Fri, 7 Oct 2016 21:33:11 +0200 Subject: [PATCH 13/29] fix for PrintSummary property --- LinkCrawler/LinkCrawler/Utils/Settings/MockSettings.cs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/LinkCrawler/LinkCrawler/Utils/Settings/MockSettings.cs b/LinkCrawler/LinkCrawler/Utils/Settings/MockSettings.cs index 1cf9c74..3f576d8 100644 --- a/LinkCrawler/LinkCrawler/Utils/Settings/MockSettings.cs +++ b/LinkCrawler/LinkCrawler/Utils/Settings/MockSettings.cs @@ -21,7 +21,7 @@ public class MockSettings : ISettings { public string SlackWebHookBotMessageFormat => "*Doh! 
There is a link not working* Url: {0} Statuscode: {1} The link is placed on this page: {2}"; public string SlackWebHookBotName => "Homer Bot"; - + public bool PrintSummary => false; private bool IncludeWebHookUrl { get; set; } public string SlackWebHookUrl { From 45f859fc9513db7ea58ab2f27ae889ff1b453909 Mon Sep 17 00:00:00 2001 From: tdwright Date: Mon, 10 Oct 2016 14:32:32 +0100 Subject: [PATCH 14/29] Refactored two lists of URLs into a single URL model --- .../LinkCrawler.Tests/LinkCrawlerTests.cs | 3 +- LinkCrawler/LinkCrawler/LinkCrawler.cs | 25 +++++++------ LinkCrawler/LinkCrawler/LinkCrawler.csproj | 1 + LinkCrawler/LinkCrawler/Models/LinkModel.cs | 35 +++++++++++++++++++ 4 files changed, 52 insertions(+), 12 deletions(-) create mode 100644 LinkCrawler/LinkCrawler/Models/LinkModel.cs diff --git a/LinkCrawler/LinkCrawler.Tests/LinkCrawlerTests.cs b/LinkCrawler/LinkCrawler.Tests/LinkCrawlerTests.cs index 3d3935e..332a324 100644 --- a/LinkCrawler/LinkCrawler.Tests/LinkCrawlerTests.cs +++ b/LinkCrawler/LinkCrawler.Tests/LinkCrawlerTests.cs @@ -5,6 +5,7 @@ using LinkCrawler.Utils.Settings; using Moq; using NUnit.Framework; +using System.Linq; namespace LinkCrawler.Tests { @@ -49,7 +50,7 @@ public void CrawlForLinksInResponse_ResponseModelWithMarkup_ValidUrlFoundInMarku mockResponseModel.Setup(x => x.Markup).Returns(markup); LinkCrawler.CrawlForLinksInResponse(mockResponseModel.Object); - Assert.That(LinkCrawler.VisitedUrlList.Contains(url)); + Assert.That(LinkCrawler.UrlList.Where(l=>l.Address == url).Count() > 0); } } } diff --git a/LinkCrawler/LinkCrawler/LinkCrawler.cs b/LinkCrawler/LinkCrawler/LinkCrawler.cs index 80523ea..6fd0691 100644 --- a/LinkCrawler/LinkCrawler/LinkCrawler.cs +++ b/LinkCrawler/LinkCrawler/LinkCrawler.cs @@ -8,6 +8,7 @@ using System; using System.Collections.Generic; using System.Diagnostics; +using System.Linq; namespace LinkCrawler { @@ -19,8 +20,7 @@ public class LinkCrawler public IEnumerable Outputs { get; set; } public 
IValidUrlParser ValidUrlParser { get; set; } public bool OnlyReportBrokenLinksToOutput { get; set; } - public static List VisitedUrlList { get; set; } - public static List CompletedUrlList { get; set; } + public static List UrlList; private ISettings _settings; private Stopwatch timer; @@ -30,8 +30,7 @@ public LinkCrawler(IEnumerable outputs, IValidUrlParser validUrlParser, Outputs = outputs; ValidUrlParser = validUrlParser; CheckImages = settings.CheckImages; - VisitedUrlList = new List(); - CompletedUrlList = new List(); + UrlList = new List(); RestRequest = new RestRequest(Method.GET).SetHeader("Accept", "*/*"); OnlyReportBrokenLinksToOutput = settings.OnlyReportBrokenLinksToOutput; _settings = settings; @@ -41,7 +40,7 @@ public LinkCrawler(IEnumerable outputs, IValidUrlParser validUrlParser, public void Start() { this.timer.Start(); - VisitedUrlList.Add(BaseUrl); + UrlList.Add(new LinkModel(BaseUrl)); SendRequest(BaseUrl); } @@ -74,10 +73,10 @@ public void CrawlForLinksInResponse(IResponseModel responseModel) foreach (var url in linksFoundInMarkup) { - if (VisitedUrlList.Contains(url)) + if (UrlList.Where(l => l.Address == url).Count() > 0) continue; - VisitedUrlList.Add(url); + UrlList.Add(new LinkModel(url)); SendRequest(url, responseModel.RequestedUrl); } } @@ -104,12 +103,16 @@ public void WriteOutput(IResponseModel responseModel) private void CheckIfFinal(IResponseModel responseModel) { - if (!CompletedUrlList.Contains(responseModel.RequestedUrl)) + // First set the status code for the completed link (this will set "CheckingFinished" to true) + foreach (LinkModel lm in UrlList.Where(l => l.Address == responseModel.RequestedUrl)) { - CompletedUrlList.Add(responseModel.RequestedUrl); + lm.StatusCode = responseModel.StatusCodeNumber; + } - if ((CompletedUrlList.Count == VisitedUrlList.Count) && (VisitedUrlList.Count > 1)) - FinaliseSession(); + // Then check to see whether there are any pending links left to check + if(UrlList.Where(l => l.CheckingFinished == 
false).Count() == 0) + { + FinaliseSession(); } } diff --git a/LinkCrawler/LinkCrawler/LinkCrawler.csproj b/LinkCrawler/LinkCrawler/LinkCrawler.csproj index bc226d1..8e059db 100644 --- a/LinkCrawler/LinkCrawler/LinkCrawler.csproj +++ b/LinkCrawler/LinkCrawler/LinkCrawler.csproj @@ -77,6 +77,7 @@ + diff --git a/LinkCrawler/LinkCrawler/Models/LinkModel.cs b/LinkCrawler/LinkCrawler/Models/LinkModel.cs new file mode 100644 index 0000000..baa5a13 --- /dev/null +++ b/LinkCrawler/LinkCrawler/Models/LinkModel.cs @@ -0,0 +1,35 @@ +using System; +using System.Collections.Generic; +using System.Linq; +using System.Text; +using System.Threading.Tasks; + +namespace LinkCrawler.Models +{ + public class LinkModel + { + public string Address { get; private set; } + public bool CheckingFinished { get; private set; } + private int _StatusCode; + + public int StatusCode + { + get + { + return _StatusCode; + } + set + { + _StatusCode = value; + this.CheckingFinished = true; + } + } + + public LinkModel (string Address) + { + this.Address = Address; + this.CheckingFinished = false; + } + + } +} From fec6c518ae11e6969abb17fa20334b188e644550 Mon Sep 17 00:00:00 2001 From: tdwright Date: Mon, 10 Oct 2016 14:36:53 +0100 Subject: [PATCH 15/29] Tweaked outputs to allow for multiline info --- LinkCrawler/LinkCrawler/Utils/Outputs/ConsoleOutput.cs | 6 +++--- LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs | 2 +- LinkCrawler/LinkCrawler/Utils/Outputs/IOutput.cs | 2 +- LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/LinkCrawler/LinkCrawler/Utils/Outputs/ConsoleOutput.cs b/LinkCrawler/LinkCrawler/Utils/Outputs/ConsoleOutput.cs index ccdef3a..ece38a6 100644 --- a/LinkCrawler/LinkCrawler/Utils/Outputs/ConsoleOutput.cs +++ b/LinkCrawler/LinkCrawler/Utils/Outputs/ConsoleOutput.cs @@ -13,12 +13,12 @@ public void WriteError(IResponseModel responseModel) public void WriteInfo(IResponseModel responseModel) { - 
WriteInfo(responseModel.ToString()); + WriteInfo(new string[] { responseModel.ToString() }); } - public void WriteInfo(String Info) + public void WriteInfo(String[] Info) { - Console.WriteLine(Info); + foreach(string line in Info) Console.WriteLine(line); } } } diff --git a/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs b/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs index 68fc7a1..2c1d695 100644 --- a/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs +++ b/LinkCrawler/LinkCrawler/Utils/Outputs/CsvOutput.cs @@ -40,7 +40,7 @@ public void WriteInfo(IResponseModel responseModel) Write(responseModel); } - public void WriteInfo(String Info) + public void WriteInfo(String[] Info) { // Do nothing - string info is only for console } diff --git a/LinkCrawler/LinkCrawler/Utils/Outputs/IOutput.cs b/LinkCrawler/LinkCrawler/Utils/Outputs/IOutput.cs index 888fb2a..4dcd64e 100644 --- a/LinkCrawler/LinkCrawler/Utils/Outputs/IOutput.cs +++ b/LinkCrawler/LinkCrawler/Utils/Outputs/IOutput.cs @@ -6,6 +6,6 @@ public interface IOutput { void WriteError(IResponseModel responseModel); void WriteInfo(IResponseModel responseModel); - void WriteInfo(string InfoString); + void WriteInfo(string[] InfoString); } } diff --git a/LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs b/LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs index 4269ed7..ecb2287 100644 --- a/LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs +++ b/LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs @@ -22,7 +22,7 @@ public void WriteInfo(IResponseModel responseModel) // Write nothing to Slack } - public void WriteInfo(string Info) + public void WriteInfo(string[] Info) { // Write nothing to Slack } From 04ef217f65151fa13a55097409df6b7818d469a7 Mon Sep 17 00:00:00 2001 From: tdwright Date: Mon, 10 Oct 2016 15:22:04 +0100 Subject: [PATCH 16/29] Now outputs a summary table when finished --- LinkCrawler/LinkCrawler/LinkCrawler.cs | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff 
--git a/LinkCrawler/LinkCrawler/LinkCrawler.cs b/LinkCrawler/LinkCrawler/LinkCrawler.cs index 6fd0691..0af1364 100644 --- a/LinkCrawler/LinkCrawler/LinkCrawler.cs +++ b/LinkCrawler/LinkCrawler/LinkCrawler.cs @@ -110,7 +110,7 @@ private void CheckIfFinal(IResponseModel responseModel) } // Then check to see whether there are any pending links left to check - if(UrlList.Where(l => l.CheckingFinished == false).Count() == 0) + if ((UrlList.Count > 1) && (UrlList.Where(l => l.CheckingFinished == false).Count() == 0)) { FinaliseSession(); } @@ -121,11 +121,24 @@ private void FinaliseSession() this.timer.Stop(); if (this._settings.PrintSummary) { - string message = @" -Processing completed in " + this.timer.ElapsedMilliseconds.ToString() + "ms"; + List messages = new List(); + messages.Add(""); // add blank line to differentiate summary from main output + + messages.Add("Processing complete. Checked " + UrlList.Count() + " links in " + this.timer.ElapsedMilliseconds.ToString() + "ms"); + + messages.Add(""); + messages.Add(" Status | # Links"); + messages.Add(" -------+--------"); + + IEnumerable> StatusSummary = UrlList.GroupBy(link => link.StatusCode, link => link.Address); + foreach(IGrouping statusGroup in StatusSummary) + { + messages.Add(String.Format(" {0} | {1,5}", statusGroup.Key, statusGroup.Count())); + } + foreach (var output in Outputs) { - output.WriteInfo(message); + output.WriteInfo(messages.ToArray()); } } } From 78f84b594ad1e5597564018e2910759153af5d9e Mon Sep 17 00:00:00 2001 From: tdwright Date: Mon, 10 Oct 2016 15:43:13 +0100 Subject: [PATCH 17/29] Added locks to the LinkModel list to eliminate issues caused by concurrent access. 
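The race this patch fixes is the classic check-then-act on a shared `List<T>`: two crawler callbacks can both pass the `Contains`-style check and insert a duplicate, or read the list while another thread mutates it. A minimal standalone sketch of the same `lock` pattern (class and URL names are illustrative, not from the repo):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class CrawlDemo
{
    // Shared state, analogous to the static UrlList in this patch.
    static readonly List<string> Urls = new List<string>();

    static void AddIfNew(string url)
    {
        // Without the lock, two threads can both pass the Contains check
        // and insert a duplicate; the lock makes check-then-add atomic.
        lock (Urls)
        {
            if (!Urls.Contains(url))
                Urls.Add(url);
        }
    }

    static void Main()
    {
        Parallel.For(0, 1000, i => AddIfNew("http://example.com/page" + (i % 10)));
        Console.WriteLine(Urls.Count); // always 10 with the lock in place
    }
}
```

Locking on the list instance itself, as the patch does, is simple and adequate here since the list is private to the crawler; a dedicated lock object would merely guard against external code locking the same reference.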
--- LinkCrawler/LinkCrawler/LinkCrawler.cs | 29 ++++++++++++++++---------- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/LinkCrawler/LinkCrawler/LinkCrawler.cs b/LinkCrawler/LinkCrawler/LinkCrawler.cs index 0af1364..ed947be 100644 --- a/LinkCrawler/LinkCrawler/LinkCrawler.cs +++ b/LinkCrawler/LinkCrawler/LinkCrawler.cs @@ -73,10 +73,13 @@ public void CrawlForLinksInResponse(IResponseModel responseModel) foreach (var url in linksFoundInMarkup) { - if (UrlList.Where(l => l.Address == url).Count() > 0) - continue; + lock (UrlList) + { + if (UrlList.Where(l => l.Address == url).Count() > 0) + continue; - UrlList.Add(new LinkModel(url)); + UrlList.Add(new LinkModel(url)); + } SendRequest(url, responseModel.RequestedUrl); } } @@ -103,16 +106,20 @@ public void WriteOutput(IResponseModel responseModel) private void CheckIfFinal(IResponseModel responseModel) { - // First set the status code for the completed link (this will set "CheckingFinished" to true) - foreach (LinkModel lm in UrlList.Where(l => l.Address == responseModel.RequestedUrl)) + lock (UrlList) { - lm.StatusCode = responseModel.StatusCodeNumber; - } - // Then check to see whether there are any pending links left to check - if ((UrlList.Count > 1) && (UrlList.Where(l => l.CheckingFinished == false).Count() == 0)) - { - FinaliseSession(); + // First set the status code for the completed link (this will set "CheckingFinished" to true) + foreach (LinkModel lm in UrlList.Where(l => l.Address == responseModel.RequestedUrl)) + { + lm.StatusCode = responseModel.StatusCodeNumber; + } + + // Then check to see whether there are any pending links left to check + if ((UrlList.Count > 1) && (UrlList.Where(l => l.CheckingFinished == false).Count() == 0)) + { + FinaliseSession(); + } } } From 628bdc391bc88870ab9f492dd654915c94c5c8cf Mon Sep 17 00:00:00 2001 From: Tom Wright Date: Mon, 10 Oct 2016 15:47:11 +0100 Subject: [PATCH 18/29] Added PrintSummary setting to README --- README.md | 3 ++- 1 file changed, 2 
insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index e494c31..9748360 100644 --- a/README.md +++ b/README.md @@ -31,7 +31,8 @@ AppVeyor is used as CI, so when code is pushed to this repo the solution will ge | ```Slack.WebHook.Bot.MessageFormat``` | String format message that will be sent to slack | | ```Csv.FilePath``` | File path for the CSV file | | ```Csv.Overwrite``` | Whether to overwrite or append (if file exists) | -| ```Csv.Delimiter ``` | Delimiter between columns in the CSV file (like ',' or ';') | +| ```Csv.Delimiter``` | Delimiter between columns in the CSV file (like ',' or ';') | +| ```PrintSummary``` | If true, a summary will be printed when all links have been checked. | Ther also is a `````` that controls what output should be used. From 7c729a30bb9c112974ba2c78ab26e24133f69b2b Mon Sep 17 00:00:00 2001 From: Stratos Kourtzanidis Date: Sat, 6 May 2017 03:21:54 +0300 Subject: [PATCH 19/29] Accept url as commnd line argument --- LinkCrawler/LinkCrawler/Program.cs | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/LinkCrawler/LinkCrawler/Program.cs b/LinkCrawler/LinkCrawler/Program.cs index 824e4af..8ce7c3e 100644 --- a/LinkCrawler/LinkCrawler/Program.cs +++ b/LinkCrawler/LinkCrawler/Program.cs @@ -1,6 +1,8 @@ using LinkCrawler.Utils; using StructureMap; using System; +using LinkCrawler.Utils.Parsers; +using LinkCrawler.Utils.Settings; namespace LinkCrawler { @@ -8,9 +10,18 @@ class Program { static void Main(string[] args) { + using (var container = Container.For()) { var linkCrawler = container.GetInstance(); + if (args != null) + { + string parsed; + var validUrlParser = new ValidUrlParser(new Settings()); + var result = validUrlParser.Parse(args[0], out parsed); + if(result) + linkCrawler.BaseUrl = parsed; + } linkCrawler.Start(); Console.Read(); } From 44d9cb7b4246df4b98850b0095a3ef6fefbd15cd Mon Sep 17 00:00:00 2001 From: Stratos K Date: Tue, 9 May 2017 10:02:24 +0300 Subject: [PATCH 20/29] Update program.cs change 
arguments exist check --- LinkCrawler/LinkCrawler/Program.cs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/LinkCrawler/LinkCrawler/Program.cs b/LinkCrawler/LinkCrawler/Program.cs index 8ce7c3e..e351285 100644 --- a/LinkCrawler/LinkCrawler/Program.cs +++ b/LinkCrawler/LinkCrawler/Program.cs @@ -14,7 +14,7 @@ static void Main(string[] args) using (var container = Container.For()) { var linkCrawler = container.GetInstance(); - if (args != null) + if (args.Length >0) { string parsed; var validUrlParser = new ValidUrlParser(new Settings()); From 021aa9d60f28ac1e37ba79fef57654af2f1d235d Mon Sep 17 00:00:00 2001 From: David Tolan Date: Tue, 2 Oct 2018 22:03:21 -0500 Subject: [PATCH 21/29] Added tests for StringExtension.TrimEnd --- .../ExtensionsTests/StringExtensionsTests.cs | 44 +++++++++++++++++++ 1 file changed, 44 insertions(+) diff --git a/LinkCrawler/LinkCrawler.Tests/UtilsTests/ExtensionsTests/StringExtensionsTests.cs b/LinkCrawler/LinkCrawler.Tests/UtilsTests/ExtensionsTests/StringExtensionsTests.cs index 392bec1..fecd706 100644 --- a/LinkCrawler/LinkCrawler.Tests/UtilsTests/ExtensionsTests/StringExtensionsTests.cs +++ b/LinkCrawler/LinkCrawler.Tests/UtilsTests/ExtensionsTests/StringExtensionsTests.cs @@ -118,5 +118,49 @@ public void StartsWithIgnoreCase_DifferentLetterAndSameCase_True() var result = word.StartsWithIgnoreCase(letter); Assert.AreEqual(false, result); } + + [Test] + public void TrimEnd_InputNull_Null() + { + string input = null; + string expected = null; + + var actual = input.TrimEnd(""); + + Assert.AreEqual(expected, actual); + } + + [Test] + public void TrimEnd_InputEndsWithSuffix_RemovesSuffix() + { + string input = "friend"; + string expected = "fri"; + + var actual = input.TrimEnd("end"); + + Assert.AreEqual(expected, actual); + } + + [Test] + public void TrimEnd_InputEndsWithSuffixDifferentCase_ReturnsOriginal() + { + string input = "friEND"; + string expected = "friEND"; + + var actual = input.TrimEnd("end"); + + 
Assert.AreEqual(expected, actual); + } + + [Test] + public void TrimEnd_InputEndsWithSuffixDifferentCase_ReturnsEmptyString() + { + string input = "friend"; + string expected = string.Empty; + + var actual = input.TrimEnd("friend"); + + Assert.AreEqual(expected, actual); + } } } From 6dcdfdb0269e36c9cd5e4a1c1a990b9d21891218 Mon Sep 17 00:00:00 2001 From: David Tolan Date: Tue, 2 Oct 2018 22:08:20 -0500 Subject: [PATCH 22/29] Added ParsersTests folder and moved ValidUrlParserTests.cs into it to match folder structure in main project --- .../HelpersTests/ValidUrlParserTests.cs | 50 ------------------- 1 file changed, 50 deletions(-) delete mode 100644 LinkCrawler/LinkCrawler.Tests/UtilsTests/HelpersTests/ValidUrlParserTests.cs diff --git a/LinkCrawler/LinkCrawler.Tests/UtilsTests/HelpersTests/ValidUrlParserTests.cs b/LinkCrawler/LinkCrawler.Tests/UtilsTests/HelpersTests/ValidUrlParserTests.cs deleted file mode 100644 index b19c52f..0000000 --- a/LinkCrawler/LinkCrawler.Tests/UtilsTests/HelpersTests/ValidUrlParserTests.cs +++ /dev/null @@ -1,50 +0,0 @@ -using LinkCrawler.Utils.Parsers; -using LinkCrawler.Utils.Settings; -using NUnit.Framework; - -namespace LinkCrawler.Tests.UtilsTests.HelpersTests -{ - [TestFixture] - public class ValidUrlParserTests - { - public ValidUrlParser ValidUrlParser { get; set; } - [SetUp] - public void SetUp() - { - ValidUrlParser = new ValidUrlParser(new Settings()); - } - - [Test] - public void Parse_CompleteValidUrl_True() - { - var url = "http://www.github.com"; - string parsed; - var result = ValidUrlParser.Parse(url, out parsed); - Assert.That(result, Is.True); - Assert.That(parsed, Is.EqualTo(url)); - } - - [Test] - public void Parse_UrlNoScheme_True() - { - var url = "//www.github.com"; - string parsed; - var result = ValidUrlParser.Parse(url, out parsed); - Assert.That(result, Is.True); - var validUrl = "http:" + url; - Assert.That(parsed, Is.EqualTo(validUrl)); - } - - [Test] - public void Parse_UrlOnlyRelativePath_True() - { - 
var relativeUrl = "/relative/path"; - string parsed; - var result = ValidUrlParser.Parse(relativeUrl, out parsed); - Assert.That(result, Is.True); - var validUrl = string.Format("{0}{1}",ValidUrlParser.BaseUrl, relativeUrl); - - Assert.That(parsed, Is.EqualTo(validUrl)); - } - } -} From 3813000f06eaf4eff03d1f95d76a0216c4ba88b3 Mon Sep 17 00:00:00 2001 From: David Tolan Date: Tue, 2 Oct 2018 22:08:59 -0500 Subject: [PATCH 23/29] Moved ValidUrlParserTests.cs to ParsersTests folder --- .../LinkCrawler.Tests.csproj | 5 +- .../ParsersTests/ValidUrlParserTests.cs | 50 +++++++++++++++++++ 2 files changed, 54 insertions(+), 1 deletion(-) create mode 100644 LinkCrawler/LinkCrawler.Tests/UtilsTests/ParsersTests/ValidUrlParserTests.cs diff --git a/LinkCrawler/LinkCrawler.Tests/LinkCrawler.Tests.csproj b/LinkCrawler/LinkCrawler.Tests/LinkCrawler.Tests.csproj index ecbef52..8919198 100644 --- a/LinkCrawler/LinkCrawler.Tests/LinkCrawler.Tests.csproj +++ b/LinkCrawler/LinkCrawler.Tests/LinkCrawler.Tests.csproj @@ -53,7 +53,7 @@ - + @@ -70,6 +70,9 @@ LinkCrawler + + + + + + + + + + + \ No newline at end of file diff --git a/LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs b/LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs index ecb2287..143c05f 100644 --- a/LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs +++ b/LinkCrawler/LinkCrawler/Utils/Outputs/SlackOutput.cs @@ -14,7 +14,7 @@ public SlackOutput(ISlackClient slackClient) public void WriteError(IResponseModel responseModel) { - _slackClient.NotifySlack(responseModel); + _slackClient.NotifySlack(responseModel); } public void WriteInfo(IResponseModel responseModel) From 1a8786c73725c2c3d6e22c2ae50e7065418633b6 Mon Sep 17 00:00:00 2001 From: Emery Weist Date: Wed, 23 Oct 2019 11:10:54 -0400 Subject: [PATCH 25/29] Code cleanup and variable name consistency. 
--- LinkCrawler/LinkCrawler/LinkCrawler.cs | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/LinkCrawler/LinkCrawler/LinkCrawler.cs b/LinkCrawler/LinkCrawler/LinkCrawler.cs index ed947be..5e3378a 100644 --- a/LinkCrawler/LinkCrawler/LinkCrawler.cs +++ b/LinkCrawler/LinkCrawler/LinkCrawler.cs @@ -22,7 +22,7 @@ public class LinkCrawler public bool OnlyReportBrokenLinksToOutput { get; set; } public static List UrlList; private ISettings _settings; - private Stopwatch timer; + private Stopwatch _timer; public LinkCrawler(IEnumerable outputs, IValidUrlParser validUrlParser, ISettings settings) { @@ -34,12 +34,12 @@ public LinkCrawler(IEnumerable outputs, IValidUrlParser validUrlParser, RestRequest = new RestRequest(Method.GET).SetHeader("Accept", "*/*"); OnlyReportBrokenLinksToOutput = settings.OnlyReportBrokenLinksToOutput; _settings = settings; - this.timer = new Stopwatch(); + _timer = new Stopwatch(); } public void Start() { - this.timer.Start(); + _timer.Start(); UrlList.Add(new LinkModel(BaseUrl)); SendRequest(BaseUrl); } @@ -125,13 +125,13 @@ private void CheckIfFinal(IResponseModel responseModel) private void FinaliseSession() { - this.timer.Stop(); - if (this._settings.PrintSummary) + _timer.Stop(); + if (_settings.PrintSummary) { List messages = new List(); messages.Add(""); // add blank line to differentiate summary from main output - messages.Add("Processing complete. Checked " + UrlList.Count() + " links in " + this.timer.ElapsedMilliseconds.ToString() + "ms"); + messages.Add("Processing complete. 
Checked " + UrlList.Count() + " links in " + _timer.ElapsedMilliseconds.ToString() + "ms"); messages.Add(""); messages.Add(" Status | # Links"); From 9cc30ba868cc9ffc386e9d3c9c89ee9ba5dcf6de Mon Sep 17 00:00:00 2001 From: Emery Weist Date: Wed, 23 Oct 2019 11:17:51 -0400 Subject: [PATCH 26/29] More consistent style and cleanup --- LinkCrawler/LinkCrawler/Models/LinkModel.cs | 14 +++++++------- LinkCrawler/LinkCrawler/Program.cs | 2 +- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/LinkCrawler/LinkCrawler/Models/LinkModel.cs b/LinkCrawler/LinkCrawler/Models/LinkModel.cs index baa5a13..2f246fd 100644 --- a/LinkCrawler/LinkCrawler/Models/LinkModel.cs +++ b/LinkCrawler/LinkCrawler/Models/LinkModel.cs @@ -10,25 +10,25 @@ public class LinkModel { public string Address { get; private set; } public bool CheckingFinished { get; private set; } - private int _StatusCode; + private int _statusCode; public int StatusCode { get { - return _StatusCode; + return _statusCode; } set { - _StatusCode = value; - this.CheckingFinished = true; + _statusCode = value; + CheckingFinished = true; } } - public LinkModel (string Address) + public LinkModel (string address) { - this.Address = Address; - this.CheckingFinished = false; + Address = address; + CheckingFinished = false; } } diff --git a/LinkCrawler/LinkCrawler/Program.cs b/LinkCrawler/LinkCrawler/Program.cs index e351285..1c4ca96 100644 --- a/LinkCrawler/LinkCrawler/Program.cs +++ b/LinkCrawler/LinkCrawler/Program.cs @@ -14,7 +14,7 @@ static void Main(string[] args) using (var container = Container.For()) { var linkCrawler = container.GetInstance(); - if (args.Length >0) + if (args.Length > 0) { string parsed; var validUrlParser = new ValidUrlParser(new Settings()); From 1c3709c251fd61a043e047833a28504aa4c786e1 Mon Sep 17 00:00:00 2001 From: henrik molnes Date: Wed, 29 Jul 2020 09:16:53 +0200 Subject: [PATCH 27/29] Update README.md --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git 
a/README.md b/README.md index 9748360..65d28a0 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,7 @@ # LinkCrawler Simple C# console application that will crawl the given webpage for broken image-tags and hyperlinks. The result of this will be written to output. Right now we have these outputs: console, csv, slack. -Example run with console output: -![Example run with console output](http://henrikm.com/content/images/2016/May/linkcrawler.gif "Example run with console output") + ## Why? Because it could be useful to know when a webpage you have responsibility for displays broken links to it's users. I have this running continuously, but you don't have to. For instance, after upgrading your CMS, changing database-scheme, migrating content etc, it can be relevant to know if this did or did not not introduce broken links. Just run this tool one time and you will know exactly how many links are broken, where they link to, and where they are located. From e03559dc3793b6d273067440fd30754644f6a004 Mon Sep 17 00:00:00 2001 From: Jose Montanez Date: Fri, 4 Nov 2022 16:16:15 -0600 Subject: [PATCH 28/29] Unit test issue 44 Created a unit test to check valid URL pattern. 
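For context on the pattern under test: it accepts addresses that start with `http://`/`https://`, `www`, or one or two slashes. A standalone sketch of how plain `Regex.IsMatch` classifies a few inputs (the `RegexDemo` class is illustrative; the crawler's `IsNotMatch` extension is assumed to negate this result):

```csharp
using System;
using System.Text.RegularExpressions;

class RegexDemo
{
    static void Main()
    {
        // Same pattern as in the test: http(s)://, leading www, or 1-2 leading slashes.
        var regex = new Regex("(^http[s]?:\\/{2})|(^www)|(^\\/{1,2})");

        Console.WriteLine(regex.IsMatch("https://github.com"));       // matches: https:// prefix
        Console.WriteLine(regex.IsMatch("//cdn.example.com/app.js")); // matches: leading //
        Console.WriteLine(regex.IsMatch("website.com:///podcast/"));  // no accepted prefix
    }
}
```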
--- .../ExtensionsTests/RegexExtensionsTest.cs | 23 +++++++++++++++++++ 1 file changed, 23 insertions(+) create mode 100644 LinkCrawler/LinkCrawler.Tests/UtilsTests/ExtensionsTests/RegexExtensionsTest.cs diff --git a/LinkCrawler/LinkCrawler.Tests/UtilsTests/ExtensionsTests/RegexExtensionsTest.cs b/LinkCrawler/LinkCrawler.Tests/UtilsTests/ExtensionsTests/RegexExtensionsTest.cs new file mode 100644 index 0000000..e9ffd41 --- /dev/null +++ b/LinkCrawler/LinkCrawler.Tests/UtilsTests/ExtensionsTests/RegexExtensionsTest.cs @@ -0,0 +1,23 @@ +using LinkCrawler.Utils.Extensions; +using NUnit.Framework; +using System; +using System.Collections.Generic; +using System.Linq; +using System.Text; +using System.Threading.Tasks; + +namespace LinkCrawler.Tests.UtilsTests.ExtensionsTests +{ + [TestFixture] + public class RegexExtensionsTest + { + public void IsNotMatch_Should_Return_False() + { + + string regex = "(^http[s]?:\\/{2})|(^www)|(^\\/{1,2})"; + string url = "website.com:///podcast/"; + bool expression = RegexExtensions.IsNotMatch(new System.Text.RegularExpressions.Regex(regex), url); + Assert.IsFalse(expression); + } + } +} From cdf05b3e3808bdcfb969ec0df3e6d80a9a29b4ae Mon Sep 17 00:00:00 2001 From: Jose Montanez Date: Mon, 14 Nov 2022 18:50:04 -0700 Subject: [PATCH 29/29] DotNet7 - Added a new folder that contains: - A new Link Crawler Solution in .Net 7 - A Unit test --- .../LinkCrawler.Test/LinkCrawler.Test.csproj | 24 ++++++++++++++++++ .../LinkCrawler/LinkCrawler.Test/UnitTest1.cs | 11 ++++++++ .../LinkCrawler/LinkCrawler.Test/Usings.cs | 1 + .../LinkCrawler/LinkCrawler.sln | 25 +++++++++++++++++++ .../LinkCrawler/LinkCrawler.csproj | 10 ++++++++ .../LinkCrawler/LinkCrawler/Program.cs | 2 ++ 6 files changed, 73 insertions(+) create mode 100644 LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.Test/LinkCrawler.Test.csproj create mode 100644 LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.Test/UnitTest1.cs create mode 100644 
LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.Test/Usings.cs create mode 100644 LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.sln create mode 100644 LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler/LinkCrawler.csproj create mode 100644 LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler/Program.cs diff --git a/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.Test/LinkCrawler.Test.csproj b/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.Test/LinkCrawler.Test.csproj new file mode 100644 index 0000000..9928ef5 --- /dev/null +++ b/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.Test/LinkCrawler.Test.csproj @@ -0,0 +1,24 @@ + + + + net7.0 + enable + enable + + false + + + + + + + runtime; build; native; contentfiles; analyzers; buildtransitive + all + + + runtime; build; native; contentfiles; analyzers; buildtransitive + all + + + + diff --git a/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.Test/UnitTest1.cs b/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.Test/UnitTest1.cs new file mode 100644 index 0000000..c35856a --- /dev/null +++ b/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.Test/UnitTest1.cs @@ -0,0 +1,11 @@ +namespace LinkCrawler.Test +{ + public class UnitTest1 + { + [Fact] + public void Test1() + { + + } + } +} \ No newline at end of file diff --git a/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.Test/Usings.cs b/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.Test/Usings.cs new file mode 100644 index 0000000..8c927eb --- /dev/null +++ b/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.Test/Usings.cs @@ -0,0 +1 @@ +global using Xunit; \ No newline at end of file diff --git a/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.sln b/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.sln new file mode 100644 index 0000000..be9dde9 --- /dev/null +++ b/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler.sln @@ -0,0 +1,25 @@ + +Microsoft Visual Studio Solution File, Format Version 12.00 +# Visual Studio Version 17 +VisualStudioVersion = 17.4.33103.184 +MinimumVisualStudioVersion = 10.0.40219.1 
+Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "LinkCrawler", "LinkCrawler\LinkCrawler.csproj", "{DD0C6451-84A7-4DA6-8BF5-32EB97BE32FE}" +EndProject +Global + GlobalSection(SolutionConfigurationPlatforms) = preSolution + Debug|Any CPU = Debug|Any CPU + Release|Any CPU = Release|Any CPU + EndGlobalSection + GlobalSection(ProjectConfigurationPlatforms) = postSolution + {DD0C6451-84A7-4DA6-8BF5-32EB97BE32FE}.Debug|Any CPU.ActiveCfg = Debug|Any CPU + {DD0C6451-84A7-4DA6-8BF5-32EB97BE32FE}.Debug|Any CPU.Build.0 = Debug|Any CPU + {DD0C6451-84A7-4DA6-8BF5-32EB97BE32FE}.Release|Any CPU.ActiveCfg = Release|Any CPU + {DD0C6451-84A7-4DA6-8BF5-32EB97BE32FE}.Release|Any CPU.Build.0 = Release|Any CPU + EndGlobalSection + GlobalSection(SolutionProperties) = preSolution + HideSolutionNode = FALSE + EndGlobalSection + GlobalSection(ExtensibilityGlobals) = postSolution + SolutionGuid = {0815F252-9C5A-42C9-A1CC-743850B55836} + EndGlobalSection +EndGlobal diff --git a/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler/LinkCrawler.csproj b/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler/LinkCrawler.csproj new file mode 100644 index 0000000..f02677b --- /dev/null +++ b/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler/LinkCrawler.csproj @@ -0,0 +1,10 @@ + + + + Exe + net7.0 + enable + enable + + + diff --git a/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler/Program.cs b/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler/Program.cs new file mode 100644 index 0000000..3751555 --- /dev/null +++ b/LinkCrawler_DotNet_7/LinkCrawler/LinkCrawler/Program.cs @@ -0,0 +1,2 @@ +// See https://aka.ms/new-console-template for more information +Console.WriteLine("Hello, World!");
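Stepping back over the series: the summary table introduced in patch 16 reduces to a single LINQ `GroupBy` over the collected links. A standalone sketch of that aggregation with hardcoded sample data (the `Link` record and sample addresses are illustrative, not from the repo):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SummaryDemo
{
    record Link(string Address, int StatusCode);

    static void Main()
    {
        var urlList = new List<Link>
        {
            new("http://a", 200),
            new("http://b", 200),
            new("http://c", 404),
        };

        Console.WriteLine(" Status | # Links");
        Console.WriteLine(" -------+--------");

        // Same shape as FinaliseSession: group link addresses by status code,
        // then print one row per status with a right-aligned count.
        foreach (var group in urlList.GroupBy(l => l.StatusCode, l => l.Address))
            Console.WriteLine(String.Format(" {0} | {1,5}", group.Key, group.Count()));
    }
}
```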