Crawler Policy

Ken Krugler edited this page Feb 1, 2017 · 1 revision

What is the Flink Crawler?

If you are reading this, you've probably noticed the Flink Crawler robot visiting your site while checking through your server logs. Our software obeys robots.txt files, the standard that lets webmasters tell web robots which portions of a site they may access.

Why are we crawling the web?

If you're only seeing us grab a couple of pages from your site occasionally, chances are this is just us running integration tests. However, if you see any significant load on your servers, then someone is probably using this open source project to crawl the web. If so, their user agent string shouldn't be pointing you here. Please contact us so that we can help the responsible party figure out how to use this software appropriately.

Are you wasting my bandwidth?

Our crawler should use very little bandwidth when monitoring your site. We are aware that bandwidth is a concern for site owners, and we limit both the rate and the total number of page requests we make.

Ok, but what if I still want to exclude my site?

Our software obeys the robots.txt exclusion standard, described at http://www.robotstxt.org/wc/exclusion.html#robotstxt. To ban the Flink Crawler robot from visiting your entire website, place the following in your robots.txt file:

User-agent: flink-crawler
Disallow: /

To ban the flink-crawler from visiting just a portion of your website, adjust the path in the Disallow line to cover that specific portion of the site.
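For example, to keep the crawler out of a single directory (here `/private/` is just an illustrative path), you would use:

User-agent: flink-crawler
Disallow: /private/

Multiple Disallow lines may be listed under the same User-agent record, one per path prefix, per the robots.txt exclusion standard.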

Please don't rely on meta tags like <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> – they will be ignored due to technology limitations.

Contact Us

If your site has any problems or you have further questions about what we do with the statistics collected, please contact us at [email protected]