Should we change robots.txt at https://library.kiwix.org #232

Open
kelson42 opened this issue Aug 11, 2024 · 4 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@kelson42
Contributor

curl https://library.kiwix.org/robots.txt
User-agent: *
Disallow: /

This forbids everything, which may not be the best way to advertise our library?!
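As a quick check of what those two rules mean in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the sample URLs are illustrative assumptions, not taken from the Kiwix deployment.

```python
from urllib.robotparser import RobotFileParser

# The two rules returned by the curl command above.
CURRENT_RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(rules: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper, written only for this illustration.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Sample URLs chosen for illustration; any path on the host behaves the same.
for url in (
    "https://library.kiwix.org/",
    "https://library.kiwix.org/catalog/",
):
    print(url, is_allowed(CURRENT_RULES, url, agent="Googlebot"))
```

With `Disallow: /` under `User-agent: *`, every path is off limits to any crawler that honours robots.txt, so both checks should report that fetching is not allowed.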

@kelson42 kelson42 added enhancement New feature or request question Further information is requested labels Aug 11, 2024
@rgaudin
Member

rgaudin commented Aug 11, 2024

Do we want to draw search engines' attention to the library? It's basically a copy of multiple other sources (something search engines are known to dislike), and it doesn't bring people to Kiwix because there's no mention of Kiwix there, nor of the readers, the format, or anything else. It also pollutes search engines with outdated data and, finally, it increases the load on our machine with traffic we're not interested in.
As far as I can remember, we've always wanted to avoid this (see kiwix/container-images#13). What's changed?

@kelson42
Contributor Author

kelson42 commented Aug 11, 2024

At this stage, IMO, the catalog part of library.kiwix.org should be crawled, but not the demo part.
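To make the idea concrete, here is a rough sketch of what a split policy could look like and how it would evaluate. It assumes "catalog part" means everything under a `/catalog/` path prefix and that everything else counts as the demo part. Both are assumptions for illustration only. The rule set, the helper name `is_allowed`, and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT the deployed robots.txt: allow an assumed
# /catalog/ prefix and keep everything else blocked.
# Python's parser applies the first matching rule, so Allow comes first.
HYPOTHETICAL_RULES = """\
User-agent: *
Allow: /catalog/
Disallow: /
"""


def is_allowed(rules: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Illustrative paths only; the real URL layout may differ.
for path in ("/catalog/example.xml", "/some_book/A/Some_page.html", "/"):
    url = "https://library.kiwix.org" + path
    print(path, is_allowed(HYPOTHETICAL_RULES, url))
```

Under these assumed rules only the `/catalog/` path is reported as fetchable. Rule precedence differs between crawlers (some use the most specific match rather than the first match), so the ordering shown here is a safe choice rather than a requirement.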

@rgaudin
Member

rgaudin commented Aug 12, 2024

What do you mean by catalog part? The homepage or /catalog?

If /catalog:

  • It's an API and there are no links.
  • Crawlers would thus have to come from existing links.
  • I doubt they would navigate the OPDS API.
  • I doubt they would index OPDS content.
  • If they did, I doubt they'd serve OPDS links in their search results.

Anyway, why would we drive people towards library.kiwix.org if they are not told where they are, what Kiwix is, etc?

@Popolechien
Member

Popolechien commented Aug 12, 2024

I know for a fact that the ZIM files already get crawled, as we regularly receive spam-like emails that pretty much always read like this:

Hi,
I noticed that a broken link appears on this page: http://library.kiwix.org/wikipedia_en_computer_2017-04/A/Android_(operating_system).html
Link text "Global mobile statistics 2014 Part A: Mobile subscribers; handset market share; mobile operators" The screenshot is attached below, points to https://mobiforge.com/mobile-marketing-tools/latest-mobile-stats which is not alive anymore.
Also, you may consider replacing the broken link with this exact updated resource which points back to the right pages within that website: https://vivipins.com/mobile-marketing-statistics/
Let me know if there’s anything else I can help you with!

They are basically trying to replace a random link with theirs, I guess as a form of SEO. They never seem to realize that they're pointing at a Wikipedia page, and the text barely ever changes, so I suspect a fully automated operation.

Long story short, we could do without these, and I see no material advantage for us in driving traffic to content pages (as opposed to letting people access the source material or driving folks to the more generic library.kiwix.org landing page).

3 participants