Create sitemap publishing and versioning control #193
Comments
I have a working version of the sitemap index generation that reflects the last time an update was made to the source of each urlset (CSV file or XML files). I am planning to modify this to run as a standalone GitHub Action that can be implemented in pids.geoconnex.us. On generating sitemaps for regex namespaces: the plain truth of a regex namespace is that it covers a lot of features. Having a URL to download a file from which the corresponding sitemap can be generated becomes a problem as the PID list grows. I suggest we invest more effort in the tooling to generate sitemap indexes and urlsets from arbitrary sources (as I am planning to implement in the GitHub Action), to encourage contributors to generate and include their own urlset files and reduce the exponential growth this entails. We also need a mechanism to regenerate only the sitemaps that have changed, instead of all sitemaps any time ANY namespace changes.
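One way the "only regenerate what changed" mechanism could work is sketched below. It is not the implementation discussed in this thread: it assumes the urlset sources live as `.csv` or `.xml` files under one directory, and it uses a hypothetical JSON manifest of content hashes to decide which sources need their sitemap rebuilt. The names `changed_sources`, `file_digest`, and the manifest file are made up for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import List, Sequence


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_sources(
    source_dir: Path,
    manifest_path: Path,
    suffixes: Sequence[str] = (".csv", ".xml"),
) -> List[Path]:
    """Return source files that are new or whose content changed since the
    last run, then rewrite the manifest with the current digests.

    The manifest (hypothetical) maps each file's path relative to
    source_dir to its SHA-256 digest.
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {}
    changed = []
    candidates = sorted(
        p for p in source_dir.rglob("*") if p.is_file() and p.suffix in suffixes
    )
    for path in candidates:
        key = path.relative_to(source_dir).as_posix()
        new[key] = file_digest(path)
        if old.get(key) != new[key]:
            changed.append(path)
    manifest_path.write_text(json.dumps(new, indent=2, sort_keys=True))
    return changed
```

Only the files returned by `changed_sources` would then be passed to the sitemap generator. Sources that were deleted simply drop out of the manifest; removing their sitemaps would need separate handling.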
As per the meeting, proposed strategy: use a GitHub Action / pygeoapi container to generate sitemaps from a) a zipped CSV template PR'd directly to GitHub or
The user's decision tree is:
harvest.geoconnex.us ideally will automatically recrawl all new or modified resources added to the PID registry.
harvest.geoconnex.us uses sitemap.xml to crawl resources
Therefore, we need a way to process diffs between versions of sitemap.xml according to PID additions or other recrawls triggered by data contributors.
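A minimal sketch of such a diff, assuming two versions of the same urlset file are available as XML text (for example a previously published copy and the live one). The function names `parse_urlset` and `diff_urlsets` are hypothetical. The comparison keys on `<loc>` and treats a differing `<lastmod>` as a modification.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_urlset(xml_text: str) -> Dict[str, Optional[str]]:
    """Map each <loc> in a urlset to its <lastmod> (None if absent)."""
    root = ET.fromstring(xml_text)
    entries: Dict[str, Optional[str]] = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        if not loc:
            continue
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries[loc.strip()] = lastmod.strip() if lastmod else None
    return entries


def diff_urlsets(old_xml: str, new_xml: str) -> Dict[str, List[str]]:
    """Return URLs that were added, removed, or whose lastmod changed."""
    old = parse_urlset(old_xml)
    new = parse_urlset(new_xml)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "modified": sorted(u for u in set(old) & set(new) if old[u] != new[u]),
    }
```

A harvester could recrawl only the `added` and `modified` URLs. This only helps if `lastmod` values are trustworthy, since an entry whose `lastmod` is unchanged is treated as unmodified.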
Suggestion: add releases of zipped sitemap_XXX.xml files, so that harvest.geoconnex.us can download the last release and compare it with the contents of the sitemap_XXX.xml files pointed to by the live sitemap index https://geoconnex.us/sitemap.xml
Suggestion: change how sitemap.xml is generated so that lastmod reflects the true last file-change datetime of each csv file in /namespaces
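A rough sketch of how the second suggestion might look, under the assumption that "true last file change" means the date of the last git commit touching each source file (file modification times in a fresh checkout usually reflect checkout time instead). The helper names and the mapping from source files to sitemap URLs are hypothetical. `git log -1 --format=%cI` prints the committer date in strict ISO 8601.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def git_last_commit_date(path: Path, repo: Path) -> str:
    """ISO 8601 committer date of the last commit touching `path`.

    Returns an empty string if the file has no commits.
    """
    result = subprocess.run(
        ["git", "log", "-1", "--format=%cI", "--", str(path)],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def build_sitemap_index(entries: Dict[str, Optional[str]]) -> str:
    """Build a sitemap index from {sitemap URL: lastmod or None}."""
    ET.register_namespace("", SITEMAP_NS)  # serialize with a default xmlns
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in sorted(entries.items()):
        sitemap = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:  # omit <lastmod> when no date is known
            ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

When run inside a GitHub Action, the checkout would need full history (for `actions/checkout`, `fetch-depth: 0`); with a shallow clone every file would report the date of the single fetched commit.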