-
The easiest solution for you would be to disable the Google Scholar crawler during local debugging and only re-enable it when pushing to GitHub. Otherwise, with your approach the crawler would need to run at regular intervals, or be run manually, to fetch new content. It might be more useful to create a GitHub Action for that, along the lines of the sketch below.
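As a rough illustration of what such a scheduled or manually triggered run could execute, here is a minimal sketch of a refresh script using the `scholarly` package. The script name, the `GOOGLE_SCHOLAR_ID` environment variable, the output file name, and the JSON layout are all assumptions for this example, not part of any existing crawler.

```python
# google_scholar_refresh.py (hypothetical name)
# Fetch all citation counts in one pass and cache them to a JSON file,
# so the site build can read the cache instead of querying Google Scholar.
import json
import os
from datetime import datetime

from scholarly import scholarly

# Placeholder: the author's Google Scholar profile ID.
AUTHOR_ID = os.environ.get("GOOGLE_SCHOLAR_ID", "YOUR_AUTHOR_ID")


def fetch_citation_data(author_id: str) -> dict:
    """Fetch the author's publication list once and collect citation counts."""
    author = scholarly.search_author_id(author_id)
    scholarly.fill(author, sections=["basics", "publications"])
    data = {
        "updated": datetime.utcnow().isoformat(),
        "publications": {},
    }
    for pub in author["publications"]:
        # author_pub_id looks like "AUTHORID:PAPERID"; use it as the lookup key.
        data["publications"][pub["author_pub_id"]] = {
            "title": pub["bib"].get("title", ""),
            "num_citations": pub.get("num_citations", 0),
        }
    return data


if __name__ == "__main__":
    with open("gs_data.json", "w", encoding="utf-8") as f:
        json.dump(fetch_citation_data(AUTHOR_ID), f, indent=2, ensure_ascii=False)
```

A scheduled workflow could run this script, say, once a day and commit the resulting JSON to a dedicated branch, so local rebuilds never have to talk to Google Scholar at all.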
-
Hi, thanks to the author and the contributing community members for this amazing template!
While I am debugging the website locally, each file change triggers a rebuild, and every rebuild queries Google Scholar to update the citation counts. With many papers and heavy debugging, it is very easy to exceed the rate limit and hit HTTP 429 "Too Many Requests", which slows down the local build and, more importantly, seems to block GitHub Pages auto-deployment. It is unclear how long the 429 block persists, but it can last for days.
Before I switched to al-folio, I used academicpages for my personal website. On that site I implemented a Google Scholar crawler that caches the Scholar info in another branch as .json files. With this method the build is faster, since the crawler fetches all the GS info in one go rather than querying each paper individually, and it never triggers HTTP 429 "Too Many Requests". The crawler originally comes from the Jekyll theme AcadHomepage; I added a few improvements of my own and got it working on academicpages. Unfortunately, it does not work with al-folio: when I add it, the local debug build freezes partway through with no error message.
My modified version of this decoupled Google Scholar crawler is here. I wonder whether this small project could be useful for al-folio, since users may already have the Google Scholar paper ID in their bibtex. All the integration would need to do is look up that paper ID in the gs_data.json database on the crawler's sub-branch, instead of sending queries to Google too frequently; a rough sketch of that lookup is below. The crawler is also optimized for internet conditions in China and grabs its info from a mirrored source. I am no expert in Jekyll coding and cannot implement this alone, so I am opening this discussion to propose a potential solution and see if you are interested in avoiding the very annoying "too many requests" error.
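To make the lookup idea concrete, here is a minimal sketch in Python, assuming the cached file has the shape produced by the refresh sketch above. The file name, key layout, and example paper ID are assumptions; in the theme itself this lookup would more likely live in the Liquid/JavaScript layer than in Python.

```python
# Look up a cached citation count by the Google Scholar paper ID already
# present in the bibtex entry, instead of querying Google on every rebuild.
import json
from typing import Optional


def cached_citations(gs_data_path: str, author_pub_id: str) -> Optional[int]:
    """Return the cached citation count for a paper ID, or None if absent."""
    with open(gs_data_path, encoding="utf-8") as f:
        gs_data = json.load(f)
    entry = gs_data.get("publications", {}).get(author_pub_id)
    return entry["num_citations"] if entry else None


# Example usage with a hypothetical paper ID:
# print(cached_citations("gs_data.json", "YOUR_AUTHOR_ID:PAPER_ID"))
```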