up to date docs in context. #3

Open
scrungus opened this issue Jul 18, 2023 · 0 comments

The bot needs up-to-date documentation for the technologies we use. A good one to start with would be Ceph - https://docs.ceph.com/en/quincy/.

There should be some way of scraping all this documentation and inserting it into the data source. To start with, we can do it just for named releases (e.g. Quincy for Ceph; Yoga/Xena/Zed for OpenStack). It can be a manual job, e.g. run a script that scrapes the docs and inserts them into the data source. For Ceph, it looks like we'd want to scrape everything under https://docs.ceph.com/en/latest/dev/*. There is probably a prebuilt web scraper we can adapt, i.e. we shouldn't write one from scratch.
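Whichever prebuilt scraper gets adapted, it will need a rule for staying inside one docs directory. A minimal sketch of that scoping rule, using only the Python standard library: `ROOT`, `LinkCollector` and `in_scope_links` are hypothetical names for illustration, and fetching pages and writing to the data source are left to the adapted tool.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Assumption: the directory named above; pass a release root instead if needed.
ROOT = "https://docs.ceph.com/en/latest/dev/"


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def in_scope_links(page_url, html, root=ROOT):
    """Return sorted, de-duplicated absolute links that sit under root."""
    collector = LinkCollector()
    collector.feed(html)
    found = set()
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" so each page appears once.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(root):
            found.add(absolute)
    return sorted(found)
```

Calling `in_scope_links(page_url, html)` on each fetched page gives the next URLs to visit; links to other releases or other sites are dropped by the prefix check.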

We will need to experiment with chunk size and related settings to see how the model responds.
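To make that experiment concrete, here is a minimal sketch of a word-based chunker with a configurable size and overlap. The function name `chunk_words` and the default values are assumptions for illustration, not settings the bot is known to use; a token-based splitter may suit the model better.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of the two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Running the same scraped pages through a few `chunk_size` and `overlap` values, then asking the bot the same questions each time, is one way to compare settings.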
