Spike: As tech lead, I need elastic re-indexing to be automated #2870
Comments
@jtimpe below are the metrics after releasing sprint 93 to staging yesterday:
Update: there are 6.6 million records in prod as of today.
@jtimpe do you happen to know if the re-index command impacts the I'm noticing in staging (which is on sprint 93 release), that the latest submission isn't flagged as "newest" in DAC. I tried submitting the same file in (
Notes from testing in
Potential spike - still working through it.
Relabeled to spike.
data lifecycle
other options
Moving into Raft Review this week but staying in blocked pending testing in staging.
PR for this is in QASP, will be able to test soon.
Description:
As mentioned in #2820, `python manage.py search_index --rebuild` is needed to facilitate elastic re-indexing. This is currently a manual step that was needed to yield expected results in QASP review for parsing/validation tickets like #2825. It will also be a manual step before releasing code to `hhs:main` and `hhs:master`. This is a step that would be better to automate.
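As a rough illustration of what automating this step could look like, the sketch below wraps the rebuild command in a small Python helper that a deployment script might call. The function names `build_rebuild_command` and `rebuild_search_index` are hypothetical, the helper assumes `manage.py` is in the working directory, and the `-f` flag (to skip the interactive confirmation, as I understand the `search_index` command) is an assumption that should be checked against the installed `django-elasticsearch-dsl` version.

```python
import subprocess
import sys
from typing import Callable, List


def build_rebuild_command(
    parallel: bool = False,
    use_alias: bool = False,
    python: str = sys.executable,
) -> List[str]:
    """Build the argv list for a non-interactive search index rebuild."""
    # "-f" is assumed to suppress the confirmation prompt so the command
    # can run unattended; verify against the installed library version.
    cmd = [python, "manage.py", "search_index", "--rebuild", "-f"]
    if parallel:
        cmd.append("--parallel")
    if use_alias:
        cmd.append("--use-alias")
    return cmd


def rebuild_search_index(run: Callable = subprocess.run, **options) -> int:
    """Run the rebuild and return the process exit code.

    `run` is injectable so the helper can be exercised without Django.
    """
    result = run(build_rebuild_command(**options), check=False)
    return result.returncode
```

A deployment step could call `rebuild_search_index(parallel=True)` after migrations and fail the release if the return value is non-zero. Whether this runs over `ssh` or as a `cf run-task` is left open here.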
Acceptance Criteria:
Create a list of functional outcomes that must be achieved to complete this issue
Tasks:
Create a list of granular, specific work items that must be completed to deliver the desired outcomes of this issue
- Run `python manage.py search_index --rebuild` with the `--parallel` flag over the weekend to gauge whether the `--parallel` flag makes a difference
- `BulkIndexError` exception handling - should not roll back un-indexed records, but instead write them to postgres and ignore the indexing step

Notes:
Possible approaches
- `apply-remote-migrations` - rebuild search indexes after backend deployment (via `ssh` or `cf run-task`)
- `BulkIndexError` exception handling, to not roll back un-indexed records
- `bulk_create` (after parsing completion for each file)
- `django-elasticsearch-dsl` seems to use the `_bulk` endpoint when making requests; we may need to further investigate how the library works and/or introduce some customization to tune it to our needs
- […] `django-elasticsearch-dsl`), then try to bulk index any un-indexed data
- `es-dev` in dev/staging and `es-medium` in prod
- `--parallel` option in `python manage.py search_index --rebuild` - https://django-elasticsearch-dsl.readthedocs.io/en/latest/management.html
- `--use-alias` option in `python manage.py search_index --rebuild`
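The approach of writing records to postgres first and treating indexing as a best-effort follow-up could be sketched roughly as below. This is not the project's implementation: `save_then_index`, `save_to_db`, `index_in_search`, and the `BulkIndexFailure` exception are all hypothetical stand-ins (in real code the caught exception would presumably be the search client's bulk indexing error), and the persistence and indexing steps are passed in as plain callables so the idea can be shown without Django or Elasticsearch.

```python
from typing import Callable, List, Sequence, Tuple, Type


class BulkIndexFailure(Exception):
    """Hypothetical stand-in for the search client's bulk indexing error."""


def save_then_index(
    records: Sequence[dict],
    save_to_db: Callable[[Sequence[dict]], None],
    index_in_search: Callable[[Sequence[dict]], None],
    index_errors: Tuple[Type[BaseException], ...] = (BulkIndexFailure,),
) -> List[dict]:
    """Persist records, then try to index them.

    Returns the records that were saved but left un-indexed, so a later
    job can retry indexing them.
    """
    # Persist first, outside the indexing step, so an indexing failure
    # cannot roll back data that was already written.
    save_to_db(records)
    try:
        index_in_search(records)
    except index_errors:
        # Indexing failed: keep the saved rows and hand them back for retry.
        return list(records)
    return []
```

The returned list gives a later re-index job something concrete to retry. For simplicity the sketch treats the whole batch as un-indexed on failure, even though a bulk request may partially succeed.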
Supporting Documentation:
Please include any relevant log snippets/files/screen shots
Open Questions:
Please include any questions or decisions that must be made before beginning work or to confidently call this issue complete