
Optimize the Database Search API #2668

Closed
Tracked by #3358
ndegwamartin opened this issue Sep 4, 2024 · 6 comments · Fixed by #2669 or #2715
@ndegwamartin (Collaborator):
Is your feature request related to a problem? Please describe.
I did some profiling on the `DatabaseImpl.search` method, which contains the following code:

  override suspend fun <R : Resource> search(
    query: SearchQuery,
  ): List<ResourceWithUUID<R>> {
    return db.withTransaction {
      resourceDao.getResources(SimpleSQLiteQuery(query.query, query.args.toTypedArray())).map {
        ResourceWithUUID(it.uuid, iParser.parseResource(it.serializedResource) as R)
      }
    }
  }

The mapping block takes a lot of time because each serialized resource has to be deserialized sequentially for every element in the list returned from the database. The time taken is linear: the more rows the database returns, the longer the API takes to produce a result.

Describe the solution you'd like
It should be possible to optimize this block with a parallelized implementation, e.g. using coroutines and async within each iteration and then collecting the results, for a significant improvement.
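As a rough sketch of that idea (not the actual PR code), a generic parallel map built from `async`/`awaitAll` could replace the sequential `.map`. The types `SerializedResourceEntity`, `ResourceWithUUID`, and the helper name `pmap` are illustrative stand-ins; the real code uses Room entities and HAPI's `IParser`:

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking

// Illustrative stand-ins for the real Room entity and result wrapper.
data class SerializedResourceEntity(val uuid: String, val serializedResource: String)
data class ResourceWithUUID<R>(val uuid: String, val resource: R)

// Generic parallel map: one async per element, awaited in order so the
// result list keeps the same ordering as the database rows.
suspend fun <T, R> List<T>.pmap(transform: suspend (T) -> R): List<R> =
    coroutineScope { map { element -> async { transform(element) } }.awaitAll() }

fun main() = runBlocking {
    val rows = listOf(
        SerializedResourceEntity("uuid-1", """{"resourceType":"Patient"}"""),
        SerializedResourceEntity("uuid-2", """{"resourceType":"Patient"}"""),
    )
    // The real transform would call iParser.parseResource(it.serializedResource);
    // here we carry the raw string through to keep the sketch self-contained.
    val mapped = rows.pmap { ResourceWithUUID(it.uuid, it.serializedResource) }
    println(mapped.map { it.uuid })
}
```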

Describe alternatives you've considered
Instead of letting the async launch with the current dispatcher (probably IO at that point), we could switch to the Default dispatcher, since deserialization is a computationally expensive task. This would, however, require us to create a new JsonParser object per iteration, since it is not thread safe.
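A sketch of that alternative, with a throwaway parser created inside each `async` on `Dispatchers.Default`. `ThrowawayParser` is a placeholder for the real, non-thread-safe JsonParser:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope

// Placeholder for a non-thread-safe parser such as HAPI's JsonParser:
// each instance is confined to a single coroutine, so no sharing occurs.
class ThrowawayParser {
    fun parse(json: String): String = json.trim()
}

suspend fun parseAllOnDefault(rows: List<String>): List<String> = coroutineScope {
    rows.map { json ->
        // Dispatchers.Default targets the CPU-bound parsing work; a fresh
        // parser per element sidesteps the thread-safety constraint.
        async(Dispatchers.Default) { ThrowawayParser().parse(json) }
    }.awaitAll()
}
```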


Would you like to work on the issue?
Yeah

@ndegwamartin (Collaborator, Author):

The DatabaseImpl.searchForwardReferencedResources and DatabaseImpl.searchReverseReferencedResources functions can also benefit from the same optimization and should be included.

@FikriMilano (Collaborator) commented Sep 6, 2024:

Very excited to have this merged in the future :D
I'll review once a PR is ready.

@joiskash (Collaborator) commented Sep 17, 2024:

This seems like an interesting approach. What are the memory implications of creating a new parser per iteration?
Regarding the dispatcher, I feel this should still be launched on the IO dispatcher, since the core functionality is an IO database read operation. There are also some interesting implications of using this with runBlocking,
as mentioned in this blog: https://jivimberg.io/blog/2018/05/04/parallel-map-in-kotlin/

> By default runBlocking uses a dispatcher that is confined to the invoker thread. Which means we are forcefully blocking the thread until the execution of pmap finishes, instead of letting the caller decide how the execution should go.

I have a few questions about the use case:

  1. How many resources are we talking about here? 100s, 1000s or more?
  2. Do these resources only contain text or does this also contain binary data like base64 encoded images?
  3. Is there any scope for chunking instead of fetching all resources?

@ndegwamartin (Collaborator, Author):

@joiskash I've added the results from the benchmarking to the PR - see link here

The resources do not contain binary data.

We can investigate the chunking approach; however, since the objective is to reduce the performance hit of mapping from serialized JSON to the corresponding FHIR Resource object, we might not get an improvement that way.

@ndegwamartin (Collaborator, Author):

I suppose if the idea is to use chunking to change the overall approach in terms of improving the UX, then yeah, that works. We have implemented batching on our record fetches (10 records) by using pagination for registers and infinite scrolling, e.g. for searches.
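For reference, a chunked variant is easy to express with the stdlib `chunked`; the batch size of 10 mirrors the pagination mentioned above, and `parse` is again a placeholder for the real deserialization step:

```kotlin
// Process rows in fixed-size batches (10, matching the register page size),
// so each batch's results can be surfaced before the next batch begins.
fun <T, R> processInBatches(rows: List<T>, batchSize: Int = 10, parse: (T) -> R): List<R> =
    rows.chunked(batchSize).flatMap { batch -> batch.map(parse) }
```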

@ndegwamartin (Collaborator, Author):

Looks like creating a new JsonParser for each iteration should not be a concern after all (provided the parent FhirContext is already created), as mentioned in the comment here:

> Performance Note: This method is cheap to call, and may be called once for every message being processed without incurring any performance penalty

I will go ahead and update the PR with this variant of the optimization.
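Putting the pieces together, the final variant might look roughly like this. `ParserFactory` is a stand-in for FhirContext (expensive to construct once, but cheap to hand out parsers from, per the performance note quoted above); the real HAPI types are not reproduced here:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope

// Stand-in for FhirContext: built once, with a cheap per-call parser factory.
object ParserFactory {
    fun newJsonParser(): (String) -> String = { json -> json.trim() }
}

// One fresh parser per element, parsed in parallel on Dispatchers.Default,
// results collected in the original row order.
suspend fun searchParallel(rows: List<Pair<String, String>>): List<Pair<String, String>> =
    coroutineScope {
        rows.map { (uuid, serialized) ->
            async(Dispatchers.Default) {
                val parser = ParserFactory.newJsonParser()  // cheap factory call
                uuid to parser(serialized)
            }
        }.awaitAll()
    }
```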
