
alignments track performance optimization #969

Closed
rbuels opened this issue May 29, 2020 · 26 comments
Labels
enhancement New feature or request

Comments

@rbuels
Contributor

rbuels commented May 29, 2020

Do a round of performance optimization on the Alignments track. Run some profiling, try to figure out where time is being spent, and try to improve the time it takes to render large alignments.

Main categories of stuff that takes time:

  • downloading the data
  • parsing the data using @gmod/bam-js or @gmod/cram-js
  • rendering in the worker onto an OffscreenCanvas
  • shipping the features, layout, and rendering to the main thread
  • deserializing the features, layout, and rendering on the main thread (mostly native code that we don't see)
  • hydrating the rendering on the main thread (ServerSideRenderedContent.js)
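The stage breakdown above can be instrumented with a coarse timing harness. This is a generic sketch (the stage names and wrapper are hypothetical, not JBrowse's actual API) for attributing wall-clock time to each stage of the pipeline:

```javascript
// Minimal stage-timing harness (sketch; stage names are hypothetical,
// not JBrowse internals). Wraps each async pipeline stage and
// accumulates wall-clock time per stage so totals can be compared.
function makeTimer() {
  const timings = {}
  return {
    async time(stage, fn) {
      const start = performance.now()
      try {
        return await fn()
      } finally {
        timings[stage] = (timings[stage] || 0) + performance.now() - start
      }
    },
    // entries sorted by time spent, largest first
    report() {
      return Object.entries(timings).sort((a, b) => b[1] - a[1])
    },
  }
}
```

Wrapping each of the download/parse/render/serialize steps like `timer.time('parse', () => parseBam(...))` and printing `timer.report()` gives a quick first-order picture before reaching for a full profiler.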

Might be worth it to run some profiling on igv.js to compare/contrast where it is spending its time vs. where JB2 is spending time.

Might also compare BAM vs CRAM, to see how they differ.

Main deliverable for this is to know where time is being spent and develop a prioritized list of optimizations that we should do.

@rbuels rbuels added the enhancement New feature or request label May 29, 2020
@rbuels
Contributor Author

rbuels commented May 29, 2020

Possible avenues of performance improvement:

@cmdcolin
Collaborator

cmdcolin commented Jun 1, 2020

At least one candidate is optimizing the mismatch calculations. That doesn't rule out other approaches, but mismatch calculations do account for a large share of the profiling time

Screenshot from 2020-05-31 23-06-06

Note that this is from viewing the left-heavy view of a profile.json from a Chrome performance trace, loaded into https://speedscope.app/
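For context, "mismatch calculation" typically means scanning each read's MD tag (and CIGAR) to locate per-base differences from the reference. A simplified sketch of an MD-tag pass (not the actual @gmod/bam-js code, which also merges CIGAR insertions/deletions) shows why it is hot: it runs a regex and allocates small objects for every read:

```javascript
// Sketch of an MD-tag mismatch scan (simplified; hypothetical record
// shape). Parses an MD string like "10A5^AC6" into reference-relative
// mismatch records. MD grammar: match-run lengths, single mismatched
// reference bases, and ^-prefixed deleted reference sequence.
function mdToMismatches(md) {
  const mismatches = []
  let pos = 0
  const re = /(\d+)|\^([A-Za-z]+)|([A-Za-z])/g
  let m
  while ((m = re.exec(md))) {
    if (m[1] !== undefined) {
      pos += parseInt(m[1], 10) // matched run: just advance
    } else if (m[2] !== undefined) {
      mismatches.push({ start: pos, type: 'deletion', length: m[2].length })
      pos += m[2].length
    } else {
      mismatches.push({ start: pos, type: 'mismatch', refBase: m[3], length: 1 })
      pos += 1
    }
  }
  return mismatches
}
```

Per-read regex execution plus one small object per mismatch is exactly the allocation pattern that shows up as both CPU time and GC pressure in the traces discussed below.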

@cmdcolin
Collaborator

cmdcolin commented Jun 1, 2020

Note that you can also view different threads by clicking at the top (see how it is looking at only the webworker thread where it says DedicatedWorker (3/6))

@cmdcolin
Collaborator

cmdcolin commented Jun 30, 2020

For comparing JBrowse vs igv.js these links can be used; they put the same file head to head

IGV link https://igv.org/app/?sessionURL=blob:rZNfa9swFMW_StHTBo5sxynBflxh68Oesow9jBBk.9rWqj_ulZw_DfnuvXLT0UDpui2BBKIrHen.zrkHhtAAgqmAFQcma1awrk1zFjEjNK2x20ELc_XhZnHTZfM41D5SsRHOi..Lr2G7970r4thlXGjxYI3YOl5ZHct2w0u0opbGeekHD9xiG7dgrAYXO7gf5cYfPgqSsDQ17C4uTL.SxKu9t6Uw9YX0g9wnkuN.59kxYspWg2PFT1Z1mBbpNEry2STNomyesVXEPIrqLtQPzO_7wJaUhhF9xCzWgKyY5EkyT_N8ej2bz5I8T8kH.62XxoSqxwGO0eHZGgRRu7XCtbsrMaMe16bVCicJn_JsrUXfQ81dr6T3gI6XQgfnpIL_Om9RC0.nn_4OqN5C.Ys4bt05v5HdeGX8Ty94T0Qufi99Q4KcHbCC5ZN9ASWtncwUSrZGg_Fhds49i1gHsu2IWpYkFEO7ARQtLEMgbk.VayoIpUDBZ4T7ZYfgOqtoHulNv.ORvXB_AQ3l5.oLGHAvfaGBpo7Dw_7gTYg58eFnfOh0UOTOoicGlGzePvzFXL5PkPsy0NxIJ0tJlPc_SN9uKf8UeARtN6IktkUjlCN6p.7TZPy8gvcs1K838JZzxlgvvLSG1rTYLSg6wRC6iTZSAsKp58G22jrqjnaOlFfH1fER

JBrowse 2 link http://localhost:3000/?config=test_data%2Fconfig.json&session=share-SMJRxmTDfB&password=K6ZqY

These links are good performance tests because the BAI can fit into the browser cache; once the BAI file is cached, you can test the raw page speed just by reloading each link

In JBrowse, the raw time from page load to displaying the results is about 10-12 seconds; with IGV it's about 3-4 seconds

If there is interest, more performance investigation could be done, but these links help put the pages head to head
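For the reload comparison above, a single timing per page is noisy; a small helper (generic sketch, not part of JBrowse, assuming you record one load-to-render number per reload) can summarize repeated trials so one outlier reload doesn't dominate:

```javascript
// Summarize repeated page-load timings (milliseconds) from a manual
// reload test: median is more robust to a single slow outlier than mean.
function summarize(samplesMs) {
  const sorted = [...samplesMs].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  const median =
    sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
  const mean = sorted.reduce((a, b) => a + b, 0) / sorted.length
  return { median, mean, min: sorted[0], max: sorted[sorted.length - 1] }
}
```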

@cmdcolin
Collaborator

Loading long nanopore reads in BAM format can really demonstrate slowdowns. There is large memory pressure, and the GC takes up a significant portion of stack traces

Loading this took 120 seconds for me http://localhost:3000/?config=test_data%2Fconfig_demo.json&session=eJztVltv2jAU_ivIT5vEJYRLS95opxa0rkOFtdKmCZnkkHhL7Mx2uCr_fcdJBmFAtWnVnvpQ1Jzb9_n4fLa3hNMIiEPuYVkZg1JM8Ipt2VbN6tXs3sS2nVbX6bTqnYvmZ1IlEZU-48SxqsSTdAnyiXk6IE7rsl0lCwZLRZwvW8I8U7LVD-WjzYaY5zEVh3R9n4MFfrOHRr2Ozdcd40DlLXARwSOWQI-YzxXo0Yo4Xbvb7LbsyyqZxSOQxtTclQPvAXxknINKmBf1m1hCaSp1RhQ4srHbPbtjdW1MlrAAqQCNcxoqqBKqFESz8IBd-hX5Sep-Ly2Iw7s2Y1a02XPvh8znEXCtJiYYHQEwP0BgRKsSV_A58xNJNbI0FSgXsZAwTUIsHgruY8aIhZDEeb5TQDWH79tJLOGhtccqx-1xmtYxzvZ0SraeoXeSxzTOQqertVldMRSDW8uyK_dFcGUXXHlz1f_wtpKnkMMGmn7lLcQOulSDL-Ta2G6H_Stjox6NNcgSyysa9QsjbjON7oS7W0gimdkSrWPlNBqqVacR3QhOl6ruiqjxbSbFUkFdSL_hZxOkGga8kXGtGbI1wfU0UK0LrzONA4obX0cMklYJ4x6sDEj4HwDxj5EUURX8SIC70D_ug79h8Q3Fyd23Y24-zzfkLJ0sL_u3Pqd1f2PWO6fsBSrhLzPVkOxLVMMy2JbUyBK3Q6IyTbV8cB8K09FE7xyYN174N0B1gmI5Dj_hTDO08f3oWuA5QH04VJ60xOCTy31rL7yj2L362s-I70TaswpUPHaL-L-SYSnv37VYIr0fQpXMXlX7qtpj1f4u29L0nBLjCW-aNzcEV5vb3FgZ3ooOMWRVIJa_MoijZQK5LT8Ecou5qAPmwQCoZ9CKO31v-oj55mWyc2UiHGeQQk5yagFDEOkGzKUhyUEyzd7RGYSqjH2Nlz1I82YpChoCS-b5oLMWlCtNyki7E-Z8xO7EGZwLecqAiDl0wpDG2Stma4aEhchqAittWpc_xQ6eYKbN1NVsAU9_xPUZX5odeRw_cIiGHCcDdWHqpelPAnWShA

Probably partly network speed, but largely program time

Two sources, (program) and (garbage collector), take up 33% and 27% of the time respectively (35 seconds and 30 seconds), together accounting for well over half of it

I don't know how much of that can be trimmed off but it pops out strongly in our profiling

Example trace attached, see https://www.speedscope.app/ for info
trace.zip

[Screenshots: speedscope.app views of the trace]

The speedscope app screenshots show all the calls that "arrive at" (program) and (garbage collector)
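To illustrate the GC-pressure point, here is a generic sketch (not JBrowse code) of the usual remedy: replacing one small object per read with parallel typed arrays, which turns millions of short-lived heap allocations into a handful of flat buffers the collector never has to sweep individually:

```javascript
// Pack per-read fields into parallel typed arrays (struct-of-arrays).
// Field names are hypothetical; the point is the allocation pattern,
// not a specific feature schema.
function packReads(reads) {
  const n = reads.length
  const starts = new Int32Array(n)
  const ends = new Int32Array(n)
  const flags = new Uint16Array(n)
  for (let i = 0; i < n; i++) {
    starts[i] = reads[i].start
    ends[i] = reads[i].end
    flags[i] = reads[i].flags
  }
  return { starts, ends, flags, length: n }
}
```

The trade-off is ergonomics: downstream code must index into arrays rather than destructure objects, so this tends to pay off only on the hottest feature-heavy paths.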

@cmdcolin
Collaborator

Note that it goes a bit faster without performance tracing on but still about 60 seconds

@cmdcolin
Collaborator

cmdcolin commented Nov 5, 2020

I think performance improvements on BAM would go a long way. CRAM is already a fair bit faster, but ideally both BAM and CRAM would be improved. Many things end up "rerendering" (side scrolls, height changes on SNP coverage, newly calculated axes, full rerenders, etc.), so it pays to make rerendering as cheap as possible. Reducing the number of rerenders is half the battle, but optimizing the render itself is really important too

I found this awhile ago and it intrigued me

https://github.com/ocxtal/udon

@ihh would be curious what you think because you also considered compressed CIGAR type strings
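As a rough sketch of the compressed-CIGAR idea behind udon (heavily simplified; udon itself is a Rust library with a much richer format), per-base operation states can be run-length encoded into one flat typed array instead of kept as per-base objects:

```javascript
// Run-length encode a per-base op string (M=match, X=mismatch,
// I=insertion, D=deletion) into a flat Uint32Array of
// [opCode, runLength, opCode, runLength, ...] pairs.
// Op alphabet and layout are illustrative, not udon's actual encoding.
const OPS = { M: 0, X: 1, I: 2, D: 3 }

function rleEncode(ops) {
  const out = []
  for (const ch of ops) {
    const code = OPS[ch]
    if (out.length && out[out.length - 2] === code) {
      out[out.length - 1]++ // extend the current run
    } else {
      out.push(code, 1) // start a new run
    }
  }
  return Uint32Array.from(out)
}
```

Since alignments are overwhelmingly long runs of matches, the encoded form is tiny relative to per-base records, and it stays in one contiguous buffer that is cheap to ship between threads.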

@cmdcolin
Collaborator

cmdcolin commented Nov 5, 2020

This is a typical trace where someone has to wait 30-40 seconds for a neighboring block to render even when the data is already downloaded (note that this is SNP coverage plus pileup)

www speedscope app_

@rbuels
Contributor Author

rbuels commented Nov 24, 2020

Is there any way we could split this issue into specific things to be done to the code? Do we know enough now for specific recommendations?

@cmdcolin
Collaborator

It is challenging to deliver actionable recommendations at this stage. I think more compressed in-memory representations would be valuable, because there is large "GC pressure" visible in the profiles, but I don't know exactly what will deliver on that yet. The udon project linked above is a cool data structure, but there may be other approaches that would work too

@cmdcolin
Collaborator

I think it would be worth making a measured benchmark including BAM and CRAM of jbrowse 2 vs jbrowse 1

There was a concern on the mailing list about jbrowse 2 being slower, and we should know the actual numbers regarding this

@cmdcolin
Collaborator

My hypothesis is that memory pressure from complete serialization of features could be a factor, which suggests that a shared array buffer or RPC-based feature details might be beneficial
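A minimal sketch of the shared-array-buffer idea (hypothetical layout; real browser deployments also need cross-origin isolation headers before `SharedArrayBuffer` is available): the worker writes feature coordinates into shared memory, and the main thread reads them without paying any structured-clone serialization cost.

```javascript
// Allocate a SharedArrayBuffer holding [start, end] int32 pairs for up
// to maxFeatures features. Layout is illustrative, not JBrowse's.
function makeSharedCoords(maxFeatures) {
  const sab = new SharedArrayBuffer(maxFeatures * 2 * 4)
  return { sab, view: new Int32Array(sab) }
}

// Worker side: write feature i's coordinates directly into shared memory.
function writeFeature(view, i, start, end) {
  view[2 * i] = start
  view[2 * i + 1] = end
}
```

The main thread would construct its own `Int32Array` over the same `sab` (passed once via `postMessage`, which shares rather than copies a SharedArrayBuffer) and read the coordinates in place.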

@cmdcolin
Collaborator

cmdcolin commented Feb 2, 2021

@cmdcolin
Collaborator

cmdcolin commented Feb 5, 2021

I added some performance benchmarking for the embedded mode; it actually performs a bit faster than the webworker version in some cases, but it could be worth digging into. A reproducible create-react-app setup with the embedded component is here: https://github.com/cmdcolin/jb2_lgv_benchmarking_demo

@cmdcolin
Collaborator

One somewhat unexpected behavior is a sort of "recalculating" state that the tracks go through. The code will often recalculate, for example, the score of a SNPCoverageAdapter, and if it finds the score has changed, it updates the stats and fires off multiple new block updates, which go through the full getFeatures process on the data adapter. This feels slow compared to, say, having the features cached and quickly updating the rendering
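The caching idea at the end could look roughly like this (hypothetical API, not the actual SNPCoverageAdapter); the promise itself is cached, so concurrent block requests for the same region share a single fetch instead of each going through the full getFeatures path:

```javascript
// Wrap a getFeatures-style fetcher with a per-region cache. When stats
// change and blocks are re-requested, the region's features come from
// the cache instead of a fresh data-adapter fetch.
function makeFeatureCache(fetchFeatures) {
  const cache = new Map()
  return function getFeatures(region) {
    const key = `${region.refName}:${region.start}-${region.end}`
    if (!cache.has(key)) {
      cache.set(key, fetchFeatures(region)) // cache the promise, not the value
    }
    return cache.get(key)
  }
}
```

A real version would also need an eviction policy (e.g. LRU bounded by feature count) so deep regions don't pin unbounded memory, and invalidation when the underlying file changes.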

@cmdcolin
Collaborator

Another area where performance differences are clear is operations like sorting and coloring. These are nearly instantaneous in IGV, but in jbrowse 2, with a large alignments track, you can expect to wait 30s or so

Would recommend trying IGV to see

@cmdcolin
Collaborator

At least one slow factor is the cost of calling readConfObject. It is called multiple times for each read, which may explain why short-read datasets can be slower than long-read datasets: more reads means more calls
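A generic illustration of that kind of fix (function names are hypothetical): when a config value does not actually depend on the individual read, hoisting the lookup out of the per-read loop turns millions of config evaluations into one.

```javascript
// BEFORE (slow): readConf('color', read) called inside the loop, once
// per read. AFTER: the lookup is hoisted, valid only because this
// particular value doesn't depend on the read.
function renderReads(reads, readConf) {
  const color = readConf('color') // hoisted: evaluated once per render
  const out = []
  for (const read of reads) {
    out.push({ id: read.id, color })
  }
  return out
}
```

Config values that are callbacks over the feature can't be hoisted this way, so a real optimization would need to distinguish static from feature-dependent config slots.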

@daz10000

Not sure I have a lot to add, but we just implemented jbrowse2 showing nanopore data on yeast genomes (using IGV a lot in parallel), and the performance of the alignment view was surprisingly slow. When panned, it can completely break the browser, and even gene- or several-gene-scale viewing of 30x data can result in 5-10 second delays in Chrome when panning around or switching to a new location. The data fetches are sub-half-second, but the remaining time is very CPU-heavy rendering (typically 200% CPU usage, ~0.65 GB RAM).

I did some profiling, but I'm not super familiar with JS profiling and didn't see any names of jbrowse functions (lots of anonymous funcs). If anyone has guidance on best profiling practices, I would be happy to build or share a profiling or BAM dataset.

This is probably a controversial observation, but I wonder if the datasets are just slightly beyond the capacity of real-time processing in a browser; even IGV is a bit sluggish with a lot of RAM. We switched to wig files for coverage plots so users can leave the alignments off and move around, but they really need to see the alignment plots. I'm guessing profiling and performance optimization is the path forward, but I wonder if other tooling like WebAssembly (I'll show myself out now), or techniques like deep zoom and prerendering a bunch of stuff and storing it, are the path forward for alignment viewing.

Let me know how I can be helpful - we are mainly F# devs, transpiling with the awesome Fable tool, so less facile with raw JS, although @alfonsogarciacaro might have some thoughts there.

@cmdcolin
Collaborator

@daz10000 definitely good to know. This issue is really important, and we want to make sure the alignments tracks are speedy. Things like WebAssembly are definitely not out of the question. Prerendering could be an option for some specialized track type, but we would want to avoid that for most cases, since sometimes you don't have control and the files are so big; better to avoid another data conversion step if possible. If you are interested in sharing the dataset, we'd be happy to look at it

Also do you know if you are using the "@jbrowse/react-linear-genome-view" or the full jbrowse-web app with webworkers by chance?

@ihh
Member

ihh commented Sep 14, 2021

I think it's @jbrowse/react-linear-genome-view

@alfonsogarciacaro

Yes, the app @daz10000 mentions uses @jbrowse/react-linear-genome-view. It's a slight variation of this demo. We use the React component because we need some customization, and it's my understanding that this is not possible with the full jbrowse-web app. Ideally we wanted to integrate JBrowse into our bigger app, but we had some trouble with the build, so right now it is a separate frontend app with custom selectors for genomes and genes.

I wasn't aware the React component didn't use webworkers. It would be nice to enable them, although I assume that unless there's a good way to parallelize the work, the main benefit of webworkers would be not blocking the UI; it would still take time to render the regions when scrolling.

@cmdcolin
Collaborator

#2523 adds some modest improvements, depending on your data; pending the next release

  • deep short-read sequencing, e.g. 1000x coverage, could get as much as a 10x speedup
  • other datasets may generally get a more modest speedup, e.g. 20%
  • these stats were calculated from our main app with webworkers, though the proportional improvements likely apply to embedded mode also

@cmdcolin
Collaborator

There have been modest speedups added here and there and we will continue to work on it

One challenge is high-coverage sequencing, say on the mitochondrial genome

This file, https://s3.amazonaws.com/jbrowse.org/genomes/hg19/HG002.hs37d5.2x250.bam, has 560MB of data on the chrMT chromosome alone

Trying to visit this track in our config_demo crashes the browser. It has about 55,000-60,000x coverage (calculated by mosdepth)
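A back-of-envelope calculation suggests why this crashes (assumed numbers: chrMT is 16,569 bp, reads are 250 bp per the filename, and ~200 bytes per in-memory feature object is a rough guess for a small JS object with a few fields):

```javascript
// Rough estimate of read count and feature-object memory for the
// chrMT case. All inputs are approximations, not measurements.
const chrMTLength = 16569 // bp, human mitochondrial genome
const readLength = 250 // bp, per the 2x250 filename
const coverage = 55000 // x, lower end of the mosdepth estimate
const bytesPerFeature = 200 // rough guess for a small JS feature object

const numReads = Math.round((chrMTLength * coverage) / readLength)
const memGB = (numReads * bytesPerFeature) / 1e9
// numReads ≈ 3.6 million reads; memGB ≈ 0.7 GB in feature objects alone
```

That ~0.7 GB is before layout, mismatches, and canvas buffers, so the real footprint is a multiple of it, which is well past what a browser tab tolerates.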

@cmdcolin
Collaborator

This track is in our config_demo.json for reference as "HG002 Illumina hs37d5.2x250"

@cmdcolin
Collaborator

For reference, the above MT genome also crashes igv.js. It is simply gigantic to unzip 560MB of BAM, and we don't have methods to lazily parse that

@cmdcolin
Collaborator

This issue could maybe be considered closed for now. I am happy with the performance improvements we have made to alignments tracks; in particular, the removal of serialization helped jbrowse-web/jbrowse-desktop.

For better embedded performance, we may need to get support for workers in embedded mode (#2942)
