SkeletonMergeTask occupies too much memory #126

Status: Open
JackieZhai opened this issue Feb 24, 2022 · 10 comments
Labels: performance (Lower memory or faster computation.), question

Comments

@JackieZhai (Contributor) commented Feb 24, 2022

Hi, @william-silversmith
I have tried skeletonizing a 100^3 um^3 EM stack with Igneous.
Unfortunately, memory usage exceeds 350 GB during the second pass, SkeletonMergeTask.
That is not feasible on our server ...

Our skeletonization code:

from taskqueue import LocalTaskQueue
import igneous.task_creation as tc

tq = LocalTaskQueue(parallel=16)
tasks = tc.create_skeletonizing_tasks(
    cloudpath,
    mip=2, shape=(512, 512, 512), sharded=True,
    teasar_params={
        'scale': 4,
        'const': 500,
        'pdrf_exponent': 4,
        'pdrf_scale': 100000,
        'soma_detection_threshold': 1100,
        'soma_acceptance_threshold': 3500,
        'soma_invalidation_scale': 1.0,
        'soma_invalidation_const': 300,
        'max_paths': None,
    },
    dust_threshold=1000,
)
tq.insert(tasks)
tq.execute()

tq = LocalTaskQueue(parallel=1)
tasks = tc.create_sharded_skeleton_merge_tasks(
    cloudpath,
    dust_threshold=1000,
    tick_threshold=3500,
)
tq.insert(tasks)
tq.execute()

Our stack configuration (skeletonizing at mip=2, resolution [40, 40, 40]):

info = CloudVolume.create_new_info(
    num_channels=1,
    layer_type='segmentation',
    data_type='uint64',
    encoding='compressed_segmentation',
    mesh='mesh_mip_2',
    skeletons='skeletons_mip_2',
    resolution=[10, 10, 40],
    voxel_offset=[0, 0, 0],
    chunk_size=[512, 512, 50],
    volume_size=[10000, 10000, 2500],
)
tasks = tc.create_downsampling_tasks(
    cloudpath,
    mip=0,  # start downsampling from this mip level (writes to the next level up)
    axis='z',
    num_mips=2,
    compress='gzip',  # None, 'gzip', and 'br' (brotli) are options
    factor=(2, 2, 1),  # common options are (2,2,1) and (2,2,2)
)
tasks = tc.create_downsampling_tasks(
    cloudpath,
    mip=2,  # start downsampling from this mip level (writes to the next level up)
    axis='z',
    num_mips=3,
    compress='gzip',
    factor=(2, 2, 2),
)
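For scale, here is a rough sanity check of how many skeletonizing tasks (and hence fragment files) this configuration produces, assuming tasks simply tile the mip 2 volume in 512^3 blocks (the exact gridding inside Igneous may differ slightly):

```python
import math

# mip 0 volume and the downsample factor from the config above
volume_size = (10000, 10000, 2500)  # voxels at mip 0
factor = (2, 2, 1)                  # applied twice to reach mip 2

# volume size at mip 2 after two (2,2,1) downsample passes
mip2_size = tuple(s // (f ** 2) for s, f in zip(volume_size, factor))

# each skeletonizing task covers a shape=(512, 512, 512) block at mip 2
shape = (512, 512, 512)
grid = [math.ceil(s / b) for s, b in zip(mip2_size, shape)]
num_tasks = grid[0] * grid[1] * grid[2]
print(mip2_size, grid, num_tasks)  # (2500, 2500, 2500) [5, 5, 5] 125
```

Under that assumption, a 5 x 5 x 5 grid gives 125 tasks, which matches the 125 .frag files reported later in this thread.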

I don't think this should need so much memory.
Could you please tell me how to run it with a smaller memory footprint?

Many thanks!

@william-silversmith added the performance and question labels on Feb 24, 2022
@william-silversmith (Contributor)

Hi Jackie! The sharded format is a little harder to run because you need to consider how large files get downloaded, filtered, and aggregated. Here are some tips for making this easier:

  1. If you're running the execution in parallel, consider running fewer processes on each machine.
  2. You can reduce memory usage a lot if you run against a regular file system instead of cloud storage. This is because the .frag files are mapbuffer files (https://github.com/seung-lab/mapbuffer) and can be mmapped to extract the appropriate skeletons for a shard.
  3. You can reduce the run time somewhat by creating a sqlite or mysql database from the spatial index and then referencing it during the merge process.
cv = CloudVolume(path)
cv.skeleton.spatial_index.to_sql("spatial_index.db")

It might give me some more insight if you can share a representative list of the fragment files with their size in bytes and the number of merge tasks generated.
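To gather that list, a stdlib snippet along these lines would work (the 'skeletons_mip_2' path here is taken from the info config above; adjust it to wherever your fragment files actually live):

```python
import os

def list_frag_files(directory):
    """Return (filename, size_in_bytes) for every .frag file, largest first."""
    frags = [
        (entry.name, entry.stat().st_size)
        for entry in os.scandir(directory)
        if entry.is_file() and entry.name.endswith('.frag')
    ]
    return sorted(frags, key=lambda item: item[1], reverse=True)

path = 'skeletons_mip_2'  # hypothetical local layer directory; adjust to your path
if os.path.isdir(path):
    for name, size in list_frag_files(path):
        print(f'{name}\t{size} bytes')
```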

@william-silversmith (Contributor)

Sorry, I just realized you inserted and executed in the same script. Usually I do it in two steps, so I confused myself. Getting an idea of the number of tasks and the sizes of the fragments will help a lot. I might be able to recommend some tweaks to the shard parameters.

@william-silversmith (Contributor)

I just added igneous skeleton spatial-index create/db to the CLI so that should make (3) easier.

@JackieZhai (Contributor, Author)

Thanks for your reply! I am researching this database of the spatial index.

@JackieZhai (Contributor, Author) commented Mar 2, 2022

Besides, I have tried a few more things and found that even when I set:

  1. parallel=1 (create_sharded_skeleton_merge_tasks actually only makes 1 task to execute)
  2. local files with MapBuffer
  3. max_cable_length=50000

memory usage still spikes suddenly at some point in ShardedSkeletonMergeTask.process_skeletons().

@william-silversmith (Contributor) commented Mar 2, 2022

Interesting. That says to me that the skeletons themselves are very large... but you clipped them to <50000. You can try making create_sharded_skeleton_merge_tasks produce smaller shards by setting shard_index_bytes and minishard_index_bytes to smaller values. How large are your skeleton .frag files?

This is still really weird though. If this were on my machine, what I would do is start tracing the merge task to find out where all the memory usage was going using import pdb; pdb.set_trace() and memory_profiler. If you can share some memory profiles, that might be helpful. (both graphs of memory usage over time and line by line profiles in important functions)
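For the line-by-line side of that profiling, memory_profiler works well; a dependency-free alternative is Python's stdlib tracemalloc, which reports which allocation sites dominate. A minimal sketch (the list comprehension here is just a stand-in for the suspect section of process_skeletons()):

```python
import tracemalloc

tracemalloc.start()

# stand-in for the suspect workload, e.g. the body of process_skeletons()
big = [bytes(1024) for _ in range(10_000)]

# show the file:line locations with the largest live allocations
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:5]:
    print(stat)

current, peak = tracemalloc.get_traced_memory()
print(f'current={current} bytes, peak={peak} bytes')
tracemalloc.stop()
```

Wrapping the real merge code this way would show whether the spike comes from the fragment loading or from the postprocessing step.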

@JackieZhai (Contributor, Author)

Besides, I have tried a few more things and found that even when I set:

  1. parallel=1 (create_sharded_skeleton_merge_tasks actually only makes 1 task to execute)
  2. local files with MapBuffer
  3. max_cable_length=50000

memory usage still spikes suddenly at some point in ShardedSkeletonMergeTask.process_skeletons().

Finally, I set max_cable_length=10000 (a very small number, just for testing) and it finished using < 5 GB of memory.

I imported the result into Neuroglancer. Here is one of them:
[image: soma_eg]

It seems that some extremely messy segments (caused by misaligned images or merge errors on somas) lead to the memory explosion in kimimaro.postprocess().

By the way, I have 125 .frag files, 417 MB in total.

@william-silversmith (Contributor)

This makes a lot more sense to me. My skeletons have been (mostly) well behaved and I was able to screen out extremely large mergers while sparing the rest. If you can send me your fragment files, I might be able to do some debugging to figure out where that memory spike is coming from (I won't share them or use them for any other purpose). It might take me a bit to get to it though.

If you figure out which messy segments are causing the problem, you can try filtering them out specifically by editing the merge code. That might be the best approach.

@JackieZhai (Contributor, Author)

OK, I will send you my .frag files.

Next, I am going to optimize the images and segments.

Anyway, it's a pleasure to talk to you!

@william-silversmith (Contributor)

Thank you! Looking forward to the frag files!
