There seem to be quite a few race conditions if one runs lifton in parallel. A project I'm working on requires running lifton from several dozen source annotations to several hundred references, and so I use snakemake to parallelise runs across a cluster. However (at least) the following race conditions appear:
- If the output files are something like `output/$SOURCE/$TARGET_NAME.gff`, there's a race condition: lifton writes its intermediate files to `output/$SOURCE/lifton_output` regardless of which target genome is being annotated, so concurrent jobs for the same source corrupt each other's intermediate files.
- At certain stages the gffutils sqlite database is written to, even if it already existed before the run (e.g. with `ANALYSE`). This causes race conditions and crashes, as (normally) only one process can write to a sqlite database at a time.
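For the sqlite problem described above, one defensive pattern on the caller's side is to open an already-built database read-only, so that an unexpected write fails immediately instead of contending for the write lock. This is a minimal sketch using only Python's standard `sqlite3` module, not LiftOn's code: `open_readonly` and `build_demo_db` are hypothetical helper names, the demo table stands in for a real gffutils database, and whether LiftOn could be made to use such a connection is an open assumption.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only via SQLite's URI syntax.

    With mode=ro, any statement that would modify the file (INSERT,
    CREATE, ANALYZE, ...) fails with sqlite3.OperationalError instead
    of competing with other jobs for the database write lock.
    """
    path = Path(db_path).resolve()
    if not path.exists():
        # mode=ro never creates a file, so fail early with a clear message.
        raise FileNotFoundError(f"pre-built database not found: {path}")
    return sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)


def build_demo_db(db_path: str) -> None:
    """Hypothetical stand-in for a pre-built gffutils database."""
    con = sqlite3.connect(db_path)
    try:
        con.execute("CREATE TABLE features (id TEXT PRIMARY KEY)")
        con.execute("INSERT INTO features VALUES ('gene1')")
        con.commit()
    finally:
        con.close()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        db = str(Path(tmp) / "source.db")
        build_demo_db(db)  # done once, before any parallel jobs
        con = open_readonly(db)
        try:
            print(con.execute("SELECT id FROM features").fetchall())  # [('gene1',)]
            try:
                con.execute("INSERT INTO features VALUES ('gene2')")
            except sqlite3.OperationalError as err:
                print("write rejected:", err)
        finally:
            con.close()
```

Many readers can share a database opened this way; the sketch only guarantees that a reader never silently becomes a writer.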
With liftoff, one could work around these same issues because liftoff accepted a temp/intermediate directory name (so you could use e.g. `output/$SOURCE/$TARGET_NAME/` instead of `output/$SOURCE/lifton_output`, making each job's directory unique). Liftoff also did not modify the gff database if it already existed, so if you pre-computed all the needed `gff_db` files before running any liftoff jobs, you were guaranteed not to have race conditions on the sqlite database.
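The two liftoff-style workarounds described above can be sketched from the driver (e.g. Snakemake) side: give each (source, target) job its own scratch directory, and build each source's database exactly once before any jobs are submitted. This is a hypothetical sketch, not LiftOn code: `job_intermediate_dir`, `ensure_db` and the `build` callback are illustrative names, and it assumes the tool can be told where to put its intermediate files and to reuse an existing database, which is what this issue asks LiftOn to support. In practice `build` might wrap `gffutils.create_db(gff_path, dbfn=path)`.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def job_intermediate_dir(out_root: str, source: str, target: str) -> Path:
    """Return a per-(source, target) scratch directory: out_root/source/target.

    Because the path includes the target name, two jobs that share a
    source annotation never write intermediate files to the same place.
    """
    d = Path(out_root) / source / target
    d.mkdir(parents=True, exist_ok=True)
    return d


def ensure_db(db_path: str, build: Callable[[str], None]) -> bool:
    """Build db_path once if it is missing; never touch an existing one.

    The database is built under a temporary directory next to the target
    and then moved into place with os.replace (atomic on POSIX within one
    filesystem), so nobody ever sees a half-written file.
    Returns True if a build happened.
    """
    dest = Path(db_path)
    if dest.exists():
        return False  # already pre-computed: leave it untouched
    dest.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=dest.parent) as tmpdir:
        tmp = os.path.join(tmpdir, dest.name)
        build(tmp)
        os.replace(tmp, dest)
    return True


if __name__ == "__main__":
    import sqlite3

    def fake_build(path: str) -> None:
        # Hypothetical stand-in for e.g. gffutils.create_db(gff_file, dbfn=path)
        con = sqlite3.connect(path)
        con.execute("CREATE TABLE features (id TEXT)")
        con.commit()
        con.close()

    with tempfile.TemporaryDirectory() as out:
        db = os.path.join(out, "SRC.gff_db")
        print(ensure_db(db, fake_build))  # True
        print(ensure_db(db, fake_build))  # False
        print(job_intermediate_dir(out, "SRC", "TGT").is_dir())  # True
```

The build step should still complete for every source before the parallel jobs start; the atomic rename only protects against a job observing a partially written database.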
I'd encourage you to adopt these workarounds in lifton.
best,
Kevin
Hi @kdm9,
I am currently on an internship and won't have time to fix this issue in August. I will get back to you in September.
Thanks for reporting this issue. It is indeed important to allow users to run LiftOn in parallel.
Best,
Kuan-Hao