Configure structure of HiGlass ingress directory #82
modules/local/upload_higlass_data.nf
Outdated
-cp -f $mcool $upload_dir
-cp -f $genome $upload_dir/${genome.baseName}.genome
+mkdir -p $upload_dir${higlass_data_basedir}/${species.replaceAll("\\s","_")}/${assembly}
+cp -f $mcool $upload_dir${higlass_data_basedir}/${species.replaceAll("\\s","_")}/${assembly}/${assembly}.mcool
+cp -f $genome $upload_dir/${higlass_data_basedir}/${species.replaceAll("\\s","_")}/${assembly}/${assembly}.genome

 # Load them in Kubernetes
 echo "Loading .mcool file"
-kubectl exec \$pod_name -- python /home/higlass/projects/higlass-server/manage.py ingest_tileset --filename /higlass-temp/$mcool.name --filetype cooler --datatype matrix --project-name $assembly --name ${assembly}_map
+kubectl exec \$pod_name -- python /home/higlass/projects/higlass-server/manage.py ingest_tileset --filename /higlass-temp/${higlass_data_basedir}/${species.replaceAll("\\s","_")}/${assembly}/${assembly}.mcool --filetype cooler --datatype matrix --project-name ${higlass_data_basedir}/${species.replaceAll("\\s","_")}/$assembly --name ${assembly}_map
 echo "Loading .genome file"
-kubectl exec \$pod_name -- python /home/higlass/projects/higlass-server/manage.py ingest_tileset --filename /higlass-temp/${genome.baseName}.genome --filetype chromsizes.tsv --datatype chromsizes --coordSystem ${assembly}_assembly --project-name $assembly --name ${assembly}_grid
+kubectl exec \$pod_name -- python /home/higlass/projects/higlass-server/manage.py ingest_tileset --filename /higlass-temp/${higlass_data_basedir}/${species.replaceAll("\\s","_")}/${assembly}/${assembly}.genome --filetype chromsizes.tsv --datatype chromsizes --coordSystem ${assembly}_assembly --project-name ${higlass_data_basedir}/${species.replaceAll("\\s","_")}/$assembly --name ${assembly}_grid
Introduce a variable named something like `assembly_path` with the value `"${higlass_data_basedir}/${species.replaceAll("\\s","_")}/${assembly}"` and use it throughout. It'll be easier to read and maintain.
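To illustrate the suggestion, here is a minimal shell sketch that builds the path once and reuses it for the `mkdir` and `cp` steps. The values of `higlass_data_basedir`, `species` and `assembly`, the temporary directories, and the assumption that the base directory starts with `/` are all hypothetical; in the module these would come from the Nextflow inputs, and the whitespace replacement would be done by `replaceAll` rather than `sed`.

```shell
set -eu

# Hypothetical example values; in the module these come from Nextflow inputs.
upload_dir="$(mktemp -d)"
higlass_data_basedir="/browser_data"   # assumed to start with "/"
species="Epithemia sp. CRS-2021b"
assembly="GCA_946965045.2"

# Replace every whitespace character in the species name with "_".
species_dir="$(printf '%s' "$species" | sed 's/[[:space:]]/_/g')"

# Build the per-assembly path once and reuse it below.
assembly_path="${higlass_data_basedir}/${species_dir}/${assembly}"

# Empty stand-ins for the real .mcool and .genome inputs.
mcool="$(mktemp)"
genome="$(mktemp)"

mkdir -p "${upload_dir}${assembly_path}"
cp -f "$mcool"  "${upload_dir}${assembly_path}/${assembly}.mcool"
cp -f "$genome" "${upload_dir}${assembly_path}/${assembly}.genome"

echo "$assembly_path"
```

With a single variable, the three copy lines cannot drift apart in how they join `$upload_dir` and the base directory.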
Follow-up from some recent discussions with @muffato and @gq1: would it be possible to use the same file name in both the `contact_maps` output folder and the `data_to_upload` folder? This would keep things simple if we decide to update certain entries in the future, and it also makes sure the names are unique.
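One way to read this request is to compute the file name once and use it for both destinations. The sketch below assumes a name built from the assembly accession and a sample identifier; the identifier, the temporary working directory and the empty stand-in file are hypothetical, and only the two folder names come from the discussion.

```shell
set -eu

# Hypothetical values, for illustration only.
assembly="GCA_946965045.2"
sample_id="sample1"                    # stands in for a per-sample identifier
file_name="${assembly}_${sample_id}"   # computed once, used everywhere

work="$(mktemp -d)"
contact_maps_dir="${work}/contact_maps"
upload_dir="${work}/data_to_upload"
mkdir -p "$contact_maps_dir" "$upload_dir"

# Empty stand-in for the generated .mcool file.
mcool="${work}/input.mcool"
: > "$mcool"

# The same base name is written to both locations.
for dest in "$contact_maps_dir" "$upload_dir"; do
    cp -f "$mcool" "${dest}/${file_name}.mcool"
done
```

Because both copies share one name, replacing an entry later only requires knowing the assembly and the sample identifier.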
@@ -27,6 +27,7 @@ params {

     // Input data for genome_metadata subworkflow
     assembly = 'GCA_946965045.2'
+    species = 'Epithemia_sp._CRS-2021b'
The actual species name is Epithemia pelagica.
`Epithemia_sp._CRS-2021b` is just the name used in our local file structure, and it might break when querying NCBI or other external sources.
Yes, I noticed the difference in names. Do you know why we don't use the official species name in our local file structure? I used this as the param because currently the only time this parameter is used is when creating the directory structure for higlass and I was trying to replicate what we use elsewhere as closely as possible. Would you suggest I change this to the official species name?
If you are not using the species name for any queries, it will not matter either way.
> Do you know why we don't use the official species name in our local file structure?
Probably because when we detected the name Epithemia pelagica, it had gone too far into the pipeline. That's the whole species-name change workflow that Paul is coordinating: we don't try to align everything to the NCBI taxonomy, because it's changing all the time.
This seems sensible, the naming format I've used in the |
From what I understand, you are planning to name the |
def project_name = "${higlass_data_project_dir}/${species.replaceAll('\\s','_')}/${assembly}"
def file_name = "${assembly}_${meta.id}"
These two variables make the script below very clean 👍🏼
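As a rough illustration of how the two variables could tidy up the ingest step, the shell sketch below composes the two `ingest_tileset` commands as strings and prints them without running anything. The flags and the `manage.py` path are the ones quoted earlier in this review; the pod name, the sample identifier, the base directory value and its leading `/` are assumptions, and the real module may assemble the commands differently.

```shell
set -eu

# Hypothetical values; in the module they come from Nextflow inputs and meta.
higlass_data_project_dir="/browser_data"   # assumed to start with "/"
species="Epithemia sp. CRS-2021b"
assembly="GCA_946965045.2"
sample_id="sample1"                         # stands in for meta.id
pod_name="higlass-pod"                      # assumed pod name

species_dir="$(printf '%s' "$species" | sed 's/[[:space:]]/_/g')"
project_name="${higlass_data_project_dir}/${species_dir}/${assembly}"
file_name="${assembly}_${sample_id}"

manage="python /home/higlass/projects/higlass-server/manage.py"
base="kubectl exec ${pod_name} -- ${manage} ingest_tileset"
path="/higlass-temp${project_name}/${file_name}"

# Dry run: the commands are only composed and printed, never executed.
mcool_cmd="${base} --filename ${path}.mcool --filetype cooler --datatype matrix --project-name ${project_name} --name ${assembly}_map"
genome_cmd="${base} --filename ${path}.genome --filetype chromsizes.tsv --datatype chromsizes --coordSystem ${assembly}_assembly --project-name ${project_name} --name ${assembly}_grid"

printf '%s\n' "$mcool_cmd" "$genome_cmd"
```

Printing the commands first makes it easy to check the resulting project name and file paths before touching the cluster.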
Co-authored-by: Matthieu Muffato <[email protected]>
PR checklist
- `nf-core lint`
- `nextflow run . -profile test,docker --outdir <OUTDIR>`
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).