Configure structure of HiGlass ingress directory #82

Merged 7 commits on Oct 10, 2023

Conversation

BethYates
Collaborator

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@github-actions

github-actions bot commented Sep 21, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 48effa5

+| ✅ 130 tests passed       |+
#| ❔  20 tests were ignored |#
!| ❗   1 tests had warnings |!

❗ Test warnings:

❔ Tests ignored:

  • files_exist - File is ignored: assets/nf-core-genomenote_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-genomenote_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-genomenote_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: conf/igenomes.config
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File does not exist: .github/ISSUE_TEMPLATE/config.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File does not exist: assets/nf-core-genomenote_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-genomenote_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-genomenote_logo_dark.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • actions_ci - actions_ci
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/genomenote/genomenote/.github/workflows/awstest.yml

✅ Tests passed:

Run details

  • nf-core/tools version 2.8
  • Run at 2023-10-09 15:45:16

@BethYates BethYates requested review from muffato and gq1 September 21, 2023 16:53
Comment on lines 38 to 48
-cp -f $mcool $upload_dir
-cp -f $genome $upload_dir/${genome.baseName}.genome
+mkdir -p $upload_dir${higlass_data_basedir}/${species.replaceAll("\\s","_")}/${assembly}
+cp -f $mcool $upload_dir${higlass_data_basedir}/${species.replaceAll("\\s","_")}/${assembly}/${assembly}.mcool
+cp -f $genome $upload_dir/${higlass_data_basedir}/${species.replaceAll("\\s","_")}/${assembly}/${assembly}.genome

 # Load them in Kubernetes
 echo "Loading .mcool file"
-kubectl exec \$pod_name -- python /home/higlass/projects/higlass-server/manage.py ingest_tileset --filename /higlass-temp/$mcool.name --filetype cooler --datatype matrix --project-name $assembly --name ${assembly}_map
+kubectl exec \$pod_name -- python /home/higlass/projects/higlass-server/manage.py ingest_tileset --filename /higlass-temp/${higlass_data_basedir}/${species.replaceAll("\\s","_")}/${assembly}/${assembly}.mcool --filetype cooler --datatype matrix --project-name ${higlass_data_basedir}/${species.replaceAll("\\s","_")}/$assembly --name ${assembly}_map
 echo "Loading .genome file"
-kubectl exec \$pod_name -- python /home/higlass/projects/higlass-server/manage.py ingest_tileset --filename /higlass-temp/${genome.baseName}.genome --filetype chromsizes.tsv --datatype chromsizes --coordSystem ${assembly}_assembly --project-name $assembly --name ${assembly}_grid
+kubectl exec \$pod_name -- python /home/higlass/projects/higlass-server/manage.py ingest_tileset --filename /higlass-temp/${higlass_data_basedir}/${species.replaceAll("\\s","_")}/${assembly}/${assembly}.genome --filetype chromsizes.tsv --datatype chromsizes --coordSystem ${assembly}_assembly --project-name ${higlass_data_basedir}/${species.replaceAll("\\s","_")}/$assembly --name ${assembly}_grid
Member

Introduce a variable named something like assembly_path with the value "${higlass_data_basedir}/${species.replaceAll("\\s","_")}/${assembly}" and use it throughout. It'll be easier to read and maintain.
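A minimal sketch of that suggestion, written as plain POSIX shell rather than as the Nextflow script itself. The variable values, the scratch directories, and the placeholder input files are illustrative assumptions; only the `assembly_path` layout comes from the comment above.

```shell
set -eu

# Illustrative values only (assumptions, not the pipeline's real settings).
higlass_data_basedir="/browser_data"
species="Epithemia sp. CRS-2021b"
assembly="GCA_946965045.2"

# Scratch directories and empty placeholder files stand in for real inputs.
upload_dir="$(mktemp -d)"
work_dir="$(mktemp -d)"
touch "$work_dir/input.mcool" "$work_dir/input.genome"

# Replace each whitespace character with "_", as replaceAll("\\s","_") does.
species_dir="$(printf '%s' "$species" | sed 's/[[:space:]]/_/g')"

# Build the path once and reuse it everywhere below.
assembly_path="${higlass_data_basedir}/${species_dir}/${assembly}"

mkdir -p "${upload_dir}${assembly_path}"
cp -f "$work_dir/input.mcool"  "${upload_dir}${assembly_path}/${assembly}.mcool"
cp -f "$work_dir/input.genome" "${upload_dir}${assembly_path}/${assembly}.genome"

echo "$assembly_path"
```

With the path held in one variable, a change to the directory layout touches a single line instead of every `cp` and `kubectl` call.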

Contributor

@priyanka-surana priyanka-surana left a comment

Follow up from some recent discussions with @muffato and @gq1:
Would it be possible to use the same file name in both the contact_maps output folder and the data_to_upload folder? This is mostly so that things stay simple if we decide to update certain entries in the future. It also makes sure the names are unique.

@@ -27,6 +27,7 @@ params {

// Input data for genome_metadata subworkflow
assembly = 'GCA_946965045.2'
species = 'Epithemia_sp._CRS-2021b'
Contributor

The actual species name is Epithemia pelagica.
The value here is just the name used in our local file structure, and it might break when querying NCBI or other external sources.

Collaborator Author

Yes, I noticed the difference in names. Do you know why we don't use the official species name in our local file structure? I used this value because the parameter is currently only used when creating the directory structure for HiGlass, and I was trying to replicate what we use elsewhere as closely as possible. Would you suggest I change this to the official species name?

Contributor

If you are not using the species name for any queries, it will not matter either way.

Member

> Do you know why we don't use the official species name in our local file structure?

Probably because when we detected the name Epithemia pelagica, it had gone too far into the pipeline. That's the whole species-name change workflow that Paul is coordinating: we don't try to align everything to the NCBI taxonomy, because it's changing all the time.

@BethYates
Collaborator Author

> Follow up from some recent discussions with @muffato and @gq1: Would it be possible to use the same file name in both the contact_maps output folder and the data_to_upload folder? This is mostly so that things stay simple if we decide to update certain entries in the future. It also makes sure the names are unique.

This seems sensible. The naming format I've used in the data_to_upload folder is based on what was suggested by @muffato. Would you be happy with me renaming the file in contact_maps to match this?

@priyanka-surana
Contributor

> This seems sensible. The naming format I've used in the data_to_upload folder is based on what was suggested by @muffato. Would you be happy with me renaming the file in contact_maps to match this?

From what I understand, you are planning to name the .mcool, .cool and .genome files with the prefix $assembly. I would recommend using $assembly_$hic or something similar instead, which ensures that the prefix is always unique. And yes, happy for you to rename the file in contact_maps.
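One way to picture the naming suggestion, as a small shell sketch. The sample identifiers are made up for illustration, and `hic` is simply the placeholder name used in the comment above; the pipeline's real variable may differ. Note that in shell the braces are needed, since a bare `$assembly_` would be read as a variable named `assembly_`.

```shell
# Hypothetical values: one assembly with two Hi-C samples.
assembly="GCA_946965045.2"

# A prefix built from the assembly alone would be identical for both samples.
# Adding the sample identifier keeps each file name distinct.
names=""
for hic in sample_A sample_B; do
    prefix="${assembly}_${hic}"
    names="${names}${prefix}.mcool "
done

echo "$names"
```

The same prefix can then be reused for the .cool and .genome files, so that the contact_maps and data_to_upload folders share one naming scheme.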

nextflow.config (outdated, resolved)
Comment on lines +27 to +28
def project_name = "${higlass_data_project_dir}/${species.replaceAll('\\s','_')}/${assembly}"
def file_name = "${assembly}_${meta.id}"
Member

These two variables make the script below very clean 👍🏼

Co-authored-by: Matthieu Muffato <[email protected]>
@BethYates BethYates merged commit f999acb into public_dev Oct 10, 2023
4 checks passed
3 participants