Release 2.0.0 #146

Merged
merged 325 commits into from
Oct 10, 2024
f2bc386
removed whitespace
BethYates Jun 27, 2023
3b00a13
requested changes from PR 70
BethYates Jun 30, 2023
3fe9461
give partially completed genome note template a meaningful name
BethYates Jul 12, 2023
997effd
renamed input to something more useful
BethYates Jul 12, 2023
558fe76
Update linting.yml fix to nf-core 2.8.10
BethYates Jul 12, 2023
45d7387
added linting.yml to files_unchanged
BethYates Jul 12, 2023
1c2d556
reformatted naming of output file
BethYates Jul 18, 2023
c942d51
renamed genome_statistics directory to genome_note and added in the g…
BethYates Jul 18, 2023
8b82728
changed output for statistics table and completed genome note templat…
BethYates Jul 18, 2023
1936f29
rmoved print statement
BethYates Jul 18, 2023
c143f0e
passsed meta data through in the standard way
BethYates Jul 18, 2023
525559f
changed the way that meta data is handled
BethYates Jul 18, 2023
150151c
changed how meta data gets passed to the combine_metadata module
BethYates Jul 18, 2023
8054f9e
changed tag to include meta.id
BethYates Jul 18, 2023
2fc66aa
fixed linting
BethYates Jul 18, 2023
aa44b41
Merge pull request #70 from sanger-tol/genome_metadata
BethYates Jul 21, 2023
b6c21ed
edited to match style used elsewhere in the document
BethYates Jul 21, 2023
fce26ee
write parameter values to the genome notes portal as an optional proc…
BethYates Aug 6, 2023
71c70ab
Set params for writing to database in test configs
BethYates Aug 8, 2023
afc0cb5
fixed linting issues
BethYates Aug 8, 2023
969ba33
fixed linting
BethYates Aug 8, 2023
cb69c23
fixed linting
BethYates Aug 8, 2023
f9adca0
Added usage instructions for setting nextflow secret for storing API …
BethYates Aug 9, 2023
4531bd0
fixed linting
BethYates Aug 9, 2023
a622254
added fix from ncbi branch
BethYates Aug 10, 2023
85f0462
Merge pull request #75 from sanger-tol/genome_metadata
BethYates Aug 10, 2023
588f9fd
merged public_dev into update_higlass
BethYates Aug 10, 2023
939f08e
merge in stashed changes
BethYates Aug 15, 2023
50fb922
merge stashed files
BethYates Aug 15, 2023
a53c18e
config changes
BethYates Aug 17, 2023
6fcca6d
Upload of .mcool and .genome files works
BethYates Aug 17, 2023
803181b
Updated to include higlass update information
BethYates Aug 17, 2023
03f830e
fixed linting
BethYates Aug 17, 2023
c5f9f27
fixed linting
BethYates Aug 17, 2023
8a75a62
These two parameters should be set to false by default, so that every…
muffato Aug 22, 2023
76cedd5
map is superfluous, the channels are already well formed
muffato Aug 22, 2023
7ae563d
Removed trailing whitespace
muffato Aug 22, 2023
ed26cc7
typo
muffato Aug 22, 2023
17963af
The `replace` in conf/modules.config only affects the last extension
muffato Aug 22, 2023
b4bc8bf
.getName() is the same as .name
muffato Aug 22, 2023
90de768
Output a versions.yml too
muffato Aug 22, 2023
33f3500
Should be a different variable name not to overwrite the previous meta
muffato Aug 22, 2023
aa3490a
Do the copy to the loading folder from within the module
muffato Aug 22, 2023
40c3db9
This module doesn't support Conda
muffato Aug 22, 2023
82a124d
Should be using `assembly` directly, since it is a parameter of the m…
muffato Aug 22, 2023
51295ed
Merge pull request #81 from sanger-tol/update_higlass_mm49
BethYates Aug 30, 2023
1688916
Update docs/usage.md
BethYates Aug 30, 2023
50500e8
updated process name and flag controlling if it runs
BethYates Aug 30, 2023
2a457f2
Update conf/test_full.config
BethYates Sep 1, 2023
f0b6da8
Merge pull request #78 from sanger-tol/update_higlass
BethYates Sep 1, 2023
33619f8
Configure structure of higlass ingress directory
BethYates Sep 21, 2023
27e28b2
fixed linting
BethYates Sep 21, 2023
f33de3d
change file name on higlass
BethYates Sep 22, 2023
3b1008d
introduced variables and renamed params to make code easier to maintain
BethYates Oct 4, 2023
cd43c03
Change contact map results files to have name that matches that used …
BethYates Oct 4, 2023
28833e5
fixed linting
BethYates Oct 4, 2023
48effa5
fix alignment
BethYates Oct 9, 2023
f999acb
Merge pull request #82 from sanger-tol/update_higlass
BethYates Oct 10, 2023
c64f557
Changes to fix failing tests. Refactored parsing of metadata files to…
BethYates Oct 10, 2023
28f1713
trailing whitespace fix
BethYates Oct 10, 2023
e9377e0
replaced with a generic parse_metadata.nf module
BethYates Oct 10, 2023
34dbdcf
Simplified name of output file
BethYates Oct 12, 2023
594ef3f
included prefix in name of output file of parse_metadata and updated …
BethYates Oct 12, 2023
6fed631
output files given more meaningful names
BethYates Oct 12, 2023
73995b3
Merge pull request #85 from sanger-tol/genome_metadata
BethYates Oct 12, 2023
c5b052e
Changed way we get the chromosome number
BethYates Nov 17, 2023
ed09cdd
prefixed params with ENA
BethYates Nov 17, 2023
a83061c
Code to fetch collection date
BethYates Nov 17, 2023
e05b26e
Standardised format of TAX_STRING variable
BethYates Nov 17, 2023
b30c081
preserve undefined variables in template
BethYates Nov 21, 2023
7d39cc0
Fetch specimen ID correctly
BethYates Nov 21, 2023
68d7796
set params to None as default
BethYates Nov 21, 2023
86aa523
latest version of the template, param names have been standardised
BethYates Nov 21, 2023
e4bfa1d
standardised param used to represent ToL ID
BethYates Nov 21, 2023
91e1624
changes to maximise the amount of meta data passed through to the gen…
BethYates Dec 1, 2023
caa1906
pass through genbank accession for each chr to include in table 2
BethYates Dec 1, 2023
e609822
Maximise the amounte of metadata passed through to genome note template
BethYates Dec 1, 2023
b5812e7
updated to latest template
BethYates Dec 1, 2023
ca05591
changes to allow metadata collected by the GENOME_STATISTICS subworkf…
BethYates Dec 1, 2023
3431bcb
Black changes
BethYates Dec 1, 2023
8ee8893
don't duplicate authors
BethYates Dec 1, 2023
62f71a3
format author list correctly
BethYates Dec 4, 2023
e4144c0
template tweaks
BethYates Dec 4, 2023
ef0331a
Fixed bug with dumping software versions for the metadata parsing scr…
BethYates Dec 5, 2023
feaf08c
Removed code that is now run in a later subworkflow
BethYates Dec 8, 2023
119ddaa
removed extra spaces
BethYates Dec 8, 2023
b22647d
convert genome length to Mbp correctly
BethYates Dec 8, 2023
2839069
publish consistent and inconsistent parameter sets to results directory
BethYates Dec 8, 2023
8f92cd4
add consistent and inconsistent parameter files to the output
BethYates Dec 8, 2023
51abfb8
reformated setting of meta and removed unnecessary channel duplication
BethYates Dec 8, 2023
6cfe432
prettier and black fixes
BethYates Dec 8, 2023
cf686df
updated to use correct params for genome_metadata subworkflow
BethYates Dec 12, 2023
edbf9d6
combine inconsistent parameters from the genome statistics and genome…
BethYates Dec 12, 2023
4425f1a
Merge branch 'public_dev' into genome_metadata
BethYates Dec 12, 2023
6010995
added back species name, required for higlass upload
BethYates Dec 13, 2023
393e15d
removed unnecessary conversion of ints to strings
BethYates Dec 13, 2023
5d09fd7
black fix
BethYates Dec 13, 2023
83b8a0e
Conversion of int/float to string in necessary for the next step whic…
BethYates Dec 13, 2023
1d6bd23
Fixed issue with collection location being reported as being inconsis…
BethYates Dec 13, 2023
f1e1369
Remove file if already present on server before re-uploading, output …
BethYates Dec 13, 2023
ad53d28
linting fixes
BethYates Dec 13, 2023
8460883
New module to generate a link to the uploaded data on higlass. Link i…
BethYates Dec 13, 2023
292c0cf
Merge pull request #98 from sanger-tol/genome_metadata
BethYates Dec 14, 2023
d24500d
fix to display correct tracks when adding data through UI
BethYates Dec 15, 2023
4eff633
Updated test species and test data to use a species for which a genom…
BethYates Jan 22, 2024
464d805
missing new line at end of file
BethYates Jan 22, 2024
c596ae4
Pin prettier version
BethYates Jan 22, 2024
1a114d9
Merge pull request #107 from sanger-tol/update_test_data
BethYates Jan 23, 2024
3e99801
Changes to assign meaningful names as the uid for the tilesets and vi…
BethYates Jan 25, 2024
cf5ee85
black fixes
BethYates Jan 25, 2024
e3f4383
fixed error with passing extra argument to request_viewconfig function
BethYates Jan 26, 2024
c6e8503
uids cannot contain "."
BethYates Jan 26, 2024
20c6fe5
Separated upload code to it's own bash script
BethYates Feb 6, 2024
89527d5
removed suffix used while testing
BethYates Feb 6, 2024
52df2f8
Added some comments
BethYates Feb 6, 2024
dd24f6a
black fixes
BethYates Feb 6, 2024
3ebeb24
linting fixes
BethYates Feb 6, 2024
7db99f2
Merge pull request #101 from sanger-tol/update_higlass
BethYates Feb 8, 2024
04f0ba6
Merge branch 'dev' into public_dev
BethYates Feb 8, 2024
a471f51
Merge branch 'public_dev' of github.com:sanger-tol/genomenote into pu…
BethYates Feb 8, 2024
1ddd53b
bugfix - create inconsistent file when combining metadata even if all…
BethYates Feb 8, 2024
ba3aafd
added colour to template parameters to make them easier to see
BethYates Feb 8, 2024
659026f
Renamed test data file
BethYates Feb 8, 2024
17a449c
Allow population of either an XML or docx genome note template
BethYates Mar 22, 2024
e3c5e52
fixed black linting
BethYates Mar 22, 2024
0a8bd8c
linting
BethYates Mar 22, 2024
c21ecb7
Stopped running linting test for template strings
BethYates Mar 25, 2024
61214b3
Added jinja variable that should never be replaced to test that the t…
BethYates Mar 25, 2024
8e5687a
Update bin/populate_genome_note_template.py
BethYates Mar 25, 2024
d4781dc
Fix for Nextflow 24.01-edge: functions have to be defined in the main…
BethYates Mar 27, 2024
ac7b5ea
Merge branch 'xml_template' of github.com:sanger-tol/genomenote into …
BethYates Mar 27, 2024
12bff64
Merge pull request #110 from sanger-tol/xml_template
BethYates Mar 27, 2024
a2446eb
Allow adding multiple biosample
reichan1998 Jul 14, 2024
bebcec7
Rename params.biosample to biosample_wgs, add params.biosample_rna an…
reichan1998 Jul 30, 2024
e0e8c07
fix format error
reichan1998 Jul 30, 2024
8acc0a0
Merge pull request #132 from reichan1998/metadata_subworkflow
BethYates Jul 30, 2024
0db39d5
Merge branch 'dev' into public_dev
BethYates Aug 22, 2024
8287974
added url for COPO as metadata source
SandraBabirye Sep 3, 2024
bbcd7ab
added python script that parses json file to extract metadata from CO…
SandraBabirye Sep 3, 2024
8d6e999
added the copo files
SandraBabirye Sep 3, 2024
6570144
edited the file permissions for the python script
SandraBabirye Sep 3, 2024
dc2db13
Update parse_json_copo_biosample.py
SandraBabirye Sep 3, 2024
53e1110
Added COPO as biosample
SandraBabirye Sep 3, 2024
3111b32
Added the prefix of the biosample as COPO
SandraBabirye Sep 3, 2024
51a8dca
Fix Python Black linting issues
SandraBabirye Sep 3, 2024
80dffed
Edited file ; remove COPO added a s biosample
SandraBabirye Sep 4, 2024
1de3bb2
linting and test fixes
BethYates Sep 10, 2024
9823831
edited the parse_json_copo_biosample.py file
SandraBabirye Sep 10, 2024
d87b8a3
edited the parse_json_copo_biosample.py file
SandraBabirye Sep 10, 2024
a9758ac
edited the parse_json_copo_biosample.py file
SandraBabirye Sep 10, 2024
a04facf
removed LONGITUDE as its missing in the json file
SandraBabirye Sep 10, 2024
22f41da
Fixing black linting issues
SandraBabirye Sep 10, 2024
c902495
added new arguments in the script
SandraBabirye Sep 10, 2024
4f24601
added new arguments in the script
SandraBabirye Sep 10, 2024
8deaf14
edited the parse_json_copo_biosample.py file
SandraBabirye Sep 10, 2024
940bd5d
edited the parse_json_copo_biosample.py file
SandraBabirye Sep 10, 2024
e474905
edited the file to Extract biosample type from FILE_OUT
SandraBabirye Sep 11, 2024
b708ede
edited the file to Extract biosample type from FILE_OUT
SandraBabirye Sep 11, 2024
7111172
Merge pull request #137 from SandraBabirye/copo_metadata
BethYates Sep 11, 2024
942be16
added gbif input taxonomy file as an argument
SandraBabirye Sep 16, 2024
6957522
added process to fetch gbif metadata
SandraBabirye Sep 16, 2024
5e9ab86
added url for gbif data parsing a json file
SandraBabirye Sep 16, 2024
db359d8
added a new module process to fetch metadata from gbif
SandraBabirye Sep 16, 2024
b0eb826
added a new module process to fetch metadata from gbif
SandraBabirye Sep 16, 2024
7ba5eb3
edited the input channels for the FETCHGBIFMETADATA process
SandraBabirye Sep 16, 2024
2404296
edited the input channels for the FETCHGBIFMETADATA process
SandraBabirye Sep 16, 2024
3411ff4
edited the input channels for the FETCHGBIFMETADATA process
SandraBabirye Sep 16, 2024
e384c38
edited the input channels for the FETCHGBIFMETADATA process
SandraBabirye Sep 16, 2024
a4c3100
edded python script to fetch authorship for the species
SandraBabirye Sep 17, 2024
9544a27
added process to fetch GBIF metadata
SandraBabirye Sep 17, 2024
1971fe1
edited the file to include GBIF as a metadata source
SandraBabirye Sep 17, 2024
9c2aa36
added new local module to fetch gbif metadata
SandraBabirye Sep 17, 2024
9a189e9
fix linting issue
SandraBabirye Sep 17, 2024
8919ace
edited the file permissions for the file
SandraBabirye Sep 24, 2024
aadfd97
edited the file
SandraBabirye Sep 24, 2024
df88a18
edited the python script name
SandraBabirye Sep 24, 2024
052cd1e
edited the output file name
SandraBabirye Sep 24, 2024
a0db81a
Edited the file
SandraBabirye Sep 25, 2024
05991ac
edited the input channel for the COMBINE_METADATA process
SandraBabirye Sep 25, 2024
2596322
edited the files
SandraBabirye Sep 25, 2024
3051125
fix the linting issues
SandraBabirye Sep 25, 2024
ae7b93a
edited the modules in the python script
SandraBabirye Sep 25, 2024
b241a0a
added other key value pairs to be outputed in the csv file
SandraBabirye Sep 26, 2024
7e0ebdf
edited the input channels
SandraBabirye Sep 26, 2024
3ce19f8
edited the file permissions for the python script
SandraBabirye Sep 26, 2024
6c05ca4
edited the input channels for the FETCH_GBIF_METADATA process
SandraBabirye Sep 26, 2024
23afc81
added the gbif_metadata argument
SandraBabirye Sep 26, 2024
207d935
edited the output file name
SandraBabirye Sep 26, 2024
8af680e
added the GBIF to the key values pairs
SandraBabirye Sep 26, 2024
085dc32
edited the file
SandraBabirye Sep 26, 2024
d41b4ee
edited the file
SandraBabirye Sep 26, 2024
147379a
edited the TAXONOMY_AUTHORITY key to have the authorship value in quotes
SandraBabirye Sep 26, 2024
20aaf3d
edited the output file name to have taxonomy and not metadata
SandraBabirye Sep 27, 2024
073a402
edited the files
SandraBabirye Sep 27, 2024
be630e2
Merge branch 'dev' into merge_dev
BethYates Oct 1, 2024
f410a35
pin the version of editorconfig like in the readmapping pipeline
BethYates Oct 1, 2024
ca23581
Add chr genbank accession
BethYates Oct 1, 2024
eb89d32
black fix
BethYates Oct 1, 2024
bde28ff
edited line 22 in the file
SandraBabirye Oct 1, 2024
58063e5
removed trailing white space
SandraBabirye Oct 1, 2024
e3d9696
removed trailing white space
SandraBabirye Oct 1, 2024
3bb90f0
removed trailing white space
SandraBabirye Oct 1, 2024
7e18cbd
Merge pull request #140 from SandraBabirye/gbif_metadata
BethYates Oct 1, 2024
ef1c8dc
Merge pull request #143 from sanger-tol/merge_dev
BethYates Oct 2, 2024
c3945f9
Rely on GBIF to provide correct Taxonomic authority information
BethYates Oct 3, 2024
91779a4
Fixed rounding issues on genome/chr length
BethYates Oct 3, 2024
1f7ef0b
pre-process collection location to have same format across resources
BethYates Oct 3, 2024
7a7e323
GAL to title case
BethYates Oct 3, 2024
2facc6c
Lifestage in lower case
BethYates Oct 3, 2024
b5a2c72
Added BUSCO data, fixed rounding issues and replaced 'tissue type' wi…
BethYates Oct 3, 2024
523cb23
Don't strip parentheses from authority
BethYates Oct 3, 2024
a125cc6
Removed kingdom variable as it's not used directly
BethYates Oct 3, 2024
8ff5563
Ensure correct biosample accessions are set and check for "United Kin…
BethYates Oct 3, 2024
382b09f
Update Higlass view config and add url to set of data to be passed to…
BethYates Oct 4, 2024
9e61667
minor fixes
BethYates Oct 4, 2024
5e2b088
black/prettier fixes
BethYates Oct 7, 2024
95a941b
added alt_hap_accession to the set of parameters returned
BethYates Oct 7, 2024
8a76e00
New test species
BethYates Oct 7, 2024
9771db4
black fix
BethYates Oct 7, 2024
b462e14
minor fixes to parameter formating
BethYates Oct 7, 2024
ae449f0
add in percent assembled from goat
BethYates Oct 7, 2024
93031d3
Don't output higlass link file - url now added to parameter files and…
BethYates Oct 7, 2024
b1cc4c6
Removed unnecessary metadata params and added checking of params prov…
BethYates Oct 7, 2024
267b8f3
documentation updates
BethYates Oct 7, 2024
bc82e35
prettier fixes
BethYates Oct 8, 2024
19c1710
linting fixes
BethYates Oct 8, 2024
e280a88
No longer have params.species so need to set value in meta correctly
BethYates Oct 8, 2024
c48817c
typo
BethYates Oct 8, 2024
02b7dda
changes required as no longer have params.species
BethYates Oct 8, 2024
3b10acc
updates ahead of release
BethYates Oct 8, 2024
07cfc1c
Update CHANGELOG.md
tkchafin Oct 9, 2024
93e3c33
increase default memory for POPULATE_TEMPLATE
tkchafin Oct 9, 2024
f0f13ed
remove duplicate container directive
tkchafin Oct 9, 2024
7642cf1
Change to same container as other requests module
tkchafin Oct 9, 2024
b6a3098
Don't upload to Higlass in full test
BethYates Oct 9, 2024
f8f027c
used elif rather than lots of ifs in parsing scripts, handle collecti…
BethYates Oct 9, 2024
4dc6f03
Merge branch 'release_2.0_fixes' of github.com:sanger-tol/genomenote …
BethYates Oct 9, 2024
9414f63
black fix
BethYates Oct 9, 2024
a5eb096
fixed extending meta
BethYates Oct 9, 2024
7257505
add required parameters
BethYates Oct 9, 2024
7013322
simplify meta setting
BethYates Oct 9, 2024
2dff231
added comments and appropriate error raising
BethYates Oct 9, 2024
8eb93d9
Moved sanger specific running instructions to the end of the file
BethYates Oct 9, 2024
a896521
prettier fix
BethYates Oct 9, 2024
e873d30
Fix KeyError Exceptions and ensure getting all biosample collectors/i…
BethYates Oct 9, 2024
efc6169
black fixes
BethYates Oct 9, 2024
ddafa1e
Merge pull request #144 from sanger-tol/release_2.0_fixes
BethYates Oct 10, 2024
0ef1ea9
changed release data
BethYates Oct 10, 2024
73375e6
Merge pull request #145 from sanger-tol/public_dev
BethYates Oct 10, 2024
5a54958
bump version
BethYates Oct 10, 2024
4 changes: 2 additions & 2 deletions .github/workflows/linting.yml
@@ -19,7 +19,7 @@ jobs:
       - uses: actions/setup-node@v3

       - name: Install editorconfig-checker
-        run: npm install -g editorconfig-checker
+        run: npm install -g editorconfig-checker@3.0.2

       - name: Run ECLint check
         run: editorconfig-checker -exclude README.md $(find .* -type f | grep -v '.git\|.py\|.md\|cff\|json\|yml\|yaml\|html\|css\|work\|.nextflow\|build\|nf_core.egg-info\|log.txt\|Makefile')
@@ -32,7 +32,7 @@ jobs:
       - uses: actions/setup-node@v3

       - name: Install Prettier
-        run: npm install -g prettier
+        run: npm install -g prettier@3.1.0

       - name: Run Prettier --check
         run: prettier --check ${GITHUB_WORKSPACE}
1 change: 1 addition & 0 deletions .nf-core.yml
@@ -20,3 +20,4 @@ lint:
   multiqc_config:
     - report_comment
   actions_ci: false
+  template_strings: False
30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,36 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[2.0.0](https://github.com/sanger-tol/genomenote/releases/tag/2.0.0)] - English Cocker Spaniel [2024-10-10]

### Enhancements & fixes

- New genome_metadata subworkflow to fetch metadata linked to the genome assembly from various sources (COPO, GoaT, GBIF, ENA, NCBI). The options `--assembly`, `--biosample_wgs`, `--biosample_hic` and `--biosample_rna` specify what metadata to fetch and process.
- Now outputs a partially completed genome note document based on a template file which contains placeholder parameters. These placeholders are replaced with data generated by the pipeline. The template file to use can be specified using the `--note_template` option.
- Added the `--write_to_portal` option to write a set of key-value data parameters to a Genome Notes database.
- Added the `--upload_higlass_data` option to automatically upload the Hi-C Map to a kubernetes hosted Hi-Glass server.
- Bugfix: don't rely on fasta file name to correctly set assembly accession needed for use with `ncbi datasets`.
- Bugfix: ensure meta.id is used consistently.

### Parameters

| Old parameter | New parameter |
| ------------- | -------------------------- |
| | --assembly |
| | --biosample_wgs |
| | --biosample_hic |
| | --biosample_rna |
| | --write_to_portal |
| | --genome_notes_api |
| | --note_template |
| | --upload_higlass_data |
| | --higlass_url |
| | --higlass_deployment_name |
| | --higlass_namespace |
| | --higlass_kubeconfig |
| | --higlass_upload_directory |
| | --higlass_data_project_dir |

## [[1.2.2](https://github.com/sanger-tol/genomenote/releases/tag/1.2.2)] - Pyrenean Mountain Dog (patch 2) - [2024-09-10]

### Enhancements & fixes
12 changes: 10 additions & 2 deletions CITATION.cff
@@ -2,16 +2,24 @@
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: sanger-tol/genomenote v1.2.2
title: sanger-tol/genomenote v2.0.0
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- given-names: Sandra
family-names: Babirye
affiliation: Wellcome Sanger Institute
orcid: "https://orcid.org/0009-0004-7773-7008"
- given-names: Tyler
family-names: Chafin
affiliation: Wellcome Sanger Institute
orcid: "https://orcid.org/0000-0001-8687-5905"
- given-names: Chau
family-names: Duong
affiliation: Wellcome Sanger Institute
orcid: "https://orcid.org/0009-0001-0649-2291"
- given-names: Matthieu
family-names: Muffato
affiliation: Wellcome Sanger Institute
@@ -38,5 +46,5 @@ identifiers:
repository-code: "https://github.com/sanger-tol/genomenote"
license: MIT
commit: TODO
version: 1.2.2
version: 2.0.0
date-released: "2022-10-07"
23 changes: 14 additions & 9 deletions README.md
@@ -17,14 +17,15 @@

 <!--![sanger-tol/genomenote workflow](https://raw.githubusercontent.com/sanger-tol/genomenote/main/docs/images/sanger-tol-genomenote_workflow.png)-->

-1. Summary statistics ([`NCBI datasets summary genome accession`](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/reference-docs/command-line/datasets/summary/genome/datasets_summary_genome_accession/))
-2. Convert alignment to BED ([`samtools view`](https://www.htslib.org/doc/samtools-view.html), [`bedtools bamtobed`](https://bedtools.readthedocs.io/en/latest/content/tools/bamtobed.html))
-3. Filter BED ([`GNU sort`](https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html), [`filter bed`](https://raw.githubusercontent.com/sanger-tol/genomenote/main/bin/filter_bed.sh))
-4. Contact maps ([`Cooler cload`](https://cooler.readthedocs.io/en/latest/cli.html#cooler-cload-pairs), [`Cooler zoomify`](https://cooler.readthedocs.io/en/latest/cli.html#cooler-zoomify), [`Cooler dump`](https://cooler.readthedocs.io/en/latest/cli.html#cooler-dump))
-5. Genome completeness ([`NCBI API`](https://www.ncbi.nlm.nih.gov/datasets/docs/v1/reference-docs/rest-api/), [`BUSCO`](https://busco.ezlab.org))
-6. Consensus quality and k-mer completeness ([`FASTK`](https://github.com/thegenemyers/FASTK), [`MERQURY.FK`](https://github.com/thegenemyers/MERQURY.FK))
-7. Collated summary table ([`createtable`](bin/create_table.py))
-8. Present results and visualisations ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))
+1. Fetches genome metadata from [ENA](https://www.ebi.ac.uk/ena/browser/api/#/ENA_Browser_Data_API), [NCBI](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/reference-docs/rest-api), and [GoaT](https://goat.genomehubs.org/api-docs/)
+2. Summary statistics ([`NCBI datasets summary genome accession`](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/reference-docs/command-line/datasets/summary/genome/datasets_summary_genome_accession/))
+3. Convert alignment to BED ([`samtools view`](https://www.htslib.org/doc/samtools-view.html), [`bedtools bamtobed`](https://bedtools.readthedocs.io/en/latest/content/tools/bamtobed.html))
+4. Filter BED ([`GNU sort`](https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html), [`filter bed`](https://raw.githubusercontent.com/sanger-tol/genomenote/main/bin/filter_bed.sh))
+5. Contact maps ([`Cooler cload`](https://cooler.readthedocs.io/en/latest/cli.html#cooler-cload-pairs), [`Cooler zoomify`](https://cooler.readthedocs.io/en/latest/cli.html#cooler-zoomify), [`Cooler dump`](https://cooler.readthedocs.io/en/latest/cli.html#cooler-dump))
+6. Genome completeness ([`NCBI API`](https://www.ncbi.nlm.nih.gov/datasets/docs/v1/reference-docs/rest-api/), [`BUSCO`](https://busco.ezlab.org))
+7. Consensus quality and k-mer completeness ([`FASTK`](https://github.com/thegenemyers/FASTK), [`MERQURY.FK`](https://github.com/thegenemyers/MERQURY.FK))
+8. Collated summary table ([`createtable`](bin/create_table.py))
+9. Present results and visualisations ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))

## Usage

@@ -52,6 +53,9 @@ nextflow run sanger-tol/genomenote \
    -profile <docker/singularity/.../institute> \
    --input samplesheet.csv \
    --fasta genome.fasta \
+   --assembly GCA_922984935.2 \
+   --bioproject PRJEB49353 \
+   --biosample SAMEA7524400 \
    --outdir <OUTDIR>
 ```

@@ -69,8 +73,9 @@ sanger-tol/genomenote was originally written by [Priyanka Surana](https://github
 We thank the following people for their assistance in the development of this pipeline:

 - [Matthieu Muffato](https://github.com/muffato)
+- [Beth Yates](https://github.com/BethYates)
 - [Shane McCarthy](https://github.com/mcshane) and [Yumi Sims](https://github.com/yumisims) for providing software and algorithm guidance.
-- [Cibin Sadasivan Baby](https://github.com/cibinsb) and [Beth Yates](https://github.com/BethYates) for providing reviews.
+- [Cibin Sadasivan Baby](https://github.com/cibinsb) for providing reviews.

## Contributions and Support

9 changes: 9 additions & 0 deletions assets/genome_metadata_template.csv
@@ -0,0 +1,9 @@
#File_source,File_type,Url,Output_type
ENA,Assembly,https://www.ebi.ac.uk/ena/browser/api/xml/ASSEMBLY_ACCESSION,xml
ENA,Bioproject,https://www.ebi.ac.uk/ena/browser/api/xml/BIOPROJECT_ACCESSION,xml
ENA,Biosample,https://www.ebi.ac.uk/ena/browser/api/xml/BIOSAMPLE_ACCESSION,xml
ENA,Taxonomy,https://www.ebi.ac.uk/ena/browser/api/xml/TAXONOMY_ID,xml
NCBI,Assembly,https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/ASSEMBLY_ACCESSION/dataset_report?filters.exclude_atypical=false&filters.assembly_version=current&chromosomes=1&chromosomes=2&chromosomes=3&chromosomes=X&chromosomes=Y&chromosomes=M,json
NCBI,Taxonomy,https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=TAXONOMY_ID,xml
GOAT,Assembly,http://goat.genomehubs.org/api/v2/record?recordId=ASSEMBLY_ACCESSION&result=assembly&taxonomy=ncbi,json
COPO,Biosample,https://copo-project.org/api/sample/biosampleAccession/BIOSAMPLE_ACCESSION?standard=tol&return_type=json,json
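Each row of the template above pairs a metadata source with a URL that contains an upper-case placeholder such as `ASSEMBLY_ACCESSION` or `BIOSAMPLE_ACCESSION`. The sketch below shows one way such placeholders could be filled in. The helper name `fill_urls` and the use of plain string replacement are assumptions for illustration, not the pipeline's actual implementation; the accessions are the ones from the README usage example.

```python
import csv
import io

# Two rows copied from assets/genome_metadata_template.csv.
TEMPLATE = """#File_source,File_type,Url,Output_type
ENA,Assembly,https://www.ebi.ac.uk/ena/browser/api/xml/ASSEMBLY_ACCESSION,xml
COPO,Biosample,https://copo-project.org/api/sample/biosampleAccession/BIOSAMPLE_ACCESSION?standard=tol&return_type=json,json
"""


def fill_urls(template_text, values):
    """Hypothetical helper: replace each placeholder token in the Url column."""
    rows = []
    for source, file_type, url, output_type in csv.reader(io.StringIO(template_text)):
        if source.startswith("#"):  # skip the commented header row
            continue
        for placeholder, value in values.items():
            url = url.replace(placeholder, value)
        rows.append((source, file_type, url, output_type))
    return rows


filled = fill_urls(
    TEMPLATE,
    {"ASSEMBLY_ACCESSION": "GCA_922984935.2", "BIOSAMPLE_ACCESSION": "SAMEA7524400"},
)
for row in filled:
    print(row)
```

Keeping the URLs in a CSV rather than in code means a new metadata source can, in principle, be added by appending a row, provided a matching parser exists for its output type.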
Binary file added assets/genome_note_template.docx
Binary file not shown.
34 changes: 34 additions & 0 deletions assets/genome_note_template.xml
@@ -0,0 +1,34 @@
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article>
<article>
<body>
<sec>
<title>Species taxonomy</title>
<p>{{ TAX_STRING }};
<italic>{{ GENUS }}</italic>;
<italic>{{ GENUS_SPECIES }}</italic> ($TAXONOMY_AUTHORITY) (NCBI:txid{{ NCBI_TAXID }}) {{ TEST_NOT_REPLACED }}.
</p>
</sec>
<sec>
<table>
<thead>
<tr>
<th align="center" valign="top">INSDC accession</th>
<th align="center" valign="top">Chromosome</th>
<th align="center" valign="top">Length (Mb)</th>
<th align="center" valign="top">GC%</th>
</tr>
</thead>
<tbody>
{% for chromosome in CHR_TABLE %}
<tr>
<td align="left" valign="top">{{ chromosome.get('Accession') }}</td>
<td align="center" valign="top">{{ chromosome.get('Chromosome') }}</td>
<td align="center" valign="top">{{ chromosome.get('Length') }}</td>
<td align="center" valign="top">{{ chromosome.get('GC') }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</sec>
</body>
</article>
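The XML template above uses Jinja-style `{{ NAME }}` placeholders, and `TEST_NOT_REPLACED` appears to be there to confirm that unknown placeholders survive rendering (see the commits "preserve undefined variables in template" and "Added jinja variable that should never be replaced…"). The sketch below imitates that behaviour with the standard library only. `fill_placeholders` is a hypothetical helper that handles scalar `{{ NAME }}` placeholders but not `{% for %}` blocks; it is not the code in `bin/populate_genome_note_template.py`, and the taxon ID is a made-up example value.

```python
import re

# Matches {{ NAME }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_placeholders(text, params):
    """Replace {{ NAME }} when NAME is in params; leave it untouched otherwise."""

    def substitute(match):
        name = match.group(1)
        return str(params[name]) if name in params else match.group(0)

    return PLACEHOLDER.sub(substitute, text)


line = "<italic>{{ GENUS_SPECIES }}</italic> (NCBI:txid{{ NCBI_TAXID }}) {{ TEST_NOT_REPLACED }}."
# Example values only; 12345 is not a real taxon ID for this species.
result = fill_placeholders(line, {"GENUS_SPECIES": "Ceramica pisi", "NCBI_TAXID": 12345})
print(result)
```

Leaving unknown placeholders visible in the output, rather than silently dropping them, makes it easy to spot which fields still need to be completed by hand in the partially completed genome note.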
7 changes: 3 additions & 4 deletions assets/samplesheet.csv
@@ -1,5 +1,4 @@
 sample,datatype,datafile
-uoEpiScrs1,pacbio,https://tolit.cog.sanger.ac.uk/test-data/Epithemia_sp._CRS-2021b/genomic_data/uoEpiScrs1/pacbio/m64228e_220617_134154.ccs.bc1015_BAK8B_OA--bc1015_BAK8B_OA.rmdup.subset.bam
-uoEpiScrs1,pacbio,https://tolit.cog.sanger.ac.uk/test-data/Epithemia_sp._CRS-2021b/genomic_data/uoEpiScrs1/pacbio/m64016e_220621_193126.ccs.bc1008_BAK8A_OA--bc1008_BAK8A_OA.rmdup.subset.bam
-uoEpiScrs1c,hic,https://tolit.cog.sanger.ac.uk/test-data/Epithemia_sp._CRS-2021b/analysis/uoEpiScrs1.1/read_mapping/hic/GCA_946965045.1.unmasked.hic.uoEpiScrs1.subsampled.cram
-uoEpiScrs1b,hic,https://tolit.cog.sanger.ac.uk/test-data/Epithemia_sp._CRS-2021b/analysis/uoEpiScrs1.1/read_mapping/hic/GCA_946965045.1.unmasked.hic.uoEpiScrs1.subsampled.bam
+ilCerPisi1,pacbio,https://tolit.cog.sanger.ac.uk/test-data/Ceramica_pisi/genomic_data/ilCerPisi1/pacbio/m84047_230817_174414_s3.ccs.bc2048.subsampled.bam
+ilCerPisi1,pacbio,https://tolit.cog.sanger.ac.uk/test-data/Ceramica_pisi/genomic_data/ilCerPisi1/pacbio/m64097e_230309_154741.ccs.bc1012_BAK8A_OA--bc1012_BAK8A_OA.subsampled.bam
+ilCerPisi1,hic,https://tolit.cog.sanger.ac.uk/test-data/Ceramica_pisi/analysis/ilCerPisi1.1/read_mapping/hic/GCA_963859965.1.unmasked.hic.ilCerPisi2.subsampled.cram
145 changes: 145 additions & 0 deletions bin/check_parameters.py
@@ -0,0 +1,145 @@
#!/usr/bin/env python3

import os
import sys
import requests
import argparse


def parse_args(args=None):
Description = "Use the genome assembly accession to fetch additional infromation on genome from ENA"
Epilog = "Example usage: python check_parameters.py --assembly --wgs_biosample --output"

parser = argparse.ArgumentParser(description=Description, epilog=Epilog)
parser.add_argument("--assembly", required=True, help="The INSDC accession for the assembly")
parser.add_argument("--wgs_biosample", required=True, help="The biosample accession for the WGS data")
parser.add_argument("--hic_biosample", required=False, help="The biosample accession for the Hi-C data")
parser.add_argument("--rna_biosample", required=False, help="The biosample accession for the RNASeq data")
parser.add_argument("--output", required=True, help="Output file path")
return parser.parse_args()


def make_dir(path):
    if len(path) > 0:
        os.makedirs(path, exist_ok=True)


def fetch_assembly_data(assembly, wgs_biosample, hic_biosample, rna_biosample, output_file):
    url = f"https://www.ebi.ac.uk/ena/portal/api/search?query=assembly_set_accession%3D%22{assembly}%22&result=assembly&fields=assembly_set_accession%2Ctax_id%2Cscientific_name%2Cstudy_accession&limit=0&download=true&format=json"
    response = requests.get(url)

    if response.status_code == 200:
        assembly_data = response.json()
        taxon_id = assembly_data[0].get("tax_id", None)
        species = assembly_data[0].get("scientific_name", None).replace(" ", "_")
        study = assembly_data[0].get("study_accession", None)
        params = [assembly, species, taxon_id]
        header = ["assembly", "species", "taxon_id"]

        # Look up the parent study (Bioproject) linked to the assembly
        if study:
            study_url = f"https://www.ebi.ac.uk/ena/portal/api/search?query=study_accession%3D%22{study}%22&result=study&fields=parent_study_accession&limit=0&download=true&format=json"
            study_response = requests.get(study_url)

            if study_response.status_code == 200:
                study_data = study_response.json()
                studies = study_data[0].get("parent_study_accession").split(";")
                params.append(studies[0])
                header.append("bioproject")

            else:
                raise AssertionError(f"Could not determine the Bioproject linked to this assembly {assembly}\n")
        else:
            raise AssertionError(f"Could not determine the Bioproject linked to this assembly {assembly}\n")

        # Validate wgs_biosample
        wgs_url = f"https://www.ebi.ac.uk/ena/portal/api/search?query=sample_accession%3D%22{wgs_biosample}%22&result=sample&fields=sample_accession%2Ctax_id&limit=0&download=true&format=json"
        wgs_response = requests.get(wgs_url)

        if wgs_response.status_code == 200:
            wgs_data = wgs_response.json()
            tax_id = wgs_data[0].get("tax_id")

            if tax_id != taxon_id:
                raise AssertionError(
                    f"The WGS biosample taxon id: {tax_id} does not match the assembly taxon id: {taxon_id}\n"
                )
            else:
                params.append(wgs_biosample)
                header.append("wgs_biosample")

        else:
            raise AssertionError(f"The WGS biosample id: {wgs_biosample} could not be retrieved from ENA\n")

        # Validate hic_biosample
        if hic_biosample and hic_biosample != "null":
            print(hic_biosample)
            hic_url = f"https://www.ebi.ac.uk/ena/portal/api/search?query=sample_accession%3D%22{hic_biosample}%22&result=sample&fields=sample_accession%2Ctax_id&limit=0&download=true&format=json"
            hic_response = requests.get(hic_url)

            if hic_response.status_code == 200:
                hic_data = hic_response.json()
                hic_tax_id = hic_data[0].get("tax_id")

                if hic_tax_id != taxon_id:
                    raise AssertionError(
                        f"The Hi-C biosample taxon id: {hic_tax_id} does not match the assembly taxon id: {taxon_id}\n"
                    )
                else:
                    header.append("hic_biosample")
                    params.append(hic_biosample)

            else:
                raise AssertionError(f"The Hi-C biosample id: {hic_biosample} could not be retrieved from ENA\n")
        else:
            header.append("hic_biosample")
            params.append("null")

        # Validate rna_biosample
        if rna_biosample and rna_biosample != "null":
            rna_url = f"https://www.ebi.ac.uk/ena/portal/api/search?query=sample_accession%3D%22{rna_biosample}%22&result=sample&fields=sample_accession%2Ctax_id&limit=0&download=true&format=json"
            rna_response = requests.get(rna_url)

            if rna_response.status_code == 200:
                rna_data = rna_response.json()
                rna_tax_id = rna_data[0].get("tax_id")

                if rna_tax_id != taxon_id:
                    raise AssertionError(
                        f"The RNASeq biosample taxon id: {rna_tax_id} does not match the assembly taxon id: {taxon_id}\n"
                    )
                else:
                    header.append("rna_biosample")
                    params.append(rna_biosample)

            else:
                raise AssertionError(f"The RNASeq biosample id: {rna_biosample} could not be retrieved from ENA\n")

        else:
            header.append("rna_biosample")
            params.append("null")

        with open(output_file, "w") as fout:
            # Write the header row, then the matching row of parameter values
            fout.write(",".join(header) + "\n")
            fout.write(",".join(params) + "\n")

        return output_file
    else:
        raise AssertionError(f"The assembly accession: {assembly} was not found\n")


def main(args=None):
    args = parse_args(args)
    hic_biosample = args.hic_biosample
    rna_biosample = args.rna_biosample
    fetch_assembly_data(
        args.assembly,
        args.wgs_biosample,
        hic_biosample,
        rna_biosample,
        args.output,
    )


if __name__ == "__main__":
    sys.exit(main())
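
Review note: the script builds four ENA portal URLs by hand, each with the same percent-encoded query string. As a possible simplification, the sketch below shows how one helper could produce the same URLs with the standard library. `build_ena_url` and `ENA_SEARCH` are hypothetical names that are not part of this PR, and the helper assumes the endpoint and parameter set used in `fetch_assembly_data` above.

```python
from urllib.parse import urlencode

# Assumption: the same ENA portal search endpoint used in check_parameters.py
ENA_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_ena_url(field, accession, result, fields):
    """Hypothetical helper: build an ENA portal search URL for field="accession"."""
    query = {
        "query": f'{field}="{accession}"',
        "result": result,
        "fields": ",".join(fields),
        "limit": 0,
        "download": "true",
        "format": "json",
    }
    # urlencode percent-encodes '=', '"' and ',' in the values (%3D, %22, %2C)
    return f"{ENA_SEARCH}?{urlencode(query)}"


# Placeholder accession, used only to show the shape of the resulting URL
example_url = build_ena_url("sample_accession", "SAMEA0000000", "sample", ["sample_accession", "tax_id"])
```

With a helper like this, the WGS, Hi-C and RNASeq checks would differ only in the accession they pass in, which might also make it easier to share the taxon-id comparison between them.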