Dev #19

Merged
merged 26 commits into from
May 13, 2024
e0ad270
remove comment miniac_gw script getStats process
nicomaper Sep 21, 2023
d127597
Remove code duplication in main workflow
Sep 21, 2023
a059ef2
Add missing curly brace
Sep 21, 2023
2c92d71
Use switch for setting species id
Sep 21, 2023
b52ec46
Add missing breaks to switch
Sep 21, 2023
3138b1c
Exclude nextflow cache dir and logs from version control
Sep 21, 2023
9dfe14d
Rename gene metadata files to match other file path structure
Sep 21, 2023
ddc800e
Rename MotMapsFile_gw and *_lb parameters to just MotifMapsFile
Sep 21, 2023
f8aa8cf
Refine gitignore
Sep 21, 2023
3b1acb2
Merge branch 'dev' into feature/remove-duplication-from-main-workflow
Sep 21, 2023
f67e021
Merge branch 'dev' into feature/remove-commented-code
nicomaper Sep 21, 2023
b414990
Ignore bin folder
Sep 21, 2023
768eddb
Merge branch 'dev' into feature/remove-commented-code
nicomaper Sep 27, 2023
8a7df76
Merge branch 'dev' into feature/remove-duplication-from-main-workflow
nicomaper Sep 27, 2023
c645c02
Merge branch 'feature/automated-e2e-test' into feature/remove-duplica…
Sep 28, 2023
b81569c
Revert accidental change of p-value for locus-based
Sep 28, 2023
99ea93b
Provide gene coords for gw mode only
Sep 28, 2023
b5637ee
Correct test input file paths due to rename
Sep 28, 2023
d70ee8d
Pass full params object from main workflow to gw and lb workflows
Sep 28, 2023
6baf449
Merge branch 'dev' into feature/remove-duplication-from-main-workflow
nicomaper Sep 29, 2023
bef0d07
update on .gitignore
nicomaper Sep 29, 2023
9200afb
Merge branch 'dev' into feature/remove-duplication-from-main-workflow
Oct 16, 2023
ed3f43d
Merge pull request #3 from VIB-PSB/feature/remove-duplication-from-ma…
nicomaper Oct 27, 2023
6503a60
Merge branch 'dev' into feature/remove-commented-code
hdbeukel May 13, 2024
d637a1f
Merge pull request #2 from VIB-PSB/feature/remove-commented-code
hdbeukel May 13, 2024
7d191b4
Merge pull request #20 from VIB-PSB/main
nicomaper May 13, 2024
6 changes: 3 additions & 3 deletions docs/configuration_pipeline.md
@@ -162,7 +162,7 @@ There are mainly two cases in which the user might want to alter the internal MI

### Modification of the motif mapping file for the locus-based mode of maize

-By default, the maize MINI-AC locus-based mode (for both genome versions) runs on the "medium" non-coding genomic space, which corresponds, for each locus in the genome, to the 5kb upstream of the translation start site, the 1kb downstream of the translation end site, and the introns. However, we generated two additional motif mapping files for the locus-based mode of maize, that cover "large" (15kb upstream of the translation start site, the 2.5kb downstream of the translation end site, and the introns), and "small" (1kb upstream of the translation start site, the 1kb downstream of the translation end site, and the introns) non-coding genomic spaces. For Arabidopsis only the "medium" non-coding genomic space motif mapping file was generated because it already covers 73.5% of the whole non-coding genomic psace (see publication). To use these files, first they need to be downloaded, and then, the corresponding parameters of the motif mapping file (```MotMapsFile_lb```) and the non-coding genomic space coordinates file (```Promoter_file```) should be modified either on the command line or in the configuration file.
+By default, the maize MINI-AC locus-based mode (for both genome versions) runs on the "medium" non-coding genomic space, which corresponds, for each locus in the genome, to the 5kb upstream of the translation start site, the 1kb downstream of the translation end site, and the introns. However, we generated two additional motif mapping files for the locus-based mode of maize, that cover "large" (15kb upstream of the translation start site, the 2.5kb downstream of the translation end site, and the introns), and "small" (1kb upstream of the translation start site, the 1kb downstream of the translation end site, and the introns) non-coding genomic spaces. For Arabidopsis only the "medium" non-coding genomic space motif mapping file was generated because it already covers 73.5% of the whole non-coding genomic psace (see publication). To use these files, first they need to be downloaded, and then, the corresponding parameters of the motif mapping file (```MotMapsFile```) and the non-coding genomic space coordinates file (```Promoter_file```) should be modified either on the command line or in the configuration file.

To download the maize "large" motif mapping file and coordinates of the "large" non-coding genomic space:

@@ -192,14 +192,14 @@ wget https://zenodo.org/record/8386283/files/zma_v5_promoter_1kbup_1kbdown_sorte
Then (using the "small" definition as example), change the parameters on the command line:

```
-nextflow -C mini_ac.config run mini_ac.nf --mode locus_based --species maize_v4 --MotMapsFile_lb data/zma_v4/zma_v4_locus_based_motif_mappings_1kbup_1kbdown.bed --Promoter_file data/zma_v4/zma_v4_promoter_1kbup_1kbdown_sorted.bed
+nextflow -C mini_ac.config run mini_ac.nf --mode locus_based --species maize_v4 --MotMapsFile data/zma_v4/zma_v4_locus_based_motif_mappings_1kbup_1kbdown.bed --Promoter_file data/zma_v4/zma_v4_promoter_1kbup_1kbdown_sorted.bed
```
or add them to the configuration file, along with the other parameters:

```nextflow
params {
/// [Other parameters...]
-MotMapsFile_lb = "$projectDir/data/zma_v4/zma_v4_locus_based_motif_mappings_1kbup_1kbdown.bed"
+MotMapsFile = "$projectDir/data/zma_v4/zma_v4_locus_based_motif_mappings_1kbup_1kbdown.bed"
Promoter_file = "$projectDir/data/zma_v4/zma_v4_promoter_1kbup_1kbdown_sorted.bed"
/// [Other parameters...]
}
142 changes: 33 additions & 109 deletions mini_ac.nf
@@ -10,129 +10,53 @@ workflow MINIAC {
params.Shuffle_seed = -1
params.Csv_output = false

if (params.mode == "genome_wide" && params.species == "maize_v4") {

params.MotMapsFile_gw = "$projectDir/data/zma_v4/zma_v4_genome_wide_motif_mappings.bed"
params.Non_cod_genome = "$projectDir/data/zma_v4/zma_v4_noncod_merged.bed"
params.Faix_file = "$projectDir/data/zma_v4/zma_v4.fasta.fai"
params.Motif_tf_file = "$projectDir/data/zma_v4/zma_v4_motif_TF_file.txt"
params.Genes_coords = "$projectDir/data/zma_v4/zma_v4_genes_coords_sorted.bed"
params.Feature_file = "$projectDir/data/zma_v4/zma_v4_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/zma_v4/zma_v4_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/zma_v4/maize_v4_gene_metadata_file.txt"
params.P_val = 0.1

genome_wide_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.Second_gene_annot, params.Second_gene_dist, params.MotMapsFile_gw,
params.Non_cod_genome, params.Faix_file, params.Motif_tf_file, params.Genes_coords, params.Feature_file,
params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)
// define species id used for data subfolder and data file prefix
def species
switch(params.species) {
case "arabidopsis":
species = "ath"
break
case "maize_v4":
species = "zma_v4"
break
case "maize_v5":
species = "zma_v5"
break
default:
exit 1, "MINI-AC can only be run for the species 'arabidopsis', 'maize_v4' and 'maize_v5'. Instead it got '${params.species}'."
}

else if (params.mode == "genome_wide" && params.species == "maize_v5") {
// set input data parameters shared between genome-wide and locus-based modes
params.Faix_file = "$projectDir/data/${species}/${species}.fasta.fai"
params.Motif_tf_file = "$projectDir/data/${species}/${species}_motif_TF_file.txt"
params.Feature_file = "$projectDir/data/${species}/${species}_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/${species}/${species}_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/${species}/${species}_gene_metadata_file.txt"

params.MotMapsFile_gw = "$projectDir/data/zma_v5/zma_v5_genome_wide_motif_mappings.bed"
params.Non_cod_genome = "$projectDir/data/zma_v5/zma_v5_noncod_merged.bed"
params.Faix_file = "$projectDir/data/zma_v5/zma_v5.fasta.fai"
params.Motif_tf_file = "$projectDir/data/zma_v5/zma_v5_motif_TF_file.txt"
params.Genes_coords = "$projectDir/data/zma_v5/zma_v5_genes_coords_sorted.bed"
params.Feature_file = "$projectDir/data/zma_v5/zma_v5_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/zma_v5/zma_v5_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/zma_v5/maize_v5_gene_metadata_file.txt"
params.P_val = 0.1
if (params.mode == "genome_wide") {

genome_wide_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.Second_gene_annot, params.Second_gene_dist, params.MotMapsFile_gw,
params.Non_cod_genome, params.Faix_file, params.Motif_tf_file, params.Genes_coords, params.Feature_file,
params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)
}

else if (params.mode == "genome_wide" && params.species == "arabidopsis") {
params.MotMapsFile = "$projectDir/data/${species}/${species}_genome_wide_motif_mappings.bed"
params.Non_cod_genome = "$projectDir/data/${species}/${species}_noncod_merged.bed"
params.Genes_coords = "$projectDir/data/${species}/${species}_genes_coords_sorted.bed"

params.MotMapsFile_gw = "$projectDir/data/ath/ath_genome_wide_motif_mappings.bed"
params.Non_cod_genome = "$projectDir/data/ath/ath_noncod_merged.bed"
params.Faix_file = "$projectDir/data/ath/ath.fasta.fai"
params.Motif_tf_file = "$projectDir/data/ath/ath_motif_TF_file.txt"
params.Genes_coords = "$projectDir/data/ath/ath_genes_coords_sorted.bed"
params.Feature_file = "$projectDir/data/ath/ath_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/ath/ath_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/ath/arabidopsis_gene_metadata_file.txt"
params.P_val = 0.1

genome_wide_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.Second_gene_annot, params.Second_gene_dist, params.MotMapsFile_gw,
params.Non_cod_genome, params.Faix_file, params.Motif_tf_file, params.Genes_coords, params.Feature_file,
params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)

}

else if (params.mode == "locus_based" && params.species == "maize_v4") {

params.MotMapsFile_lb = "$projectDir/data/zma_v4/zma_v4_locus_based_motif_mappings_5kbup_1kbdown.bed"
params.Promoter_file = "$projectDir/data/zma_v4/zma_v4_promoter_5kbup_1kbdown_sorted.bed"
params.Faix_file = "$projectDir/data/zma_v4/zma_v4.fasta.fai"
params.Motif_tf_file = "$projectDir/data/zma_v4/zma_v4_motif_TF_file.txt"
params.Feature_file = "$projectDir/data/zma_v4/zma_v4_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/zma_v4/zma_v4_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/zma_v4/maize_v4_gene_metadata_file.txt"
params.P_val = 0.01

locus_based_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.MotMapsFile_lb, params.Promoter_file, params.Faix_file, params.Motif_tf_file,
params.Feature_file, params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)

}
genome_wide_miniac(params)

} else if (params.mode == "locus_based") {

else if (params.mode == "locus_based" && params.species == "maize_v5") {
params.MotMapsFile = "$projectDir/data/${species}/${species}_locus_based_motif_mappings_5kbup_1kbdown.bed"
params.Promoter_file = "$projectDir/data/${species}/${species}_promoter_5kbup_1kbdown_sorted.bed"

params.MotMapsFile_lb = "$projectDir/data/zma_v5/zma_v5_locus_based_motif_mappings_5kbup_1kbdown.bed"
params.Promoter_file = "$projectDir/data/zma_v5/zma_v5_promoter_5kbup_1kbdown_sorted.bed"
params.Faix_file = "$projectDir/data/zma_v5/zma_v5.fasta.fai"
params.Motif_tf_file = "$projectDir/data/zma_v5/zma_v5_motif_TF_file.txt"
params.Feature_file = "$projectDir/data/zma_v5/zma_v5_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/zma_v5/zma_v5_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/zma_v5/maize_v5_gene_metadata_file.txt"
params.P_val = 0.01

locus_based_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.MotMapsFile_lb, params.Promoter_file, params.Faix_file, params.Motif_tf_file,
params.Feature_file, params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)

locus_based_miniac(params)

} else {
exit 1, "MINI-AC can only be run using the modes 'genome_wide' or 'locus_based'. Instead it got '${params.mode}'."
}

else if (params.mode == "locus_based" && params.species == "arabidopsis") {

params.MotMapsFile_lb = "$projectDir/data/ath/ath_locus_based_motif_mappings_5kbup_1kbdown.bed"
params.Promoter_file = "$projectDir/data/ath/ath_promoter_5kbup_1kbdown_sorted.bed"
params.Faix_file = "$projectDir/data/ath/ath.fasta.fai"
params.Motif_tf_file = "$projectDir/data/ath/ath_motif_TF_file.txt"
params.Feature_file = "$projectDir/data/ath/ath_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/ath/ath_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/ath/arabidopsis_gene_metadata_file.txt"
params.P_val = 0.01

locus_based_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.MotMapsFile_lb, params.Promoter_file, params.Faix_file, params.Motif_tf_file,
params.Feature_file, params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)
}

else {
exit 1, "MINI-AC can only be run using the modes 'genome_wide' and 'locus_based', and with the species 'arabidopsis', 'maize_v4' and 'maize_v5'. Instead it got '${params.species}' and '${params.mode}' "
}
}


workflow {
MINIAC()
}
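Reviewer note: the refactored workflow above derives every default data path from a short species id plus the run mode, instead of hard-coding one block per species and mode combination. The following is a minimal Python sketch of that derivation, written only to make the resulting paths easy to check. It is not part of the PR; the function name `default_data_paths` and the `SPECIES_IDS` table are hypothetical, while the file name patterns are copied from the new `mini_ac.nf` lines in this diff.

```python
from pathlib import PurePosixPath

# Species ids as assigned by the switch statement in the refactored workflow.
SPECIES_IDS = {"arabidopsis": "ath", "maize_v4": "zma_v4", "maize_v5": "zma_v5"}


def default_data_paths(project_dir, species_name, mode):
    """Return default input paths keyed by parameter name (illustrative helper)."""
    if species_name not in SPECIES_IDS:
        raise ValueError(f"unsupported species: {species_name!r}")
    species = SPECIES_IDS[species_name]
    base = PurePosixPath(project_dir) / "data" / species

    def data_file(suffix):
        # Every file lives in data/<species>/ and starts with the species id.
        return str(base / f"{species}{suffix}")

    # Parameters shared between genome-wide and locus-based modes.
    paths = {
        "Faix_file": data_file(".fasta.fai"),
        "Motif_tf_file": data_file("_motif_TF_file.txt"),
        "Feature_file": data_file("_go_gene_file.txt"),
        "TF_fam_file": data_file("_TF_family_file.txt"),
        "Genes_metadata": data_file("_gene_metadata_file.txt"),
    }
    if mode == "genome_wide":
        paths.update(
            MotMapsFile=data_file("_genome_wide_motif_mappings.bed"),
            Non_cod_genome=data_file("_noncod_merged.bed"),
            Genes_coords=data_file("_genes_coords_sorted.bed"),
        )
    elif mode == "locus_based":
        paths.update(
            MotMapsFile=data_file("_locus_based_motif_mappings_5kbup_1kbdown.bed"),
            Promoter_file=data_file("_promoter_5kbup_1kbdown_sorted.bed"),
        )
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    return paths
```

Calling `default_data_paths("/proj", "maize_v4", "genome_wide")` gives the same paths as the removed maize_v4 genome-wide block, except that `Genes_metadata` now uses the `zma_v4_` prefix in place of `maize_v4_`, and the motif mapping file is stored under `MotMapsFile` rather than `MotMapsFile_gw`.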
16 changes: 8 additions & 8 deletions tests/mini_ac.nf.test
@@ -22,14 +22,14 @@ nextflow_workflow {
Shuffle_seed = 42

//// Hard code data paths
-MotMapsFile_gw = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_genome_wide_motif_mappings_chr1.bed"
+MotMapsFile = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_genome_wide_motif_mappings_chr1.bed"
Non_cod_genome = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_noncod_merged_chr1.bed"
Faix_file = "${baseDir}/data/zma_v4/zma_v4.fasta.fai"
Motif_tf_file = "${baseDir}/data/zma_v4/zma_v4_motif_TF_file.txt"
Genes_coords = "${baseDir}/data/zma_v4/zma_v4_genes_coords_sorted.bed"
Feature_file = "${baseDir}/data/zma_v4/zma_v4_go_gene_file.txt"
TF_fam_file = "${baseDir}/data/zma_v4/zma_v4_TF_family_file.txt"
-Genes_metadata = "${baseDir}/data/zma_v4/maize_v4_gene_metadata_file.txt"
+Genes_metadata = "${baseDir}/data/zma_v4/zma_v4_gene_metadata_file.txt"
OBO_file = "${baseDir}/data/ontologies/go.obo"

//// Output folder
@@ -91,13 +91,13 @@ nextflow_workflow {
Shuffle_seed = 42

//// Hard code data paths
-MotMapsFile_lb = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_locus_based_motif_mappings_5kbup_1kbdown_chr1.bed"
+MotMapsFile = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_locus_based_motif_mappings_5kbup_1kbdown_chr1.bed"
Promoter_file = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_promoter_5kbup_1kbdown_sorted_chr1.bed"
Faix_file = "${baseDir}/data/zma_v4/zma_v4.fasta.fai"
Motif_tf_file = "${baseDir}/data/zma_v4/zma_v4_motif_TF_file.txt"
Feature_file = "${baseDir}/data/zma_v4/zma_v4_go_gene_file.txt"
TF_fam_file = "${baseDir}/data/zma_v4/zma_v4_TF_family_file.txt"
-Genes_metadata = "${baseDir}/data/zma_v4/maize_v4_gene_metadata_file.txt"
+Genes_metadata = "${baseDir}/data/zma_v4/zma_v4_gene_metadata_file.txt"
OBO_file = "${baseDir}/data/ontologies/go.obo"

//// Output folder
@@ -153,14 +153,14 @@ nextflow_workflow {
Shuffle_seed = 42

//// Hard code data paths
-MotMapsFile_gw = "${baseDir}/data/ath/ath_genome_wide_motif_mappings.bed"
+MotMapsFile = "${baseDir}/data/ath/ath_genome_wide_motif_mappings.bed"
Non_cod_genome = "${baseDir}/data/ath/ath_noncod_merged.bed"
Faix_file = "${baseDir}/data/ath/ath.fasta.fai"
Motif_tf_file = "${baseDir}/data/ath/ath_motif_TF_file.txt"
Genes_coords = "${baseDir}/data/ath/ath_genes_coords_sorted.bed"
Feature_file = "${baseDir}/data/ath/ath_go_gene_file.txt"
TF_fam_file = "${baseDir}/data/ath/ath_TF_family_file.txt"
-Genes_metadata = "${baseDir}/data/ath/arabidopsis_gene_metadata_file.txt"
+Genes_metadata = "${baseDir}/data/ath/ath_gene_metadata_file.txt"
OBO_file = "${baseDir}/data/ontologies/go.obo"

//// Output folder
@@ -220,13 +220,13 @@ nextflow_workflow {
Shuffle_seed = 42

//// Hard code data paths
-MotMapsFile_lb = "${baseDir}/data/ath/ath_locus_based_motif_mappings_5kbup_1kbdown.bed"
+MotMapsFile = "${baseDir}/data/ath/ath_locus_based_motif_mappings_5kbup_1kbdown.bed"
Promoter_file = "${baseDir}/data/ath/ath_promoter_5kbup_1kbdown_sorted.bed"
Faix_file = "${baseDir}/data/ath/ath.fasta.fai"
Motif_tf_file = "${baseDir}/data/ath/ath_motif_TF_file.txt"
Feature_file = "${baseDir}/data/ath/ath_go_gene_file.txt"
TF_fam_file = "${baseDir}/data/ath/ath_TF_family_file.txt"
-Genes_metadata = "${baseDir}/data/ath/arabidopsis_gene_metadata_file.txt"
+Genes_metadata = "${baseDir}/data/ath/ath_gene_metadata_file.txt"
OBO_file = "${baseDir}/data/ontologies/go.obo"

//// Output folder
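
Reviewer note: configuration files and command lines written against the previous parameter names will still use `MotMapsFile_gw` or `MotMapsFile_lb`, and may point at the old gene metadata file names. Below is a small Python sketch of how such text could be updated. It is an assumption-laden helper, not something shipped with this PR: `migrate_config_text` is a hypothetical name, and the rename tables only cover the changes visible in this diff.

```python
import re

# Old parameter names, both replaced by MotMapsFile in this PR.
OLD_PARAM = re.compile(r"\bMotMapsFile_(?:gw|lb)\b")

# Gene metadata files renamed to match the <species id>_ prefix convention.
FILE_RENAMES = {
    "maize_v4_gene_metadata_file.txt": "zma_v4_gene_metadata_file.txt",
    "maize_v5_gene_metadata_file.txt": "zma_v5_gene_metadata_file.txt",
    "arabidopsis_gene_metadata_file.txt": "ath_gene_metadata_file.txt",
}


def migrate_config_text(text):
    """Rewrite old parameter and file names in a config or command-line string."""
    text = OLD_PARAM.sub("MotMapsFile", text)
    for old_name, new_name in FILE_RENAMES.items():
        text = text.replace(old_name, new_name)
    return text
```

For example, `migrate_config_text("--MotMapsFile_lb data/x.bed")` returns `"--MotMapsFile data/x.bed"`. Text that already uses the new names is returned unchanged, so the helper can be applied repeatedly without side effects.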