ancestral: --genes does not accept file in contrast to --help string #1319

Closed
corneliusroemer opened this issue Sep 26, 2023 · 3 comments · Fixed by #1353
Labels
bug Something isn't working

Comments

@corneliusroemer
Member

corneliusroemer commented Sep 26, 2023

Current Behavior

Passing a file containing a list of gene names, e.g. augur ancestral --genes genes.txt, throws an error:

$ augur ancestral \
    --tree results/b1/tree.nwk \
    --annotation resources/genemap.gff \
    --alignment results/b1/aligned.fasta \
    --infer-ambiguous \
    --translations results/b1/%gene.fasta \
    --genes resources/genes.txt \
    --output-node-data results/b1/nt_muts.json \
    --inference joint

(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Couldn't find gene resources/genes.txt in GFF or GenBank file
Read in 1 features from reference sequence file
Processing gene: resources/genes.txt
Traceback (most recent call last):
  File "/Users/corneliusromer/code/augur/augur/__init__.py", line 66, in run
    return args.__command__.run(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/code/augur/augur/ancestral.py", line 313, in run
    feat = features[gene]
           ~~~~~~~~^^^^^^
KeyError: 'resources/genes.txt'


An error occurred (see above) that has not been properly handled by Augur.
To report this, please open a new issue including the original command and the error above:
    <https://github.com/nextstrain/augur/issues/new/choose>

Expected behavior

Help string says this should work:

 --genes GENES [GENES ...]
                        genes to translate (list or file containing list) (default: None)
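
For illustration, here is a minimal sketch of why the path ends up being treated as a gene name. It assumes the option is declared with nargs="+" (as the usage line above suggests) and that nothing expands a file path afterwards. The `features` dict and the `demo` parser are hypothetical stand-ins, not augur's actual code.

```python
import argparse

# Hypothetical stand-in for the features read from the annotation file.
features = {"OPG001": "feature-1", "OPG002": "feature-2"}

parser = argparse.ArgumentParser(prog="demo")
# One or more values are collected into a list; no file handling happens here.
parser.add_argument(
    "--genes",
    nargs="+",
    help="genes to translate (list or file containing list)",
)

args = parser.parse_args(["--genes", "resources/genes.txt"])

# The path arrives as a one-element list, so it is looked up like a gene name.
missing = [gene for gene in args.genes if gene not in features]
print(args.genes)
print(missing)
```

With this setup, `features["resources/genes.txt"]` would raise the same kind of KeyError seen in the traceback.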

Repro

See failing test in #1320
See action failure for example: https://github.com/nextstrain/augur/actions/runs/6307494622/job/17124219915#step:9:19

Possible solution

Quick fix: change docs

Better: accept a file. For monkeypox, it's not so nice to have to pass a string of about 800 characters (~140 genes).


@corneliusroemer
Member Author

Without being able to pass a file the command looks like this 😬

$ augur ancestral --tree results/b1/tree.nwk --annotation resources/genemap.gff --alignment results/b1/aligned.fasta --infer-ambiguous --translations results/b1/%gene.fasta --genes OPG001 OPG002 OPG003 OPG005 OPG015 OPG019 OPG021 OPG022 OPG023 OPG024 OPG025 OPG027 OPG029 OPG030 OPG031 OPG034 OPG035 OPG036 OPG037 OPG038 OPG039 OPG040 OPG042 OPG043 OPG044 OPG045 OPG046 OPG047 OPG048 OPG049 OPG050 OPG051 OPG052 OPG053 OPG054 OPG055 OPG056 OPG057 OPG058 OPG059 OPG060 OPG061 OPG062 OPG063 OPG064 OPG065 OPG066 OPG068 OPG069 OPG070 OPG071 OPG072 OPG074 OPG075 OPG076 OPG077 OPG078 OPG079 OPG080 OPG081 OPG082 OPG083 OPG084 OPG085 OPG086 OPG087 OPG088 OPG089 OPG090 OPG091 OPG092 OPG093 OPG094 OPG095 OPG096 OPG097 OPG098 OPG099 OPG100 OPG101 OPG102 OPG103 OPG104 OPG105 OPG106 OPG107 OPG108 OPG109 OPG110 OPG111 OPG112 OPG113 OPG114 OPG115 OPG116 OPG117 OPG118 OPG119 OPG120 OPG121 OPG122 OPG123 OPG124 OPG125 OPG126 OPG127 OPG128 OPG129 OPG130 OPG131 OPG132 OPG133 OPG134 OPG135 OPG136 OPG137 OPG138 OPG139 OPG140 OPG141 OPG142 OPG143 OPG144 OPG145 OPG146 OPG147 OPG148 OPG149 OPG150 OPG151 OPG153 OPG154 OPG155 OPG156 OPG157 OPG158 OPG159 OPG160 OPG161 OPG162 OPG163 OPG164 OPG165 OPG167 OPG170 OPG171 OPG172 OPG173 OPG174 OPG175 OPG176 OPG178 OPG180 OPG181 OPG185 OPG187 OPG188 OPG189 OPG190 OPG191 OPG192 OPG193 OPG195 OPG197 OPG198 OPG199 OPG200 OPG204 OPG205 OPG208 OPG209 OPG210 --output-node-data results/b1/nt_muts.json --inference joint

@victorlin
Member

I'm in favor of implementing the read from file functionality. I'd repurpose the existing function that parses the file for augur filter --include FILE since it serves a similar purpose and has useful features, notably allowing # comments and empty lines.
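
A minimal sketch of the kind of parsing described here, allowing # comments and empty lines. `parse_name_lines` is a hypothetical name used for illustration only; it is not the existing augur filter function, whose exact behavior may differ.

```python
def parse_name_lines(lines):
    """Return names from an iterable of lines, skipping blanks and '#' comments."""
    names = []
    for line in lines:
        # Keep only the text before the first '#', then trim whitespace.
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


# Example input: a full-line comment, an empty line, and a trailing comment.
example = ["# mpox genes\n", "OPG001\n", "\n", "OPG002  # trailing comment\n"]
print(parse_name_lines(example))
```

Taking an iterable of lines rather than a path keeps the sketch easy to test; an open file handle can be passed in directly.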

@jameshadfield
Member

jameshadfield commented Dec 4, 2023

Yes, we should 100% implement this. augur translate has this capability; here's how it's implemented:

augur/augur/translate.py

Lines 338 to 342 in 7cb3848

    # If genes is a file, read in the genes to translate
    if args.genes and len(args.genes) == 1 and os.path.isfile(args.genes[0]):
        genes = get_genes_from_file(args.genes[0])
    else:
        genes = args.genes

augur/augur/translate.py

Lines 299 to 315 in 7cb3848

def get_genes_from_file(fname):
    genes = []
    if os.path.isfile(fname):
        with open(fname, encoding='utf-8') as ifile:
            for line in ifile:
                fields = line.strip().split('#')
                if fields[0].strip():
                    genes.append(fields[0].strip())
    else:
        print("File with genes not found. Looking for", fname)

    unique_genes = np.unique(np.array(genes))
    if len(unique_genes) != len(genes):
        print("You have duplicates in your genes file. They are being ignored.")
    print("Read in {} specified genes to translate.".format(len(unique_genes)))

    return unique_genes

The code to actually parse the file is essentially identical to the implementation Victor linked to, so we should probably use the utils.py version in both ancestral and translate to reduce duplication.
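
As a rough sketch of what a shared list-or-file helper could look like: `resolve_genes` is a hypothetical name, not a function in augur's utils.py, and the file check mirrors the translate.py snippet quoted above. One deliberate difference is an assumption of this sketch: duplicates are dropped in first-seen order, whereas np.unique returns sorted values.

```python
import os
import tempfile


def resolve_genes(genes_arg):
    """Hypothetical helper: expand a single file path into the gene names it lists."""
    if genes_arg and len(genes_arg) == 1 and os.path.isfile(genes_arg[0]):
        with open(genes_arg[0], encoding="utf-8") as handle:
            # Drop '#' comments and surrounding whitespace, then skip empty lines.
            stripped = (line.split("#", 1)[0].strip() for line in handle)
            genes = [name for name in stripped if name]
    else:
        # Not a single existing file: treat the values as gene names.
        genes = list(genes_arg or [])
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(genes))


# Usage: the same call handles a file path and an explicit list.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "genes.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("OPG002\n# comment\nOPG001\nOPG002\n")
    from_file = resolve_genes([path])

from_list = resolve_genes(["OPG005", "OPG003"])
print(from_file, from_list)
```

Both ancestral and translate could then call one function on args.genes instead of each carrying its own copy of the parsing logic.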
