From cd150097b2b808d67d8928b6f7e6601e99c3dd14 Mon Sep 17 00:00:00 2001
From: Kim Andrews <17375001+kimandrews@users.noreply.github.com>
Date: Mon, 10 Jun 2024 11:49:25 -0700
Subject: [PATCH] Fixup: Add date annotations for rare genotypes

Six of the samples that are force-included in the Nextclade dataset tree
have empty collection date fields in the metadata output from NCBI
Datasets. This results in the samples being removed downstream by the
TreeTime clock filter.

This commit adds collection dates (which were manually extracted from
the strain names in the NCBI metadata) for these samples so that they
will be included in the Nextclade dataset tree.
---
 ingest/defaults/annotations.tsv | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/ingest/defaults/annotations.tsv b/ingest/defaults/annotations.tsv
index 386122a..a951ca9 100644
--- a/ingest/defaults/annotations.tsv
+++ b/ingest/defaults/annotations.tsv
@@ -146,3 +146,13 @@ U64582	date	1988-XX-XX
 X84865	date	1994-XX-XX
 X84872	date	1990-XX-XX
 X84879	date	1971-XX-XX
+#
+# Strains with rare genotypes
+# Dates are retrieved from epi-weeks reported within strain names on NCBI
+# These are force-included in the nextclade tree to boost representation of rare genotypes
+AF410989	date	1987-03-09 # genotype E
+AY037009	date	2000-06-12 # genotype G2
+AY037043	date	2000-04-17 # genotype H2
+AY037026	date	1997-03-24 # genotype H2
+AY037028	date	2000-03-13 # genotype D2
+FJ668380	date	2003-02-10 # genotype D10