This repository has been archived by the owner on Mar 17, 2023. It is now read-only.
In some analyses, I was getting a ton of matches to ribosomal proteins, mitochondrial genes, and ferritin.
Ferritin example
Ribosomal example
Mitochondrial example
A00111:192:HFVL5DMXX:1:1167:31439:29512 YP_009412754.1 100.0 2.3e-08 62.0 YP_009412754.1 cytochrome c oxidase subunit II (mitochondrion) [Microcebus arnholdi] 864580 Microcebus arnholdi Eukaryota Chordata
Why are these coming up?
Here is the GeneCards summary for ferritin (emphasis mine)
This gene encodes the heavy subunit of ferritin, the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in ferritin proteins are associated with several neurodegenerative diseases. This gene has multiple pseudogenes. Several alternatively spliced transcript variants have been observed, but their biological validity has not been determined. [provided by RefSeq, Jul 2008]
So it's very possible that these are only appearing in these genomes as a result of contamination, and weren't properly filtered out. If you look closely, the protein IDs start with either XP_ or YP_, whereas a "good" result looks like:
A00111:47:H2HT7DMXX:1:1172:23556:31454 NP_001334083.1 100.0 8.1e-09 63.5 NP_001334083.1 major urinary protein 22 precursor [Mus musculus] 10090 Mus musculus Eukaryota Chordata
Why is this happening?
From NCBI’s website: https://www.ncbi.nlm.nih.gov/books/NBK50679/#RefSeqFAQ.what_is_the_difference_between
Accession numbers that begin with the prefix XM_ (mRNA), XR_ (non-coding RNA), and XP_ (protein) are model RefSeqs produced either by NCBI’s genome annotation pipeline or copied from computationally annotated submissions to the INSDC. These RefSeq records are derived from the genome sequence and have varying levels of transcript or protein homology support. They represent the predicted transcripts and proteins annotated on the NCBI RefSeq contigs and may differ from INSDC mRNA submissions or from the subsequently curated RefSeq records (with NM_, NR_, or NP_ accession prefixes).
Therefore → ONLY NP_ RECORDS SHOULD BE USED FROM NCBI!!
XP_ records are computationally predicted models rather than curated entries, so hits to them are not to be trusted.
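As a rough illustration of the NP_-only rule, here is a minimal Python sketch that drops hits whose subject accession is not an NP_ record. It assumes each hit is one whitespace- or tab-separated line with the subject accession in the second column, as in the example rows in this issue. The function names `accession_prefix` and `keep_curated_hits` are hypothetical and are not part of any existing tool.

```python
def accession_prefix(accession: str) -> str:
    """Return the RefSeq-style prefix (e.g. 'NP_'), or '' if there is no underscore."""
    head, sep, _ = accession.partition("_")
    return head + sep if sep else ""


def keep_curated_hits(lines, allowed_prefixes=("NP_",)):
    """Keep only hit lines whose second column starts with an allowed prefix.

    Assumption: column 1 is the read ID and column 2 is the subject accession.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if accession_prefix(fields[1]) in allowed_prefixes:
            kept.append(line)
    return kept


# The two example rows quoted in this issue.
hits = [
    "A00111:192:HFVL5DMXX:1:1167:31439:29512 YP_009412754.1 100.0 2.3e-08 62.0 "
    "YP_009412754.1 cytochrome c oxidase subunit II (mitochondrion) "
    "[Microcebus arnholdi] 864580 Microcebus arnholdi Eukaryota Chordata",
    "A00111:47:H2HT7DMXX:1:1172:23556:31454 NP_001334083.1 100.0 8.1e-09 63.5 "
    "NP_001334083.1 major urinary protein 22 precursor [Mus musculus] "
    "10090 Mus musculus Eukaryota Chordata",
]

curated = keep_curated_hits(hits)
for hit in curated:
    print(hit)
```

Filtering after the search is only one option. Restricting the reference database to NP_ records before searching would have the same intent, but that depends on how the database was built.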