This CL adds a new import for NCBI Gene. The data cleaning and testing are documented on [GitHub](https://github.com/datacommonsorg/data/pull/1084). NCBI Gene is updated daily. We included the following datasets in this import: #1004

Merged
merged 1 commit into from
Nov 1, 2024

copybara-service[bot]
Contributor

This CL adds a new import for NCBI Gene. The data cleaning and testing are documented on GitHub. NCBI Gene is updated daily. We included the following datasets in this import:

  1. NCBI Gene.
  2. gene2pubmed.
  3. gene_neighbors.
  4. gene_orthologs.
  5. gene_group.
  6. mim2gene_medgen.
  7. gene2go.
  8. gene2accession.
  9. gene2ensembl.
  10. generifs_basic.

NCBI Gene is a comprehensive resource containing information about genes from a wide range of species. It serves as a central hub for gene-specific data, integrating information from various sources and providing links to other relevant resources. It includes gene identification (e.g. official gene symbols, aliases, and cross-references to other databases), sequence information (e.g. genomic location and reference sequences (RefSeqs) for genomic DNA, transcripts, proteins, and mature peptides), functional information (gene function descriptions, associated pathways, related biological processes, orthologs, and related genes), phenotypic associations (i.e. links to phenotypes and diseases associated with the gene), and links to relevant scientific papers (i.e. PubMed IDs).

"NCBI Gene supplies gene-specific connections in the nexus of map, sequence, expression, structure, function, citation, and homology data. Unique identifiers are assigned to genes with defining sequences, genes with known map positions, and genes inferred from phenotypic information. These gene identifiers are used throughout NCBI's databases and tracked through updates of annotation. Gene includes genomes represented by NCBI Reference Sequences (or RefSeqs) and is integrated for indexing and query and retrieval from NCBI's Entrez and E-Utilities systems. Gene comprises sequences from thousands of distinct taxonomic identifiers, ranging from viruses to bacteria to eukaryotes. It represents chromosomes, organelles, plasmids, viruses, transcripts, and millions of proteins."

copybara-service bot force-pushed the copybara2git_690868739 branch from 0f8b834 to abc6db7 on October 29, 2024 05:57
copybara-service bot force-pushed the copybara2git_690868739 branch from abc6db7 to d2d4d60 on October 29, 2024 06:15
copybara-service bot force-pushed the copybara2git_690868739 branch from d2d4d60 to aa40401 on October 30, 2024 15:35
copybara-service bot force-pushed the copybara2git_690868739 branch from aa40401 to 113ff2d on November 1, 2024 22:59
…g is documented on [GitHub](datacommonsorg/data#1084). NCBI Gene is updated daily. We included the following datasets in this import:

1. [NCBI Gene](https://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz).
2. [gene2pubmed](https://ftp.ncbi.nih.gov/gene/DATA/gene2pubmed.gz).
3. [gene_neighbors](https://ftp.ncbi.nih.gov/gene/DATA/gene_neighbors.gz).
4. [gene_orthologs](https://ftp.ncbi.nih.gov/gene/DATA/gene_orthologs.gz).
5. [gene_group](https://ftp.ncbi.nih.gov/gene/DATA/gene_group.gz).
6. [mim2gene_medgen](https://ftp.ncbi.nih.gov/gene/DATA/mim2gene_medgen).
7. [gene2go](https://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz).
8. [gene2accession](https://ftp.ncbi.nih.gov/gene/DATA/gene2accession.gz).
9. [gene2ensembl](https://ftp.ncbi.nih.gov/gene/DATA/gene2ensembl.gz).
10. [generifs_basic](https://ftp.ncbi.nih.gov/gene/GeneRIF/generifs_basic.gz).
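
As a rough illustration of how files like those listed above might be read, here is a minimal sketch. It is not the import code from this CL. It assumes each file is tab-delimited text whose first line is a header beginning with `#`, that multi-valued fields are separated by `|`, and that `-` marks a missing value. The helper names `read_ncbi_tsv` and `split_multi`, the column names, and the sample rows are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO


def read_ncbi_tsv(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of a tab-delimited file.

    Assumption: the first line is a header that starts with '#'.
    """
    header = handle.readline().rstrip("\r\n")
    columns = header.lstrip("# ").split("\t")
    # QUOTE_NONE keeps quote characters inside free-text fields untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if fields:  # skip blank lines
            yield dict(zip(columns, fields))


def split_multi(value: str, sep: str = "|") -> List[str]:
    """Split a multi-valued field; '-' or an empty string means no value (assumed)."""
    return [] if value in ("", "-") else value.split(sep)


# Made-up sample rows for illustration only; these are not real NCBI records.
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tSynonyms\n"
    "0\t111\tGENE_A\tALT1|ALT2\n"
    "0\t222\tGENE_B\t-\n"
)

for row in read_ncbi_tsv(io.StringIO(SAMPLE)):
    print(row["GeneID"], row["Symbol"], split_multi(row["Synonyms"]))
```

For a downloaded `.gz` file, the standard library's `gzip.open(path, "rt", encoding="utf-8", newline="")` returns a text handle that could be passed to `read_ncbi_tsv`. Judging by its URL, `mim2gene_medgen` is not gzip-compressed, so a plain `open` would apply there.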

[NCBI Gene](https://www.ncbi.nlm.nih.gov/gene) is a comprehensive resource containing information about genes from a wide range of species. It serves as a central hub for gene-specific data, integrating information from various sources and providing links to other relevant resources. It includes gene identification (e.g. official gene symbols, aliases, and cross-references to other databases), sequence information (e.g. genomic location and reference sequences (RefSeqs) for genomic DNA, transcripts, proteins, and mature peptides), functional information (gene function descriptions, associated pathways, related biological processes, orthologs, and related genes), phenotypic associations (i.e. links to phenotypes and diseases associated with the gene), and links to relevant scientific papers (i.e. PubMed IDs).

"[NCBI Gene](https://www.ncbi.nlm.nih.gov/gene) supplies gene-specific connections in the nexus of map, sequence, expression, structure, function, citation, and homology data. Unique identifiers are assigned to genes with defining sequences, genes with known map positions, and genes inferred from phenotypic information. These gene identifiers are used throughout NCBI's databases and tracked through updates of annotation. Gene includes genomes represented by [NCBI Reference Sequences](https://www.ncbi.nlm.nih.gov/refseq/) (or RefSeqs) and is integrated for indexing and query and retrieval from NCBI's Entrez and [E-Utilities](https://www.ncbi.nlm.nih.gov/books/NBK25501/) systems. Gene comprises sequences from thousands of distinct taxonomic identifiers, ranging from viruses to bacteria to eukaryotes. It represents chromosomes, organelles, plasmids, viruses, transcripts, and millions of proteins."

PiperOrigin-RevId: 692318175
copybara-service bot force-pushed the copybara2git_690868739 branch from 113ff2d to a391d90 on November 1, 2024 23:05
copybara-service bot merged commit a391d90 into main on Nov 1, 2024
copybara-service bot deleted the copybara2git_690868739 branch on November 1, 2024 23:05