feat: add script to get organisms and genomes from ncbi api (#159) #160

Merged
merged 6 commits into from
Nov 16, 2024

Conversation

hunterckx
Collaborator

No description provided.

"taxon": genome_info["organism"]["organism_name"],
"taxonomyId": genome_info["organism"]["tax_id"],
"accession": genome_info["accession"],
"isRef": (not (refseq_category is None)) and ("reference" in refseq_category),
Collaborator Author

Is there a more appropriate way to derive isRef?
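One possible shape for this, as a minimal sketch rather than a settled answer: the helper names below are hypothetical, and the example category strings ("reference genome", "representative genome") are an assumption about what the API returns, not something confirmed in this PR. The first helper keeps the substring check from the diff but spells it more idiomatically; the second is a stricter variant that only accepts an exact match.

```python
from typing import Optional

# Assumed value of refseq_category for a reference assembly; verify against
# real API responses before relying on it.
ASSUMED_REFERENCE_CATEGORY = "reference genome"


def is_ref_substring(refseq_category: Optional[str]) -> bool:
    """Same logic as the line under review, written with `is not None`."""
    return refseq_category is not None and "reference" in refseq_category


def is_ref_exact(refseq_category: Optional[str]) -> bool:
    """Stricter variant: true only for the assumed exact category string."""
    return refseq_category == ASSUMED_REFERENCE_CATEGORY
```

The substring form tolerates wording changes in the category value, while the exact form fails closed if the value is ever something unexpected. Which is preferable depends on how stable the upstream field is.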

assemblies_df = pd.DataFrame(requests.get(ASSEMBLIES_URL).json()["data"])[["ucscBrowser", "genBank", "refSeq"]]

gen_bank_merge_df = genomes_source_df.merge(assemblies_df, how="left", left_on="pairedAccession", right_on="genBank")
ref_seq_merge_df = genomes_source_df.merge(assemblies_df, how="left", left_on="accession", right_on="refSeq")
Collaborator Author

Do pairedAccession/genBank and accession/refSeq actually correspond like this? Do we need to be using both pairs?
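One way to check this empirically is to run each merge with `indicator=True` and count how many rows actually find a partner. The sketch below uses small made-up frames in place of the real `genomes_source_df` and `assemblies_df` (the accessions and `ucscBrowser` values are placeholders, and `match_counts` is a hypothetical helper), so it only illustrates the check and says nothing about the real data.

```python
import pandas as pd

# Toy stand-ins for the frames in the diff; all values are made up.
genomes_source_df = pd.DataFrame({
    "accession": ["GCF_000000001.1", "GCF_000000002.1", "GCA_000000003.1"],
    "pairedAccession": ["GCA_000000001.1", "GCA_000000002.1", None],
})
assemblies_df = pd.DataFrame({
    "ucscBrowser": ["hypotheticalDb1", "hypotheticalDb2"],
    "genBank": ["GCA_000000001.1", "GCA_999999999.1"],
    "refSeq": ["GCF_000000001.1", "GCF_000000002.1"],
})


def match_counts(left_on: str, right_on: str) -> int:
    """Count source rows that find a partner in assemblies_df for one key pair."""
    merged = genomes_source_df.merge(
        assemblies_df,
        how="left",
        left_on=left_on,
        right_on=right_on,
        indicator=True,  # adds a "_merge" column: "both" or "left_only"
    )
    return int((merged["_merge"] == "both").sum())


gen_bank_matches = match_counts("pairedAccession", "genBank")
ref_seq_matches = match_counts("accession", "refSeq")
print(f"genBank matches: {gen_bank_matches}, refSeq matches: {ref_seq_matches}")
```

If one pair consistently matches a superset of what the other matches on the real data, the second merge may be redundant; if each catches rows the other misses, both are needed.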

@hunterckx hunterckx marked this pull request as ready for review November 8, 2024 23:11
@hunterckx hunterckx force-pushed the hunter/159-genomes-from-ncbi-api branch from 27112d0 to eb0d485 Compare November 16, 2024 00:03
@hunterckx hunterckx changed the title feat: add script to get genomes from ncbi api (#159) feat: add script to get organisms and genomes from ncbi api (#159) Nov 16, 2024
@NoopDog NoopDog self-requested a review November 16, 2024 01:03
@NoopDog NoopDog merged commit 61413a2 into main Nov 16, 2024
1 check passed
@NoopDog NoopDog deleted the hunter/159-genomes-from-ncbi-api branch November 16, 2024 01:04
2 participants