BioMart request declined #248

Open
zhenzuo2 opened this issue Dec 4, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

zhenzuo2 commented Dec 4, 2024

Description of feature

Hi,

Thank you for developing epitopeprediction! I got an error when running line 1133:

transcriptProteinTable = ma.get_protein_ids_from_transcripts(transcripts, type=EIdentifierTypes.ENSEMBL)

If the input variant list is too long, BioMart declines my request (because it is queried too many times). Is there a way I can run it locally? Thank you!

Best,

Zhen

zhenzuo2 added the enhancement label Dec 4, 2024
@jonasscheid
Contributor

Hi! Thanks for using the Pipeline.

Currently, parsing a local BioMart version is not possible. Unfortunately, we rely on querying BioMart.

@jonasscheid
Contributor

A way to use a local BioMart version might be a good addition to the pipeline, e.g. by integrating PyEnsembl in the variant prediction part.

@zhenzuo2
Author

zhenzuo2 commented Dec 4, 2024

Thank you so much for your prompt response!

@christopher-mohr
Collaborator

Hi @zhenzuo2,

Did you try using the "splitting functionality" that is implemented in the pipeline? You can get an overview of the parameters that can be used here under "Run optimisation": https://nf-co.re/epitopeprediction/2.3.1/parameters/

Not sure if it would help in your case but it's worth a try.

Best,
Chris

@zhenzuo2
Author

Thank you for sharing this, Chris. I haven’t had a chance to try it yet. The issue is that, for security reasons, the computing servers we use are not allowed to connect to the internet. My current solution is to download the table from BioMart and use it as an input file for that function:

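import pandas as pd

def get_protein_ids_from_transcripts_offline(transcripts, data_path="mart_export.txt"):
    # Read the table exported from BioMart (pass sep="\t" if it was exported as TSV)
    df = pd.read_csv(data_path)
    columns = ["Protein stable ID", "RefSeq peptide ID",
               "UniProtKB/Swiss-Prot ID", "Transcript stable ID version"]
    # Keep only the rows for the requested transcripts
    result = df.loc[df["Transcript stable ID version"].isin(transcripts), columns].copy()
    # Rename the columns to short identifier names
    result.columns = ["ensembl_id", "refseq_id", "uniprot_id", "transcript_id"]
    print("Offline Now!")
    return result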

In a similar way, I changed a few other functions, such as generate_transcripts_from_variants() and generate_peptides_from_variants(). It works now. I will also try the "Run optimisation" parameters you mentioned.
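
For anyone who wants to try the same idea without a BioMart export at hand, here is a minimal, self-contained sketch of the offline lookup on a small in-memory table. It is only an illustration under assumptions: the helper name `lookup_protein_ids` and all IDs in the sample table are made up, and the column headers are simply the ones used in the function above. It also reports which requested transcripts are absent from the local table, which may be useful when the export is older than the annotation used for the variants.

```python
import io

import pandas as pd

# Made-up placeholder rows in the column layout of the BioMart export used above.
# The IDs are illustrative only, not real Ensembl/RefSeq/UniProt records.
MART_EXPORT = """Transcript stable ID version,Protein stable ID,RefSeq peptide ID,UniProtKB/Swiss-Prot ID
ENST_DEMO_A.1,ENSP_DEMO_A,NP_DEMO_A,UP_DEMO_A
ENST_DEMO_B.2,ENSP_DEMO_B,,UP_DEMO_B
ENST_DEMO_C.1,ENSP_DEMO_C,NP_DEMO_C,
"""


def lookup_protein_ids(transcripts, table):
    """Return protein IDs for the requested transcripts from a local table.

    Hypothetical helper for illustration; not part of the pipeline.
    """
    # Map the BioMart column headers to short identifier names
    columns = {
        "Protein stable ID": "ensembl_id",
        "RefSeq peptide ID": "refseq_id",
        "UniProtKB/Swiss-Prot ID": "uniprot_id",
        "Transcript stable ID version": "transcript_id",
    }
    # Select the rows whose transcript ID was requested
    mask = table["Transcript stable ID version"].isin(transcripts)
    return table.loc[mask, list(columns)].rename(columns=columns).reset_index(drop=True)


table = pd.read_csv(io.StringIO(MART_EXPORT))
requested = ["ENST_DEMO_A.1", "ENST_DEMO_C.1", "ENST_DEMO_MISSING.1"]

hits = lookup_protein_ids(requested, table)
# Transcripts that were requested but are not present in the local table
missing = sorted(set(requested) - set(hits["transcript_id"]))

print(hits)
print("Not in local table:", missing)
```

Empty cells in the export are read as NaN, so a transcript can match while one of its protein ID columns is still missing.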

3 participants