help with the input files #1

Open
bitcometz opened this issue Jun 15, 2023 · 3 comments

Comments

@bitcometz

Hello, thanks for your hard work reproducing the Geneformer examples, great job!

I want to redo the analysis too, but it is difficult for me to download the whole dataset.

Could you also provide the input file used in this notebook:
"/content/drive/MyDrive/Genecorpus-30M/genecorpus_100K_2048.dataset"

Thanks !!!

@kzkedzierska

I think they might have used this from the Geneformer dataset: https://huggingface.co/datasets/ctheodoris/Genecorpus-30M/tree/main/genecorpus_30M_2048.dataset
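
It is not confirmed in this thread how the notebook's 100K file was produced. As a rough sketch, someone who already has the full dataset locally could create a comparable subset as shown below. This assumes the directory is a Hugging Face `datasets` dataset written with `save_to_disk`; the function names, paths, and seed are hypothetical and are not taken from the notebook.

```python
import random


def subsample_indices(n_total, n_keep, seed=0):
    """Pick n_keep distinct row indices out of n_total, reproducibly."""
    n_keep = min(n_keep, n_total)  # never ask for more rows than exist
    rng = random.Random(seed)      # local RNG, so global random state is untouched
    return sorted(rng.sample(range(n_total), n_keep))


def subsample_dataset(src_dir, dst_dir, n_keep=100_000, seed=0):
    """Write a random subset of an on-disk dataset to dst_dir.

    Assumption: src_dir was created with `Dataset.save_to_disk`.
    """
    from datasets import load_from_disk  # Hugging Face `datasets` package

    ds = load_from_disk(src_dir)
    idx = subsample_indices(len(ds), n_keep, seed)
    ds.select(idx).save_to_disk(dst_dir)


# Hypothetical usage (paths are placeholders):
# subsample_dataset("genecorpus_30M_2048.dataset",
#                   "genecorpus_100K_2048.dataset")
```

A subset made this way will not contain the same cells as the file used in the notebook, so results may differ slightly.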

@bitcometz
Author

Thanks!
But the original full dataset (Genecorpus-30M) is too big for me to download.

@AnjaliS1

AnjaliS1 commented Jun 26, 2023

Hi, could I also get the 100k subsampled version of the dataset as the original dataset is also too large for me to download?


3 participants