Non existent hacker_news.stories public BQ dataset is referenced in multiple notebooks #2432

Open
MrCsabaToth opened this issue Sep 10, 2023 · 0 comments · May be fixed by #2433
MrCsabaToth commented Sep 10, 2023

I found this while going through the Machine Learning Engineer Learning Path (GDRC). In the Advanced NLP section of the NLP module ("Natural Language Processing on Google Cloud", https://www.cloudskillsboost.google/course_sessions/2920308/labs/363239), multiple labs reference a nonexistent public BQ dataset, bigquery-public-data.hacker_news.stories.

bigquery-public-data.hacker_news.stories does not exist; I think it was moved to full. There are also 9 notebooks in deepdive/09_sequence* that contain this deprecated reference. Fortunately, the "Text classification using reusable embeddings" lab (https://www.cloudskillsboost.google/course_sessions/2920308/labs/363216 : https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive2/text_classification/labs/reusable_embeddings.ipynb), which sits between the two labs referenced above and so comes after the AutoML lab, has a hint: the intended dataset is probably bigquery-public-data.hacker_news.full.

I'm crafting a PR.
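
For anyone who wants to check their own copies in the meantime, here is a minimal sketch of the rename described above. The helper names `find_deprecated_refs` and `rewrite_refs` are hypothetical (they are not part of the repo), and I'm assuming the replacement table really is bigquery-public-data.hacker_news.full. The two tables may not share a schema, so a query could still need manual changes after the rename.

```python
import re

# Matches the deprecated table in standard SQL (project.dataset.table)
# and legacy SQL (project:dataset.table) notation. The trailing \b keeps
# longer names such as "stories_extra" from matching.
DEPRECATED_REF = re.compile(r"(bigquery-public-data[.:]hacker_news\.)stories\b")


def find_deprecated_refs(text):
    """Return (line_number, line) pairs that mention the deprecated table."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if DEPRECATED_REF.search(line)
    ]


def rewrite_refs(text):
    """Point every deprecated reference at the assumed replacement table."""
    return DEPRECATED_REF.sub(r"\1full", text)
```

Both helpers work on plain text, so they can be applied to the raw contents of an .ipynb file read from disk. Reviewing the output of `find_deprecated_refs` before applying `rewrite_refs` seems safer than a blind replace.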
