Intake-ifying osm #20

Closed
wants to merge 5 commits into from

@jsignell jsignell commented May 23, 2019

This PR:

  • moves osm-1billion
  • makes that the default notebook for the project
  • intake-ifies that notebook to allow download only when called
  • adds the infrastructure for intake test catalogs

Blocked on intake/intake-parquet#11 and intake-parquet release.

osm/anaconda-project.yml (resolved)
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the file isn't downloaded yet. The following step will take some time:"

How do you know at this point whether it's been downloaded yet? Won't it be cached? It seems like this should instead say: "Note that the first time this cell is executed, the file will take some time to download, but subsequent runs will skip that step."
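The "slow on the first run, skipped afterwards" behaviour described in this comment can be illustrated with a minimal sketch. This is not how intake implements its caching; `ensure_cached` and its `fetch` argument are hypothetical names used only to show the pattern, and the download itself is passed in as a callable so the example stays self-contained.

```python
from pathlib import Path
from typing import Callable


def ensure_cached(cache_path: Path, fetch: Callable[[], bytes]) -> bool:
    """Write fetch() to cache_path only if the file is not there yet.

    Hypothetical helper for illustration; not part of intake.
    Returns True when a download happened, False when the cache was reused.
    """
    if cache_path.exists():
        # Later runs: the file is already on disk, so skip the slow step.
        return False
    # First run: create the cache directory and do the slow download.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(fetch())
    return True
```

With a helper like this, the notebook text cannot know in advance whether the download will happen, which is why wording along the lines of "the first time this cell is executed" is safer than "the file isn't downloaded yet".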

@@ -0,0 +1,6 @@
sources:
  osm_one_billion:
    description: Test data points to same fake osm-3billion file

Same as what? Try to reword in a way that makes sense when reading just this one file.

@jsignell
Contributor Author

There is a lingering issue here: the urlpath returned in https://github.com/intake/intake-parquet/blob/f029a36ba5c7a644b4faebd2a40d6ec21dbc5681/intake_parquet/source.py#L134 is actually a list of paths, where the first one is the path to the parquet dir, which is all that you actually need. This crops up when you run df.persist().
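One possible way to guard against this on the calling side is to normalise the value before using it. The sketch below is only an illustration under the assumption stated in the comment (a list whose first entry is the parquet directory); `parquet_root` is a hypothetical helper, not part of intake-parquet, and the paths in the usage note are made up.

```python
from typing import Sequence, Union


def parquet_root(urlpath: Union[str, Sequence[str]]) -> str:
    """Return a single path from a urlpath that may be a str or a list.

    Assumption taken from the comment above: when a list comes back,
    its first entry is the parquet directory, which is all that is needed.
    """
    if isinstance(urlpath, str):
        # Already a single path; nothing to unwrap.
        return urlpath
    if not urlpath:
        raise ValueError("urlpath is empty")
    # List case: keep only the first entry (the parquet directory).
    return urlpath[0]
```

For example, `parquet_root(["osm.parq", "osm.parq/part.0.parquet"])` would give back `"osm.parq"`, while a plain string is returned unchanged.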

@jsignell
Contributor Author

jsignell commented Jun 6, 2019

Closing this for now, although I won't delete the branch yet.

@jsignell jsignell closed this Jun 6, 2019
@philippjfr philippjfr deleted the osm branch January 21, 2022 15:04