R package for accessing data.gov.au open data sets via API #2

Open
jonocarroll opened this issue Mar 2, 2017 · 7 comments

Comments

@jonocarroll

As per ropensci/auunconf#16 -- this has a lot going for it, not the least of which is a similarity to #1.

  • An existing package framework to access the existing API (https://github.com/ropensci/ckanr)
  • Well-defined allocation of tasks (e.g. individuals extracting, cleaning and organising their datasets; a dashboard; an API backend; export methods; etc.)
  • Opportunity for learning (different ways of doing things)
  • Potential for a CRAN-releasable package

The data is mostly well organised, with attached metadata, various formats, and proper attribution to the relevant department. As far as I can tell it's an under-utilised resource, and there is currently a big push to make better use of it (e.g. GovHack challenges).

@adamhsparks

This would make a nice package.

There's a lot of data here, as you've noted. I think it would be good to have some focus, at least for the Unconf, so that it's achievable. A package that accesses a particular group of datasets would be achievable, or at least we could make a good start on one. For example, there are 637 shapefiles available (http://www.data.gov.au/dataset?tags=Earth+Sciences&res_format=SHP), or, perhaps more manageable, 7 arcgrid files (http://www.data.gov.au/dataset?tags=Earth+Sciences&res_format=arcgrid).
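The web filters in those links (`tags=...&res_format=...`) correspond to a filter query on CKAN's `package_search` action, which ckanr wraps from R. The following is a minimal sketch in Python for illustration only. It assumes that data.gov.au serves the standard CKAN action API at the base URL shown, which should be checked against the portal's documentation. `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: data.gov.au serves the standard CKAN action API at this path;
# confirm against the portal's own API documentation before relying on it.
DEFAULT_BASE_URL = "https://data.gov.au/api/3/action"


def build_search_url(tag=None, res_format=None, rows=100, base_url=DEFAULT_BASE_URL):
    """Build a CKAN package_search URL equivalent to the web filters
    ?tags=...&res_format=... on the data.gov.au dataset listing."""
    clauses = []
    if tag is not None:
        clauses.append(f'tags:"{tag}"')
    if res_format is not None:
        clauses.append(f'res_format:"{res_format}"')
    params = {"rows": rows}
    if clauses:
        # fq is a Solr filter query; the clauses are combined with AND
        params["fq"] = " AND ".join(clauses)
    return f"{base_url}/package_search?{urlencode(params)}"


# Shapefiles tagged "Earth Sciences", as in the first link above
print(build_search_url(tag="Earth Sciences", res_format="SHP"))
```

Keeping URL construction separate from the HTTP call makes the query logic easy to test without touching the network.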

I'm looking at spatial files since I tend to use those quite a bit, but I'm willing to help with other files. This type of data access is in line with what my two R packages currently on CRAN do.

@jonocarroll
Author

Pulling the data in at all would be the first step, but a valuable second step would be getting them R-ready, e.g. converting to sf objects. We could see how variable the data configurations are and whether or not an approach can be generalised.

If that all works out too easily, we could put some effort towards displaying them neatly like http://location.sa.gov.au/viewer/ or http://www.aginsight.sa.gov.au/ .

@jeffreyhanson

This would make an awesome R package.

Yeah, I agree with @jonocarroll: importing the datasets would make them much easier to work with.

So I guess the package would need at least two functions: one to list all the available data sets (with names, descriptions, and links), and a second to download and import a given data set. Like @jonocarroll says, we could also include a function that launches a shiny app to explore the data sets.
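The listing function could be little more than a reshaping of a CKAN `package_search` response. The following is a minimal sketch in Python for illustration only. It assumes the standard CKAN response layout, in which each dataset has `name`, `title` and `notes` (the description) plus a `resources` list whose entries carry `format` and `url`. `DatasetSummary` and `parse_package_search` are hypothetical names.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DatasetSummary:
    name: str
    title: str
    description: str
    links: list = field(default_factory=list)  # (format, url) pairs


def parse_package_search(payload: str) -> list:
    """Turn a CKAN package_search JSON response into dataset summaries."""
    body = json.loads(payload)
    if not body.get("success"):
        raise ValueError("CKAN API reported a failure")
    summaries = []
    for pkg in body["result"]["results"]:
        # Assumed CKAN layout: each resource has a format and a download URL
        links = [(res.get("format") or "", res["url"]) for res in pkg.get("resources", [])]
        summaries.append(DatasetSummary(
            name=pkg["name"],
            title=pkg.get("title") or "",
            description=pkg.get("notes") or "",  # notes may be null in CKAN
            links=links,
        ))
    return summaries
```

The download-and-import function would then pick one of the `links` and dispatch on its format (e.g. shapefile to an sf object, on the R side).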

Do you think the package should implement caching similar to raster::getData? That is, if the data set is already present in the output directory, the package would just load it instead of downloading it again.
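That getData-style cache amounts to "look in the output directory first, download only on a miss". The following is a minimal Python sketch of the pattern. `get_data` is a hypothetical name, and the `download` callback is an assumption standing in for whatever actually fetches the resource.

```python
from pathlib import Path
from typing import Callable, Union


def get_data(name: str, out_dir: Union[str, Path], download: Callable[[], bytes]) -> bytes:
    """Return the bytes for `name`, downloading only if no cached copy exists."""
    target = Path(out_dir) / name
    if target.exists():
        # Cache hit: load the copy already in the output directory
        return target.read_bytes()
    # Cache miss: fetch, save for next time, then return
    target.parent.mkdir(parents=True, exist_ok=True)
    data = download()
    target.write_bytes(data)
    return data
```

The same pattern could cache the transformed, R-ready version as well, by saving it under a separate file name.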

@jonocarroll
Author

Sounds like a useful feature, @jeffreyhanson -- especially if we're saving the transformed/R-ready versions (too?).

The data should be accessible via the API, which I believe https://github.com/ropensci/ckanr should handle okay. There's a good chance we can get a lot done on this in just 2 days, especially with a division of labour across the various aspects.

@adamhsparks

@jeffreyhanson, for caching data, I might suggest looking at rappdirs. I use getData(), and it frustrates me that it clutters the folder without telling me it will or asking where I want the files saved.

@jeffreyhanson

jeffreyhanson commented Mar 30, 2017

@adamhsparks rappdirs looks really handy - thanks for the heads up!

Perhaps we could list rappdirs under Suggests in the DESCRIPTION and use it if it's installed. Otherwise, it could save the data to the working directory (or a temporary directory?).
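"Use the optional dependency if it's installed, otherwise fall back" could look roughly like this. The sketch uses Python's platformdirs package as a stand-in for rappdirs, which is itself a port of Python's appdirs. The default app name `datagovau` and the function name `cache_dir` are hypothetical. Unlike the proposal above, the fallback here is the system temporary directory rather than the working directory.

```python
import tempfile
from pathlib import Path


def cache_dir(app_name: str = "datagovau") -> Path:
    """Pick a cache directory, preferring a per-user location when the
    optional platformdirs package is installed."""
    try:
        # Optional dependency, analogous to listing rappdirs under Suggests
        from platformdirs import user_cache_dir
    except ImportError:
        # Fallback: a subdirectory of the system temporary directory
        return Path(tempfile.gettempdir()) / app_name
    return Path(user_cache_dir(app_name))
```

Importing inside the function keeps the package usable when the optional dependency is missing.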

@adamhsparks

adamhsparks commented Mar 30, 2017

@jeffreyhanson, already ahead of you mate.

See my getCRUCLdata package: https://github.com/adamhsparks/getCRUCLdata

It uses exactly that approach when fetching data from the FTP site: tempdir() unless cache = TRUE. That said, I have rappdirs under Depends in DESCRIPTION, not just Suggests.

The CRU data won't change, but if the data here change, we'll need to compare the local files against the versions on the server.
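One way to compare local and server files is to check the local file's modification time against the `last_modified` timestamp that CKAN reports for a resource. The following is a minimal Python sketch. It assumes that timezone-less CKAN timestamps are UTC and that the local file's mtime reflects when it was downloaded. `is_stale` is a hypothetical name. Comparing a checksum, where the portal provides one, would be more robust than comparing timestamps.

```python
from datetime import datetime, timezone
from pathlib import Path


def is_stale(local_path, server_last_modified):
    """Return True if the local copy is missing or older than the server's
    reported modification time."""
    path = Path(local_path)
    if not path.exists():
        return True
    if not server_last_modified:
        # No timestamp from the server: keep using the cached copy
        return False
    server_time = datetime.fromisoformat(server_last_modified)
    if server_time.tzinfo is None:
        # Assumption: naive CKAN timestamps are in UTC
        server_time = server_time.replace(tzinfo=timezone.utc)
    local_time = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return server_time > local_time
```

A stale result would trigger a fresh download, which could then overwrite the cached copy.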


3 participants