How to manage multiple projects with the same data files #11

Open
khailper opened this issue Jan 15, 2019 · 2 comments

Comments

@khailper

At work, I'm working on multiple reports that all rely on common data files that occasionally update. Originally, each report lived in its own project with its own Data folder. I then ran into the issue of not being able to easily track which projects had the most recent data. As a result, I migrated them all to one my_reports mega-project with a single Data folder. Now I have trouble organizing my files in a way that follows a project-oriented workflow. Is there a way to have both reliable data and a good file structure?

@rbjanis

rbjanis commented Jan 15, 2019

I have a similar organizational issue! My work collects annual data sets, which we've been storing in a central Data folder. These data files get used in numerous projects, and it's been difficult to know how to store data and other project files in a way that still works with RStudio Projects. Besides copying the data files into each project's folder, is there a better way to handle this, one that avoids having many versions of data files that can easily get out of sync with the "true" data file in the Data folder?

@jennybc
Member

jennybc commented Jan 16, 2019

I'll recap an in-person conversation here. There are a few points to consider:

  • If you have a dataset that is used in many projects, the dataset should probably be its own "thing".
  • You might make it into a proper R package (a data package, like gapminder or babynames, but for your data). You could still keep it personal to you or your group, i.e. it doesn't have to go to CRAN. But the packaging infrastructure gets you a lot of useful structure.
  • Otherwise, you could create a symlink (a.k.a. shortcut or alias) from the central data store into the relevant projects. This makes the data look local to each project, but keeps you from copying it several times. I.e. you'd still have a single source of data truth. Some more words about this here: https://community.rstudio.com/t/project-oriented-workflow-setwd-rm-list-ls-and-computer-fires/3549/35?u=jennybryan
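
To make the symlink option concrete, here is a minimal sketch in Python. It is not from the original conversation; in R, base `file.symlink()` plays a similar role. The helper `link_shared_data`, the folder names `Data`, `report_a`, `report_b`, and the file `survey.csv` are all hypothetical, and everything happens in a temporary directory. It assumes an OS where the user may create symlinks (on Windows this can require extra privileges).

```python
import tempfile
from pathlib import Path
from typing import List


def link_shared_data(central: Path, project: Path, link_name: str = "data") -> Path:
    """Make project/link_name a symlink to the central data folder (hypothetical helper)."""
    link = project / link_name
    if link.is_symlink():
        # Reuse an existing link only if it already points at the central folder.
        if link.resolve() == central.resolve():
            return link
        raise FileExistsError(f"{link} is a symlink to a different location")
    if link.exists():
        # Refuse to clobber a real folder or file that may hold a local copy.
        raise FileExistsError(f"{link} exists and is not a symlink")
    project.mkdir(parents=True, exist_ok=True)
    link.symlink_to(central.resolve(), target_is_directory=True)
    return link


def demo() -> List[str]:
    """Link two made-up projects to one data folder, then update the data once."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        central = root / "Data"
        central.mkdir()
        (central / "survey.csv").write_text("year,n\n2018,10\n")

        links = [link_shared_data(central, root / name)
                 for name in ("report_a", "report_b")]

        # Update the single central copy; nothing is copied into the projects.
        (central / "survey.csv").write_text("year,n\n2018,10\n2019,12\n")

        # Each project reads the file through its own local-looking path.
        return [(link / "survey.csv").read_text() for link in links]


if __name__ == "__main__":
    for content in demo():
        print(content)
```

Each project can then read `data/survey.csv` with a project-relative path, while only one real copy of the file exists on disk.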
