Workflows failing for DataSets with multiple input files #32

Open
captainceramic opened this issue May 19, 2015 · 7 comments
Comments

@captainceramic
Contributor

See #30

@DamienIrving has a workflow that is failing whenever it is run with multiple input models.

@captainceramic captainceramic self-assigned this May 19, 2015
@captainceramic
Contributor Author

@DamienIrving - would you be able to check the failing workflow into cwsl-workflows repo? Maybe we could open a development branch - it would be good to see your exact workflow to repeat the bug.

@captainceramic
Contributor Author

@DamienIrving - actually, don't worry. I can replicate this. Sorry about this - I really thought this was covered by a unit test. I'm on it - it is a recent regression from something I did.

@DamienIrving
Contributor

@captainceramic Ok. If you change your mind let me know - I'm happy to push my workflows to cwsl-workflows if need be.

captainceramic added a commit that referenced this issue May 19, 2015
This included
	- Checking for valid combinations of attributes instead of hash values
	- Using the ArgumentCreator as an iterator (__iter__)
	- Removing confusing logger debugging statements
	- Continuing iteration instead of returning None when no matching
	  inputs are found for an output file.
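
The last three points in that commit message can be illustrated with a minimal sketch. This is not the actual cwsl code: the class name `ArgumentCreatorSketch`, its constructor arguments, and the file paths are all hypothetical, and the model and institute names are only sample values. It shows an iterator that checks each combination of attribute values against the files that really exist and skips (rather than returning `None` for) combinations with no matching input.

```python
from itertools import product


class ArgumentCreatorSketch:
    """Hypothetical stand-in for an argument creator; not the real cwsl class."""

    def __init__(self, output_constraints, available_inputs):
        # output_constraints: attribute name -> list of allowed values
        # available_inputs: list of dicts describing input files that exist
        self.output_constraints = output_constraints
        self.available_inputs = available_inputs

    def __iter__(self):
        names = sorted(self.output_constraints)
        value_lists = [self.output_constraints[name] for name in names]
        for values in product(*value_lists):
            wanted = dict(zip(names, values))
            # Compare attribute values directly instead of comparing hashes
            matches = [
                f for f in self.available_inputs
                if all(f.get(key) == value for key, value in wanted.items())
            ]
            if not matches:
                # No input for this combination: keep iterating
                continue
            yield wanted, matches


# Illustrative data only: two models, each belonging to one institute
inputs = [
    {"model": "ACCESS1-0", "institute": "CSIRO-BOM", "path": "a.nc"},
    {"model": "MIROC5", "institute": "MIROC", "path": "b.nc"},
]
creator = ArgumentCreatorSketch(
    {"model": ["ACCESS1-0", "MIROC5"], "institute": ["CSIRO-BOM", "MIROC"]},
    inputs,
)
results = list(creator)
```

With this sample data, the two mixed-up pairings (a model paired with the wrong institute) produce no matches and are skipped, so only the two valid combinations are yielded.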
@captainceramic
Contributor Author

@DamienIrving - I've pushed some code that should fix the simplest version of this bug (the one where models and institutes are getting mixed up). Can you have a go and see if it fixes it in your workflow?

The second, more complex issue concerns "mapping" constraints and is tracked in #14. Would you be able to post your desired workflow (the one with the arithmetic comparisons of two datasets) under that issue? I think it would make a good test case.

If you can get the multiple inputs going I'll close this issue.

@DamienIrving
Contributor

As noted in #35, I think I just found a bug with the handling of multiple input files in ensemble operations (i.e. where you want to pass all the files at once for the constraints that you are overwriting). The bug happens when I run a workflow where in_dataset has more than one experiment (e.g. rcp45 and rcp85). VisTrails freezes at the Ensemble Aggregation step and the workflow eventually stops, incomplete, with no error message (i.e. the VisTrails application hangs and I have to kill it). It works fine if there is only one experiment, but fails as soon as there is more than one.

@captainceramic
Contributor Author

I can reproduce this - looking at it now.

@captainceramic
Contributor Author

I think this is a combination of two things:

  • To implement some of the more complex combinations of keyword/positional/added constraint arguments I got a bit loose in trimming out unused Constraint values. This meant that for big ensembles we got enough possible combinations to slow things down. I have pushed some code to tighten this up.
  • Doing one of these big ensembles means moving around a lot of data, and I think that sometimes it just takes a really long time to stage in what is basically the entire tas or tos for rcp85 and rcp45 from the CMIP5 archive, then copy the whole thing, subset it, regrid it etc. Implementing some way to avoid overwriting existing files could be a big advantage here.
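
One possible way to avoid overwriting existing files, sketched here under assumptions rather than taken from cwsl: skip any job whose output already exists and is newer than all of its inputs. The helper names `needs_processing` and `pending_jobs`, and the `(input_paths, output_path)` job layout, are hypothetical.

```python
import os
import tempfile


def needs_processing(output_path, input_paths):
    """Return True if the output is missing or older than any input."""
    if not os.path.exists(output_path):
        return True
    out_mtime = os.path.getmtime(output_path)
    return any(os.path.getmtime(path) > out_mtime for path in input_paths)


def pending_jobs(jobs):
    """Keep only the (input_paths, output_path) jobs that still need running."""
    return [(ins, out) for ins, out in jobs if needs_processing(out, ins)]


# Small demonstration using empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.nc")
    done = os.path.join(tmp, "done.nc")
    missing = os.path.join(tmp, "missing.nc")
    for path in (src, done):
        open(path, "w").close()
    # Set explicit timestamps so the existing output is newer than its input
    os.utime(src, (1000, 1000))
    os.utime(done, (2000, 2000))
    todo = pending_jobs([([src], done), ([src], missing)])
```

In the demonstration only the job with the missing output remains in `todo`. A timestamp check like this is cheap, but it would not notice a changed command or changed constraints, so a real implementation might need a stronger test.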

@captainceramic captainceramic removed their assignment Jun 19, 2015