Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Dataset Collections to Object Oriented API #179

Merged
merged 12 commits into from
Jan 13, 2016
Merged

Add Dataset Collections to Object Oriented API #179

merged 12 commits into from
Jan 13, 2016

Conversation

alexjsmac
Copy link
Contributor

Not ready for merge, just posting what I have working so far.

Hoping to resolve #162 so that dataset collections can be created and used in workflows with the object oriented API.

The new test to create a dataset collection passes. Next up is to modify convert_input_map() or add a different method so that dataset collections can be used as inputs for workflows.

@alexjsmac
Copy link
Contributor Author

Not too sure what's causing the checks to fail. I get the same errors with $ tox from a fresh clone too. Any suggestions?

@nsoranzo
Copy link
Member

nsoranzo commented Jan 6, 2016

@amclean199 Thanks for working on this! I'm coming back from holidays, I hope to help in the next days. The tests fails because of the following PEP-8 error:

./tests/TestGalaxyObjects.py:632:31: E225 missing whitespace around operator

collections.HistoryDatasetElement(name="sample2", id=dataset2.id),
]
)
return collection_description

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add one more blank line here, flake8 is complaining:

./tests/TestGalaxyObjects.py:653:1: E302 expected 2 blank lines, found 1

@alexjsmac
Copy link
Contributor Author

Oh okay I see how it works now. Thanks @nsoranzo. I think with that last commit everything for this PR should be ready for review unless I missed anything.

@@ -16,6 +16,7 @@
import bioblend.galaxy.objects.wrappers as wrappers
import bioblend.galaxy.objects.galaxy_instance as galaxy_instance
from bioblend import ConnectionError
from bioblend.galaxy import dataset_collections as collections
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you not rename the import? I find it easier to grep the whole name.

@nsoranzo
Copy link
Member

nsoranzo commented Jan 8, 2016

You should add 'DatasetCollection' and 'HistoryDatasetCollectionAssociation' strings to __all__ in bioblend/bioblend/galaxy/objects/wrappers.py .

BASE_ATTRS = Wrapper.BASE_ATTRS + (
'state', 'deleted'
)
POLLING_INTERVAL = 1 # for state monitoring
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POLLING_INTERVAL is unused, can be removed.

return self.gi.histories

@property
def _stream_url(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is not used and does not make sense for a dataset collection, please remove.

@alexjsmac
Copy link
Contributor Author

Okay I have made all the changes you have requested but I found a flaw in running workflows with dataset collections. Right now the workflow run method only looks for datasets in the output history on line 456: outputs = [out_hist.get_dataset(_) for _ in res['outputs']]. Since it might also return a dataset collection ID now, how should both cases be handled? I've found that there are datasets and dataset collections with the same IDs so both get_dataset() and get_dataset_collection() return something different.

@alexjsmac
Copy link
Contributor Author

Currently, the object-oriented run() will operate correctly with a dataset collection as an input if datasets are expected to be the outputs. If a dataset collection is expected to be the output of a workflow then it will not be returned because only matching datasets are searched for from the IDs returned in the response from Galaxy.

@nsoranzo
Copy link
Member

@amaclean199 I'm having a look at this, but please see also related issue #171.

def tearDown(self):
self.wf.delete()

@test_util.skip_unless_galaxy('release_14.06')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move 'release_14.06' to the class decorator at line 743 and remove this line.

@jmchilton
Copy link
Member

To distinguish between datasets and dataset collections - I'd just look at history_content_type and create the create object wrapper type.

@nsoranzo
Copy link
Member

I'm testing this patch which seems to work

diff --git a/bioblend/galaxy/objects/wrappers.py b/bioblend/galaxy/objects/wrappers.py
index 2d6bf63..c843ddb 100644
--- a/bioblend/galaxy/objects/wrappers.py
+++ b/bioblend/galaxy/objects/wrappers.py
@@ -447,9 +447,17 @@ class Workflow(Wrapper):
             raise TypeError(
                 'history must be either a history wrapper or a string')
         res = self.gi.gi.workflows.run_workflow(self.id, **kwargs)
-        # res structure: {'history': HIST_ID, 'outputs': [DS_ID, DS_ID, ...]}
+        # res structure: {'history': HIST_ID, 'outputs': [CI_ID, CI_ID, ...]}
         out_hist = self.gi.histories.get(res['history'])
-        outputs = [out_hist.get_dataset(_) for _ in res['outputs']]
+        content_infos_dict = dict()
+        for ci in out_hist.content_infos:
+            content_infos_dict[ci.id] = ci
+        outputs = []
+        for output_id in res['outputs']:
+            if content_infos_dict[output_id].type == 'file':
+                outputs.append(out_hist.get_dataset(output_id))
+            elif content_infos_dict[output_id].type == 'collection':
+                outputs.append(out_hist.get_dataset_collection(output_id))

         if wait:
             self.gi._wait_datasets(outputs, polling_interval=polling_interval,

@alexjsmac
Copy link
Contributor Author

@nsoranzo Awesome yeah that looks good, works for me too!

@alexjsmac
Copy link
Contributor Author

@nsoranzo are you going to push the patch?

@nsoranzo
Copy link
Member

@amaclean199 I'm still checking if everything works on older releases. I can push the patch to your branch if you give permissions to my account or I can open a pull request.

@alexjsmac
Copy link
Contributor Author

Okay sounds good you have permissions now.

Also:
- Expose collection_type attribute of DatasetCollection.
- Rework new tests.
@nsoranzo
Copy link
Member

@amaclean199 Pushed, let's see how the Travis build goes.

@nsoranzo
Copy link
Member

@amaclean199 Build passed, this can be merged if it's ok with you.

@alexjsmac
Copy link
Contributor Author

Great, yeah that's fine with me 👍

nsoranzo added a commit that referenced this pull request Jan 13, 2016
Add Dataset Collections to Object Oriented API
@nsoranzo nsoranzo merged commit 85d49f3 into galaxyproject:master Jan 13, 2016
@nsoranzo
Copy link
Member

Thanks a lot @amaclean199!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot select a dataset collection when using the objects.GalaxyInstance History client module
3 participants