Add DataChain.listings() method and use it in getting storages #331
Codecov Report
Attention: Patch coverage is
@@             Coverage Diff              @@
##             main     #331      +/-   ##
==========================================
+ Coverage   86.78%   86.81%   +0.03%
==========================================
  Files          92       93       +1
  Lines       10069    10078       +9
  Branches     2047     2048       +1
==========================================
+ Hits         8738     8749      +11
+ Misses        988      984       -4
- Partials      343      345       +2
Flags with carried forward coverage won't be shown.
… objects for each cached listing
2e125ae to 8537347
for more information, see https://pre-commit.ci
…datachain into ilongin/329-refactor-storages
dependency_name = dataset_name

if is_listing_dataset(dataset_name):
    dependency_type = DatasetDependencyType.STORAGE  # type: ignore[arg-type]
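The dependency-type choice in the snippet above can be illustrated in isolation. This is a minimal, self-contained sketch, not datachain code: DatasetDependencyType here is a stand-in enum, and is_listing_dataset is a hypothetical helper that assumes listing datasets are recognized by an "lst__" name prefix.

```python
from enum import Enum


class DatasetDependencyType(str, Enum):
    # Stand-in enum for illustration; the real datachain type may differ.
    DATASET = "dataset"
    STORAGE = "storage"


# Assumption: listing datasets are identified by a name prefix.
LISTING_PREFIX = "lst__"


def is_listing_dataset(name: str) -> bool:
    # Hypothetical helper: a listing is a dataset whose name carries the prefix.
    return name.startswith(LISTING_PREFIX)


def dependency_for(dataset_name: str) -> "tuple[DatasetDependencyType, str]":
    # The dependency always points at a dataset name; only the type label
    # changes when that dataset happens to be a listing.
    dependency_name = dataset_name
    if is_listing_dataset(dataset_name):
        dependency_type = DatasetDependencyType.STORAGE
    else:
        dependency_type = DatasetDependencyType.DATASET
    return dependency_type, dependency_name


print(dependency_for("lst__s3://bucket/images/"))
print(dependency_for("cats"))
```

Renaming the STORAGE member to LISTING, as suggested in the thread, would only change the label; the dependency would still resolve to a dataset name.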
Should we use LISTING as the type (since we use "listing" as a term in some other places)?
I was thinking about this as well. Not sure, but both are correct IMO.
> Not sure, but both is correct IMO
Yes, but it just makes it simpler to maintain, search, and understand.
Not critical, but I would try to use the same term everywhere.
I will consider that in upcoming PRs, just so as not to block this one.
src/datachain/lib/dc.py (outdated)
    object_name: str = "listing",
    **kwargs,
) -> "DataChain":
    """Generate chain with list of cached listing.
listing -> listings
Q: do we really need to make it a public dc method? (I don't have a strong opinion on this; I'm just wondering whether we can expose less for now and see whether scenarios for it come up.)
Does it mean we will also need from_listing, btw? (Just trying to think whether we can keep it all about datasets / DataChain and avoid exposing the "listing" concept to end users.)
Yeah, we already use both "listings" and "storages" across the interface, which is OK IMO. A listing is just a listed storage, so I think the two words can live together.
I don't think we need from_listing, as we have from_storage, which in the background creates a new listing if one doesn't exist and then uses it.
I think it's OK for us to expose this, as we will be using it in Studio, for example, which can also be seen as a client.
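The from_storage behavior described above (reuse a cached listing if it exists, otherwise create it first) is a get-or-create pattern. The sketch below shows only that pattern; ListingCache, fake_list_storage and the in-memory dict are hypothetical stand-ins, whereas datachain itself keeps listings as datasets.

```python
from typing import Callable, Dict, List


class ListingCache:
    """Hypothetical in-memory cache mapping a storage URI to its listing."""

    def __init__(self, list_storage: Callable[[str], List[str]]) -> None:
        self._list_storage = list_storage  # performs the (expensive) listing
        self._listings: Dict[str, List[str]] = {}
        self.listing_runs = 0  # counts how often storage was actually listed

    def get_or_create(self, uri: str) -> List[str]:
        # Reuse an existing listing; only list the storage on a cache miss.
        if uri not in self._listings:
            self._listings[uri] = self._list_storage(uri)
            self.listing_runs += 1
        return self._listings[uri]


def fake_list_storage(uri: str) -> List[str]:
    # Stand-in for listing a bucket or directory.
    return [f"{uri}a.jpg", f"{uri}b.jpg"]


cache = ListingCache(fake_list_storage)
first = cache.get_or_create("s3://bucket/images/")
second = cache.get_or_create("s3://bucket/images/")  # served from the cache
```

Because callers only ever ask for a storage URI, a separate from_listing entry point is not needed in this model: the listing stays an internal, reusable artifact.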
src/datachain/lib/dc.py (outdated)
    object_name: str = "listing",
    **kwargs,
) -> "DataChain":
    """Generate chain with list of cached listing.
It would be great to briefly explain what a listing is, just a few words.
Added one sentence
LGTM. My only question is whether we really want to expose this as a method that returns a DataChain (what is the use case for this?).
Also the naming: listing vs storage, which we use in different places.
About exposing it as a method that returns
This PR adds a class method, DataChain.listings(), similar to DataChain.datasets(), which returns a chain of ListingInfo objects. ListingInfo describes one specific cached listing (version) and simply subclasses DatasetInfo, which is what DataChain.datasets() returns, since a listing is just a special form of dataset.
The idea is to use this instead of the deprecated Storage class, which, along with the related codebase, will be removed in upcoming follow-ups.
Additional changes:
- Dataset.listings() in the CLI method ls_local, which lists storages / listings
- Catalog methods: unlist_source, storage_stats, ls_storage_uris, get_storage, ls_storages, and the underlying metastore methods
- The storage type dependency also points to a dataset (a listing is just a type of dataset)

Blocked by #294
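The relationship described in the PR description (ListingInfo as a DatasetInfo subclass, returned by a listings() class method under a configurable object_name column) can be sketched without datachain. Everything here is a hypothetical stand-in: the dataclass fields (name, version, uri), the MiniChain class and the in-memory _REGISTRY are assumptions for illustration, and the real classes carry more fields and return a real DataChain.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class DatasetInfo:
    # Hypothetical minimal dataset record.
    name: str
    version: int


@dataclass
class ListingInfo(DatasetInfo):
    # A listing is a special form of dataset, so it reuses every DatasetInfo
    # field and adds the storage URI it was built from (assumed field).
    uri: str


# Assumed in-memory stand-in for the datasets known to the catalog.
_REGISTRY: List[DatasetInfo] = [
    DatasetInfo(name="cats", version=1),
    ListingInfo(name="lst__s3://bucket/images/", version=1, uri="s3://bucket/images/"),
    ListingInfo(name="lst__s3://bucket/images/", version=2, uri="s3://bucket/images/"),
]


class MiniChain:
    """Toy stand-in for DataChain: just a list of row dicts."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    @classmethod
    def listings(cls, object_name: str = "listing") -> "MiniChain":
        # One row per cached listing version; plain datasets are skipped.
        return cls(
            [{object_name: info} for info in _REGISTRY if isinstance(info, ListingInfo)]
        )


# A CLI command such as ls_local could iterate the rows like this.
chain = MiniChain.listings()
for row in chain.rows:
    info = row["listing"]
    print(info.uri, info.version)
```

Subclassing means any code that already handles DatasetInfo rows keeps working on listing rows, which is the point of treating a listing as a special form of dataset.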