Requirements for SODA Crystal - Add your pain points OR use cases OR feature requirements #9
Comments
Adding the input from @thatsdone here (#8):
(Possible) Feature Request: Add OCI (Oracle Cloud Infrastructure) Object Storage Service support. Since I think we need to discuss the support matrix with Strato, I also filed an issue there (sodafoundation/strato#1425).
Comments from Rakesh, IBM (SODA TOC): Managing Apache Parquet files for large data sets, including their schemas, is challenging. Lakehouse solutions (Apache Iceberg, Apache Hudi) address this. However, a specific solution from Crystal would be useful to the many organizations that are not able to deploy a lakehouse-type solution.
From the competitive analysis, there appear to be two different focus areas for Crystal: (1) intelligent search of data sets and (2) data management using metadata. These two would have different requirements. Searching inside unstructured data could be an area to explore. (Subhankar)
We should keep track of old metadata for objects as well as new metadata. This will allow us to answer queries such as: (1) At what rate is unstructured data growing? (2) Which are the fastest-growing datasets? (3) Which department is adding data at the fastest rate? (4) What should the backup strategy for a dataset be, based on how fast it is growing?
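To make the metadata-history idea concrete, here is a minimal Python sketch, not a proposed Crystal design. It assumes each metadata snapshot records a dataset name, an owning department, a timestamp, and a total size in bytes. The `MetadataSnapshot` schema and both function names are hypothetical, chosen only for illustration. Growth is estimated as the change between the oldest and newest snapshot of each dataset, divided by the elapsed days.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MetadataSnapshot:
    """One point-in-time metadata record for a dataset (hypothetical schema)."""
    dataset: str
    department: str
    taken_at: datetime
    total_bytes: int


def dataset_growth(snapshots):
    """Return bytes-per-day growth per dataset, oldest to newest snapshot."""
    history = defaultdict(list)
    for snap in snapshots:
        history[snap.dataset].append(snap)

    rates = {}
    for name, records in history.items():
        records.sort(key=lambda s: s.taken_at)
        first, last = records[0], records[-1]
        days = (last.taken_at - first.taken_at).total_seconds() / 86400
        if days > 0:  # a single snapshot gives no growth information
            rates[name] = (last.total_bytes - first.total_bytes) / days
    return rates


def department_growth(snapshots):
    """Sum the per-dataset growth rates for each owning department."""
    owner = {snap.dataset: snap.department for snap in snapshots}
    totals = defaultdict(float)
    for name, rate in dataset_growth(snapshots).items():
        totals[owner[name]] += rate
    return dict(totals)
```

With the old snapshots retained, the fastest-growing dataset is `max(rates, key=rates.get)` over the result of `dataset_growth`, and the same rates could feed a backup-frequency decision. If only the latest metadata were kept, none of these queries could be answered.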
Hi, I want to add one use case here:
To provide compelling features against the competition, Crystal should be extensible to support semantic understanding of the metadata |
To provide compelling features against the competition, Crystal should be extensible to support multiple query languages including natural language |
Issue/Feature Description:
You can add all your pain points, use cases, or feature requirements for the SODA Crystal project.
Project focus: unstructured metadata management
You can add all your inputs in the comments. We will brainstorm to produce the first list.
Reference
You can refer to some of the basic information collected or prepared for SODA Crystal here
Example 1:
We are struggling with metadata search for S3, especially its performance. Please find the details of our concern here, or attach the information.
Example 2:
I found a project which handles unstructured metadata management. Can we take inputs from there?
Example 3:
Feature request: Could storing huge amounts of IoT data in a common format be a good feature? <More info here - link can be added or attach>
Example 4:
Do we consider data lake pain points like ?