Issue #515: Add ColumnStats Schema for JSON parsing #522

Open. Wants to merge 7 commits into base: main.

@osopardo1 (Member) commented Dec 16, 2024

Description

Fixes #515.

This PR adds a way to build a columnStats schema from the current column Transformers and the actual schema of the data. The goals are to:

  • Ensure the fields are properly parsed from the JSON string.
  • Treat the string as malformed (not following the correct syntax) when a JSON string is specified but the parsed row comes back null.

To do that, I've added a QbeastColumnStats case class that holds the columnStatsSchema and the columnStatsRow. A QbeastColumnStatsBuilder is also needed to build it from the stats string, the column Transformers, and the data schema.

case class QbeastColumnStats(columnStatsSchema: StructType, columnStatsRow: Row)


object QbeastColumnStatsBuilder {
  /**
   * Builds the QbeastColumnStats
   *
   * @param statsString
   *   the stats in a JSON string
   * @param columnTransformers
   *   the set of columnTransformers to build the Stats from
   * @param dataSchema
   *   the data schema to build the Stats from
   * @return
   *   the QbeastColumnStats holding the stats schema and the parsed stats row
   */
  def build(
      statsString: String,
      columnTransformers: Seq[Transformer],
      dataSchema: StructType): QbeastColumnStats
}
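
To make the two goals above concrete, here is a minimal sketch of how such a builder could work. It is not the code in this PR. `SketchTransformer`, `SketchColumnStats`, `SketchColumnStatsBuilder`, the `_min`/`_max` stat-name suffixes, the extra `spark` parameter, and the handling of an empty stats string are all assumptions made for illustration. It assumes Spark SQL is on the classpath and uses `from_json` for the parsing step.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{from_json, lit}
import org.apache.spark.sql.types.{StructField, StructType}

// Hypothetical stand-in for the project's Transformer: it only models the
// indexed column name and the stat names expected for it (assumed suffixes).
final case class SketchTransformer(columnName: String) {
  def statNames: Seq[String] = Seq(s"${columnName}_min", s"${columnName}_max")
}

// Mirrors the shape of QbeastColumnStats from the description above.
final case class SketchColumnStats(columnStatsSchema: StructType, columnStatsRow: Row)

object SketchColumnStatsBuilder {

  // One nullable field per expected stat, typed like the data column it describes.
  def buildSchema(transformers: Seq[SketchTransformer], dataSchema: StructType): StructType =
    StructType(transformers.flatMap { t =>
      val dataType = dataSchema(t.columnName).dataType // throws if the column is missing
      t.statNames.map(name => StructField(name, dataType, nullable = true))
    })

  // Depending on the Spark version, an unparseable string may surface as a null
  // struct or as a struct whose fields are all null; both count as malformed here.
  def isUnparsed(row: Row): Boolean =
    row == null || (0 until row.length).forall(row.isNullAt)

  def build(
      spark: SparkSession,
      statsString: String,
      transformers: Seq[SketchTransformer],
      dataSchema: StructType): SketchColumnStats = {
    val schema = buildSchema(transformers, dataSchema)
    if (statsString.trim.isEmpty || schema.isEmpty) {
      // Assumption: when no stats are specified there is nothing to parse.
      SketchColumnStats(schema, Row.empty)
    } else {
      // Parse the JSON string against the derived schema on a one-row dataset.
      val row = spark
        .range(1)
        .select(from_json(lit(statsString), schema).as("stats"))
        .head()
        .getStruct(0)
      if (isUnparsed(row)) {
        throw new IllegalArgumentException(
          s"Column stats could not be parsed as JSON: $statsString")
      }
      SketchColumnStats(schema, row)
    }
  }
}
```

With a data schema containing an integer column `a`, a call such as `SketchColumnStatsBuilder.build(spark, """{"a_min":0,"a_max":10}""", Seq(SketchTransformer("a")), dataSchema)` would parse both fields with the column's own type. One consequence of the all-null check is that a syntactically valid string with no matching keys (for example `{}`) is rejected too; whether that is desirable is a design choice the sketch does not settle.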

Type of change

Bug fix.

Checklist:

Here is the list of things you should do before submitting this pull request:

  • New feature / bug fix has been committed following the Contribution guide.
  • Add logging to the code following the Contribution guide.
  • Add comments to the code (make it easier for the community!).
  • Change the documentation.
  • Add tests.
  • Your branch is updated to the main branch (dependent changes have been merged).

How Has This Been Tested? (Optional)

Tested different parsing cases in QbeastColumnStatsTestBuilder.

@osopardo1 changed the title from "Issue #515: Introduce ColumnStats schema for parsing" to "Issue #515: Add ColumnStats Schema for JSON parsing" on Dec 16, 2024
# Conflicts:
#	core/src/main/scala/io/qbeast/spark/index/SparkRevisionFactory.scala
#	src/main/scala/io/qbeast/table/IndexedTable.scala
#	src/test/scala/io/qbeast/spark/index/SparkRevisionFactoryTest.scala
@osopardo1 requested review from Jiaweihu08 and removed request for Jiaweihu08 on December 20, 2024 07:58

codecov bot commented Dec 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.50%. Comparing base (b2e2f85) to head (81ab4a2).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #522   +/-   ##
=======================================
  Coverage   88.50%   88.50%           
=======================================
  Files          21       21           
  Lines         774      774           
  Branches      115      115           
=======================================
  Hits          685      685           
  Misses         89       89           


@osopardo1 requested a review from Jiaweihu08 on December 20, 2024 10:12
@osopardo1 marked this pull request as ready for review on December 20, 2024 10:12