

[comet-parquet-exec] Comet parquet exec 2 (copy of Parth's PR) (#1138)
* WIP: (POC2) A Parquet reader that uses the arrow-rs Parquet reader directly

* Change default config

---------

Co-authored-by: Parth Chandra <[email protected]>
andygrove and parthchandra authored Dec 4, 2024
1 parent 29b2b77 commit ab09337
Showing 7 changed files with 1,018 additions and 17 deletions.
common/src/main/java/org/apache/comet/parquet/Native.java (52 additions, 0 deletions)
@@ -234,4 +234,56 @@ public static native void setPageV2(
   * @param handle the handle to the native Parquet column reader
   */
  public static native void closeColumnReader(long handle);

  ///////////// Arrow Native Parquet Reader APIs
  // TODO: Add partitionValues(?); improve requiredColumns to use a projection mask that
  // corresponds to Arrow.
  // TODO: Add batch size, datetimeRebaseModeSpec, metrics (how?)...

  /**
   * Initialize a record batch reader for a PartitionedFile.
   *
   * @param filePath path of the Parquet file to read
   * @param start byte offset at which this file split starts
   * @param length length in bytes of this file split
   * @param required_columns array of names of the fields to read
   * @return a handle to the record batch reader, used in subsequent calls
   */
  public static native long initRecordBatchReader(
      String filePath, long start, long length, Object[] required_columns);

  public static native int numRowGroups(long handle);

  public static native long numTotalRows(long handle);

  // Arrow-native version of readBatch.
  /**
   * Read the next batch of data into memory on the native side.
   *
   * @param handle the handle returned by {@link #initRecordBatchReader}
   * @return the number of rows read
   */
  public static native int readNextRecordBatch(long handle);

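  // Arrow-native equivalent of currentBatch. 'columnNum' is the index of the column in the
  // record batch.
  /**
   * Load the column at index columnNum of the currently loaded record batch into the JVM.
   *
   * @param handle the handle returned by {@link #initRecordBatchReader}
   * @param columnNum index of the column in the current record batch
   * @param arrayAddr memory address of the Arrow C Data Interface {@code ArrowArray} struct
   * @param schemaAddr memory address of the Arrow C Data Interface {@code ArrowSchema} struct
   */
  public static native void currentColumnBatch(
      long handle, int columnNum, long arrayAddr, long schemaAddr);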

  // Arrow-native version of closing the record batch reader.
  /**
   * Close the record batch reader and free its native resources.
   *
   * @param handle the handle returned by {@link #initRecordBatchReader}
   */
  public static native void closeRecordBatchReader(long handle);
}
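The new natives imply a handle-based call sequence: `initRecordBatchReader` returns a handle, `readNextRecordBatch` loads a batch, `currentColumnBatch` exports each column, and `closeRecordBatchReader` frees the reader. The sketch below shows one way a JVM caller might drive that sequence. It is a minimal sketch, not code from this commit: `RecordBatchNatives`, `NativeBatchReader`, `ReadLoop`, and `FakeNatives` are hypothetical names introduced so the loop can run without the native library. It also assumes that a return value of 0 from `readNextRecordBatch` means the split is exhausted and that 0 is never a live handle; the source does not state either.

```java
// Hypothetical interface mirroring a subset of the static natives in
// org.apache.comet.parquet.Native, so the call sequence can run without the native library.
interface RecordBatchNatives {
  long initRecordBatchReader(String filePath, long start, long length, Object[] requiredColumns);

  int readNextRecordBatch(long handle);

  void currentColumnBatch(long handle, int columnNum, long arrayAddr, long schemaAddr);

  void closeRecordBatchReader(long handle);
}

// Hypothetical wrapper that owns one native reader handle and closes it exactly once.
final class NativeBatchReader implements AutoCloseable {
  private final RecordBatchNatives natives;
  private final int numColumns;
  private long handle; // assumption: 0 is never a live handle, so it marks "closed"

  NativeBatchReader(
      RecordBatchNatives natives, String path, long start, long length, String[] columns) {
    this.natives = natives;
    this.numColumns = columns.length;
    this.handle = natives.initRecordBatchReader(path, start, length, columns);
  }

  // Loads the next batch on the native side; assumption: 0 rows means the split is exhausted.
  int next() {
    return natives.readNextRecordBatch(handle);
  }

  // Exports every column of the current batch into caller-supplied C Data struct addresses.
  void exportColumns(long[] arrayAddrs, long[] schemaAddrs) {
    for (int i = 0; i < numColumns; i++) {
      natives.currentColumnBatch(handle, i, arrayAddrs[i], schemaAddrs[i]);
    }
  }

  @Override
  public void close() {
    if (handle != 0) {
      natives.closeRecordBatchReader(handle);
      handle = 0;
    }
  }
}

// Hypothetical driver: reads a whole split and returns the total row count.
final class ReadLoop {
  static long readAll(
      RecordBatchNatives natives, String path, long start, long length, String[] columns) {
    // Placeholders: real code would pass addresses of allocated ArrowArray/ArrowSchema structs.
    long[] arrayAddrs = new long[columns.length];
    long[] schemaAddrs = new long[columns.length];
    long totalRows = 0;
    try (NativeBatchReader reader = new NativeBatchReader(natives, path, start, length, columns)) {
      int rows;
      while ((rows = reader.next()) > 0) {
        reader.exportColumns(arrayAddrs, schemaAddrs);
        totalRows += rows;
      }
    }
    return totalRows;
  }
}

// In-memory stand-in for tests: serves fixed batch sizes and counts calls.
final class FakeNatives implements RecordBatchNatives {
  private final int[] batchSizes;
  private int nextBatch = 0;
  int exportCalls = 0;
  int closeCalls = 0;

  FakeNatives(int... batchSizes) {
    this.batchSizes = batchSizes;
  }

  public long initRecordBatchReader(String p, long s, long l, Object[] cols) {
    return 42L; // arbitrary non-zero handle
  }

  public int readNextRecordBatch(long handle) {
    return nextBatch < batchSizes.length ? batchSizes[nextBatch++] : 0;
  }

  public void currentColumnBatch(long handle, int col, long arrayAddr, long schemaAddr) {
    exportCalls++;
  }

  public void closeRecordBatchReader(long handle) {
    closeCalls++;
  }
}
```

The wrapper implements `AutoCloseable` so that try-with-resources calls `closeRecordBatchReader` even if exporting a column throws. This keeps the native reader's memory from leaking when the JVM side fails partway through a split.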