feat: add showDeltaFileSizes function (MrPowers#43)
sebastian2296 committed Nov 17, 2023
1 parent f6f1a62 commit 17f1eac
Showing 3 changed files with 40 additions and 0 deletions.
18 changes: 18 additions & 0 deletions README.md
@@ -483,6 +483,24 @@ Map("size_in_bytes" -> 1320,
"number_of_files" -> 2,
"average_file_size_in_bytes" -> 660)
```
## Show Delta File Sizes

The function `showDeltaFileSizes` prints the number of files, the total size, and the average file size of a Delta table in a human-readable form.

Suppose you have the following table, partitioned by `col1`:

```
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 1| A| A|
| 2| A| B|
+----+----+----+
```

Running `DeltaHelpers.showDeltaFileSizes` prints the following to the console:

`"The delta table contains 2 files with a size of 1.32 kB. The average file size is 660 B"`

## Humanize Bytes

15 changes: 15 additions & 0 deletions src/main/scala/mrpowers/jodie/DeltaHelpers.scala
@@ -535,4 +535,19 @@ object DeltaHelpers {
val result: String = resultOption.getOrElse(f"$n%.0f" + " B")
result
}

def showDeltaFileSizes(deltaTable: DeltaTable) = {
val details: Row = deltaTable.detail().select("numFiles", "sizeInBytes").collect()(0)
val (sizeInBytes, numberOfFiles) =
(details.getAs[Long]("sizeInBytes"), details.getAs[Long]("numFiles"))
val avgFileSizeInBytes = if (numberOfFiles == 0) 0L else Math.round(sizeInBytes.toDouble / numberOfFiles)
val formatter = java.text.NumberFormat.getIntegerInstance

val humanized_number_of_files = numberOfFiles.toInt
val humanized_size_in_bytes = humanizeBytes(sizeInBytes)
val humanized_average_file_size = humanizeBytes(avgFileSizeInBytes)

println( s"The delta table contains ${humanized_number_of_files} files with a size of ${humanized_size_in_bytes}."
+ s" The average file size is ${humanized_average_file_size}")
}
}
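
The average file size depends on whether the division is done in integer or floating-point arithmetic: dividing two `Long` values drops the fractional part before any rounding can take place. The sketch below contrasts the two approaches. `AverageFileSizeSketch`, `truncatedAverage` and `roundedAverage` are hypothetical names used only for illustration and are not part of `DeltaHelpers`.

```scala
object AverageFileSizeSketch {
  // Integer division: the fractional part is discarded, so there is nothing left to round.
  def truncatedAverage(sizeInBytes: Long, numberOfFiles: Long): Long =
    if (numberOfFiles == 0) 0L else sizeInBytes / numberOfFiles

  // Floating-point division, then Math.round(Double), which returns a Long.
  def roundedAverage(sizeInBytes: Long, numberOfFiles: Long): Long =
    if (numberOfFiles == 0) 0L else Math.round(sizeInBytes.toDouble / numberOfFiles)
}
```

For the README example (1320 bytes in 2 files) both variants give 660. They differ only when the total size is not an exact multiple of the file count.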
7 changes: 7 additions & 0 deletions src/test/scala/mrpowers/jodie/DeltaHelperSpec.scala
@@ -77,6 +77,13 @@ class DeltaHelperSpec
actual("humanized_number_of_files") == "1"
actual("humanized_average_file_size_in_bytes") == "1.088 kB"
}

it("should display delta file sizes in a human readable fashion") {
val path = (os.pwd / "tmp" / "delta-table").toString()
createBaseDeltaTable(path, rows)
val deltaTable = DeltaTable.forPath(path)
DeltaHelpers.showDeltaFileSizes(deltaTable)
}
}
describe("remove duplicate records from delta table") {
it("should remove duplicates successful") {
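
The new test calls `showDeltaFileSizes` but makes no assertion on what is printed. One possible way to check console output, assuming the function reports through Scala's `println`, is to redirect `Console.out` with `Console.withOut`. This is a minimal sketch: `CaptureStdoutSketch`, `captureOutput` and `report` are hypothetical stand-ins and do not exist in the repository.

```scala
import java.io.ByteArrayOutputStream

object CaptureStdoutSketch {
  // Hypothetical stand-in for a function that reports through println.
  def report(numberOfFiles: Long, size: String, averageSize: String): Unit =
    println(
      s"The delta table contains $numberOfFiles files with a size of $size." +
        s" The average file size is $averageSize")

  // Runs `body` with Console.out redirected to an in-memory buffer
  // and returns the captured text without the trailing newline.
  def captureOutput(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    Console.withOut(buffer)(body)
    buffer.toString("UTF-8").trim
  }
}
```

In a spec, the captured string could then be compared with the expected message instead of only checking that the call does not throw.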
