Commit

add h2o script backgroud
MrPowers committed Sep 6, 2024
1 parent f1220f8 commit 7c48239
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion README.md
@@ -46,7 +46,15 @@ maturin develop --release
falsa --help
```

## h2o groupby dataset
## h2o datasets

The h2o datasets are used to benchmark query engines on a single machine; see [the db-benchmark site](https://duckdblabs.github.io/db-benchmark/).

Here are [the original R scripts](https://github.com/duckdblabs/db-benchmark/tree/main/_data) to generate the sample datasets. These still work if you know how to run R, but generating the large dataset can fail if your machine doesn't have sufficient memory.

falsa is a good fit if you want to generate these datasets through a Python interface or if the R scripts run out of memory on your machine.

### h2o groupby dataset

The h2o groupby dataset has 9 columns and 10 million/100 million/1 billion rows of data.
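
To give a feel for the shape of this data, here is a minimal standard-library sketch that writes a tiny groupby-style CSV. It is not falsa's implementation: the column names (`id1`..`id6`, `v1`..`v3`), the value ranges, and the `k` parameter are assumptions modeled on the original R scripts, and `make_groupby_rows` and `write_csv` are hypothetical helper names used only for illustration.

```python
import csv
import random


def make_groupby_rows(n_rows, k=100, seed=0):
    """Build a header and n_rows rows in an assumed h2o-groupby-like layout."""
    rng = random.Random(seed)
    # Assumed column names: six grouping keys and three value columns.
    header = ["id1", "id2", "id3", "id4", "id5", "id6", "v1", "v2", "v3"]
    # Number of distinct values for the high-cardinality keys (assumed n_rows / k).
    high_card = max(n_rows // k, 1)
    rows = []
    for _ in range(n_rows):
        rows.append([
            f"id{rng.randint(1, k):03d}",           # low-cardinality string key
            f"id{rng.randint(1, k):03d}",           # low-cardinality string key
            f"id{rng.randint(1, high_card):010d}",  # high-cardinality string key
            rng.randint(1, k),                      # low-cardinality integer key
            rng.randint(1, k),                      # low-cardinality integer key
            rng.randint(1, high_card),              # high-cardinality integer key
            rng.randint(1, 5),                      # small integer value
            rng.randint(1, 15),                     # small integer value
            round(rng.uniform(0, 100), 6),          # float value
        ])
    return header, rows


def write_csv(path, header, rows):
    """Write the header and rows to a CSV file at path."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    header, rows = make_groupby_rows(1_000)
    write_csv("groupby_sample.csv", header, rows)
```

This builds every row in memory, so it is only suitable for small samples; for the 10 million row dataset and larger, use falsa or the R scripts.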
