
Add 10k-v2 and stock_simulated for testing and benchmarking Rust library #4

Closed · wants to merge 1 commit

Conversation

@alecmocatta

When the Rust parquet library was imported from http://github.com/sunchao/parquet-rs, the benchmarks and their required data files, 10k-v2.parquet and stock_simulated.parquet, weren't copied across.

This adds them so that updated benchmarks can be included in apache/arrow#3461.

@wesm (Member) commented Feb 1, 2019

These files are a bit big. Can you generate some data files using pyarrow or similar as part of the benchmarking process?

@alecmocatta (Author)

Good idea. Is it possible though that the generated files could change as pyarrow changes over time? I'd like to ensure the files are bit-for-bit identical over time, for consistent benchmark results.

@wesm (Member) commented Feb 1, 2019

You could pin the pyarrow version used to generate them.
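One way to pin the generator's environment is a requirements file for the data-generation step (the version numbers below are placeholders, not the versions actually in use at the time):

```
# requirements.txt for the benchmark data-generation step
# (versions are illustrative; pin whatever was validated)
pyarrow==0.12.0
numpy==1.16.1
```

With the versions fixed, regenerating the files in CI should yield consistent data for benchmarking.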

@alecmocatta (Author)

Thanks. I'm not familiar with how to do that but I'll take a look once the aforementioned PR is merged. I removed the benchmarks from it so this PR isn't needed immediately.

@wesm (Member) commented Feb 13, 2019

I have a lot on my plate. @andygrove or @sunchao, could you take ownership of this? I don't like the idea of checking 1 MB files into source control, particularly for benchmarking, so please don't do that without checking with me first.

@andygrove (Member) commented Feb 13, 2019 via email

I'm about to embark on some benchmarking with large files too so I face similar challenges.

Generating test data as part of the build seems like the way to go in order to catch performance regressions in the core libraries.

I am also maintaining my own repo where I have benchmarks against popular public data sets (and I'm just relying on wget or curl to download them).
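The wget/curl approach can be sketched as a small download-and-cache helper, so repeated benchmark runs don't re-fetch the data (the function name, URL, and path below are hypothetical, not part of any existing harness):

```python
# Sketch of a download-and-cache step for public benchmark datasets;
# fetch() is a hypothetical helper, not an existing API.
import os
import urllib.request

def fetch(url: str, dest: str) -> str:
    """Download url to dest unless a cached copy already exists."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

This mirrors relying on wget or curl, but skips the network entirely when the file is already on disk.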
