From 3e730fe1d38512802e3bbb6b672ab46938b60c84 Mon Sep 17 00:00:00 2001
From: Matthew Rocklin
Date: Fri, 24 Nov 2023 08:42:17 -0600
Subject: [PATCH] Change default parquet compression format from Snappy to LZ4

Snappy's status as default is maybe just due to history. Snappy had
better Java support and LZ4 wasn't always available in systems like
Spark. Today Spark and other systems support LZ4 as well, and LZ4
generally performs a bit better, especially on decompression.

This is a significant change, but I think the only reason not to do it
is historical, which I think maybe isn't a good enough reason these days.
---
 dask_expr/io/parquet.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dask_expr/io/parquet.py b/dask_expr/io/parquet.py
index 525478ab..f9ca7f88 100644
--- a/dask_expr/io/parquet.py
+++ b/dask_expr/io/parquet.py
@@ -173,7 +173,7 @@ def _layer(self):
 def to_parquet(
     df,
     path,
-    compression="snappy",
+    compression="lz4",
     write_index=True,
     append=False,
     overwrite=False,