Time taken to get an object. #1328
-
I'm experimenting with vineyard. This is all IPC, no RPC. I've put an object from one process (to the extent it's relevant, it's a polars DataFrame of about 300 MB). In another process I call
Now calling
Replies: 2 comments 2 replies
-
Vineyard has no built-in integration with polars DataFrames (see also #1015), so put/get falls back to pickle (serialization/deserialization). I'm drafting a builder/resolver for polars, which should resolve the issue. The most significant gains from vineyard come from avoiding costly serialization/deserialization.
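To get a feel for why a pickle fallback is expensive, here is a rough sketch that uses only the standard library. It does not reproduce how vineyard invokes pickle internally; it only times a plain `pickle.dumps`/`pickle.loads` round trip on a stand-in payload. The helper name `time_pickle_roundtrip` and the payload shape are assumptions made for illustration.

```python
import pickle
import time


def time_pickle_roundtrip(payload):
    """Return (serialize_seconds, deserialize_seconds, restored_object)."""
    start = time.perf_counter()
    blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    middle = time.perf_counter()
    restored = pickle.loads(blob)
    end = time.perf_counter()
    return middle - start, end - middle, restored


# Hypothetical stand-in for a table: 200_000 rows of 5 floats each.
rows = [[float(i + j) for j in range(5)] for i in range(200_000)]

dump_s, load_s, restored = time_pickle_roundtrip(rows)
print(f"dumps: {dump_s:.3f} s, loads: {load_s:.3f} s")
```

Both steps copy and rebuild the whole payload, so their cost grows with the size of the data. That is the work a native builder/resolver is meant to avoid.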
-
As a workaround, you could convert the polars DataFrame to a pandas DataFrame before the put, and convert it back after getting it from vineyard.
The following example shows a large performance gain when vineyard avoids serialization and deserialization, even though some conversion to and from pd.DataFrame is needed.
With native polars integration (to be published soon), performance should improve further.
In [23]: df
Out[23]:
shape: (800_000, 80) # 512M
....
In [24]: %timeit -n 1 -r 1 client.put(ddf)
1.69 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
In [25]: %timeit -n 1 -r 1 client.put(ddf.to_pandas())
369 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
In [26]: object_id = client.put(ddf)
In [27]: %timeit -n 1 -r 1 client.get(object_id)
4.81 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
In [28]: object_id = client.put(ddf.to_pandas())
In [29]: %timeit -n 1 -r 1 pl.DataFrame(client.get(object_id))
185 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
In [30]: df.estimated_size()
Out[30]: 512000000
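The workaround timed above can be wrapped in two small helpers. This is a minimal sketch, assuming `client` is an already-connected vineyard IPC client with the `put` and `get` methods used in the session. The helper names `put_polars_via_pandas` and `get_polars_via_pandas` and the `to_polars` parameter are hypothetical and not part of vineyard.

```python
def put_polars_via_pandas(client, df):
    """Store a polars DataFrame by handing vineyard its pandas form.

    `client` is assumed to expose `put(obj) -> object_id`, as in the
    session above. `df` is assumed to be a polars DataFrame.
    """
    return client.put(df.to_pandas())


def get_polars_via_pandas(client, object_id, to_polars=None):
    """Fetch the pandas DataFrame and convert it back to polars.

    `to_polars` is a hypothetical hook for swapping the conversion;
    by default it uses `pl.DataFrame`, as in the session above.
    """
    if to_polars is None:
        import polars as pl  # imported lazily so the module loads without polars

        to_polars = pl.DataFrame
    return to_polars(client.get(object_id))


# Usage (assumes a running vineyard server and a connected IPC client):
#   object_id = put_polars_via_pandas(client, df)
#   df_again = get_polars_via_pandas(client, object_id)
```

The conversion hook is only there so the round trip can be tried with a different converter, or without polars installed.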