Time taken to get an object. #1328

PaulRudin · 2023-04-21T08:46:58Z

PaulRudin
Apr 21, 2023

I'm experimenting with vineyard. Everything here is IPC; no RPC is involved.

I've put an object into vineyard from one process (in case it's relevant, it's a polars dataframe of about 300 MB). In another process I call client.list_objects("*") and I can see it there:

Object <"o000010a1186f4af6": vineyard::PickleBuffer>

Calling client.get() now successfully retrieves the object. What I'm unsure about is why this takes several seconds when the data is already in shared memory on the same machine.

sighingnow · 2023-04-21T08:57:32Z

sighingnow
Apr 21, 2023
Maintainer

Vineyard has no built-in integration with polars dataframes (see also #1015), so put/get falls back to pickle (serialization/deserialization). I'm drafting the builder/resolver for polars, which should resolve the issue.

The most significant gains from vineyard come from avoiding the costly serialization/deserialization.
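To illustrate why a pickle fallback costs time even when the data already sits in shared memory, here is a minimal sketch using only the Python standard library. It is an analogy, not vineyard's implementation: the helper names `pickle_round_trip` and `shared_memory_view` are made up for this example, and the payload is a small stand-in for a real dataframe.

```python
import pickle
from multiprocessing import shared_memory


def pickle_round_trip(payload: bytes) -> bytes:
    """Serialize then deserialize: the result is a fresh copy of the data."""
    return pickle.loads(pickle.dumps(payload))


def shared_memory_view(payload: bytes):
    """Place payload in a shared-memory segment and return (segment, view).

    The view reads the segment in place; creating it copies no bytes.
    """
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    # The segment may be rounded up to a page size, so slice explicitly.
    shm.buf[:len(payload)] = payload
    view = shm.buf[:len(payload)]
    return shm, view


payload = bytes(range(256)) * 4096  # 1 MiB of sample data

# Pickle path: every byte is written out and read back into a new object.
copied = pickle_round_trip(payload)

# Shared-memory path: the consumer looks at the same memory the producer wrote.
shm, view = shared_memory_view(payload)
shm.buf[0] = 255  # a write to the segment is visible through the view
first_byte_via_view = view[0]
view_matches_rest = bytes(view[1:16]) == payload[1:16]

view.release()  # release exported views before closing the segment
shm.close()
shm.unlink()
```

The pickle path does work proportional to the size of the object on every put and get, which is the cost described above. A type-specific builder/resolver can instead hand out views over buffers that are already in place.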

0 replies

sighingnow · 2023-04-21T09:15:29Z

sighingnow
Apr 21, 2023
Maintainer

As a workaround, you could convert the polars dataframe to a pandas dataframe before the put, and convert it back after getting it from vineyard.

The following example shows a large performance gain when vineyard avoids the serialization and deserialization, even though some conversion to/from pd.DataFrame is needed.

With native polars integration (to be published soon), performance should improve further.

In [23]: df
Out[23]:
shape: (800_000, 80)  # 512M
....

In [24]: %timeit -n 1 -r 1 client.put(ddf)
1.69 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [25]: %timeit -n 1 -r 1 client.put(ddf.to_pandas())
369 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [26]: object_id = client.put(ddf)

In [27]: %timeit -n 1 -r 1 client.get(object_id)
4.81 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [28]: object_id = client.put(ddf.to_pandas())

In [29]: %timeit -n 1 -r 1 pl.DataFrame(client.get(object_id))
185 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [30]: df.estimated_size()
Out[30]: 512000000
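The workaround above could be wrapped in two small helpers. This is a minimal sketch: the function names `put_polars_as_pandas` and `get_pandas_as_polars` are hypothetical, `client` is assumed to be an already-connected vineyard IPC client, and the conversion assumes pandas (and typically pyarrow) is installed alongside polars. It uses `pl.from_pandas` for the return trip where the session above uses `pl.DataFrame(...)`.

```python
def put_polars_as_pandas(client, df):
    """Convert a polars DataFrame to pandas and put it into vineyard.

    Hypothetical helper: `client` is assumed to expose `put()`, as the
    vineyard client does in the session above. Returns the object id.
    """
    return client.put(df.to_pandas())


def get_pandas_as_polars(client, object_id):
    """Get a pandas DataFrame from vineyard and convert it back to polars.

    Hypothetical helper: polars is imported lazily so that a consumer
    that only calls `put_polars_as_pandas` does not need this import.
    """
    import polars as pl

    return pl.from_pandas(client.get(object_id))


# Usage (needs a running vineyard instance; the socket path is an assumption):
#
#   import vineyard
#   client = vineyard.connect("/var/run/vineyard.sock")
#   object_id = put_polars_as_pandas(client, df)
#   df_again = get_pandas_as_polars(client, object_id)
```

If the consumer is happy with a pandas dataframe, the second helper is unnecessary and a plain `client.get(object_id)` is enough.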

2 replies

PaulRudin Apr 21, 2023
Author

Yeah, thanks for that. It had occurred to me to try converting to pandas, but I hadn't got around to trying it. I think that in my particular use case a pandas dataframe is fine on the consumer side. I could probably just extract a few numpy arrays if it was worthwhile. I'll try it out and see how I get on.

sighingnow Apr 21, 2023
Maintainer

> Probably I could just extract a few numpy arrays if it was worthwhile.

Both pandas dataframes and numpy ndarrays already have good built-in integration with vineyard.
