Ability to convert a native protobuf object from/to clojure wrapper object? #33

Open
smessems opened this issue Jul 12, 2018 · 10 comments

@smessems

Hi, is there a way to somehow convert from and to a native protobuf object?

I want to (->
Read native protobuf objects from Parquet (via org.apache.parquet.proto/ProtoReadSupport)
Transform them into clojusc/protobuf-based objects
Manipulate them as EDN via this promising lib :)
Build native protobuf objects from the transformed values
Write to Parquet (via ProtoWriteSupport)
)

I am not sure if this approach is sound, but currently I see no way to convert to/from the native protobuf object.
In other words, the following blog post and project demonstrate the approach, and even reference this library (in its older form?) as an alternative. But it will not really work without some conversion, as it relies on Kryo serialisation and expects the native protobuf object, not the wrapper object:
https://adambard.com/blog/parquet-protobufs-spark/
https://github.com/adambard/sparkquet/blob/master/src/clj/sparkquet/core.clj

The actual objects are highly nested/repeated and manipulating them as edn would be much simpler.
Sorry if I am missing something basic here.

@oubiwann
Member

oubiwann commented Jul 12, 2018

You should be able to do this by:

  1. using the core/create function, passing the protobuf class only (no data)
  2. then calling the core/bytes-> function with the result of the create call as the first parameter and the native bytes as the second parameter ...

This will, of course, require that you have done all the rest of the setup in your Clojure project (point to your protobuf schema, compiled to java, etc.).

Caveat: I said should ;-) I've not tried this at all; let's see how it goes. If what I outlined above doesn't work, we'll figure something out.

Once we get something working, we should be able to update the API to make it a little easier to do the conversion ... reduce the number of steps/amount of setup ... I'm already getting some ideas ;-)

Keep me posted!

@oubiwann
Member

Just did a little playing around ... this approach won't work as outlined for schemas that define required fields ... still poking at it, though.

@smessems
Author

Thanks for your prompt reply!
Today, after posting, I continued to experiment and was able to do a full cycle along the same lines:

  • Write the "protobuf-obj" to a com.google.protobuf.CodedOutputStream (via writeTo)
  • core/bytes-> to populate the "protobuf-edn"
  • Manipulate the values
  • core/->bytes to a buffer
  • Build a protobuf-obj from buffer

I had an issue with enums (will look into that later).
I was not able to build an empty protobuf-edn, though, possibly because of required fields (I am using the example proto so far). I am still not sure why; maybe getDefaultInstance could be used.

-- Just saw that you have noticed the required-fields issue as well --

@oubiwann
Member

I may have a solution; testing something now ...

@oubiwann
Member

oubiwann commented Jul 12, 2018

Okay, I have something in place that I've only tested in the REPL and with just one compiled protobuf class. I've pushed the latest up to Clojars:

  • [clojusc/protobuf "3.6.0-v1.2-SNAPSHOT"]

Once we get this hammered out, I'll add tests and docs for the new capability.

Here's what I did to test:

  1. lein repl
  2. Copy-and-paste of (def phones ...) from the tutorial (https://clojusc.github.io/protobuf/current/1050-tutorial.html)
  3. Got byte array for a protobuf: (def phone-bytes (protobuf/->bytes (first phones)))
  4. Used that byte array to create a new Clojure protobuf: (protobuf/create AddressBookProtos$Person$PhoneNumber phone-bytes)

Let me know how this goes, and how any testing with other protobufs/compiled classes goes ...

@oubiwann
Member

oubiwann commented Jul 12, 2018

Basically, this involved updating the constructor to allow creating a new Clojure protobuf from a byte array ([B). If you can find an easy way of converting your protobuf data to a Java-native byte array, this should work for you ... modulo any complications with how you've created your *.proto files.
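
A minimal sketch of the round trip this makes possible, for illustration only. It assumes the SNAPSHOT above and the compiled tutorial classes are on the classpath; the package name in the import is an assumption, and so is the use of assoc on the wrapper. The .toByteArray and parseFrom calls are the standard methods on generated protobuf-java messages.

```clojure
;; Sketch only, not tested against this SNAPSHOT.
;; Assumption: the generated tutorial classes live in this package;
;; adjust the import to wherever your compiled classes are.
(require '[protobuf.core :as protobuf])
(import '(protobuf.examples.tutorial AddressBookProtos$Person$PhoneNumber))

;; 1. A native protobuf object, standing in for one read from Parquet.
(def native-phone
  (-> (AddressBookProtos$Person$PhoneNumber/newBuilder)
      (.setNumber "555-1212")
      (.build)))

;; 2. Native object -> byte array -> Clojure protobuf wrapper.
(def wrapper
  (protobuf/create AddressBookProtos$Person$PhoneNumber
                   (.toByteArray native-phone)))

;; 3. Manipulate the wrapper like a map
;;    (assumption: the wrapper supports assoc).
(def updated (assoc wrapper :number "555-1299"))

;; 4. Wrapper -> byte array -> native protobuf object.
(def rebuilt
  (AddressBookProtos$Person$PhoneNumber/parseFrom
    ^bytes (protobuf/->bytes updated)))

;; Read the changed field back from the native object.
(.getNumber rebuilt)
```

Going through a byte array keeps both sides decoupled: the native class and the wrapper only need to agree on the wire format, at the cost of one serialise/parse step in each direction.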

Keep in mind that right now, this project only explicitly supports the proto 2 language (we generally track what Google puts up in its tutorials, and they haven't updated their tutorials for proto 3 yet; we do have it in the roadmap, though: #32 ). If you don't have control over the protobuf language version, you might not be able to use Clojure protobuf as things stand today :-(

@oubiwann
Member

I'll explore adding ->stream and stream-> functions, as well as accepting a stream as a constructor ...

@oubiwann
Member

I've added support for creating a protobuf instance from a com.google.protobuf.CodedInputStream or a java.io.InputStream. The latest code has been published on Clojars, so the next time you start up your REPL, the new SNAPSHOT will be downloaded (if you've got [clojusc/protobuf "3.6.0-v1.2-SNAPSHOT"] in your deps).

Using the same approach as given above, here's how you can create an instance from a com.google.protobuf.CodedInputStream:

[protobuf.dev] λ=> (def phones [(protobuf/create AddressBookProtos$Person$PhoneNumber
                                                  {:number "555-1212" :type :home})
                                (protobuf/create AddressBookProtos$Person$PhoneNumber
                                                  {:number "555-1213" :type :mobile})
                                (protobuf/create AddressBookProtos$Person$PhoneNumber
                                                  {:number "555-1214" :type :work})])
[protobuf.dev] λ=> (def phone-bytes (protobuf/->bytes (first phones)))
[protobuf.dev] λ=> (import com.google.protobuf.CodedInputStream)
[protobuf.dev] λ=> (def s (com.google.protobuf.CodedInputStream/newInstance phone-bytes))
[protobuf.dev] λ=> (protobuf/create AddressBookProtos$Person$PhoneNumber s)
{:number "555-1212", :type :home}

And here's how you create one from java.io.InputStream:

[protobuf.dev] λ=> (def s (new java.io.ByteArrayInputStream phone-bytes))
[protobuf.dev] λ=> (protobuf/create AddressBookProtos$Person$PhoneNumber s)
{:number "555-1212", :type :home}

@smessems
Author

Thanks again for this next-level response time!
Yes, this is working for me (I also saw that I can push this convert-transform-convert step to the Spark workers).
Next week my plan is to look further into this and test more complex protobufs (the proto3 support might indeed be a problem). I'll keep you posted :)

@reutsharabani

works for me as well, thanks :)
