Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support alternative serialization formats #414

Closed
oberstet opened this issue Jan 28, 2016 · 13 comments
Closed

Support alternative serialization formats #414

oberstet opened this issue Jan 28, 2016 · 13 comments

Comments

@oberstet
Copy link

autobahn-cpp is a WAMP client library, and uses msgpack-c to support MessagePack serialization variants of WAMP.

WAMP in addition supports JSON and CBOR serialization formats. E.g. Crossbar.io, a WAMP router, supports all 3 serialization format, and is able to transparently translate between different serialization formats.

In autobahn-cpp, we had some discussions about how to support JSON and CBOR.

One aspect that came up is that msgpack-c actually provides 2 things in 1:

  1. a MessagePack serializer/unserializer
  2. a Variant type value library

Eg autobahn-cpp was perviously using a Boost Variant type in it's public API, but now uses msgpack-c for that.

I was wondering if we could retain 2. as a public API type in autobahn-cpp, while expanding 1. for supporting not only MessagePack, but also JSON and CBOR - transparently.

Users of autobahn-cpp would continue programming using msgpack-c types, and serialization format negotiation could happen dynamically and completely "under the hood" - without any user code changes.

Is this something that has been considered for msgpack-c?

@jpetso
Copy link
Contributor

jpetso commented Jan 28, 2016

I think a more practical variant of this feature request could be for msgpack-c to support a SAX-style deserialization API.

In https://github.com/tplgy/bonefish/blob/master/src/bonefish/serialization/json_serializer.cpp and https://github.com/tplgy/json-msgpack/blob/master/include/json_msgpack/json_msgpack_sax.hpp, I use RapidJSON's SAX API to read JSON and write directly into a msgpack::object, without the need for an intermediate JSON document object. It would be nice if msgpack also offered an API that lets users parse into their own variants (e.g. JSON document) without having a msgpack::object created first.

For writing msgpack, the Packer API already is a SAX API that's entirely independent from msgpack::object itself. I don't believe any changes there would be necessary or worthwhile, it works well and efficiently.

@oberstet, if you want to keep msgpack::object as the variant type then there's nothing that msgpack-c needs to add: grab our code (ask for Boost relicensing if necessary) and you've got a msgpack::object <-> JSON conversion without unnecessary inefficiencies. The same could be done with msgpack::object <-> CBOR.

@oberstet
Copy link
Author

@jpetso yes, I'd like to keep msgpack::object as the variant type in the user facing API of Autobahn - this was a longer discussion, I initially opposed that, but Dave (and you?) convinced my;) So let's stick to that. The only piece missing is ser/unser JSON/CBOR into msgpack::object.

Ah, I see .. so instead of extending msgpack-c, you are converting at WAMP message level https://github.com/tplgy/bonefish/blob/master/src/bonefish/serialization/json_serializer.cpp

@jpetso
Copy link
Contributor

jpetso commented Jan 29, 2016

There's really nothing WAMP-specific about it, except how to parse the JSON string for binary data, but that's pluggable. It spits out a msgpack::object, which that code happens to use as WAMP message.

@davidchappelle
Copy link

My transport abstraction work includes some refactoring that will make
adding support for JSON and CBOR quite easy.

On Thu, Jan 28, 2016 at 7:44 PM, Jakob Petsovits [email protected]
wrote:

There's really nothing WAMP-specific about it, except how to parse the
JSON string for binary data, but that's pluggable. It spits out a
msgpack::object, which that code happens to use as WAMP message.


Reply to this email directly or view it on GitHub
#414 (comment).

@redboltz
Copy link
Contributor

msgpack-c doesn't provide other types converter except JSON output. However, it provides an adaptation point named msgpack::type::variant. Let me explain in detail about that.

msgpack-c provides variant type named msgpack::object. It can convert to JSON string using output stream operator <<. msgpack::object is difficult to modify. So I added another variant type named msgpack::type::variant based on boost::variant. See the following example:
http://melpon.org/wandbox/permlink/NWAlWzPNWHt6msii

Note: In order to use msgpack::type::variant, you need to define MSGPACK_USE_BOOST macro. When you use wandbox, online compiler that supports msgpack-c, add -DMSGPACK_USE_BOOST to the extra options text box and check MessagePack check box.

You can build msgpack::type::variant structure step by step. I think that insertion functions would be called in JSON and CBOR parser to convert to msgpack::type::variant.

You can use msgpack::type::variant with boost::static_visitor. So you can implement any converters using that. See the following example:
https://github.com/msgpack/msgpack-c/tree/master/example/boost

msgpack::type::variant is not documented yet in wiki

Here is a little documentation:
#349

@oberstet
Copy link
Author

Now I am confused;)

autobahn-cpp _previously used boost::any as the dynamic type in the public, user-facing API and for shuffling around data parsed from the wire internally:

   /// A map holding any values and string keys.
   typedef std::map<std::string, boost::any> anymap;

   /// A vector holding any values.
   typedef std::vector<boost::any> anyvec;

   /// A pair of ::anyvec and ::anymap.
   typedef std::pair<anyvec, anymap> anyvecmap;

After a long discussion, we switch to msgpack::object instead of boost::any.

Are you suggesting to move to msgpack::type::variant, which seems to be a typedef for boost::variant?

It's funny: in a way, this the kind of situation I wanted to avoid originally by choosing boost::any: have a neutral ground w.r.t. serialization formats;)

@redboltz
Copy link
Contributor

Are you suggesting to move to msgpack::type::variant, which seems to be a typedef for boost::variant?

I just wanted to inform you the recent version of the msgpack-c situation.

msgpack::type::variant is comparatively new introduced feature since version 1.2.0. It requires the boost libraries. Users can choose msgpack::object, msgpack::type::variant, or both.

Let's sum up the information to make decision.

  • msgpack::object is simpler than msgpack::type::variant.
  • msgpack::object exists from the initial version of msgpack-c.
  • msgpack::type::variant requires the boost libraries.
  • msgpack::type::variant is easy to modify such as insert, delete, and update. So it is easy to create (convert) from different types.
  • msgpack::object is located on msgpack::zone.
  • Both can easy to convert to different types.

I don't know much about autobahn-cpp's API history. I think that if there are already implemented converters, I recommend that check them out. And that have been introduced by @jpetso and @davidchappelle.

@oberstet
Copy link
Author

msgpack::object is located on msgpack::zone

That was a main reason for switching from boost::any to msgpack::object for both internal and API in autobahn: memory efficiency and less dynamic allocations.

Both can easy to convert to different types.

"convert" sounds like double effort / memory.

I guess my question is: how would I parse CBOR from the wire, while creating a native msgpack::object on the fly and vice-versa, without creating some CBOR native object from the wire, and subsequently convert that to msgpack::object?

@davidchappelle
Copy link

boost::any was not well suited for custom data types. The previous api
required that all information passed in was already converted to
boost::any. Then internally there was a switch statement to figure out what
the actual boost::any type was so that it could be serialized
appropriately and placed on the wire. This was not an extensible solution.
What if I was passing in a map of custom objects? As soon as we started
using autobahn-cpp beyond the trivial use cases using intrinsic types it
was immediately obvious that boost::any was a poor choice of
abstraction. It also has performance overhead issues since all boost::any
objects allocate storage on the heap even for intrinsic types. That equates
to a lot of unnecessary malloc and copying overhead.

Converting to other types in our case is no different than what was being
done before with respect to converting between wire format to boost:any to
user-defined-types.

What we have now is really no different than what we had before except that
msgpack::object has taken the place of boost::any.

Before

user-defined-type <=> boost::any <=> wire-format

Now

user-defined-type <=> msgpack::object <=> wire-format

I think that regardless of what is chosen as the variant there will be an
intermediate step required to convert to and from the variant type.

On Saturday, 30 January 2016, Tobias Oberstein [email protected]
wrote:

msgpack::object is located on msgpack::zone

That was a main reason for switching from boost::any to msgpack::object
for both internal and API in autobahn: memory efficiency and less dynamic
allocations.

Both can easy to convert to different types.

"convert" sounds like double effort / memory.

I guess my question is: how would I parse CBOR from the wire, while
creating a native msgpack::object on the fly and vice-versa, without
creating some CBOR native object from the wire, and subsequently convert
that to msgpack::object?


Reply to this email directly or view it on GitHub
#414 (comment).

@oberstet
Copy link
Author

@davidchappelle Thanks for reminding me of the other arguments there were for switching from boost::any! I have filed crossbario/autobahn-cpp#86

Sidenote (opinionated): Why use a "user-defined-type" in the app in the first place? Why not use msgpack::object as the native app type? But that's a different, somewhat unrelated question anyway.

In above, I was more concerned that we might end up with:

  • msgpack::object <=> wire-format (MessagePack)
  • msgpack::object <=> JSON intermediate object <=(*)> wire-format (JSON)
  • msgpack::object <=> CBOR intermediate object <=(*)> wire-format (CBOR)

Effectively putting JSON/CBOR second class (because of the necessary additional translation step in ()), and putting the translationg () code into autobahn-cpp, whereas if it would reside in msgpack-c, all users of msgpack-c would profit.

And hence my question of natively supporting JSON and CBOR directly within msgpack-c so we can have:

  • msgpack::object <=> wire-format (MessagePack)
  • msgpack::object <=> wire-format (JSON)
  • msgpack::object <=> wire-format (CBOR)

msgpack-c could still be "rightly" named "msgpack", as the API type is still msgpack::object though it would then support not only MessagePack, but also JSON and CBOR.

Would that be even feasible?

@jpetso Is that what you hinted at with the

I think a more practical variant of this feature request could be for msgpack-c to support a SAX-style deserialization API.

because having that would allow an efficient and cleanly separated implementation of the (*) translation?

@jpetso
Copy link
Contributor

jpetso commented Feb 11, 2016

There is no need for msgpack-c to support any other serialization formats as part of the project as long as it offers interfaces for efficient packing and unpacking. Support for alternative formats can then be added as part of an independent project such as json-msgpack.

  1. In the case of "(anything) => wire-format (MessagePack)", the existing pack interfaces are sufficient.
  2. In the case of "wire-format (MessagePack) => (anything)", there are unnecessary inefficiencies because the current interfaces require the user to convert "wire-format (MessagePack) => msgpack::object => (anything)".

I filed msgpack-c #418 which has a slightly different focus than this issue but would take care of item 2. I also believe we have sorted out any remaining related issues in crossbario/autobahn-cpp#86.

I think that means this issue can be closed, since there is nothing else to do that requires tracking by this issue specifically. Is that correct? If so, please close.

@davidchappelle
Copy link

One thing to also note here. Unmarshaling to a msgpack::object doesn't have
to include a full copy of bytes out of the received buffer. Instead, an
option exists while unpacking to simply point msgpack::internal pointers to
relevant offsets into the user supplied buffer. So although it is an extra
step, any unnecessary copying can be avoided. With that said, I just had a
read through msgpack-c #418 and I think that it would be a great
optimization to take advantage of. However, since it is an internal detail,
it shouldn't hold up any forward progress on supporting different
serialization types.

On Wed, Feb 10, 2016 at 7:04 PM, Jakob Petsovits [email protected]
wrote:

There is no need for msgpack-c to support any other serialization formats
as part of the project as long as it offers interfaces for efficient
packing and unpacking. Support for alternative formats can then be added as
part of an independent project such as json-msgpack
https://github.com/tplgy/json-msgpack.

  1. In the case of "(anything) => wire-format (MessagePack)", the
    existing pack interfaces are sufficient.
  2. In the case of "wire-format (MessagePack) => (anything)", there are
    unnecessary inefficiencies because the current interfaces require the user
    to convert "wire-format (MessagePack) => msgpack::object => (anything)".

I filed msgpack-c #418 #418
which has a slightly different focus than this issue but would take care of
item 2. I also believe we have sorted out any remaining related issues in
crossbario/autobahn-cpp#86
crossbario/autobahn-cpp#86.

I think that means this issue can be closed, since there is nothing else
to do that requires tracking by this issue specifically. Is that correct?
If so, please close.


Reply to this email directly or view it on GitHub
#414 (comment).

@oberstet
Copy link
Author

Thanks all for explaining / discussion. I am closing this issue as I was convinced the "actual itch" is better solved differently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants