FR: implement Avro output #36

Open
candlerb opened this issue Apr 3, 2018 · 2 comments

Comments

@candlerb

candlerb commented Apr 3, 2018

Avro is a much more compact output format than JSON, whilst being straightforward to convert to/from JSON. It is natively supported in Kafka and has good support for schema evolution.

@jimmystewpot
Contributor

@candlerb I was literally looking at doing something along these lines in the last few days. I just want to confirm that what you are asking for is what I am looking at doing.

You want to have Avro as a configurable output encoding type, instead of JSON/MessagePack etc.?

@candlerb
Author

Exactly.

When you dig down a bit more, this can mean a couple of different things:

  1. When sending to Kafka, write Avro single object format, which includes a fingerprint of the schema with each message. The Confluent platform and its schema registry take a similar approach, but with their own framing: a magic byte and a 4-byte schema ID in place of the fingerprint.

  2. When writing to disk, write Avro container files which include the schema in the header followed by batches of records, with each batch separated by a random 16-byte delimiter. These are convenient for map-reduce, and for seeking within a large file (e.g. binary chop).
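To make the container file layout in item 2 concrete, here is a rough sketch in Python using only the standard library. It is an illustration of the layout as I understand it from the Avro specification, not a proposal for how this project should implement it. The `Flow` schema, the `container_file` helper and the record contents are all made up for the example, and records are assumed to arrive already Avro-binary-encoded.

```python
import json
import os

# Hypothetical schema, for illustration only.
SCHEMA = {"type": "record", "name": "Flow",
          "fields": [{"name": "src", "type": "string"}]}

def encode_long(n: int) -> bytes:
    """Avro long: zigzag, then base-128 varint, low-order group first."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_bytes(b: bytes) -> bytes:
    """Avro bytes/string: length as a long, then the raw bytes."""
    return encode_long(len(b)) + b

def container_file(schema: dict, blocks, sync: bytes = None) -> bytes:
    """Build a container file: header, then one block per batch of records."""
    sync = sync or os.urandom(16)          # random 16-byte delimiter
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"),
            "avro.codec": b"null"}         # no compression
    out = bytearray(b"Obj\x01")            # magic
    out += encode_long(len(meta))          # metadata map: one block of pairs
    for key, value in meta.items():
        out += encode_bytes(key.encode("utf-8")) + encode_bytes(value)
    out += encode_long(0)                  # end of map
    out += sync
    for records in blocks:
        if not records:
            continue
        payload = b"".join(records)
        out += encode_long(len(records))   # object count in this block
        out += encode_long(len(payload))   # byte size of this block
        out += payload + sync
    return bytes(out)

records = [encode_bytes(ip.encode()) for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
data = container_file(SCHEMA, [records[:1], records[1:]], sync=bytes(range(16)))
```

A reader that lands at an arbitrary offset can scan forward for the 16-byte marker to find the next block boundary, which is what makes seeking and splitting practical.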

Both would be nice to have, but I think the first is more important.
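For the first option, a minimal sketch of the single object framing as described in the Avro specification: a two-byte marker, the 8-byte little-endian CRC-64-AVRO fingerprint of the schema, then the Avro binary body. This is Python with only the standard library. The `Flow` schema and its fields are hypothetical, and the schema string is assumed to already be in Parsing Canonical Form (the sketch does not canonicalise it).

```python
import struct

EMPTY = 0xC15D213AA4D7A795  # CRC-64-AVRO seed from the Avro specification

def _make_table():
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            fp = (fp >> 1) ^ (EMPTY if fp & 1 else 0)
        table.append(fp)
    return table

FP_TABLE = _make_table()

def fingerprint64(data: bytes) -> int:
    """CRC-64-AVRO fingerprint of the given bytes."""
    fp = EMPTY
    for b in data:
        fp = (fp >> 8) ^ FP_TABLE[(fp ^ b) & 0xFF]
    return fp

def encode_long(n: int) -> bytes:
    """Avro long: zigzag, then base-128 varint, low-order group first."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_string(s: str) -> bytes:
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw

def single_object(canonical_schema: str, body: bytes) -> bytes:
    """Marker C3 01, then schema fingerprint, then the encoded record."""
    fp = fingerprint64(canonical_schema.encode("utf-8"))
    return b"\xC3\x01" + struct.pack("<Q", fp) + body

# Hypothetical schema, assumed to be in canonical form already.
SCHEMA = ('{"name":"Flow","type":"record","fields":['
          '{"name":"src","type":"string"},{"name":"bytes","type":"long"}]}')

# A record's fields are simply concatenated in schema order.
body = encode_string("10.0.0.1") + encode_long(1500)
msg = single_object(SCHEMA, body)
```

The per-message overhead is a fixed 10 bytes, and a consumer can look up the writer schema by fingerprint without the schema travelling in every message.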


2 participants