
SAMOA-65 JSON serializer/deserializer dedicated for Kafka components #64

Open
wants to merge 18 commits into base: master

Conversation

pwawrzyniak
Contributor

No description provided.

private static final String ZKHOST = "127.0.0.1";
private static final String BROKERHOST = "127.0.0.1";
private static final String BROKERPORT = "9092";
private static final String TOPIC_AVRO = "samoa_test-avro";

Maybe this line is not needed?

private static final String ZKHOST = "127.0.0.1";
private static final String BROKERHOST = "127.0.0.1";
private static final String BROKERPORT = "9092";
private static final String TOPIC_AVRO = "samoa_test-avro";

Maybe this line is not needed for the JSON test

kafkaServer = TestUtils.createServer(config, mock);

// create topics
AdminUtils.createTopic(zkUtils, TOPIC_AVRO, 1, 1, new Properties(), RackAwareMode.Disabled$.MODULE$);

Similar to the previous comments

@nicolas-kourtellis

Thank you @pwawrzyniak for the contributions!
Regarding the current code:

  • Can you change the copyright to be the current year? (I wonder if we should keep the year appearing at all; it needs constant updating every time we have a new release.)

  • I added some minor comments on leftovers from the AVRO combined integration. If you can remove them it would be cleaner.

  • I noticed that there is redundancy/repetition between the three PRs (#59, #64, #65). Is there a way to make them unique to each other? Otherwise I think there will be conflicts when trying to merge them. @gdfm, what do you think?

  • After checking the code, I realized that this is dedicated for Kafka.
    A quick question: can this JSON serializer/deserializer be extended/abstracted to be used by other interfaces besides Kafka (e.g., even storing/retrieving JSON files from disk)? Do you think it is feasible, or would it need a lot of work?

  • Can we do something similar for Avro?

@pwawrzyniak
Contributor Author

Thank you @nicolas-kourtellis for your feedback!

Regarding your comments, I removed the copyright notices from the code and fixed and cleaned up the code accordingly.

Regarding the AVRO code provided within SAMOA-65, the first difference between the Avro loader and our Kafka Avro mapper is that the Avro loader works with JSON files containing a schema header and payload data (JSON or binary). In our case the Avro mapper works with the byte stream received from Kafka, and the schema is defined in a separate file. The other difference is that the Avro loader produces an Instance object, while the Kafka mapper was designed to work with InstanceContentEvent. Moreover, the Kafka Avro mapper serializes the whole InstanceContentEvent object, while the Avro loader reads a data file, creates an Avro structure and, based on that, builds an Instance object. It should be possible to use the concept from the Avro loader for processing Kafka data, but I suppose it would require implementing a new generic loader instead of reusing the old one. Another thing is that we need two-way serialization, and the Avro loader is used only for reading data, not writing.

And regarding the JSON parser, as of now it is prepared to parse messages coming from Apache Kafka one by one, serializing the InstanceContentEvent class. It could potentially be used to parse a file (line by line, for example); since the mapper accepts a byte array as input, I believe it can easily be reused for this task (i.e. as the parser when reading data from a text/JSON file), as in the sketch below.
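For illustration, here is a minimal sketch of that reuse: parsing a local JSON file line by line and handing each line to the mapper as a byte array, exactly as a single Kafka message would arrive. The constructor and the deserialize(byte[]) method shown on KafkaJsonMapper are assumptions made for the example and may differ from the actual API in this PR.

// Sketch only: the KafkaJsonMapper constructor and deserialize(byte[]) method are assumed for illustration.
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class JsonFileParsingSketch {

    public static void main(String[] args) throws Exception {
        KafkaJsonMapper mapper = new KafkaJsonMapper(StandardCharsets.UTF_8); // constructor signature assumed
        List<String> lines = Files.readAllLines(Paths.get("instances.json"), StandardCharsets.UTF_8);
        for (String line : lines) {
            // Each line holds one serialized InstanceContentEvent, just like a single Kafka message payload.
            byte[] payload = line.getBytes(StandardCharsets.UTF_8);
            InstanceContentEvent event = mapper.deserialize(payload); // method name assumed
            // ... pass the event to the processor/topology under test
        }
    }
}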

@nicolas-kourtellis

nicolas-kourtellis commented Jun 28, 2017

Hi @pwawrzyniak,
trying to keep the discussion here more specific: do you think we will have conflicts if we merge this one and the other PRs (#59, #65)? (As I mentioned, there seemed to be some repetition of classes/methods.)
Does it make sense to merge first the Kafka PR and then simplify the Avro and Json ones before merging those as well?

@pwawrzyniak
Contributor Author

Hi @nicolas-kourtellis,
It is true that the 3 PRs are dependent, but I think that if #59 were merged first, there would be no merge conflicts caused by the JSON/AVRO components (i.e. #65 and #64).
I think we can proceed by merging the Kafka components (without JSON/AVRO support) first and then simplifying/combining the AVRO/JSON components to improve the reusability of the code, but that would require more time to investigate, of course.

@@ -0,0 +1,103 @@
/*
Contributor


Is this class the main difference from #59?


I think the same comment also applies to #65


Yes, this is the main difference, as well as the test case where KafkaJsonMapper is being used.
