Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic stream solution #2

Open
wants to merge 14 commits into
base: reviewed-stream
Choose a base branch
from
Open

Basic stream solution #2

wants to merge 14 commits into from

Conversation

jingxihuang411
Copy link
Member

Description

Implemented basic functionalities for MonitorStream

Solution

Created MonitorStream and Alertstream. MonitorStream currently support "update" and "alert" operator:

  • "update" will send matching tuples returned by topic stream to destinationCollection
  • "alert" will send matching tuples returned by topic stream via http request to user-specified URL

Tests

MonitorStream Update Solution

  1. Create destinationCollection, collection1
  2. Update some documents in the collection1: {"id": "id1", "a_s": "hello", "a_i": "0", "a_f":"1"}
  3. Create MonitorStream in collection1:
    monitor(id="percolate1", operator="update", collection1, destinationCollection, q="a_s:hello", fl="id")
  4. Update additional documents in collection1:
    {"id": "id2", "a_s": "hello", "a_i": "0", "a_f":"1"}
    {"id": "id3", "a_s": "hello", "a_i": "0", "a_f":"1"}
    {"id": "id4", "a_s": "hellohello", "a_i": "0", "a_f":"1"}
  5. Check for matching ids id2 and id3 in destinationCollection
    Note: As per https://lucene.apache.org/solr/guide/8_4/near-real-time-searching.html#commits-and-optimizing, if want to check matches immediately in destinationCollection, need to set true in solrconfig.xml for destinationColletion; Otherwise restart Solr would show the matches

MonitorStream Alert Solution

  1. Create collection1
  2. Update some documents in the collection1: {"id": "id1", "a_s": "hello", "a_i": "0", "a_f":"1"}
  3. Set up a another server running locally to receive the http request. For example, set up a python server running at "http://localhost:8080/abc" with the below code: https://gist.github.com/mdonkers/63e115cc0c79b4f6b8b3a6b797e485c7
  4. Create MonitorStream in collection1:
    monitor(id="monitor1", operator="alert", collection1, url="http://localhost:8080/abc", q="a_s:hello", fl="id")
  5. Update additional documents in collection1:
    {"id": "id2", "a_s": "hello", "a_i": "0", "a_f":"1"}
    {"id": "id3", "a_s": "hello", "a_i": "0", "a_f":"1"}
    {"id": "id4", "a_s": "hellohello", "a_i": "0", "a_f":"1"}
  6. Check that the server running on localhost:8080 get the matching response for the specified field, in this case = {matches={id=13, version=1663375504531521536}}

Note: Currently the only alert function that is implemented is to send a http request to the url specified in the MonitorStream input. Only http requests not https requests are tested and supported now.

configureCluster(4)
.addConfig("conf", getFile("solrj").toPath().resolve("solr").resolve("configsets").resolve("streaming").resolve("conf"))
.addConfig("ml", getFile("solrj").toPath().resolve("solr").resolve("configsets").resolve("ml").resolve("conf"))
.configure();

String collection;
useAlias = random().nextBoolean();
// useAlias = random().nextBoolean();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you removing this?

Copy link
Member Author

@jingxihuang411 jingxihuang411 Apr 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When changed to useAlias = false since some of the tests cases are skipped becuase when useAlias has a random value of true, some tests with Assume.assumeTrue(!useAlias); will be skipped.

Changed back and remove Assume.assumeTrue(!useAlias); for these test cases

init(tupleStream, targetURL);
}

public void init(TupleStream tupleStream, String targetURL) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method necessary? This logic could go into the second constructor, and the first constructor could call the second constructor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If calling second inside the first constructor, this(tupleStream, targetURL) should be first line.

HttpPost httpPost= new HttpPost((URI.create(targetURL)));
httpPost.setEntity(new StringEntity(requestBody.toString()));

httpClient.execute(httpPost);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice if the user could receive the HTTP statuses of each alert request in the tuple.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm not sure which tuple do you mean. Right now the alert request is sent to the endpoint for each matching tuple so users could retrieve the HTTP status for each request received at the endpoint.

I added the timestamp for sending each request in the tuple.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when adding the timestamp, you should also add the HTTP status code and maybe other information so that the users know if certain documents were not processed properly

this.daemonStream = new DaemonStream(alertStream, id, runInterval, queueSize);
}

public void init(String id, String destinationCollection, String topicCollection, String zkHost, SolrParams solrParams) throws IOException{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There looks like a lot of repeated logic in these init methods, might want to have one call the other

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also similarly to AlertStream, is there a reason to have these init methods, versus just putting the logic into the second two constructors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The init method follows the same format as the constructors for other Stream expressions. Also if calling one constructor from another have the problem mentioned above. Moved repeated logic to a separate function instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants