Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text and Keyword aggregation integration tests #176

Draft
wants to merge 9 commits into
base: Integ-newDataTypeForTextAggregations
Choose a base branch
from
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@
import static org.opensearch.sql.legacy.TestUtils.getPhraseIndexMapping;
import static org.opensearch.sql.legacy.TestUtils.getResponseBody;
import static org.opensearch.sql.legacy.TestUtils.getStringIndexMapping;
import static org.opensearch.sql.legacy.TestUtils.getDataTextKeywordIndexMapping;
import static org.opensearch.sql.legacy.TestUtils.getWeblogsIndexMapping;
import static org.opensearch.sql.legacy.TestUtils.isIndexExist;
import static org.opensearch.sql.legacy.TestUtils.loadDataByRestClient;
Expand Down Expand Up @@ -584,7 +585,11 @@ public enum Index {
CALCS(TestsConstants.TEST_INDEX_CALCS,
"calcs",
getMappingFile("calcs_index_mappings.json"),
"src/test/resources/calcs.json"),;
"src/test/resources/calcs.json"),
TEXTKEYWORD(TestsConstants.TEST_INDEX_TEXTKEYWORD,
"textkeyword",
getMappingFile("text_keyword_index_mapping.json"),
"src/test/resources/text_keyword_index.json");

private final String name;
private final String type;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -243,6 +243,11 @@ public static String getDataTypeNonnumericIndexMapping() {
return getMappingFile(mappingFile);
}

public static String getDataTextKeywordIndexMapping() {
String mappingFile = "text_keyword_index_mapping.json";
return getMappingFile(mappingFile);
}

public static void loadBulk(Client client, String jsonPath, String defaultIndex)
throws Exception {
System.out.println(String.format("Loading file %s into opensearch cluster", jsonPath));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ public class TestsConstants {
public final static String TEST_INDEX_BEER = TEST_INDEX + "_beer";
public final static String TEST_INDEX_NULL_MISSING = TEST_INDEX + "_null_missing";
public final static String TEST_INDEX_CALCS = TEST_INDEX + "_calcs";
public final static String TEST_INDEX_TEXTKEYWORD = TEST_INDEX + "_textkeyword";

public final static String DATE_FORMAT = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";
public final static String TS_DATE_FORMAT = "yyyy-MM-dd HH:mm:ss.SSS";
Expand Down
203 changes: 203 additions & 0 deletions integ-test/src/test/java/org/opensearch/sql/sql/TextTypeIT.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,203 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.sql.sql;

import org.junit.Test;
import org.opensearch.sql.legacy.SQLIntegTestCase;


import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_TEXTKEYWORD;
import static org.opensearch.sql.util.MatcherUtils.schema;
import static org.opensearch.sql.util.MatcherUtils.verifySchema;

public class TextTypeIT extends SQLIntegTestCase {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any other tests that you tried? These only look like aggregation tests, but it would be good to know if some of the following would also work:

  • WHERE field LIKE "keyFD??"
  • WHERE wildcard("field", "keyFD??")
  • SELECT field LIKE "keyFD??"
    and maybe a couple of string-like functions:
  • SELECT LOCATE("FD", field)
  • SELECT SUBSTRING(field, 3, 2)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also try

  • SELECT POSITION(substring IN field) since it seems to fail for text fields (as Margarit demonstrated)

Copy link
Author

@MitchellGale MitchellGale Nov 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SELECT POSITION(substring IN field)

Passed:

  • selectPositionKeyword
  • selectPositionText
  • selectPositionTextKeywordFieldNoFieldData
  • selectPositionTypeTextFieldData
  • selectLocateTextDataFieldNoFields
  • selectPositionTextDataFieldNoFields

Failed:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check selectPositionTextDataFieldNoFields - its giving you a parser error.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, this looks exclusively like an issue with aggregation on text fields.
Strangely, I thought POSITION was going to fail on text fields, since it was failing for Margarit earlier.



@Override
public void init() throws Exception {
super.init();
loadIndex(Index.TEXTKEYWORD);
}

// Select

@Test
public void textKeywordTest() {
var result = executeJdbcRequest(String.format("select typeKeyword from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("typeKeyword", null, "keyword"));
}

@Test
public void aggregateOnText() {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY typeText", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

@Test
public void aggregateOnKeyword() {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY typeKeyword", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

@Test
public void aggregateOnTextFieldData() {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY typeTextFieldData", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

@Test
public void aggregateOnKeywordFieldData() {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY typeKeywordFieldNoFieldData", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

@Test
public void aggregateOnTextAndFieldDataNoFields() {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY textDataFieldNoFields", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

// Where like

@Test
public void whereLikeKeyword() {
executeJdbcRequest(String.format("select * from %s WHERE typeKeyword LIKE \\\"key*\\\"", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void whereLikeText() {
executeJdbcRequest(String.format("select * from %s WHERE typeText LIKE \\\"text*\\\"", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void whereLikeKeywordFieldNoFieldData() {
executeJdbcRequest(String.format("select * from %s WHERE typeKeywordFieldNoFieldData LIKE \\\"keyword*\\\"", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void whereLikeTextFieldData() {
executeJdbcRequest(String.format("select * from %s WHERE typeTextFieldData LIKE \\\"keyFD*\\\"", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void whereLiketextDataFieldNoFields() {
executeJdbcRequest(String.format("select * from %s WHERE textDataFieldNoFields LIKE \\\"textFDNF*\\\"", TEST_INDEX_TEXTKEYWORD));
}

// Wildcard

@Test
public void whereWildcardKeyword() {
executeJdbcRequest(String.format("select * from %s WHERE wildcard_query(typeKeyword, \\\"key*\\\")", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void whereWildcardText() {
executeJdbcRequest(String.format("select * from %s WHERE wildcard_query(\\\"typeText\\\", \\\"text*\\\")", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void whereWildcardKeywordFieldNoFieldData() {
executeJdbcRequest(String.format("select * from %s WHERE wildcard_query(\\\"typeKeywordFieldNoFieldData\\\", \\\"keyword*\\\")", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void whereWildcardTextFieldData() {
executeJdbcRequest(String.format("select * from %s WHERE wildcard_query(\\\"typeTextFieldData\\\", \\\"keyFD*\\\")", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void whereWildcardtextDataFieldNoFields() {
executeJdbcRequest(String.format("select * from %s WHERE wildcard_query(\\\"textDataFieldNoFields\\\", \\\"textFDNF*\\\")", TEST_INDEX_TEXTKEYWORD));
}

// Locate

@Test
public void selectLocateKeyword() {
executeJdbcRequest(String.format("select typeKeyword LIKE \\\"key*\\\" from %s", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void selectLocateText() {
executeJdbcRequest(String.format("select typeText LIKE \\\"text*\\\" from %s", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void selectLocateTextKeywordFieldNoFieldData() {
executeJdbcRequest(String.format("select typeKeywordFieldNoFieldData LIKE \\\"keyword*\\\" from %s", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void selectLocateTypeTextFieldData() {
executeJdbcRequest(String.format("select typeTextFieldData LIKE \\\"keyFD*\\\" from %s", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void selectLocateTextDataFieldNoFields() {
executeJdbcRequest(String.format("select textDataFieldNoFields LIKE \\\"textFDNF*\\\" from %s", TEST_INDEX_TEXTKEYWORD));
}

// Position

@Test
public void selectPositionKeyword() {
executeJdbcRequest(String.format("select POSITION(key IN typeKeyword) from %s", TEST_INDEX_TEXTKEYWORD));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you need quotes around the text:

POSITION(\\\"key\\\" IN typeKeyword)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in a47e3ee

}

@Test
public void selectPositionText() {
executeJdbcRequest(String.format("select POSITION(text IN typeText) from %s", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void selectPositionTextKeywordFieldNoFieldData() {
executeJdbcRequest(String.format("select POSITION(keyword IN typeKeywordFieldNoFieldData) from %s", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void selectPositionTypeTextFieldData() {
executeJdbcRequest(String.format("select POSITION(keyFD IN typeTextFieldData) from %s", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void selectPositionTextDataFieldNoFields() {
executeJdbcRequest(String.format("select POSITION(textDataFieldNoFields) from %s", TEST_INDEX_TEXTKEYWORD));
}

// Substring

@Test
public void selectSubstringKeyword() {
executeJdbcRequest(String.format("select SUBSTRING(typeKeyword, 1, 1) from %s", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void selectSubstringText() {
executeJdbcRequest(String.format("select SUBSTRING(typeText, 1, 1) from %s", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void selectSubstringTextKeywordFieldNoFieldData() {
executeJdbcRequest(String.format("select SUBSTRING(typeKeywordFieldNoFieldData, 1, 1) from %s", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void selectSubstringTypeTextFieldData() {
executeJdbcRequest(String.format("select SUBSTRING(typeTextFieldData, 1, 1) from %s", TEST_INDEX_TEXTKEYWORD));
}

@Test
public void selectSubstringTextDataFieldNoFields() {
executeJdbcRequest(String.format("select SUBSTRING(textDataFieldNoFields, 1, 1) from %s", TEST_INDEX_TEXTKEYWORD));
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
{
"mappings" : {
"properties" : {
"typeKeyword" : {
"type" : "keyword"
},
"typeText" : {
"type" : "text"
},
"typeKeywordFieldNoFieldData" : {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 10

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try adding a row with more than 10 words. Interesting to test this too.

}
} },
"typeTextFieldData" : {
"type": "text",
"fielddata": true,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be interesting to know how it works if we have a text field with "fielddata": true but doesn't have any fields defined. Does this behave just like typeTextFieldData?

Copy link
Author

@MitchellGale MitchellGale Nov 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 2a8fbac. It passes.

"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 10
}
}
},
"textDataFieldNoFields" : {
"type": "text",
"fielddata": true
},
"int0" : {
"type": "integer"
}
}
}
}
16 changes: 16 additions & 0 deletions integ-test/src/test/resources/text_keyword_index.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{"index": {"_id":"1"}}
{"typeKeyword": "key00", "typeText": "text00", "typeKeywordFieldNoFieldData": "keyword00","typeTextFieldData": "keyFD00", "typeKeywordFieldData": "textFD00", "textDataFieldNoFields": "textFDNF00","int0": 0}
{"index": {"_id":"2"}}
{"typeKeyword": "key01", "typeText": "text01", "typeKeywordFieldNoFieldData": "keyword01", "typeTextFieldData": "keyFD01", "typeTextFieldData": "textFD01OverTen", "textDataFieldNoFields": "textFDNF01", "int0": 1}
{"index": {"_id":"3"}}
{"typeKeyword": "key02", "typeText": "text02", "typeKeywordFieldNoFieldData": "keyword02", "typeTextFieldData": "keyFD02", "typeKeywordFieldData": "textFD02", "textDataFieldNoFields": "textFDNF02", "int0": 2}
{"index": {"_id":"4"}}
{"typeKeyword": "key03", "typeText": "text03", "typeKeywordFieldNoFieldData": "keyword03", "typeTextFieldData": "keyFD03OverTen", "typeKeywordFieldData": "textFD03", "textDataFieldNoFields": "textFDNF03", "int0": 3}
{"index": {"_id":"5"}}
{"typeKeyword": "key04", "typeText": "text04", "typeKeywordFieldNoFieldData": "keyword04", "typeTextFieldData": "keyFD04", "typeKeywordFieldData": "textFD04", "textDataFieldNoFields": "textFDNF04", "int0": 4}
{"index": {"_id":"6"}}
{"typeKeyword": "key05", "typeText": "text05", "typeKeywordFieldNoFieldData": "keyword05", "typeTextFieldData": "keyFD05", "typeKeywordFieldData": "textFD0OverTen5", "textDataFieldNoFields": "textFDNF05", "int0": 5}
{"index": {"_id":"7"}}
{"typeKeyword": "key06", "typeText": "text06", "typeKeywordFieldNoFieldData": "keyword06", "typeTextFieldData": "keyFD06OverTen", "typeKeywordFieldData": "textFD06", "textDataFieldNoFields": "textFDNF06", "int0": 6}
{"index": {"_id":"8"}}
{"typeKeyword": "key07", "typeText": "text07", "typeKeywordFieldNoFieldData": "keyword07", "typeTextFieldData": "keyFD07", "typeKeywordFieldData": "textFD07", "textDataFieldNoFields": "textFDNF07", "int0": 7}