[BUG] opensearch-with-long-numerals runs into timeout #5993

Open
rlueckl opened this issue Mar 1, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@rlueckl

rlueckl commented Mar 1, 2024

Describe the bug

I don't really know how to describe this. OpenSearch Dashboards 2.12.0 fails to fetch data, resulting in a timeout, a truncated response and broken JSON, whereas OpenSearch Dashboards 2.11.0 works perfectly fine.

To Reproduce
Don't know. Tried to compare 2.11.0 with 2.12.0. The only difference I found is that 2.12.0 calls POST /internal/search/opensearch-with-long-numerals whereas 2.11.0 calls POST /internal/search/opensearch for the exact same query. So there might be a problem with the "long-numerals" part.

The query is a simple 15 second time window on one of our indices. 2.11.0 gives back 397 hits with a response size of 1,02MB within 260ms according to the developer console. 2.12.0 runs into a timeout (120sec) then throws the following error:

JSON.parse: expected ',' or '}' after property value in object at line 1 column 306251 of the JSON data

HttpFetchError@https://host03.server.lan/7326/bundles/core/core.entry.js:15:184257
fetchResponse@https://host03.server.lan/7326/bundles/core/core.entry.js:15:191557

The response size is also 1,02MB (after all it's the same query).

No errors visible in the log (journalctl -u opensearch-dashboards.service) of OpenSearch Dashboards.
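
The column number in the JSON.parse error above can be used to look at the exact spot in the raw response body (for example, after copying it from the browser's network tab). The following is a minimal sketch, not part of Dashboards: `contextAround` and the `broken` sample string are hypothetical names made up for illustration, and the sample is a short stand-in, not the real 1,02MB response. It assumes the 1-based column numbering that Firefox reports.

```typescript
// Hypothetical helper: return the text surrounding a 1-based column,
// with a ">>>" marker placed directly before the reported character.
function contextAround(text: string, column: number, radius: number): string {
  const index = column - 1; // convert the 1-based column to a 0-based index
  const start = Math.max(0, index - radius);
  const end = Math.min(text.length, index + radius);
  return text.slice(start, index) + ">>>" + text.slice(index, end);
}

// Stand-in for a truncated response body (assumption, not the real data).
const broken = '{"message":"on ranges [(9206423891869844203,-92078';

// A truncated body is not valid JSON, so JSON.parse throws.
let parseFailed = false;
try {
  JSON.parse(broken);
} catch (e) {
  parseFailed = true;
}
console.log(parseFailed);

// Show ten characters on either side of column 25.
console.log(contextAround(broken, 25, 10));
```

Running the same helper with column 306251 against the saved response body should show which field the parser was in when it gave up.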

Expected behavior
Dashboards 2.12.0 works the same as 2.11.0

OpenSearch Version
2.12.0 (Debian Package installed from artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt)

Dashboards Version
2.12.0 (Debian Package installed from artifacts.opensearch.org/releases/bundle/opensearch-dashboards/2.x/apt)

Plugins

# OPENSEARCH_JAVA_HOME=/usr/share/opensearch/jdk /usr/share/opensearch/bin/opensearch-plugin list 
opensearch-alerting
opensearch-anomaly-detection
opensearch-asynchronous-search
opensearch-cross-cluster-replication
opensearch-custom-codecs
opensearch-flow-framework
opensearch-geospatial
opensearch-index-management
opensearch-job-scheduler
opensearch-knn
opensearch-ml
opensearch-neural-search
opensearch-notifications
opensearch-notifications-core
opensearch-observability
opensearch-performance-analyzer
opensearch-reports-scheduler
opensearch-security
opensearch-security-analytics
opensearch-skills
opensearch-sql
prometheus-exporter

(Prometheus Exporter Plugin from: Aiven-Open/prometheus-exporter-plugin-for-opensearch)

Screenshots

Comparing request & response headers with Meld:
[Screenshot: compare_request_response_headers]

Exact same query with the same request and response sizes results in different runtimes and error on 2.12.0 (/internal/search/opensearch-with-long-numerals) vs. 2.11.0 (/internal/search/opensearch)

2.11.0 works as expected:
[Screenshot: dashboards_2 11 0]

2.12.0 times out and throws an error:
[Screenshot: dashboards_2 12 0]

Host/Environment (please complete the following information):

  • Server OS: Debian 12 Bookworm
  • Client OS: Linux Mint 21.3 Virginia
  • Firefox 123.0
rlueckl added the bug and untriaged labels Mar 1, 2024
rlueckl changed the title from "[BUG]" to "[BUG] opensearch-with-long-numerals runs into timeout" Mar 1, 2024
@rlueckl
Author

rlueckl commented Mar 1, 2024

I could narrow it down to one specific log from a Cassandra system.log which apparently causes the timeout/JSON Parse error in 2.12.0. Two examples attached:

cassandra_example1.log
cassandra_example2.log

The "message" and "logmessage" fields are quite long, but it's a normal output for Cassandra and causes no issues in Dashboards 2.11.0

Looking at the examples the error apparently happens when Dashboards parses the message field:

cassandra_example1.log:
Completing uncommitted paxos instances for ****** on ranges [(9206423891869844203,-9207833944162114199], 
                                                                                  ^ this is where the syntax error happens

cassandra_example2.log:
Completed 0 uncommitted paxos instances for ****** on ranges [(9206423891869844203,-9207833944162114199],
                                                                                   ^ this is where the syntax error happens

So it looks like the error is happening within a string (the "message" field). Why does Dashboards try to parse this string as JSON?

Settings for this particular index and fields:
[Screenshot: index_settings]
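
For context on why these particular numbers are "long numerals": both range bounds in the Cassandra message are far larger than 2^53 - 1, the largest integer a JavaScript number can hold exactly. The sketch below uses only the standard `JSON.parse`; it does not reproduce whatever the opensearch-with-long-numerals route does internally, which this thread does not show. It illustrates that a bare JSON number of this size loses digits, while the same digits inside a JSON string are plain characters and come back unchanged with a standard parser.

```typescript
// 9007199254740991, the same value as Number.MAX_SAFE_INTEGER
const MAX_SAFE = Math.pow(2, 53) - 1;

// Range bounds taken from the Cassandra log message quoted above.
const lower = "9206423891869844203";
const upper = "-9207833944162114199";

// As a bare JSON number, the value is rounded to the nearest double.
const asNumber: number = JSON.parse(lower);
console.log(asNumber > MAX_SAFE);        // outside the exact-integer range
console.log(String(asNumber) === lower); // digits do not survive the round trip

// Inside a JSON string, the digits are ordinary characters.
const doc = JSON.parse('{"message":"on ranges [(' + lower + "," + upper + ']"}');
console.log(doc.message);
```

So with a standard parser the "message" field is harmless; only numeric values of this size need special handling.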

@rlueckl
Author

rlueckl commented Mar 1, 2024

I've created a smaller example which also throws the JSON parse error in Dashboards 2.12.0:

Steps to reproduce:

  • Add the following document to an index in your opensearch:
$ curl -v -H "Content-Type: application/json" -X POST "https://myopensearchhost01.server.lan:9200/logstash-2024.03.01/_doc" -d@minimal_example.json -u "user:pass"

minimal_example.json

  • Use "Discover" in OpenSearch Dashboards 2.12.0 and try to query a time range which contains the document (Feb. 28th, 05:32).
  • You'll get the above-mentioned exception.
  • Same thing with OpenSearch Dashboards 2.11.0 works fine.
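
To check whether other documents might hit the same problem, one option is to scan their string fields for digit runs beyond 2^53 - 1. This is a rough sketch under the assumption (based only on the examples in this thread) that such digit runs inside strings are the trigger. `longNumeralsIn`, `scan` and the `sample` document are hypothetical; the attached minimal_example.json is not reproduced here.

```typescript
// Hypothetical helper: list digit runs in a string that exceed 2^53 - 1.
function longNumeralsIn(text: string): string[] {
  const found: string[] = [];
  const runs = text.match(/\d{16,}/g) || [];
  for (let i = 0; i < runs.length; i++) {
    // 16 or more digits can still be a safe integer, so compare numerically.
    if (Number(runs[i]) > 9007199254740991) found.push(runs[i]);
  }
  return found;
}

// Hypothetical helper: walk a document and report "field.path: numeral" hits.
function scan(value: unknown, path: string, out: string[]): string[] {
  if (typeof value === "string") {
    const hits = longNumeralsIn(value);
    for (let i = 0; i < hits.length; i++) out.push(path + ": " + hits[i]);
  } else if (typeof value === "object" && value !== null) {
    const obj = value as { [key: string]: unknown };
    for (const key in obj) scan(obj[key], path ? path + "." + key : key, out);
  }
  return out;
}

// Stand-in document (assumption); only the message text comes from this thread.
const sample = {
  host: "node01",
  message:
    "Completed 0 uncommitted paxos instances for ****** on ranges [(9206423891869844203,-9207833944162114199],",
};
console.log(scan(sample, "", []));
```

The digit pattern ignores the minus sign, so the negative bound is reported by its absolute value.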

@atreyd

atreyd commented Mar 2, 2024

We are also facing the same problem after the 2.12.0 upgrade. Any leads or a fix would be appreciated. Surprisingly, it's only happening with some specific indexes.

@ananzh
Member

ananzh commented Mar 5, 2024

@AMoo-Miki is this fixed? could you double check and resolve this issue?

ananzh removed the untriaged label Mar 5, 2024
@atreyd

atreyd commented Mar 7, 2024

@ananzh - can you please share which release version contains the fix for this issue, or whether it will be included in a future release?

@msoler8785

This is occurring for me as well. Is there any update to this?

@msoler8785

Looks like this may have been addressed in the 2.13 release here: #6134

@rlueckl
Author

rlueckl commented Apr 10, 2024

Can anybody confirm if the bug has been fixed in 2.13.0? I don't have a test cluster unfortunately.

I've tried updating Dashboards only, but it seems that it's not backwards compatible with server version 2.12.0:

{"type":"log","@timestamp":"2024-04-10T06:03:55Z","tags":["error","savedobjects-service"],"pid":330933,"message":"This version of OpenSearch Dashboards (v2.13.0) is incompatible with the following OpenSearch nodes in your cluster: v2.12.0 @ hostname01.lan/10.x.x.x:9200 (10.x.x.x), v2.12.0 @ hostname02.lan/10.x.x.x:9200 (10.x.x.x)"}
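
The log line above is itself JSON, so the versions involved can be pulled out programmatically. A small sketch: the regular expressions are written against this one message format, and `nodeIsCompatible` encodes an assumed rule (same major version, node minor version not older than the Dashboards minor version) that matches the error shown here but is not confirmed anywhere in this thread.

```typescript
// Log line copied from the comment above (host names and IPs were already redacted).
const line =
  '{"type":"log","@timestamp":"2024-04-10T06:03:55Z","tags":["error","savedobjects-service"],"pid":330933,"message":"This version of OpenSearch Dashboards (v2.13.0) is incompatible with the following OpenSearch nodes in your cluster: v2.12.0 @ hostname01.lan/10.x.x.x:9200 (10.x.x.x), v2.12.0 @ hostname02.lan/10.x.x.x:9200 (10.x.x.x)"}';

const logMessage: string = JSON.parse(line).message;

// Dashboards version, e.g. "2.13.0"
const dashboardsMatch = /Dashboards \(v(\d+\.\d+\.\d+)\)/.exec(logMessage);
const dashboardsVersion = dashboardsMatch ? dashboardsMatch[1] : "";

// Each node appears as "v<version> @ <host>/<address>"
const nodes: string[] = [];
const nodePattern = /v(\d+\.\d+\.\d+) @ ([^\/\s]+)\//g;
let found = nodePattern.exec(logMessage);
while (found !== null) {
  nodes.push(found[2] + " runs " + found[1]);
  found = nodePattern.exec(logMessage);
}

// Assumed rule (not confirmed in this thread): same major, node minor >= Dashboards minor.
function nodeIsCompatible(dashboards: string, node: string): boolean {
  const d = dashboards.split(".").map(Number);
  const n = node.split(".").map(Number);
  return d[0] === n[0] && n[1] >= d[1];
}

console.log(dashboardsVersion, nodes, nodeIsCompatible(dashboardsVersion, "2.12.0"));
```

Under that assumed rule, the OpenSearch nodes would need to be upgraded to 2.13.0 before, or together with, Dashboards.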

@cinhtau

cinhtau commented Apr 12, 2024

We have taken the hotfix and built our own 2.12 snapshot version. Using your minimal data led to no error.

We ran the data on our 2.13 test cluster and saw no problem there either.

[Screenshot]

Seems it is safe to upgrade to 2.13 regarding the long numerals bug.
For safety reasons we will wait out the community experience on 2.13.

@rlueckl
Author

rlueckl commented Apr 12, 2024

Hi @cinhtau ,

please see my comment in #6134 : the minimal example works now, but longer examples still lead to loops: #6134 (comment)

@cinhtau

cinhtau commented Apr 12, 2024

#6377 seems to be related


5 participants