
Way too much IO #28

Open
gerroon opened this issue Nov 22, 2021 · 4 comments

gerroon commented Nov 22, 2021

Hi,

I am not sure whether it is the Docker setup or whether LT itself uses way too much IO. I ran `iotop -a` for about 10 minutes, and as you can see it had already read many GB. This goes on all day; this is just a snapshot.

The problem is that the whole time it is consuming IO like this, the browser client thinks the server is dead or unreachable. I am the only user, with a couple of browser clients; I do not understand why it needs to read hundreds of GB a day just to check a few words.

[screenshot: iotop output]
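For reference, the cumulative read counter that iotop reports per process can also be read directly from Linux procfs; a minimal sketch, assuming a Linux host where `/proc/<pid>/io` is readable (it is for your own processes):

```shell
#!/bin/sh
# Print cumulative bytes read from storage by a given PID.
# The read_bytes field in /proc/<pid>/io counts actual block-device reads,
# which is the same counter iotop accumulates with -a.
io_read_bytes() {
  awk '/^read_bytes:/ {print $2}' "/proc/$1/io"
}

# Example: bytes read so far by this shell itself
io_read_bytes $$
```

Watching this value for the Java process over a few minutes gives the same number iotop shows, without needing root for the full iotop view.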

This is how I run it on Debian Testing x64:


docker run  -d --name=languagetool \
        --restart unless-stopped \
        -e langtool_languageModel=/ngrams \
        -e Java_Xms=1000m \
        -e Java_Xmx=2000m \
        -v /media/docker/languagetool/ngram:/ngrams \
        -p 8010:8010 \
        erikvl87/languagetool
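If you want a quick per-container view of the same thing, Docker tracks cumulative block I/O per container; a one-liner sketch, assuming the container name from the command above:

```shell
# Cumulative block I/O (reads/writes) for the running container.
# Requires a running Docker daemon; --no-stream prints one sample and exits.
docker stats --no-stream --format '{{.Name}}: {{.BlockIO}}' languagetool
```

That isolates whether the reads are attributable to the LT container at all, rather than something else on the host.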



gerroon commented Nov 22, 2021

Here is about half an hour of accumulated snapshot, so many GB of data read. That seems super excessive to me.

[screenshot: iotop output after ~30 minutes]

@Erikvl87 Erikvl87 self-assigned this Nov 26, 2021

eseiler commented Nov 30, 2021

I'm not using docker, and on startup I get around 500 MiB of I/O.

This is my call:

java -Xms256m \
     -Xmx2816m \
     -cp languagetool-server.jar org.languagetool.server.HTTPServer \
     --port 8010 \
     --public \
     --allow-origin '*' \
     --config config.properties

cat config.properties

languageModel=/home/ubuntu/ngrams
maxCheckThreads=2
cacheSize=50000
fasttextModel=/home/ubuntu/fastText/lid.176.bin
fasttextBinary=/home/ubuntu/fastText/fasttext
maxCheckTimeMillis=120000
pipelinePrewarming=true
pipelineCaching=true

I use ngrams (1-, 2-, 3-grams) for de and en.

Just some ideas:

  • The default maxCheckThreads is 10, i.e. 10 threads are spawned to process queries, and each thread has its own I/O. Since I'm running LT on a Raspberry Pi with 4 cores, two threads are enough for me.
  • Using fasttext improves language detection - no idea if this affects I/O.
  • While the pipeline is prewarming, the browser client cannot communicate with the LT server. If pipeline prewarming is not enabled, this usually happens on the first query (which then just times out).

Is it possible/feasible for you to run LT without docker to check if there is still so much I/O?
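For the Docker setup in the original report, these settings could presumably be passed through container environment variables; a sketch, assuming the image maps `langtool_<property>` variables onto `config.properties` entries the same way `langtool_languageModel` is mapped in the original command (the variable names other than `langtool_languageModel` are assumptions, not verified against the image):

```shell
docker run -d --name=languagetool \
        --restart unless-stopped \
        -e langtool_languageModel=/ngrams \
        -e langtool_maxCheckThreads=2 \
        -e langtool_cacheSize=50000 \
        -e langtool_pipelineCaching=true \
        -e langtool_pipelinePrewarming=true \
        -e Java_Xms=1000m \
        -e Java_Xmx=2000m \
        -v /media/docker/languagetool/ngram:/ngrams \
        -p 8010:8010 \
        erikvl87/languagetool
```

Lowering `maxCheckThreads` and enabling the result cache are the two settings most likely to cut repeated ngram reads.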


gerroon commented Nov 30, 2021

Thanks, I will try your suggestions.

j-lakeman commented:

Any news on this? I'm experiencing similar behaviour.
