
Way too much IO #28

Open
gerroon opened this issue Nov 22, 2021 · 4 comments

gerroon commented Nov 22, 2021

Hi,

I am not sure whether it is the Docker setup or whether LT itself uses way too much IO. I ran `iotop -a` for about 10 minutes, and as you can see it had already read many GB. This goes on all day; this is just a snapshot.

The problem is that the whole time it is consuming IO like this, the browser client thinks the server is dead or unreachable. I am the only user, with a couple of browser clients; I do not understand why it needs to read hundreds of GB a day just to check a few words.

[screenshot: iotop output]
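For reference, the cumulative read counter that iotop reports per process can also be read directly from Linux procfs; a minimal sketch, assuming a Linux host where `/proc/<pid>/io` is readable (it is for your own processes):

```shell
#!/bin/sh
# Print cumulative bytes read from storage by a given PID.
# The read_bytes field in /proc/<pid>/io counts actual block-device reads,
# which is the same counter iotop accumulates with -a.
io_read_bytes() {
  awk '/^read_bytes:/ {print $2}' "/proc/$1/io"
}

# Example: bytes read so far by this shell itself
io_read_bytes $$
```

Watching this value for the Java process over a few minutes gives the same number iotop shows, without needing root for the full iotop view.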

This is how I run it on Debian Testing x64:


docker run  -d --name=languagetool \
        --restart unless-stopped \
        -e langtool_languageModel=/ngrams \
        -e Java_Xms=1000m \
        -e Java_Xmx=2000m \
        -v /media/docker/languagetool/ngram:/ngrams \
        -p 8010:8010 \
        erikvl87/languagetool
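If you want a quick per-container view of the same thing, Docker tracks cumulative block I/O per container; a one-liner sketch, assuming the container name from the command above:

```shell
# Cumulative block I/O (reads/writes) for the running container.
# Requires a running Docker daemon; --no-stream prints one sample and exits.
docker stats --no-stream --format '{{.Name}}: {{.BlockIO}}' languagetool
```

That isolates whether the reads are attributable to the LT container at all, rather than something else on the host.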



gerroon commented Nov 22, 2021

Here is about half an hour of accumulated snapshot, so many GB of data read. That seems super excessive to me.

[screenshot: iotop output after ~30 minutes]

@Erikvl87 Erikvl87 self-assigned this Nov 26, 2021

eseiler commented Nov 30, 2021

I'm not using docker, and on startup I get around 500 MiB of I/O.

This is my call:

java -Xms256m \
     -Xmx2816m \
     -cp languagetool-server.jar org.languagetool.server.HTTPServer \
     --port 8010 \
     --public \
     --allow-origin '*' \
     --config config.properties

cat config.properties

languageModel=/home/ubuntu/ngrams
maxCheckThreads=2
cacheSize=50000
fasttextModel=/home/ubuntu/fastText/lid.176.bin
fasttextBinary=/home/ubuntu/fastText/fasttext
maxCheckTimeMillis=120000
pipelinePrewarming=true
pipelineCaching=true

I use ngrams (1-, 2-, 3-grams) for de and en.

Just some ideas:

  • The default maxCheckThreads is 10, i.e. 10 threads are spawned to process queries, and each thread has its own I/O. Since I'm running LT on a Raspberry Pi with 4 cores, two threads are enough for me.
  • Using fasttext improves language detection - no idea if this affects I/O.
  • While the pipeline is prewarming, the browser client cannot communicate with the LT server. If pipeline prewarming is not enabled, this usually happens on the first query (which then just times out).

Is it possible/feasible for you to run LT without docker to check if there is still so much I/O?
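For the Docker setup in the original report, these settings could presumably be passed through container environment variables; a sketch, assuming the image maps `langtool_<property>` variables onto `config.properties` entries the same way `langtool_languageModel` is mapped in the original command (the variable names other than `langtool_languageModel` are assumptions, not verified against the image):

```shell
docker run -d --name=languagetool \
        --restart unless-stopped \
        -e langtool_languageModel=/ngrams \
        -e langtool_maxCheckThreads=2 \
        -e langtool_cacheSize=50000 \
        -e langtool_pipelineCaching=true \
        -e langtool_pipelinePrewarming=true \
        -e Java_Xms=1000m \
        -e Java_Xmx=2000m \
        -v /media/docker/languagetool/ngram:/ngrams \
        -p 8010:8010 \
        erikvl87/languagetool
```

Lowering `maxCheckThreads` and enabling the result cache are the two settings most likely to cut repeated ngram reads.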


gerroon commented Nov 30, 2021

Thanks, I will try your suggestions.

j-lakeman commented:

Any news on this? I'm experiencing similar behaviour.
