Warning Message when insufficient space to hold multiple persistent queues on a file system could be clearer #14839

Open
robbavey opened this issue Jan 11, 2023 · 1 comment · May be fixed by #16656

**Logstash version**

Logstash 7.x >= 7.17.5
Logstash 8.x >= 8.3.0

Steps to reproduce:

  1. Configure multiple pipelines where the total PQ size requested is greater than the number of bytes remaining on the allocated disk
  2. Start Logstash

When starting up, Logstash checks the total amount of space required for PQs on a given file system against the amount of disk left on that file system, and logs a warning when the space required exceeds the space available.

However, the warning message emitted is difficult to follow and does not point to the correct remediating action:

I set up a config on my laptop where I have 312Gi free on my local drive, and configured two pipelines, each with

queue.max_bytes: 300gb

configured, and started Logstash.

I received the following warning message:

[2023-01-11T17:43:50,148][WARN ][logstash.persistedqueueconfigvalidator] The persistent queue on path "/Users/robbavey/logstash-8.5.0/data/queue/test" won't fit in file system "/dev/disk1s2" when full. Please free or allocate 643171352576 more bytes. The persistent queue on path "/Users/robbavey/logstash-8.5.0/data/queue/test2" won't fit in file system "/dev/disk1s2" when full. Please free or allocate 643171352576 more bytes.

This number - Please free or allocate 643171352576 more bytes. - is confusing, as I actually need fewer bytes than that for the PQs to operate successfully.

The number appears to be:
(Total size of required disk across all PQs) - (disk used across all PQs)

But the disk may not be dedicated to PQs, so the number may be misleading.
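As a rough check of that reading, here is the arithmetic for the setup described above. It is only a sketch: it assumes Logstash parses `gb` as base-1024 and treats the 312Gi figure as exact, and the variable names are made up for illustration.

```ruby
# Assumption: "gb" in queue.max_bytes is base-1024 (1gb = 2**30 bytes).
GIB = 1 << 30

requested = 2 * 300 * GIB     # two pipelines, queue.max_bytes: 300gb each
reported  = 643_171_352_576   # figure printed in the warning message
free      = 312 * GIB         # free space on the local drive (approximate)

# If reported = requested - used, this is the space the PQs already occupy.
used_by_pqs = requested - reported

# Bytes that would actually have to be freed for both queues to fit when full.
actually_missing = reported - free

puts "requested by PQs: #{requested}"
puts "already used:     #{used_by_pqs}"
puts "reported:         #{reported}"
puts "actually missing: #{actually_missing}"
```

Under those assumptions the reported figure is the full amount still to be written by the queues, not the amount that exceeds the free space, which is considerably smaller.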

It may be more useful to report

  • the total number of bytes that logstash requires for PQ storage for the specific file system
  • the number of bytes actually free on that file system
  • the number of bytes being used by PQ on that file system
  • the requirements for each PQ on that file system

It may also be worth strengthening the warning to state that Logstash may fail to start if this is not resolved.

donoghuc commented Nov 7, 2024

As I'm still learning these concepts, can I extend your example to check my understanding of the proposed improvement, so that you can review it before I move forward with an implementation?

In your example we configure two pipelines with persistent queues. The relevant config settings are:
PQ1

queue.max_bytes: 300gb
queue.path: /Users/robbavey/logstash-8.5.0/data/queue/test1

PQ2

queue.max_bytes: 300gb
queue.path: /Users/robbavey/logstash-8.5.0/data/queue/test2

The LogStash::PersistedQueueConfigValidator#check method (excerpted below) computes the resource requirements relevant to this warning:

def check(running_pipelines, pipeline_configs)
  # Compare value of new pipeline config and pipeline registry and cache
  has_update = queue_configs_update?(running_pipelines, pipeline_configs) && cache_check_fail?(pipeline_configs)
  @last_check_pipeline_configs = pipeline_configs
  return unless has_update
  warn_msg = []
  err_msg = []
  queue_path_file_system = Hash.new # (String: queue path, String: file system)
  required_free_bytes = Hash.new # (String: file system, Integer: size)
  pipeline_configs.select { |config| config.settings.get('queue.type') == 'persisted'}
    .select { |config| config.settings.get('queue.max_bytes').to_i != 0 }
    .each do |config|
      max_bytes = config.settings.get("queue.max_bytes").to_i
      page_capacity = config.settings.get("queue.page_capacity").to_i
      pipeline_id = config.settings.get("pipeline.id")
      queue_path = ::File.join(config.settings.get("path.queue"), pipeline_id)
      pq_page_glob = ::File.join(queue_path, "page.*")
      create_dirs(queue_path)
      used_bytes = get_page_size(pq_page_glob)
      file_system = get_file_system(queue_path)
      check_page_capacity(err_msg, pipeline_id, max_bytes, page_capacity)
      check_queue_usage(warn_msg, pipeline_id, max_bytes, used_bytes)
      queue_path_file_system[queue_path] = file_system
      if used_bytes < max_bytes
        required_free_bytes[file_system] = required_free_bytes.fetch(file_system, 0) + max_bytes - used_bytes
      end
    end

Currently, for each of these configs, we read max_bytes from the config and use the path to determine which filesystem that queue will occupy. In our case, both /Users/robbavey/logstash-8.5.0/data/queue/test1 and /Users/robbavey/logstash-8.5.0/data/queue/test2 occupy /dev/disk1s2. Given that we have set max_bytes greater than the capacity of /dev/disk1s2, we add to the total for that filesystem:

if used_bytes < max_bytes
  required_free_bytes[file_system] = required_free_bytes.fetch(file_system, 0) + max_bytes - used_bytes
end

There are several shortcomings of this approach:

  1. The required bytes for PQs on a filesystem appear to be updated only when one of the configs is computed to not have enough space.
  2. We assume that the filesystem is dedicated to PQ storage, which is probably not a safe assumption.
  3. It is not clear how a consumer of the warning should act on it.

Proposed improvement:
Here is an example of the proposed warning:

The `max_bytes` allocated for persistent queues on filesystem '/dev/disk1s2' exceeds available space.
'/dev/disk1s2' filesystem status:
- Total space required: 600gb
- Currently free space: 312gb
- Current PQ usage: 50gb
- Additional space needed: 288gb

Individual queue requirements on '/dev/disk1s2' filesystem:
  /Users/robbavey/logstash-8.5.0/data/queue/test1:
    Current size: 30gb
    Maximum size: 300gb
  /Users/robbavey/logstash-8.5.0/data/queue/test2:
    Current size: 20gb
    Maximum size: 300gb

Please either:
1. Free up disk space
2. Reduce queue.max_bytes in your pipeline configurations
3. Move PQ storage to a filesystem with more available space
Note: Logstash may fail to start if this is not resolved.

What this would involve is passing through all the configs to build up all the required max_bytes and find all the paths. Once we have this information we can partition by the file systems and build a warning (in our example it is just a single filesystem, but there could be several if paths are configured on multiple filesystems).
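To make that concrete, here is a minimal sketch of the partitioning step. It is not the Logstash implementation: `QueueInfo`, `summarize_by_file_system` and `warning_for` are hypothetical names, the inputs are hard-coded rather than read from pipeline settings, and `gb` is assumed to be base-1024. The shortfall is computed as required minus already-used minus free, which is one possible definition; using required minus free instead gives the 288gb shown in the example warning.

```ruby
GB = 1 << 30 # assumption: base-1024 units

# Hypothetical record for one persistent queue.
QueueInfo = Struct.new(:path, :file_system, :max_bytes, :used_bytes)

# Partition the queues by file system and total up what each one must hold.
def summarize_by_file_system(queues, free_bytes_by_fs)
  queues.group_by(&:file_system).map do |fs, fs_queues|
    required = fs_queues.sum(&:max_bytes)
    used = fs_queues.sum(&:used_bytes)
    free = free_bytes_by_fs.fetch(fs)
    {
      file_system: fs,
      required: required,
      used: used,
      free: free,
      # Bytes still to be written by the queues that the disk cannot absorb.
      shortfall: [required - used - free, 0].max,
      queues: fs_queues
    }
  end
end

# Build one warning for a file system that cannot hold its queues when full.
def warning_for(summary)
  lines = []
  lines << "Persistent queues on '#{summary[:file_system]}' may not fit when full."
  lines << "- Total space required: #{summary[:required] / GB}gb"
  lines << "- Currently free space: #{summary[:free] / GB}gb"
  lines << "- Current PQ usage: #{summary[:used] / GB}gb"
  lines << "- Additional space needed: #{summary[:shortfall] / GB}gb"
  summary[:queues].each do |q|
    lines << "  #{q.path}: current #{q.used_bytes / GB}gb, maximum #{q.max_bytes / GB}gb"
  end
  lines.join("\n")
end

queues = [
  QueueInfo.new("/Users/robbavey/logstash-8.5.0/data/queue/test1", "/dev/disk1s2", 300 * GB, 30 * GB),
  QueueInfo.new("/Users/robbavey/logstash-8.5.0/data/queue/test2", "/dev/disk1s2", 300 * GB, 20 * GB)
]
summaries = summarize_by_file_system(queues, { "/dev/disk1s2" => 312 * GB })
summaries.select { |s| s[:shortfall] > 0 }.each { |s| puts warning_for(s) }
```

If queue paths were spread across several filesystems, `group_by` would yield one summary, and so one warning, per filesystem.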

Implementation notes: From reviewing the existing methods in that class, I think we could compute the space available on a file system as well as the space used under a given queue path. This should allow us to distinguish between what we are explicitly using for Logstash queue storage and what may be used by other entities on the system. I also see this utility method:

def human_readable(number)
  value, unit = if number > PB
    [number / PB, "pb"]
  elsif number > TB
    [number / TB, "tb"]
  elsif number > GB
    [number / GB, "gb"]
  elsif number > MB
    [number / MB, "mb"]
  elsif number > KB
    [number / KB, "kb"]
  else
    [number, "b"]
  end
  format("%.2d%s", value, unit)
end

which I think would be a nicer way to present byte counts in human-readable form.
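For a feel of what that would print for the numbers in this issue, here is a standalone sketch. The method body is copied from the excerpt; the unit constants are my assumption (base-1024), since their definitions are not shown here.

```ruby
# Assumed unit constants (base-1024); not shown in the excerpt.
KB = 1 << 10
MB = 1 << 20
GB = 1 << 30
TB = 1 << 40
PB = 1 << 50

# Copied from the excerpt so that this sketch runs on its own.
def human_readable(number)
  value, unit = if number > PB
    [number / PB, "pb"]
  elsif number > TB
    [number / TB, "tb"]
  elsif number > GB
    [number / GB, "gb"]
  elsif number > MB
    [number / MB, "mb"]
  elsif number > KB
    [number / KB, "kb"]
  else
    [number, "b"]
  end
  format("%.2d%s", value, unit)
end

puts human_readable(300 * GB)          # a single queue's max_bytes
puts human_readable(312 * GB)          # free space from the example
puts human_readable(643_171_352_576)   # figure from the current warning
```

With integer inputs the division truncates, so values are rounded down to whole units. That is probably fine for a warning, but worth keeping in mind when the shortfall is small.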

donoghuc added a commit to donoghuc/logstash that referenced this issue Nov 7, 2024
This commit refactors the `PersistedQueueConfigValidator` class to provide a
more detailed, accurate and actionable warning when pipeline's PQ configs are at
risk of running out of disk space. See
elastic#14839 for design considerations. The
highlights of the changes include accurately determining the free resources on a
filesystem disk and then providing a breakdown of the usage for each of the
paths configured for a queue.
donoghuc added a commit to donoghuc/logstash that referenced this issue Nov 8, 2024
This commit refactors the `PersistedQueueConfigValidator` class to provide a
more detailed, accurate and actionable warning when pipeline's PQ configs are at
risk of running out of disk space. See
elastic#14839 for design considerations. The
highlights of the changes include accurately determining the free resources on a
filesystem disk and then providing a breakdown of the usage for each of the
paths configured for a queue.