
Add backpressure option for bulk import #5023

Open
ddanielr opened this issue Oct 30, 2024 · 1 comment · May be fixed by #5104
Assignees
Labels
enhancement This issue describes a new feature, improvement, or optimization.

Comments

@ddanielr
Contributor

Is your feature request related to a problem? Please describe.
Bulk import will keep pushing files into tablets to the point where user scan performance can degrade.
A user can monitor the number of queued compactions before submitting new bulk import operations.
However, that approach blocks all bulk import operations, even when the data is destined for tablets whose scan performance would not degrade.

Describe the solution you'd like
A bulk import limit threshold property, based on the table.file.max property value, should be added.
This would allow bulk import to continue importing into tablets that do not exceed the new property value, and either wait indefinitely or block the FATE operation on the specific tablets that do.

The bulk import operation should then export the affected tablets' information so that the user, or a separate process, can take that input and schedule higher-priority compaction jobs to unblock the bulk import operation.
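A minimal sketch of the per-tablet gating idea described above, assuming a hypothetical map of tablet file counts and a threshold derived from the table's file limit (all names here are illustrative, not Accumulo API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: partitions tablets into those that can accept
// bulk-import files now and those that should block until compactions
// reduce their file counts. The blocked set could then be exported so a
// user or external process can schedule higher-priority compactions.
public class BulkImportGate {
    public static List<String> blockedTablets(Map<String, Integer> tabletFileCounts,
                                              int pauseThreshold) {
        List<String> blocked = new ArrayList<>();
        for (Map.Entry<String, Integer> e : tabletFileCounts.entrySet()) {
            if (e.getValue() >= pauseThreshold) {
                blocked.add(e.getKey()); // needs compaction before import proceeds
            }
        }
        return blocked;
    }
}
```

With this shape, an import into tablets below the threshold could proceed immediately while only the blocked set waits.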

@ddanielr ddanielr added the enhancement This issue describes a new feature, improvement, or optimization. label Oct 30, 2024
@dlmarion
Contributor

In #5026 I increased the priority of major compactions for the case where the compaction manager sees that a tablet is over the file size threshold and no compaction is queued. This might alleviate the condition to some degree.

@keith-turner keith-turner self-assigned this Oct 31, 2024
keith-turner added a commit to keith-turner/accumulo that referenced this issue Nov 24, 2024
Bulk imports can add files to a tablet faster than compactions can
shrink the number of files.  Several situations can cause this:

 * Compactors are all busy when new bulk imports arrive.
 * Many processes bulk import a few files each to a single tablet at
   around the same time.
 * A single process bulk imports a lot of files to a single tablet.

When a tablet has too many files it can eventually cause cascading
problems for compactions and scans.  This change adds two properties to
help avoid that problem.

The first property is `table.file.pause`.  It pauses bulk imports, and
eventually minor compactions, when a tablet's current file count exceeds
the property's value.  The default is unlimited, so by default nothing
ever pauses.

The second property is `table.bulk.max.tablet.files`.  It determines the
maximum number of files a bulk import can add to a single tablet.  When
this limit would be exceeded, the bulk import operation fails without
making changes to any tablets.

Below is an example of how these properties behave.

 1. Set table.file.pause=30
 2. Set table.bulk.max.tablet.files=100
 3. Import 20 files into tablet A, this causes tablet A to have 20 files
 4. Import 20 files into tablet A, this causes tablet A to have 40 files
 5. Import 20 files into tablet A. Because the tablet currently has 40
    files and the pause limit is 30, this bulk import will pause.
 6. Tablet A compacts 10 files, this causes tablet A to have 31 files.
    It is still above the pause limit so the bulk import does not
    progress.
 7. Tablet A compacts 10 files, this causes tablet A to have 22 files.
 8. The paused bulk import proceeds, this causes tablet A to have 42
    files.
 9. Import 200 files into tablet B and one file into tablet A.  This
    operation fails without changing tablet A or B because 200 exceeds
    the value of table.bulk.max.tablet.files.
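The walkthrough above can be condensed into a small per-tablet decision function. This is a sketch under the stated semantics of the two properties, not the actual Accumulo implementation, and all names are hypothetical:

```java
public class BulkImportDecision {
    public enum Action { PROCEED, PAUSE, FAIL }

    // currentFiles:   files already in the tablet
    // filesToAdd:     files this bulk import would add to this tablet
    // pauseLimit:     table.file.pause (Integer.MAX_VALUE == unlimited)
    // maxTabletFiles: table.bulk.max.tablet.files
    public static Action decide(int currentFiles, int filesToAdd,
                                int pauseLimit, int maxTabletFiles) {
        if (filesToAdd > maxTabletFiles) {
            return Action.FAIL;   // fail without changing any tablet
        }
        if (currentFiles > pauseLimit) {
            return Action.PAUSE;  // wait for compactions to shrink the tablet
        }
        return Action.PROCEED;
    }
}
```

Tracing the example: at step 5 the tablet has 40 files and the pause limit is 30, so the import pauses; at step 7 the count drops to 22 and it proceeds; at step 9 the 200-file import exceeds the 100-file maximum and fails outright.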

While making this change I ran into two preexisting problems.  The first
was with bulk import setting time.  For the case of multiple files, the
behavior of setting time was incorrect and inconsistent depending on the
table time type and whether the tablet was hosted.  I made the behavior
consistent for hosted and unhosted tablets and for both table time
types: a single timestamp is allocated for all files in all cases.  The
code used to allocate a different number of timestamps in the four
different cases.  That behavior was causing tablet refresh, and these
changes, to fail, so the existing issue had to be fixed before progress
could be made here.  The new test in this PR, which adds lots of files
to a single tablet and requests that bulk import set the time, uncovered
the existing problem.
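The fixed behavior described above can be illustrated with a sketch: one timestamp is allocated per bulk import and applied to every file, regardless of table time type or hosting state (hypothetical names, not the real Accumulo code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BulkTimeAssignment {
    // Allocates exactly one timestamp and assigns it to every imported
    // file, so hosted and unhosted tablets and both table time types all
    // behave the same way.
    public static Map<String, Long> assignTimes(List<String> files, long nextTimestamp) {
        Map<String, Long> times = new HashMap<>();
        for (String f : files) {
            times.put(f, nextTimestamp); // the same single timestamp for all files
        }
        return times;
    }
}
```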

The second problem was that the existing code handled the case of only a
subset of files being added to a tablet by bulk import.  This should
never happen because files are added via a single mutation: either the
entire mutation goes through or nothing does.  I removed the handling
for a subset and changed the code to throw an exception if any subset
other than the empty set is seen.  This change greatly simplified
implementing this feature.
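The simplified all-or-nothing expectation could look like the following sketch: after the mutation, either every requested file is present or none is, and any other outcome throws (illustrative only, not the Accumulo source):

```java
import java.util.Set;

public class BulkFileCheck {
    // Files are added to tablet metadata via a single mutation, so either
    // all requested files appear or none do.  A non-empty proper subset
    // indicates a bug and is treated as a fatal condition.
    public static boolean verifyAllOrNothing(Set<String> requested, Set<String> added) {
        if (added.isEmpty()) {
            return false;            // mutation did not go through
        }
        if (added.equals(requested)) {
            return true;             // mutation fully applied
        }
        throw new IllegalStateException("Unexpected subset of files added: " + added);
    }
}
```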

fixes apache#5023