forked from postgres/postgres
Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace #3
Open
ololobus wants to merge 3 commits into postgrespro:master_ci from ololobus:tablespace3
Conversation
REINDEX already does a full relation rewrite; this patch adds the possibility to specify a new tablespace where the new relfilenode will be created.
Is this fine? It says "cannot reindex system catalogs concurrently" (once), and hits the pg_toast tables for information_schema. Should it skip toast indexes (as it says), or should it REINDEX them on the same tablespace?

template1=# REINDEX DATABASE CONCURRENTLY template1 TABLESPACE pg_default;
2020-03-09 15:33:51.792 CDT [6464] WARNING: cannot reindex system catalogs concurrently, skipping all
WARNING: cannot reindex system catalogs concurrently, skipping all
2020-03-09 15:33:51.794 CDT [6464] WARNING: skipping tablespace change of "pg_toast_12558_index"
2020-03-09 15:33:51.794 CDT [6464] DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
WARNING: skipping tablespace change of "pg_toast_12558_index"
DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
2020-03-09 15:33:51.924 CDT [6464] WARNING: skipping tablespace change of "pg_toast_12543_index"
2020-03-09 15:33:51.924 CDT [6464] DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
WARNING: skipping tablespace change of "pg_toast_12543_index"
DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
2020-03-09 15:33:51.982 CDT [6464] WARNING: skipping tablespace change of "pg_toast_12548_index"
2020-03-09 15:33:51.982 CDT [6464] DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
WARNING: skipping tablespace change of "pg_toast_12548_index"
DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
2020-03-09 15:33:52.048 CDT [6464] WARNING: skipping tablespace change of "pg_toast_12553_index"
2020-03-09 15:33:52.048 CDT [6464] DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
WARNING: skipping tablespace change of "pg_toast_12553_index"
DETAIL: Cannot move system relation, only REINDEX CONCURRENTLY is performed.
REINDEX
ololobus force-pushed the master_ci branch 2 times, most recently from 3adf730 to ec97e14 on April 8, 2020 12:14
nik1tam pushed a commit that referenced this pull request on Aug 15, 2022
We've heard a couple of reports of people having trouble with multi-gigabyte-sized query-texts files. It occurred to me that on 32-bit platforms, there could be an issue with integer overflow of calculations associated with the total query text size. Address that with several changes:

1. Limit pg_stat_statements.max to INT_MAX / 2 not INT_MAX. The hashtable code will bound it to that anyway unless "long" is 64 bits. We still need overflow guards on its use, but this helps.

2. Add a check to prevent extending the query-texts file to more than MaxAllocHugeSize. If it got that big, qtext_load_file would certainly fail, so there's not much point in allowing it. Without this, we'd need to consider whether extent, query_offset, and related variables shouldn't be off_t not size_t.

3. Adjust the comparisons in need_gc_qtexts() to be done in 64-bit arithmetic on all platforms. It appears possible that under duress those multiplications could overflow 32 bits, yielding a false conclusion that we need to garbage-collect the texts file, which could lead to repeatedly garbage-collecting after every hash table insertion.

Per report from Bruno da Silva. I'm not convinced that these issues fully explain his problem; there may be some other bug that's contributing to the query-texts file becoming so large in the first place. But it did get that big, so #2 is a reasonable defense, and #3 could explain the reported performance difficulties.

(See also commit 8bbe4cb, which addressed some related bugs. The second Discussion: link is the thread that led up to that.)

This issue is old, and is primarily a problem for old platforms, so back-patch.

Discussion: https://postgr.es/m/CAB+Nuk93fL1Q9eLOCotvLP07g7RAv4vbdrkm0cVQohDVMpAb9A@mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
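Item 3 is the subtle one. Below is a minimal, self-contained sketch of the hazard; it is not the pg_stat_statements source, the variable names and values (mean_query_len, max_entries, the 100 MB extent) are invented for illustration, and uint32_t stands in for a 32-bit platform's size_t. The point is only that the threshold product wraps around in 32-bit arithmetic, so the "garbage collection needed" test fires when it should not, while the same comparison done in 64 bits behaves as intended.

```c
/*
 * Standalone illustration of the 32-bit overflow described in item 3 above.
 * NOT the pg_stat_statements code; names and numbers are made up, and
 * uint32_t simulates size_t on a 32-bit platform.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* "Does the query-texts file need garbage collection?" -- 32-bit arithmetic */
static bool
need_gc_32bit(uint32_t extent, uint32_t mean_query_len, uint32_t max_entries)
{
	/* The product can exceed 2^32 and wrap to a tiny threshold. */
	uint32_t	threshold = mean_query_len * max_entries * 2;

	return extent >= threshold;
}

/* Same test, with the multiplication done in 64 bits on all platforms. */
static bool
need_gc_64bit(uint64_t extent, uint64_t mean_query_len, uint64_t max_entries)
{
	uint64_t	threshold = mean_query_len * max_entries * 2;

	return extent >= threshold;
}

int
main(void)
{
	uint64_t	extent = 100u * 1024 * 1024;	/* 100 MB query-texts file */
	uint64_t	mean_query_len = 4300;			/* average stored query text size */
	uint64_t	max_entries = 1000000;			/* a very large pg_stat_statements.max */

	/* 4300 * 1000000 * 2 = 8.6e9, which wraps to about 10 MB in 32 bits. */
	printf("32-bit arithmetic says gc needed: %d\n",
		   need_gc_32bit((uint32_t) extent, (uint32_t) mean_query_len,
						 (uint32_t) max_entries));
	printf("64-bit arithmetic says gc needed: %d\n",
		   need_gc_64bit(extent, mean_query_len, max_entries));
	return 0;
}
```

With these made-up numbers the 32-bit version wrongly reports that garbage collection is needed, which is the "repeatedly garbage-collecting after every hash table insertion" behavior the commit describes.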
glukhovn pushed a commit that referenced this pull request on Feb 10, 2023
danolivo pushed a commit that referenced this pull request on Jul 2, 2024
1. TruncateMultiXact() performs the SLRU truncations in a critical section. Deleting the SLRU segments calls ForwardSyncRequest(), which will try to compact the request queue if it's full (CompactCheckpointerRequestQueue()). That in turn allocates memory, which is not allowed in a critical section. Backtrace:

TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
/lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]

To fix, bail out in CompactCheckpointerRequestQueue() without doing anything, if it's called in a critical section. That covers the above call path, as well as any other similar cases where RegisterSyncRequest might be called in a critical section.

2. After fixing that, another problem became apparent: the autovacuum process doing that truncation can deadlock with the checkpointer process. TruncateMultiXact() sets "MyProc->delayChkptFlags |= DELAY_CHKPT_START". If the sync request queue is full and cannot be compacted, the process will repeatedly sleep and retry until there is room in the queue. However, if the checkpointer is trying to start a checkpoint at the same time, and is waiting for the DELAY_CHKPT_START processes to finish, the queue will never shrink.

More concretely, the autovacuum process is stuck here:

#0  0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
#2  WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
#3  0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:538
#4  0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
#5  0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
#6  0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
#7  0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
#8  TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
#9  0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569

and the checkpointer is stuck here:

#0  0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
#3  0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
#4  0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464

To fix, add AbsorbSyncRequests() to the loops where the checkpointer waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to finish.

Backpatch to v14. Before that, SLRU deletion didn't call RegisterSyncRequest, which avoided this failure. I'm not sure if there are other similar scenarios on older versions, but we haven't had any such reports.

Discussion: https://www.postgresql.org/message-id/[email protected]
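As a rough illustration of the first fix, here is a minimal, self-contained sketch (not PostgreSQL source): a compaction routine that needs to allocate memory simply refuses to run while a critical section is open and reports that no room was freed, so the caller falls back to sleeping and retrying. CritSectionCount mirrors the name of PostgreSQL's global counter; compact_request_queue and everything else here are made-up stand-ins.

```c
/*
 * Toy model of the "bail out of queue compaction inside a critical section"
 * fix described above.  Not PostgreSQL code.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Nesting depth of critical sections, mirroring PostgreSQL's CritSectionCount. */
static volatile int CritSectionCount = 0;

/*
 * Try to compact a hypothetical pending-request queue.  Returns true if some
 * room was freed.  Compaction needs scratch memory, and allocation must never
 * happen inside a critical section, so in that case we do nothing at all.
 */
static bool
compact_request_queue(void)
{
	if (CritSectionCount > 0)
		return false;			/* the fix: refuse to allocate here */

	void	   *scratch = malloc(1024);	/* stand-in for the real dedup work */

	if (scratch == NULL)
		return false;
	/* ... deduplicate queued requests using the scratch space ... */
	free(scratch);
	return true;
}

int
main(void)
{
	CritSectionCount++;			/* enter a critical section */
	printf("inside critical section:  compacted = %d\n", compact_request_queue());
	CritSectionCount--;			/* leave it */
	printf("outside critical section: compacted = %d\n", compact_request_queue());
	return 0;
}
```

The second fix, having the checkpointer absorb pending sync requests while it waits for DELAY_CHKPT_START and DELAY_CHKPT_COMPLETE backends, is what breaks the circular wait once the compaction bail-out is in place.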
Test pull-request