feat: add table function fuse_vacuum2() #16049

Open
wants to merge 142 commits into
base: main
3f74fcd
add TableSnapshot v5
SkyFan2002 Jul 15, 2024
1468857
fix
SkyFan2002 Jul 15, 2024
5c3f646
Merge branch 'main' into vacuum2
SkyFan2002 Jul 15, 2024
835636f
fix
SkyFan2002 Jul 15, 2024
8a4d607
add transaction time limit
SkyFan2002 Jul 15, 2024
fffa4e3
set lvt when vacuum begin
SkyFan2002 Jul 15, 2024
a906f38
embed timestamps in paths
SkyFan2002 Jul 16, 2024
f25f31c
refactor
SkyFan2002 Jul 16, 2024
841ac41
add base_snapshot_timestamp
SkyFan2002 Jul 17, 2024
360df0c
fix timestamp
SkyFan2002 Jul 17, 2024
fea0966
refactor
SkyFan2002 Jul 18, 2024
491a542
collect files to be gc
SkyFan2002 Jul 19, 2024
8b0cbfc
fix setting
SkyFan2002 Jul 19, 2024
13f8eb9
add test
SkyFan2002 Jul 21, 2024
d535bb0
add test
SkyFan2002 Jul 22, 2024
7dfea7c
add test
SkyFan2002 Jul 22, 2024
6836ca9
fix compact segment
SkyFan2002 Jul 22, 2024
2854424
add test
SkyFan2002 Jul 22, 2024
71b16bf
fix overflow
SkyFan2002 Jul 22, 2024
7b40b2f
add assertion
SkyFan2002 Jul 22, 2024
fb1bad2
fix block_id_from_location
SkyFan2002 Jul 22, 2024
5389802
Merge remote-tracking branch 'upstream/main' into vacuum2
SkyFan2002 Jul 22, 2024
fe0929f
fix merge error
SkyFan2002 Jul 22, 2024
8f22bde
fix downcast
SkyFan2002 Jul 22, 2024
545a18a
fix list_by_snapshot_id
SkyFan2002 Jul 22, 2024
35bf576
fix logic test
SkyFan2002 Jul 22, 2024
098358a
adjust assertion
SkyFan2002 Jul 23, 2024
10c9fcc
remove unsupported test
SkyFan2002 Jul 23, 2024
c9a80f9
fix test
SkyFan2002 Jul 23, 2024
892743b
fix stream
SkyFan2002 Jul 23, 2024
e92e6f4
modify comment
SkyFan2002 Jul 24, 2024
e50e23c
adjust assertion
SkyFan2002 Jul 24, 2024
0c76677
remove unused setting
SkyFan2002 Jul 24, 2024
72bde42
fix logic test
SkyFan2002 Jul 24, 2024
c1d3e41
remove files in batch
SkyFan2002 Jul 24, 2024
c948d41
add ut
SkyFan2002 Jul 24, 2024
6585daa
Merge remote-tracking branch 'upstream/main' into vacuum2
SkyFan2002 Jul 25, 2024
b88be2b
fix merge error
SkyFan2002 Jul 25, 2024
55076a5
fix merge error
SkyFan2002 Jul 25, 2024
f6bb685
fix logic test
SkyFan2002 Jul 25, 2024
666fa80
move txn time limit to TableSnapshot::new
SkyFan2002 Jul 26, 2024
cc36499
add test
SkyFan2002 Jul 26, 2024
4650654
Merge branch 'main' into vacuum2
SkyFan2002 Jul 26, 2024
a248916
add test
SkyFan2002 Jul 26, 2024
adddfb8
return vacuumed files
SkyFan2002 Jul 26, 2024
f1897b6
fix as_simple
SkyFan2002 Jul 28, 2024
012364e
fix typo
SkyFan2002 Jul 28, 2024
1da47db
Merge branch 'main' into vacuum2
SkyFan2002 Jul 28, 2024
05ba89f
make lint
SkyFan2002 Jul 28, 2024
726a4e5
add test result
SkyFan2002 Jul 28, 2024
5ce2774
fix test result
SkyFan2002 Jul 28, 2024
1829e75
fix test result
SkyFan2002 Jul 26, 2024
3351efe
add more log
SkyFan2002 Jul 29, 2024
c503a8f
Merge remote-tracking branch 'upstream/main' into vacuum2
SkyFan2002 Jul 29, 2024
326bdf4
fix merge error
SkyFan2002 Jul 29, 2024
ac1fa3d
adjust base timestamp
SkyFan2002 Jul 30, 2024
e994f23
rm useless test
SkyFan2002 Jul 30, 2024
f76d8e9
Merge branch 'main' into vacuum2
SkyFan2002 Jul 30, 2024
27b0ed3
fix missing header
SkyFan2002 Jul 30, 2024
439a584
rm unused deps
SkyFan2002 Jul 30, 2024
5e5e205
modify test
SkyFan2002 Jul 30, 2024
48b965b
fix test result
SkyFan2002 Jul 30, 2024
da71301
fix ut
SkyFan2002 Jul 30, 2024
e66873c
rm useless modify
SkyFan2002 Jul 31, 2024
f494aa8
Merge branch 'main' into vacuum2
SkyFan2002 Jul 31, 2024
47afa3c
rm useless modify
SkyFan2002 Aug 1, 2024
bd340df
avoid potential deadlocks
SkyFan2002 Aug 1, 2024
b5a3d4e
add comment
SkyFan2002 Aug 1, 2024
1ab2875
remove index, adjust vacuum order
SkyFan2002 Aug 1, 2024
c30fd57
use latest snapshot as gc root when retention=0
SkyFan2002 Aug 1, 2024
a37cff4
make lint
SkyFan2002 Aug 1, 2024
ceeb2fb
Merge remote-tracking branch 'upstream/main' into vacuum2
SkyFan2002 Aug 1, 2024
d96bfbe
verify vacuum result
SkyFan2002 Aug 1, 2024
5dfbf91
rm unused modify
SkyFan2002 Aug 1, 2024
d313e25
Merge branch 'main' into vacuum2
SkyFan2002 Jul 28, 2024
aee3f21
fix merge
SkyFan2002 Aug 2, 2024
ccfc5f2
adjust vacuum order
SkyFan2002 Aug 2, 2024
3aea988
remove useless modify
SkyFan2002 Aug 2, 2024
9e47594
fix ut
SkyFan2002 Aug 2, 2024
8f90632
fix set_lvt when retention=0
SkyFan2002 Aug 2, 2024
d263ce9
Merge remote-tracking branch 'upstream/main' into vacuum2
SkyFan2002 Aug 3, 2024
fa9be36
fix merge
SkyFan2002 Aug 3, 2024
2c76c93
fix missing header
SkyFan2002 Aug 3, 2024
091aa01
fix missing header
SkyFan2002 Aug 3, 2024
2d14622
add unit test cases compatible with older versions
SkyFan2002 Aug 5, 2024
d252cf9
make lint
SkyFan2002 Aug 5, 2024
359eb49
Merge branch 'main' into vacuum2
SkyFan2002 Aug 5, 2024
7df823f
fix test
SkyFan2002 Aug 5, 2024
783feb5
Merge branch 'main' into vacuum2
SkyFan2002 Aug 5, 2024
c154b26
Merge branch 'main' into vacuum2
SkyFan2002 Aug 5, 2024
e85ad3c
chore: code dedup
dantengsky Aug 5, 2024
7ef2c4e
chore: use databend_storages_common_table_meta::meta::TableMetaTimest…
dantengsky Aug 5, 2024
fa2dd1d
chore: cargo fmt
dantengsky Aug 5, 2024
ba89996
use eixsiting utils mod
dantengsky Aug 5, 2024
a1591c8
Merge pull request #12 from dantengsky/sky_fan_vacuum2
SkyFan2002 Aug 6, 2024
a12eb75
fix typo
SkyFan2002 Aug 6, 2024
c6ab4c6
use old version location generator in ut
SkyFan2002 Aug 6, 2024
d3c8e10
fix typo
SkyFan2002 Aug 6, 2024
19f5a64
chore: lock retention period & set ctx status
dantengsky Aug 6, 2024
5408788
Merge pull request #13 from dantengsky/sky_fan_vacuum2
SkyFan2002 Aug 6, 2024
e25e427
simplify select_gc_root
SkyFan2002 Aug 6, 2024
29c8721
Merge branch 'main' into vacuum2
SkyFan2002 Aug 6, 2024
1a1e29f
chore: check_mutable during apply vacuum
dantengsky Aug 6, 2024
3dece86
support vacuum all
SkyFan2002 Aug 7, 2024
364504a
Merge branch 'main' into vacuum2
SkyFan2002 Aug 7, 2024
00a8239
adjust test
SkyFan2002 Aug 20, 2024
f171755
Merge branch 'main' into vacuum2
SkyFan2002 Aug 2, 2024
49a0360
Merge remote-tracking branch 'upstream/main' into vacuum2
SkyFan2002 Aug 28, 2024
78cf267
fix merge error
SkyFan2002 Aug 28, 2024
f9928d8
Merge branch 'main' into vacuum2
SkyFan2002 Sep 1, 2024
dd66834
Merge remote-tracking branch 'upstream/main' into vacuum2
SkyFan2002 Sep 4, 2024
cd9ee51
make lint
SkyFan2002 Sep 4, 2024
68810b3
Merge branch 'main' into vacuum2
SkyFan2002 Sep 4, 2024
d790fa7
remove v5
SkyFan2002 Sep 4, 2024
0783c92
remove unused dep
SkyFan2002 Sep 4, 2024
2d9ffd1
Merge remote-tracking branch 'upstream/main' into vacuum2
SkyFan2002 Oct 11, 2024
bd8dfe3
fix merge
SkyFan2002 Oct 11, 2024
76e39db
adjust prefix
SkyFan2002 Oct 12, 2024
a30f92b
Merge branch 'main' into vacuum2
SkyFan2002 Oct 12, 2024
7e85add
fix ut
SkyFan2002 Oct 12, 2024
e448b18
fix ut
SkyFan2002 Oct 12, 2024
56e8b9a
Merge remote-tracking branch 'upstream/main' into vacuum2
SkyFan2002 Oct 21, 2024
c671963
reduce mem usage
SkyFan2002 Oct 21, 2024
9fc5cd1
Merge branch 'main' into vacuum2
SkyFan2002 Oct 22, 2024
0cfe806
fix list
SkyFan2002 Oct 24, 2024
1255fe3
Merge branch 'main' into vacuum2
SkyFan2002 Oct 24, 2024
f80380c
limit delete requests
SkyFan2002 Oct 28, 2024
ce12966
Merge branch 'main' into vacuum2
SkyFan2002 Oct 28, 2024
8c0d979
Merge remote-tracking branch 'upstream/main' into vacuum2
SkyFan2002 Nov 7, 2024
0f36e00
fix merge
SkyFan2002 Nov 7, 2024
66d1a22
simplify timestamp
SkyFan2002 Nov 7, 2024
0a439ed
fix ut
SkyFan2002 Nov 7, 2024
abb56ef
fix ut
SkyFan2002 Nov 8, 2024
3db39b0
introduce new setting
SkyFan2002 Nov 11, 2024
59cf125
make lint
SkyFan2002 Nov 11, 2024
25ec0cf
fix ut
SkyFan2002 Nov 11, 2024
65f31dd
add optional args `respect_flash_back`
SkyFan2002 Nov 12, 2024
2454568
modify test
SkyFan2002 Nov 12, 2024
ebe99dd
Merge remote-tracking branch 'upstream/main' into vacuum2
SkyFan2002 Nov 25, 2024
aa02cce
make lint
SkyFan2002 Nov 25, 2024
de4f42c
Merge remote-tracking branch 'upstream/main' into vacuum2
SkyFan2002 Dec 23, 2024
1cc5561
fix
SkyFan2002 Dec 23, 2024
2 changes: 2 additions & 0 deletions Cargo.lock

10 changes: 10 additions & 0 deletions src/query/catalog/src/catalog/interface.rs
@@ -21,6 +21,7 @@ use databend_common_exception::ErrorCode;
 use databend_common_exception::Result;
 use databend_common_meta_app::schema::database_name_ident::DatabaseNameIdent;
 use databend_common_meta_app::schema::dictionary_name_ident::DictionaryNameIdent;
+use databend_common_meta_app::schema::least_visible_time_ident::LeastVisibleTimeIdent;
 use databend_common_meta_app::schema::CatalogInfo;
 use databend_common_meta_app::schema::CommitTableMetaReply;
 use databend_common_meta_app::schema::CommitTableMetaReq;
@@ -63,6 +64,7 @@ use databend_common_meta_app::schema::GetSequenceReq;
 use databend_common_meta_app::schema::GetTableCopiedFileReply;
 use databend_common_meta_app::schema::GetTableCopiedFileReq;
 use databend_common_meta_app::schema::IndexMeta;
+use databend_common_meta_app::schema::LeastVisibleTime;
 use databend_common_meta_app::schema::ListDictionaryReq;
 use databend_common_meta_app::schema::ListDroppedTableReq;
 use databend_common_meta_app::schema::ListIndexesByIdReq;
@@ -523,5 +525,13 @@ pub trait Catalog: DynClone + Send + Sync + Debug {
         req: ListDictionaryReq,
     ) -> Result<Vec<(String, DictionaryMeta)>>;

+    async fn set_table_lvt(
+        &self,
+        _name_ident: &LeastVisibleTimeIdent,
+        _value: &LeastVisibleTime,
+    ) -> Result<LeastVisibleTime> {
+        unimplemented!()
+    }
+
     async fn rename_dictionary(&self, req: RenameDictionaryReq) -> Result<()>;
 }
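Reviewer note on the `set_table_lvt` hook: the trait method ships with a default body that calls `unimplemented!()`, so existing catalogs keep compiling and only catalogs that support a least visible time (LVT) need to override it. The sketch below is a simplified, synchronous illustration of that pattern and is not the PR's implementation. `InMemoryCatalog`, the `u64` table key, the `time: i64` field, and the rule that a stored LVT never moves backwards are all assumptions made for the example.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real `LeastVisibleTime` type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LeastVisibleTime {
    pub time: i64, // assumed: a timestamp in some fixed unit
}

pub trait Catalog {
    // Default body mirrors the diff: catalogs without LVT support panic if called.
    fn set_table_lvt(&mut self, _table_id: u64, _value: LeastVisibleTime) -> LeastVisibleTime {
        unimplemented!()
    }
}

// Hypothetical catalog that does support LVT.
#[derive(Default)]
pub struct InMemoryCatalog {
    lvts: HashMap<u64, LeastVisibleTime>,
}

impl Catalog for InMemoryCatalog {
    // Assumed semantics: keep the larger of the stored and requested values
    // and return whichever one is in effect afterwards.
    fn set_table_lvt(&mut self, table_id: u64, value: LeastVisibleTime) -> LeastVisibleTime {
        let entry = self.lvts.entry(table_id).or_insert(value);
        if value.time > entry.time {
            *entry = value;
        }
        *entry
    }
}
```

Under these assumptions, a caller that asks for an older LVT than the stored one gets the stored value back, which is why the method returns a `LeastVisibleTime` rather than `()`.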
9 changes: 8 additions & 1 deletion src/query/catalog/src/table.rs
@@ -38,6 +38,7 @@ use databend_common_pipeline_core::Pipeline;
 use databend_common_storage::Histogram;
 use databend_common_storage::StorageMetrics;
 use databend_storages_common_table_meta::meta::SnapshotId;
+use databend_storages_common_table_meta::meta::TableMetaTimestamps;
 use databend_storages_common_table_meta::meta::TableSnapshot;
 use databend_storages_common_table_meta::table::ChangeType;
 use databend_storages_common_table_meta::table::OPT_KEY_TEMP_PREFIX;
@@ -234,7 +235,12 @@ pub trait Table: Sync + Send {
     }

     /// Assembly the pipeline of appending data to storage
-    fn append_data(&self, ctx: Arc<dyn TableContext>, pipeline: &mut Pipeline) -> Result<()> {
+    fn append_data(
+        &self,
+        ctx: Arc<dyn TableContext>,
+        pipeline: &mut Pipeline,
+        _table_meta_timestamps: TableMetaTimestamps,
+    ) -> Result<()> {
         let (_, _) = (ctx, pipeline);

         Err(ErrorCode::Unimplemented(format!(
@@ -253,6 +259,7 @@ pub trait Table: Sync + Send {
         overwrite: bool,
         prev_snapshot_id: Option<SnapshotId>,
         _deduplicated_label: Option<String>,
+        _table_meta_timestamps: TableMetaTimestamps,
     ) -> Result<()> {
         let (_, _, _, _, _, _) = (
             ctx,
7 changes: 7 additions & 0 deletions src/query/catalog/src/table_context.rs
@@ -59,6 +59,8 @@ use databend_common_users::GrantObjectVisibilityChecker;
 use databend_storages_common_session::SessionState;
 use databend_storages_common_session::TxnManagerRef;
 use databend_storages_common_table_meta::meta::Location;
+use databend_storages_common_table_meta::meta::TableMetaTimestamps;
+use databend_storages_common_table_meta::meta::TableSnapshot;
 use parking_lot::Mutex;
 use parking_lot::RwLock;
 use xorf::BinaryFuse16;
@@ -328,6 +330,11 @@ pub trait TableContext: Send + Sync {

     fn has_bloom_runtime_filters(&self, id: usize) -> bool;
     fn txn_mgr(&self) -> TxnManagerRef;
+    fn get_table_meta_timestamps(
+        &self,
+        table_id: u64,
+        previous_snapshot: Option<Arc<TableSnapshot>>,
+    ) -> Result<TableMetaTimestamps>;

     fn get_read_block_thresholds(&self) -> BlockThresholds;
     fn set_read_block_thresholds(&self, _thresholds: BlockThresholds);
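Reviewer note on `get_table_meta_timestamps`: the signature takes the previous snapshot and returns a `TableMetaTimestamps`, which suggests the timestamps for a new write are derived relative to the snapshot being replaced. The diff shown here does not include the implementation, so the sketch below only illustrates one plausible rule: pick the current time, but never a value at or before the previous snapshot's timestamp. The single `snapshot_timestamp_ms` field, the millisecond unit, and the function name `derive_table_meta_timestamps` are hypothetical.

```rust
// Hypothetical, reduced stand-in for the real `TableMetaTimestamps`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TableMetaTimestamps {
    pub snapshot_timestamp_ms: i64,
}

// Assumed rule: the new snapshot timestamp is `now`, unless the previous
// snapshot is at or after `now` (for example because of clock skew), in which
// case it is bumped to one millisecond past the previous snapshot.
pub fn derive_table_meta_timestamps(
    now_ms: i64,
    previous_snapshot_ms: Option<i64>,
) -> TableMetaTimestamps {
    let snapshot_timestamp_ms = match previous_snapshot_ms {
        Some(prev) => now_ms.max(prev.saturating_add(1)),
        None => now_ms,
    };
    TableMetaTimestamps {
        snapshot_timestamp_ms,
    }
}
```

A strictly increasing timestamp of this kind is what would let a vacuum pass compare file timestamps against a chosen root snapshot, but whether the PR enforces exactly this rule should be checked against the full change set.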
1 change: 1 addition & 0 deletions src/query/ee/Cargo.toml
@@ -65,6 +65,7 @@ jwt-simple = { workspace = true }
 log = { workspace = true }
 opendal = { workspace = true }
 tempfile = { workspace = true }
+uuid = { workspace = true }

 # aws sdk
 aws-config = { workspace = true, features = ["behavior-version-latest"] }
1 change: 1 addition & 0 deletions src/query/ee/src/storages/fuse/mod.rs
@@ -18,4 +18,5 @@ pub mod operations;
 pub use io::snapshots::get_snapshot_referenced_segments;
 pub use operations::vacuum_drop_tables::vacuum_drop_tables;
 pub use operations::vacuum_table::do_vacuum;
+pub use operations::vacuum_table_v2::do_vacuum2;
 pub use operations::virtual_columns::do_refresh_virtual_column;
10 changes: 10 additions & 0 deletions src/query/ee/src/storages/fuse/operations/handler.rs
@@ -28,6 +28,7 @@ use databend_enterprise_vacuum_handler::VacuumHandler;
 use databend_enterprise_vacuum_handler::VacuumHandlerWrapper;

 use crate::storages::fuse::do_vacuum;
+use crate::storages::fuse::operations::vacuum_table_v2::do_vacuum2;
 use crate::storages::fuse::operations::vacuum_temporary_files::do_vacuum_temporary_files;
 use crate::storages::fuse::vacuum_drop_tables;
 pub struct RealVacuumHandler {}
@@ -44,6 +45,15 @@ impl VacuumHandler for RealVacuumHandler {
         do_vacuum(fuse_table, ctx, retention_time, dry_run).await
     }

+    async fn do_vacuum2(
+        &self,
+        fuse_table: &FuseTable,
+        ctx: Arc<dyn TableContext>,
+        respect_flash_back: bool,
+    ) -> Result<Vec<String>> {
+        do_vacuum2(fuse_table, ctx, respect_flash_back).await
+    }
+
     async fn do_vacuum_drop_tables(
         &self,
         threads_nums: usize,
2 changes: 1 addition & 1 deletion src/query/ee/src/storages/fuse/operations/mod.rs
@@ -15,7 +15,7 @@
pub mod handler;
pub mod vacuum_drop_tables;
pub mod vacuum_table;
pub mod vacuum_table_v2;
pub mod vacuum_temporary_files;
pub mod virtual_columns;

pub use handler::RealVacuumHandler;
4 changes: 2 additions & 2 deletions src/query/ee/src/storages/fuse/operations/vacuum_table.rs
@@ -177,7 +177,7 @@ pub async fn do_gc_orphan_files(
     let location_gen = fuse_table.meta_location_generator();
     let segment_locations_to_be_purged = get_orphan_files_to_be_purged(
         fuse_table,
-        location_gen.segment_info_prefix(),
+        location_gen.segment_location_prefix(),
         referenced_files.segments,
         retention_time,
     )
@@ -298,7 +298,7 @@ pub async fn do_dry_run_orphan_files(
     // 2. Get purge orphan segment files.
     let segment_locations_to_be_purged = get_orphan_files_to_be_purged(
         fuse_table,
-        location_gen.segment_info_prefix(),
+        location_gen.segment_location_prefix(),
         referenced_files.segments,
         retention_time,
     )