feat: switch madsim integration and recovery tests to sql backend #18678

Status: Merged (Oct 7, 2024). 37 commits from feat/switch-sim-integration-tests were merged into main; the diff shown below reflects 27 of those 37 commits.

Commits:
- 744d1b5 switch madsim integration tests to sql backend (yezizp2012, Sep 24, 2024)
- 8c8b3f8 Merge branch 'main' into feat/switch-sim-integration-tests (yezizp2012, Sep 24, 2024)
- bc254f2 fix flaky state table ids (yezizp2012, Sep 24, 2024)
- fd29b0a fix throttle madsim test (yezizp2012, Sep 24, 2024)
- 7d1302b Merge branch 'main' into feat/switch-sim-integration-tests (yezizp2012, Sep 25, 2024)
- f6a40f4 fix flaky cancel tests (yezizp2012, Sep 25, 2024)
- 5c31d02 change workflows (yezizp2012, Sep 26, 2024)
- 3b2b8a5 Merge branch 'main' into feat/switch-sim-integration-tests (yezizp2012, Sep 26, 2024)
- 5d6a057 export RUST_MIN_STACK (yezizp2012, Sep 26, 2024)
- f79fc4b avoid using same sqlite file (yezizp2012, Sep 26, 2024)
- 7f4753c delete sqlite file to avoid access by other tests (yezizp2012, Sep 26, 2024)
- b66a7ea fix path format (yezizp2012, Sep 26, 2024)
- 2d8b68b fix seed (yezizp2012, Sep 27, 2024)
- 4ad2f51 set log level (yezizp2012, Sep 27, 2024)
- 563730e try tmpfs (yezizp2012, Sep 27, 2024)
- 10e1611 adjust timeout (yezizp2012, Sep 27, 2024)
- 8b218e1 ignore election for sqlite metastore (yezizp2012, Sep 27, 2024)
- 1dd581e clippy (yezizp2012, Sep 27, 2024)
- adb4c35 fix (yezizp2012, Sep 27, 2024)
- d182fad update kill (yezizp2012, Sep 29, 2024)
- 98ccf5c fix cluster limit (yezizp2012, Sep 29, 2024)
- 1717723 fix cluster limit (yezizp2012, Sep 29, 2024)
- 51c79fc fix (yezizp2012, Sep 29, 2024)
- d1a8c5e Merge branch 'main' into feat/switch-sim-integration-tests (yezizp2012, Sep 29, 2024)
- fba64a8 fix throttle (yezizp2012, Sep 29, 2024)
- 3e05cb6 remove assert of max_committed_epoch (yezizp2012, Sep 29, 2024)
- 6992a39 Merge branch 'main' into feat/switch-sim-integration-tests (yezizp2012, Sep 30, 2024)
- 537c8d8 update timeout (yezizp2012, Sep 30, 2024)
- c191193 Merge branch 'main' into feat/switch-sim-integration-tests (kwannoel, Sep 30, 2024)
- c227f11 fix catalog exists (kwannoel, Sep 30, 2024)
- f7ca4fd Merge branch 'main' into feat/switch-sim-integration-tests (kwannoel, Sep 30, 2024)
- cd0f6ca ban kill meta (kwannoel, Sep 30, 2024)
- f894890 add tracing for extended_e2e_test (kwannoel, Oct 1, 2024)
- 9aed64f Merge branch 'main' into feat/switch-sim-integration-tests (kwannoel, Oct 1, 2024)
- 7900b05 bump timeout (kwannoel, Oct 1, 2024)
- c2fc3ee half test num to 32 (kwannoel, Oct 1, 2024)
- e2aa697 Merge branch 'main' into feat/switch-sim-integration-tests (kwannoel, Oct 2, 2024)
1 change: 0 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions ci/scripts/common.sh
@@ -16,6 +16,7 @@ export GCLOUD_DOWNLOAD_TGZ=https://rw-ci-deps-dist.s3.amazonaws.com/google-cloud
export NEXTEST_HIDE_PROGRESS_BAR=true
export RW_TELEMETRY_TYPE=test
export RW_SECRET_STORE_PRIVATE_KEY_HEX="0123456789abcdef"
+export RUST_MIN_STACK=4194304

unset LANG
if [ -n "${BUILDKITE_COMMIT:-}" ]; then
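For context: `RUST_MIN_STACK` is the standard library's override for the default stack size, in bytes, of threads spawned without an explicit size; 4194304 is 4 MiB. A minimal standalone sketch of the behavior this export influences (illustration only, not the CI code):

```rust
// With RUST_MIN_STACK=4194304 exported, a thread spawned without an explicit
// stack size gets a 4 MiB stack, giving deeply recursive code (such as the
// sqlsmith query fuzzing below, which also sets this variable) more headroom
// before overflowing.
fn depth(n: u64) -> u64 {
    if n == 0 { 0 } else { 1 + depth(n - 1) }
}

fn main() {
    let handle = std::thread::spawn(|| depth(50_000));
    println!("recursed to depth {}", handle.join().unwrap());
}
```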
17 changes: 8 additions & 9 deletions ci/scripts/deterministic-e2e-test.sh
@@ -25,31 +25,30 @@ git clone https://"$GITHUB_TOKEN"@github.com/risingwavelabs/sqlsmith-query-snaps
# popd
popd

-export RUST_LOG=info
Member: Does this change make it more or less verbose?

Member (author): Aiming to decrease the test duration.

Member: 👀 How is the log level related to test duration?
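For context on that last question: with `tracing`, events below the `RUST_LOG` threshold are rejected before any formatting or I/O happens, so a quieter level can measurably shorten log-heavy simulation runs. A minimal sketch, assuming `tracing-subscriber`'s `EnvFilter` (the simulator's actual logging setup may differ):

```rust
use tracing_subscriber::EnvFilter;

fn main() {
    // Reads RUST_LOG (e.g. "info") and installs it as the global filter.
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .init();

    tracing::info!("kept when RUST_LOG=info");
    // Rejected by the filter before any formatting or writing takes place,
    // so it costs almost nothing; this is how verbosity affects duration.
    tracing::debug!("dropped when RUST_LOG=info");
}
```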

export LOGDIR=.risingwave/log

mkdir -p $LOGDIR

echo "--- deterministic simulation e2e, ci-3cn-2fe, ddl"
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation ./e2e_test/ddl/\*\*/\*.slt 2> $LOGDIR/ddl-{}.log && rm $LOGDIR/ddl-{}.log'
seq "$TEST_NUM" | parallel './risingwave_simulation ./e2e_test/ddl/\*\*/\*.slt 2> $LOGDIR/ddl-{}.log && rm $LOGDIR/ddl-{}.log'

echo "--- deterministic simulation e2e, ci-3cn-2fe, streaming"
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation ./e2e_test/streaming/\*\*/\*.slt 2> $LOGDIR/streaming-{}.log && rm $LOGDIR/streaming-{}.log'
seq "$TEST_NUM" | parallel './risingwave_simulation ./e2e_test/streaming/\*\*/\*.slt 2> $LOGDIR/streaming-{}.log && rm $LOGDIR/streaming-{}.log'

echo "--- deterministic simulation e2e, ci-3cn-2fe, batch"
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation ./e2e_test/batch/\*\*/\*.slt 2> $LOGDIR/batch-{}.log && rm $LOGDIR/batch-{}.log'
seq "$TEST_NUM" | parallel './risingwave_simulation ./e2e_test/batch/\*\*/\*.slt 2> $LOGDIR/batch-{}.log && rm $LOGDIR/batch-{}.log'

echo "--- deterministic simulation e2e, ci-3cn-2fe, kafka source"
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation --kafka-datadir=./scripts/source/test_data ./e2e_test/source/basic/kafka\*.slt 2> $LOGDIR/source-{}.log && rm $LOGDIR/source-{}.log'
seq "$TEST_NUM" | parallel './risingwave_simulation --kafka-datadir=./scripts/source/test_data ./e2e_test/source/basic/kafka\*.slt 2> $LOGDIR/source-{}.log && rm $LOGDIR/source-{}.log'

echo "--- deterministic simulation e2e, ci-3cn-2fe, parallel, streaming"
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation -j 16 ./e2e_test/streaming/\*\*/\*.slt 2> $LOGDIR/parallel-streaming-{}.log && rm $LOGDIR/parallel-streaming-{}.log'
seq "$TEST_NUM" | parallel './risingwave_simulation -j 16 ./e2e_test/streaming/\*\*/\*.slt 2> $LOGDIR/parallel-streaming-{}.log && rm $LOGDIR/parallel-streaming-{}.log'

echo "--- deterministic simulation e2e, ci-3cn-2fe, parallel, batch"
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation -j 16 ./e2e_test/batch/\*\*/\*.slt 2> $LOGDIR/parallel-batch-{}.log && rm $LOGDIR/parallel-batch-{}.log'
seq "$TEST_NUM" | parallel './risingwave_simulation -j 16 ./e2e_test/batch/\*\*/\*.slt 2> $LOGDIR/parallel-batch-{}.log && rm $LOGDIR/parallel-batch-{}.log'

echo "--- deterministic simulation e2e, ci-3cn-2fe, fuzzing (pre-generated-queries)"
-timeout 10m seq 64 | parallel MADSIM_TEST_SEED={} RUST_MIN_STACK=4194304 './risingwave_simulation --run-sqlsmith-queries ./src/tests/sqlsmith/tests/sqlsmith-query-snapshots/{} 2> $LOGDIR/fuzzing-{}.log && rm $LOGDIR/fuzzing-{}.log'
+timeout 10m seq 64 | parallel RUST_MIN_STACK=4194304 './risingwave_simulation --run-sqlsmith-queries ./src/tests/sqlsmith/tests/sqlsmith-query-snapshots/{} 2> $LOGDIR/fuzzing-{}.log && rm $LOGDIR/fuzzing-{}.log'

echo "--- deterministic simulation e2e, ci-3cn-2fe, e2e extended mode test"
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation -e 2> $LOGDIR/extended-{}.log && rm $LOGDIR/extended-{}.log'
seq "$TEST_NUM" | parallel './risingwave_simulation -e 2> $LOGDIR/extended-{}.log && rm $LOGDIR/extended-{}.log'
kwannoel marked this conversation as resolved.
2 changes: 1 addition & 1 deletion ci/scripts/deterministic-it-test.sh
@@ -19,7 +19,7 @@ mv target/ci-sim target/sim
TEST_PATTERN="$@"

echo "--- Run integration tests in deterministic simulation mode"
seq "$TEST_NUM" | parallel "MADSIM_TEST_SEED={} NEXTEST_PROFILE=ci-sim \
seq "$TEST_NUM" | parallel "NEXTEST_PROFILE=ci-sim \
cargo nextest run \
--no-fail-fast \
--cargo-metadata target/nextest/cargo-metadata.json \
29 changes: 13 additions & 16 deletions ci/scripts/deterministic-recovery-test.sh
@@ -15,7 +15,8 @@ risingwave_meta::rpc::ddl_controller=debug,\
risingwave_meta::barrier::mod=debug,\
risingwave_simulation=debug,\
risingwave_meta::stream::stream_manager=debug,\
risingwave_meta::barrier::progress=debug"
risingwave_meta::barrier::progress=debug,\
sqlx=error"

# Extra logs you can enable if the existing trace does not give enough info.
#risingwave_stream::executor::backfill=trace,
@@ -48,52 +49,48 @@ trap filter_stack_trace_for_all_logs ERR
# NOTE(kwannoel): We must use `export` here, because the variables are not substituted
# directly via bash substitution. Instead, the `parallel` command substitutes the variables
# from the environment. If they are declared without `export`, `parallel` can't read them from the env.
export EXTRA_ARGS=""

if [[ -n "${USE_SQL_BACKEND:-}" ]]; then
export EXTRA_ARGS="--sqlite-data-dir=."
fi
export EXTRA_ARGS="--sqlite-data-dir=."
Member: Can we default to using the SQL backend with a temporary directory instead, so that we don't have to specify this?

Member: Also, I'm curious how we emulate SQLite. Will it really perform disk I/O? Can we simply use an in-memory instance that's accessible within the same process?

Contributor: Oh, I think we can use an in-memory instance indeed, now that we no longer kill the meta node (for now).

Member: > now that we no longer kill meta-node (for now)

An in-memory database can also be shared by multiple lifespans of meta, as long as we keep holding a reference ourselves. See https://arc.net/l/quote/lrgjtbyn

Member (author): The reason why I kept this config is that I still prefer to use in-memory mode for simulation if it's not configured. But currently we can't, because we still have kill logic for meta in some integration tests. 🥵 Besides, the disk I/O indeed affects the duration of the test.

Member: > still prefer to use in-memory mode for simulation if it's not configured

By in-memory, do you mean in-memory KV? Can we switch to in-memory SQLite instead (also for playground)?

Member (author): I mean in-memory SQLite. Playground also uses it.

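A minimal sketch of the shared in-memory idea from this thread, assuming `sqlx`'s SQLite driver rather than the PR's actual code: an in-memory SQLite database lives for as long as some connection to it is held, so keeping one pooled connection alive lets it span multiple meta "lifespans" with no disk I/O.

```rust
use std::str::FromStr;

use sqlx::sqlite::{SqliteConnectOptions, SqlitePoolOptions};

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    // shared_cache lets every connection in this process see the same
    // in-memory database instead of each getting a private one.
    let opts = SqliteConnectOptions::from_str("sqlite::memory:")?.shared_cache(true);

    // min_connections(1) keeps one connection (and thus the database) alive
    // even if all other users disconnect, e.g. across meta restarts.
    let pool = SqlitePoolOptions::new()
        .min_connections(1)
        .connect_with(opts)
        .await?;

    sqlx::query("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
        .execute(&pool)
        .await?;
    sqlx::query("INSERT INTO kv VALUES ('backend', 'sqlite-memory')")
        .execute(&pool)
        .await?;

    let (v,): (String,) = sqlx::query_as("SELECT v FROM kv WHERE k = 'backend'")
        .fetch_one(&pool)
        .await?;
    assert_eq!(v, "sqlite-memory");
    Ok(())
}
```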

if [[ -n "${USE_ARRANGEMENT_BACKFILL:-}" ]]; then
export EXTRA_ARGS="$EXTRA_ARGS --use-arrangement-backfill"
fi

echo "--- EXTRA_ARGS: ${EXTRA_ARGS}"

echo "--- deterministic simulation e2e, ci-3cn-2fe-3meta, recovery, background_ddl"
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation \
echo "--- deterministic simulation e2e, ci-3cn-2fe-1meta, recovery, background_ddl"
seq "$TEST_NUM" | parallel './risingwave_simulation \
--kill \
--kill-rate=${KILL_RATE} \
${EXTRA_ARGS:-} \
./e2e_test/background_ddl/sim/basic.slt \
2> $LOGDIR/recovery-background-ddl-{}.log && rm $LOGDIR/recovery-background-ddl-{}.log'

echo "--- deterministic simulation e2e, ci-3cn-2fe-3meta, recovery, ddl"
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation \
echo "--- deterministic simulation e2e, ci-3cn-2fe-1meta, recovery, ddl"
seq "$TEST_NUM" | parallel './risingwave_simulation \
--kill \
--kill-rate=${KILL_RATE} \
--background-ddl-rate=${BACKGROUND_DDL_RATE} \
${EXTRA_ARGS:-} \
./e2e_test/ddl/\*\*/\*.slt 2> $LOGDIR/recovery-ddl-{}.log && rm $LOGDIR/recovery-ddl-{}.log'

echo "--- deterministic simulation e2e, ci-3cn-2fe-3meta, recovery, streaming"
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation \
echo "--- deterministic simulation e2e, ci-3cn-2fe-1meta, recovery, streaming"
seq "$TEST_NUM" | parallel './risingwave_simulation \
--kill \
--kill-rate=${KILL_RATE} \
--background-ddl-rate=${BACKGROUND_DDL_RATE} \
${EXTRA_ARGS:-} \
./e2e_test/streaming/\*\*/\*.slt 2> $LOGDIR/recovery-streaming-{}.log && rm $LOGDIR/recovery-streaming-{}.log'

echo "--- deterministic simulation e2e, ci-3cn-2fe-3meta, recovery, batch"
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation \
echo "--- deterministic simulation e2e, ci-3cn-2fe-1meta, recovery, batch"
seq "$TEST_NUM" | parallel './risingwave_simulation \
--kill \
--kill-rate=${KILL_RATE} \
--background-ddl-rate=${BACKGROUND_DDL_RATE} \
${EXTRA_ARGS:-} \
./e2e_test/batch/\*\*/\*.slt 2> $LOGDIR/recovery-batch-{}.log && rm $LOGDIR/recovery-batch-{}.log'

echo "--- deterministic simulation e2e, ci-3cn-2fe-3meta, recovery, kafka source,sink"
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation \
echo "--- deterministic simulation e2e, ci-3cn-2fe-1meta, recovery, kafka source,sink"
seq "$TEST_NUM" | parallel './risingwave_simulation \
--kill \
--kill-rate=${KILL_RATE} \
--kafka-datadir=./scripts/source/test_data \
2 changes: 1 addition & 1 deletion ci/scripts/run-deterministic-fuzz-test.sh
@@ -30,4 +30,4 @@ download-and-decompress-artifact risingwave_simulation .
chmod +x ./risingwave_simulation

echo "--- deterministic simulation e2e, ci-3cn-2fe, fuzzing (seed)"
-seq 32 | parallel MADSIM_TEST_SEED={} './risingwave_simulation --sqlsmith 100 ./src/tests/sqlsmith/tests/testdata 2> $LOGDIR/fuzzing-{}.log && rm $LOGDIR/fuzzing-{}.log'
+seq 32 | parallel './risingwave_simulation --sqlsmith 100 ./src/tests/sqlsmith/tests/testdata 2> $LOGDIR/fuzzing-{}.log && rm $LOGDIR/fuzzing-{}.log'
24 changes: 2 additions & 22 deletions ci/workflows/main-cron.yml
@@ -367,27 +367,7 @@ steps:
timeout_in_minutes: 80
retry: *auto-retry

-# sql backend recovery tests
-- label: "recovery test (sql,madsim)"
-key: "recovery-test-deterministic-sql"
-command: "TEST_NUM=12 KILL_RATE=1.0 BACKGROUND_DDL_RATE=0.0 USE_SQL_BACKEND=true timeout 65m ci/scripts/deterministic-recovery-test.sh"
-# NOTE(kwannoel): It will only run when the recovery tests label is added currently.
-# This is because there are currently some bugs which cause the test to fail.
-if: |
-build.pull_request.labels includes "ci/run-sql-recovery-test-deterministic-simulation"
-|| build.env("CI_STEPS") =~ /(^|,)sql-recovery-tests?-deterministic-simulation(,|$$)/
-depends_on: "build-simulation"
-plugins:
-- docker-compose#v5.1.0:
-run: rw-build-env
-config: ci/docker-compose.yml
-mount-buildkite-agent: true
-# Only upload zipped files, otherwise the logs is too much.
-- ./ci/plugins/upload-failure-logs-zipped
-timeout_in_minutes: 70
-retry: *auto-retry
-
-- label: "recovery test (etcd,madsim)"
+- label: "recovery test (madsim)"
key: "recovery-test-deterministic"
command: "TEST_NUM=12 KILL_RATE=1.0 BACKGROUND_DDL_RATE=0.0 timeout 65m ci/scripts/deterministic-recovery-test.sh"
if: |
@@ -406,7 +386,7 @@ steps:
retry: *auto-retry

# Ddl statements will randomly run with background_ddl.
- label: "background_ddl, arrangement_backfill recovery test (etcd,madsim)"
- label: "background_ddl, arrangement_backfill recovery test (madsim)"
key: "background-ddl-arrangement-backfill-recovery-test-deterministic"
command: "TEST_NUM=12 KILL_RATE=1.0 BACKGROUND_DDL_RATE=0.8 USE_ARRANGEMENT_BACKFILL=true timeout 65m ci/scripts/deterministic-recovery-test.sh"
if: |
8 changes: 4 additions & 4 deletions ci/workflows/pull-request.yml
@@ -569,7 +569,7 @@ steps:
retry: *auto-retry

- label: "end-to-end test (deterministic simulation)"
command: "TEST_NUM=16 ci/scripts/deterministic-e2e-test.sh"
command: "TEST_NUM=4 ci/scripts/deterministic-e2e-test.sh"
if: |
!(build.pull_request.labels includes "ci/pr/run-selected") && build.env("CI_STEPS") == null
|| build.pull_request.labels includes "ci/run-e2e-test-deterministic-simulation"
@@ -586,12 +586,12 @@
environment:
- GITHUB_TOKEN
- ./ci/plugins/upload-failure-logs
-timeout_in_minutes: 25
+timeout_in_minutes: 30
cancel_on_build_failing: true
retry: *auto-retry

- label: "recovery test (deterministic simulation)"
command: "TEST_NUM=8 KILL_RATE=1.0 BACKGROUND_DDL_RATE=0.0 ci/scripts/deterministic-recovery-test.sh"
command: "TEST_NUM=4 KILL_RATE=1.0 BACKGROUND_DDL_RATE=0.0 ci/scripts/deterministic-recovery-test.sh"
if: |
!(build.pull_request.labels includes "ci/pr/run-selected") && build.env("CI_STEPS") == null
|| build.pull_request.labels includes "ci/run-recovery-test-deterministic-simulation"
@@ -610,7 +610,7 @@
# - test-collector#v1.0.0:
# files: "*-junit.xml"
# format: "junit"
-timeout_in_minutes: 25
+timeout_in_minutes: 30
cancel_on_build_failing: true
retry: *auto-retry

4 changes: 4 additions & 0 deletions src/config/ci-sim.toml
@@ -7,3 +7,7 @@ max_concurrent_creating_streaming_jobs = 0

[meta]
meta_leader_lease_secs = 10

+[meta.developer]
+meta_actor_cnt_per_worker_parallelism_soft_limit = 65536
+meta_actor_cnt_per_worker_parallelism_hard_limit = 65536
Comment on lines +11 to +13:

Member: How's this related to etcd vs. SQL? 👀

Contributor: Let me add this to the list of things to follow up on.
1 change: 0 additions & 1 deletion src/meta/model_v2/src/hummock_version_delta.rs
@@ -41,7 +41,6 @@ impl From<Model> for PbHummockVersionDelta {
let ret = value.full_version_delta.to_protobuf();
assert_eq!(value.id, ret.id as i64);
assert_eq!(value.prev_id, ret.prev_id as i64);
-assert_eq!(value.max_committed_epoch, ret.max_committed_epoch as i64);
assert_eq!(value.trivial_move, ret.trivial_move);
ret
}
8 changes: 2 additions & 6 deletions src/meta/node/src/server.rs
@@ -94,9 +94,7 @@ use crate::manager::{
};
use crate::rpc::cloud_provider::AwsEc2Client;
use crate::rpc::election::etcd::EtcdElectionClient;
-use crate::rpc::election::sql::{
-    MySqlDriver, PostgresDriver, SqlBackendElectionClient, SqliteDriver,
-};
+use crate::rpc::election::sql::{MySqlDriver, PostgresDriver, SqlBackendElectionClient};
use crate::rpc::metrics::{
start_fragment_info_monitor, start_worker_info_monitor, GLOBAL_META_METRICS,
};
@@ -223,9 +221,7 @@
let id = address_info.advertise_addr.clone();
let conn = meta_store_sql.conn.clone();
let election_client: ElectionClientRef = match conn.get_database_backend() {
-DbBackend::Sqlite => {
-    Arc::new(SqlBackendElectionClient::new(id, SqliteDriver::new(conn)))
-}
+DbBackend::Sqlite => Arc::new(DummyElectionClient::new(id)),
Member (author): Disable election for the SQLite meta store:

1. When multiple meta nodes connect to the same SQLite file, the error "(code: 5) database is locked" is likely to occur.
2. When there is only one meta node, election is unnecessary and will only increase overhead.

kwannoel (Contributor, Sep 30, 2024): What happens if multiple meta nodes are instantiated with the SQLite backend? How do we decide which one becomes the leader?

Member: I don't think this is a reasonable use case... 🤔 Given that a SQLite database should not be shared via NFS (or similar), it should not be used for high-availability purposes.

Member: Should we just disallow multiple meta nodes when the backend is SQLite? (I'm fine leaving it as UB.)

Contributor: Banning multiple meta nodes sounds good. UB for now.
DbBackend::Postgres => {
Arc::new(SqlBackendElectionClient::new(id, PostgresDriver::new(conn)))
}
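A hypothetical sketch of what the `DummyElectionClient` introduced above could look like (the `ElectionClient` trait shape below is assumed for illustration and is not RisingWave's actual trait): with a single meta node there is nothing to coordinate, so the client can simply report itself as the permanent leader.

```rust
use async_trait::async_trait;

// Assumed, simplified trait shape for illustration only.
#[async_trait]
pub trait ElectionClient: Send + Sync {
    fn id(&self) -> String;
    async fn is_leader(&self) -> bool;
}

pub struct DummyElectionClient {
    id: String,
}

impl DummyElectionClient {
    pub fn new(id: String) -> Self {
        Self { id }
    }
}

#[async_trait]
impl ElectionClient for DummyElectionClient {
    fn id(&self) -> String {
        self.id.clone()
    }

    // No coordination at all: the single meta node is always the leader,
    // so there is no lease to renew and no chance of "database is locked".
    async fn is_leader(&self) -> bool {
        true
    }
}

#[tokio::main]
async fn main() {
    let client = DummyElectionClient::new("meta-node-1".into());
    assert!(client.is_leader().await);
    println!("{} is the permanent leader", client.id());
}
```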
76 changes: 33 additions & 43 deletions src/meta/src/barrier/command.rs
@@ -1031,50 +1031,40 @@ impl CommandContext {
.unregister_table_ids(table_fragments.all_table_ids().map(TableId::new))
.await?;

-match &self.barrier_manager_context.metadata_manager {
-    MetadataManager::V1(mgr) => {
-        // NOTE(kwannoel): At this point, catalog manager has persisted the tables already.
-        // We need to cleanup the table state. So we can do it here.
-        // The logic is the same as above, for hummock_manager.unregister_table_ids.
-        if let Err(e) = mgr
-            .catalog_manager
-            .cancel_create_materialized_view_procedure(
-                table_fragments.table_id().table_id,
-                table_fragments.internal_table_ids(),
-            )
-            .await
-        {
-            let table_id = table_fragments.table_id().table_id;
-            tracing::warn!(
-                table_id,
-                error = %e.as_report(),
-                "cancel_create_table_procedure failed for CancelStreamingJob",
-            );
-            // If failed, check that table is not in meta store.
-            // If any table is, just panic, let meta do bootstrap recovery.
-            // Otherwise our persisted state is dirty.
-            let mut table_ids = table_fragments.internal_table_ids();
-            table_ids.push(table_id);
-            mgr.catalog_manager.assert_tables_deleted(table_ids).await;
-        }
-
-        // We need to drop table fragments here,
-        // since this is not done in stream manager (foreground ddl)
-        // OR barrier manager (background ddl)
-        mgr.fragment_manager
-            .drop_table_fragments_vec(&HashSet::from_iter(std::iter::once(
-                table_fragments.table_id(),
-            )))
-            .await?;
-    }
-    MetadataManager::V2(mgr) => {
-        mgr.catalog_controller
-            .try_abort_creating_streaming_job(
-                table_fragments.table_id().table_id as _,
-                true,
-            )
-            .await?;
+if let MetadataManager::V1(mgr) = &self.barrier_manager_context.metadata_manager {
+    // NOTE(kwannoel): At this point, catalog manager has persisted the tables already.
+    // We need to cleanup the table state. So we can do it here.
+    // The logic is the same as above, for hummock_manager.unregister_table_ids.
+    if let Err(e) = mgr
+        .catalog_manager
+        .cancel_create_materialized_view_procedure(
+            table_fragments.table_id().table_id,
+            table_fragments.internal_table_ids(),
+        )
+        .await
+    {
+        let table_id = table_fragments.table_id().table_id;
+        tracing::warn!(
+            table_id,
+            error = %e.as_report(),
+            "cancel_create_table_procedure failed for CancelStreamingJob",
+        );
+        // If failed, check that table is not in meta store.
+        // If any table is, just panic, let meta do bootstrap recovery.
+        // Otherwise our persisted state is dirty.
+        let mut table_ids = table_fragments.internal_table_ids();
+        table_ids.push(table_id);
+        mgr.catalog_manager.assert_tables_deleted(table_ids).await;
+    }
+
+    // We need to drop table fragments here,
+    // since this is not done in stream manager (foreground ddl)
+    // OR barrier manager (background ddl)
+    mgr.fragment_manager
+        .drop_table_fragments_vec(&HashSet::from_iter(std::iter::once(
+            table_fragments.table_id(),
+        )))
+        .await?;
+}
}

21 changes: 6 additions & 15 deletions src/meta/src/barrier/recovery.rs
@@ -193,21 +193,12 @@ impl GlobalBarrierManagerContext {
) -> MetaResult<bool> {
let (dropped_actors, cancelled) = scheduled_barriers.pre_apply_drop_cancel_scheduled();
let applied = !dropped_actors.is_empty() || !cancelled.is_empty();
-if !cancelled.is_empty() {
-    match &self.metadata_manager {
-        MetadataManager::V1(mgr) => {
-            mgr.fragment_manager
-                .drop_table_fragments_vec(&cancelled)
-                .await?;
-        }
-        MetadataManager::V2(mgr) => {
-            for job_id in cancelled {
-                mgr.catalog_controller
-                    .try_abort_creating_streaming_job(job_id.table_id as _, true)
-                    .await?;
-            }
-        }
-    };
+if !cancelled.is_empty()
+    && let MetadataManager::V1(mgr) = &self.metadata_manager
+{
+    mgr.fragment_manager
+        .drop_table_fragments_vec(&cancelled)
+        .await?;
// no need to unregister state table id from hummock manager here, because it's expected that
// we call `purge_state_table_from_hummock` anyway after the current method returns.
}
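The rewritten condition above uses a let-chain (`if cond && let Pat = expr { .. }`), which is not available on older stable Rust. A self-contained sketch of the equivalent nested form, with hypothetical stand-in types:

```rust
#[allow(dead_code)]
enum MetadataManager {
    V1(&'static str),
    V2(&'static str),
}

fn main() {
    let cancelled = vec![1, 2, 3];
    let metadata_manager = MetadataManager::V1("fragment manager");

    // Nested equivalent of the let-chain
    // `if !cancelled.is_empty() && let MetadataManager::V1(mgr) = &metadata_manager { .. }`.
    if !cancelled.is_empty() {
        if let MetadataManager::V1(mgr) = &metadata_manager {
            println!("dropping {} cancelled jobs via {mgr}", cancelled.len());
        }
    }
}
```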
6 changes: 3 additions & 3 deletions src/meta/src/stream/stream_graph/fragment.rs
@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.

-use std::collections::{HashMap, HashSet};
+use std::collections::{BTreeMap, HashMap, HashSet};
use std::num::NonZeroUsize;
use std::ops::{Deref, DerefMut};
use std::sync::LazyLock;
@@ -424,8 +424,8 @@ impl StreamFragmentGraph {
}

/// Retrieve the internal tables map of the whole graph.
-pub fn internal_tables(&self) -> HashMap<u32, Table> {
-    let mut tables = HashMap::new();
+pub fn internal_tables(&self) -> BTreeMap<u32, Table> {
+    let mut tables = BTreeMap::new();
for fragment in self.fragments.values() {
for table in fragment.extract_internal_tables() {
let table_id = table.id;
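The `HashMap` to `BTreeMap` switch here most likely serves determinism (cf. the "fix flaky state table ids" commit): `HashMap` iteration order is unspecified and varies with the per-process hasher seed, while `BTreeMap` always iterates in key order, making any order-dependent ID assignment reproducible across simulation runs. A self-contained illustration of the difference:

```rust
use std::collections::{BTreeMap, HashMap};

fn main() {
    let tables = [(7u32, "state_a"), (2, "state_b"), (5, "state_c")];

    let hashed: HashMap<u32, &str> = tables.into_iter().collect();
    let ordered: BTreeMap<u32, &str> = tables.into_iter().collect();

    // HashMap iteration order depends on a per-process randomized hasher
    // seed, so order-dependent logic (like internal table ID assignment)
    // can change between runs and become flaky.
    println!("HashMap keys:  {:?}", hashed.keys().collect::<Vec<_>>());

    // BTreeMap always iterates in sorted key order: [2, 5, 7].
    println!("BTreeMap keys: {:?}", ordered.keys().collect::<Vec<_>>());
}
```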