feat: switch madsim integration and recovery tests to sql backend #18678
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,7 +15,8 @@ risingwave_meta::rpc::ddl_controller=debug,\ | |
risingwave_meta::barrier::mod=debug,\ | ||
risingwave_simulation=debug,\ | ||
risingwave_meta::stream::stream_manager=debug,\ | ||
risingwave_meta::barrier::progress=debug" | ||
risingwave_meta::barrier::progress=debug,\ | ||
sqlx=error" | ||
|
||
# Extra logs you can enable if the existing trace does not give enough info. | ||
#risingwave_stream::executor::backfill=trace, | ||
|
@@ -48,52 +49,48 @@ trap filter_stack_trace_for_all_logs ERR | |
# NOTE(kwannoel): We must use `export` here, because the variables are not substituted | ||
# directly via bash substitution. Instead, the `parallel` command substitutes the variables | ||
# from the environment. If they are declared without `export`, `parallel` can't read them from the env. | ||
export EXTRA_ARGS="" | ||
|
||
if [[ -n "${USE_SQL_BACKEND:-}" ]]; then | ||
export EXTRA_ARGS="--sqlite-data-dir=." | ||
fi | ||
export EXTRA_ARGS="--sqlite-data-dir=." | ||
Can we default to the SQL backend with a temporary directory instead, so that we don't have to specify this?
Also, I'm curious how we emulate SQLite. Will it really perform disk I/O? Can we simply use an in-memory instance that's accessible within the same process?
Oh, I think we can indeed use an in-memory instance, now that we no longer kill the meta node (for now).
An in-memory database can also be shared by multiple lifespans of meta, as long as we keep holding a reference ourselves.
The reason I kept this config is that I still prefer to use in-memory mode for simulation if it's not configured. But currently we can't, because we still have kill logic for meta in some integration tests. 🥵
By in-memory, do you mean in-memory kv? Can we switch to in-memory SQLite instead (also for
I mean in-memory SQLite. Playground also uses it.
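To make the "keep holding a reference" point in the thread above concrete, here is a minimal Rust sketch. It does not use SQLite at all: `Store` is a hypothetical stand-in (a mutex-guarded map) for an in-memory database, and `MetaNode` and `survives_restart` are illustrative names that do not come from the PR. It only demonstrates the ownership rule: state outlives a killed meta node if the simulation harness keeps its own `Arc`, and is gone otherwise.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical stand-in for an in-memory SQLite database.
type Store = Mutex<HashMap<String, String>>;

/// Hypothetical stand-in for one lifespan of a meta node.
struct MetaNode {
    store: Arc<Store>,
}

impl MetaNode {
    fn start(store: Arc<Store>) -> Self {
        MetaNode { store }
    }

    fn put(&self, key: &str, value: &str) {
        self.store
            .lock()
            .unwrap()
            .insert(key.to_string(), value.to_string());
    }

    fn get(&self, key: &str) -> Option<String> {
        self.store.lock().unwrap().get(key).cloned()
    }
}

/// Runs two meta lifespans against one store and returns what the second
/// lifespan reads back. `harness_keeps_ref` models whether the simulation
/// harness holds its own reference to the store across the kill.
fn survives_restart(harness_keeps_ref: bool) -> Option<String> {
    let store: Arc<Store> = Arc::new(Mutex::new(HashMap::new()));
    // A weak handle lets us observe whether the store is still alive
    // without keeping it alive ourselves.
    let weak: Weak<Store> = Arc::downgrade(&store);

    let first = MetaNode::start(Arc::clone(&store));
    first.put("table/1", "t");

    // Either the harness keeps the store alive, or it lets go of it.
    let _harness_ref = if harness_keeps_ref {
        Some(store)
    } else {
        drop(store);
        None
    };

    // Simulated kill of the first meta lifespan.
    drop(first);

    // The second lifespan can only reattach if some owner is still alive.
    let second = MetaNode::start(weak.upgrade()?);
    second.get("table/1")
}
```

Calling `survives_restart(true)` yields the value written before the simulated kill, while `survives_restart(false)` yields `None`. The same reasoning is why an in-memory instance would need an owner outside the meta node for tests that still kill meta.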
||
|
||
if [[ -n "${USE_ARRANGEMENT_BACKFILL:-}" ]]; then | ||
export EXTRA_ARGS="$EXTRA_ARGS --use-arrangement-backfill" | ||
fi | ||
|
||
echo "--- EXTRA_ARGS: ${EXTRA_ARGS}" | ||
|
||
echo "--- deterministic simulation e2e, ci-3cn-2fe-3meta, recovery, background_ddl" | ||
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation \ | ||
echo "--- deterministic simulation e2e, ci-3cn-2fe-1meta, recovery, background_ddl" | ||
seq "$TEST_NUM" | parallel './risingwave_simulation \ | ||
--kill \ | ||
--kill-rate=${KILL_RATE} \ | ||
${EXTRA_ARGS:-} \ | ||
./e2e_test/background_ddl/sim/basic.slt \ | ||
2> $LOGDIR/recovery-background-ddl-{}.log && rm $LOGDIR/recovery-background-ddl-{}.log' | ||
|
||
echo "--- deterministic simulation e2e, ci-3cn-2fe-3meta, recovery, ddl" | ||
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation \ | ||
echo "--- deterministic simulation e2e, ci-3cn-2fe-1meta, recovery, ddl" | ||
seq "$TEST_NUM" | parallel './risingwave_simulation \ | ||
--kill \ | ||
--kill-rate=${KILL_RATE} \ | ||
--background-ddl-rate=${BACKGROUND_DDL_RATE} \ | ||
${EXTRA_ARGS:-} \ | ||
./e2e_test/ddl/\*\*/\*.slt 2> $LOGDIR/recovery-ddl-{}.log && rm $LOGDIR/recovery-ddl-{}.log' | ||
|
||
echo "--- deterministic simulation e2e, ci-3cn-2fe-3meta, recovery, streaming" | ||
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation \ | ||
echo "--- deterministic simulation e2e, ci-3cn-2fe-1meta, recovery, streaming" | ||
seq "$TEST_NUM" | parallel './risingwave_simulation \ | ||
--kill \ | ||
--kill-rate=${KILL_RATE} \ | ||
--background-ddl-rate=${BACKGROUND_DDL_RATE} \ | ||
${EXTRA_ARGS:-} \ | ||
./e2e_test/streaming/\*\*/\*.slt 2> $LOGDIR/recovery-streaming-{}.log && rm $LOGDIR/recovery-streaming-{}.log' | ||
|
||
echo "--- deterministic simulation e2e, ci-3cn-2fe-3meta, recovery, batch" | ||
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation \ | ||
echo "--- deterministic simulation e2e, ci-3cn-2fe-1meta, recovery, batch" | ||
seq "$TEST_NUM" | parallel './risingwave_simulation \ | ||
--kill \ | ||
--kill-rate=${KILL_RATE} \ | ||
--background-ddl-rate=${BACKGROUND_DDL_RATE} \ | ||
${EXTRA_ARGS:-} \ | ||
./e2e_test/batch/\*\*/\*.slt 2> $LOGDIR/recovery-batch-{}.log && rm $LOGDIR/recovery-batch-{}.log' | ||
|
||
echo "--- deterministic simulation e2e, ci-3cn-2fe-3meta, recovery, kafka source,sink" | ||
seq "$TEST_NUM" | parallel MADSIM_TEST_SEED={} './risingwave_simulation \ | ||
echo "--- deterministic simulation e2e, ci-3cn-2fe-1meta, recovery, kafka source,sink" | ||
seq "$TEST_NUM" | parallel './risingwave_simulation \ | ||
--kill \ | ||
--kill-rate=${KILL_RATE} \ | ||
--kafka-datadir=./scripts/source/test_data \ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,3 +7,7 @@ max_concurrent_creating_streaming_jobs = 0 | |
|
||
[meta] | ||
meta_leader_lease_secs = 10 | ||
|
||
[meta.developer] | ||
meta_actor_cnt_per_worker_parallelism_soft_limit = 65536 | ||
meta_actor_cnt_per_worker_parallelism_hard_limit = 65536 | ||
Comment on lines +11 to +13
How's this related to etcd vs. SQL? 👀
Let me add this to the list of things to follow up on.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -94,9 +94,7 @@ use crate::manager::{ | |
}; | ||
use crate::rpc::cloud_provider::AwsEc2Client; | ||
use crate::rpc::election::etcd::EtcdElectionClient; | ||
use crate::rpc::election::sql::{ | ||
MySqlDriver, PostgresDriver, SqlBackendElectionClient, SqliteDriver, | ||
}; | ||
use crate::rpc::election::sql::{MySqlDriver, PostgresDriver, SqlBackendElectionClient}; | ||
use crate::rpc::metrics::{ | ||
start_fragment_info_monitor, start_worker_info_monitor, GLOBAL_META_METRICS, | ||
}; | ||
|
@@ -223,9 +221,7 @@ pub async fn rpc_serve( | |
let id = address_info.advertise_addr.clone(); | ||
let conn = meta_store_sql.conn.clone(); | ||
let election_client: ElectionClientRef = match conn.get_database_backend() { | ||
DbBackend::Sqlite => { | ||
Arc::new(SqlBackendElectionClient::new(id, SqliteDriver::new(conn))) | ||
} | ||
DbBackend::Sqlite => Arc::new(DummyElectionClient::new(id)), | ||
Disable election for sqlite metastore:
What happens if multiple meta nodes are instantiated with the SQLite backend? How do we decide which one becomes the leader?
I don't think this is a reasonable use case... 🤔 Given that a SQLite database should not be shared over NFS (or similar), it should not be used for high-availability purposes.
Should we just disallow multiple meta nodes when it's SQLite..? (I'm fine leaving it as UB..)
Banning multiple meta nodes sounds good. UB for now.
||
DbBackend::Postgres => { | ||
Arc::new(SqlBackendElectionClient::new(id, PostgresDriver::new(conn))) | ||
} | ||
|
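The outcome of the SQLite election thread above (no election for SQLite, and possibly rejecting multi-meta deployments on SQLite) could look roughly like the following sketch. It is a simplified model, not the PR's code: `ElectionKind`, `choose_election`, and the `meta_node_count` parameter are hypothetical, and the local `DbBackend` enum only mirrors the backends named in the diff (the MySQL arm is inferred from the `MySqlDriver` import). The PR itself leaves the multi-meta case as undefined behavior; the guard here only illustrates what "ban multiple meta" might mean.

```rust
/// Hypothetical mirror of the backend kinds matched in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DbBackend {
    Sqlite,
    Postgres,
    MySql,
}

/// Hypothetical summary of which election client would be constructed.
#[derive(Debug, PartialEq, Eq)]
enum ElectionKind {
    /// No real election: the single meta node always acts as leader.
    Dummy,
    /// Election coordinated through the shared SQL database.
    SqlBackend,
}

/// Picks an election strategy for a backend. `meta_node_count` is an
/// assumed input; the PR does not pass such a value.
fn choose_election(
    backend: DbBackend,
    meta_node_count: usize,
) -> Result<ElectionKind, String> {
    match backend {
        // A SQLite file is local to one node, so it cannot arbitrate
        // leadership between several meta nodes.
        DbBackend::Sqlite if meta_node_count > 1 => Err(format!(
            "SQLite meta store does not support {} meta nodes",
            meta_node_count
        )),
        DbBackend::Sqlite => Ok(ElectionKind::Dummy),
        DbBackend::Postgres | DbBackend::MySql => Ok(ElectionKind::SqlBackend),
    }
}
```

With this shape, a single-node SQLite setup gets the dummy client, a three-node SQLite setup is rejected up front instead of running with several self-declared leaders, and Postgres or MySQL keep the SQL-backed election.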
Does this change make it more or less verbose?
Aiming to decrease the test duration.
👀 How is the log level related to test duration?