
Add single node execution #24172

Open
wants to merge 1 commit into master from native-single-worker

Conversation

@kewang1024 (Collaborator) commented Nov 29, 2024

Description

== RELEASE NOTES ==

General Changes
* Add single worker execution. To improve the latency of tiny queries running on a large cluster, we introduce single worker execution mode: the query will use only one node to execute and the plan will be optimized accordingly. This feature can be turned on by config `single-node-execution-enabled` or session property `single_node_execution_enabled`.:pr:`24172`
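
A minimal sketch of how such a toggle is typically exposed, assuming the usual FeaturesConfig / SystemSessionProperties patterns used elsewhere in Presto; the property names follow the release note, and the exact wiring in this PR may differ:

// Sketch only: configuration property on FeaturesConfig (accessor and field are assumed).
@Config("single-node-execution-enabled")
public FeaturesConfig setSingleNodeExecutionEnabled(boolean singleNodeExecutionEnabled)
{
    this.singleNodeExecutionEnabled = singleNodeExecutionEnabled;
    return this;
}

// Sketch only: matching session property in SystemSessionProperties, defaulting to the config value.
public static final String SINGLE_NODE_EXECUTION_ENABLED = "single_node_execution_enabled";

booleanProperty(
        SINGLE_NODE_EXECUTION_ENABLED,
        "Execute the query on a single node",
        featuresConfig.isSingleNodeExecutionEnabled(),
        false),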

@kewang1024 kewang1024 force-pushed the native-single-worker branch 2 times, most recently from 0d9137e to d71d256 Compare December 2, 2024 08:45
@tdcmeehan (Contributor)

Can you help explain why this feature is, or has to be, tied to native execution? Perhaps describe the background and motivation?

@arhimondr (Member) left a comment

High level comments

  1. Do we want to support single node execution on a per-query basis?

This can be useful to improve the latency of tiny queries running on a large cluster. For example, a user may know that a query is small and decide to run it on a multi-node cluster in single node mode.

If we decide to support it, we need to make sure the session property is used consistently throughout the code.

If we decide not to support it for now, I think the session property should be removed and only a configuration property should be used.

  2. Should the single node execution mode be native-specific?

When running a Java cluster deployment with a dedicated coordinator and dedicated workers, additional exchanges at the worker-coordinator boundary are necessary.

I wonder if a simpler mental model would be to always add coordinator-to-worker exchanges when single node execution is requested.

@@ -328,6 +328,7 @@ public final class SystemSessionProperties
// TODO: Native execution related session properties that are temporarily put here. They will be relocated in the future.
public static final String NATIVE_AGGREGATION_SPILL_ALL = "native_aggregation_spill_all";
private static final String NATIVE_EXECUTION_ENABLED = "native_execution_enabled";
private static final String NATIVE_SINGLE_WORKER_EXECUTION = "native_single_worker_execution";
Member
nit: Should we stay consistent and use node instead of worker (e.g.: query_max_memory_per_node, force_single_node_output, etc.). Also maybe add the _enabled suffix to make it sound more natural, e.g.: single_node_execution_enabled, isSingleNodeExecutionEnabled(...)

@kewang1024 (Collaborator, Author)

I tried singleNodeExecutionEnabled, but then realized it would cause confusion with forceSingleNode.

A node can be either a worker or a coordinator, but what we want is explicitly a worker, so singleWorkerExecutionEnabled makes more sense.

}

@Override
public PlanNode visitTableFinish(TableFinishNode node, RewriteContext<Void> context)

@kewang1024 (Collaborator, Author) Dec 4, 2024

setCoordinatorOnlyDistribution currently works for ExplainAnalyze, TableFinish, MetadataDelete, and StatisticsWriterNode.

For MetadataDelete we don't need to add an exchange (it looks like it is a metadata-only operation: 02b1bf7). I have added the exchange for the rest.
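
As a rough illustration (not the exact PR code), adding the exchange for a coordinator-only node such as TableFinish could look like the snippet below; the surrounding rewriter class and its idAllocator field are assumptions:

@Override
public PlanNode visitTableFinish(TableFinishNode node, RewriteContext<Void> context)
{
    // Rewrite the subtree first, then force a remote gathering exchange so the
    // worker's output is shipped back to the coordinator-only TableFinish.
    PlanNode rewrittenSource = context.rewrite(node.getSource());
    PlanNode exchange = gatheringExchange(idAllocator.getNextId(), REMOTE_STREAMING, rewrittenSource);
    return node.replaceChildren(ImmutableList.of(exchange));
}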

@@ -813,7 +814,11 @@ public PlanOptimizers(
costCalculator,
ImmutableSet.of(new ScaledWriterRule())));

if (!forceSingleNode) {
if (featuresConfig.isNativeExecutionEnabled() && featuresConfig.isNativeSingleWorkerExecution()) {
Member

I wonder if a simpler mental model would be to always add worker-to-coordinator exchanges when single worker execution is enabled.

This way:

  1. single worker execution can be enabled on a per-query basis for normal clusters with more than a single worker and a single coordinator (with scheduling on the coordinator disabled)
  2. for Java execution, if coordinator scheduling is enabled, an extra exchange is not going to hurt

For example, this condition can be kept as isSingleNodeExecutionEnabled(session), and we can rename AddExchangeForNativeSingleWorker to AddWorkerToCoordinatorExchanges.

@kewang1024 (Collaborator, Author) Dec 4, 2024

But in some cases it would be an exchange from the coordinator to a worker? For example, scanning a system table:

Aggregation [Worker]
|
Exchange
|
TableScan (system table) [Coordinator]

}

@Override
public void testScaleWriters() {
Member

Scale writers should have no effect in single node mode? Do we need this test?

@kewang1024 (Collaborator, Author) Dec 4, 2024

I want to use this test to verify that, under single worker execution mode, scaled writers don't scale out to multiple worker tasks.

@tdcmeehan tdcmeehan self-assigned this Dec 2, 2024
@kaikalur (Contributor) commented Dec 2, 2024

High level comments

  1. Do we want to support single node execution on a per-query basis?

This can be useful to improve the latency of tiny queries running on a large cluster. For example, a user may know that a query is small and decide to run it on a multi-node cluster in single node mode.

We can also potentially use HBO/CBO to decide to run some queries in single node mode.

@kewang1024 kewang1024 force-pushed the native-single-worker branch 8 times, most recently from 15bf3b5 to 5aa836f Compare December 4, 2024 09:11
@arhimondr (Member) left a comment

LGTM % nits and fixing test failures

@kewang1024 kewang1024 changed the title Add single worker execution mode for native execution Add single node execution mode Dec 5, 2024
@kewang1024 kewang1024 force-pushed the native-single-worker branch 5 times, most recently from 7685df7 to 3de09d3 Compare December 5, 2024 07:29
@kewang1024 kewang1024 changed the title Add single node execution mode Add single node execution Dec 5, 2024
@kewang1024 kewang1024 force-pushed the native-single-worker branch from 3de09d3 to 8aa6631 Compare December 5, 2024 07:34
@kewang1024 kewang1024 force-pushed the native-single-worker branch from 46d7ec3 to 1ee77a7 Compare December 5, 2024 10:02
@steveburnett (Contributor)

Thanks for the release note entry! Some formatting nits.

== RELEASE NOTES ==

General Changes
* Add single worker execution. To improve the latency of tiny queries running on a large cluster, we introduce single worker execution mode: the query will use only one node to execute and the plan will be optimized accordingly. This feature can be turned on by the configuration property ``single-node-execution-enabled`` or the session property ``single_node_execution_enabled``. :pr:`24172`

Also, consider adding documentation for the new configuration property and session property to either the Presto [Configuration, Session] Properties pages, or the Presto C++ pages, as appropriate.

arhimondr previously approved these changes Dec 5, 2024
Comment on lines +100 to +102
if (containsSystemTableScan(plan)) {
plan = gatheringExchange(idAllocator.getNextId(), REMOTE_STREAMING, plan);
}
Contributor

Are these three lines of code the reason we've extended so many test cases to apply to Presto - single node - native? If so, I'm wondering if a few targeted, example-based tests are more appropriate. I'm concerned the additional tests don't add value and will make our CI slower and more expensive.

@kewang1024 (Collaborator, Author) Dec 5, 2024

Not only that; essentially, under single-node mode we will not use the AddExchanges optimizer. Theoretically we should test many cases that originally have a remote exchange (especially partitioned ones) to check whether they still return correct results under single-node mode.

Scheduling also changes accordingly, and one of the test cases actually caught an issue with it.

I understand your concern, so I removed some of the tests (the ones I think could be redundant in terms of exchange pattern).
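
For a sense of what that coverage looks like, a correctness check under this mode might be sketched roughly as follows (illustrative only; the test name and query are assumptions, not the PR's actual tests):

@Test
public void testPartitionedJoinUnderSingleNodeExecution()
{
    // Enable the new session property and run a query that would normally need
    // remote repartitioning exchanges; results must match the reference engine.
    Session singleNode = Session.builder(getSession())
            .setSystemProperty("single_node_execution_enabled", "true")
            .build();

    assertQuery(
            singleNode,
            "SELECT o.orderkey, count(*) FROM orders o JOIN lineitem l ON o.orderkey = l.orderkey GROUP BY o.orderkey");
}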

@kewang1024 kewang1024 force-pushed the native-single-worker branch 3 times, most recently from 2024bac to 55ca461 Compare December 5, 2024 18:50
@kewang1024 kewang1024 requested a review from tdcmeehan December 5, 2024 18:50
@kewang1024 kewang1024 force-pushed the native-single-worker branch 2 times, most recently from ac7073b to 57fce4d Compare December 6, 2024 00:41
@kewang1024 (Collaborator, Author)

It won't let me rerun some sporadically flaky tests; I have to force-push to trigger a rerun :(

@tdcmeehan (Contributor)

@kewang1024 this feature could be useful for lower-latency deployments that serve canned or bounded queries, where latency is expected to be low. Could you please introduce some documentation for it, as @steveburnett requested?

@tdcmeehan (Contributor)

High level comments

  1. Do we want to support single node execution on a per-query basis?

This can be useful to improve the latency of tiny queries running on a large cluster. For example, a user may know that a query is small and decide to run it on a multi-node cluster in single node mode.

We can also potentially use HBO/CBO to decide to run some queries in single node mode.

@kewang1024 / @kaikalur can you create an issue for this so we don't lose track of this suggestion?

I think we could also consider toggling this feature via a resource group, similar to per-query limits in resource groups. This might make the feature more convenient to toggle, and in a multi-node cluster it would give a more accurate value for the hard/soft concurrency limits (and perhaps make it safer to run in a multitenant deployment). I can create an issue for that; I don't think it needs to be added here.

@kewang1024 (Collaborator, Author)

@steveburnett the Presto C++ pages are not a good place since this is not limited to C++. I failed to find the [Configuration, Session] pages you're referring to; can you give me a pointer?

@steveburnett (Contributor)

@steveburnett the Presto C++ pages are not a good place since this is not limited to C++. I failed to find the [Configuration, Session] pages you're referring to; can you give me a pointer?

Of course! I was referring to the Presto Configuration Properties page
https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/admin/properties.rst

or the Presto Session Properties page
https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/admin/properties-session.rst

@kewang1024 (Collaborator, Author)

Thanks @steveburnett for the prompt response; I updated the doc.
cc: @tdcmeehan

@kewang1024 kewang1024 force-pushed the native-single-worker branch from 040866e to 61d73da Compare December 9, 2024 18:46
@kewang1024 kewang1024 requested a review from tdcmeehan December 9, 2024 18:46
@kewang1024 (Collaborator, Author)

@tdcmeehan Updated accordingly, can you help take another look? Thanks!

@tdcmeehan (Contributor) left a comment

Overall, I'm wondering what's preventing us from using FixedBucketNodeMap, since it seems that BucketNodeMap is aligned with this use case (only using a single node bucket).

@Override
public boolean isDynamic()
{
return true;
Contributor

Just curious, why is this true? Shouldn't this be false, since I think there is a single node and a single task?

Member

It is only applicable for grouped execution. Grouped execution is currently not supported for single node mode. It can be supported if needed, but generally the idea is that only small queries should run single node, while grouped execution is generally applicable for very large queries.

@Override
public boolean hasInitialMap()
{
return false;
Contributor

Can't this be true? Since there's only one node being returned in getBucketToNode?

Member

This is true. @kewang1024 I wonder if instead we can simply use a DynamicBucketNodeMap((split) -> 0, 1). Basically pretending there's only a single bucket for all splits to avoid a custom override?

@kewang1024 (Collaborator, Author)

Sure, I can make that change now.
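
A minimal sketch of the suggested change, assuming DynamicBucketNodeMap's (ToIntFunction&lt;Split&gt;, int) constructor and its current package location; shown for illustration only:

// Sketch: map every split to bucket 0 and declare exactly one bucket, so the whole
// stage lands on the single node assigned to that bucket, with no custom override.
// Package paths are assumed.
import com.facebook.presto.execution.scheduler.group.DynamicBucketNodeMap;
import com.facebook.presto.metadata.Split;

import java.util.function.ToIntFunction;

public final class SingleBucketNodeMaps
{
    private SingleBucketNodeMaps() {}

    public static DynamicBucketNodeMap singleBucket()
    {
        ToIntFunction<Split> allSplitsToBucketZero = split -> 0;
        return new DynamicBucketNodeMap(allSplitsToBucketZero, 1);
    }
}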

@kewang1024 kewang1024 force-pushed the native-single-worker branch 2 times, most recently from 8376f0c to 6ae5e35 Compare December 11, 2024 05:51
To improve performance for small queries that can be executed
within a single node, we introduce single worker execution mode:
the query will use only one node to execute and the plan will be
optimized accordingly.