-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Python]Enable state cache to 100 MB #28781
Merged
Merged
Changes from 2 commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
b1ab7a3
change state_cache_mb to 100 MB as default
AnandInguva dbea4fd
Address comments
AnandInguva 9a4572c
Reword help of pipeline option
AnandInguva dae15e3
Reword help of pipeline option
AnandInguva 5e0464c
Merge remote-tracking branch 'origin/master' into enable_state_cache
AnandInguva b91dd61
Merge remote-tracking branch 'AnandInguva/enable_state_cache' into en…
AnandInguva 3995385
Fix doc string
AnandInguva 2ada878
reword docstring
AnandInguva b84f53b
Set default in the pipeline options
AnandInguva 2518fa7
Reword documentation
AnandInguva 62ed478
Add warning
AnandInguva 3f7ab0d
Address comments
AnandInguva aa94309
Update CHANGES.md
AnandInguva File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,6 +36,7 @@ | |
from apache_beam.options.pipeline_options import PipelineOptions | ||
from apache_beam.options.pipeline_options import ProfilingOptions | ||
from apache_beam.options.pipeline_options import SetupOptions | ||
from apache_beam.options.pipeline_options import WorkerOptions | ||
from apache_beam.options.value_provider import RuntimeValueProvider | ||
from apache_beam.portability.api import endpoints_pb2 | ||
from apache_beam.runners.internal import names | ||
|
@@ -46,6 +47,7 @@ | |
|
||
_LOGGER = logging.getLogger(__name__) | ||
_ENABLE_GOOGLE_CLOUD_PROFILER = 'enable_google_cloud_profiler' | ||
_STATE_CACHE_SIZE_BYTES = 100 << 20 | ||
|
||
|
||
def _import_beam_plugins(plugins): | ||
|
@@ -159,7 +161,8 @@ def create_harness(environment, dry_run=False): | |
control_address=control_service_descriptor.url, | ||
status_address=status_service_descriptor.url, | ||
worker_id=_worker_id, | ||
state_cache_size=_get_state_cache_size(experiments), | ||
state_cache_size=_get_state_cache_size_bytes( | ||
options=sdk_pipeline_options), | ||
data_buffer_time_limit_ms=_get_data_buffer_time_limit_ms(experiments), | ||
profiler_factory=profiler.Profile.factory_from_options( | ||
sdk_pipeline_options.view_as(ProfilingOptions)), | ||
|
@@ -239,24 +242,26 @@ def _parse_pipeline_options(options_json): | |
return PipelineOptions.from_dictionary(_load_pipeline_options(options_json)) | ||
|
||
|
||
def _get_state_cache_size(experiments): | ||
"""Defines the upper number of state items to cache. | ||
|
||
Note: state_cache_size is an experimental flag and might not be available in | ||
future releases. | ||
def _get_state_cache_size_bytes(options): | ||
"""Return the maximun size of state cache in bytes. | ||
AnandInguva marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Returns: | ||
an int indicating the maximum number of megabytes to cache. | ||
Default is 0 MB | ||
""" | ||
|
||
for experiment in experiments: | ||
# There should only be 1 match so returning from the loop | ||
if re.match(r'state_cache_size=', experiment): | ||
return int( | ||
re.match(r'state_cache_size=(?P<state_cache_size>.*)', | ||
experiment).group('state_cache_size')) << 20 | ||
return 0 | ||
max_cache_memory_usage_mb = options.view_as( | ||
WorkerOptions).max_cache_memory_usage_mb | ||
if not max_cache_memory_usage_mb: | ||
# to maintain backward compatibility | ||
experiments = options.view_as(DebugOptions).experiments or [] | ||
for experiment in experiments: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. pretty sure there is already a helper that does this parsing. |
||
# There should only be 1 match so returning from the loop | ||
if re.match(r'state_cache_size=', experiment): | ||
return int( | ||
re.match(r'state_cache_size=(?P<state_cache_size>.*)', | ||
experiment).group('state_cache_size')) << 20 | ||
return _STATE_CACHE_SIZE_BYTES | ||
return max_cache_memory_usage_mb << 20 | ||
|
||
|
||
def _get_data_buffer_time_limit_ms(experiments): | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any concerns to define the 100mb default here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current flow: If it is None here, it gives us an opportunity to look in
--experiements
forstate_cache_size
.If the value is defined here as 100 MB, and if the user passes
--experiments=state_cache_size
, we should override 100 MB for the--experiments=state_cache_size
.I don't see any concerns of setting default here. might need to change some code though