From 052e5159722f5af7fbb62ba6773a1c6bf45ec4ad Mon Sep 17 00:00:00 2001 From: Paul Higinbotham Date: Tue, 18 Jun 2019 15:52:05 -0700 Subject: [PATCH 1/8] Submit draft of RFC for ForEach-Object -Parallel proposal --- 1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md | 99 ++++++++++++++++++++++ 1 file changed, 99 insertions(+) create mode 100644 1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md diff --git a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md new file mode 100644 index 00000000..d4dcca16 --- /dev/null +++ b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md @@ -0,0 +1,99 @@ +--- +RFC: RFCnnnn +Author: Paul Higinbotham +Status: Draft +SupercededBy: N/A +Version: 1.0 +Area: Engine +Comments Due: July 18, 2019 +Plan to implement: Yes +--- + +# PowerShell ForEach-Object -Parallel Cmdlet + +This RFC proposes a new parameter set for the existing ForEach-Object cmdlet to parallelize script block executions, instead of running them sequentially as it does now. + +## Motivation + + As a PowerShell User, + I can do simple fan-out concurrency with the PowerShell ForEach-Object cmdlet, without having to obtain and load a separate module, or deal with PowerShell jobs unless I want to. + +## Specification + +There will be two new parameter sets added to the existing ForeEach-Object cmdlet to support both synchronous and asynchronous operations for parallel script block execution. +For the synchronous case, the `ForEach-Object` cmdlet will not return until all parallel executions complete. +For the asynchronous case, the `ForEach-Object` cmdlet will immediately return a PowerShell job object that contains child jobs of each parallel execution. + +### Implementation details + +Implementation will be similar to the ThreadJob module. +Script block execution will be run for each piped input on a separate thread and runspace. +The number of threads that run at a time will be limited by a `-ThrottleLimit` parameter with a default value. 
+Piped input that exceeds the allowed number of threads will be queued until a thread is available. +For synchronous operation, a `-Timeout` parameter will be available that terminates the wait for completion after a specified time. +Without a `-Timeout` parameter, the cmdlet will wait indefinitely for completion. + +### Synchronous parameter set + +Synchronous ForEach-Object -Parallel returns after all script blocks complete running or timeout + +```powershell +ForEach-Object -Parallel -ThrottleLimit 10 -TimeoutSecs 1800 -ScriptBlock {} +``` + +- `-Parallel` : parameter switch specifies fan-out parallel script block execution + +- `-ThrottleLimit` : parameter takes an integer value that determines the maximum number threads + +- `-TimeoutSecs` : parameter takes an integer that specifies the maximum time to wait for completion in seconds + +### Asynchronous parameter set + +Asynchronous ForEach-Object -Parallel immediately returns a job object for monitoring parallel script block execution + +```powershell +ForEach-Object -Parallel -ThrottleLimit 5 -AsJob -ScriptBlock {} +``` + +- `-Parallel` : parameter switch specifies fan-out parallel script block execution + +- `-ThrottleLimit` : parameter takes an integer value that determines the maximum number threads + +- `-AsJob` : parameter switch returns a job object + +### Variable passing + +ForEach-Object -Parallel will support the PowerShell `$_` current piped item variable within each script block. +It will also support the `$using:` directive for passing variables from script scope into the parallel executed script block scope. 
+ +### Examples + +```powershell +$computerNames = 'computer1','computer2','computer3','computer4','computer5' +$logs = $computerNames | ForEach-Object -Parallel -ThrottleLimit 10 -TimeoutSecs 1800 -ScriptBlock { + Get-Logs -ComputerName $_ +} +``` + +```powershell +$computerNames = 'computer1','computer2','computer3','computer4','computer5' +$job = ForEach-Object -Parallel -ThrottleLimit 10 -InputObject $computerNames -AsJob -ScriptBlock { + Get-Logs -ComputerName $_ +} +$logs = $job | Wait-Job | Receive-Job +``` + +```powershell +$computerNames = 'computer1','computer2','computer3','computer4','computer5' +$logNames = 'System','SQL' +$logs = ForEach-Object -Parallel -InputObject $computerNames -ScriptBlock { + Get-Logs -ComputerName $_ -LogNames $using:logNames +} +``` + +## Alternate Proposals and Considerations + +Another option (and a previous RFC proposal) is to resurrect the PowerShell Windows workflow script `foreach -parallel` keyword to be used in normal PowerShell script to perform parallel execution of foreach loop iterations. +However, the majority of the community felt it would be more useful to update the existing ForeEach-Object cmdlet with a -parallel parameter set. +We may want to eventually implement both solutions. +But the ForEach-Object -Parallel proposal in this RFC should be implemented first since it is currently the most popular. 
From b75a572fe97821fe554cd3377d00686b9d7c585e Mon Sep 17 00:00:00 2001 From: Paul Higinbotham Date: Mon, 1 Jul 2019 15:16:08 -0700 Subject: [PATCH 2/8] Updated to reflect feed back and explain narrow focus --- 1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md | 129 +++++++++++++++------ 1 file changed, 92 insertions(+), 37 deletions(-) diff --git a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md index d4dcca16..a94144e0 100644 --- a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md +++ b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md @@ -16,68 +16,67 @@ This RFC proposes a new parameter set for the existing ForEach-Object cmdlet to ## Motivation As a PowerShell User, - I can do simple fan-out concurrency with the PowerShell ForEach-Object cmdlet, without having to obtain and load a separate module, or deal with PowerShell jobs unless I want to. + I can execute foreach-object piped input in script blocks running in parallel threads, either synchronously or asynchronously, while limiting the number of threads running at a given time. ## Specification -There will be two new parameter sets added to the existing ForeEach-Object cmdlet to support both synchronous and asynchronous operations for parallel script block execution. -For the synchronous case, the `ForEach-Object` cmdlet will not return until all parallel executions complete. -For the asynchronous case, the `ForEach-Object` cmdlet will immediately return a PowerShell job object that contains child jobs of each parallel execution. +A new `-Parallel` parameter set will be added to the existing ForEach-Object cmdlet that supports running piped input concurrently in a provided script block. -### Implementation details - -Implementation will be similar to the ThreadJob module. -Script block execution will be run for each piped input on a separate thread and runspace. -The number of threads that run at a time will be limited by a `-ThrottleLimit` parameter with a default value. 
-Piped input that exceeds the allowed number of threads will be queued until a thread is available. -For synchronous operation, a `-Timeout` parameter will be available that terminates the wait for completion after a specified time. -Without a `-Timeout` parameter, the cmdlet will wait indefinitely for completion. - -### Synchronous parameter set - -Synchronous ForEach-Object -Parallel returns after all script blocks complete running or timeout - -```powershell -ForEach-Object -Parallel -ThrottleLimit 10 -TimeoutSecs 1800 -ScriptBlock {} -``` - -- `-Parallel` : parameter switch specifies fan-out parallel script block execution +- `-Parallel` parameter switch specifies parallel script block execution -- `-ThrottleLimit` : parameter takes an integer value that determines the maximum number threads +- `-ScriptBlock` parameter takes a script block that is executed in parallel for each piped input variable -- `-TimeoutSecs` : parameter takes an integer that specifies the maximum time to wait for completion in seconds +- `-ThrottleLimit` parameter takes an integer value that determines the maximum number of script blocks running at the same time -### Asynchronous parameter set +- `-TimeoutSecs` parameter takes an integer that specifies the maximum time to wait for completion before the command is aborted -Asynchronous ForEach-Object -Parallel immediately returns a job object for monitoring parallel script block execution +- `-AsJob` parameter switch indicates that a job is returned, which represents the command running asynchronously -```powershell -ForEach-Object -Parallel -ThrottleLimit 5 -AsJob -ScriptBlock {} -``` +The 'ForEach-Object -Parallel' command will return only after all piped input have been processed. +Unless the '-AsJob' switch is used, in which case a job object is returned immediately that monitors the ongoing execution state and collects generated data. +The returned job object can be used with all PowerShell cmdlets that manipulate jobs. 
-- `-Parallel` : parameter switch specifies fan-out parallel script block execution +### Implementation details -- `-ThrottleLimit` : parameter takes an integer value that determines the maximum number threads +Implementation will be similar to the ThreadJob module in that thread script block execution will be contained within a PSThreadChildJob object. +The jobs will be run concurrently on separate runspaces/threads up to the ThrottleLimit value, and the remainder queued to wait for an available runspace/thread to run on. +Initial implementation will not attempt to reuse threads and runspaces when running queued items, due to concerns of stale state breaking script execution. +For example, PowerShell uses thread local storage to store per thread default runspaces. +And even though there is a runspace 'ResetRunspaceState' API method, it only resets session variables and debug/transaction managers. +Imported modules and function definitions are not affected. +A script that defines a constant function would fail if the function is already defined. +The initial assumption will be that runspace/thread creation time is insignificant compared to the time needed to execute the script block, either because of high compute needs or because of long wait times for results. +If this assumption is not true then the user should consider batching the work load to each foreach-object iteration, or simply use the sequential/non-parallel form of the cmdlet. -- `-AsJob` : parameter switch returns a job object +The 'TimeoutSecs' parameter will attempt to halt all script block executions after the timeout time has passed, however it may not be immediately successful if the running script is calling a native command or API, in which case it needs for the call to return before it can halt the running script. ### Variable passing ForEach-Object -Parallel will support the PowerShell `$_` current piped item variable within each script block. 
-It will also support the `$using:` directive for passing variables from script scope into the parallel executed script block scope. +It will also support the `$using:` directive for passing variables from script scope into the parallel executed script block scope. +If the passed in variable is a value type, a copy of the value is passed to the script block. +If the passed in variable is a reference type, the reference is passed and each running script block can modify it. +Since the script blocks are running in different threads, modifying a reference type that is not thread safe will result in undefined behavior. + +### Supported scenarios -### Examples +```powershell +# Ensure needed module is installed on local system +if (! (Get-Module -Name MyLogsModule -ListAvailable)) { + Install-Module -Name MyLogsModule -Force +} +``` ```powershell $computerNames = 'computer1','computer2','computer3','computer4','computer5' $logs = $computerNames | ForEach-Object -Parallel -ThrottleLimit 10 -TimeoutSecs 1800 -ScriptBlock { Get-Logs -ComputerName $_ -} +} -InitializationScript $initScript ``` ```powershell $computerNames = 'computer1','computer2','computer3','computer4','computer5' -$job = ForEach-Object -Parallel -ThrottleLimit 10 -InputObject $computerNames -AsJob -ScriptBlock { +$job = ForEach-Object -Parallel -ThrottleLimit 10 -InputObject $computerNames -TimeoutSecs 1800 -AsJob -ScriptBlock { Get-Logs -ComputerName $_ } $logs = $job | Wait-Job | Receive-Job @@ -91,9 +90,65 @@ $logs = ForEach-Object -Parallel -InputObject $computerNames -ScriptBlock { } ``` +```powershell +$computerNames = 'computer1','computer2','computer3','computer4','computer5' +$logNames = 'System','SQL','AD','IIS' +$logResults = ForEach-Object -Parallel -InputObject $computerNames -ScriptBlock { + Get-Logs -ComputerName $_ -LogNames $using:logNames +} | ForEach-Object -Parallel -ScriptBlock { + Process-Log $_ +} +``` + +### Unsupported scenarios + +```powershell +# Variables must be passed in via 
$using: keyword +$LogNameToUse = "IISLogs" +$computers | ForEach-Object -Parallel -ScriptBlock { + # This will fail because $LogName has not been defined in this scope + Get-Log -ComputerName $_ -LogName $LogNameToUse +} +``` + +```powershell +# Passed in reference variables should not be assigned to +$MyLogs = @() +$computers | ForEach-Object -Parallel -ScriptBlock { + # Not thread safe, undefined behavior + # Cannot assign to using variable + $using:MyLogs += Get-Logs -ComputerName $_ +} + +$dict = [System.Collections.Generic.Dictionary[string,object]]::New() +$computers | ForEach-Object -Parallel -ScriptBlock { + $dict = $using:dict + $logs = Get-Logs -ComputerName $_ + # Not thread safe, undefined behavior + $dict.Add($_, $logs) +} +``` + +```powershell +# Value types not passed by reference +$count = 0 +$computers | ForEach-Object -Parallel -ScriptBlock { + # Can't assign to using variable + $using:count += 1 + $logs = Get-Logs -ComputerName $_ + return @{ + ComputerName = $_ + Count = $count + Logs = $logs + } +} +``` + ## Alternate Proposals and Considerations Another option (and a previous RFC proposal) is to resurrect the PowerShell Windows workflow script `foreach -parallel` keyword to be used in normal PowerShell script to perform parallel execution of foreach loop iterations. However, the majority of the community felt it would be more useful to update the existing ForeEach-Object cmdlet with a -parallel parameter set. We may want to eventually implement both solutions. -But the ForEach-Object -Parallel proposal in this RFC should be implemented first since it is currently the most popular. + +There are currently other proposals to create a more general framework to support running arbitrary scripts and cmdlets in parallel, by marking them as able to support parallelism (see RFC #206). 
+That is outside the scope of this RFC, which focuses on extending just the ForEach-Object cmdlet to support parallel execution, and is intended to allow users to do parallel script/command execution without having to resort to PowerShell APIs. From fb0017be3e08b8f51fb8bc6653b2ea429ad12b22 Mon Sep 17 00:00:00 2001 From: Paul Higinbotham Date: Mon, 1 Jul 2019 15:46:03 -0700 Subject: [PATCH 3/8] Fixed two errors --- 1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md index a94144e0..c9044ecc 100644 --- a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md +++ b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md @@ -71,7 +71,7 @@ if (! (Get-Module -Name MyLogsModule -ListAvailable)) { $computerNames = 'computer1','computer2','computer3','computer4','computer5' $logs = $computerNames | ForEach-Object -Parallel -ThrottleLimit 10 -TimeoutSecs 1800 -ScriptBlock { Get-Logs -ComputerName $_ -} -InitializationScript $initScript +} ``` ```powershell @@ -106,7 +106,7 @@ $logResults = ForEach-Object -Parallel -InputObject $computerNames -ScriptBlock # Variables must be passed in via $using: keyword $LogNameToUse = "IISLogs" $computers | ForEach-Object -Parallel -ScriptBlock { - # This will fail because $LogName has not been defined in this scope + # This will fail because $LogNameToUse has not been defined in this scope Get-Log -ComputerName $_ -LogName $LogNameToUse } ``` From 6b78263435ebaf6325fdbdabbc11e1f932f739e6 Mon Sep 17 00:00:00 2001 From: Paul Higinbotham Date: Tue, 9 Jul 2019 14:00:26 -0700 Subject: [PATCH 4/8] Added more implementation details for clarity --- 1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md | 38 +++++++++++++++++++--- 1 file changed, 33 insertions(+), 5 deletions(-) diff --git a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md index c9044ecc..ea335525 100644 --- 
a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md +++ b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md @@ -28,7 +28,7 @@ A new `-Parallel` parameter set will be added to the existing ForEach-Object cmd - `-ThrottleLimit` parameter takes an integer value that determines the maximum number of script blocks running at the same time -- `-TimeoutSecs` parameter takes an integer that specifies the maximum time to wait for completion before the command is aborted +- `-TimeoutSeconds` parameter takes an integer that specifies the maximum time to wait for completion before the command is aborted - `-AsJob` parameter switch indicates that a job is returned, which represents the command running asynchronously @@ -48,7 +48,7 @@ A script that defines a constant function would fail if the function is already The initial assumption will be that runspace/thread creation time is insignificant compared to the time needed to execute the script block, either because of high compute needs or because of long wait times for results. If this assumption is not true then the user should consider batching the work load to each foreach-object iteration, or simply use the sequential/non-parallel form of the cmdlet. -The 'TimeoutSecs' parameter will attempt to halt all script block executions after the timeout time has passed, however it may not be immediately successful if the running script is calling a native command or API, in which case it needs for the call to return before it can halt the running script. +The 'TimeoutSeconds' parameter will attempt to halt all script block executions after the timeout time has passed, however it may not be immediately successful if the running script is calling a native command or API, in which case it needs for the call to return before it can halt the running script. 
### Variable passing @@ -56,7 +56,35 @@ ForEach-Object -Parallel will support the PowerShell `$_` current piped item var It will also support the `$using:` directive for passing variables from script scope into the parallel executed script block scope. If the passed in variable is a value type, a copy of the value is passed to the script block. If the passed in variable is a reference type, the reference is passed and each running script block can modify it. -Since the script blocks are running in different threads, modifying a reference type that is not thread safe will result in undefined behavior. +Since the script blocks are running in different threads, modifying a reference type that is not thread safe will result in undefined behavior. + +Script block variables will be special cased because they have runspace affinity. +Therefore script block variables will not be passed by reference and instead a new script block object instance will be created from the original script block variable Ast (abstract syntax tree). + +### Exceptions + +For critical exceptions, such as out of memory or stack overflow, the CLR will crash the process. +Since all parallel running script blocks run in different threads in the same process, all running script blocks will terminate, and queued script blocks will never run. +This is different from PowerShell jobs (Start-Job) where each job script runs in a separate child process, and therefore has better isolation to crashes. +The lack of process isolation is one of the costs of better performance while using threads for parallelization. + +For all other catchable exceptions, PowerShell will catch them from each thread and write them as non-terminating error records to the error data stream. +If the `ErrorAction` parameter is set to 'Stop' then cmdlet will attempt to stop the parallel execution on any error. 
+ +### Stop behavior + +Whenever a timeout, a terminating error (-ErrorAction Stop), or a stop command (Ctrl+C) occurs, a stop signal will be sent to all running script blocks, and any queued script block iterations will be dequeued. +This does not guarantee that a running script will stop immediately, if that script is running a native command or making an API call. +So it is possible for a stop command to be ineffective if one running thread is busy or hung. + +We can consider including some kind of 'forcetimeout' parameter that would kill any threads that did not end in a specified time. + +If a job object is returned (-AsJob) the child jobs that were dequeued by the stop command will remain at 'NotStarted' state. + +### Data streams + +Warning, Error, Debug, Verbose data streams will be written to the cmdlet data streams as received from each running parallel script block. +Progress data streams will not be supported, but can be added later if desired. ### Supported scenarios @@ -69,14 +97,14 @@ if (! 
(Get-Module -Name MyLogsModule -ListAvailable)) { ```powershell $computerNames = 'computer1','computer2','computer3','computer4','computer5' -$logs = $computerNames | ForEach-Object -Parallel -ThrottleLimit 10 -TimeoutSecs 1800 -ScriptBlock { +$logs = $computerNames | ForEach-Object -Parallel -ThrottleLimit 10 -TimeoutSeconds 1800 -ScriptBlock { Get-Logs -ComputerName $_ } ``` ```powershell $computerNames = 'computer1','computer2','computer3','computer4','computer5' -$job = ForEach-Object -Parallel -ThrottleLimit 10 -InputObject $computerNames -TimeoutSecs 1800 -AsJob -ScriptBlock { +$job = ForEach-Object -Parallel -ThrottleLimit 10 -InputObject $computerNames -TimeoutSeconds 1800 -AsJob -ScriptBlock { Get-Logs -ComputerName $_ } $logs = $job | Wait-Job | Receive-Job From c51c9604b4cd0fc4ff53888d702dd9dddcdeedd7 Mon Sep 17 00:00:00 2001 From: Paul Higinbotham Date: Mon, 5 Aug 2019 12:13:55 -0700 Subject: [PATCH 5/8] Updated to reflect new parameter set --- 1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md index ea335525..c2ac89ce 100644 --- a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md +++ b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md @@ -22,9 +22,7 @@ This RFC proposes a new parameter set for the existing ForEach-Object cmdlet to A new `-Parallel` parameter set will be added to the existing ForEach-Object cmdlet that supports running piped input concurrently in a provided script block. 
-- `-Parallel` parameter switch specifies parallel script block execution - -- `-ScriptBlock` parameter takes a script block that is executed in parallel for each piped input variable +- `-Parallel` parameter takes a script block that is executed in parallel for each piped input variable - `-ThrottleLimit` parameter takes an integer value that determines the maximum number of script blocks running at the same time @@ -79,7 +77,7 @@ So it is possible for a stop command to be ineffective if one running thread is We can consider including some kind of 'forcetimeout' parameter that would kill any threads that did not end in a specified time. -If a job object is returned (-AsJob) the child jobs that were dequeued by the stop command will remain at 'NotStarted' state. +If a job object is returned (-AsJob) the child jobs that were dequeued by the stop command will be at 'NotStarted' state. ### Data streams @@ -97,14 +95,14 @@ if (! (Get-Module -Name MyLogsModule -ListAvailable)) { ```powershell $computerNames = 'computer1','computer2','computer3','computer4','computer5' -$logs = $computerNames | ForEach-Object -Parallel -ThrottleLimit 10 -TimeoutSeconds 1800 -ScriptBlock { +$logs = $computerNames | ForEach-Object -ThrottleLimit 10 -TimeoutSeconds 1800 -Parallel { Get-Logs -ComputerName $_ } ``` ```powershell $computerNames = 'computer1','computer2','computer3','computer4','computer5' -$job = ForEach-Object -Parallel -ThrottleLimit 10 -InputObject $computerNames -TimeoutSeconds 1800 -AsJob -ScriptBlock { +$job = ForEach-Object -ThrottleLimit 10 -InputObject $computerNames -TimeoutSeconds 1800 -AsJob -Parallel { Get-Logs -ComputerName $_ } $logs = $job | Wait-Job | Receive-Job @@ -113,7 +111,7 @@ $logs = $job | Wait-Job | Receive-Job ```powershell $computerNames = 'computer1','computer2','computer3','computer4','computer5' $logNames = 'System','SQL' -$logs = ForEach-Object -Parallel -InputObject $computerNames -ScriptBlock { +$logs = ForEach-Object -InputObject 
$computerNames -Parallel { Get-Logs -ComputerName $_ -LogNames $using:logNames } ``` @@ -121,7 +119,7 @@ $logs = ForEach-Object -Parallel -InputObject $computerNames -ScriptBlock { ```powershell $computerNames = 'computer1','computer2','computer3','computer4','computer5' $logNames = 'System','SQL','AD','IIS' -$logResults = ForEach-Object -Parallel -InputObject $computerNames -ScriptBlock { +$logResults = ForEach-Object -InputObject $computerNames -Parallel { Get-Logs -ComputerName $_ -LogNames $using:logNames } | ForEach-Object -Parallel -ScriptBlock { Process-Log $_ @@ -133,7 +131,7 @@ $logResults = ForEach-Object -Parallel -InputObject $computerNames -ScriptBlock ```powershell # Variables must be passed in via $using: keyword $LogNameToUse = "IISLogs" -$computers | ForEach-Object -Parallel -ScriptBlock { +$computers | ForEach-Object -Parallel { # This will fail because $LogNameToUse has not been defined in this scope Get-Log -ComputerName $_ -LogName $LogNameToUse } @@ -142,14 +140,14 @@ $computers | ForEach-Object -Parallel -ScriptBlock { ```powershell # Passed in reference variables should not be assigned to $MyLogs = @() -$computers | ForEach-Object -Parallel -ScriptBlock { +$computers | ForEach-Object -Parallel { # Not thread safe, undefined behavior # Cannot assign to using variable $using:MyLogs += Get-Logs -ComputerName $_ } $dict = [System.Collections.Generic.Dictionary[string,object]]::New() -$computers | ForEach-Object -Parallel -ScriptBlock { +$computers | ForEach-Object -Parallel { $dict = $using:dict $logs = Get-Logs -ComputerName $_ # Not thread safe, undefined behavior @@ -160,7 +158,7 @@ $computers | ForEach-Object -Parallel -ScriptBlock { ```powershell # Value types not passed by reference $count = 0 -$computers | ForEach-Object -Parallel -ScriptBlock { +$computers | ForEach-Object -Parallel { # Can't assign to using variable $using:count += 1 $logs = Get-Logs -ComputerName $_ From 5592e87f3ecc522b9a1ee7b3d6b6578f506a93f9 Mon Sep 17 00:00:00 2001 
From: Paul Higinbotham Date: Mon, 19 Aug 2019 13:56:43 -0700 Subject: [PATCH 6/8] Update to clarify and reflect current implementation --- 1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md index c2ac89ce..76c68d02 100644 --- a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md +++ b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md @@ -30,8 +30,8 @@ A new `-Parallel` parameter set will be added to the existing ForEach-Object cmd - `-AsJob` parameter switch indicates that a job is returned, which represents the command running asynchronously -The 'ForEach-Object -Parallel' command will return only after all piped input have been processed. -Unless the '-AsJob' switch is used, in which case a job object is returned immediately that monitors the ongoing execution state and collects generated data. +The `ForEach-Object -Parallel` command will stream output to the console until all piped input has been processed. +If the `-AsJob` switch is used then a job object is returned and remains in the running state while input is being processed. The returned job object can be used with all PowerShell cmdlets that manipulate jobs. ### Implementation details @@ -56,8 +56,10 @@ If the passed in variable is a value type, a copy of the value is passed to the If the passed in variable is a reference type, the reference is passed and each running script block can modify it. Since the script blocks are running in different threads, modifying a reference type that is not thread safe will result in undefined behavior. -Script block variables will be special cased because they have runspace affinity. -Therefore script block variables will not be passed by reference and instead a new script block object instance will be created from the original script block variable Ast (abstract syntax tree). 
+ScriptBlock variables are a special case because they have runspace affinity, and cannot be safely passed to other runspace script blocks for parallel execution. +Consequently, an error will be generated if a ScriptBlock object is directly passed through the input pipeline, or if passed to the parallel script block via the `$using:` directive. +However, it is still possible to pass in a ScriptBlock object indirectly such as through an object method returning a ScriptBlock. +This is not recommended and will result in undefined behavior. ### Exceptions From f5cfefd51121f16751649168a82a67a84a3dc42c Mon Sep 17 00:00:00 2001 From: Paul Higinbotham Date: Wed, 21 Aug 2019 15:34:54 -0700 Subject: [PATCH 7/8] Fix examples to be correct. --- 1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md index 76c68d02..890a3e26 100644 --- a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md +++ b/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md @@ -128,6 +128,15 @@ $logResults = ForEach-Object -InputObject $computerNames -Parallel { } ``` +```powershell +$threadSafeDictionary = [System.Collections.Concurrent.ConcurrentDictionary[string,object]]::new() +Get-Process | ForEach-Object -Parallel { + # This works because the passed in object is a concurrent dictionary that is thread safe + $dict = $using:threadSafeDictionary + $dict.TryAdd($_.ProcessName, $_) +} +``` + ### Unsupported scenarios ```powershell @@ -143,10 +152,15 @@ $computers | ForEach-Object -Parallel { # Passed in reference variables should not be assigned to $MyLogs = @() $computers | ForEach-Object -Parallel { - # Not thread safe, undefined behavior - # Cannot assign to using variable + # Throws error, cannot assign to using variable $using:MyLogs += Get-Logs -ComputerName $_ } +At line:3 char:5 ++ $using:MyLogs += Get-Logs -ComputerName $_ ++ ~~~~~~~~~~~~~ +The assignment 
expression is not valid. The input to an assignment operator must be an object that is able to accept assignments, such as a variable or a property. ++ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException ++ FullyQualifiedErrorId : InvalidLeftHandSide $dict = [System.Collections.Generic.Dictionary[string,object]]::New() $computers | ForEach-Object -Parallel { From 4596ac4d6e56fabb0dbe153190bafea2de94c373 Mon Sep 17 00:00:00 2001 From: Joey Aiello Date: Mon, 9 Sep 2019 12:58:13 -0700 Subject: [PATCH 8/8] Accept RFC0044 on ForEach-Object -Parallel --- .../RFC0044-ForEach-Parallel-Cmdlet.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) rename 1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md => 4-Experimental-Accepted/RFC0044-ForEach-Parallel-Cmdlet.md (97%) diff --git a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md b/4-Experimental-Accepted/RFC0044-ForEach-Parallel-Cmdlet.md similarity index 97% rename from 1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md rename to 4-Experimental-Accepted/RFC0044-ForEach-Parallel-Cmdlet.md index 890a3e26..b7121fd4 100644 --- a/1-Draft/RFCnnnn-ForEach-Parallel-Cmdlet.md +++ b/4-Experimental-Accepted/RFC0044-ForEach-Parallel-Cmdlet.md @@ -1,7 +1,7 @@ --- -RFC: RFCnnnn +RFC: RFC0044 Author: Paul Higinbotham -Status: Draft +Status: Experimental-Accepted SupercededBy: N/A Version: 1.0 Area: Engine
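
The parameter set these patches arrive at (`-Parallel` taking the script block, plus `-ThrottleLimit`, `-TimeoutSeconds`, and `-AsJob`) can be tried with a minimal sketch such as the one below. It assumes a PowerShell build in which the parameter set is available. The `$prefix` variable and the `1..6` input range are illustrative placeholders, not part of the RFC.

```powershell
# Sketch only: assumes a PowerShell build where ForEach-Object -Parallel is available.
# $prefix and the 1..6 input range are hypothetical placeholders for illustration.
$prefix = 'item'

# Synchronous form: at most two script blocks run at the same time.
# Output is streamed as iterations finish, so its order is not guaranteed.
$results = 1..6 | ForEach-Object -ThrottleLimit 2 -TimeoutSeconds 60 -Parallel {
    # $using: passes the caller's variable into this script block's scope.
    $p = $using:prefix
    "${p}-$_"
}
$results | Sort-Object

# Asynchronous form: -AsJob returns a job object right away,
# which the standard job cmdlets can then wait on and read.
$job = 1..6 | ForEach-Object -ThrottleLimit 2 -AsJob -Parallel {
    $_ * 2
}
$job | Wait-Job | Receive-Job
```

The sketch sorts the synchronous results before displaying them because iterations run on separate threads and may complete in any order. The two forms are shown separately so that the timeout and the job object are each exercised on their own.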