PowerShell ForEach-Object elimination

Question

Let's consider a collection of collections, and an operation that needs to be performed inside a pipeline on each element of the inner collection.

For the sake of simplicity, let it be an array of arrays, and let the operation be simply printing to the screen. For comparison, let us also have an array whose elements are not collections:

$Array = "A", "B", "C"
$ArrayOfArrays = (1, 2, 3), (4, 5, 6), (7, 8, 9)

We know that piping will break a collection down to elements, like this:

$Array | & {process {Write-Host $_}}
$ArrayOfArrays | & {process {Write-Host $_}}

Now, to my surprise, this does not break the inner arrays down to their elements:

$ArrayOfArrays | % -process {Write-Host $_} # (1)

and neither does this:

$ArrayOfArrays | % -process {% -process {Write-Host $_}} # (2)

(The latter may seem an unnecessary attempt, given that (1) does not do it, but I tried it anyway.)
I expected try (1) to work, because I thought that piping does one breakdown, and that when ForEach-Object receives an element that is itself a collection, it breaks it down further.

I could only solve it with inner piping:

$ArrayOfArrays | % -process {$_ | % -process {Write-Host $_}} # (3)

With this approach I can, of course, eliminate ForEach-Object altogether:

$ArrayOfArrays | & {process {$_ | & {process {Write-Host $_}}}} # (4)

So my 2 questions are:

1,

How can I access the elements of an inner collection in a pipeline other than with tries (3) and (4), or are these the only ways to do that?

2,

If tries (3) and (4) are the only ways to do what question 1 asks, then what is a valid use case for ForEach-Object where it cannot be eliminated? It may be a matter of logic, but also of performance compared with a script block. The fact that it looks nicer than a script block, with one pair of braces fewer, is not really enough for me...

.
EDIT after Manuel Batsching's answer:

Since ForEach-Object emits a collection's elements after processing it, we can do this (I dropped Write-Host, which was perhaps not a good choice of arbitrary operation, so let it be GetType):

$ArrayOfArrays | % -process {$_} | & {process {$_.GetType()}}

But we also know that if something returns a new object in the pipeline, it will trigger a breakdown if it is further piped and if it is a collection. So to do the breakdown, we can again eliminate ForEach-Object and do this:

$ArrayOfArrays | & {process {$_}} | & {process {$_.GetType()}}

And this dummy operation can be syntactically reduced if I define a filter like this:

Filter §
{
    param (
            [Parameter (Mandatory = $True, ValueFromPipeline = $True)]
            [Object]
            $ToBeTriggeredForBreakDown
    ) # end param

    $ToBeTriggeredForBreakDown

}

and use it like this:

$Array | § | & {process {$_.GetType()}}
$ArrayOfArrays | § | & {process {$_.GetType()}}

$ArrayOfArraysOfArrays = ((1, 2), (3, 4)), ((5, 6), (7, 8))
$ArrayOfArraysOfArrays | § | & {process {$_.GetType()}}
$ArrayOfArraysOfArrays | § | § | & {process {$_.GetType()}}
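
To make the effect of each pass-through step visible, here is a minimal sketch that counts what arrives at the end of the pipeline. The filter name Expand-OneLevel is a hypothetical ASCII-named stand-in for the § filter above; it simply emits its input, which lets PowerShell enumerate it one level.

```powershell
# Hypothetical stand-in for the § filter: emit the input unchanged so that
# PowerShell enumerates it if it is a collection.
filter Expand-OneLevel { $_ }

$ArrayOfArraysOfArrays = ((1, 2), (3, 4)), ((5, 6), (7, 8))

# @(...) collects everything the pipeline emits so that it can be counted.
$once  = @($ArrayOfArraysOfArrays | Expand-OneLevel)
$twice = @($ArrayOfArraysOfArrays | Expand-OneLevel | Expand-OneLevel)

$once.Count     # 4 (four two-element arrays)
$twice.Count    # 8 (eight integers)
```

Each additional pass removes exactly one level of nesting, which matches the behaviour of chaining § twice.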

So it is still hard for me to see when I would use ForEach-Object. It seems completely useless to me, except for the reasons I am asking about in my questions.

.
EDIT after research:

Some collections provide their own methods, e.g. since v4 arrays have a ForEach method, so besides (3) and (4), one can do this (again a dummy operation, but with less code):

$ArrayOfArrays.ForEach{$_} | & {process {$_.GetType()}}

so this partially covers question 1.
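
As a rough check of what the ForEach method returns here, the sketch below stores the result in a variable ($flat is an illustrative name, not from the post) and inspects it directly instead of piping it on.

```powershell
$ArrayOfArrays = (1, 2, 3), (4, 5, 6), (7, 8, 9)

# The script block emits each inner array, and that output is enumerated,
# so the collected result holds the individual numbers.
$flat = $ArrayOfArrays.ForEach({ $_ })

$flat.Count                 # 9
$flat[0].GetType().Name     # Int32
```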

In PowerShell 7, Foreach-Object has the -Parallel parameter for parallel execution. The -Process parameter takes an array of script blocks. So you could technically perform different processing scripts against each piped object. Foreach-Object also supports operation statements. Technically you don't have to do anything, but 1,2,3 | Foreach ToString is arguably more readable than 1,2,3 | & { process { $_.ToString() }}.
– AdminOfThings, Commented Feb 27, 2020 at 19:42
Foreach-Object also has the -InputObject parameter where you can process the entire object as one item. That is its way of preventing the array unwrapping that you see in the pipeline. You can do that with your method, but you must do obscure array wrapping yourself like ,@(1,2,3) before sending down the pipeline.
– AdminOfThings, Commented Feb 27, 2020 at 19:47
Since Foreach-Object is a cmdlet, you gain access to common parameters. So you can utilize -PipelineVariable, for example, to use the output of this command in a command deeper in the pipeline.
– AdminOfThings, Commented Feb 27, 2020 at 20:00
My test cases show that the data | & { process {}} method is faster than data | foreach-object -process {}. So it appears to be a tradeoff as to what you want to get out of it.
– AdminOfThings, Commented Feb 27, 2020 at 20:24
@AdminOfThings Many thanks for your comments. Access to common parameters and the parallelization feature both justify its existence (although I am not using v7 yet). I knew about -InputObject, but if I wanted to keep the array together, I would simply not feed it to a pipeline; of course, when it is the result of a cmdlet already in the pipeline, it is a different story. You could post these as an answer, and if there is no better one (and I think you have probably covered the most important things) I will accept it.
– Dávid Laczkó, Commented Feb 27, 2020 at 21:53

Manuel Batsching · Accepted Answer · 2020-02-19 12:02:21Z

In my understanding, arrays are unwrapped once they are passed down the pipeline or to the output stream.

You will see this behaviour with all of the following approaches:

$ArrayOfArrays | % -process { $_ }
$ArrayOfArrays | & { process { $_ } }
foreach ($arr in $ArrayOfArrays) { $arr }

What ruins the unwrapping in your example is the Write-Host cmdlet. Because this cmdlet writes to your console rather than to the output stream, it casts the input object to [string]. That is why you see a string representation of the inner arrays on your console.

Replace Write-Host with Write-Output and the inner arrays will be properly unwrapped:

PS> $ArrayOfArrays | % -process { Write-Output $_ }
1
2
3
4
5
6
7
8
9
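
The same point can be checked without relying on console output. The following is a minimal sketch (the variable names $typesInside, $emitted and $asText are illustrative): inside the script block $_ is still a whole inner array, and the breakdown only shows up in what the pipeline emits.

```powershell
$ArrayOfArrays = (1, 2, 3), (4, 5, 6), (7, 8, 9)

# Inside the script block, $_ is one complete inner array.
$typesInside = $ArrayOfArrays | ForEach-Object { $_.GetType().Name }
$typesInside        # Object[] (three times)

# Emitting $_ sends it to the output stream, where it is enumerated.
$emitted = @($ArrayOfArrays | ForEach-Object { $_ })
$emitted.Count      # 9

# Converting an inner array to a string joins its elements with spaces,
# which is the one-line form Write-Host shows for each inner array.
$asText = "$($ArrayOfArrays[0])"
$asText             # 1 2 3
```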

EDIT:

You can use a debugger to determine exactly where the unwrapping is done. For example, use the following code in VSCode:

$ArrayOfArrays = (1, 2, 3), (4, 5, 6), (7, 8, 9)
$foo = $null
$foo = $ArrayOfArrays | % { Write-Output $_ }

Set a breakpoint on the line $foo = $null, add the variables $foo and $_ to the watch list, press F5 to start the debugger, and watch the variables change while you press F11 to step into the individual processing steps.

$_ will show the inner array, which is the current element in the pipeline.
$foo will receive only the unwrapped elements after the pipeline execution ends.

Thanks - you have almost convinced me! :) Almost, because what you say about Write-Host vs Write-Output is true. But when I do $ArrayOfArrays | & { process { $_.GetType() } } and $ArrayOfArrays | % -process { $_.GetType() }, I get System.Array from both. So exactly when "the unwrapping of arrays is done" is still a mystery to me - or it would mean testing each cmdlet/method until I grasp it...
As I said, the unwrapping magic happens once the inner arrays are passed further down the pipeline or to the output stream. You can use a debugger to see the moment when that happens. I edited my answer accordingly.
Thanks, that makes it clearer - this can also be tested like this: $ArrayOfArrays | % -process {$_} | & {process {Write-Host $_ "delimiter"}} - although again only with a dummy step.

Dávid Laczkó · Accepted Answer · 2020-03-04 14:34:40Z

In PowerShell 7, Foreach-Object has the -Parallel parameter for parallel execution. This is not necessarily faster for all types of processing. You will have to experiment with this.
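
As an illustration of parallel execution, here is a minimal sketch that requires PowerShell 7 or later. The variable $factor and the throttle limit of 2 are arbitrary choices for the example. Variables from the calling scope have to be referenced with the $using: prefix, and the output order is not guaranteed, so the result is sorted before use.

```powershell
# Requires PowerShell 7 or later.
$factor = 10

$result = 1..4 |
    ForEach-Object -Parallel { $_ * $using:factor } -ThrottleLimit 2 |
    Sort-Object

$result    # 10, 20, 30, 40 (after sorting)
```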

Foreach-Object's -Process parameter takes an array of script blocks. So you could technically perform different processing scripts against each piped object.

1,2,3 | Foreach-Object -begin {"First loop iteration"} -process {$_ + 1},{$_ + 2},{$_ + 3} -End {"Last loop iteration"}
First loop iteration
2
3
4
3
4
5
4
5
6
Last loop iteration

# Example of already having script blocks defined
$sb1,$sb2,$sb3 = { $_ + 1 },{$_ + 2},{$_ + 3}
1,2,3 | Foreach-Object -begin {"Starting the loop"} -process $sb1,$sb2,$sb3 -end {"the loop finished"}
Starting the loop
2
3
4
3
4
5
4
5
6
the loop finished

Foreach-Object also supports operation statements. Technically you don't have to do anything, but 1,2,3 | Foreach ToString is arguably more readable than 1,2,3 | & { process { $_.ToString() }}.
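
A few operation statements side by side, as a small sketch of what the short form can express (the sample values and variable names are made up for illustration): calling a method, reading a property, and calling a method with an argument.

```powershell
# Method call: same as { $_.ToString() }
$strings = 1, 2, 3 | ForEach-Object ToString
$strings[0].GetType().Name     # String

# Property access: same as { $_.Length }
$lengths = 'abc', 'de' | ForEach-Object Length
$lengths                       # 3, 2

# Method call with an argument: same as { $_.Split('-') }
$parts = 'a-b', 'c-d' | ForEach-Object Split '-'
$parts                         # a, b, c, d
```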

Foreach-Object also has the -InputObject parameter where you can process the entire object as one item. That is its way of preventing the array unwrapping that you see in the pipeline. You can do that with your method, but you must do obscure array wrapping yourself like ,@(1,2,3) before sending down the pipeline.

# Single pipeline object
$count = 1
ForEach-Object -InputObject 1,2,3 -Process {"Iteration Number: $count"; $_; $count++}
Iteration Number: 1
1
2
3

# array unwrapping down pipeline

$count = 1
1,2,3 | ForEach-Object -Process {"Iteration Number: $count"; $_; $count++}
Iteration Number: 1
1
Iteration Number: 2
2
Iteration Number: 3
3
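
For comparison, a sketch of the three variants using the script-block form as well; the variable names are illustrative. Each result records how many times the process block ran and what it saw in $_.

```powershell
# Plain piping: the array is enumerated, so the process block runs three times.
$perElement = @(1, 2, 3 | & { process { 'run' } })
$perElement.Count        # 3

# -InputObject: the whole array arrives as a single item.
$viaInputObject = @(ForEach-Object -InputObject (1, 2, 3) -Process { $_.Count })
$viaInputObject.Count    # 1
$viaInputObject[0]       # 3 (the size of the array seen in $_)

# Unary comma: wrap the array in an outer array, so that the pipeline
# unwraps only the wrapper and passes the inner array on intact.
$viaComma = @(, @(1, 2, 3) | & { process { $_.Count } })
$viaComma.Count          # 1
$viaComma[0]             # 3
```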

Since Foreach-Object is a cmdlet, you gain access to Common Parameters. So you can utilize -PipelineVariable, for example, to use the output of this command in a command deeper in the pipeline.

# Using OutVariable
1,2,3 | Foreach-Object {$_ + 100} -OutVariable numbers |
    Foreach-Object -process { "Current Number: $_"; "Numbers Processed So Far: $numbers" }
Current Number: 101
Numbers Processed So Far: 101
Current Number: 102
Numbers Processed So Far: 101 102
Current Number: 103
Numbers Processed So Far: 101 102 103

# Using PipeLineVariable
1,2,3 | Foreach-Object {$_ + 100} -PipeLineVariable first |
    Foreach-Object {$_ * 2} -PipelineVariable second |
        Foreach-Object {"First number is $first"; "second number is $second"; "final calculation is $($_*3)" }
First number is 101
second number is 202
final calculation is 606
First number is 102
second number is 204
final calculation is 612
First number is 103
second number is 206
final calculation is 618

My test cases show that the data | & { process {}} method is faster than data | foreach-object -process {}. So it appears to be a tradeoff as to what you want to get out of it.

Measure-Command {1..100000 | & { process {$_}}}


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 107
Ticks             : 1074665
TotalDays         : 1.24382523148148E-06
TotalHours        : 2.98518055555556E-05
TotalMinutes      : 0.00179110833333333
TotalSeconds      : 0.1074665
TotalMilliseconds : 107.4665


Measure-Command {1..100000 | Foreach-Object {$_}}


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 768
Ticks             : 7686545
TotalDays         : 8.89646412037037E-06
TotalHours        : 0.000213515138888889
TotalMinutes      : 0.0128109083333333
TotalSeconds      : 0.7686545
TotalMilliseconds : 768.6545
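
A single Measure-Command run can be noisy, so one way to make the comparison a little more robust is to repeat each measurement and average. This is only a sketch: the run count of 5 and the variable names are arbitrary, and the actual numbers will depend on the machine and PowerShell version.

```powershell
$runs = 5

# Time the script-block form several times.
$scriptBlockMs = 1..$runs | ForEach-Object {
    (Measure-Command { 1..100000 | & { process { $_ } } }).TotalMilliseconds
}

# Time the ForEach-Object form several times.
$cmdletMs = 1..$runs | ForEach-Object {
    (Measure-Command { 1..100000 | ForEach-Object { $_ } }).TotalMilliseconds
}

$scriptBlockAvg = ($scriptBlockMs | Measure-Object -Average).Average
$cmdletAvg      = ($cmdletMs | Measure-Object -Average).Average

'& { process {} } : {0:N1} ms on average' -f $scriptBlockAvg
'ForEach-Object   : {0:N1} ms on average' -f $cmdletAvg
```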

When running Foreach-Object, all code is run in the current caller's scope including the contents of the script block. & runs code in a child scope and anything changed in that scope may not be reflected when returning to the parent scope (calling scope). You will need to use . to call in the current scope.

# Notice $a outputs nothing outside of the loop
PS > 1,2,3 | & { begin {$a = 100} process { $_ } end {$a}}
1
2
3
100
PS > $a

PS >

# Notice with . $a is updated
PS > 1,2,3 | . { begin {$a = 100} process { $_ } end {$a}}
1
2
3
100
PS > $a
100
PS >

# foreach updates current scope (used a different variable, because  
# $a was already added by the previous command)
PS > 1,2,3 | foreach-object -begin {$b = 333} -process {$_} -end {$b}
1
2
3
333
PS > $b
333
PS >

I accepted it with some conditions in mind. It is very informative, but there are still cases where ForEach-Object is not indispensable: e.g. 1,2,3 | Foreach-Object -begin {"First loop iteration"} -process {$_ + 1},{$_ + 2},{$_ + 3} -End {"Last loop iteration"} can be achieved with a script block, as it can run multiple commands, so the end result is the same as with this: 1,2,3 | & {begin {"First loop iteration"} process {$_ + 1; $_ + 2; $_ + 3} End {"Last loop iteration"}}.
For that matter, keeping ForEach-Object but using only one script block gives the same result: 1,2,3 | Foreach-Object -begin {"First loop iteration"} -process {$_ + 1; $_+ 2; $_ + 3} -End {"Last loop iteration"}. However, it is worth investigating when it is better to have multiple blocks, e.g. when they are stored in variables or are functions/filters in their own right. But that is another story.
Yes, you can use the one script block. However, you can be given multiple script blocks and pass them directly into Foreach-Object. Without it, you are forced to use more obscure & scenarios or the insecure Invoke-Expression. I added an example of that.
@DávidLaczkó, I thought about another difference. When you use &, everything runs in a child scope as opposed to using . which runs in the current scope. foreach-object runs in the current scope. I added an example of this.
You can do it this way too: foreach { 'begin' } { 'process' } { 'end' }. -RemainingScripts will take up the extra ones.
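
A short sketch of the positional form mentioned in the last comment, with $_ added to the middle block so that the process step is visible per element. With three positional script blocks, the first acts as the begin block, the last as the end block, and the middle one runs for every input object.

```powershell
$out = 1, 2, 3 | ForEach-Object { 'begin' } { "process $_" } { 'end' }

$out
# begin
# process 1
# process 2
# process 3
# end
```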
