4

I'm trying to sort a large CSV file. The issue is with using sort-oject on a coloum called Combined. It doesn't sort the way I'd expect it to! For testing, I have removed all of the data that isn't being sorted and also reduced the ammount of lines.

Orginal CSV:

Combined
1A THE BIG
7
3A SPRING
19
LUZREN
21
23
25
29
1 HONEY
5
3 THE GOOD
11
ARVALA
BRASAID
13
MEADOWCLAW

Powershell:

Import-Csv orginal.csv -delimiter ',' | Sort-Object -Unique Combined 

Combined
--------
1 HONEY
11
13
19
1A THE BIG
21
23
25
29
3 THE GOOD
3A SPRING
5
7
ARVALA
BRASAID
LUZREN
MEADOWCLAW

What I expected was this:

Combined
1A THE BIG
5
7
11
13
19
21
23
25
29
1 HONEY
3 THE GOOD
3A SPRING
ARVALA
BRASAID
LUZREN
MEADOWCLAW

In a perfect world I'd like this:

Combined
1A THE BIG
1 HONEY
3 THE GOOD
3A SPRING
5
7
11
13
19
21
23
25
29
ARVALA
BRASAID
LUZREN
MEADOWCLAW

Please can someone explain to me why this is happening I've banging my head againt my monitor for a week now. Also, how could I get the output that I'd like?

Thanks in adance!

2
  • Default sort is symbol by symbol. How come '1A THE BIG' should go before '1 HONEY'? Commented Mar 1, 2013 at 13:12
  • It's sorting them like strings, which is how it should work. If you treat each number like a letter that comes before A it makes perfect sense as an alphabetical sort. Commented Mar 1, 2013 at 13:14

3 Answers 3

4

Not sure I understand the sorting logic, but try this:

$StartsWithNumber = { if ($_.Combined -match '^\d*\D' ) { $_.Combined } }
$IsNumber = { if ($_.Combined -match '^\d*$') { [int]$_.Combined } }
$OnlyLetters = { if ($_.Combined -imatch '^[a-z ]*$') { $_.Combined } }

Import-Csv original.csv | 
Sort-Object $OnlyLetters, $IsNumber, $StartsWithNumber -Unique

Combined
--------
1 HONEY
1A THE BIG
3 THE GOOD
3A SPRING
5
7
11
13
19
21
23
25
29
ARVALA
BRASAID
LUZREN
MEADOWCLAW
Sign up to request clarification or add additional context in comments.

2 Comments

+1 I think regex in $startsWithNumer is better like this '^\d*\D' in case some value start with more that one single digit. Right?
Perfect! My perfect world output was wrong (my bad). Your solution outputs what I need. Thanks!
1

Try this: (not in the perfect world )

Import-Csv original.csv -delimiter ',' | 
Sort-Object { [int]([regex]::Replace( $_.combined , "\D" , "" )) } ,  `
{ [regex]::Replace( $_.combined , "\d" , "" ) } -unique

Combined
--------
ARVALA
BRASAID
LUZREN
MEADOWCLAW
1 HONEY
1A THE BIG
3 THE GOOD
3A SPRING
5
7
11
13
19
21
23
25
29

Following the nice answer of @Aryadev this code sort also value starting with more than one digit:

$allToNumbers = { [int]([regex]::Replace( $_.combined , "\D" , "" )) }
$StartsWithNumber = { if ($_.Combined -match '^\d+\D') { $_.Combined } }
$IsNumber = { if ($_.Combined -match '^\d*$') { [int]$_.Combined } }
$OnlyLetters = { if ($_.Combined -imatch '^[a-z ]*$') { $_.Combined } }

Import-Csv original.csv | 
Sort-Object  $OnlyLetters, $allnumbers, $IsNumber,  $StartsWithNumber -Unique

Comments

0

Import-CSV imports all values as string by default. That's why you get the result you see. It's sorted alphabetical. The "priority" in alphanumeric is: spaces, numbers, letters.

I don't think you can get the output you want without some serious code. My only suggestion is parsing the pure number to int so ex. a number 30 will come after 5. This can be done line this:

$intvalue = 10000
import-csv .\test.csv | % {
    if ([int]::TryParse($_.Combined, [ref]$intvalue)) {
        $_.Combined = $intvalue
    }
    $_
} | Sort-Object -Unique Combined

Combined
--------
1 HONEY
5
7
11
13
19
1A THE BIG
21
23
25
29
3 THE GOOD
3A SPRING
ARVALA
BRASAID
LUZREN 
MEADOWCLAW 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.